mirror of https://github.com/openvswitch/ovs synced 2025-08-24 02:47:14 +00:00

221 Commits

Author SHA1 Message Date
Ben Pfaff
c3827f619a datapath: Make adding and attaching a vport a single step.
For some time now, Open vSwitch datapaths have internally made a
distinction between adding a vport and attaching it to a datapath.  Adding
a vport just means to create it, as an entity detached from any datapath.
Attaching it gives it a port number and a datapath.  Similarly, a vport
could be detached and deleted separately.

After some study, I think I understand why this distinction exists.  It is
because ovs-vswitchd tries to open all the datapath ports before it tries
to create them.  However, changing it to create them before it tries to
open them is not difficult, so this commit does this.

The bulk of this commit, however, changes the datapath interface to one
that always creates a vport and attaches it to a datapath in a single step,
and similarly detaches a vport and deletes it in a single step.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
2010-12-03 14:41:38 -08:00
Ben Pfaff
d76f09ea77 coverage: Make the coverage counters catalog program-specific.
Until now, the collection of coverage counters supported by a given OVS
program was not specific to that program.  That means that, for example,
even though ovs-dpctl does not have anything to do with mac_learning, it
still has a coverage counter for it.  This is confusing, at best.

This commit fixes the problem on some systems, in particular on ones that
use GCC and the GNU linker.  It uses the feature of the GNU linker
described in its manual as:

    If an orphaned section's name is representable as a C identifier then
    the linker will automatically see PROVIDE two symbols: __start_SECNAME
    and __end_SECNAME, where SECNAME is the name of the section.  These
    indicate the start address and end address of the orphaned section
    respectively.

Systems that don't support these features retain the earlier behavior.

This commit also fixes the annoyance that files that include coverage
counters must be listed on COVERAGE_FILES in lib/automake.mk.

This commit also fixes the annoyance that modifying any source file that
includes a coverage counter caused all programs that link against
libopenvswitch.a to relink, even programs that the source file was not
linked into.  For example, modifying ofproto/ofproto.c (which includes
coverage counters) caused tests/test-aes128 to relink, even though
test-aes128 does not link against ofproto.o.
2010-11-30 10:30:30 -08:00
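The linker feature quoted in commit d76f09ea77 above can be sketched as follows. This is a minimal illustration that assumes GCC (or Clang) and the GNU linker; the `demo_` names and the section name are hypothetical and are not the OVS coverage implementation. Note that current GNU ld documentation spells the end symbol `__stop_SECNAME`, and the sketch uses that spelling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counter type, not the OVS coverage structure. */
struct demo_counter {
    const char *name;
    unsigned int count;
};

/* Defines a counter and places a pointer to it in the "demo_counters"
 * section.  The "used" attribute keeps the otherwise unreferenced pointer
 * from being discarded by the compiler. */
#define DEMO_COUNTER_DEFINE(NAME)                                   \
    static struct demo_counter demo_counter_##NAME = { #NAME, 0 };  \
    static struct demo_counter *demo_counter_ptr_##NAME             \
        __attribute__((section("demo_counters"), used))             \
        = &demo_counter_##NAME

DEMO_COUNTER_DEFINE(flow_add);
DEMO_COUNTER_DEFINE(flow_del);

/* Provided by the GNU linker because "demo_counters" is representable as a
 * C identifier.  Only counters linked into this program appear between
 * the two symbols. */
extern struct demo_counter *__start_demo_counters[];
extern struct demo_counter *__stop_demo_counters[];

/* Returns the number of counters linked into this program. */
static size_t
demo_n_counters(void)
{
    return (size_t) (__stop_demo_counters - __start_demo_counters);
}

/* Returns the counter at position 'idx' in the section. */
static struct demo_counter *
demo_counter_at(size_t idx)
{
    assert(idx < demo_n_counters());
    return __start_demo_counters[idx];
}
```

Because each object file contributes only its own entries to the section, the catalog seen by a program contains exactly the counters from the files linked into it.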
Ben Pfaff
f4e2e60be4 netdev-linux: Remove counter double-increments.
A few coverage counters were incremented both in netdev generic code and
in netdev_linux code.  This commit drops the increments from the
lower-level code.

(This is not an actual bug because these counters are used only for
logging.)
2010-11-30 10:30:30 -08:00
Ethan Jackson
a339aa8162 netdev-linux: HFSC in linux
This commit implements the Hierarchical Fair Service Curve queuing
discipline in Linux. HFSC performs better at high bandwidth and
implements min-rate proportional sharing of excess bandwidth.  Only
a simplified configuration interface is exposed to the user.  This
can be expanded to allow more tweaking in the future.
2010-11-11 12:32:20 -08:00
Ben Pfaff
d98e600755 vlog: Make client supply semicolon for VLOG_DEFINE_THIS_MODULE.
It's kind of odd for VLOG_DEFINE_THIS_MODULE to supply its own semicolon,
so this commit switches to the more common form.
2010-10-29 09:48:47 -07:00
Ben Pfaff
23a98ffed7 netdev-linux: Always check tc_make_request() for NULL return value.
Bug #3912.
2010-10-22 14:51:50 -07:00
Ben Pfaff
f8da634725 netdev-linux: Remove unused data in htb_tc_load(). 2010-10-22 14:51:50 -07:00
Ethan Jackson
4ecf12d501 netdev-linux: Make queue 0 the default QOS policy
This patch defines, by convention, queue 0 as the default queue in
a particular QOS.  Thus, if queue 0 is defined, all traffic going
through the relevant interface will be enqueued in it. If queue 0
is not defined then ovs will send the traffic directly through the
interface without applying any policy to it.
2010-10-21 16:03:03 -07:00
Justin Pettit
da3827b551 netdev: Enforce a floor "linux-htb" min-rate 2010-10-08 14:30:31 -07:00
Justin Pettit
015c93a49a netdev: Don't divide by zero when "linux-htb" zero min-rate is used
A "min-rate" of zero for the "linux-htb" QoS type would cause a divide
by zero exception.  This patch prevents that by just returning zero.  A
later patch will try to enforce reasonable values for "min-rate".

Bug #3745
2010-10-08 14:22:36 -07:00
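The guard described in commit 015c93a49a above might look roughly like the sketch below. The helper name, its parameters, and the exact quantity being computed are assumptions for illustration; the commit message only states that a zero rate now yields zero instead of a division by zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: converts a transfer of 'size' bytes at 'rate' bytes
 * per second into scheduler ticks, given 'ticks_per_s' ticks per second.
 * Returns 0 when 'rate' is 0 instead of dividing by zero. */
static unsigned int
demo_bytes_to_ticks(unsigned int rate, unsigned int size,
                    unsigned int ticks_per_s)
{
    assert(ticks_per_s > 0);
    if (!rate) {
        return 0;
    }
    /* Widen before multiplying so the product cannot overflow 32 bits. */
    return (unsigned int) (((uint64_t) ticks_per_s * size) / rate);
}
```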
Ben Pfaff
b8dcf5e9c5 netdev: Pass class structure, instead of type, to "create" function.
This opens up the possibility of storing private data at a relative offset
to the class structure, instead of having to keep a separate table.
2010-10-06 13:49:07 -07:00
Ben Pfaff
d5590e7e41 netdev-linux: Fix off-by-one error dumping queue stats.
Linux kernel queue numbers are one greater than OpenFlow queue numbers, for
HTB anyhow.  The code to dump queues wasn't compensating for this, so this
commit fixes it up.
2010-10-01 10:40:00 -07:00
Ben Pfaff
4e8e4213a8 Switch many macros from using CONTAINER_OF to using OBJECT_CONTAINING.
After the switch, these macros require one fewer argument, which makes code
that uses them shorter and more readable.
2010-10-01 10:25:29 -07:00
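To illustrate the difference behind commit 4e8e4213a8 above, here is a simplified sketch of the two styles. The `DEMO_` macros and structures are hypothetical stand-ins written for this example, not the OVS definitions, and the second macro assumes the GNU `__typeof__` extension. The point is that when the destination pointer's type is already known, the struct name does not have to be passed again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical embedded list node and containing structure. */
struct demo_node {
    struct demo_node *next;
};

struct demo_queue {
    unsigned int queue_id;
    struct demo_node node;
};

/* Classic form: the caller names the containing struct type. */
#define DEMO_CONTAINER_OF(POINTER, STRUCT, MEMBER) \
    ((STRUCT *) (void *) ((char *) (POINTER) - offsetof(STRUCT, MEMBER)))

/* Object form: the type is inferred from OBJECT, a pointer variable that
 * is not evaluated, so it may be uninitialized. */
#define DEMO_OBJECT_CONTAINING(POINTER, OBJECT, MEMBER)              \
    ((__typeof__(OBJECT)) (void *)                                   \
     ((char *) (POINTER) - offsetof(__typeof__(*(OBJECT)), MEMBER)))

/* Recovers the queue ID from an embedded node, naming the struct type. */
static unsigned int
demo_id_via_container_of(struct demo_node *n)
{
    struct demo_queue *q = DEMO_CONTAINER_OF(n, struct demo_queue, node);
    assert(q != NULL);
    return q->queue_id;
}

/* Same result, but the struct type comes from the declaration of 'q'. */
static unsigned int
demo_id_via_object_containing(struct demo_node *n)
{
    struct demo_queue *q;
    q = DEMO_OBJECT_CONTAINING(n, q, node);
    assert(q != NULL);
    return q->queue_id;
}
```

An iteration macro built on the second form can take just the iterator variable, the member name, and the list, which is where the saved argument comes from.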
Ben Pfaff
93b13be8e6 netdev-linux: Use hash table instead of sparse array for QoS classes.
The main advantage of a sparse array over a hash table is that it can be
iterated in numerical order.  But the OVS implementation of sparse arrays
is quite expensive in terms of memory: on a 32-bit system, a sparse array
with exactly 1 nonnull element has 512 bytes of overhead.  In this case,
the sparse array's property of iteration in numerical order is not
important, so this commit converts it to a hash table to save memory.
2010-10-01 10:25:10 -07:00
Ben Pfaff
2a022368f4 Avoid shadowing local variable names.
All of these changes avoid using the same name for two local variables
within the same function.  None of them are actual bugs as far as I can tell,
but any of them could be confusing to the casual reader.

The one in lib/ovsdb-idl.c is particularly brilliant: inner and outer
loops both using (different) variables named 'i'.

Found with GCC -Wshadow.
2010-09-20 09:39:54 -07:00
Joe Perches
d295e8e97a treewide: Remove trailing whitespace
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2010-08-30 13:23:08 -07:00
Jesse Gross
d1eb60ccff datapath: Abstract tunneling implementation from GRE.
Much of the code in the GRE implementation is not specific to the
GRE protocol but is actually common to all types of tunnels.  In
order to support future types of tunnels, move this code into a
common library.

Signed-off-by: Jesse Gross <jesse@nicira.com>
2010-08-24 15:17:29 -04:00
Ben Pfaff
5136ce492c vlog: Introduce VLOG_DEFINE_THIS_MODULE for declaring vlog module in use.
Adding a macro to define the vlog module in use adds a level of
indirection, which makes it easier to change how the vlog module must be
defined.  A followup commit needs to do that, so getting these widespread
changes out of the way first should make that commit easier to review.
2010-07-21 15:47:09 -07:00
Ben Pfaff
17ee3c1ffd netdev-linux: Avoid minor number 0 in traffic control.
Linux traffic control handles with minor number 0 refer to qdiscs, not
to classes.  This commit deals with this by using a conversion function:
OpenFlow queue 0 maps to minor 1, queue 1 to minor 2, and so on.
2010-07-20 11:26:58 -07:00
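The conversion described in commit 17ee3c1ffd above (and relied on by the queue-stats fix in d5590e7e41) can be sketched like this. A Linux traffic control handle is a 32-bit value with the major number in the upper 16 bits and the minor number in the lower 16 bits; the `demo_` helper names are hypothetical and are not the netdev-linux functions.

```c
#include <assert.h>
#include <stdint.h>

/* Builds a tc handle from a 16-bit major and a 16-bit minor number. */
static uint32_t
demo_tc_make_handle(unsigned int major, unsigned int minor)
{
    assert(major <= 0xffff && minor <= 0xffff);
    return (uint32_t) major << 16 | minor;
}

/* Extracts the minor number from a tc handle. */
static unsigned int
demo_tc_get_minor(uint32_t handle)
{
    return handle & 0xffff;
}

/* Maps an OpenFlow queue ID to a class handle.  Minor 0 refers to the
 * qdisc itself, so queue 0 becomes minor 1, queue 1 becomes minor 2, and
 * so on. */
static uint32_t
demo_queue_to_handle(unsigned int major, unsigned int queue_id)
{
    return demo_tc_make_handle(major, queue_id + 1);
}

/* Inverse mapping, for example when dumping per-queue statistics. */
static unsigned int
demo_handle_to_queue(uint32_t handle)
{
    unsigned int minor = demo_tc_get_minor(handle);
    assert(minor > 0);          /* Minor 0 is a qdisc, not a class. */
    return minor - 1;
}
```

Forgetting the inverse step when reading handles back from the kernel produces exactly the kind of off-by-one that d5590e7e41 fixes.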
Ben Pfaff
3c4de644d2 netdev-linux: Dump all queues, not just direct children of the root.
A netdev-linux traffic control implementation has to dump all of a port's
traffic classes in a couple of different situations.  start_queue_dump()
is supposed to do that.  But it was specifying TC_H_ROOT as tcm_parent,
which only dumped classes that were direct children of the root.  This
commit changes tcm_parent to 0, which obtains all traffic classes
regardless of parent.
2010-07-20 11:26:58 -07:00
Ben Pfaff
c1c9c9c4b6 Implement QoS framework.
ovs-vswitchd doesn't declare its QoS capabilities in the database yet,
so the controller has to know what they are.  We can add that later.

The linux-htb QoS class has been tested to the extent that I can see that
it sets up the queues I expect when I run "tc qdisc show" and "tc class
show".  I haven't tested that the effects on flows are what we expect them
to be.  I am sure that there will be problems in that area that we will
have to fix.
2010-06-17 15:04:12 -07:00
Ben Pfaff
ff4ed3c9a1 netdev-linux: Create rtnetlink socket up front instead of on demand.
This simplifies a bit of existing code since it is known that an rtnetlink
socket will always be available.  It will simplify additional code in
upcoming commits.
2010-06-17 10:30:19 -07:00
Ben Pfaff
6912370445 netlink: Drop sock parameter from nl_msg_put_(ge)nlmsghdr().
These two functions use their "sock" parameter only to figure out the
nlmsg_pid to put in the nlmsghdr.  But that field can be filled in just
as well right before sending the message.  Since our functions for sending
Netlink messages always modify the nlmsghdr anyhow (to fill in the length),
there is little benefit to filling in the nlmsg_pid in advance.  The cost,
on the other hand, is having to pass another argument to functions that
already have too many.  So this commit removes the argument.
2010-06-17 10:30:18 -07:00
Jesse Gross
f4b6076aca netdev-vport: Use vport set_stats instead of internal dev.
In certain cases we require the ability to provide stats that are
added to the values collected by the kernel (currently only used
by bond fake devices).  Internal devices previously implemented
this directly, but now that their stats are handled by the vport
layer, the functionality has been moved there.  This removes the
userspace code to set the stats and replaces it with a mechanism
to access the equivalent functionality in the vport layer.
2010-06-10 14:30:51 -07:00
Jesse Gross
7fbef77a30 netdev-linux: Add capability to get stats from vport layer.
The vport layer has the ability to track stats using 64-bit counters,
even if the kernel is only 32-bit.  This first attempts to collect
stats from these counters if they are available and otherwise falls
back to the normal Linux interfaces.
2010-06-10 14:30:51 -07:00
Jesse Gross
61b999dd6f netdev-linux: Give tap FD to first opener.
Tap devices can have two FDs that allow transmit and receive from
different perspectives.  We previously would always share one of
the FDs among all openers.  However, this is confusing to some
users (primarily the DHCP client) which expect tap devices to behave
like any other device.  Now we give the tap FD to the first opener,
which knows that it has opened a tap device, and a normal system FD
to everyone else for consistency.
2010-06-01 17:27:45 -07:00
Jesse Gross
92df599cb2 netdev-linux: Fix tap device stats.
For tap and internal devices we swap the transmit and receive stats
to appear consistent with other devices.  However, the check whether
to store the stats in a temporary location before the swap did not
include tap devices, which led to the use of uninitialized memory
when the swap occurred.
2010-06-01 17:27:45 -07:00
Jesse Gross
4d10512c91 netdev-linux: Quiet down ingress policing.
If we attempt to remove ingress policing and receive "invalid
argument" it means that policing isn't compiled into the kernel.
If it isn't compiled in then accept that policing has been
successfully removed.
2010-05-19 14:12:27 -07:00
Jesse Gross
2158888d8d patch: Remove veth driver.
Now that we have a new patch implementation, remove the veth driver
and its userspace components.  Then rename 'patchnew' to 'patch'.
The new implementation is a drop-in replacement for the old one.
2010-05-18 12:57:25 -07:00
Ben Pfaff
6f42c8ea9a netdev-linux: Optimize removing policing from an interface.
It is very expensive to start a subprocess and, especially, to wait for it
to complete.  This replaces the most common subprocess operation in
netdev_linux_set_policing() by a Netlink socket operation, which is much
faster.

Without this and the other netdev-linux commits, my 1000-interface test
case runs in 1 min 48 s.  With them, it runs in 25 seconds.
2010-05-05 14:00:50 -07:00
Ben Pfaff
80a86fbed4 netdev-linux: Cache policing values.
Without this and the following netdev-linux commits, my 1000-interface test
case runs in 1 min 48 s.  With them, it runs in 25 seconds.
2010-05-05 14:00:50 -07:00
Ben Pfaff
8e46022197 netdev-linux: Factor out removing policing.
This is duplicated code that the following commit will rewrite.
2010-05-05 14:00:50 -07:00
Ben Pfaff
a5af30fbaa netdev-linux: Factor out obtaining an RTNL socket.
Another function needs this same functionality in an upcoming commit, so
factor this into a new function get_rtnl_sock().
2010-05-05 14:00:50 -07:00
Ben Pfaff
8722022c0c Update fake bond devices' statistics with the sum of bond slaves' stats.
Needed by XAPI to accurately report bond statistics.

Ugh.

Bug NIC-63.
2010-04-19 11:12:27 -07:00
Jesse Gross
6f643e4946 tunneling: Remove old GRE implementation.
The new GRE implementation provides a complete drop in replacement
for the old Linux based implementation.  Therefore, remove the
old implementation and rename "grenew" to "gre".
2010-04-19 09:11:58 -04:00
Jesse Gross
658797c83a netdev-linux: Don't free a member of a struct.
We allocate struct netdev_linux which contains struct netdev but
free the netdev.  In practice this makes no difference because the
netdev is the first member of the struct but we should be correct
anyways.
2010-04-19 09:11:57 -04:00
Jesse Gross
15b3596a41 netdev-linux: Check notifications are for netdev-linux device.
When receiving a change notification from rtnetlink we checked whether
a netdev of that name existed and if so tried to handle it.  This also
checks that the type of the device is one handled by netdev-linux.
2010-04-19 09:11:57 -04:00
Justin Pettit
8aed4223e0 netdev: Add support for "patch" type
This commit introduces a new netdev type called "patch".  A patch is a
pair of interfaces, in which frames sent through one of the devices
pop out of the other.  This is useful for linking together datapaths.

A patch's only argument on creation is "peer", which specifies the other
side of the patch.  A patch must be created in pairs, so a second netdev
must be created with the "name" and "peer" values reversed.

The current implementation is built using veth devices.  Further, it's
limited to the veth devices which support configuration through sysfs.
This limits the ability to use a "patch" on 2.6.18 kernels using the
veth device we include (read: flavors of XenServer 5.5).  In the not too
distant future, the implementation will be modified to use the new
kernel port abstraction introduced by Jesse Gross's forthcoming GRE
work.  At that point, patch devices will work on any Linux platform
supported by OVS.
2010-04-15 03:50:28 -07:00
Jesse Gross
468991ad6c gre: Add support for path MTU discovery.
This allows path MTU discovery to properly work when used with
bridging.  While there was previously support for PMTUD it used
the kernel's IP stack.  This works fine for routing but when
bridging it is possible that a complete network is operating over
the bridge that the kernel has no knowledge of and the ICMP
fragmentation needed packets are lost.

When a packet arrives that is above the MTU of the tunnel, an
ICMP message is synthesized and sent back on the device that the
original packet came from.  This does not rely on the kernel IP
stack and is therefore independent of the routing table.  Both
IPv4 and IPv6 are supported, including over VLANs.  Other types
of packets that are over the MTU are encapsulated and the outer
packets are fragmented.

This entire functionality is a layer violation since bridging
operates at layer 2 and fragmentation is a function of layer 3.
For this reason it is possible to disable PMTUD, which will
provide complete transparency but will cause the outer IP packets
to be fragmented.
2010-03-05 16:32:05 -05:00
Jesse Gross
8ab4016b36 gre: Allow ToS on outer packet to be configured.
When creating a GRE tunnel, it is now possible to either set the
ToS of the outer packet to a fixed value or copy it from the inner
packet.
2010-03-05 16:31:27 -05:00
Jesse Gross
a9a4b30c00 gre: Always set TTL on outer packet to 64.
Currently the TTL is copied from the inner packet of the tunnel to
the outer packet if the inner packet is IP.  This is good if your
GRE packets might make it into the input of your device but bad
if you want to be fully transparent.

This also resolves an inconsistency between tunnels set up using
the ioctl and using Netlink.  The ioctl version would force PMTUD
on if a fixed TTL is set as a backup way to prevent loops but it
never made it over to the newer Netlink code so obviously no one
cares too much about it.  This removes it to provide consistency
and transparency.

Basically, don't create loops and you will be happy.
2010-03-05 16:31:27 -05:00
Ben Pfaff
c69ee87c10 Merge "master" into "next".
The main change here is the need to update all of the uses of UNUSED in
the next branch to OVS_UNUSED as it is now spelled on "master".
2010-02-11 11:11:23 -08:00
Ben Pfaff
67a4917b07 Rename UNUSED macro to OVS_UNUSED to avoid naming conflict.
Requested by Jean Tourrilhes <jt@hpl.hp.com>.
2010-02-11 10:59:47 -08:00
Ben Pfaff
b62aeed2ab netdev-linux: Avoid fiddling with indeterminate data.
If we are using netlink to get stats and get_ifindex() fails, then for
an internal network device we will then swap around a bunch of
indeterminate (uninitialized) data values.  That won't hurt anything--the
caller will still set them to all-1-bits due to the error--but it still
seems wrong.  So this commit avoids it.

Found using Clang (http://clang-analyzer.llvm.org/).
2010-02-11 10:34:45 -08:00
Jesse Gross
46415c9085 netdev-linux: Use the netdev list of devices instead of cachemap.
We previously maintained a list of open devices inside of the
linux netdev.  Since the netdev library now maintains this list,
it is better to use that list instead of our own.
2010-01-18 18:26:44 -05:00
Jesse Gross
49a6a1636f netdev-linux: Avoid potential issues with unset FD.
Never close the file descriptor if it is 0, since it is never a
valid FD in this context.  Also initialize the FD to -1 so that
it is never set to a valid but incorrect value.
2010-01-18 18:23:14 -05:00
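The two rules in commit 49a6a1636f above (initialize the FD to -1, and never close FD 0) can be sketched as follows. The `demo_netdev` structure and function names are hypothetical. Following the commit message, the sketch treats 0 as never valid in this context, so only positive descriptors are closed.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical device structure holding an optional file descriptor. */
struct demo_netdev {
    int fd;
};

/* Marks the device as having no open descriptor.  Using -1 rather than a
 * zero-filled struct means an unset FD can never alias a real one. */
static void
demo_netdev_init(struct demo_netdev *dev)
{
    assert(dev != NULL);
    dev->fd = -1;
}

/* Closes the descriptor if one was opened.  Returns 0 on success or if
 * there was nothing to close, and the result of close() otherwise. */
static int
demo_netdev_close_fd(struct demo_netdev *dev)
{
    int error = 0;

    assert(dev != NULL);
    if (dev->fd > 0) {
        error = close(dev->fd);
    }
    dev->fd = -1;
    return error;
}
```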
Jesse Gross
139faa3116 netdev-linux: Properly store netdev_dev pointer for RTNL callbacks.
We were storing a struct netdev_dev_linux ** instead of a
netdev_dev_linux * in the cache map.  This prevented the cache
from being invalidated on changes such as link status.
2010-01-16 09:50:35 -05:00
Justin Pettit
d5cdde1f96 netdev: Increase default ingress policing burst size
The default burst size was 10kb.  This increases it to 1000kb, since
we were having problems getting traffic through at 10kb.  A better value
probably exists between these two points, but that will require
additional experimentation.
2010-01-15 19:02:13 -08:00
Ben Pfaff
88258e0034 netdev-linux: Don't close(0) when closing an ordinary netdev.
Calling close(0) at random points is bad.  It means that the next call to
socket() or open() returns fd 0.  Then the next time a netdev gets closed,
that socket or file fd gets closed too, and you end up with weird "Bad
file descriptor" errors.

Found by installing the following as lib/unistd.h in the source tree:

#ifndef UNISTD_H
#define UNISTD_H 1

#include <stdlib.h>
#include_next <unistd.h>

#undef close
#define close(fd) rpl_close(fd)

static inline int rpl_close(int fd)
{
    if (!fd) {
        abort();
    }
    return (close)(fd);
}

#endif
2010-01-15 15:35:38 -08:00
Jesse Gross
5b7448ed80 netdev-linux: Cleanup tap netdev.
TAP devices need to be treated slightly differently from other
devices because they cannot be opened multiple times.  Instead we
open them once and share the file descriptor.  This means that if
the netdev is opened multiple times one reader can drain the buffers
of another.  While this is a deviation from the normal convention,
it does not impact current or planned users.

In addition, this cleans up some confusion between the file
descriptor for tap devices versus other FD's.
2010-01-15 11:34:34 -05:00