mirror of https://github.com/openvswitch/ovs synced 2025-10-23 14:57:06 +00:00
110 Commits

Author SHA1 Message Date
Justin Pettit
a4af00400a Merge branch 'master' into next
Conflicts:
	COPYING
	datapath/datapath.h
	lib/automake.mk
	lib/dpif-provider.h
	lib/dpif.c
	lib/hmap.h
	lib/netdev-provider.h
	lib/netdev.c
	lib/stream-ssl.h
	ofproto/executer.c
	ofproto/ofproto.c
	ofproto/ofproto.h
	tests/automake.mk
	utilities/ovs-ofctl.c
	utilities/ovs-vsctl.in
	vswitchd/ovs-vswitchd.conf.5.in
	xenserver/etc_init.d_vswitch
	xenserver/etc_xensource_scripts_vif
	xenserver/opt_xensource_libexec_interface-reconfigure
2010-02-05 17:14:55 -08:00
Jesse Gross
a778696338 datapath: Set datapath device MTU to minimum of MTU of ports.
The MTU of the local port should be no larger than the minimum of
the MTUs of the ports attached to the bridge, otherwise packets may be
dropped.  We already prevent changes to the MTU that would violate
this constraint but don't actually proactively set the MTU.  This
change makes everything consistent and matches the behavior of
the bridge.
2010-02-02 14:56:40 -05:00
Jesse Gross
8cdaca99b5 datapath: Fix compilation on newer old-style Xen kernels.
Some ports of Xen (such as Debian Lenny's) use the old style
Xen checksumming fields on newer kernels.  Normally the code that
deals with those fields isn't used at all on newer kernels.  This
updates the checksumming pointer code with some changes from Lenny
Xen since it is cleaner and works well with our existing compatibility
layer.

CC:pspreadborough@comcast.net
2010-01-26 18:09:39 -05:00
Jesse Gross
a605732386 datapath: Handle packets with precomputed checksums.
On older kernels (< 2.6.19) CHECKSUM_HW can mean either that the
checksum has already been computed by hardware or that the checksum
needs to be computed by hardware, depending on whether we are on
the transmit or receive path.  Unfortunately since we are in the
middle of these two paths it is impossible to tell which is the
case.  Code after us assumes that CHECKSUM_HW means that the
checksum needs to be computed and will panic if there already is
a checksum.  On these kernels we mark these packets as CHECKSUM_NONE
before handing them off.

Without this change using certain NICs will cause panics.
2010-01-26 17:17:21 -05:00
Ben Pfaff
49c36903d6 Merge "sflow" into "master".
No conflicts, but lib/dpif.c needed a few changes since struct dpif's
member "class" was renamed to "dpif_class" in master since sflow was
branched off.
2010-01-25 10:52:28 -08:00
Ben Pfaff
53d3bbbc09 datapath: Clean up vswitch_skb_checksum_setup().
vswitch_skb_checksum_setup() can be defined in datapath.h as a no-op
when defined(CONFIG_XEN) && defined(HAVE_PROTO_DATA_VALID) is false.

Also, skb_checksum_setup(), which was defined similarly, can be dropped
now, since it was unused.
2010-01-25 10:24:38 -08:00
Ben Pfaff
56fd8edf80 sflow: Fix sFlow sampling structure.
According to Neil McKee, in an email archived at
http://openvswitch.org/pipermail/dev_openvswitch.org/2010-January/000934.html:

    The containment rule is that a given sflow-datasource (sampler or
    poller) should be scoped within only one sflow-agent (or
    sub-agent).  So the issue arises when you have two
    switches/datapaths defined on the same host being managed with
    the same IP address: each switch is a separate sub-agent, so they
    can run independently (e.g. with their own sequence numbers) but
    they can't both claim to speak for the same sflow-datasource.
    Specifically, they can't both represent the <ifindex>:0
    data-source.  This containment rule is necessary so that the
    sFlow collector can scale and combine the results accurately.

    One option would be to stick with the <ifindex>:0 data-source but
    elevate it to be global across all bridges, with a global
    sample_pool and a global sflow_agent.  Not tempting.  Better to
    go the other way and allow each interface to have its own
    sampler, just as it already has its own poller.  The ifIndex
    numbers are globally unique across all switches/datapaths on the
    host, so the containment is now clean.  Datasource <ifindex>:5
    might be on one switch, while <ifindex>:7 can be on another.
    Other benefits are that 1) you can support the option of
    overriding the default sampling-rate on an interface-by-interface
    basis, and 2) this is how most sFlow implementations are coded,
    so there will be no surprises or interoperability issues with any
    sFlow collectors out there.

This commit implements the approach suggested by Neil.

This commit uses an atomic_t to represent the sampling pool.  This is
because we do want access to it to be atomic, but we expect that it will
"mostly" be accessed from a single CPU at a time.  Perhaps this is a bad
assumption; we can always switch to another form of synchronization later.
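
As a rough userspace illustration of the atomic sampling pool described above, the sketch below uses C11 <stdatomic.h> in place of the kernel's atomic_t.  The struct, its field names, and sampler_count_packet() are hypothetical, and the deterministic one-in-N decision is a simplification chosen for the example; it is not taken from the commit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-interface sampler; names are illustrative only. */
struct sampler {
    atomic_uint sample_pool;   /* Packets seen so far (sampling candidates). */
    unsigned int rate;         /* Sample one packet in every 'rate'. */
};

/* Counts one packet and reports whether it should be sampled.
 * atomic_fetch_add() returns the value held before the increment, so
 * concurrent callers each observe a distinct count even when they race. */
static bool sampler_count_packet(struct sampler *s)
{
    unsigned int seen;

    assert(s->rate > 0);
    seen = atomic_fetch_add(&s->sample_pool, 1) + 1;
    return seen % s->rate == 0;
}
```

When the counter is mostly touched from one CPU, as the message expects, the atomic increment is rarely contended, which is the case this form of synchronization suits.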

CC: Neil McKee <neil.mckee@inmon.com>
2010-01-20 14:33:28 -08:00
Ben Pfaff
72b0630028 Initial implementation of sFlow.
Tested very slightly with "ping" and "sflowtool -t | tcpdump -r -".
2010-01-04 13:08:37 -08:00
Ian Campbell
f7fed00035 datapath: Use HAVE_PROTO_DATA_VALID when defining vswitch_skb_checksum_setup
The purpose of the non-empty variant of vswitch_skb_checksum_setup is to
synchronise the proto_data_valid and proto_csum_blank fields into the
standard skb csum/ip_summed fields, therefore it is more correct to key
off of HAVE_PROTO_DATA_VALID.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
2009-11-19 10:25:57 -08:00
Justin Pettit
de3f65ea52 datapath: Cleanup tab/space issues in datapath
2009-11-16 18:48:29 -08:00
Jesse Gross
d65349ea28 Merge citrix branch into master.
2009-11-10 15:12:01 -08:00
Jesse Gross
18fdbe16de datapath: Allow TCP flags to be cleared.
When querying flow stats allow the TCP flags to be reset.  Since
the datapath ORs together all flags that have previously been
seen it is otherwise impossible to determine the set of flags from
after a particular time.
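
The query-and-reset behavior described above can be sketched in a few lines.  This is a minimal model, assuming a hypothetical struct flow_stats and helper names; it is not the datapath's actual interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-flow statistics; not the datapath's real structure. */
struct flow_stats {
    uint8_t tcp_flags;   /* OR of the TCP flags of every packet seen so far. */
};

/* Called for each packet that matches the flow. */
static void flow_stats_update(struct flow_stats *stats, uint8_t packet_flags)
{
    assert(stats != NULL);
    stats->tcp_flags |= packet_flags;
}

/* Returns the accumulated flags.  If 'clear' is true the accumulator is
 * reset, so the next query reports only flags seen after this call. */
static uint8_t flow_stats_query(struct flow_stats *stats, bool clear)
{
    uint8_t flags;

    assert(stats != NULL);
    flags = stats->tcp_flags;
    if (clear) {
        stats->tcp_flags = 0;
    }
    return flags;
}
```

Without the 'clear' option a caller could never tell whether a flag such as FIN was seen recently or only at some earlier point in the flow's lifetime.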
2009-11-06 14:05:14 -08:00
Jesse Gross
6d91c2fb6c datapath: Allow TCP flags to be cleared.
When querying flow stats allow the TCP flags to be reset.  Since
the datapath ORs together all flags that have previously been
seen it is otherwise impossible to determine the set of flags from
after a particular time.
2009-11-02 16:12:21 -08:00
Ben Pfaff
d6fbec6de0 Spell verb form of "set up" correctly throughout the tree.
2009-10-26 14:41:32 -07:00
Ben Pfaff
3f355f47f8 Merge "citrix" into "master".
This merge took a little bit of care due to two issues:

    - Crossport of "interface-reconfigure" fixes from master back to
      citrix that had happened and needed to be canceled out of the merge.

    - New script "refresh-xs-network-uuids" added on citrix branch that
      needed to be moved from /root/vswitch/scripts to
      /usr/share/vswitch/scripts.
2009-10-22 17:43:28 -07:00
Ben Pfaff
5562d3ccd2 datapath: Ignore return value from rtnl_notify().
In Linux 2.6.30, the rtnl_notify() return type was changed from int to
void along with the following commit message:

    This patch also modifies the rtnetlink code to ignore the return
    value of rtnl_notify() in all callers. The function rtnl_notify()
    (before this patch) returned the error of the unicast notification
    which makes rtnl_set_sk_err() reports errors to all listeners. This
    is not of any help since the origin of the change (the socket that
    requested the echoing) notices the ENOBUFS error if the notification
    fails and should resync itself.

Thus there's no point in checking the return value, even in older versions
of the kernel, and so this commit changes our code to ignore it, even
on older kernel versions.  We also update the rtnl_notify() wrapper macros
to make the return type void on older kernel versions.

This has not been tested, just built.

Thanks to Mikio for spurring me to try building with Linux 2.6.29 and
2.6.30.
2009-10-15 10:24:36 -07:00
Ben Pfaff
1b378b99f6 datapath: Fix warning on 64-bit builds.
2009-10-15 10:24:36 -07:00
Ben Pfaff
7c40efc9d3 datapath: Factor out code for getting and setting listen mask.
This fixes GCC warnings on 64-bit architectures caused by storing an "int"
in the "void *" f->private_data field.
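
A minimal sketch of such a pair of accessors follows.  The struct is a stand-in for the kernel's struct file, the function names are hypothetical, and intptr_t is used here where kernel code would more typically cast through long.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct file; only the member under discussion is modelled. */
struct file_model {
    void *private_data;
};

/* Reads the listen mask back out of the pointer-sized field.  Going through
 * intptr_t avoids the size-mismatch warning that a direct cast between
 * "int" and "void *" produces where pointers are 64 bits wide. */
static int get_listen_mask(const struct file_model *f)
{
    assert(f != NULL);
    return (int) (intptr_t) f->private_data;
}

/* Stores the listen mask in the pointer-sized field. */
static void set_listen_mask(struct file_model *f, int listen_mask)
{
    assert(f != NULL);
    f->private_data = (void *) (intptr_t) listen_mask;
}
```

Keeping both casts in one place means callers never touch private_data directly, so the conversion only has to be correct once.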
2009-10-15 10:24:36 -07:00
Jean Tourrilhes
a4fbb689b0 datapath: Fix validation of ODPAT_SET_VLAN_PCP actions.
The VLAN PCP mask is in the rightmost bits of the vlan_pcp member but we
were checking for it in its position in the VLAN tag field instead.

Slightly modified from Jean's original patch by adding and using the
VLAN_PCP_SHIFT macro.
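
To make the distinction concrete, here is a small sketch of the check.  The mask and shift values follow the 802.1Q tag layout (PCP occupies the top three bits of the 16-bit TCI); the function names are hypothetical and the code is not copied from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VLAN_PCP_MASK  0xe000   /* PCP bits as they sit in the 16-bit TCI. */
#define VLAN_PCP_SHIFT 13       /* Number of TCI bits below the PCP field. */

/* Returns true if 'vlan_pcp' holds a priority in its rightmost bits, that
 * is, a value in the range 0..7.  The mask is shifted down so that it is
 * compared against the value in the position where it is actually stored. */
static bool vlan_pcp_is_valid(uint8_t vlan_pcp)
{
    return (vlan_pcp & ~(VLAN_PCP_MASK >> VLAN_PCP_SHIFT)) == 0;
}

/* Moves a valid priority into its position within the VLAN tag. */
static uint16_t vlan_pcp_to_tci(uint8_t vlan_pcp)
{
    assert(vlan_pcp_is_valid(vlan_pcp));
    return (uint16_t) (vlan_pcp << VLAN_PCP_SHIFT);
}
```

Testing the unshifted mask against an 8-bit member would treat every set bit as out of range, since none of the mask's bits fall within the low byte.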
2009-10-08 10:41:48 -07:00
Ben Pfaff
22d24ebf66 datapath: Fix mutual exclusion with bridge on Linux 2.6.27+.
Linux 2.6.27 introduces a new mechanism for sharing STP packets among
kernel modules, which means that the code in datapath.c to avoid loading
when the Linux bridging module is also loaded has false positives.  So
fall back on these newer kernels to a less reliable way of avoiding the
bridge module, but one that does not have false positives.

CC: Jean Tourrilhes <jt@hpl.hp.com>
2009-09-15 15:51:01 -07:00
Ben Pfaff
cb5087cadd datapath: Fix WARN_ON sending GSO packets to userspace in Linux 2.6.22+.
Until now, when dp_output_control() queued a GSO packet to userspace, it
would first compute the checksum for the whole GSO packet, then break the
packet into segments.  However this had two drawbacks:

    1. The checksum had to be recomputed for each segment, wasting time.
    2. Linux 2.6.22 and later would emit a warning in skb_gso_segment()
       because the checksum was precomputed.

This commit changes dp_output_control() to instead break the packet into
segments, then compute the checksum across each of the segments
individually.  This fixes both drawbacks.

This commit has seen light testing on Xen's 2.6.27.  It has been build
tested on a few different kernel versions.
2009-09-14 12:20:00 -07:00
Ben Pfaff
0d3b8a34d6 datapath: Fix comments.
2009-09-14 12:20:00 -07:00
Justin Pettit
8398cf7efc Merge commit 'origin/citrix'
2009-09-04 22:23:29 -07:00
Justin Pettit
a393b897f2 datapath: Don't drop MTU-sized VLAN packets from userspace
Before transmitting a packet, the datapath checks that the packet
length is not greater than the MTU.  It determines the length based on
the 'protocol' field in the skb.  If 'protocol' is ETH_P_8021Q, it reduces
the packet length as stored in the 'len' field by four bytes, which
is the size of a VLAN tag header.  Unfortunately, packets that arrived
from userspace were not having the 'protocol' field set, which would
cause MTU-sized packets to be dropped.  This commit sets the 'protocol'
field appropriately.
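
The length check can be modelled as below.  This is a sketch under stated assumptions: struct packet_model is a hypothetical stand-in for the skb, 'len' is assumed to exclude the Ethernet header, and 'protocol' is kept in host byte order for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ETH_P_8021Q 0x8100   /* EtherType of an 802.1Q VLAN tag. */
#define VLAN_HLEN   4        /* Size of a VLAN tag header in bytes. */

/* Hypothetical stand-in for the two skb fields the check consults. */
struct packet_model {
    unsigned int len;    /* Packet length in bytes. */
    uint16_t protocol;   /* EtherType, host byte order in this sketch. */
};

/* Returns the length to compare against the MTU, discounting the VLAN tag
 * header when the packet is marked as 802.1Q. */
static unsigned int packet_length(const struct packet_model *p)
{
    unsigned int length = p->len;

    if (p->protocol == ETH_P_8021Q) {
        assert(length >= VLAN_HLEN);
        length -= VLAN_HLEN;
    }
    return length;
}

/* Returns true if the packet may be transmitted on a device with 'mtu'. */
static bool packet_fits_mtu(const struct packet_model *p, unsigned int mtu)
{
    return packet_length(p) <= mtu;
}
```

A tagged packet of 1504 bytes fits a 1500-byte MTU only when 'protocol' is set; with the field left at zero the same packet is judged four bytes too long, which is the failure this commit describes.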

Thanks to Ben Pfaff for the help diagnosing this issue.

NIC-17 and NIC-26
2009-09-04 22:19:15 -07:00
Ben Pfaff
f1acd62b54 Merge citrix branch into master.
2009-09-02 10:14:53 -07:00
Ben Pfaff
6fa58f7a15 datapath: Use hash table more tolerant of collisions for flow table.
The hash table used until now in the kernel datapath for storing the flow
table provides only two slots that a given flow can occupy.  If both of
those slots are already full, for a given flow, then that flow cannot be
added at all and its packets must be handled entirely in userspace, taking
a performance hit.  The code does attempt to compensate for this by making
the flow table rather large: 8 slots per flow actually in the flow table.
In practice, this is usually good enough, but some of the tests that we
have run show bad enough performance degradation or even timeouts of
various kinds that we want to implement something better.

This commit replaces the existing hash table by one with a completely
different design in which buckets are flexibly sized and can accept any
number of collisions.  By use of suitable levels of indirection, this
design is both simple and RCU-compatible.  I did consider other schemes,
but none of the ones that I came up with shared both of those two
properties.
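
One way to picture flexibly sized buckets behind a level of indirection is the userspace sketch below.  It is an illustration only, not the commit's data structure: the type and function names are hypothetical, keys are plain integers, and the comments mark where kernel code would use RCU primitives instead of an immediate free().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bucket: a counted array that can hold any number of keys. */
struct bucket {
    size_t n;
    unsigned int keys[];   /* C99 flexible array member. */
};

struct table {
    size_t n_buckets;          /* Must be a power of two. */
    struct bucket **buckets;   /* One pointer per bucket; NULL means empty. */
};

static struct bucket **table_slot(const struct table *t, unsigned int key)
{
    assert(t->n_buckets != 0 && (t->n_buckets & (t->n_buckets - 1)) == 0);
    return &t->buckets[key & (t->n_buckets - 1)];
}

/* Inserts 'key' by building an enlarged copy of its bucket and then
 * replacing a single pointer.  Returns 0 on success, -1 on allocation
 * failure.  The old bucket is never modified in place. */
static int table_insert(struct table *t, unsigned int key)
{
    struct bucket **slot = table_slot(t, key);
    struct bucket *old = *slot;
    size_t n = old ? old->n : 0;
    struct bucket *bigger = malloc(sizeof *bigger
                                   + (n + 1) * sizeof bigger->keys[0]);

    if (!bigger) {
        return -1;
    }
    if (n) {
        memcpy(bigger->keys, old->keys, n * sizeof bigger->keys[0]);
    }
    bigger->keys[n] = key;
    bigger->n = n + 1;
    *slot = bigger;   /* Kernel code would publish via rcu_assign_pointer(). */
    free(old);        /* Kernel code would defer the free via call_rcu(). */
    return 0;
}

static bool table_contains(const struct table *t, unsigned int key)
{
    const struct bucket *b = *table_slot(t, key);
    size_t i;

    for (i = 0; b && i < b->n; i++) {
        if (b->keys[i] == key) {
            return true;
        }
    }
    return false;
}
```

Because a reader only ever follows one pointer to a complete bucket, it sees either the old contents or the new ones, and any number of colliding keys can share a bucket.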

This commit also adds kerneldoc comments for all of the flow table
non-static functions and data structures.

This has been lightly tested for correctness.  It has not been tested for
performance.

Bug #1656.  Bug #1851.
2009-09-01 10:36:42 -07:00
Ben Pfaff
2c7807ac4f datapath: Remove WARN_ON_ONCE(1) now that this code has been exercised.
The code on one side of this #if fork was difficult to test until Xen
upgraded to a new enough kernel that it would exercise it.  Later Xen
kernels are now available and this code path has been tested, at least to
some extent, so remove the warning.

Thanks to Ian Campbell <Ian.Campbell@citrix.com> for pointing out the
warning.
2009-09-01 10:12:12 -07:00
Justin Pettit
3c71830aef dpif: Address portability issues in dpif-netdev
There were a number of Linux assumptions in dpif-netdev that were not
necessary.  This commit cleans those up to aid portability.
2009-08-25 14:12:01 -07:00
Justin Pettit
00908dc27a Merge commit 'origin/citrix'
2009-08-25 13:23:11 -07:00
Justin Pettit
4f0b85d66b datapath: Return EFBIG instead of EXFULL when no room in flow table
The EXFULL errno is only defined in Linux.  While this datapath is
Linux-specific, the userspace that interacts with it is not.
2009-08-25 13:17:26 -07:00
Ben Pfaff
1d87357a13 Merge citrix into master.
2009-08-19 16:08:18 -07:00
Ben Pfaff
8fef8c7121 Merge citrix into master.
This was a somewhat difficult merge since there was a fair amount of
superficially divergent development on the two branches, especially in the
datapath.

This has been build-tested against XenServer 5.5.0 and XenServer 5.7.0
build 15122.  It has been booted and connected to XenCenter on 5.5.0.

The merge revealed a couple of outstanding bugs, which will be fixed on
citrix and then merged back into master.
2009-08-19 13:03:46 -07:00
Ian Campbell
b2f460c72d datapath: Only call skb_checksum_setup on 2.6.18 && Xen.
For newer kernels the checksum setup is done at the point the skb is
received in netback or netfront so there is no more need to sprinkle
skb_checksum_setup calls throughout the kernel.
2009-08-18 16:09:32 -07:00
Ben Pfaff
b0c32774c9 datapath: Improve comments.
2009-08-18 12:36:46 -07:00
Ben Pfaff
0515ceb3e8 datapath: Update sysfs links when network devices are renamed.
We create symlinks from /sys/class/net/<bridgename>/brif/<devname> to
/sys/class/net/<devname>/brport, but until now we have never updated the
links when network devices are renamed.  This commit fixes this problem.

(Only the <devname> in /sys/class/net/<bridgename>/brif/<devname> needs to
be updated.  Symlinks within sysfs have stable targets; that is, no matter
how the object that a sysfs symlink points to moves around, the link is
still maintained correctly.)
2009-08-06 16:57:06 -07:00
Ben Pfaff
58c342f617 datapath: Fix OOPS when dp_sysfs_add_if() fails.
Until now, when dp_sysfs_add_if() failed, the caller ignored the failure.
This is a minor problem, because everything else should continue working,
without sysfs entries for the interface, in theory anyhow.  In actual
practice, the error exit path of dp_sysfs_add_if() does a kobject_put(),
and that kobject_put() calls release_nbp(), so that the new port gets
freed.  The next reference to the new port (usually in an ovs-vswitchd call
to the ODP_PORT_LIST ioctl) will then use the freed data and probably OOPS.

The fix is to make the datapath code, as opposed to the sysfs code,
responsible for creating and destroying the net_bridge_port kobject.  The
dp_sysfs_{add,del}_if() functions then just attach and detach the kobject
to sysfs and their cleanup routines no longer need to destroy the kobject
and indeed we don't care whether dp_sysfs_add_if() really succeeds.

This commit also makes the same transformation to the datapath's ifobj,
for consistency.

It is easy to trigger the OOPS fixed by this commit by adding a network
device A to a datapath, then renaming network device A to B, then renaming
network device C to A, then adding A to the datapath.  The last attempt to
add A will fail because a file named /sys/class/net/<datapath>/brif/A
already exists from the time that C was added to the datapath under the
name A.

This commit also adds some compatibility infrastructure, because it moves
code out of #ifdef SUPPORT_SYSFS and it otherwise wouldn't build.
2009-08-06 16:57:06 -07:00
Ben Pfaff
2ba9026e2f datapath: Rename brc_sysfs_* to dp_sysfs_*.
These files and names are now part of the datapath, not brcompat, so name
them appropriately so as not to confuse anyone.
2009-08-06 16:57:06 -07:00
Ben Pfaff
2e7dd8eca8 datapath: Move sysfs support from brcompat_mod into openvswitch_mod.
In the past problems have arisen due to the different ways that datapaths
are created and destroyed in the three different cases:

	1. sysfs supported, brcompat_mod loaded.
	2. sysfs supported, brcompat_mod not loaded.
	3. sysfs not supported.

The brcompat_mod loaded versus not loaded distinction is the stickiest
because we have to do all the calls into brcompat_mod through hook
functions, which in turn causes pressure to keep the number of hook
functions small and well-defined, which makes it really difficult to put
the hook call points at exactly the right place.  Witness, for example,
this piece of code in datapath.c:

        int dp_del_port(struct net_bridge_port *p)
        {
                ASSERT_RTNL();

        #ifdef SUPPORT_SYSFS
                if (p->port_no != ODPP_LOCAL && dp_del_if_hook)
                        sysfs_remove_link(&p->dp->ifobj, p->dev->name);
        #endif

The code inside the #ifdef is logically part of the brcompat_mod sysfs
support, but the author of this code (quite reasonably) didn't want to
add a hook function call there.  After all, what would you call the
hook function?  There's no obvious name from the dp_del_port() caller's
perspective.

All this argues that sysfs support should be in openvswitch_mod itself,
since it has to be tightly integrated, not bolted on.  So this commit
moves it there.

Now, this is not to say that openvswitch_mod should actually be
implementing bridge-compatible sysfs.  In the future, it probably should
not be; rather, it should implement something appropriate for Open vSwitch
datapaths instead.  But right now we have bridge-compatible sysfs, and so
that's what this commit moves.
2009-08-06 16:57:06 -07:00
Justin Pettit
1dcf111b1c datapath: Support jumbo frames in the datapath device
The datapath has no problems switching jumbo frames (frames with a payload
greater than 1500 bytes), but it has not supported sending and receiving
them to the device itself.  With this commit, the MTU can be set as large
as the minimum MTU size of the devices that are directly attached, or 1500
bytes if there are none.  This mimics the behavior of the Linux bridge.
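
The rule stated above amounts to a minimum with a default.  In the sketch below the function name and the array-of-MTUs interface are assumptions made for illustration; the real datapath walks its own port list.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_DATA_LEN 1500   /* Standard Ethernet payload size in bytes. */

/* Returns the largest MTU the datapath device may use: the minimum MTU of
 * the attached ports, or ETH_DATA_LEN if no ports are attached. */
static int dp_min_mtu(const int *port_mtus, size_t n_ports)
{
    int mtu = 0;
    size_t i;

    assert(port_mtus != NULL || n_ports == 0);
    for (i = 0; i < n_ports; i++) {
        if (!mtu || port_mtus[i] < mtu) {
            mtu = port_mtus[i];
        }
    }
    return mtu ? mtu : ETH_DATA_LEN;
}
```

With ports at 9000 and 8000 bytes the device may be raised to 8000; attaching a single 1500-byte port brings the limit back down to 1500.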

Feature #1736
2009-08-03 13:29:22 -07:00
Ben Pfaff
3b01baa397 Merge citrix branch into master.
2009-07-16 11:54:37 -07:00
Ben Pfaff
923229363a datapath: Don't orphan packets in dp_dev transmit path.
Before commit 72ca14c1 "datapath: Fix race against workqueue in dp_dev and
simplify code," the dp_dev network device had a device queue, and we would
orphan packets before sticking them on the queue.  This screwed up socket
accounting a bit, but the effect was limited to the device queue length.

Now, after that commit, the dp_dev device has no device queue, but it still
orphans packets.  This screws up socket accounting a *lot*, because the
effect is now unlimited, since there is no queue to limit it.

The solution is to not orphan packets at all.  There is little need for it
now since packet transmission now happens immediately, not in a workqueue
whose execution may be delayed.

This should fix bug #1519, which tests "netperf -t UDP_STREAM" performance,
finding that an unrealistically high number of UDP packets could be sent
but that none at all were received.  The send rate is due to the orphaning,
the receive rate presumably because at least one out of approx. 65535/1500
= 44 fragments per full packet were dropped in each case.
2009-07-13 15:50:32 -07:00
Ben Pfaff
828bc1f072 datapath: Fix race in datapath creation.
Before we create the local port, we should allocate and assign the table.
Otherwise packets sent on the local port before we do so will cause an
OOPS.

This is a theoretical race that has not been observed in practice.
2009-07-08 14:13:15 -07:00
Ben Pfaff
72ca14c154 datapath: Fix race against workqueue in dp_dev and simplify code.
The dp_dev_destroy() function failed to cancel the xmit_queue work, which
allowed it to run after the device had been destroyed, accessing freed
memory.  However, simply canceling the work with cancel_work_sync() would
be insufficient, since other packets could get queued while the work
function was running.  Stopping the queue with netif_tx_disable() doesn't
help, because the final action in dp_dev_do_xmit() is to re-enable the TX
queue.

This issue led me to re-examine why the dp_dev needs to use a work_struct
at all.  This was implemented in commit 71f13ed0b "Send of0 packets from
workqueue, to avoid recursive locking of ofN device" due to a complaint
from lockdep about recursive locking.

However, there's no actual reason that we need any locking around
dp_dev_xmit().  Until now, it has accepted the standard locking provided
by the network stack.  But looking at the other software devices (veth,
loopback), those use NETIF_F_LLTX, which disables this locking, and
presumably do so for this very reason.  In fact, the lwn article at
http://lwn.net/Articles/121566/ hints that NETIF_F_LLTX, which is otherwise
discouraged in the kernel, is acceptable for "certain types of software
device."

So this commit switches to using NETIF_F_LLTX for dp_dev and gets rid
of the work_struct.

In the process, I noticed that veth and loopback also take advantage of
a network device destruction "hook" using the net_device "destructor"
member.  Using this we can automatically get called on network device
destruction at the point where rtnl_unlock() is called.  This allows us
to stop stringing the dp_devs that are being destroyed onto a list so
that we can free them, and thus simplifies the code along all the paths
that call dp_dev_destroy().

This commit gets rid of a call to synchronize_rcu() (disguised as a call
to synchronize_net(), which is a macro that expands to synchronize_rcu()),
so it probably speeds up deleting ports, too.
2009-07-08 14:13:15 -07:00
Ben Pfaff
6fba0d0b82 datapath: Fix use-after-free error in datapath destruction.
When we create a datapath we do this:

	1. Create local port.
	2. Call add_dp hook.
	3. Allow userspace to add more ports.

When we deleted a datapath we were doing this:

	1. Call del_dp hook
	2. Delete all the ports.

Unfortunately step 1 destroys dp->ifobj, then dp_del_port on any port other
than the local port in step 2 tries to reference dp->ifobj through a call
to sysfs_remove_link().

This commit fixes the problem by changing datapath deletion to mirror
creation:

	1. Delete all the ports but the local port.
	2. Call del_dp hook.
	3. Delete local port.

Commit 010082639 "datapath: Add sysfs support for all (otherwise supported)
Linux versions" makes this problem obvious on a 2.6.25+ kernel configured
with slab debugging, because on such kernels the ifobj is a pointer to a
slab object that is freed by the del_dp hook function (when brcompat_mod
is loaded).  This bug may be just as present on older kernels, but there
the ifobj is part of struct datapath, not a pointer, and thus it is much
harder to trigger.

Bug #1465.
2009-07-08 14:13:15 -07:00
Ben Pfaff
334b374988 datapath: Remove redundant synchronize_rcu() call.
There is no benefit to synchronizing twice, and it might cost us a lot of
time.
2009-07-08 14:13:15 -07:00
Ben Pfaff
f4ba4c4f95 datapath: Change ODP_PORT_LIST semantics.
Until now, ODP_PORT_LIST has reported the number of ports actually copied
out.  It's better for the caller, however, if it reports the number of
ports that were available to be copied out.
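
The difference for the caller can be seen in a small model of the new semantics.  The function below is hypothetical and works on plain arrays; it is not the ioctl's real interface.

```c
#include <assert.h>
#include <stddef.h>

/* Copies up to 'out_cap' port numbers into 'out' and returns the number of
 * ports that were available, which may exceed the number copied. */
static size_t port_list(const int *ports, size_t n_ports,
                        int *out, size_t out_cap)
{
    size_t n_copy = n_ports < out_cap ? n_ports : out_cap;
    size_t i;

    assert(out != NULL || out_cap == 0);
    for (i = 0; i < n_copy; i++) {
        out[i] = ports[i];
    }
    return n_ports;
}
```

A caller that receives a count larger than its buffer knows the list was truncated and how large a buffer to retry with; under the old semantics a full buffer was indistinguishable from an exact fit.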
2009-07-06 09:07:24 -07:00
Ben Pfaff
e86c8696eb datapath: Make openvswitch_ioctl() have a single point of exit.
This makes it easier to insert debug printk() calls in a single place if
necessary, and conforms at least as well with Linux kernel style.
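
For readers unfamiliar with the idiom, a single-exit handler looks roughly like the sketch below.  The commands and the dispatch() function are invented for the example and have nothing to do with the actual ioctl numbers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical command numbers, used only in this illustration. */
enum { CMD_DOUBLE = 1, CMD_NEGATE = 2 };

/* Handles one command.  Every path, success or failure, leaves through the
 * 'exit' label, so one debug statement placed there observes all outcomes. */
static int dispatch(int cmd, const int *arg, int *result)
{
    int err;

    assert(result != NULL);
    if (!arg) {
        err = -EFAULT;
        goto exit;
    }

    switch (cmd) {
    case CMD_DOUBLE:
        *result = *arg * 2;
        err = 0;
        break;
    case CMD_NEGATE:
        *result = -*arg;
        err = 0;
        break;
    default:
        err = -EINVAL;
        break;
    }

exit:
    return err;
}
```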
2009-07-06 09:07:24 -07:00
Ben Pfaff
330a8abb28 datapath: Fix ODP_PORT_DEL handling of bad user memory read.
2009-07-06 09:07:24 -07:00
Ben Pfaff
1619019101 datapath: Style fix.
2009-07-06 09:07:24 -07:00
Ben Pfaff
f1aa2072c8 datapath: Get rid of query operations for single flows.
2009-07-06 09:07:24 -07:00