2
0
mirror of https://github.com/openvswitch/ovs synced 2025-10-07 13:40:45 +00:00
Commit Graph

87 Commits

Author SHA1 Message Date
Jesse Gross
8819fac72b datapath: Don't expect bottom-halves to be disabled.
We currently document that BHs need to be disabled when handling
received packets.  However, this isn't actually generally the
case (usually preemption is disabled but not BHs).  Only one place
actually relies on BHs being disabled so fix that and update the
documentation of our expectations.
2010-05-14 15:10:45 -07:00
Jesse Gross
1c075d0aff datapath: Disable bottom-halves where necessary.
Places that update per-cpu stats without locking need to have bottom
halves disabled.  Otherwise we can be running in process context and
in the middle of an operation and be interrupted by a softirq.
2010-05-14 15:10:45 -07:00
Jesse Gross
1d7241c729 datapath: Use spin_lock_bh() consistently.
We are never called in hardirq context - only process or softirq.
Therefore it is not necessary to disable interrupts with
spin_lock_irqsave(), so use spin_lock_bh() everywhere.
2010-05-14 15:10:44 -07:00
Ben Pfaff
3fbd517acf datapath: Add 32-bit compatibility ioctls.
When a 32-bit userspace program runs on a 64-bit kernel, data structures
that contain members whose sizes or alignments change from 32- to 64-bit
must be translated when they are passed to ioctls.  This commit adds such
support for openvswitch_mod.

We should really reconsider some parts of the Open vSwitch ioctl interface
to avoid needing as much translation as we do.

Lightly tested with 32-bit userspace on sparc64.
2010-05-13 15:29:51 -07:00
Ben Pfaff
776f10ce0f datapath: Avoid __copy_to/from_user(), __get/put_user() functions.
The advantages of the double-underscore variants of copy_to_user(),
copy_from_user(), get_user(), and put_user() are pretty marginal, at best,
in the places where we are using them, and it's not always obvious that we
are making the right calls to access_ok() beforehand.  So switch to the
safe variants without double underscores.

Suggested-by: Jesse Gross <jesse@nicira.com>
2010-05-13 15:29:50 -07:00
Ben Pfaff
44e05ecafe datapath: Prepare to support 32-bit compatibility ioctls.
This commit prepares the core of datapath.c and vport.c to reduce the
amount of new code duplication when the following commit adds support for
32-bit compatibility ioctls.  It breaks a number of functions apart into
pairs of functions: one that copies data to and from userspace and another
that does the real work.

This change is a pure refactoring that should not change behavior.
2010-05-13 15:29:49 -07:00
Ben Pfaff
6d7568dc38 datapath: Avoid possibility of negative 'n_flows' in struct odp_flowvec.
do_flowvec_ioctl() was checking for too-big 'n_flows' but not negative
'n_flows'.  We could add that check too, but 'n_flows' should never be
negative so it's better to just use an unsigned type.
2010-05-13 15:29:46 -07:00
Jesse Gross
f4267e344a datapath: Enable offloading on internal devices.
Enables checksum offloading, scatter/gather, and TSO on internal
devices.  While these optimizations were not previously enabled on
internal ports we already could receive these types of packets from
Xen guests.  This has the obvious performance benefits when these
packets can be passed directly to hardware.

There is also a more subtle benefit for GRE on Xen.  GRE packets
pass through OVS twice - once before encapsulation and once after
encapsulation, moving through an internal device in the process.
If it is a SG packet (as is common on Xen), a copy was necessary
to linearize for the internal device.  However, Xen uses the
memory allocator to track packets so when the original packet is
freed after the copy netback notifies the guest that the packet
has been sent, despite the fact that it is actually sitting in the
transmit queue.  The guest then sends packets as fast as the CPU
can handle, overflowing the transmit queue.  By enabling SG on
the internal device, we avoid the copy and keep the accounting
correct.

In certain circumstances this patch can decrease performance for
TCP.  TCP has its own mechanism for tracking in-flight packets
and therefore does not benefit from the corrected socket accounting.
However, certain NICs do not like SG when it is not being used for
TSO (these packets can no longer be handled by TSO after GRE
encapsulation).  These NICs presumably enable SG even though they
can't handle it well because TSO requires SG.

Tested controllers (all 1G):
Marvell 88E8053 (large performance hit)
Broadcom BCM5721 (small performance hit)
Intel 82571EB (no change)
2010-05-07 11:29:11 -07:00
Jesse Gross
d8b5d43a04 datapath: Don't hold dp_mutex when setting internal devs MTU.
We currently acquire dp_mutex when we are notified that the MTU
of a device attached to the datapath has changed so that we can
set the internal devices to the minimum MTU.  However, it is not
required to hold dp_mutex because we already have RTNL lock and it
causes a deadlock, so don't do it.

Specifically, the issue is that DP mutex is acquired twice: once in
dp_device_event() before calling set_internal_devs_mtu() and then
again in internal_dev_change_mtu() when it is actually being changed
(since the MTU can also be set directly).  Since it's not a recursive
mutex, deadlock.
2010-04-30 17:23:44 -07:00
Jesse Gross
e1c1de3922 datapath: Ensure packet length matches headers during checksum setup.
During the setup of checksumming pointers we need to make sure that
the transport headers are in the skb linear data area.  However, we
don't currently verify that the lengths in the packet headers are
within the size of the packet.  This makes that check before a
BUG() check does it for us.

CC: "Nick Couchman" <Nick.Couchman@seakr.com>
2010-04-29 18:14:03 -07:00
Ben Pfaff
968f7c8d77 datapath: Check device name length more carefully in create_dp().
I don't see any value in silently truncating device names.  Doing so will
sow confusion in userspace.  This commit makes too-long device names
return ENAMETOOLONG.
2010-04-27 10:45:28 -07:00
Ben Pfaff
092a872d6e datapath: Always null-terminate network device name in create_dp().
strncpy() does not null-terminate its output buffer if the source string's
length is at least as large as its 'count' argument.  We know that the
source and destination buffers are the same size and that the source buffer
is null-terminated, so just use strcpy().

This fixes a kernel BUG message that often occurred when strlen(devname)
was exactly IFNAMSIZ-1.  In such a case, if
internal_dev_port.devname[IFNAMSIZ-1] happened to be nonzero, it would
eventually fail the following check in alloc_netdev_mq():
	BUG_ON(strlen(name) >= sizeof(dev->name));

Bug #2722.
2010-04-27 10:43:24 -07:00
Ben Pfaff
1224e8fc1d datapath: Fix argument to strncpy_from_user().
The strncpy_from_user() function's 'count' argument is documented to
include the trailing null byte, but create_dp() did not include it.  This
commit adds it in.
2010-04-27 10:21:12 -07:00
Jesse Gross
2d2a0e309f datapath: Add skb_csum_help compatibility function.
Later kernel versions remove the direction argument from
skb_checksum_help.  This provides a compatibility function so we
can have consistent syntax across versions.

Since CHECKSUM_PARTIAL is the same as CHECKSUM_HW on older kernels
this allows a unified code path for computing checksums.
2010-04-19 09:11:57 -04:00
Jesse Gross
8d5ebd839b datapath: Genericize hash table.
Currently the flow hash table assumes that it is storing flows.
However, we will need additional types of hash tables in the
future so remove assumptions about flows and convert the datapath
to use the new table.
2010-04-19 09:11:57 -04:00
Jesse Gross
f2459fe7d9 datapath: Add generic virtual port layer.
Currently the datapath directly accesses devices through their
Linux functions.  Obviously this doesn't work for virtual devices
that are not backed by an actual Linux device.  This creates a
new virtual port layer which handles all interaction with devices.

The existing support for Linux devices was then implemented on top
of this layer as two device types.  It splits out and renames dp_dev
to internal_dev.  There were several places where datapath devices
had to handled in a special manner and this cleans that up by putting
all the special casing in a single location.
2010-04-19 09:11:57 -04:00
Jesse Gross
659586efcf tunneling: Add support for tunnel ID.
Add a tun_id field which contains the ID of the encapsulating tunnel
on which a packet was received (0 if not received on a tunnel).  Also
add an action which allows the tunnel ID to be set for outgoing
packets.  At this point there aren't any tunnel implementations so
these fields don't have any effect.

The matching is exposed to OpenFlow by overloading the high 32 bits
of the cookie as the tunnel ID.  ovs-ofctl is capable of turning
on this special behavior using a new "tun-cookie" command but this
command is intentially undocumented to avoid it being used without
a full understanding of the consequences.
2010-04-19 09:11:51 -04:00
Jesse Gross
3c5f6de385 datapath: Validate ToS when flow is added.
Check that the ToS is valid when the flow is added, not every time
it is used.
2010-03-15 15:44:41 -04:00
Jesse Gross
2de320799d datapath: Disable large receive offload.
LRO can play fast and loose with the packets that it merges, which
isn't very polite when you are bridging packets for other operating
systems.  This disables LRO on any underlying devices that are added
to the datapath, which is the same as what the bridge does.

Note that this does not disable GRO, which has a more strict set of
rules about what is merged and is therefore safe for bridging.  Both
are typically done in software anyways.
2010-03-05 16:31:26 -05:00
Jesse Gross
635c9298b9 datapath: Update hardware computed checksum on VLAN change.
The checksum computed by hardware on receive stored in skb->csum
when skb->ip_summed == CHECKSUM_COMPLETE is supposed to reflect
the contents of the packet starting at skb->data, which includes
the VLAN tag if there is one.  However, when we manipulate the
VLAN tag we don't update the checksum.  This leads to all kinds
of nasty warnings about broken hardware, not to mention we can't
take advantage of the checksum that was already computed.

This also fixes some issues with our private checksum type value
on some different kernels and after GSO.
2010-03-05 15:22:17 -05:00
Jesse Gross
a063b0dff0 datapath: Consistently maintain checksum offloading state.
When adding a VLAN tag it is necessary for us to setup checksum
pointers for offloaded packets manually.  However, this process
clobbers some of the fields that other components need to determine
the current status.  Here we mark the packet with its status upon
ingress in our own format that does not get clobbered and is
consistent across kernel versions.

Bug #2436
2010-02-28 15:45:00 -05:00
Justin Pettit
834377ea55 ofproto: Match on IP ToS/DSCP bits (OpenFlow 1.0)
OpenFlow 1.0 adds support for matching on IP ToS/DSCP bits.

NOTE: OVS at this point is not wire-compatible with OpenFlow 1.0 until
the final commit in this OpenFlow 1.0 set.
2010-02-20 02:22:28 -08:00
Justin Pettit
959a2ecdc8 ofproto: Match VLAN PCP and rewrite ToS bits (OpenFlow 0.9)
Starting in OpenFlow 0.9, it is possible to match on the VLAN PCP
(priority) field and rewrite the IP ToS/DSCP bits.  This check-in
provides that support and bumps the wire protocol number to 0x98.

NOTE: The wire changes come together over the set of OpenFlow 0.9 commits,
so OVS will not be OpenFlow-compatible with any official release between
this commit and the one that completes the set.
2010-02-20 02:22:26 -08:00
Ben Pfaff
c69ee87c10 Merge "master" into "next".
The main change here is the need to update all of the uses of UNUSED in
the next branch to OVS_UNUSED as it is now spelled on "master".
2010-02-11 11:11:23 -08:00
Ben Pfaff
35f7605b3f datapath: Mark functions "static".
Found by sparse (http://sparse.wiki.kernel.org/).
2010-02-10 16:54:48 -08:00
Ben Pfaff
2175540274 datapath: When adding a port, return the new port number to userspace.
'port' is a kernel-space copy of the odp_port and modifying it is useless.
'portp' is the userspace copy; modifying it is useful.

None of our current userspace users care about the port number and so we
never noticed.

Found by sparse (http://sparse.wiki.kernel.org/).
2010-02-10 16:54:48 -08:00
Jesse Gross
7dab847a19 Fix some regressions from the merge from master. 2010-02-08 13:31:33 -05:00
Justin Pettit
a4af00400a Merge branch 'master' into next
Conflicts:
	COPYING
	datapath/datapath.h
	lib/automake.mk
	lib/dpif-provider.h
	lib/dpif.c
	lib/hmap.h
	lib/netdev-provider.h
	lib/netdev.c
	lib/stream-ssl.h
	ofproto/executer.c
	ofproto/ofproto.c
	ofproto/ofproto.h
	tests/automake.mk
	utilities/ovs-ofctl.c
	utilities/ovs-vsctl.in
	vswitchd/ovs-vswitchd.conf.5.in
	xenserver/etc_init.d_vswitch
	xenserver/etc_xensource_scripts_vif
	xenserver/opt_xensource_libexec_interface-reconfigure
2010-02-05 17:14:55 -08:00
Jesse Gross
a778696338 datapath: Set datapath device MTU to minimum of MTU of ports.
The MTU of the local port should be no larger than the minimum of
the MTUs of the ports attached to the bridge, overwise packets may be
dropped.  We already prevent changes to the MTU that would violate
this constraint but don't actuallly proactively set the MTU.  This
changes makes everything consistent and matches the behavior of
the bridge.
2010-02-02 14:56:40 -05:00
Jesse Gross
8cdaca99b5 datapath: Fix compilation on newer old-style Xen kernels.
Some ports of Xen (such as Debian Lenny's) use the old style
Xen checksumming fields on newer kernels.  Normally the code that
deals with those fields isn't used at all on newer kernels.  This
updates the checksumming pointer code with some changes from Lenny
Xen since it is cleaner and works well with our existing compatibility
layer.

CC:pspreadborough@comcast.net
2010-01-26 18:09:39 -05:00
Jesse Gross
a605732386 datapath: Handle packets with precomputed checksums.
On older kernels (< 2.6.19) CHECKSUM_HW can mean either that the
checksum has already been computed by hardware or that the checksum
needs to be computed by hardware, depending on whether we are on
the transmit or receive path.  Unfortunately since we are in the
middle of these two paths it is impossible to tell which is the
case.  Code after us assumes that CHECKSUM_HW means that the
checksum needs to be computed and will panic if there already is
a checksum.  On these kernels we mark these packets as CHECKSUM_NONE
before handing them off.

Without this change using certain NICs will cause panics.
2010-01-26 17:17:21 -05:00
Ben Pfaff
49c36903d6 Merge "sflow" into "master".
No conflicts, but lib/dpif.c needed a few changes since struct dpif's
member "class" was renamed to "dpif_class" in master since sflow was
branched off.
2010-01-25 10:52:28 -08:00
Ben Pfaff
53d3bbbc09 datapath: Clean up vswitch_skb_checksum_setup().
vswitch_skb_checksum_setup() can be defined in datapath.h as a no-op
when defined(CONFIG_XEN) && defined(HAVE_PROTO_DATA_VALID) is false.

Also, skb_checksum_setup(), which was defined similarly, can be dropped
now, since it was unused.
2010-01-25 10:24:38 -08:00
Ben Pfaff
56fd8edf80 sflow: Fix sFlow sampling structure.
According to Neil McKee, in an email archived at
http://openvswitch.org/pipermail/dev_openvswitch.org/2010-January/000934.html:

    The containment rule is that a given sflow-datasource (sampler or
    poller) should be scoped within only one sflow-agent (or
    sub-agent).  So the issue arrises when you have two
    switches/datapaths defined on the same host being managed with
    the same IP address: each switch is a separate sub-agent, so they
    can run independently (e.g. with their own sequence numbers) but
    they can't both claim to speak for the same sflow-datasource.
    Specifically, they can't both represent the <ifindex>:0
    data-source.  This containment rule is necessary so that the
    sFlow collector can scale and combine the results accurately.

    One option would be to stick with the <ifindex>:0 data-source but
    elevate it to be global across all bridges, with a global
    sample_pool and a global sflow_agent.  Not tempting.  Better to
    go the other way and allow each interface to have it's own
    sampler, just as it already has it's own poller.  The ifIndex
    numbers are globally unique across all switches/datapaths on the
    host, so the containment is now clean.  Datasource <ifindex>:5
    might be on one switch, whille <ifindex>:7 can be on another.
    Other benefits are that 1) you can support the option of
    overriding the default sampling-rate on an interface-by-interface
    basis, and 2) this is how most sFlow implementations are coded,
    so there will be no surprises or interoperability issues with any
    sFlow collectors out there.

This commit implements the approach suggested by Neil.

This commit uses an atomic_t to represent the sampling pool.  This is
because we do want access to it to be atomic, but we expect that it will
"mostly" be accessed from a single CPU at a time.  Perhaps this is a bad
assumption; we can always switch to another form of synchronization later.

CC: Neil McKee <neil.mckee@inmon.com>
2010-01-20 14:33:28 -08:00
Ben Pfaff
72b0630028 Initial implementation of sFlow.
Tested very slightly with "ping" and "sflowtool -t | tcpdump -r -".
2010-01-04 13:08:37 -08:00
Ian Campbell
f7fed00035 datapath: Use HAVE_PROTO_DATA_VALID when defining vswitch_skb_checksum_setup
The purpose of the non-empty variant of vswitch_skb_checksum_setup is to
synchronise the proto_data_valid and proto_csum_blank fields into the
standard skb csum/ip_summed fields, therefore it is more correct to key
off of HAVE_PROTO_DATA_VALID.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
2009-11-19 10:25:57 -08:00
Justin Pettit
de3f65ea52 datapath: Cleanup tab/space issues in datapath 2009-11-16 18:48:29 -08:00
Jesse Gross
d65349ea28 Merge citrix branch into master. 2009-11-10 15:12:01 -08:00
Jesse Gross
18fdbe16de datapath: Allow TCP flags to be cleared.
When querying flow stats allow the TCP flags to be reset.  Since
the datapath ORs together all flags that have previously been
seen it is otherwise impossible to determine the set of flags from
after a particular time.
2009-11-06 14:05:14 -08:00
Jesse Gross
6d91c2fb6c datapath: Allow TCP flags to be cleared.
When querying flow stats allow the TCP flags to be reset.  Since
the datapath ORs together all flags that have previously been
seen it is otherwise impossible to determine the set of flags from
after a particular time.
2009-11-02 16:12:21 -08:00
Ben Pfaff
d6fbec6de0 Spell verb form of "set up" correctly throughout the tree. 2009-10-26 14:41:32 -07:00
Ben Pfaff
3f355f47f8 Merge "citrix" into "master".
This merge took a little bit of care due to two issues:

    - Crossport of "interface-reconfigure" fixes from master back to
      citrix that had happened and needed to be canceled out of the merge.

    - New script "refresh-xs-network-uuids" added on citrix branch that
      needed to be moved from /root/vswitch/scripts to
      /usr/share/vswitch/scripts.
2009-10-22 17:43:28 -07:00
Ben Pfaff
5562d3ccd2 datapath: Ignore return value from rtnl_notify().
In Linux 2.6.30, the rtnl_notify() return type was changed from int to
void along with the following commit message:

    This patch also modifies the rtnetlink code to ignore the return
    value of rtnl_notify() in all callers. The function rtnl_notify()
    (before this patch) returned the error of the unicast notification
    which makes rtnl_set_sk_err() reports errors to all listeners. This
    is not of any help since the origin of the change (the socket that
    requested the echoing) notices the ENOBUFS error if the notification
    fails and should resync itself.

Thus there's no point in checking the return value, even in older versions
of the kernel, and so this commit changes our code to ignore it, even
on older kernel versions.  We also update the rtnl_notify() wrapper macros
to make the return type void on older kernel versions.

This has not been tested, just built.

Thanks to Mikio for spurring me to try building with Linux 2.6.29 and
2.6.30.
2009-10-15 10:24:36 -07:00
Ben Pfaff
1b378b99f6 datapath: Fix warning on 64-bit builds. 2009-10-15 10:24:36 -07:00
Ben Pfaff
7c40efc9d3 datapath: Factor out code for getting and setting listen mask.
This fixes GCC warnings on 64-bit architectures caused by storing an "int"
in the "void *" f->private_data field.
2009-10-15 10:24:36 -07:00
Jean Tourrilhes
a4fbb689b0 datapath: Fix validation of ODPAT_SET_VLAN_PCP actions.
The VLAN PCP mask is in the rightmost bits of the vlan_pcp member but we
were checking for it in its position in the VLAN tag field instead.

Slightly modified from Jean's original patch by adding and using the
VLAN_PCP_SHIFT macro.
2009-10-08 10:41:48 -07:00
Ben Pfaff
22d24ebf66 datapath: Fix mutual exclusion with bridge on Linux 2.6.27+.
Linux 2.6.27 introduces a new mechanism for sharing STP packets among
kernel modules, which means that the code in datapath.c to avoid loading
when the Linux bridging module is also loaded has false positives.  So
fall back on these newer kernels to a less reliable way of avoiding the
bridge module, but one that does not have false positives.

CC: Jean Tourrihles <jt@hpl.hp.com>
2009-09-15 15:51:01 -07:00
Ben Pfaff
cb5087cadd datapath: Fix WARN_ON sending GSO packets to userspace in Linux 2.6.22+.
Until now, when dp_output_control() queued a GSO packet to userspace, it
would first compute the checksum for the whole GSO packet, then break the
packet into segments.  However this had two drawbacks:

    1. The checksum had to be recomputed for each segment, wasting time.
    2. Linux 2.6.22 and later would emit a warning in skb_gso_segment()
       because the checksum was precomputed.

This commit changes dp_output_control() to instead break the packet into
segments, then compute the checksum across each of the segments
individually.  This fixes both drawbacks.

This commit has seen light testing on Xen's 2.6.27.  It has been build
tested on a few different kernel versions.
2009-09-14 12:20:00 -07:00
Ben Pfaff
0d3b8a34d6 datapath: Fix comments. 2009-09-14 12:20:00 -07:00
Justin Pettit
8398cf7efc Merge commit 'origin/citrix' 2009-09-04 22:23:29 -07:00