mirror of https://github.com/openvswitch/ovs synced 2025-10-13 14:07:02 +00:00

199 Commits

Author SHA1 Message Date
Jesse Gross
fceb2a5bb2 datapath: Put return type on same line as arguments for functions.
In some places we would put the return type on the same line as
the rest of the function definition and other places we wouldn't.
Reformat everything to match kernel style.
2010-07-15 15:09:08 -07:00
Jesse Gross
1336993ca4 datapath: Make checksum offsets unsigned.
The checksum offsets should always be positive, so make
that explicit by using unsigned ints.  This helps bug checks that
test if the offsets are greater than their upper limits.
2010-07-15 15:09:08 -07:00
Jesse Gross
f057cddae2 datapath: Make our checksumming more closely match skb_checksum_help().
Our code that handles checksumming does essentially the same thing
as skb_checksum_help() except it folds the process into copying to
userspace.  This makes the two functions more closely resemble
each other in structure, including adding a couple of BUG() checks.
This should have no functional change but makes comparison easier
when debugging.
2010-07-13 14:28:59 -07:00
Yu Zhiguo
656a0e37da datapath: fix header file include
linux/highmem.h should be included rather than asm/highmem.h,
otherwise openvswitch_mod cannot resolve kmap and kunmap on some arch.

Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
2010-06-25 09:34:26 -07:00
Jesse Gross
9fc10ed911 datapath: Don't compute checksums on partial packets.
If we are only copying part of a packet to userspace, don't bother
computing the checksum on that part since it is meaningless.
Instead, fall back to the old method of checksumming before
copying to ensure the correct result.

This was supposed to be part of the previous commit but was left
off.
2010-06-18 17:11:44 -07:00
Jesse Gross
9cc8b4e4af datapath: Compute checksum while sending packets to userspace.
Currently we compute the checksums on packets being sent to
userspace first and then copy them to a userspace buffer.  However,
these two operations can be combined for a significant savings
because the packet data only has to be loaded once.  This also
allows GSO packets to save an extra copy.

This will likely have an impact on NIC-121 because it eliminates
the code path that triggers the issue.  However, it is not a fix
for the root cause.
2010-06-18 14:20:01 -07:00
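The combined copy-and-checksum idea from 9cc8b4e4af can be sketched in userspace C. This is a minimal illustration assuming the RFC 1071 Internet checksum; `copy_and_csum()` is a hypothetical name, not the datapath's code, and the kernel has its own optimized checksum-and-copy helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy len bytes from src to dst and return the
 * RFC 1071 one's complement sum of the data, folded to 16 bits but not
 * complemented.  Each byte is loaded once and used for both the store
 * and the sum. */
static uint16_t copy_and_csum(uint8_t *dst, const uint8_t *src, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2) {
		dst[i] = src[i];
		dst[i + 1] = src[i + 1];
		/* Network byte order: high byte first. */
		sum += ((uint32_t)src[i] << 8) | src[i + 1];
	}
	if (i < len) {
		/* An odd trailing byte is padded with a zero low byte. */
		dst[i] = src[i];
		sum += (uint32_t)src[i] << 8;
	}

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

Because the data only passes through the CPU once, the copy and the checksum share the cost of loading it, which is the saving the commit describes.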
Jesse Gross
a2377e444a datapath: Call vswitch_skb_checksum_setup() before doing GSO.
Since GSO computes checksums as it does segmentation, we need to
set up the checksum pointers before calling skb_gso_segment().  Failing
to do so can potentially result in warnings, incorrect checksums,
crashes, or redundant checksum computation.  In general we don't
hit this case because the code path is run during the first packet
in a flow, which is generally not a large GSO packet.

This was found during the investigation of NIC-121 but has no impact
on it because vswitch_skb_checksum_setup() is a no-op on 2.6.27-based
XenServer kernels.
2010-06-18 11:38:10 -07:00
Jesse Gross
780e620781 vport: Allow offsets to be set for stats.
Adds a method to set a group of stats to be added to the values
gathered normally.  This is needed for the fake bond device to
show the stats of its underlying slaves.  Also enables devices
that use the generic stats layer to define a get_stats() function
to provide additional error counts.
2010-06-10 14:29:32 -07:00
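A rough sketch of adding a stored offset to the normally gathered stats, as 780e620781 describes for the fake bond device. The `struct port_stats` fields and `stats_with_offset()` are hypothetical; the real vport layer's structures are not shown in the commit.

```c
#include <stdint.h>

/* Hypothetical stats structure; the real vport layer defines its own. */
struct port_stats {
	uint64_t rx_packets;
	uint64_t tx_packets;
	uint64_t rx_errors;
};

/* Return the gathered stats plus a stored offset, e.g. the totals of a
 * fake bond device's underlying slaves or extra error counts. */
static struct port_stats stats_with_offset(const struct port_stats *gathered,
					   const struct port_stats *offset)
{
	struct port_stats out;

	out.rx_packets = gathered->rx_packets + offset->rx_packets;
	out.tx_packets = gathered->tx_packets + offset->tx_packets;
	out.rx_errors = gathered->rx_errors + offset->rx_errors;
	return out;
}
```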
Jesse Gross
61e89cd6d6 vport: Rename userspace functions.
The vport library can be accessed from both userspace and the
kernel using different sets of functions.  These functions were
named similarly, so add _user to the userspace variants to
distinguish them.
2010-06-09 12:30:41 -07:00
Ben Pfaff
65d9d5f8b8 datapath: Fix ODP_PORT_GROUP_GET implementation.
The final argument to do_get_port_group() is supposed to be a user pointer
to the number of ports, to be updated with put_user(), but it was actually
a kernel pointer, so "ovs-dpctl dump-groups" and anything else that used
this ioctl would always fail with -EFAULT.  This commit fixes it.

Bug introduced in commit 44e05eca "datapath: Prepare to support 32-bit
compatibility ioctls" for normal ioctls and for compat ioctls at their
introduction in commit 3fbd517acf "datapath: Add 32-bit compatibility
ioctls."
2010-05-25 15:42:48 -07:00
Jesse Gross
7183d1ecce datapath: Use per_cpu_ptr instead of percpu_ptr.
percpu_ptr was removed in 2.6.30, so update the one remaining user
and take out the compatibility code.

Suggested-by: Ben Pfaff <blp@nicira.com>
2010-05-14 15:10:45 -07:00
Jesse Gross
9dca7bd50a datapath: Hold rcu_read_lock where we claim to.
Many of the vport operations require that either RTNL lock or
rcu_read_lock be held.  However, operations from userspace often
hold a different lock so grab rcu_read_lock as well.
2010-05-14 15:10:45 -07:00
Jesse Gross
8819fac72b datapath: Don't expect bottom-halves to be disabled.
We currently document that BHs need to be disabled when handling
received packets.  However, this generally isn't the
case (usually preemption is disabled but not BHs).  Only one place
actually relies on BHs being disabled so fix that and update the
documentation of our expectations.
2010-05-14 15:10:45 -07:00
Jesse Gross
1c075d0aff datapath: Disable bottom-halves where necessary.
Places that update per-cpu stats without locking need to have bottom
halves disabled.  Otherwise, code running in process context can be
interrupted by a softirq in the middle of an update.
2010-05-14 15:10:45 -07:00
Jesse Gross
1d7241c729 datapath: Use spin_lock_bh() consistently.
We are never called in hardirq context - only process or softirq.
Therefore it is not necessary to disable interrupts with
spin_lock_irqsave(), so use spin_lock_bh() everywhere.
2010-05-14 15:10:44 -07:00
Ben Pfaff
3fbd517acf datapath: Add 32-bit compatibility ioctls.
When a 32-bit userspace program runs on a 64-bit kernel, data structures
that contain members whose sizes or alignments change from 32- to 64-bit
must be translated when they are passed to ioctls.  This commit adds such
support for openvswitch_mod.

We should really reconsider some parts of the Open vSwitch ioctl interface
to avoid needing as much translation as we do.

Lightly tested with 32-bit userspace on sparc64.
2010-05-13 15:29:51 -07:00
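To illustrate why 3fbd517acf needs translation at all, here is a hypothetical request structure whose pointer member changes size between 32- and 64-bit builds, together with the layout a 32-bit program actually passes and a conversion function. The names are made up for illustration and are not the real `odp_*` structures; in the kernel, `compat_ptr()` performs the pointer conversion.

```c
#include <stdint.h>

/* Hypothetical native request: on a 64-bit kernel the pointer is
 * 8 bytes, so the struct is padded to 16 bytes. */
struct flowvec_req {
	uint32_t n_flows;
	void *flows;
};

/* The same request as laid out by a 32-bit program: the pointer is a
 * 32-bit value and there is no padding. */
struct compat_flowvec_req {
	uint32_t n_flows;
	uint32_t flows;
};

/* Translate the 32-bit layout into the native one. */
static void compat_to_native(const struct compat_flowvec_req *c,
			     struct flowvec_req *n)
{
	n->n_flows = c->n_flows;
	n->flows = (void *)(uintptr_t)c->flows;
}
```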
Ben Pfaff
776f10ce0f datapath: Avoid __copy_to/from_user(), __get/put_user() functions.
The advantages of the double-underscore variants of copy_to_user(),
copy_from_user(), get_user(), and put_user() are pretty marginal, at best,
in the places where we are using them, and it's not always obvious that we
are making the right calls to access_ok() beforehand.  So switch to the
safe variants without double underscores.

Suggested-by: Jesse Gross <jesse@nicira.com>
2010-05-13 15:29:50 -07:00
Ben Pfaff
44e05ecafe datapath: Prepare to support 32-bit compatibility ioctls.
This commit prepares the core of datapath.c and vport.c to reduce the
amount of new code duplication when the following commit adds support for
32-bit compatibility ioctls.  It breaks a number of functions apart into
pairs of functions: one that copies data to and from userspace and another
that does the real work.

This change is a pure refactoring that should not change behavior.
2010-05-13 15:29:49 -07:00
Ben Pfaff
6d7568dc38 datapath: Avoid possibility of negative 'n_flows' in struct odp_flowvec.
do_flowvec_ioctl() was checking for too-big 'n_flows' but not negative
'n_flows'.  We could add that check too, but 'n_flows' should never be
negative so it's better to just use an unsigned type.
2010-05-13 15:29:46 -07:00
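The problem fixed by 6d7568dc38 fits in a few lines: with a signed count, a negative value passes an upper-bound-only check, while the same bit pattern read as unsigned is a huge number that the check rejects. `MAX_FLOWS` and both functions are hypothetical.

```c
#include <stdint.h>

#define MAX_FLOWS 1024	/* hypothetical upper limit */

/* Signed count: -1 is "less than" the limit, so it slips through. */
static int signed_count_ok(int32_t n_flows)
{
	return n_flows <= MAX_FLOWS;
}

/* Unsigned count: there are no negative values, so the single
 * upper-bound check covers every possible input. */
static int unsigned_count_ok(uint32_t n_flows)
{
	return n_flows <= MAX_FLOWS;
}
```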
Jesse Gross
f4267e344a datapath: Enable offloading on internal devices.
Enables checksum offloading, scatter/gather, and TSO on internal
devices.  While these optimizations were not previously enabled on
internal ports we already could receive these types of packets from
Xen guests.  This has the obvious performance benefits when these
packets can be passed directly to hardware.

There is also a more subtle benefit for GRE on Xen.  GRE packets
pass through OVS twice - once before encapsulation and once after
encapsulation, moving through an internal device in the process.
If it is an SG packet (as is common on Xen), a copy was necessary
to linearize for the internal device.  However, Xen uses the
memory allocator to track packets, so when the original packet is
freed after the copy, netback notifies the guest that the packet
has been sent, despite the fact that it is actually sitting in the
transmit queue.  The guest then sends packets as fast as the CPU
can handle, overflowing the transmit queue.  By enabling SG on
the internal device, we avoid the copy and keep the accounting
correct.

In certain circumstances this patch can decrease performance for
TCP.  TCP has its own mechanism for tracking in-flight packets
and therefore does not benefit from the corrected socket accounting.
However, certain NICs do not like SG when it is not being used for
TSO (these packets can no longer be handled by TSO after GRE
encapsulation).  These NICs presumably enable SG even though they
can't handle it well because TSO requires SG.

Tested controllers (all 1G):
Marvell 88E8053 (large performance hit)
Broadcom BCM5721 (small performance hit)
Intel 82571EB (no change)
2010-05-07 11:29:11 -07:00
Jesse Gross
d8b5d43a04 datapath: Don't hold dp_mutex when setting internal devs MTU.
We currently acquire dp_mutex when we are notified that the MTU
of a device attached to the datapath has changed so that we can
set the internal devices to the minimum MTU.  However, holding
dp_mutex is unnecessary because we already hold the RTNL lock, and
doing so causes a deadlock, so don't do it.

Specifically, the issue is that dp_mutex is acquired twice: once in
dp_device_event() before calling set_internal_devs_mtu() and then
again in internal_dev_change_mtu() when it is actually being changed
(since the MTU can also be set directly).  Since it is not a recursive
mutex, this deadlocks.
2010-04-30 17:23:44 -07:00
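The deadlock in d8b5d43a04 comes from acquiring a non-recursive mutex twice on the same thread. A userspace analogy, assuming POSIX threads: an error-checking mutex reports `EDEADLK` on the second lock instead of hanging, whereas an ordinary non-recursive mutex such as `dp_mutex` would simply block forever. `relock_result()` is a hypothetical demo function.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Lock the same mutex twice from one thread and return the result of
 * the second attempt.  PTHREAD_MUTEX_ERRORCHECK makes the mistake
 * visible as EDEADLK rather than a hang. */
static int relock_result(void)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t m;
	int ret;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&m, &attr);

	pthread_mutex_lock(&m);		/* like the lock in dp_device_event() */
	ret = pthread_mutex_lock(&m);	/* like internal_dev_change_mtu() */

	pthread_mutex_unlock(&m);
	pthread_mutex_destroy(&m);
	pthread_mutexattr_destroy(&attr);
	return ret;
}
```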
Jesse Gross
e1c1de3922 datapath: Ensure packet length matches headers during checksum setup.
During the setup of checksumming pointers we need to make sure that
the transport headers are in the skb linear data area.  However, we
don't currently verify that the lengths in the packet headers are
within the size of the packet.  This adds that check so that we
catch the problem before a BUG() check does.

CC: "Nick Couchman" <Nick.Couchman@seakr.com>
2010-04-29 18:14:03 -07:00
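A minimal sketch of the kind of length check e1c1de3922 adds, for IPv4 only: verify that the header's own length fields fit in the bytes actually present before relying on them. `ipv4_lengths_ok()` is hypothetical, and the commit message does not list exactly which fields the datapath checks.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: return 1 if the IPv4 header length (IHL) and
 * total length fields are consistent with the len bytes available. */
static int ipv4_lengths_ok(const uint8_t *pkt, size_t len)
{
	size_t ihl, tot_len;

	if (len < 20)			/* shorter than a minimal IPv4 header */
		return 0;
	ihl = (size_t)(pkt[0] & 0x0f) * 4;
	tot_len = ((size_t)pkt[2] << 8) | pkt[3];
	if (ihl < 20 || ihl > len)
		return 0;
	if (tot_len < ihl || tot_len > len)
		return 0;
	return 1;
}
```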
Ben Pfaff
968f7c8d77 datapath: Check device name length more carefully in create_dp().
I don't see any value in silently truncating device names.  Doing so will
sow confusion in userspace.  This commit makes too-long device names
return ENAMETOOLONG.
2010-04-27 10:45:28 -07:00
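A sketch of rejecting over-long names instead of truncating them, as 968f7c8d77 does. `DEVNAME_SIZE` is a hypothetical stand-in for Linux's `IFNAMSIZ` (16, including the trailing null), and `copy_devname()` is not the actual `create_dp()` code; returning a negative errno follows the kernel convention.

```c
#include <errno.h>
#include <string.h>

#define DEVNAME_SIZE 16	/* hypothetical; mirrors IFNAMSIZ, including NUL */

/* Copy a device name, failing with -ENAMETOOLONG if it cannot fit
 * together with its terminator.  After the length check, strcpy() is
 * safe and always null-terminates. */
static int copy_devname(char dst[DEVNAME_SIZE], const char *src)
{
	if (strlen(src) >= DEVNAME_SIZE)
		return -ENAMETOOLONG;
	strcpy(dst, src);
	return 0;
}
```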
Ben Pfaff
092a872d6e datapath: Always null-terminate network device name in create_dp().
strncpy() does not null-terminate its output buffer if the source string's
length is at least as large as its 'count' argument.  We know that the
source and destination buffers are the same size and that the source buffer
is null-terminated, so just use strcpy().

This fixes a kernel BUG message that often occurred when strlen(devname)
was exactly IFNAMSIZ-1.  In such a case, if
internal_dev_port.devname[IFNAMSIZ-1] happened to be nonzero, it would
eventually fail the following check in alloc_netdev_mq():
	BUG_ON(strlen(name) >= sizeof(dev->name));

Bug #2722.
2010-04-27 10:43:24 -07:00
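The `strncpy()` behavior behind 092a872d6e is easy to demonstrate: when the source is at least `n` bytes long, exactly `n` bytes are copied and no terminator is written. `strncpy_terminates()` is a hypothetical test helper.

```c
#include <string.h>

/* Return 1 if strncpy(buf, src, n) left a NUL within the first n bytes,
 * 0 if it did not, and -1 if n is outside what this demo supports. */
static int strncpy_terminates(const char *src, size_t n)
{
	char buf[16];

	if (n == 0 || n > sizeof(buf))
		return -1;
	memset(buf, 'x', sizeof(buf));	/* make stale bytes visible */
	strncpy(buf, src, n);
	return memchr(buf, '\0', n) != NULL;
}
```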
Ben Pfaff
1224e8fc1d datapath: Fix argument to strncpy_from_user().
The strncpy_from_user() function's 'count' argument is documented to
include the trailing null byte, but create_dp() did not include it.  This
commit adds it in.
2010-04-27 10:21:12 -07:00
Jesse Gross
2d2a0e309f datapath: Add skb_csum_help compatibility function.
Later kernel versions remove the direction argument from
skb_checksum_help.  This provides a compatibility function so we
can have consistent syntax across versions.

Since CHECKSUM_PARTIAL is the same as CHECKSUM_HW on older kernels
this allows a unified code path for computing checksums.
2010-04-19 09:11:57 -04:00
Jesse Gross
8d5ebd839b datapath: Genericize hash table.
Currently the flow hash table assumes that it is storing flows.
However, we will need additional types of hash tables in the
future so remove assumptions about flows and convert the datapath
to use the new table.
2010-04-19 09:11:57 -04:00
Jesse Gross
f2459fe7d9 datapath: Add generic virtual port layer.
Currently the datapath directly accesses devices through their
Linux functions.  Obviously this doesn't work for virtual devices
that are not backed by an actual Linux device.  This creates a
new virtual port layer which handles all interaction with devices.

The existing support for Linux devices was then implemented on top
of this layer as two device types.  It splits out and renames dp_dev
to internal_dev.  There were several places where datapath devices
had to be handled in a special manner and this cleans that up by putting
all the special casing in a single location.
2010-04-19 09:11:57 -04:00
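A much-simplified sketch of the ops-table pattern a generic port layer like f2459fe7d9 suggests: the datapath calls one entry point and each port type supplies its own implementation. All names here are hypothetical (the type strings only echo the commit's two Linux device types), and a real port layer would need many more operations.

```c
#include <stddef.h>

struct vport;

/* Hypothetical ops table: each port type supplies its own functions. */
struct vport_ops {
	const char *type;
	int (*send)(struct vport *port, const void *data, size_t len);
};

enum vport_path { PATH_NONE, PATH_DEVICE, PATH_LOCAL_STACK };

struct vport {
	const struct vport_ops *ops;
	size_t tx_bytes;
	enum vport_path last_path;	/* records which implementation ran */
};

static int netdev_send(struct vport *port, const void *data, size_t len)
{
	(void)data;	/* a real port would hand this to a Linux device */
	port->tx_bytes += len;
	port->last_path = PATH_DEVICE;
	return 0;
}

static int internal_send(struct vport *port, const void *data, size_t len)
{
	(void)data;	/* a real internal port would pass it to the stack */
	port->tx_bytes += len;
	port->last_path = PATH_LOCAL_STACK;
	return 0;
}

static const struct vport_ops netdev_ops = { "netdev", netdev_send };
static const struct vport_ops internal_ops = { "internal", internal_send };

/* The datapath calls only this; per-type special casing lives in the
 * ops implementations. */
static int vport_send(struct vport *port, const void *data, size_t len)
{
	return port->ops->send(port, data, len);
}
```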
Jesse Gross
659586efcf tunneling: Add support for tunnel ID.
Add a tun_id field which contains the ID of the encapsulating tunnel
on which a packet was received (0 if not received on a tunnel).  Also
add an action which allows the tunnel ID to be set for outgoing
packets.  At this point there aren't any tunnel implementations so
these fields don't have any effect.

The matching is exposed to OpenFlow by overloading the high 32 bits
of the cookie as the tunnel ID.  ovs-ofctl is capable of turning
on this special behavior using a new "tun-cookie" command but this
command is intentionally undocumented to avoid it being used without
a full understanding of the consequences.
2010-04-19 09:11:51 -04:00
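The cookie overloading in 659586efcf amounts to simple bit manipulation. A sketch, assuming the 64-bit cookie is already in host byte order; the helper names are hypothetical.

```c
#include <stdint.h>

/* Extract the tunnel ID carried in the high 32 bits of the cookie. */
static uint32_t cookie_tun_id(uint64_t cookie)
{
	return (uint32_t)(cookie >> 32);
}

/* Replace the high 32 bits with tun_id, keeping the low 32 bits. */
static uint64_t cookie_with_tun_id(uint64_t cookie, uint32_t tun_id)
{
	return ((uint64_t)tun_id << 32) | (cookie & 0xffffffffu);
}
```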
Jesse Gross
3c5f6de385 datapath: Validate ToS when flow is added.
Check that the ToS is valid when the flow is added, not every time
it is used.
2010-03-15 15:44:41 -04:00
Jesse Gross
2de320799d datapath: Disable large receive offload.
LRO can play fast and loose with the packets that it merges, which
isn't very polite when you are bridging packets for other operating
systems.  This disables LRO on any underlying devices that are added
to the datapath, which is the same as what the bridge does.

Note that this does not disable GRO, which has a more strict set of
rules about what is merged and is therefore safe for bridging.  Both
are typically done in software anyway.
2010-03-05 16:31:26 -05:00
Jesse Gross
635c9298b9 datapath: Update hardware computed checksum on VLAN change.
The checksum computed by hardware on receive stored in skb->csum
when skb->ip_summed == CHECKSUM_COMPLETE is supposed to reflect
the contents of the packet starting at skb->data, which includes
the VLAN tag if there is one.  However, when we manipulate the
VLAN tag we don't update the checksum.  This leads to all kinds
of nasty warnings about broken hardware, not to mention we can't
take advantage of the checksum that was already computed.

This also fixes some issues with our private checksum type value
on some different kernels and after GSO.
2010-03-05 15:22:17 -05:00
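Why the hardware checksum in 635c9298b9 can be updated rather than recomputed: the one's complement sum is order-independent over 16-bit words, so inserting a tag at an even offset changes the sum by exactly the tag's own sum. This is a simplified 16-bit userspace sketch assuming RFC 1071 arithmetic; the kernel keeps `skb->csum` as an unfolded 32-bit value and uses helpers such as `csum_add()` and `csum_partial()`. The function names below are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 one's complement sum of a buffer, folded to 16 bits. */
static uint16_t csum16(const uint8_t *p, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	if (i < len)
		sum += (uint32_t)p[i] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* One's complement addition of two folded sums (end-around carry). */
static uint16_t csum16_add(uint16_t a, uint16_t b)
{
	uint32_t s = (uint32_t)a + b;

	return (uint16_t)((s & 0xffff) + (s >> 16));
}
```

Removing a tag is the inverse operation: subtract the tag's sum in one's complement arithmetic instead of adding it.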
Jesse Gross
a063b0dff0 datapath: Consistently maintain checksum offloading state.
When adding a VLAN tag it is necessary for us to set up checksum
pointers for offloaded packets manually.  However, this process
clobbers some of the fields that other components need to determine
the current status.  Here we mark the packet with its status upon
ingress in our own format that does not get clobbered and is
consistent across kernel versions.

Bug #2436
2010-02-28 15:45:00 -05:00
Justin Pettit
834377ea55 ofproto: Match on IP ToS/DSCP bits (OpenFlow 1.0)
OpenFlow 1.0 adds support for matching on IP ToS/DSCP bits.

NOTE: OVS at this point is not wire-compatible with OpenFlow 1.0 until
the final commit in this OpenFlow 1.0 set.
2010-02-20 02:22:28 -08:00
Justin Pettit
959a2ecdc8 ofproto: Match VLAN PCP and rewrite ToS bits (OpenFlow 0.9)
Starting in OpenFlow 0.9, it is possible to match on the VLAN PCP
(priority) field and rewrite the IP ToS/DSCP bits.  This check-in
provides that support and bumps the wire protocol number to 0x98.

NOTE: The wire changes come together over the set of OpenFlow 0.9 commits,
so OVS will not be OpenFlow-compatible with any official release between
this commit and the one that completes the set.
2010-02-20 02:22:26 -08:00
Ben Pfaff
c69ee87c10 Merge "master" into "next".
The main change here is the need to update all of the uses of UNUSED in
the next branch to OVS_UNUSED as it is now spelled on "master".
2010-02-11 11:11:23 -08:00
Ben Pfaff
35f7605b3f datapath: Mark functions "static".
Found by sparse (http://sparse.wiki.kernel.org/).
2010-02-10 16:54:48 -08:00
Ben Pfaff
2175540274 datapath: When adding a port, return the new port number to userspace.
'port' is a kernel-space copy of the odp_port and modifying it is useless.
'portp' is the userspace copy; modifying it is useful.

None of our current userspace users care about the port number and so we
never noticed.

Found by sparse (http://sparse.wiki.kernel.org/).
2010-02-10 16:54:48 -08:00
Jesse Gross
7dab847a19 Fix some regressions from the merge from master. 2010-02-08 13:31:33 -05:00
Justin Pettit
a4af00400a Merge branch 'master' into next
Conflicts:
	COPYING
	datapath/datapath.h
	lib/automake.mk
	lib/dpif-provider.h
	lib/dpif.c
	lib/hmap.h
	lib/netdev-provider.h
	lib/netdev.c
	lib/stream-ssl.h
	ofproto/executer.c
	ofproto/ofproto.c
	ofproto/ofproto.h
	tests/automake.mk
	utilities/ovs-ofctl.c
	utilities/ovs-vsctl.in
	vswitchd/ovs-vswitchd.conf.5.in
	xenserver/etc_init.d_vswitch
	xenserver/etc_xensource_scripts_vif
	xenserver/opt_xensource_libexec_interface-reconfigure
2010-02-05 17:14:55 -08:00
Jesse Gross
a778696338 datapath: Set datapath device MTU to minimum of MTU of ports.
The MTU of the local port should be no larger than the minimum of
the MTUs of the ports attached to the bridge, otherwise packets may be
dropped.  We already prevent changes to the MTU that would violate
this constraint but don't actually proactively set the MTU.  This
change makes everything consistent and matches the behavior of
the bridge.
2010-02-02 14:56:40 -05:00
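Computing the target MTU in a778696338 is a minimum over the attached ports. `min_port_mtu()` and its fallback for an empty port list are hypothetical.

```c
#include <stddef.h>

/* Return the smallest MTU among n ports, or fallback if n is 0. */
static int min_port_mtu(const int *mtus, size_t n, int fallback)
{
	int min = fallback;
	size_t i;

	for (i = 0; i < n; i++)
		if (i == 0 || mtus[i] < min)
			min = mtus[i];
	return min;
}
```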
Jesse Gross
8cdaca99b5 datapath: Fix compilation on newer old-style Xen kernels.
Some ports of Xen (such as Debian Lenny's) use the old style
Xen checksumming fields on newer kernels.  Normally the code that
deals with those fields isn't used at all on newer kernels.  This
updates the checksumming pointer code with some changes from Lenny
Xen since it is cleaner and works well with our existing compatibility
layer.

CC: pspreadborough@comcast.net
2010-01-26 18:09:39 -05:00
Jesse Gross
a605732386 datapath: Handle packets with precomputed checksums.
On older kernels (< 2.6.19) CHECKSUM_HW can mean either that the
checksum has already been computed by hardware or that the checksum
needs to be computed by hardware, depending on whether we are on
the transmit or receive path.  Unfortunately since we are in the
middle of these two paths it is impossible to tell which is the
case.  Code after us assumes that CHECKSUM_HW means that the
checksum needs to be computed and will panic if there already is
a checksum.  On these kernels we mark these packets as CHECKSUM_NONE
before handing them off.

Without this change using certain NICs will cause panics.
2010-01-26 17:17:21 -05:00
Ben Pfaff
49c36903d6 Merge "sflow" into "master".
No conflicts, but lib/dpif.c needed a few changes since struct dpif's
member "class" was renamed to "dpif_class" in master since sflow was
branched off.
2010-01-25 10:52:28 -08:00
Ben Pfaff
53d3bbbc09 datapath: Clean up vswitch_skb_checksum_setup().
vswitch_skb_checksum_setup() can be defined in datapath.h as a no-op
when defined(CONFIG_XEN) && defined(HAVE_PROTO_DATA_VALID) is false.

Also, skb_checksum_setup(), which was defined similarly, can be dropped
now, since it was unused.
2010-01-25 10:24:38 -08:00
Ben Pfaff
56fd8edf80 sflow: Fix sFlow sampling structure.
According to Neil McKee, in an email archived at
http://openvswitch.org/pipermail/dev_openvswitch.org/2010-January/000934.html:

    The containment rule is that a given sflow-datasource (sampler or
    poller) should be scoped within only one sflow-agent (or
    sub-agent).  So the issue arises when you have two
    switches/datapaths defined on the same host being managed with
    the same IP address: each switch is a separate sub-agent, so they
    can run independently (e.g. with their own sequence numbers) but
    they can't both claim to speak for the same sflow-datasource.
    Specifically, they can't both represent the <ifindex>:0
    data-source.  This containment rule is necessary so that the
    sFlow collector can scale and combine the results accurately.

    One option would be to stick with the <ifindex>:0 data-source but
    elevate it to be global across all bridges, with a global
    sample_pool and a global sflow_agent.  Not tempting.  Better to
    go the other way and allow each interface to have its own
    sampler, just as it already has its own poller.  The ifIndex
    numbers are globally unique across all switches/datapaths on the
    host, so the containment is now clean.  Datasource <ifindex>:5
    might be on one switch, while <ifindex>:7 can be on another.
    Other benefits are that 1) you can support the option of
    overriding the default sampling-rate on an interface-by-interface
    basis, and 2) this is how most sFlow implementations are coded,
    so there will be no surprises or interoperability issues with any
    sFlow collectors out there.

This commit implements the approach suggested by Neil.

This commit uses an atomic_t to represent the sampling pool.  This is
because we do want access to it to be atomic, but we expect that it will
"mostly" be accessed from a single CPU at a time.  Perhaps this is a bad
assumption; we can always switch to another form of synchronization later.

CC: Neil McKee <neil.mckee@inmon.com>
2010-01-20 14:33:28 -08:00
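A sketch of a per-interface sampling pool kept in an atomic counter, in the spirit of 56fd8edf80, using C11 `<stdatomic.h>` as a userspace stand-in for the kernel's `atomic_t`. The structure is hypothetical, and the deterministic 1-in-`rate` selection is a simplification: real sFlow samplers choose packets randomly.

```c
#include <stdatomic.h>

/* Hypothetical per-interface sampler: 'pool' counts every packet that
 * was eligible for sampling on this interface. */
struct sampler {
	atomic_uint pool;
	unsigned int rate;	/* sample roughly 1 in 'rate' packets */
};

/* Count one packet atomically; return 1 if it should be sampled. */
static int sampler_account(struct sampler *s)
{
	unsigned int n = atomic_fetch_add(&s->pool, 1) + 1;

	return n % s->rate == 0;
}
```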
Ben Pfaff
72b0630028 Initial implementation of sFlow.
Tested very slightly with "ping" and "sflowtool -t | tcpdump -r -".
2010-01-04 13:08:37 -08:00
Ian Campbell
f7fed00035 datapath: Use HAVE_PROTO_DATA_VALID when defining vswitch_skb_checksum_setup
The purpose of the non-empty variant of vswitch_skb_checksum_setup is to
synchronise the proto_data_valid and proto_csum_blank fields into the
standard skb csum/ip_summed fields, therefore it is more correct to key
off of HAVE_PROTO_DATA_VALID.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
2009-11-19 10:25:57 -08:00
Justin Pettit
de3f65ea52 datapath: Cleanup tab/space issues in datapath 2009-11-16 18:48:29 -08:00
Jesse Gross
d65349ea28 Merge citrix branch into master. 2009-11-10 15:12:01 -08:00