openvswitch

mirror of https://github.com/openvswitch/ovs synced 2025-10-07 13:40:45 +00:00

Author	SHA1	Message	Date
Jesse Gross	8819fac72b	datapath: Don't expect bottom-halves to be disabled. We currently document that BHs need to be disabled when handling received packets. However, this isn't actually generally the case (usually preemption is disabled but not BHs). Only one place actually relies on BHs being disabled so fix that and update the documentation of our expectations.	2010-05-14 15:10:45 -07:00
Jesse Gross	1c075d0aff	datapath: Disable bottom-halves where necessary. Places that update per-cpu stats without locking need to have bottom halves disabled. Otherwise we can be running in process context and in the middle of an operation and be interrupted by a softirq.	2010-05-14 15:10:45 -07:00
Jesse Gross	1d7241c729	datapath: Use spin_lock_bh() consistently. We are never called in hardirq context - only process or softirq. Therefore it is not necessary to disable interrupts with spin_lock_irqsave(), so use spin_lock_bh() everywhere.	2010-05-14 15:10:44 -07:00
Ben Pfaff	3fbd517acf	datapath: Add 32-bit compatibility ioctls. When a 32-bit userspace program runs on a 64-bit kernel, data structures that contain members whose sizes or alignments change from 32- to 64-bit must be translated when they are passed to ioctls. This commit adds such support for openvswitch_mod. We should really reconsider some parts of the Open vSwitch ioctl interface to avoid needing as much translation as we do. Lightly tested with 32-bit userspace on sparc64.	2010-05-13 15:29:51 -07:00
Ben Pfaff	776f10ce0f	datapath: Avoid __copy_to/from_user(), __get/put_user() functions. The advantages of the double-underscore variants of copy_to_user(), copy_from_user(), get_user(), and put_user() are pretty marginal, at best, in the places where we are using them, and it's not always obvious that we are making the right calls to access_ok() beforehand. So switch to the safe variants without double underscores. Suggested-by: Jesse Gross <jesse@nicira.com>	2010-05-13 15:29:50 -07:00
Ben Pfaff	44e05ecafe	datapath: Prepare to support 32-bit compatibility ioctls. This commit prepares the core of datapath.c and vport.c to reduce the amount of new code duplication when the following commit adds support for 32-bit compatibility ioctls. It breaks a number of functions apart into pairs of functions: one that copies data to and from userspace and another that does the real work. This change is a pure refactoring that should not change behavior.	2010-05-13 15:29:49 -07:00
Ben Pfaff	6d7568dc38	datapath: Avoid possibility of negative 'n_flows' in struct odp_flowvec. do_flowvec_ioctl() was checking for too-big 'n_flows' but not negative 'n_flows'. We could add that check too, but 'n_flows' should never be negative so it's better to just use an unsigned type.	2010-05-13 15:29:46 -07:00
Jesse Gross	f4267e344a	datapath: Enable offloading on internal devices. Enables checksum offloading, scatter/gather, and TSO on internal devices. While these optimizations were not previously enabled on internal ports we already could receive these types of packets from Xen guests. This has the obvious performance benefits when these packets can be passed directly to hardware. There is also a more subtle benefit for GRE on Xen. GRE packets pass through OVS twice - once before encapsulation and once after encapsulation, moving through an internal device in the process. If it is a SG packet (as is common on Xen), a copy was necessary to linearize for the internal device. However, Xen uses the memory allocator to track packets so when the original packet is freed after the copy netback notifies the guest that the packet has been sent, despite the fact that it is actually sitting in the transmit queue. The guest then sends packets as fast as the CPU can handle, overflowing the transmit queue. By enabling SG on the internal device, we avoid the copy and keep the accounting correct. In certain circumstances this patch can decrease performance for TCP. TCP has its own mechanism for tracking in-flight packets and therefore does not benefit from the corrected socket accounting. However, certain NICs do not like SG when it is not being used for TSO (these packets can no longer be handled by TSO after GRE encapsulation). These NICs presumably enable SG even though they can't handle it well because TSO requires SG. Tested controllers (all 1G): Marvell 88E8053 (large performance hit) Broadcom BCM5721 (small performance hit) Intel 82571EB (no change)	2010-05-07 11:29:11 -07:00
Jesse Gross	d8b5d43a04	datapath: Don't hold dp_mutex when setting internal devs MTU. We currently acquire dp_mutex when we are notified that the MTU of a device attached to the datapath has changed so that we can set the internal devices to the minimum MTU. However, it is not required to hold dp_mutex because we already have RTNL lock and it causes a deadlock, so don't do it. Specifically, the issue is that DP mutex is acquired twice: once in dp_device_event() before calling set_internal_devs_mtu() and then again in internal_dev_change_mtu() when it is actually being changed (since the MTU can also be set directly). Since it's not a recursive mutex, deadlock.	2010-04-30 17:23:44 -07:00
Jesse Gross	e1c1de3922	datapath: Ensure packet length matches headers during checksum setup. During the setup of checksumming pointers we need to make sure that the transport headers are in the skb linear data area. However, we don't currently verify that the lengths in the packet headers are within the size of the packet. This makes that check before a BUG() check does it for us. CC: "Nick Couchman" <Nick.Couchman@seakr.com>	2010-04-29 18:14:03 -07:00
Ben Pfaff	968f7c8d77	datapath: Check device name length more carefully in create_dp(). I don't see any value in silently truncating device names. Doing so will sow confusion in userspace. This commit makes too-long device names return ENAMETOOLONG.	2010-04-27 10:45:28 -07:00
Ben Pfaff	092a872d6e	datapath: Always null-terminate network device name in create_dp(). strncpy() does not null-terminate its output buffer if the source string's length is at least as large as its 'count' argument. We know that the source and destination buffers are the same size and that the source buffer is null-terminated, so just use strcpy(). This fixes a kernel BUG message that often occurred when strlen(devname) was exactly IFNAMSIZ-1. In such a case, if internal_dev_port.devname[IFNAMSIZ-1] happened to be nonzero, it would eventually fail the following check in alloc_netdev_mq(): BUG_ON(strlen(name) >= sizeof(dev->name)); Bug #2722.	2010-04-27 10:43:24 -07:00
Ben Pfaff	1224e8fc1d	datapath: Fix argument to strncpy_from_user(). The strncpy_from_user() function's 'count' argument is documented to include the trailing null byte, but create_dp() did not include it. This commit adds it in.	2010-04-27 10:21:12 -07:00
Jesse Gross	2d2a0e309f	datapath: Add skb_csum_help compatibility function. Later kernel versions remove the direction argument from skb_checksum_help. This provides a compatibility function so we can have consistent syntax across versions. Since CHECKSUM_PARTIAL is the same as CHECKSUM_HW on older kernels this allows a unified code path for computing checksums.	2010-04-19 09:11:57 -04:00
Jesse Gross	8d5ebd839b	datapath: Genericize hash table. Currently the flow hash table assumes that it is storing flows. However, we will need additional types of hash tables in the future so remove assumptions about flows and convert the datapath to use the new table.	2010-04-19 09:11:57 -04:00
Jesse Gross	f2459fe7d9	datapath: Add generic virtual port layer. Currently the datapath directly accesses devices through their Linux functions. Obviously this doesn't work for virtual devices that are not backed by an actual Linux device. This creates a new virtual port layer which handles all interaction with devices. The existing support for Linux devices was then implemented on top of this layer as two device types. It splits out and renames dp_dev to internal_dev. There were several places where datapath devices had to be handled in a special manner and this cleans that up by putting all the special casing in a single location.	2010-04-19 09:11:57 -04:00
Jesse Gross	659586efcf	tunneling: Add support for tunnel ID. Add a tun_id field which contains the ID of the encapsulating tunnel on which a packet was received (0 if not received on a tunnel). Also add an action which allows the tunnel ID to be set for outgoing packets. At this point there aren't any tunnel implementations so these fields don't have any effect. The matching is exposed to OpenFlow by overloading the high 32 bits of the cookie as the tunnel ID. ovs-ofctl is capable of turning on this special behavior using a new "tun-cookie" command but this command is intentionally undocumented to avoid it being used without a full understanding of the consequences.	2010-04-19 09:11:51 -04:00
Jesse Gross	3c5f6de385	datapath: Validate ToS when flow is added. Check that the ToS is valid when the flow is added, not every time it is used.	2010-03-15 15:44:41 -04:00
Jesse Gross	2de320799d	datapath: Disable large receive offload. LRO can play fast and loose with the packets that it merges, which isn't very polite when you are bridging packets for other operating systems. This disables LRO on any underlying devices that are added to the datapath, which is the same as what the bridge does. Note that this does not disable GRO, which has a more strict set of rules about what is merged and is therefore safe for bridging. Both are typically done in software anyways.	2010-03-05 16:31:26 -05:00
Jesse Gross	635c9298b9	datapath: Update hardware computed checksum on VLAN change. The checksum computed by hardware on receive stored in skb->csum when skb->ip_summed == CHECKSUM_COMPLETE is supposed to reflect the contents of the packet starting at skb->data, which includes the VLAN tag if there is one. However, when we manipulate the VLAN tag we don't update the checksum. This leads to all kinds of nasty warnings about broken hardware, not to mention we can't take advantage of the checksum that was already computed. This also fixes some issues with our private checksum type value on some different kernels and after GSO.	2010-03-05 15:22:17 -05:00
Jesse Gross	a063b0dff0	datapath: Consistently maintain checksum offloading state. When adding a VLAN tag it is necessary for us to setup checksum pointers for offloaded packets manually. However, this process clobbers some of the fields that other components need to determine the current status. Here we mark the packet with its status upon ingress in our own format that does not get clobbered and is consistent across kernel versions. Bug #2436	2010-02-28 15:45:00 -05:00
Justin Pettit	834377ea55	ofproto: Match on IP ToS/DSCP bits (OpenFlow 1.0) OpenFlow 1.0 adds support for matching on IP ToS/DSCP bits. NOTE: OVS at this point is not wire-compatible with OpenFlow 1.0 until the final commit in this OpenFlow 1.0 set.	2010-02-20 02:22:28 -08:00
Justin Pettit	959a2ecdc8	ofproto: Match VLAN PCP and rewrite ToS bits (OpenFlow 0.9) Starting in OpenFlow 0.9, it is possible to match on the VLAN PCP (priority) field and rewrite the IP ToS/DSCP bits. This check-in provides that support and bumps the wire protocol number to 0x98. NOTE: The wire changes come together over the set of OpenFlow 0.9 commits, so OVS will not be OpenFlow-compatible with any official release between this commit and the one that completes the set.	2010-02-20 02:22:26 -08:00
Ben Pfaff	c69ee87c10	Merge "master" into "next". The main change here is the need to update all of the uses of UNUSED in the next branch to OVS_UNUSED as it is now spelled on "master".	2010-02-11 11:11:23 -08:00
Ben Pfaff	35f7605b3f	datapath: Mark functions "static". Found by sparse (http://sparse.wiki.kernel.org/).	2010-02-10 16:54:48 -08:00
Ben Pfaff	2175540274	datapath: When adding a port, return the new port number to userspace. 'port' is a kernel-space copy of the odp_port and modifying it is useless. 'portp' is the userspace copy; modifying it is useful. None of our current userspace users care about the port number and so we never noticed. Found by sparse (http://sparse.wiki.kernel.org/).	2010-02-10 16:54:48 -08:00
Jesse Gross	7dab847a19	Fix some regressions from the merge from master.	2010-02-08 13:31:33 -05:00
Justin Pettit	a4af00400a	Merge branch 'master' into next Conflicts: COPYING datapath/datapath.h lib/automake.mk lib/dpif-provider.h lib/dpif.c lib/hmap.h lib/netdev-provider.h lib/netdev.c lib/stream-ssl.h ofproto/executer.c ofproto/ofproto.c ofproto/ofproto.h tests/automake.mk utilities/ovs-ofctl.c utilities/ovs-vsctl.in vswitchd/ovs-vswitchd.conf.5.in xenserver/etc_init.d_vswitch xenserver/etc_xensource_scripts_vif xenserver/opt_xensource_libexec_interface-reconfigure	2010-02-05 17:14:55 -08:00
Jesse Gross	a778696338	datapath: Set datapath device MTU to minimum of MTU of ports. The MTU of the local port should be no larger than the minimum of the MTUs of the ports attached to the bridge, otherwise packets may be dropped. We already prevent changes to the MTU that would violate this constraint but don't actually proactively set the MTU. This change makes everything consistent and matches the behavior of the bridge.	2010-02-02 14:56:40 -05:00
Jesse Gross	8cdaca99b5	datapath: Fix compilation on newer old-style Xen kernels. Some ports of Xen (such as Debian Lenny's) use the old style Xen checksumming fields on newer kernels. Normally the code that deals with those fields isn't used at all on newer kernels. This updates the checksumming pointer code with some changes from Lenny Xen since it is cleaner and works well with our existing compatibility layer. CC:pspreadborough@comcast.net	2010-01-26 18:09:39 -05:00
Jesse Gross	a605732386	datapath: Handle packets with precomputed checksums. On older kernels (< 2.6.19) CHECKSUM_HW can mean either that the checksum has already been computed by hardware or that the checksum needs to be computed by hardware, depending on whether we are on the transmit or receive path. Unfortunately since we are in the middle of these two paths it is impossible to tell which is the case. Code after us assumes that CHECKSUM_HW means that the checksum needs to be computed and will panic if there already is a checksum. On these kernels we mark these packets as CHECKSUM_NONE before handing them off. Without this change using certain NICs will cause panics.	2010-01-26 17:17:21 -05:00
Ben Pfaff	49c36903d6	Merge "sflow" into "master". No conflicts, but lib/dpif.c needed a few changes since struct dpif's member "class" was renamed to "dpif_class" in master since sflow was branched off.	2010-01-25 10:52:28 -08:00
Ben Pfaff	53d3bbbc09	datapath: Clean up vswitch_skb_checksum_setup(). vswitch_skb_checksum_setup() can be defined in datapath.h as a no-op when defined(CONFIG_XEN) && defined(HAVE_PROTO_DATA_VALID) is false. Also, skb_checksum_setup(), which was defined similarly, can be dropped now, since it was unused.	2010-01-25 10:24:38 -08:00
Ben Pfaff	56fd8edf80	sflow: Fix sFlow sampling structure. According to Neil McKee, in an email archived at http://openvswitch.org/pipermail/dev_openvswitch.org/2010-January/000934.html: The containment rule is that a given sflow-datasource (sampler or poller) should be scoped within only one sflow-agent (or sub-agent). So the issue arises when you have two switches/datapaths defined on the same host being managed with the same IP address: each switch is a separate sub-agent, so they can run independently (e.g. with their own sequence numbers) but they can't both claim to speak for the same sflow-datasource. Specifically, they can't both represent the <ifindex>:0 data-source. This containment rule is necessary so that the sFlow collector can scale and combine the results accurately. One option would be to stick with the <ifindex>:0 data-source but elevate it to be global across all bridges, with a global sample_pool and a global sflow_agent. Not tempting. Better to go the other way and allow each interface to have its own sampler, just as it already has its own poller. The ifIndex numbers are globally unique across all switches/datapaths on the host, so the containment is now clean. Datasource <ifindex>:5 might be on one switch, while <ifindex>:7 can be on another. Other benefits are that 1) you can support the option of overriding the default sampling-rate on an interface-by-interface basis, and 2) this is how most sFlow implementations are coded, so there will be no surprises or interoperability issues with any sFlow collectors out there. This commit implements the approach suggested by Neil. This commit uses an atomic_t to represent the sampling pool. This is because we do want access to it to be atomic, but we expect that it will "mostly" be accessed from a single CPU at a time. Perhaps this is a bad assumption; we can always switch to another form of synchronization later. CC: Neil McKee <neil.mckee@inmon.com>	2010-01-20 14:33:28 -08:00
Ben Pfaff	72b0630028	Initial implementation of sFlow. Tested very slightly with "ping" and "sflowtool -t | tcpdump -r -".	2010-01-04 13:08:37 -08:00
Ian Campbell	f7fed00035	datapath: Use HAVE_PROTO_DATA_VALID when defining vswitch_skb_checksum_setup The purpose of the non-empty variant of vswitch_skb_checksum_setup is to synchronise the proto_data_valid and proto_csum_blank fields into the standard skb csum/ip_summed fields, therefore it is more correct to key off of HAVE_PROTO_DATA_VALID. Signed-off-by: Ian Campbell <ian.campbell@citrix.com>	2009-11-19 10:25:57 -08:00
Justin Pettit	de3f65ea52	datapath: Cleanup tab/space issues in datapath	2009-11-16 18:48:29 -08:00
Jesse Gross	d65349ea28	Merge citrix branch into master.	2009-11-10 15:12:01 -08:00
Jesse Gross	18fdbe16de	datapath: Allow TCP flags to be cleared. When querying flow stats allow the TCP flags to be reset. Since the datapath ORs together all flags that have previously been seen it is otherwise impossible to determine the set of flags from after a particular time.	2009-11-06 14:05:14 -08:00
Jesse Gross	6d91c2fb6c	datapath: Allow TCP flags to be cleared. When querying flow stats allow the TCP flags to be reset. Since the datapath ORs together all flags that have previously been seen it is otherwise impossible to determine the set of flags from after a particular time.	2009-11-02 16:12:21 -08:00
Ben Pfaff	d6fbec6de0	Spell verb form of "set up" correctly throughout the tree.	2009-10-26 14:41:32 -07:00
Ben Pfaff	3f355f47f8	Merge "citrix" into "master". This merge took a little bit of care due to two issues: - Crossport of "interface-reconfigure" fixes from master back to citrix that had happened and needed to be canceled out of the merge. - New script "refresh-xs-network-uuids" added on citrix branch that needed to be moved from /root/vswitch/scripts to /usr/share/vswitch/scripts.	2009-10-22 17:43:28 -07:00
Ben Pfaff	5562d3ccd2	datapath: Ignore return value from rtnl_notify(). In Linux 2.6.30, the rtnl_notify() return type was changed from int to void along with the following commit message: This patch also modifies the rtnetlink code to ignore the return value of rtnl_notify() in all callers. The function rtnl_notify() (before this patch) returned the error of the unicast notification which makes rtnl_set_sk_err() reports errors to all listeners. This is not of any help since the origin of the change (the socket that requested the echoing) notices the ENOBUFS error if the notification fails and should resync itself. Thus there's no point in checking the return value, even in older versions of the kernel, and so this commit changes our code to ignore it, even on older kernel versions. We also update the rtnl_notify() wrapper macros to make the return type void on older kernel versions. This has not been tested, just built. Thanks to Mikio for spurring me to try building with Linux 2.6.29 and 2.6.30.	2009-10-15 10:24:36 -07:00
Ben Pfaff	1b378b99f6	datapath: Fix warning on 64-bit builds.	2009-10-15 10:24:36 -07:00
Ben Pfaff	7c40efc9d3	datapath: Factor out code for getting and setting listen mask. This fixes GCC warnings on 64-bit architectures caused by storing an "int" in the "void *" f->private_data field.	2009-10-15 10:24:36 -07:00
Jean Tourrilhes	a4fbb689b0	datapath: Fix validation of ODPAT_SET_VLAN_PCP actions. The VLAN PCP mask is in the rightmost bits of the vlan_pcp member but we were checking for it in its position in the VLAN tag field instead. Slightly modified from Jean's original patch by adding and using the VLAN_PCP_SHIFT macro.	2009-10-08 10:41:48 -07:00
Ben Pfaff	22d24ebf66	datapath: Fix mutual exclusion with bridge on Linux 2.6.27+. Linux 2.6.27 introduces a new mechanism for sharing STP packets among kernel modules, which means that the code in datapath.c to avoid loading when the Linux bridging module is also loaded has false positives. So fall back on these newer kernels to a less reliable way of avoiding the bridge module, but one that does not have false positives. CC: Jean Tourrilhes <jt@hpl.hp.com>	2009-09-15 15:51:01 -07:00
Ben Pfaff	cb5087cadd	datapath: Fix WARN_ON sending GSO packets to userspace in Linux 2.6.22+. Until now, when dp_output_control() queued a GSO packet to userspace, it would first compute the checksum for the whole GSO packet, then break the packet into segments. However this had two drawbacks: 1. The checksum had to be recomputed for each segment, wasting time. 2. Linux 2.6.22 and later would emit a warning in skb_gso_segment() because the checksum was precomputed. This commit changes dp_output_control() to instead break the packet into segments, then compute the checksum across each of the segments individually. This fixes both drawbacks. This commit has seen light testing on Xen's 2.6.27. It has been build tested on a few different kernel versions.	2009-09-14 12:20:00 -07:00
Ben Pfaff	0d3b8a34d6	datapath: Fix comments.	2009-09-14 12:20:00 -07:00
Justin Pettit	8398cf7efc	Merge commit 'origin/citrix'	2009-09-04 22:23:29 -07:00

87 Commits