openvswitch

mirror of https://github.com/openvswitch/ovs synced 2025-10-15 14:17:18 +00:00

Author	SHA1	Message	Date
Ben Pfaff	b0c32774c9	datapath: Improve comments.	2009-08-18 12:36:46 -07:00
Ben Pfaff	0515ceb3e8	datapath: Update sysfs links when network devices are renamed. We create symlinks from /sys/class/net/<bridgename>/brif/<devname> to /sys/class/net/<devname>/brport, but until now we have never updated the links when network devices are renamed. This commit fixes this problem. (Only the <devname> in /sys/class/net/<bridgename>/brif/<devname> needs to be updated. Symlinks within sysfs have stable targets; that is, no matter how the object that a sysfs symlink points to moves around, the link is still maintained correctly.)	2009-08-06 16:57:06 -07:00
Ben Pfaff	58c342f617	datapath: Fix OOPS when dp_sysfs_add_if() fails. Until now, when dp_sysfs_add_if() failed, the caller ignored the failure. This is a minor problem, because everything else should continue working, without sysfs entries for the interface, in theory anyhow. In actual practice, the error exit path of dp_sysfs_add_if() does a kobject_put(), and that kobject_put() calls release_nbp(), so that the new port gets freed. The next reference to the new port (usually in an ovs-vswitchd call to the ODP_PORT_LIST ioctl) will then use the freed data and probably OOPS. The fix is to make the datapath code, as opposed to the sysfs code, responsible for creating and destroying the net_bridge_port kobject. The dp_sysfs_{add,del}_if() functions then just attach and detach the kobject to sysfs and their cleanup routines no longer need to destroy the kobject and indeed we don't care whether dp_sysfs_add_if() really succeeds. This commit also makes the same transformation to the datapath's ifobj, for consistency. It is easy to trigger the OOPS fixed by this commit by adding a network device A to a datapath, then renaming network device A to B, then renaming network device C to A, then adding A to the datapath. The last attempt to add A will fail because a file named /sys/class/net/<datapath>/brif/A already exists from the time that C was added to the datapath under the name A. This commit also adds some compatibility infrastructure, because it moves code out of #ifdef SUPPORT_SYSFS and it otherwise wouldn't build.	2009-08-06 16:57:06 -07:00
Ben Pfaff	2ba9026e2f	datapath: Rename brc_sysfs_* to dp_sysfs_*. These files and names are now part of the datapath, not brcompat, so name them appropriately so as not to confuse anyone.	2009-08-06 16:57:06 -07:00
Ben Pfaff	2e7dd8eca8	datapath: Move sysfs support from brcompat_mod into openvswitch_mod. In the past problems have arisen due to the different ways that datapaths are created and destroyed in the three different cases: 1. sysfs supported, brcompat_mod loaded. 2. sysfs supported, brcompat_mod not loaded. 3. sysfs not supported. The brcompat_mod loaded versus not loaded distinction is the stickiest because we have to do all the calls into brcompat_mod through hook functions, which in turn causes pressure to keep the number of hook functions small and well-defined, which makes it really difficult to put the hook call points at exactly the right place. Witness, for example, this piece of code in datapath.c: int dp_del_port(struct net_bridge_port *p) { ASSERT_RTNL(); #ifdef SUPPORT_SYSFS if (p->port_no != ODPP_LOCAL && dp_del_if_hook) sysfs_remove_link(&p->dp->ifobj, p->dev->name); #endif The code inside the #ifdef is logically part of the brcompat_mod sysfs support, but the author of this code (quite reasonably) didn't want to add a hook function call there. After all, what would you call the hook function? There's no obvious name from the dp_del_port() caller's perspective. All this argues that sysfs support should be in openvswitch_mod itself, since it has to be tightly integrated, not bolted on. So this commit moves it there. Now, this is not to say that openvswitch_mod should actually be implementing bridge-compatible sysfs. In the future, it probably should not be; rather, it should implement something appropriate for Open vSwitch datapaths instead. But right now we have bridge-compatible sysfs, and so that's what this commit moves.	2009-08-06 16:57:06 -07:00
Justin Pettit	1dcf111b1c	datapath: Support jumbo frames in the datapath device. The datapath has no problems switching jumbo frames (frames with a payload greater than 1500 bytes), but it has not supported sending and receiving them to the device itself. With this commit, the MTU can be set as large as the minimum MTU size of the devices that are directly attached, or 1500 bytes if there are none. This mimics the behavior of the Linux bridge. Feature #1736	2009-08-03 13:29:22 -07:00
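The MTU rule described in 1dcf111b1c can be sketched in plain userspace C. This is only an illustration of the stated rule (minimum MTU of the attached devices, or 1500 if there are none), not the datapath's actual code; `dp_max_mtu`, `DEFAULT_MTU`, and the array-based interface are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

#define DEFAULT_MTU 1500  /* fallback named in the commit message */

/* Hypothetical helper: returns the largest MTU the datapath device may be
 * set to, i.e. the minimum MTU among the attached devices, or DEFAULT_MTU
 * when no devices are attached. */
int dp_max_mtu(const int *port_mtus, size_t n_ports)
{
    size_t i;
    int mtu;

    assert(port_mtus != NULL || n_ports == 0);

    if (n_ports == 0)
        return DEFAULT_MTU;

    mtu = port_mtus[0];
    for (i = 1; i < n_ports; i++) {
        if (port_mtus[i] < mtu)
            mtu = port_mtus[i];
    }
    return mtu;
}
```

With attached devices at 9000, 1500, and 4000 bytes the limit is 1500; with two 9000-byte devices it is 9000.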
Ben Pfaff	3b01baa397	Merge citrix branch into master.	2009-07-16 11:54:37 -07:00
Ben Pfaff	923229363a	datapath: Don't orphan packets in dp_dev transmit path. Before commit `72ca14c1` "datapath: Fix race against workqueue in dp_dev and simplify code," the dp_dev network device had a device queue, and we would orphan packets before sticking them on the queue. This screwed up socket accounting a bit, but the effect was limited to the device queue length. Now, after that commit, the dp_dev device has no device queue, but it still orphans packets. This screws up socket accounting a lot, because the effect is now unlimited, since there is no queue to limit it. The solution is to not orphan packets at all. There is little need for it now since packet transmission now happens immediately, not in a workqueue whose execution may be delayed. This should fix bug #1519, which tests "netperf -t UDP_STREAM" performance, finding that an unrealistically high number of UDP packets could be sent but that none at all were received. The send rate is due to the orphaning, the receive rate presumably because at least one out of approx. 65535/1500 = 44 fragments per full packet were dropped in each case.	2009-07-13 15:50:32 -07:00
Ben Pfaff	828bc1f072	datapath: Fix race in datapath creation. Before we create the local port, we should allocate and assign the table. Otherwise packets sent on the local port before we do so will cause an OOPS. This is a theoretical race that has not been observed in practice.	2009-07-08 14:13:15 -07:00
Ben Pfaff	72ca14c154	datapath: Fix race against workqueue in dp_dev and simplify code. The dp_dev_destroy() function failed to cancel the xmit_queue work, which allowed it to run after the device had been destroyed, accessing freed memory. However, simply canceling the work with cancel_work_sync() would be insufficient, since other packets could get queued while the work function was running. Stopping the queue with netif_tx_disable() doesn't help, because the final action in dp_dev_do_xmit() is to re-enable the TX queue. This issue led me to re-examine why the dp_dev needs to use a work_struct at all. This was implemented in commit `71f13ed0b` "Send of0 packets from workqueue, to avoid recursive locking of ofN device" due to a complaint from lockdep about recursive locking. However, there's no actual reason that we need any locking around dp_dev_xmit(). Until now, it has accepted the standard locking provided by the network stack. But looking at the other software devices (veth, loopback), those use NETIF_F_LLTX, which disables this locking, and presumably do so for this very reason. In fact, the lwn article at http://lwn.net/Articles/121566/ hints that NETIF_F_LLTX, which is otherwise discouraged in the kernel, is acceptable for "certain types of software device." So this commit switches to using NETIF_F_LLTX for dp_dev and gets rid of the work_struct. In the process, I noticed that veth and loopback also take advantage of a network device destruction "hook" using the net_device "destructor" member. Using this we can automatically get called on network device destruction at the point where rtnl_unlock() is called. This allows us to stop stringing the dp_devs that are being destroyed onto a list so that we can free them, and thus simplifies the code along all the paths that call dp_dev_destroy(). 
This commit gets rid of a call to synchronize_rcu() (disguised as a call to synchronize_net(), which is a macro that expands to synchronize_rcu()), so it probably speeds up deleting ports, too.	2009-07-08 14:13:15 -07:00
Ben Pfaff	6fba0d0b82	datapath: Fix use-after-free error in datapath destruction. When we create a datapath we do this: 1. Create local port. 2. Call add_dp hook. 3. Allow userspace to add more ports. When we deleted a datapath we were doing this: 1. Call del_dp hook 2. Delete all the ports. Unfortunately step 1 destroys dp->ifobj, then dp_del_port on any port other than the local port in step 2 tries to reference dp->ifobj through a call to sysfs_remove_link(). This commit fixes the problem by changing datapath deletion to mirror creation: 1. Delete all the ports but the local port. 2. Call dp_del hook. 3. Delete local port. Commit 010082639 "datapath: Add sysfs support for all (otherwise supported) Linux versions" makes this problem obvious on a 2.6.25+ kernel configured with slab debugging, because on such kernels the ifobj is a pointer to a slab object that is freed by the del_dp hook function (when brcompat_mod is loaded). This bug may be just as present on older kernels, but there the ifobj is part of struct datapath, not a pointer, and thus it is much harder to trigger. Bug #1465.	2009-07-08 14:13:15 -07:00
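The teardown ordering from 6fba0d0b82 can be modeled with a toy userspace sketch. Everything here is an assumption made for illustration: `struct toy_dp`, the `toy_*` functions, and `LOCAL_PORT` are hypothetical stand-ins, and the use-after-free is modeled as a failed `assert()` rather than a real freed kobject.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LOCAL_PORT 0   /* hypothetical stand-in for ODPP_LOCAL */
#define MAX_PORTS  8   /* hypothetical limit for the toy model */

struct toy_dp {
    bool ifobj_alive;              /* models the lifetime of dp->ifobj */
    bool port_present[MAX_PORTS];
};

/* Models the del_dp hook, which destroys dp->ifobj. */
void toy_del_dp_hook(struct toy_dp *dp)
{
    dp->ifobj_alive = false;
}

/* Models dp_del_port(): every port except the local port removes a sysfs
 * link that lives under dp->ifobj, so ifobj must still exist. */
void toy_del_port(struct toy_dp *dp, int port_no)
{
    assert(port_no >= 0 && port_no < MAX_PORTS);
    if (port_no != LOCAL_PORT)
        assert(dp->ifobj_alive);   /* a failure here is the use-after-free */
    dp->port_present[port_no] = false;
}

/* Deletion mirrors creation: other ports, then the hook, then the local port. */
void toy_destroy_dp(struct toy_dp *dp)
{
    int i;

    for (i = 0; i < MAX_PORTS; i++) {
        if (i != LOCAL_PORT && dp->port_present[i])
            toy_del_port(dp, i);
    }
    toy_del_dp_hook(dp);
    toy_del_port(dp, LOCAL_PORT);
}
```

Calling `toy_del_dp_hook()` before the loop reproduces the old ordering and trips the assertion for any non-local port.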
Ben Pfaff	334b374988	datapath: Remove redundant synchronize_rcu() call. There is no benefit to synchronizing twice, and it might cost us a lot of time.	2009-07-08 14:13:15 -07:00
Ben Pfaff	f4ba4c4f95	datapath: Change ODP_PORT_LIST semantics. Until now, ODP_PORT_LIST has reported the number of ports actually copied out. It's better for the caller, however, if it reports the number of ports that were available to be copied out.	2009-07-06 09:07:24 -07:00
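A rough userspace sketch of the semantics described in f4ba4c4f95: copy out as many ports as fit, but report how many were available. `struct port_info` and `list_ports` are hypothetical names and do not reflect the real ioctl interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the datapath's per-port record. */
struct port_info {
    int port_no;
};

/* Copies at most `capacity` entries from `ports` into `out`, but returns
 * the total number of ports available, not the number copied. */
size_t list_ports(const struct port_info *ports, size_t n_ports,
                  struct port_info *out, size_t capacity)
{
    size_t n_copy = n_ports < capacity ? n_ports : capacity;

    assert(ports != NULL || n_ports == 0);
    assert(out != NULL || capacity == 0);

    if (n_copy > 0)
        memcpy(out, ports, n_copy * sizeof *out);
    return n_ports;
}
```

A caller that gets back a count larger than its buffer knows the listing was truncated and can retry with a bigger buffer; under the old semantics it could not tell a full buffer from a truncated one.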
Ben Pfaff	e86c8696eb	datapath: Make openvswitch_ioctl() have a single point of exit. This makes it easier to insert debug printk() calls in a single place if necessary, and conforms at least as well with Linux kernel style.	2009-07-06 09:07:24 -07:00
Ben Pfaff	330a8abb28	datapath: Fix ODP_PORT_DEL handling of bad user memory read.	2009-07-06 09:07:24 -07:00
Ben Pfaff	1619019101	datapath: Style fix.	2009-07-06 09:07:24 -07:00
Ben Pfaff	f1aa2072c8	datapath: Get rid of query operations for single flows.	2009-07-06 09:07:24 -07:00
Ben Pfaff	9ee3ae3e0d	datapath: Make the datapath responsible for choosing port numbers. Soon we will allow for multiple datapath implementations. By allowing the datapath to choose the port numbers, we possibly simplify some datapath implementations, and the datapath's clients don't have to guess (or to check) what port numbers are free, so this seems like a better way to go.	2009-07-06 09:07:24 -07:00
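One way a datapath might pick a port number on the client's behalf is sketched below. The commit message does not say which policy the datapath uses; "lowest free number" is an assumption made purely for illustration, and `choose_port_no` and `MAX_PORTS` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PORTS 256   /* hypothetical limit for the sketch */

/* Returns the lowest port number not marked in use, or -1 if every port
 * number is taken. The "lowest free" policy is an assumption. */
int choose_port_no(const bool in_use[MAX_PORTS])
{
    int i;

    assert(in_use != NULL);

    for (i = 0; i < MAX_PORTS; i++) {
        if (!in_use[i])
            return i;
    }
    return -1;
}
```

Because the choice happens inside the datapath, a client never has to scan or guess for a free number before adding a port.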
Ben Pfaff	2f8c6cfb50	datapath: Remove unnecessary range check from put_actions(). This code checked that the number of actions in a flow query did not exceed the maximum number of actions that are allowed on a given flow. But this check is unnecessary, since the code will never copy out any more actions than actually exist in a flow. It would be a shame to refuse a flow query simply on the basis that the caller allocated more memory than necessary, so eliminate the check.	2009-07-06 09:07:24 -07:00
Ben Pfaff	7793a4aaeb	datapath: Fix use-after-free error in datapath destruction. When we create a datapath we do this: 1. Create local port. 2. Call add_dp hook. 3. Allow userspace to add more ports. When we deleted a datapath we were doing this: 1. Call del_dp hook 2. Delete all the ports. Unfortunately step 1 destroys dp->ifobj, then dp_del_port on any port other than the local port in step 2 tries to reference dp->ifobj through a call to sysfs_remove_link(). This commit fixes the problem by changing datapath deletion to mirror creation: 1. Delete all the ports but the local port. 2. Call dp_del hook. 3. Delete local port. Commit 010082639 "datapath: Add sysfs support for all (otherwise supported) Linux versions" makes this problem obvious on a 2.6.25+ kernel configured with slab debugging, because on such kernels the ifobj is a pointer to a slab object that is freed by the del_dp hook function (when brcompat_mod is loaded). This bug may be just as present on older kernels, but there the ifobj is part of struct datapath, not a pointer, and thus it is much harder to trigger. Bug #1465.	2009-06-26 14:15:46 -07:00
Ben Pfaff	312b427dab	datapath: Remove redundant synchronize_rcu() call. There is no benefit to synchronizing twice, and it might cost us a lot of time.	2009-06-26 12:20:19 -07:00
Ben Pfaff	cfe7c1f5e8	datapath: Ignore return value from rtnl_notify(). In Linux 2.6.30, the rtnl_notify() return type was changed from int to void along with the following commit message: This patch also modifies the rtnetlink code to ignore the return value of rtnl_notify() in all callers. The function rtnl_notify() (before this patch) returned the error of the unicast notification which makes rtnl_set_sk_err() reports errors to all listeners. This is not of any help since the origin of the change (the socket that requested the echoing) notices the ENOBUFS error if the notification fails and should resync itself. Thus there's no point in checking the return value, even in older versions of the kernel, and so this commit changes our code to ignore it, even on older kernel versions. We also update the rtnl_notify() wrapper macros to make the return type void on older kernel versions. This has not been tested, just built. Thanks to Mikio for spurring me to try building with Linux 2.6.29 and 2.6.30.	2009-06-24 14:58:57 -07:00
Ben Pfaff	34e63086ed	Merge changes from citrix branch into master.	2009-06-15 16:04:54 -07:00
Ben Pfaff	a14bc59fb8	Update primary code license to Apache 2.0.	2009-06-15 15:11:30 -07:00
Ben Pfaff	806e39cfdf	datapath: Add sysfs support for all (otherwise supported) Linux versions. This turned out to be less trouble than I expected. This builds successfully against 2.6.18 through 2.6.28. Justin has lightly tested it on a 2.6.27 kernel provided by Citrix.	2009-06-12 16:45:01 -07:00
Justin Pettit	b4ebfbeaf1	xenserver: xen 2.6.27 kernel doesn't need skb_checksum_setup defined. The latest XenServer release is based on 2.6.27. The datapath code defined "skb_checksum_setup", since it wasn't exported in their 2.6.18 kernels. This change causes it only to be built if the kernel is version 2.6.18.	2009-06-12 15:39:59 -07:00
Ben Pfaff	064af42167	Import from old repository commit 61ef2b42a9c4ba8e1600f15bb0236765edc2ad45.	2009-07-08 13:19:16 -07:00


427 Commits