In some places we access the array of datapath ports without
RCU protection. This introduces a new function to check that, in
these cases, the dp mutex is held for protection.
Found with sparse.
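A minimal sketch of the kind of check this adds (the helper and
field names here are illustrative, not the actual ones in the tree):

    /* Illustrative: return a port while asserting that the dp mutex,
     * rather than RCU, is what protects the access. */
    static struct dp_port *get_port_protected(struct datapath *dp, int port_no)
    {
            WARN_ON_ONCE(!mutex_is_locked(&dp->mutex));
            return dp->ports[port_no];
    }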
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
There are several places where the flow table is accessed
without any kind of RCU protection. This is fine because the dp
mutex is held, so this adds checks for that condition.
Found with sparse.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Additional configuration is passed down to the kernel in the "config"
array of an odp_port when a vport is created. This information is not
returned when a vport is queried, though. It is useful for debugging,
since it can be used to distinguish ports based on additional data,
such as the peer in tunnels. In a forthcoming patch, it will be
essential to distinguish between plain GRE and GRE over IPsec.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
The n_buckets argument to tbl_create() can be zero, but the comment didn't
mention that. However, there's no reason that the callers can't just pass
in a correct size, so this commit changes them to do that.
Also, TBL_L1_SIZE was conceptually wrong as the minimum size: the minimum
size is one L2 page, i.e. TBL_L2_SIZE. But TBL_MIN_BUCKETS seems like a
better all-around way to indicate the minimum size, so this commit also
introduces that macro and uses it.
Jesse Gross pointed out inconsistencies in this area.
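A sketch of the macro and a call site, following the description
above (the exact definition is assumed):

    /* Assumed from the description: the minimum is one L2 page of buckets. */
    #define TBL_MIN_BUCKETS TBL_L2_SIZE

    table = tbl_create(TBL_MIN_BUCKETS);    /* callers now pass a real size */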
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
The sparse checker was complaining about incorrect address spaces (e.g.
__user versus non-__user pointers). I looked at each of them, checked
that the code looked correct to me, and added the appropriate __force
annotations to casts.
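For reference, the annotation idiom looks like this (an assumed
example rather than a specific line from the patch):

    /* __force tells sparse that this address-space conversion is
     * intentional and has been verified by hand. */
    void __user *uptr = (__force void __user *)kptr;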
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
This is a small optimization for the case where a new flow is being added
to the flow table.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
The functions to get and set the checksum pointers consistently across
different kernel versions had different interpretations of what the
csum_offset pointer was relative to, which is confusing, to say the least.
This makes the meaning the same as skb->csum_offset in modern kernels
and updates the caller. For a given function the results were consistent
across kernel versions, and the callers knew what the meaning should be, so
this doesn't actually fix any bugs.
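As an illustration of the convention this settles on: in modern kernels
skb->csum_offset is relative to skb->csum_start, so the checksum field
can be located as in this assumed sketch:

    /* Assumed sketch: csum_start is relative to skb->head, and
     * csum_offset is relative to csum_start. */
    static inline __sum16 *csum_field(struct sk_buff *skb)
    {
            return (__sum16 *)(skb->head + skb->csum_start + skb->csum_offset);
    }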
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
If the allocation of percpu stats fails when creating a new
datapath, we currently don't return the correct error code. Since
we don't explicitly set it when the allocation fails, it will keep
the value from the previous call. This means we will return success
when the creation actually failed.
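The shape of the fix is the usual one of setting the error code before
the allocation that can fail (names here are approximate):

    /* Approximate: ensure err holds -ENOMEM on failure rather than a
     * stale value from an earlier call. */
    err = -ENOMEM;
    dp->stats_percpu = alloc_percpu(struct dp_stats_percpu);
    if (!dp->stats_percpu)
            goto err_destroy_table;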
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
When reading actions without rcu_read_lock held, we need to hold the
datapath lock. This uses lockdep to check that we do.
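A sketch of the idiom (structure and field names assumed):

    /* Sketch: rcu_dereference_protected() lets lockdep verify that the
     * stated lock is actually held when RCU read-side protection isn't. */
    acts = rcu_dereference_protected(flow->sf_acts,
                                     lockdep_is_held(&dp->mutex));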
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
When accessing the flow table without holding rcu_read_lock
we need to hold the lock on the datapath. This enables lockdep
to validate that this is the case.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
When accessing the array of DPs, we need to hold either rcu_read_lock
or dp_mutex. This enables lockdep to validate those conditions.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
If the RTNL lock is used to protect updates to RCU data structures,
then it isn't necessary to use rcu_dereference() to access them while
RTNL is held. This adds rtnl_dereference() to access these pointers,
which has several benefits: it documents the locking expectations;
it checks that RTNL actually is held when run with lockdep; and it
keeps sparse from complaining about directly accessing RCU pointers.
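On kernels that lack it, rtnl_dereference() can be supplied in compat
code roughly as upstream defines it:

    #ifndef rtnl_dereference
    #define rtnl_dereference(p) \
            rcu_dereference_protected(p, lockdep_rtnl_is_held())
    #endif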
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
We have generally been using the byte-order-specific data types
(i.e. __be32 instead of u32) in most places. This corrects a
declaration and adds a few needed casts.
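For example, sparse flags a plain assignment between the two types,
which keeps byte-order conversions explicit:

    __be32 addr = iph->daddr;       /* network byte order; sparse-checked */
    u32 host_addr = ntohl(addr);    /* explicit conversion, not a bare cast */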
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Sparse can warn about incorrect usage of RCU via direct access to
pointers when used in conjunction with __rcu and CONFIG_SPARSE_RCU_POINTER.
This adds the necessary annotations.
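The annotation and the accessors it enforces look like this
(illustrative member, not the exact one in the tree):

    struct tbl __rcu *table;                    /* illustrative declaration */

    rcu_assign_pointer(dp->table, new_table);   /* writer */
    t = rcu_dereference(dp->table);             /* RCU read-side */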
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
We have generally been using the __user annotation, but there were
a few places where it was missing or a cast was needed.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
As David Miller pointed out on netdev today, IS_ERR has a built-in
'unlikely', so there's no point in adding one of our own.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
If we detect a packet that is looping we kill the flow but then
don't do anything with the packet that caused the problem in the
first place, so this frees the packet. This isn't a very serious
leak because we try to shut off the flow that led to the loop
as early as possible. Once this happens, packets will no longer
hit the loop detector and will be freed just as any other packet
that should be dropped.
It also fixes an issue where the offset to the stats counter is
uninitialized after a loop is detected.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
It's possible for skb_gso_segment() to return NULL, but only if the
hardware supports the correct form of segmentation offload and just
wants software to verify the offload parameters. However, since we're
not hardware and don't support any kind of segmentation offload natively,
we can never get into this situation. Therefore drop the check and
comment.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
We have a need to identify tunnels with keys longer than 32 bits. This
commit adds basic datapath and OpenFlow support for such keys. It doesn't
actually add any tunnel protocols that support 64-bit keys, so this is not
very useful yet.
The 'arg' member of struct odp_msg had to be expanded to 64 bits also,
because it sometimes contains a tunnel ID. This member also contains the
argument passed to ODPAT_CONTROLLER, so I expanded that action's argument
to 64 bits also so that it can use the full width of the expanded 'arg'.
Userspace doesn't take advantage of the new space though (it was only
using 16 bits anyhow).
This commit has been tested only to the extent that it doesn't disrupt
basic Open vSwitch operation. I have not tested it with tunnel traffic.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Feature #3976.
In the medium term, we plan to migrate the datapath to use Netlink as its
communication channel. In the short term, we need to be able to have
actions with 64-bit arguments but "struct odp_action" only has room for
48 bits. So this patch shifts to variable-length arguments using Netlink
attributes, which starts in on the Netlink transition and makes 64-bit
arguments possible at the same time.
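As a sketch, an action with a 64-bit argument can then be emitted as an
ordinary Netlink attribute (the attribute name is assumed, and
nla_put_u64() is used as it existed on kernels of this vintage):

    /* Assumed sketch: the attribute type identifies the action and the
     * payload carries its argument, so 64 bits fit naturally. */
    if (nla_put_u64(skb, ODPAT_SET_TUNNEL, tun_id))
            goto nla_put_failure;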
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
There have been two ops to support async access to the datapath
character device for a long time, but they have never been implemented.
Drop the commented-out code.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Currently we get a pointer to the DP in openvswitch_read() and
openvswitch_poll() and use it without any synchronization. This means
that the DP could disappear from underneath us while we are using it.
Currently, this isn't a problem because userspace is single-threaded, but
it's better for the locking to be correct.
With this change we hold the mutex while doing a blocking wait, which
means that no changes can be made, including adding/removing flows. It's
possible to make this finer-grained, but for the time being that isn't done,
since current userspace doesn't care.
Found with lockdep.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
We currently drop RTNL before adding a new datapath to sysfs but
then access the dp data structures. This moves the call to
dp_sysfs_add_dp() before we drop the locks to prevent a potential
race. All other calls to sysfs functions already hold RTNL.
Found with lockdep.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Mark functions and global variables used only in a single file as
static.
Found with sparse.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Checksum offloading has changed quite a bit across different kernel
and Xen versions. Since it is part of the skb data structure, it is
unfortunately difficult to separate out into compatibility code.
This consolidates all of the checksum code in one place, which makes
it easier to read and remove as we prepare for upstreaming. On newer
kernels it also puts everything in inline functions, eliminating the
need to run through the compat code or make extra function calls.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
These steps always happen in lockstep, so we might as well combine them.
This expands the region over which the vport_lock is held. I didn't
carefully verify that this was OK.
This also eliminates the synchronize_rcu() call from destruction of tunnel
vports, since they didn't appear to me to need it.
It should be possible to eliminate the synchronize_rcu() from the netdev,
patch, and internal_dev vports, but this commit does not do that.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
After the previous commit, which changed the datapath to always create and
attach a vport at the same time, and to always detach and delete a vport
at the same time, there is no longer any real distinction between a dp_port
and a vport. This commit, therefore, merges the two together to simplify
code. It might even improve performance, although I have not checked.
I wasn't sure at first whether the merged structure should be "struct
dp_port" or "struct vport". I went with the latter since the "v" prefix
sounds cool.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
For some time now, Open vSwitch datapaths have internally made a
distinction between adding a vport and attaching it to a datapath. Adding
a vport just means to create it, as an entity detached from any datapath.
Attaching it gives it a port number and a datapath. Similarly, a vport
could be detached and deleted separately.
After some study, I think I understand why this distinction exists. It is
because ovs-vswitchd tries to open all the datapath ports before it tries
to create them. However, changing it to create them before it tries to
open them is not difficult, so this commit does this.
The bulk of this commit, however, changes the datapath interface to one
that always creates a vport and attaches it to a datapath in a single step,
and similarly detaches a vport and deletes it in a single step.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Upcoming commits will keep needing to pass more information to the vport
'create' member function. It's annoying to have to modify a dozen pieces
of code every time just to do this, so this commit encapsulates all of the
parameters in a new struct and passes that instead.
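The struct looks something like this (member list assumed for
illustration):

    /* Assumed shape of the consolidated parameter block. */
    struct vport_parms {
            const char *name;
            enum vport_type type;
            const void *config;     /* type-specific configuration */
            struct datapath *dp;    /* datapath to attach to */
            u16 port_no;
    };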
Acked-by: Jesse Gross <jesse@nicira.com>
Variables which are changed only infrequently should be annotated
with __read_mostly, which will group them together in a special
linker section. This prevents them from sharing cache lines with
data on the hot path.
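For example:

    /* Written once at module init, read on every packet: a good
     * candidate for __read_mostly. */
    static struct kmem_cache *flow_cache __read_mostly;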
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Our normal loop detection requires disabling preemption while
packet processing takes place. On RT kernels this isn't acceptable
and interacts badly with spinlocks, so we can't use it. This
takes advantage of some extra space that is added to struct
task_struct on RT kernels (and the knowledge that we will always
have a valid task_struct) to store the loop counter for a given
thread. Since we can't make these assumptions on non-RT kernels,
we continue to use the previous method of loop detection there.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
There's no need to have a mask in this action, because both parts of the
TCI are part of the flow structure.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Commit a01ef04ce "datapath: Drop padding from struct odp_flow_key" removed
the "reserved" member from struct odp_flow_key but overlooked uses of that
member from 64-bit compatibility code. This commit fixes up the problem.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
This commit fixes a problem where a datapath flow's "used" stats can
report an incorrect value, which may result in incorrect flow
expirations. This happens when a kernel jiffies rollover occurs between
the last time a flow is hit and the time its stats are queried.
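For reference, jiffies arithmetic is only safe across a rollover when
done as an unsigned difference or via the time_after() family, roughly
(field name assumed):

    /* Illustrative: the unsigned subtraction wraps correctly across a
     * jiffies rollover; comparing or reporting raw values does not. */
    unsigned long idle = (unsigned long)jiffies - flow->used;
    unsigned int idle_ms = jiffies_to_msecs(idle);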
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Hao Zheng <hzheng@nicira.com>
These functions run 99% of the time in atomic context and the benefit of
passing along the 'gfp' argument for the other 1% doesn't seem to outweigh
the cost.
Suggested-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Breaking this out as a separate commit should make it easier to see what
needs to change later, if we need to reintroduce padding at some point.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
This allows eliminating padding from odp_flow_key, although actually doing
that is postponed until the next commit.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
The "port group" concept seems like a good one, but it has not been
used very much in userspace so far, so before we commit ourselves to
a frozen API that we must maintain forever, remove it. We can always
add it back in later as a new kind of vport.
Signed-off-by: Ben Pfaff <blp@nicira.com>
An upcoming commit will use some workqueue functions that weren't
available on earlier kernels, so this backports those functions.
The backporting uses timers instead of delayed work queues because
the earlier versions of work queues have some unsafe corner cases.
In addition, this removes some work queue backporting code that is
no longer used because it is potentially unsafe.
Note that this commit changes the behavior of work queues: normally
they run in process context but the backported version runs in
softirq context.
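A sketch of the timer-based approach (all names invented for
illustration; the timer is assumed to be set up elsewhere with
setup_timer()):

    /* Invented names, sketch only.  A timer stands in for a delayed work
     * item, so the handler runs in softirq rather than process context. */
    struct compat_dwork {
            struct timer_list timer;
            void (*func)(struct compat_dwork *);
    };

    static void compat_dwork_fire(unsigned long data)   /* timer callback */
    {
            struct compat_dwork *dw = (struct compat_dwork *)data;
            dw->func(dw);
    }

    static void compat_schedule_dwork(struct compat_dwork *dw,
                                      unsigned long delay)
    {
            mod_timer(&dw->timer, jiffies + delay);
    }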
Signed-off-by: Jesse Gross <jesse@nicira.com>
Reviewed-by: Ben Pfaff <blp@nicira.com>
is_frag is only used for communication between two functions, which
means that it doesn't really need to be in the SKB CB. This wouldn't
necessarily be a problem except that there are also a number of other
paths that lead to it being uninitialized. This isn't a problem
now, but uninitialized memory seems dangerous and there isn't much
upside.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Reviewed-by: Ben Pfaff <blp@nicira.com>
An upcoming commit will add support for supplying cached flows for
packets entering the datapath. This adds the code in the datapath
itself to recognize these cached flows and use them instead of
extracting the flow fields and doing a lookup.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Reviewed-by: Ben Pfaff <blp@nicira.com>
Currently flows are only used within the confines of one
rcu_read_lock()/rcu_read_unlock() session. However, with the
addition of header caching we will need to hold references to flows
for longer periods of time. This adds support for that by adding
refcounts to flows. RCU is still used for normal packet handling
to avoid a performance impact from constantly updating the refcount.
However, instead of directly freeing the flow after a grace period,
we simply decrement the refcount.
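The resulting pattern is roughly (function names assumed):

    /* Assumed sketch: the RCU callback only drops a reference; the flow
     * is actually freed when the last holder releases it. */
    static void flow_free_rcu(struct rcu_head *rcu)
    {
            struct sw_flow *flow = container_of(rcu, struct sw_flow, rcu);
            flow_put(flow);
    }

    void flow_deferred_free(struct sw_flow *flow)
    {
            call_rcu(&flow->rcu, flow_free_rcu);
    }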
Signed-off-by: Jesse Gross <jesse@nicira.com>
Reviewed-by: Ben Pfaff <blp@nicira.com>
As the process to allocate a flow becomes more involved, it becomes
more cumbersome for the code to be mixed in with the general datapath
code, so split it out into a new function.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Reviewed-by: Ben Pfaff <blp@nicira.com>
Nothing uses it anymore and it causes problems when backported on
some distributions. Kernels we support have net_random(), which
is the same thing, so there is no reason to have an entire copy of
the random number generator in our source tree.
Reported-by: Alexey I. Froloff <raorn@altlinux.org>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
flow_actions_alloc() returns an error code in the form of a pointer,
but we checked that the pointer was not NULL, which is always true.
This caused oopses on allocation errors, when we would write through
an invalid pointer.
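The corrected check, sketched (variable names approximate):

    /* flow_actions_alloc() signals failure with ERR_PTR(), so a NULL
     * check can never fire; IS_ERR() is the right test. */
    acts = flow_actions_alloc(n_actions);
    if (IS_ERR(acts)) {
            error = PTR_ERR(acts);
            goto error;
    }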
NIC-234
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Simon Horman <horms@verge.net.au>
[Jesse: Added missing pr_fmt in vport-gre.c and dp_sysfs_dp.c]
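[For reference, the standard pr_fmt() idiom, defined before any
#include so that pr_err() and friends pick it up:

    #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
]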
Signed-off-by: Jesse Gross <jesse@nicira.com>
Until now flow_extract() has simply returned a bogus flow when memory
allocation errors occurred. This fixes the problem by propagating the
error to the caller.
Signed-off-by: Ben Pfaff <blp@nicira.com>
flow_extract() can fail due to memory allocation errors in pskb_may_pull().
Currently it doesn't return those properly, instead just reporting a bogus
flow to the caller. But its return value is currently in use for reporting
whether the packet was an IPv4 fragment. This commit switches to reporting
that in the skb itself so that the return value can be reused to report
errors.
Signed-off-by: Ben Pfaff <blp@nicira.com>
We enforce mutual exclusion when updating statistics by disabling
bottom halves and only writing to per-CPU state. However, reading
requires looking at the statistics for foreign CPUs, which could be
in the process of updating them since there isn't a lock. This means
we could get garbage for 64-bit values on 32-bit machines, or
byte counts that don't correspond to packet counts, etc.
This commit introduces a sequence lock for statistics values to avoid
this problem. Getting a write lock is very cheap: it only requires
incrementing a counter plus a memory barrier (which is compiled away
on x86) to acquire or release the lock, and it will never block. On
read we spin until the sequence number hasn't changed in the middle
of the operation, indicating that we have a consistent set of
values.
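The resulting pattern is essentially the kernel seqcount idiom
(member names assumed):

    /* Writer: cheap and never blocks. */
    write_seqcount_begin(&stats->seqlock);
    stats->n_packets++;
    stats->n_bytes += skb->len;
    write_seqcount_end(&stats->seqlock);

    /* Reader: retry until the counter is unchanged across the reads. */
    do {
            start = read_seqcount_begin(&stats->seqlock);
            packets = stats->n_packets;
            bytes = stats->n_bytes;
    } while (read_seqcount_retry(&stats->seqlock, start));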
Signed-off-by: Jesse Gross <jesse@nicira.com>