mir/ovs - ovs - Mike's Git repositories

mirror of https://github.com/openvswitch/ovs synced 2025-08-31 06:15:47 +00:00

Author	SHA1	Message	Date
Joe Stringer	781fce1574	dpif: Support fetching flow mask via dpif_flow_get(). Change the interface to allow implementations to pass back a buffer, and allow callers to specify which of actions, mask, and stats they wish to receive. This will be used in the next commit. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2014-07-15 15:49:06 +12:00
Daniele Di Proietto	2240af2576	dpif-netdev: enumerate dpif belonging to the right class Since dpif_netdev_enumerate() is used for "netdev" and "dummy" class, it incorrectly lists dpif-netdevs as "dummy" and vice versa. This patch addresses the issue by changing the dpif-provider interface: a dpif_class parameter is passed to the 'enumerate' call to match the right class. Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Signed-off-by: Ben Pfaff <blp@nicira.com>	2014-06-12 16:54:12 -07:00
Ben Pfaff	ac64794acb	dpif: Refactor flow dumping interface to make better sense for batching. Commit `a6ce4b9d25` (ofproto-dpif-upcall: Avoid use-after-free in revalidate() corner case.) showed that it is somewhat tricky to correctly use the existing dpif flow dumping interface to obtain batches of flows. One has to be careful about calling dpif_flow_dump_next_may_destroy_keys() before going on to the next flow. A better interface is possible, one that is naturally oriented toward retrieving batches when that is a useful optimization. This commit replaces the dpif interface by such a design, and updates both the implementations and the callers to adopt it. This is a fairly large change, but I think that the code in ofproto-dpif-upcall is easier to understand after the change. Signed-off-by: Ben Pfaff <blp@nicira.com>	2014-05-20 11:37:02 -07:00
Alex Wang	1954e6bbcb	dpif: Change dpif API to allow multiple handler threads read upcall. This commit changes the API in 'dpif-provider.h' to allow multiple handler threads to call dpif_recv() simultaneously. Signed-off-by: Alex Wang <alexw@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2014-03-20 10:27:10 -07:00
Joe Stringer	bdeadfdd95	dpif: New function flow_dump_next_may_destroy_keys(). This new function allows callers to determine whether previously returned keys will be modified or reallocated on the next call to dpif_flow_dump_next(). This will be used in a future commit to allow batched flow deletion by revalidator threads. Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>	2014-02-27 14:39:21 -08:00
Joe Stringer	d2ad7ef178	dpif: Make dpif_flow_dump_next() thread-safe. This patch makes it the caller's responsibility to initialize a per-thread 'state' object and pass it down to the dpif_flow_dump_next() implementation. The implementation can expect to be called from multiple threads with the same 'iter' and different 'state' objects. When flow_dump_next() returns non-zero, the implementation must ensure that subsequent calls with the same arguments also return non-zero. Subsequent calls with the same 'iter' and different 'state' may return zero, but should make progress towards returning non-zero. Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>	2014-02-27 14:30:25 -08:00
Joe Stringer	e723fd32d5	dpif: Separate local and shared flow dump state. This patch separates the structures for thread-local flow dump state ("state") from the shared flow dump state ("iter") in dpif-linux and dpif-netdev. Future patches will make use of this to allow multiple threads to dump flows from the same flow dump operation. Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>	2014-02-27 14:27:32 -08:00
Ben Pfaff	9e5026938c	dpif: Remove unused 'get_max_ports' from provider interface. Nothing ever called this function. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>	2014-01-08 17:10:31 -08:00
Jarno Rajahalme	758c456df5	dpif: Use explicit packet metadata. This helps reduce confusion about when a flow is a flow and when it is just metadata. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2013-12-30 16:52:43 -08:00
Jarno Rajahalme	837a88dccb	Do not free uninitialized packets. Commit `da546e0` (dpif: Allow execute to modify the packet.) uninitializes the "dpif_upcall.packet" of "struct upcall" when dpif_recv() returns error. The packet ofpbuf is likely uninitialized in this case, hence calling ofpbuf_uninit() on it will likely cause a SEGFAULT. This commit fixes this bug by only uninitializing packet's ofpbuf on successfully received upcalls. A note warning about this is added on the comment of dpif_recv() in dpif.c and dpif-provider.h. Reported-by: Alex Wang <alexw@nicira.com> Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2013-12-17 15:54:30 -08:00
Jarno Rajahalme	da546e0764	dpif: Allow execute to modify the packet. Allowing the packet to be modified by execution allows less data copying for userspace action execution. Some users of the dpif_execute already expect that the packet may be modified. This patch makes this behavior uniform and makes the userspace datapath and the execution helpers modify the packet as it is being executed. Userspace action now steals the packet if given permission, as the packet is normally not needed after it. The only exception is the sample action, and this is accounted for by keeping track of any actions that could be following the userspace action. The packet in dpif_upcall is changed from a pointer to a struct, allowing the packet to be honest about its headroom. After this change the packet can safely be pushed on over the precarious 4 byte limit earlier allowed by the netlink data preceding the packet. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2013-12-16 08:14:52 -08:00
Alex Wang	1dd16b9a27	dpif: Change get_max_ports() to return uint32_t. The declaration of 'get_max_ports()' to return odp_port_t adds unwanted complexity to coding. This commit changes it back to return uint32_t type. Signed-off-by: Alex Wang <alexw@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>	2013-08-23 11:33:17 -07:00
Alex Wang	4e022ec09e	Create specific types for ofp and odp port Until now, datapath ports and openflow ports were both represented by unsigned integers of various sizes. With implicit conversions, etc., it is easy to mix them up and use one where the other is expected. This commit creates two typedefs, ofp_port_t and odp_port_t. Both of these two types are marked by "__attribute__((bitwise))" so that sparse can be used to detect any misuse. Signed-off-by: Alex Wang <alexw@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>	2013-06-20 10:42:37 -07:00
Andy Zhou	e6cc0babc2	ovs-dpctl: Add mega flow support Added support to allow mega flows to be specified and displayed. The ovs-dpctl tool is mainly used as a debugging tool. This patch also implements the low level user space routines to send and receive mega flow netlink messages. Those netlink support routines are required for forthcoming user space mega flow patches. Added a unit test to test parsing and display of mega flows. Ethan contributed the ovs-dpctl mega flow output function. Co-authored-by: Ethan Jackson <ethan@nicira.com> Signed-off-by: Ethan Jackson <ethan@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>	2013-06-20 10:33:51 -07:00
Ben Pfaff	cb22974d77	Replace most uses of assert by ovs_assert. This is a straight search-and-replace, except that I also removed #include <assert.h> from each file where there were no assert calls left. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Ethan Jackson <ethan@nicira.com>	2013-01-16 16:03:37 -08:00
Justin Pettit	0aeaabc8db	Add functions to determine how port should be opened based on type. Depending on the port and type of datapath, a port may need to be opened as a different type of device than it's configured. For example, an "internal" port on a "dummy" datapath should be opened as a "dummy" port. This commit adds the ability for a dpif to provide this information to a caller. It will be used in a future commit. Signed-off-by: Justin Pettit <jpettit@nicira.com>	2012-11-16 12:35:55 -08:00
Justin Pettit	4afba28d55	dpif: Add new dpif_port_exists() function. Provide the ability to determine whether a port exists in a datapath without having to deal with a "dpif_port" structure as with dpif_port_query_by_name(). A future patch will use this function. Signed-off-by: Justin Pettit <jpettit@nicira.com>	2012-11-01 22:54:27 -07:00
Justin Pettit	9b56fe137d	Always treat datapath ports as 32 bits. Most of the code referred to datapath ports as 32-bit values, but a few places still used 16-bit references. Signed-off-by: Justin Pettit <jpettit@nicira.com>	2012-11-01 22:54:27 -07:00
Jesse Gross	296e07ace0	flow: Extend struct flow to contain tunnel outer header. Soon the kernel will begin supplying the information about the outer IP header for tunneled packets and userspace will need to be able to track it as part of the flow. For the time being this is only used internally by OVS and not exposed outwards to OpenFlow. As a result, this threads the information throughout userspace but simply stores the existing tun_id in it. Signed-off-by: Jesse Gross <jesse@nicira.com>	2012-10-03 10:04:10 -07:00
Justin Pettit	232dfa4aa3	dpif: Allow the port number to be requested when adding an interface. The datapath allows requesting a specific port number for a port, but the dpif interface didn't expose it. This commit adds that support. Signed-off-by: Justin Pettit <jpettit@nicira.com>	2012-07-30 20:54:16 -07:00
Ben Pfaff	625b07205a	ofproto-dpif: Segregate CFM, LACP, and STP traffic into separate queues. Until now, packets for these special protocols have been mixed with general traffic in the kernel-to-userspace queues. This means that a big-enough storm of new flows in these queues can cause packets for these special protocols to be dropped at this interface, fooling userspace into believing that, say, no CFM packets have been received even though they are arriving at the expected rate. This commit moves special protocols to a dedicated kernel-to-userspace queue to avoid the problem. Bug #7550. Signed-off-by: Ben Pfaff <blp@nicira.com>	2012-05-09 12:58:55 -07:00
Raju Subramanian	e0edde6fee	Global replace of Nicira Networks. Replaced all instances of Nicira Networks(, Inc) to Nicira, Inc. Feature #10593 Signed-off-by: Raju Subramanian <rsubramanian@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>	2012-05-02 17:08:02 -07:00
Ben Pfaff	90a7c55e56	dpif: Make caller of dpif_recv() provide buffer space. This improves performance under heavy flow setup loads. Signed-off-by: Ben Pfaff <blp@nicira.com>	2012-04-18 20:28:51 -07:00
Ben Pfaff	b99d3ceeed	ofproto-dpif: Batch flow uninstallations due to expiration. Signed-off-by: Ben Pfaff <blp@nicira.com>	2012-04-18 20:28:12 -07:00
Ben Pfaff	89625d1efb	dpif: Change provider interface to consistently use operation structs. Until now, a "flow put" has represented its parameters in two different ways, depending on whether it was coming from dpif_flow_put() or from dpif_operate(), and similarly for an "execute" operation. This commit adopts the operation struct consistently within the dpif provider interface, which seems cleaner. This commit also factors out logging for flow puts and executes, which is useful in the following commit. This doesn't change the dpif client interface, since the two forms are more convenient for clients than always filling out an operation struct. Signed-off-by: Ben Pfaff <blp@nicira.com>	2012-01-16 13:37:27 -08:00
Ben Pfaff	c2b565b54e	dpif: Factor 'type' and 'error' out of individual dpif_op members. I'd like to change ->dpif_flow_put() and ->dpif_execute() in the dpif provider to take the structures of the same names as parameters, instead of passing them discrete parameters, because this seems like a more sensible way to do things internally than to have two different ways to pass the parameters. It might even simplify code slightly. But ->flow_put() and ->execute() wouldn't want the 'type' (because it's implied by the function being called) or 'error' (because it would be the same as the return value). Although of course they could just ignore those members, it seems slightly cleaner to omit them entirely, as this change allows. Signed-off-by: Ben Pfaff <blp@nicira.com>	2012-01-16 13:35:21 -08:00
Ben Pfaff	a12b3eadc6	dpif: Simplify the "listen mask" concept. At one point in the past, there were three separate queues between the kernel module and OVS userspace, each of which corresponded to a Netlink socket (or, before that, to a character device). It made sense to allow each of these to be enabled or disabled separately, hence the "listen mask" concept in the dpif layer. These days, the concept is much less clear-cut. Queuing is no longer on the basis of different classes of packets but instead striped across a collection of sockets based on input port. It doesn't really make sense to enable receiving packets on the basis of the kind of packet anymore. Accordingly, this commit simplifies the "listen_mask" to just a bool that either enables or disables receiving packets. It could be useful to enable or disable receiving packets on a per-vport basis, but the rest of the code isn't ready to make use of that so this commit doesn't generalize this much. Based on this discussion on ovs-dev: http://openvswitch.org/pipermail/dev/2011-October/012044.html Signed-off-by: Ben Pfaff <blp@nicira.com>	2012-01-12 17:09:22 -08:00
Pravin B Shelar	abff858b5a	datapath: Convert kernel priority actions into match/set. Following patch adds skb-priority to flow key. So userspace will know what was priority when packet arrived and we can remove the pop/reset priority action. It's no longer necessary to have a special action for pop that is based on the kernel remembering original skb->priority. Userspace can just emit a set priority action with the original value. Since the priority field is a match field with just a normal set action, we can convert it into the new model for actions that are based on matches. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Bug #7715	2011-11-01 10:13:16 -07:00
Ben Pfaff	7257b535ab	Implement new fragment handling policy. Until now, OVS has handled IP fragments more awkwardly than necessary. It has not been possible to match on L4 headers, even in fragments with offset 0 where they are actually present. This means that there was no way to implement ACLs that treat, say, different TCP ports differently, on fragmented traffic; instead, all decisions for fragment forwarding had to be made on the basis of L2 and L3 headers alone. This commit improves the situation significantly. It is still not possible to match on L4 headers in fragments with nonzero offset, because that information is simply not present in such fragments, but this commit adds the ability to match on L4 headers for fragments with zero offset. This means that it becomes possible to implement ACLs that drop such "first fragments" on the basis of L4 headers. In practice, that effectively blocks even fragmented traffic on an L4 basis, because the receiving IP stack cannot reassemble a full packet when the first fragment is missing. This commit works by adding a new "fragment type" to the kernel flow match and making it available through OpenFlow as a new NXM field named NXM_NX_IP_FRAG. Because OpenFlow 1.0 explicitly says that the L4 fields are always 0 for IP fragments, it adds a new OpenFlow fragment handling mode that fills in the L4 fields for "first fragments". It also enhances ovs-ofctl to allow users to configure this new fragment handling mode and to parse the new field. Signed-off-by: Ben Pfaff <blp@nicira.com> Bug #7557.	2011-10-21 15:07:36 -07:00
Ben Pfaff	6bc6002490	dpif: New function dpif_operate() and dpif-linux implementation. This will be used in an upcoming commit.	2011-10-14 14:08:44 -07:00
Ben Pfaff	98403001ec	datapath: Move Netlink PID for userspace actions from flows to actions. Commit `b063d9f06` "datapath: Use unicast Netlink sockets for upcalls" that switched from multicast to unicast Netlink for sending upcalls added a Netlink PID to each kernel flow, used by OVS_ACTION_ATTR_USERSPACE actions within the flow as target. This commit drops this per-flow PID in favor of a per-action PID, because that is more flexible. It does not yet make use of this additional flexibility, so behavior should not change. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Bug #7559.	2011-10-12 16:27:00 -07:00
Ben Pfaff	a8d9304d12	dpif: Avoid use of "struct ovs_dp_stats" in platform-independent modules. Over time we wish to reduce the number of datapath-protocol.h definitions used directly outside of Linux-specific code. This commit removes use of "struct ovs_dp_stats" from platform-independent code. Bug #7559.	2011-10-05 11:18:13 -07:00
Pravin Shelar	6ff686f2bc	sFlow: Genericize/simplify kernel sFlow implementation Following patch adds sampling action which takes probability and set of actions as arguments. When probability is hit, actions are executed for given packet. USERSPACE action's userdata (u64) is used to store struct user_action_cookie as cookie. CONTROLLER action is fixed accordingly. Now we can remove sFlow code from kernel and implement sFlow generically as SAMPLE action. sFlow is defined as SAMPLE Action with probability (sFlow sampling rate) and USERSPACE action as argument. USERSPACE action's data is used as cookie. sFlow uses this cookie to store output-port, number of output ports and vlan-id. sample-pool is calculated by using vport stats. Signed-off-by: Pravin Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2011-09-28 10:43:07 -07:00
Justin Pettit	df2c07f433	datapath: Use "OVS_" as opposed to "ODP_" for user<->kernel interactions. The prefix "ODP_" is not overly descriptive in the context of the larger Linux tree. This commit changes the prefix to "OVS_" for the userspace to kernel interactions. The userspace libraries still use "ODP_" in many of their interfaces since it is more descriptive in the OVS oeuvre. Feature #6904 Signed-off-by: Justin Pettit <jpettit@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2011-08-19 22:48:23 -07:00
Ben Pfaff	80e5eed9c2	datapath: Get packet metadata from userspace in odp_packet_cmd_execute(). Until now, the tun_id and in_port have been lost when a packet is sent from the kernel to userspace and then back to the kernel. I didn't think that this was a problem, but recent behavior made me look closer and see that it makes a difference if sFlow is turned on or if an ODP_ATTR_ACTION_CONTROLLER action is present. We could possibly kluge around those, but for future-proofing it seems better to pass the packet metadata from userspace to the kernel. That is what this commit does. This commit introduces a user-kernel protocol break. We could avoid that, if it is desirable, by making ODP_PACKET_ATTR_KEY optional for ODP_PACKET_CMD_EXECUTE commands. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2011-06-01 13:39:51 -07:00
Ben Pfaff	640e1b2077	dpif: Improve abstraction by making 'run' and 'wait' functions per-dpif. Until now, the dp_run() and dp_wait() functions had to be called at the top level of the program because they applied to every open dpif. By replacing them by functions that take a specific dpif as an argument, we can call them only from ofproto, which is currently the correct layer to deal with dpifs.	2011-05-11 12:26:07 -07:00
Ben Pfaff	d0c23a1a57	dpif: Use sset instead of svec in dpif interface.	2011-03-31 16:42:01 -07:00
Ben Pfaff	7aec165dbc	datapath: s/ODPAT_/ODP_ACTION_ATTR_/ to fit new naming scheme. Jesse suggested this naming scheme, so I'm adjusting existing names to fit it. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2011-01-28 15:34:40 -08:00
Ben Pfaff	3d8c95357f	dpif: Remove dpif_get_all_names(). None of the remaining dpif implementations have more than one name per dpif, so there's no need for this function anymore. Suggested-by: Jesse Gross <jesse@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2011-01-28 15:34:40 -08:00
Ben Pfaff	82272eded1	Eliminate ODPL_* from userspace-facing interface. Reviewed by Justin Pettit.	2011-01-27 21:08:41 -08:00
Ben Pfaff	693c4a0112	datapath: Eliminate 'flags' member from odp_flow. Nothing was productively using the 'flags' member of odp_flow, so this commit removes it. ODPFF_ZERO_TCP_FLAGS isn't used at all (as of the previous commit). ODPFF_EOF has been replaced by a special case of the 'key_len' member. This will go away, too, once AF_NETLINK starts being used. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2011-01-27 21:08:39 -08:00
Ben Pfaff	ba25b8f41f	dpif: Eliminate ODPPF_* constants from client-visible interface. Following this commit, the ODPPF_* constants are only used in Linux-specific parts of OVS userspace code. This allows the actual Linux datapath interface to evolve more freely. Reviewed by Justin Pettit.	2011-01-27 21:08:39 -08:00
Ben Pfaff	c97fb13280	dpif: Eliminate "struct odp_flow_stats" from client-visible interface. Following this commit, "struct odp_flow_stats" is only used in Linux-specific parts of OVS userspace code. This allows the actual Linux datapath interface to evolve more freely. Reviewed by Justin Pettit.	2011-01-27 21:08:38 -08:00
Ben Pfaff	feebdea2e5	dpif: Eliminate "struct odp_flow" from client-visible interface. Following this commit, "struct odp_flow" and related data structures are only used in Linux-specific parts of OVS userspace code. This allows the actual Linux datapath interface to evolve more freely. Reviewed by Justin Pettit.	2011-01-27 21:08:38 -08:00
Ben Pfaff	bc4a05c639	datapath: Change ODP_FLOW_GET to retrieve only a single flow at a time. This brings the code closer to what the Netlink interface will need to implement. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2011-01-27 21:08:38 -08:00
Ben Pfaff	996c1b3d7a	datapath: Drop port information from odp_stats. As with n_flows, n_ports was used regularly by userspace to determine how much memory to allocate when listing ports, but it is no longer needed for that. max_ports, on the other hand, is necessary but it is also a fixed value for the kernel datapath right now and if we expand it we can also come up with a way to report the expanded value. The remaining members of odp_stats are actually real statistics that I intend to keep. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2011-01-27 21:08:38 -08:00
Ben Pfaff	1ba530f4b2	datapath: Drop queue information from odp_stats. This queue information will be available through the kernel socket layer once we move over to Netlink socket as transports, so we might as well get rid of the redundancy. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2011-01-27 21:08:38 -08:00
Ben Pfaff	4c738a8da5	dpif: Eliminate "struct odp_port" from client-visible interface. Following this commit, "struct odp_port" is only used in Linux-specific parts of OVS userspace code. This allows the actual Linux datapath interface to evolve more freely. Reviewed by Justin Pettit.	2011-01-27 21:08:37 -08:00
Ben Pfaff	b0ec0f279e	datapath: Change listing ports to use an iterator concept. One of the goals for Open vSwitch is to decouple kernel and userspace software, so that either one can be upgraded or rolled back independent of the other. To do this in full generality, it must be possible to add new features to the kernel vport layer without changing userspace software. In turn, that means that the odp_port structure must become variable-length. This does not, however, fit in well with the ODP_PORT_LIST ioctl in its current form, because that would require userspace to know how much space to allocate for each port in advance, or to allocate as much space as could possibly be needed. Neither choice is very attractive. This commit prepares for a different solution, by replacing ODP_PORT_LIST by a new ioctl ODP_VPORT_DUMP that retrieves information about a single vport from the datapath on each call. It is much cleaner to allocate the maximum amount of space for a single vport than to do so for possibly a large number of vports. It would be faster to retrieve a number of vports in batch instead of just one at a time, but that will naturally happen later when the kernel datapath interface is changed to use Netlink, so this patch does not bother with it. The Netlink version won't need to take the starting port number from userspace, since Netlink sockets can keep track of that state as part of their "dump" feature. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2011-01-27 21:08:36 -08:00
Ben Pfaff	856081f683	datapath: Report kernel's flow key when passing packets up to userspace. One of the goals for Open vSwitch is to decouple kernel and userspace software, so that either one can be upgraded or rolled back independent of the other. To do this in full generality, it must be possible to change the kernel's idea of the flow key separately from the userspace version. This commit takes one step in that direction by making the kernel report its idea of the flow that a packet belongs to whenever it passes a packet up to userspace. This means that userspace can intelligently figure out what to do: - If userspace's notion of the flow for the packet matches the kernel's, then nothing special is necessary. - If the kernel has a more specific notion for the flow than userspace, for example if the kernel decoded IPv6 headers but userspace stopped at the Ethernet type (because it does not understand IPv6), then again nothing special is necessary: userspace can still set up the flow in the usual way. - If userspace has a more specific notion for the flow than the kernel, for example if userspace decoded an IPv6 header but the kernel stopped at the Ethernet type, then userspace can forward the packet manually, without setting up a flow in the kernel. (This case is bad from a performance point of view, but at least it is correct.) This commit does not actually make userspace flexible enough to handle changes in the kernel flow key structure, although userspace does now have enough information to do that intelligently. This will have to wait for later commits. This commit is bigger than it would otherwise be because it is rolled together with changing "struct odp_msg" to a sequence of Netlink attributes. 
The alternative, to do each of those changes in a separate patch, seemed like overkill because it meant that either we would have to introduce and then kill off Netlink attributes for in_port and tun_id, if Netlink conversion went first, or shove yet another variable-length header into the stuff already after odp_msg, if adding the flow key to odp_msg went first. This commit will slow down performance of checksumming packets sent up to userspace. I'm not entirely pleased with how I did it. I considered a couple of alternatives, but none of them seemed that much better. Suggestions welcome. Not changing anything wasn't an option, unfortunately. At any rate some slowdown will become unavoidable when OVS actually starts using Netlink instead of just Netlink framing. (Actually, I thought of one option where we could avoid that: make userspace do the checksum instead, by passing csum_start and csum_offset as part of what goes to userspace. But that's not perfect either.) Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2011-01-27 21:08:36 -08:00
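
Commit 4e022ec09e above describes giving OpenFlow ports and datapath ports two distinct typedefs, ofp_port_t and odp_port_t, marked "__attribute__((bitwise))" so that sparse can flag accidental mixing. The following is a minimal sketch of that idea, not the code from the commit. The macro names (PORT_BITWISE, PORT_FORCE), the helper function names, and the 16-bit width chosen for ofp_port_t are assumptions made for illustration; the 32-bit width for datapath ports follows commit 9b56fe137d.

```c
#include <stdint.h>

/* sparse defines __CHECKER__ while it runs; an ordinary compiler sees
 * plain integer typedefs.  Macro names are hypothetical. */
#ifdef __CHECKER__
#define PORT_BITWISE __attribute__((bitwise))
#define PORT_FORCE   __attribute__((force))
#else
#define PORT_BITWISE
#define PORT_FORCE
#endif

/* Two port-number types that sparse treats as incompatible.
 * The 16-bit OpenFlow width is an assumption of this sketch. */
typedef uint16_t PORT_BITWISE ofp_port_t;   /* OpenFlow port number */
typedef uint32_t PORT_BITWISE odp_port_t;   /* datapath port number */

/* Explicit conversions (hypothetical helpers).  The "force" cast tells
 * sparse that crossing the type boundary here is deliberate. */
static inline odp_port_t u32_to_odp(uint32_t n)
{
    return (PORT_FORCE odp_port_t) n;
}

static inline uint32_t odp_to_u32(odp_port_t port)
{
    return (PORT_FORCE uint32_t) port;
}

static inline ofp_port_t u16_to_ofp(uint16_t n)
{
    return (PORT_FORCE ofp_port_t) n;
}

static inline uint16_t ofp_to_u16(ofp_port_t port)
{
    return (PORT_FORCE uint16_t) port;
}

/* Example consumer: accepts only a datapath port.  Under sparse, passing
 * an ofp_port_t here is reported as a type mismatch. */
static inline int odp_port_is_valid(odp_port_t port, uint32_t max_ports)
{
    return odp_to_u32(port) < max_ports;
}
```

With a regular compiler both typedefs collapse to plain integers, so the check costs nothing at run time; the separation only takes effect when the source is run through sparse.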

72 Commits