mir/ovs - ovs - Mike's Git repositories

mir/ovs

mirror of https://github.com/openvswitch/ovs synced 2025-08-28 12:58:00 +00:00

Author	SHA1	Message	Date
Justin Pettit	ddbfda8462	Use ODP ports in dpif layer and below. The current code has a simple mapping between datapath and OpenFlow port numbers (the port numbers were the same other than OFPP_LOCAL which maps to datapath port 0). Since the translation was know at compile time, this allowed different layers to easily translate between the two, so the translation often occurred late. A future commit will break this simple mapping, so this commit draws a line between where datapath and OpenFlow port numbers are used. The ofproto-dpif layer will be responsible for the translations. Callers above will use OpenFlow port numbers. Providers below will use datapath port numbers. Signed-off-by: Justin Pettit <jpettit@nicira.com>	2012-11-01 22:54:27 -07:00
Justin Pettit	9b56fe137d	Always treat datapath ports as 32 bits. Most of the code referred to datapath ports as 32-bit values, but a few places still used 16-bit references. Signed-off-by: Justin Pettit <jpettit@nicira.com>	2012-11-01 22:54:27 -07:00
Jesse Gross	296e07ace0	flow: Extend struct flow to contain tunnel outer header. Soon the kernel will begin supplying the information about the outer IP header for tunneled packets and userspace will need to be able to track it as part of the flow. For the time being this is only used internally by OVS and not exposed outwards to OpenFlow. As a result, this threads the information throughout userspace but simply stores the existing tun_id in it. Signed-off-by: Jesse Gross <jesse@nicira.com>	2012-10-03 10:04:10 -07:00
Ben Pfaff	8d0abb5ef5	dpif-linux: Report packet loss as WARN instead of ERR. Packet loss is recoverable so it doesn't warrant an ERR. Bug #12920. Reported-by: Scott Hendricks <shendricks@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>	2012-09-05 13:36:35 -07:00
Ben Pfaff	ebc56baa41	util: New macro CONST_CAST. Casts are sometimes necessary. One common reason that they are necessary is for discarding a "const" qualifier. However, this can impede maintenance: if the type of the expression being cast changes, then the presence of the cast can hide a necessary change in the code that does the cast. Using CONST_CAST, instead of a bare cast, makes these changes visible. Inspired by my own work elsewhere: http://git.savannah.gnu.org/cgit/pspp.git/tree/src/libpspp/cast.h#n80 Signed-off-by: Ben Pfaff <blp@nicira.com>	2012-08-03 13:33:13 -07:00
Justin Pettit	232dfa4aa3	dpif: Allow the port number to be requested when adding an interface. The datapath allows requesting a specific port number for a port, but the dpif interface didn't expose it. This commit adds that support. Signed-off-by: Justin Pettit <jpettit@nicira.com>	2012-07-30 20:54:16 -07:00
Ben Pfaff	cfceb2b57a	dpif-linux: Zero 'stats' outputs of dpif_operate() ops on failure. When DPIF_OP_FLOW_PUT or DPIF_OP_FLOW_DEL operations failed, they left their 'stats' outputs uninitialized. For DPIF_OP_FLOW_DEL, this meant that the caller would read indeterminate data: Conditional jump or move depends on uninitialised value(s) at 0x805C1EB: subfacet_reset_dp_stats (ofproto-dpif.c:4410) by 0x80637D2: expire_batch (ofproto-dpif.c:3471) by 0x8066114: run (ofproto-dpif.c:3513) by 0x8059DF4: ofproto_run (ofproto.c:1035) by 0x8052E17: bridge_run (bridge.c:2005) by 0x8053F74: main (ovs-vswitchd.c:108) It's unusual for a delete operation to fail. The most common reason is an administrator running "ovs-dpctl del-flows". The only user of DPIF_OP_FLOW_PUT did not request stats, so this doesn't fix an actual bug for that case. Bug #11797. Reported-by: James Schmidt <jschmidt@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>	2012-06-25 16:47:21 -07:00
Ethan Jackson	aecfb4af7e	dpif-linux: Fix invalid format specifier. This fixes the following warning on my system. "format '%d' expects argument of type 'int', but argument 5 has type 'long int'" Signed-off-by: Ethan Jackson <ethan@nicira.com>	2012-06-06 11:29:33 -07:00
Ben Pfaff	14b4d2f99f	dpif-linux: Log details when a packet is lost. Until now, when a packet was dropped in the kernel-to-user buffers, we logged the occurrence but nothing that would allow a person reading the log after the fact to learn why it was dropped. This commit adds details that identify the major sources of packets in the buffer, which should help. Signed-off-by: Ben Pfaff <blp@nicira.com>	2012-06-01 17:40:35 -04:00
Ben Pfaff	fe3d61b373	dpif-linux: Slightly refactor internal data structures. An initial attempt also replaced the 'uint32_t ready_mask' in struct dpif_linux by a 'bool ready' in each struct dpif_channel, but I wasn't happy with the result (the ready_mask bitmap works out really well) and so I dropped that part. Signed-off-by: Ben Pfaff <blp@nicira.com>	2012-06-01 17:40:35 -04:00
Ben Pfaff	e222833e39	dpif-linux: Avoid pessimal behavior when kernel-to-user buffers overflow. When a kernel-to-user Netlink buffer overflows, the kernel reports ENOBUFS without passing along an actual message. When it does this, we should immediately try again, because we know that there is a message waiting, instead of reporting the error to the caller. This improves the OVS response rate to "hping3 --flood" traffic by a few percentage points in my testing. Signed-off-by: Ben Pfaff <blp@nicira.com>	2012-06-01 17:40:34 -04:00
Ben Pfaff	625b07205a	ofproto-dpif: Segregate CFM, LACP, and STP traffic into separate queues. Until now, packets for these special protocols have been mixed with general traffic in the kernel-to-userspace queues. This means that a big-enough storm of new flows in these queues can cause packets for these special protocols to be dropped at this interface, fooling userspace into believing that, say, no CFM packets have been received even though they are arriving at the expected rate. This commit moves special protocols to a dedicated kernel-to-userspace queue to avoid the problem. Bug #7550. Signed-off-by: Ben Pfaff <blp@nicira.com>	2012-05-09 12:58:55 -07:00
Raju Subramanian	e0edde6fee	Global replace of Nicira Networks. Replaced all instances of Nicira Networks(, Inc) to Nicira, Inc. Feature #10593 Signed-off-by: Raju Subramanian <rsubramanian@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>	2012-05-02 17:08:02 -07:00
Ben Pfaff	90a7c55e56	dpif: Make caller of dpif_recv() provide buffer space. This improves performance under heavy flow setup loads. Signed-off-by: Ben Pfaff <blp@nicira.com>	2012-04-18 20:28:51 -07:00
Ben Pfaff	72d32ac0b3	netlink-socket: Make caller provide message receive buffers. Typically an nl_sock client can stack-allocate the buffer for receiving a Netlink message, which provides a performance boost. Signed-off-by: Ben Pfaff <blp@nicira.com>	2012-04-18 20:28:48 -07:00
Ben Pfaff	eabe7c680d	dpif-linux: Avoid malloc() in dpif_linux_operate(). Signed-off-by: Ben Pfaff <blp@nicira.com>	2012-04-18 20:28:44 -07:00
Ben Pfaff	b99d3ceeed	ofproto-dpif: Batch flow uninstallations due to expiration. Signed-off-by: Ben Pfaff <blp@nicira.com>	2012-04-18 20:28:12 -07:00
Ben Pfaff	33db159244	dpif-linux: Make dpif_linux_port_query_by_name() query only one datapath. The kernel will report a vport with the given name in any datapath, but userspace only wants a vport with the given name in a specific datapath. Receiving information on a vport in an unexpected datapath yields bizarre and hard-to-debug problems. Bug #9889. Signed-off-by: Ben Pfaff <blp@nicira.com>	2012-02-27 18:42:17 -08:00
Pravin B Shelar	95b1d73a4a	datapath: Increase maximum number of datapath ports. Use hash table to store ports of datapath. Allow 64K ports per switch. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Bug #2462	2012-02-16 17:12:36 -08:00
Ben Pfaff	89625d1efb	dpif: Change provider interface to consistently use operation structs. Until now, a "flow put" has represented its parameters in two different ways, depending on whether it was coming from dpif_flow_put() or from dpif_operate(), and similarly for an "execute" operation. This commit adopts the operation struct consistently within the dpif provider interface, which seems cleaner. This commit also factors out logging for flow puts and executes, which is useful in the following commit. This doesn't change the dpif client interface, since the two forms are more convenient for clients than always filling out an operation struct. Signed-off-by: Ben Pfaff <blp@nicira.com>	2012-01-16 13:37:27 -08:00
Ben Pfaff	c2b565b54e	dpif: Factor 'type' and 'error' out of individual dpif_op members. I'd like to change ->dpif_flow_put() and ->dpif_execute() in the dpif provider to take the structures of the same names as parameters, instead of passing them discrete parameters, because this seems like a more sensible way to do things internally than to have two different ways to pass the parameters. It might even simplify code slightly. But ->flow_put() and ->execute() wouldn't want the 'type' (because it's implied by the function being called) or 'error' (because it would be the same as the return value). Although of course they could just ignore those members, it seems slightly cleaner to omit them entirely, as this change allows. Signed-off-by: Ben Pfaff <blp@nicira.com>	2012-01-16 13:35:21 -08:00
Ben Pfaff	a12b3eadc6	dpif: Simplify the "listen mask" concept. At one point in the past, there were three separate queues between the kernel module and OVS userspace, each of which corresponded to a Netlink socket (or, before that, to a character device). It made sense to allow each of these to be enabled or disabled separately, hence the "listen mask" concept in the dpif layer. These days, the concept is much less clear-cut. Queuing is no longer on the basis of different classes of packets but instead striped across a collection of sockets based on input port. It doesn't really make sense to enable receiving packets on the basis of the kind of packet anymore. Accordingly, this commit simplifies the "listen_mask" to just a bool that either enables or disables receiving packets. It could be useful to enable or disable receiving packets on a per-vport basis, but the rest of the code isn't ready to make use of that so this commit doesn't generalize this much. Based on this discussion on ovs-dev: http://openvswitch.org/pipermail/dev/2011-October/012044.html Signed-off-by: Ben Pfaff <blp@nicira.com>	2012-01-12 17:09:22 -08:00
Ben Pfaff	733c8d13d7	dpif-linux: Avoid valgrind warning in epoll_ctl() call. Valgrind points out correctly that there are uninitialized bytes in the 'event' structure. That's OK, but it doesn't hurt to suppress the warning by zeroing all of the bytes. This doesn't fix a real bug. Signed-off-by: Ben Pfaff <blp@nicira.com>	2011-12-12 14:47:10 -08:00
Ben Pfaff	50f80534f2	dpif-linux: Use "epoll" instead of poll(). epoll appears to be much more efficient than poll() at least for static file descriptor sets. I can't otherwise explain why this patch increases netperf CRR performance by 20% above the previous commit, which is also about a 19% overall improvement versus the baseline from before the poll_fd_woke() optimization was removed.	2011-11-28 09:31:07 -08:00
Ben Pfaff	8522ba0996	dpif-linux: Use poll() internally in dpif_linux_recv(). Using poll() internally in dpif_linux_recv(), instead of relying on the results of the main loop poll() call, brings netperf CRR performance back within 1% of par versus the code base before the poll_fd_woke() optimizations were introduced. It also increases the ovs-benchmark results by about 5% versus that baseline, too. My theory is that this is because the main loop takes long enough that a significant number of packets can arrive during the main loop itself, so this reduces the time before OVS gets to those packets.	2011-11-28 09:29:18 -08:00
Ben Pfaff	f6d1465cec	dpif-linux: Remove poll_fd_woke() optimization from dpif_linux_recv(). This optimization on its own provided about 37% benefit against a load of a single netperf CRR test, but at the same time it penalized ovs-benchmark by about 11%. We can get back the CRR performance loss, and more, other ways, so the first step is to revert this patch, temporarily accepting the performance loss.	2011-11-28 09:19:08 -08:00
Ben Pfaff	f7df9823a0	netlink: New macro NL_POLICY_FOR.	2011-11-11 14:07:17 -08:00
Pravin B Shelar	abff858b5a	datapath: Convert kernel priority actions into match/set. Following patch adds skb-priority to flow key. So userspace will know what was priority when packet arrived and we can remove the pop/reset priority action. It's no longer necessary to have a special action for pop that is based on the kernel remembering original skb->priority. Userspace can just emit a set priority action with the original value. Since the priority field is a match field with just a normal set action, we can convert it into the new model for actions that are based on matches. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Bug #7715	2011-11-01 10:13:16 -07:00
Jesse Gross	69685a8882	datapath: Define constants for versions of GENL families. Currently we hard code the versions of our GENL families to 1 but it's nicer to have symbolic constants. Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2011-10-23 11:24:19 -07:00
Ben Pfaff	7257b535ab	Implement new fragment handling policy. Until now, OVS has handled IP fragments more awkwardly than necessary. It has not been possible to match on L4 headers, even in fragments with offset 0 where they are actually present. This means that there was no way to implement ACLs that treat, say, different TCP ports differently, on fragmented traffic; instead, all decisions for fragment forwarding had to be made on the basis of L2 and L3 headers alone. This commit improves the situation significantly. It is still not possible to match on L4 headers in fragments with nonzero offset, because that information is simply not present in such fragments, but this commit adds the ability to match on L4 headers for fragments with zero offset. This means that it becomes possible to implement ACLs that drop such "first fragments" on the basis of L4 headers. In practice, that effectively blocks even fragmented traffic on an L4 basis, because the receiving IP stack cannot reassemble a full packet when the first fragment is missing. This commit works by adding a new "fragment type" to the kernel flow match and making it available through OpenFlow as a new NXM field named NXM_NX_IP_FRAG. Because OpenFlow 1.0 explicitly says that the L4 fields are always 0 for IP fragments, it adds a new OpenFlow fragment handling mode that fills in the L4 fields for "first fragments". It also enhances ovs-ofctl to allow users to configure this new fragment handling mode and to parse the new field. Signed-off-by: Ben Pfaff <blp@nicira.com> Bug #7557.	2011-10-21 15:07:36 -07:00
Ben Pfaff	6bc6002490	dpif: New function dpif_operate() and dpif-linux implementation. This will be used in an upcoming commit.	2011-10-14 14:08:44 -07:00
Ben Pfaff	30b44744a1	dpif-linux: Only ask datapath to echo back results when they will be used. A fair number of datapath flow operations optionally report back results to the requester based on whether NLM_F_ECHO is set in the request. When userspace isn't going to use those results anyway, it wastes memory to store them and a system call to retrieve them. This commit omits the NLM_F_ECHO bit in cases where the caller isn't going to use the results. (NLM_F_ECHO has no effect on operations whose entire purpose is to retrieve data, e.g. "get" and "dump" operations, so we need not bother to set it for those.) This improves "ovs-benchmark rate" results in my testing by about 4%.	2011-10-14 14:08:44 -07:00
Ben Pfaff	09ded0ad48	datapath-protocol: Rename enums for consistency. Most of the enum tags in this file are lowercased versions of the uppercase enum prefixes (or slightly less abbreviated versions, e.g. "dp" becomes "datapath"). This commit fixes up the others for consistency. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2011-10-12 16:27:09 -07:00
Ben Pfaff	ea36840fa4	datapath: Require explicit upcall_pid for new datapaths and vports. This increases consistency with the OVS_ACTION_ATTR_USERSPACE action, which also requires an explicit pid. Suggested-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2011-10-12 16:27:08 -07:00
Ben Pfaff	98403001ec	datapath: Move Netlink PID for userspace actions from flows to actions. Commit b063d9f06 "datapath: Use unicast Netlink sockets for upcalls" that switched from multicast to unicast Netlink for sending upcalls added a Netlink PID to each kernel flow, used by OVS_ACTION_ATTR_USERSPACE actions within the flow as target. This commit drops this per-flow PID in favor of a per-action PID, because that is more flexible. It does not yet make use of this additional flexibility, so behavior should not change. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Bug #7559.	2011-10-12 16:27:00 -07:00
Ben Pfaff	0e70cdcb8d	dpif-linux: Use get_32aligned_u64() in an appropriate place.	2011-10-12 16:23:16 -07:00
Ben Pfaff	a24a65747a	dpif-linux: Don't reset kernel upcall_pids unintentionally. Commit b063d9f0 "datapath: Use unicast Netlink sockets for upcalls" that introduced an 'upcall_pid' member into struct dpif_linux_vport, struct dpif_linux_dp, and struct dpif_linux_flow neglected to do so only if the member was nonzero. This caused every datapath, vport, and flow operation to supply an upcall_pid. In particular, the netdev_set_config() called at startup when a vport already existed caused the upcall_pid for that vport to be reset to 0, which in turn caused all packets received on the vport to be dropped instead of forwarded to ovs-vswitchd. Reported-by: Shih-Hao Li <shli@nicira.com> Bug #7714.	2011-10-07 19:53:23 -07:00
Pravin B Shelar	33b14e70e2	datapath: Strip down vport interface - ifIndex. Following patch removes ifIndex attribute of vport which is not used in userspace. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Bug #7114	2011-10-05 19:06:29 -07:00
Ben Pfaff	a8d9304d12	dpif: Avoid use of "struct ovs_dp_stats" in platform-independent modules. Over time we wish to reduce the number of datapath-protocol.h definitions used directly outside of Linux-specific code. This commit removes use of "struct ovs_dp_stats" from platform-independent code. Bug #7559.	2011-10-05 11:18:13 -07:00
Pravin Shelar	6ff686f2bc	sFlow: Genericize/simplify kernel sFlow implementation Following patch adds sampling action which takes probability and set of actions as arguments. When probability is hit, actions are executed for given packet. USERSPACE action's userdata (u64) is used to store struct user_action_cookie as cookie. CONTROLLER action is fixed accordingly. Now we can remove sFlow code from kernel and implement sFlow generically as SAMPLE action. sFlow is defined as SAMPLE Action with probability (sFlow sampling rate) and USERSPACE action as argument. USERSPACE action's data is used as cookie. sFlow uses this cookie to store output-port, number of output ports and vlan-id. sample-pool is calculated by using vport stats. Signed-off-by: Pravin Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2011-09-28 10:43:07 -07:00
Jesse Gross	17411ecf2b	dpif-linux: Prevent a single port from monopolizing upcalls. Currently it is possible for a client on a single port to generate a huge number of packets that miss in the kernel flow table and monopolize the userspace/kernel communication path. This effectively DoS's the machine because no new flow setups can take place. This adds some additional fairness by separating each upcall type for each object in the datapath onto a separate socket, each with its own queue. Userspace then reads round-robin from each socket so other flow setups can still succeed. Since the number of objects can potentially be large, we don't always have a unique socket for each. Instead, we create 16 sockets and spread the load around them in a round robin fashion. It's theoretically possible to do better than this with some kind of active load balancing scheme but this seems like a good place to start. Feature #6485	2011-09-23 15:27:49 -07:00
Jesse Gross	b063d9f06e	datapath: Use unicast Netlink sockets for upcalls. Currently we publish several multicast groups for upcalls and let userspace sockets subscribe to them. The benefit of this is mostly that userspace is the one doing the subscription - the actual multicast capability is not currently used and probably wouldn't be even if we moved to a multiprocess model. Despite the convenience, multicast sockets have a number of disadvantages, primarily that we only have a limited number of them so there could be collisions. In addition, unicast sockets give additional flexibility to userspace by allowing every object to potentially have a different socket chosen by userspace for upcalls. Finally, any future optimizations for upcalls to reduce copying will likely not be compatible with multicast anyways so disallowing it potentially simplifies things. We also never unregistered the multicast groups registered for upcalls and leaked them on module unload. As a side effect, this solves that problem. Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2011-09-23 15:27:48 -07:00
Ethan Jackson	213a13ed77	dpif-linux: Handle nl_lookup_genl_mcgroup() failures. The nl_lookup_genl_mcgroup() function can fail on older kernels which do not support the required netlink interface. Before this patch, dpif-linux would refuse to create a datapath when this happened. With this patch, it attempts to use a workaround. If the workaround fails it simply disables the affected features without completely disabling the dpif.	2011-09-16 11:22:30 -07:00
Ethan Jackson	8f4a4df5e3	dpif-linux: Open dpif despite notifier failures. Before this patch, if dpif-linux failed to register a notifier it would give up opening the datapath entirely. This seems draconian as a dpif can still perform the majority of its intended functionality without vport notifications.	2011-09-16 11:22:30 -07:00
Ethan Jackson	2ee6545f2b	notifiers: Create and destroy nln_notifiers. This patch changes the interface of netlink-notifier and rtnetlink-link. Now nln_notifiers are allocated and destroyed by the module instead of passed in by callers. This allows the definition of nln_notifier to be hidden, and generally cleans up the code.	2011-09-16 11:22:30 -07:00
Ethan Jackson	18a2378164	notifiers: Rename run and wait functions. It makes more sense to call nln_notifier_run() and nln_notifier_wait() simply nln_run() and nln_wait() since they don't operate on notifiers but the entire nln object. This patch changes the nln and the rtnetlink-link modules to the new convention.	2011-09-16 11:22:30 -07:00
Pravin Shelar	f613a0d72c	datapath: Always use generic stats for devices (vports) Currently ovs is using device stats for Linux devices and count them itself in other situations. This leads to overlap with hardware stats, inconsistencies, etc. It's much better to just always count the packets flowing through the switch and let userspace do any merging that it wants. Following patch removes vport->get_stats() interface. vport-stat is changed to use new `struct ovs_vport_stat` rather than rtnl_link_stats64. Definitions of rtnl_link_stats64 is removed from OVS. dipf_port->stat is also removed as aggregate stats are only available at netdev layer. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2011-09-15 19:36:17 -07:00
Pravin Shelar	9b02078077	datapath: Strip down vport interface : OVS_VPORT_ATTR_MTU There is no need to have vport attribute MTU (OVS_VPORT_ATTR_MTU) as linux net-dev-ioctl can be used to get/set MTU for linux device. Following patch removes OVS_VPORT_ATTR_MTU from datapath protocol. This patch also adds netdev_set_mtu interface. So that MTU adjustments can be done from OVS userspace. get_mtu() interface is also changed, now get_mtu() returns EOPNOTSUPP rather than returning 0 and setting *pmtu to INT_MAX in case there is no MTU attribute for given device. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2011-09-12 17:12:52 -07:00
Pravin Shelar	ff8d7a5e81	Strip down vport interface : iflink Remove iflink from vport interface. iflink is not used anywhere in OVS. So there is not need to have iflink as vport attribute. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>	2011-09-08 15:18:42 -07:00
Ethan Jackson	c7178a0b3e	dpif-linux: Stop listening for RTNL notifications. Currently dpif-linux listens for vport change events using rtnetlink notifications. This patch switches to the ovs genl notification system. Feature #6809.	2011-09-01 17:22:00 -07:00

1 2 3

144 Commits