mir/ovs - ovs - Mike's Git repositories

mirror of https://github.com/openvswitch/ovs synced 2025-08-29 05:18:13 +00:00

Author	SHA1	Message	Date
Justin Pettit	b2f4b622dd	dpif-netdev: Initialize 'tun_md' member of match. Found by valgrind. Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2017-07-15 13:43:09 -07:00
Ilya Maximets	85a4f23811	dpif-netdev: Fix few comments. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-07-13 15:32:46 -07:00
Ilya Maximets	a3aa871111	dpif-netdev: Remove useless port checking. Since commit ff073a71f9bb ("dpif-netdev: Use hmap instead of list+array for tracking ports."), 'is_valid_port_number()' is equal to 'port_no != ODPP_NONE', and the expression below will never be true. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Greg Rose <gvrose8192@gmail.com>	2017-07-11 13:08:21 -07:00
Ciara Loftus	656238ee92	dpif-netdev: Fix insertion probability emc_conditional_insert uses pmd->last_cycles and the packet's RSS hash to generate a random number used to determine whether or not an emc entry should be inserted. This works for single-packet bursts as last_cycles is updated for each burst. However, for bursts > 1 packet, where the packets in the batch generate the same RSS hash, pmd->last_cycles remains constant for the entire burst also, and thus cannot be used as a random number for each packet in the burst. This commit replaces the use of pmd->last_cycles with random_uint32() for this purpose and subsequently fixes the behavior of the emc_insert_inv_prob setting for high-throughput (large bursts) single-flow cases. Fixes: 4c30b24602c3 ("dpif-netdev: Conditional EMC insert") Reported-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Darrell Ball <dlu998@gmail.com> Tested-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-07-11 13:03:08 -07:00
Antonio Fischetti	1401f6deb6	Fix coding style and some typos. Fixes some lines exceeding 80 chars and a couple of typos. Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-07-11 12:41:21 -07:00
Ciara Loftus	a2ac666d52	dpif-netdev: Change definitions of 'idle' & 'processing' cycles Instead of counting all polling cycles as processing cycles, only count the cycles where packets were received from the polling. Signed-off-by: Georg Schmuecking <georg.schmuecking@ericsson.com> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Co-authored-by: Georg Schmuecking <georg.schmuecking@ericsson.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ian Stokes <ian.stokes@intel.com> Tested-by: Ian Stokes <ian.stokes@intel.com> Acked-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-07-06 16:49:50 -07:00
Ben Pfaff	0722f34109	odp-util: Use port names in output in more places. Until now, ODP output only showed port names for in_port matches. This commit shows them in other places port numbers appear. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Jan Scheurich <jan.scheurich@ericsson.com> Tested-by: Jan Scheurich <jan.scheurich@ericsson.com>	2017-06-23 16:28:42 +08:00
Bhanuprakash Bodireddy	1cc1b5f6b0	dpif-netdev: Skip invoking qsort on empty list. sorted_poll_list() returns the sorted list of rxqs mapped to PMD thread along with the rxq count. Skip sorting the list if there are no rxqs mapped to the PMD thread. This can be reproduced with manual pinning and 'dpif-netdev/pmd-rxq-show' command. Also Clang reports that null argument is passed to qsort in this case. Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-06-20 10:33:46 +08:00
Ben Pfaff	81765c00a1	openvswitch.h: Use odp_port_t for port numbers in userspace-only structs. Using the correct type reduces the need for type conversions. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Jan Scheurich <jan.scheurich@ericsson.com> Reviewed-by: nickcooper-zhangtonghao <nic@opencloud.tech>	2017-06-20 07:35:49 +08:00
Paul Blakey	7e8b719938	dpctl: Add an option to dump only certain kinds of flows Usage: # to dump all datapath flows (default): ovs-dpctl dump-flows # to dump only flows that in kernel datapath: ovs-dpctl dump-flows type=ovs # to dump only flows that are offloaded: ovs-dpctl dump-flows type=offloaded Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2017-06-15 11:53:06 +02:00
Darrell Ball	4cddb1f0d8	dpdk: Parse NAT netlink for userspace datapath. Signed-off-by: Darrell Ball <dlu998@gmail.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Daniele Di Proietto <diproiettod@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-06-02 15:07:16 -07:00
Jan Scheurich	beb75a40fd	userspace: Switching of L3 packets in L2 pipeline Ports have a new layer3 attribute if they send/receive L3 packets. The packet_type included in structs dp_packet and flow is considered in ofproto-dpif. The classical L2 match fields (dl_src, dl_dst, dl_type, and vlan_tci, vlan_vid, vlan_pcp) now have Ethernet as pre-requisite. A dummy ethernet header is pushed to L3 packets received from L3 ports before the pipeline processing starts. The ethernet header is popped before sending a packet to a L3 port. For datapath ports that can receive L2 or L3 packets, the packet_type becomes part of the flow key for datapath flows and is handled appropriately in dpif-netdev. In the 'else' branch in flow_put_on_pmd() function, the additional check flow_equal(&match.flow, &netdev_flow->flow) was removed, as a) the dpcls lookup is sufficient to uniquely identify a flow and b) it caused false negatives because the flow in netdev->flow may not be properly masked. In dpif_netdev_flow_put() we now use the same method for constructing the netdev_flow_key as the one used when adding the flow to the dpcls to make sure these always match. The function netdev_flow_key_from_flow() used so far was not only inefficient but sometimes caused mismatches and subsequent flow update failures. The kernel datapath does not support the packet_type match field. Instead it encodes the packet type implicitly by the presence or absence of the Ethernet attribute in the flow key and mask. This patch filters the PACKET_TYPE attribute out of netlink flow key and mask to be sent to the kernel datapath. Signed-off-by: Lorand Jakab <lojakab@cisco.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-06-02 10:15:20 -07:00
Ben Pfaff	f582b6df9e	dpif-netdev: Fix use-after-free error in reconfigure_datapath(). Found by Coverity. Reported-at: https://scan3.coverity.com/reports.htm#v16889/p10449/fileInstanceId=14762915&defectInstanceId=4305352&mergedDefectId=180430 Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>	2017-06-01 16:20:53 -07:00
Eelco Chaudron	34d8e04bec	dpif-netdev: The pmd-*-show commands will show info in core order The "ovs-appctl dpif-netdev/pmd-rxq-show" and "ovs-appctl dpif-netdev/pmd-stats-show" commands show their output per core_id, sorted on the hash location. My OCD was kicking in when using these commands, hence this change to display them in natural core_id order. In addition I had to change a test case that would fail if the cores were not in order in the hash list. This is due to OVS assigning queues to cores based on the order in the hash list. The test case now checks if any core has the set of queues in the given order. Manually tested this on my setup, and ran clang-analyze. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-05-18 15:40:33 -07:00
Bhanuprakash Bodireddy	1859876c04	dpif-netdev: Fix comments for dp_netdev_pmd_thread struct. The sorted subtable ranking patch introduced a classifier instance per ingress port with its subtables ranked on the frequency of hits. The PMD thread can have more classifier instances now and solely depends on the number of ingress ports currently handled by the pmd thread. Fixes: 3453b4d62a98 ("dpif-netdev: dpcls per in_port with sorted subtables") Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-05-18 14:17:38 -07:00
Bhanuprakash Bodireddy	65dcf3da40	dpif-netdev: Reorder elements in dp_netdev structure. 'emc_insert_min' variable is made to align on a 64-byte boundary and this introduces a 24 byte hole. This patch moves the emc_insert_min member variable slightly higher in the order to remove the hole and thus saves a cache line with the new ordering. Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> CC: Ciara Loftus <ciara.loftus@intel.com> CC: Georg Schmuecking <georg.schmuecking@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Kevin Traynor <ktraynor@redhat.com>	2017-05-18 14:13:06 -07:00
Bhanuprakash Bodireddy	f79b1ddb84	dpif-netdev: Skip EMC lookup when EMC is disabled. Conditional EMC insert patch gives the flexibility to configure the probability of flow insertion in to EMC. This also allows an option to entirely disable EMC by setting 'emc-insert-inv-prob=0' which can be useful at large number of parallel flows. This patch skips EMC lookup when EMC is disabled. This is useful to avoid wasting CPU cycles and also improve performance considerably. Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> CC: Ciara Loftus <ciara.loftus@intel.com> CC: Georg Schmuecking <georg.schmuecking@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Darrell Ball <dlu998@gmail.com>	2017-05-18 14:09:28 -07:00
Kevin Traynor	47a45d868f	dpif-netdev/netdev-dpdk: Fix line lengths. Fix line lengths to be <= 79 as per coding style and so that checkpatch will not show up existing warnings on these files. Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-05-18 13:48:38 -07:00
Joe Stringer	f5f64552ec	Revert "tunneling: Avoid recirculation on datapath." This reverts commit f1dac5128ce6db2e493f0d1c7a8b53fb9f34476f. When this commit was introduced, it broke the 'make check-system-userspace' testsuite. It appears that the new translation fails to modify the flow in a way that would represent the flow as an encapsulated flow when the traffic is patched through to the second bridge. As such, rather than matching on, for example, "ip,proto=47" for gre, it would use the inner packet's flow headers. It also results in problems reporting statistics, as the tunnel's header is not reflected in subsequent statistics and truncation is not properly applied during translation. While a refreshed approach to solving the above problem is formed, revert this patch. Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2017-May/331972.html Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Greg Rose <gvrose8192@gmail.com>	2017-05-10 12:14:02 -07:00
Jan Scheurich	2482b0b0c8	userspace: Add packet_type in dp_packet and flow This commit adds a packet_type attribute to the structs dp_packet and flow to explicitly carry the type of the packet as preparation for the introduction of the so-called packet type-aware pipeline (PTAP) in OVS. The packet_type is a big-endian 32 bit integer with the encoding as specified in OpenFlow version 1.5. The upper 16 bits contain the packet type name space. Pre-defined values are defined in openflow-common.h: enum ofp_header_type_namespaces { OFPHTN_ONF = 0, /* ONF namespace. */ OFPHTN_ETHERTYPE = 1, /* ns_type is an Ethertype. */ OFPHTN_IP_PROTO = 2, /* ns_type is a IP protocol number. */ OFPHTN_UDP_TCP_PORT = 3, /* ns_type is a TCP or UDP port. */ OFPHTN_IPV4_OPTION = 4, /* ns_type is an IPv4 option number. */ }; The lower 16 bits specify the actual type in the context of the name space. Only name spaces 0 and 1 will be supported for now. For name space OFPHTN_ONF the relevant packet type is 0 (Ethernet). This is the default packet_type in OVS and the only one supported so far. Packets of type (OFPHTN_ONF, 0) are called Ethernet packets. In name space OFPHTN_ETHERTYPE the type is the Ethertype of the packet. A packet of type (OFPHTN_ETHERTYPE, <Ethertype>) is a standard L2 packet with the Ethernet header (and any VLAN tags) removed to expose the L3 (or L2.5) payload of the packet. These will simply be called L3 packets. The Ethernet address fields dl_src and dl_dst in struct flow are not applicable for an L3 packet and must be zero. However, to maintain compatibility with the large code base, we have chosen to copy the Ethertype of an L3 packet into the dl_type field of struct flow. This does not mean that it will be possible to match on dl_type for L3 packets with PTAP later on. Matching must be done on packet_type instead. New dp_packets are initialized with packet_type Ethernet. Ports that receive L3 packets will have to explicitly adjust the packet_type. Signed-off-by: Jean Tourrilhes <jt@labs.hpe.com> Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-05-03 16:56:40 -07:00
Andy Zhou	333ad77dbd	ofproto-dpif: Add 'meter_ids' to backer Add 'meter_ids', an id-pool object to manage datapath meter id, i.e. provider_meter_id. Currently, only userspace datapath supports meter, and it implements the provider_meter_id management. Moving this function to 'backer' allows other datapath implementation to share the same logic. Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>	2017-04-28 14:22:03 -07:00
Jarno Rajahalme	8e83854cf2	datapath: Add eventmask support to CT action. Upstream commit: commit 120645513f55a4ac5543120d9e79925d30a0156f Author: Jarno Rajahalme <jarno@ovn.org> Date: Fri Apr 21 16:48:06 2017 -0700 openvswitch: Add eventmask support to CT action. Add a new optional conntrack action attribute OVS_CT_ATTR_EVENTMASK, which can be used in conjunction with the commit flag (OVS_CT_ATTR_COMMIT) to set the mask of bits specifying which conntrack events (IPCT_*) should be delivered via the Netfilter netlink multicast groups. Default behavior depends on the system configuration, but typically a lot of events are delivered. This can be very chatty for the NFNLGRP_CONNTRACK_UPDATE group, even if only some types of events are of interest. Netfilter core init_conntrack() adds the event cache extension, so we only need to set the ctmask value. However, if the system is configured without support for events, the setting will be skipped due to extension not being found. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>	2017-04-27 10:34:42 -07:00
Sugesh Chandran	f1dac5128c	tunneling: Avoid recirculation on datapath. Open vSwitch datapath recirculates packets for tunneling, i.e. the incoming packets are encapsulated at first pass. Further actions are applied on encapsulated packets on the second pass after recirculating. The proposed patch computes and appends the post tunnel actions at the time of translation itself instead of recirculating at datapath. These actions depend solely on tunnel attributes so there is no need of datapath recirculation. By avoiding the recirculation at datapath, the patch offers up to 30% performance improvement for VXLAN tunneling in our testing. The action execution logic is using the new CLONE action to define the packet cloning when the actions are combined. The length in the CLONE action specifies the size of nested action set. It also fixes the testsuite failures that are introduced by nested CLONE action in tunneling. Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com> Signed-off-by: Zoltán Balogh <zoltan.balogh@ericsson.com> Co-authored-by: Zoltán Balogh <zoltan.balogh@ericsson.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-04-21 15:57:38 -07:00
Eric Garver	f0fb825a37	Add support for 802.1ad (QinQ tunneling) Flow key handling changes: - Add VLAN header array in struct flow, to record multiple 802.1q VLAN headers. - Add dpif multi-VLAN capability probing. If datapath supports multi-VLAN, increase the maximum depth of nested OVS_KEY_ATTR_ENCAP. Refactor VLAN handling in dpif-xlate: - Introduce 'xvlan' to track VLAN stack during flow processing. - Input and output VLAN translation according to the xbundle type. Push VLAN action support: - Allow ethertype 0x88a8 in VLAN headers and push_vlan action. - Support push_vlan on dot1q packets. Use other_config:vlan-limit in table Open_vSwitch to limit maximum VLANs that can be matched. This allows us to preserve backwards compatibility. Add test cases for VLAN depth limit, Multi-VLAN actions and QinQ VLAN handling Co-authored-by: Thomas F Herbert <thomasfherbert@gmail.com> Signed-off-by: Thomas F Herbert <thomasfherbert@gmail.com> Co-authored-by: Xiao Liang <shaw.leon@gmail.com> Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Signed-off-by: Eric Garver <e@erig.me> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-03-16 15:18:40 -07:00
Jarno Rajahalme	a76a37efec	conntrack: Force commit. Userspace support for force commit. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>	2017-03-08 17:23:57 -08:00
Jarno Rajahalme	b80e259f8e	datapath: Add force commit. Upstream patch: commit dd41d33f0b033885211a5d6f3ee19e73238aa9ee Author: Jarno Rajahalme <jarno@ovn.org> Date: Thu Feb 9 11:22:00 2017 -0800 openvswitch: Add force commit. Stateful network admission policy may allow connections to one direction and reject connections initiated in the other direction. After policy change it is possible that for a new connection an overlapping conntrack entry already exists, where the original direction of the existing connection is opposed to the new connection's initial packet. Most importantly, conntrack state relating to the current packet gets the "reply" designation based on whether the original direction tuple or the reply direction tuple matched. If this "directionality" is wrong w.r.t. to the stateful network admission policy it may happen that packets in neither direction are correctly admitted. This patch adds a new "force commit" option to the OVS conntrack action that checks the original direction of an existing conntrack entry. If that direction is opposed to the current packet, the existing conntrack entry is deleted and a new one is subsequently created in the correct direction. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org>	2017-03-08 17:23:46 -08:00
Jarno Rajahalme	4b27db644a	dpif-netdev: Simple DROP meter implementation. Meters may be used by any flow, so some kind of locking must be used. In this version we have an adaptive mutex for each meter, which may not be optimal for DPDK. However, this should serve as a basis for further improvement. A batch of packets is first tried as a whole, and only if some of the meter bands are hit, we need to process the packets individually. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Andy Zhou <azhou@ovn.org>	2017-03-08 13:09:44 -08:00
Jarno Rajahalme	5dddf96065	dpif: Meter framework. Add DPIF-level infrastructure for meters. Allow meter_set to modify the meter configuration (e.g. set the burst size if unspecified). Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: Andy Zhou <azhou@ovn.org>	2017-03-08 13:09:43 -08:00
Yang, Yi Y	6fcecb85ab	datapath: add Ethernet push and pop actions Upstream commit: commit 91820da6ae85904d95ed53bf3a83f9ec44a6b80a Author: Jiri Benc <jbenc@redhat.com> Date: Thu Nov 10 16:28:23 2016 +0100 openvswitch: add Ethernet push and pop actions It's not allowed to push Ethernet header in front of another Ethernet header. It's not allowed to pop Ethernet header if there's a vlan tag. This preserves the invariant that L3 packet never has a vlan tag. Based on previous versions by Lorand Jakab and Simon Horman. Signed-off-by: Lorand Jakab <lojakab@cisco.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> [Committer notes] Fix build with the upstream commit by folding in the required switch case enum handlers. Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Joe Stringer <joe@ovn.org>	2017-03-02 15:51:39 -08:00
Ciara Loftus	4c30b24602	dpif-netdev: Conditional EMC insert Unconditional insertion of EMC entries results in EMC thrashing at high numbers of parallel flows. When this occurs, the performance of the EMC often falls below that of the dpcls classifier, rendering the EMC practically useless. Instead of unconditionally inserting entries into the EMC when a miss occurs, use a 1% probability of insertion. This ensures that the most frequent flows have the highest chance of creating an entry in the EMC, and the probability of thrashing the EMC is also greatly reduced. The probability of insertion is configurable, via the other_config:emc-insert-inv-prob option. This value sets the average probability of insertion to 1/emc-insert-inv-prob. For example the following command changes the insertion probability to (on average) 1 in every 20 packets ie. 1/20 ie. 5%. ovs-vsctl set Open_vSwitch . other_config:emc-insert-inv-prob=20 Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Georg Schmuecking <georg.schmuecking@ericsson.com> Co-authored-by: Georg Schmuecking <georg.schmuecking@ericsson.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2017-02-16 11:46:17 -08:00
Daniele Di Proietto	d4f6865c3f	dpif-netdev: Pass Openvswitch other_config smap to dpif. Currently we parse the 'other_config' column in Openvswitch table in bridge.c. We extract the values (just 'pmd-cpu-mask' for now) and we pass them down to the datapath, via different layers. If we want to pass other values to dpif-netdev.c (like we recently discussed) we would have to touch ofproto.c, ofproto-dpif.c and dpif.c. This patch sends the entire other_config column to dpif-netdev, so that dpif-netdev can extract the values it's interested in. No functional change. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ben Pfaff <blp@ovn.org>	2017-02-03 09:45:42 -08:00
Andy Zhou	72c84bc2db	dp-packet: Enhance packet batch APIs. One common use case of 'struct dp_packet_batch' is to process all packets in the batch in order. Add an iterator for this use case to simplify the logic of calling sites. Another common use case is to drop packets in the batch, by reading all packets, but writing back pointers of fewer packets. Add macros to support this use case. Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>	2017-01-26 17:35:29 -08:00
Andy Zhou	535e3acfa7	dpif-netdev: Add clone action Add support for userspace datapath clone action. The clone action provides an action envelope to enclose an action list. For example, with actions A, B, C and D, and an action list: A, clone(B, C), D The clone action will ensure that: - D will see the same packet, and any meta states, such as flow, as action B. - D will be executed regardless whether B, or C drops a packet. They can only drop a clone. - When B drops a packet, clone will skip all remaining actions within the clone envelope. This feature is useful when we add meter action later: The meter action can be implemented as a simple action without its own envelope (unlike the sample action). When necessary, the flow translation layer can enclose a meter action in clone. The clone action is very similar to the OpenFlow clone action. This is by design to simplify vswitchd flow translation logic. Without datapath clone, vswitchd simulates the effect by inserting datapath actions to "undo" clone actions. The above flow will be translated into A, B, C, -C, -B, D. However, there are two issues: - The resulting datapath action list may be longer without using clone. - Some actions, such as NAT may not be possible to reverse. This patch implements clone() simply with packet copy. The performance can be improved with later patches, for example, to delay or avoid packet copy if possible. It seems datapath should have enough context to carry out such optimization without the userspace context. Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>	2017-01-23 22:58:34 -08:00
Andy Zhou	0526761391	dpif-netdev: Avoid sending probe packets When ofproto probes for datapath features, no packets should actually be sent to the network. This patch fixes the userspace by dropping probe packets before action execution. Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>	2017-01-23 22:58:07 -08:00
nickcooper-zhangtonghao	aeff7d9886	dpif-netdev: Avoids repeated addition of DP_STAT_LOST. CC: Daniele Di Proietto <diproiettod@vmware.com> Fixes: 8aaa125dab66 ("dpif-netdev: Share emc and fast path output batches.") Signed-off-by: nickcooper-zhangtonghao <nic@opencloud.tech> Acked-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2017-01-16 18:53:05 -08:00
Daniele Di Proietto	e32971b8dd	dpif-netdev: Centralized threads and queues handling code. Currently we have three different code paths that deal with pmd threads and queues, in response to different input 1. When a port is added 2. When a port is deleted 3. When the cpumask changes or a port must be reconfigured. 1. and 2. are carefully written to minimize disruption to the running datapath, while 3. brings down all the threads, reconfigures all the ports and restarts everything. This commit removes the three separate code paths by introducing the reconfigure_datapath() function, that takes care of adapting the pmd threads and queues to the current datapath configuration, no matter how we got there. This aims at simplifying maintenance and introduces a long overdue improvement: port reconfiguration (can happen quite frequently for dpdkvhost ports) is now done without shutting down the whole datapath, but just by temporarily removing the port that needs to be reconfigured (while the rest of the datapath is running). We now also recompute the rxq scheduling from scratch every time a port is added or deleted. This means that the queues will be more balanced, especially when dealing with explicit rxq-affinity from the user (without shutting down the threads and restarting them), but it also means that adding or deleting a port might cause existing queues to be moved between pmd threads. This negative effect can be avoided by taking into account the existing distribution when computing the new scheduling, but I considered code clarity and fast reconfiguration more important than optimizing port addition or removal (a port is added and removed only once, but can be reconfigured many times) Lastly, this commit moves the pmd threads state away from ovs-numa. Now the pmd threads state is kept only in dpif-netdev. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Co-authored-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2017-01-15 19:25:12 -08:00
Daniele Di Proietto	947dc56767	dpif-netdev: Use hmap for poll_list in pmd threads. A future commit will use this to determine if a queue is already contained in a pmd thread. To keep the behavior unaltered we now have to sort queues before printing them in pmd_info_show_rxq(). Also this commit introduces 'struct polled_queue' that will be used exclusively in the fast path, uses 'struct dp_netdev_rxq' from 'struct rxq_poll' and uses 'rx' for 'netdev_rxq' and 'rxq' for 'dp_netdev_rxq'. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2017-01-15 19:25:12 -08:00
Daniele Di Proietto	f5d317a156	dpctl: Avoid making assumptions on pmd threads. Currently dpctl depends on ovs-numa module to delete and create flows on different pmd threads for pmd devices. The next commits will move away the pmd threads state from ovs-numa to dpif-netdev, so the ovs-numa interface will not be supported. Also, the assignment between ports and thread is an implementation detail of dpif-netdev, dpctl shouldn't know anything about it. This commit changes the dpif_flow_put() and dpif_flow_del() calls to iterate over all the pmd threads, if pmd_id is PMD_ID_NULL. A simple test is added. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2017-01-15 19:25:12 -08:00
Daniele Di Proietto	82d765f6f8	dpif-netdev: Make 'static_tx_qid' const. Since previous commit, 'static_tx_qid' doesn't need to be atomic and is actually never touched (except for initialization), so it can be made const. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2017-01-15 19:25:11 -08:00
Daniele Di Proietto	b9584f2122	dpif-netdev: Create pmd threads for every numa node. A lot of the complexity in the code that handles pmd threads and ports in dpif-netdev is due to the fact that we postpone the creation of pmd threads on a numa node until we have a port that needs to be polled on that particular node. Since the previous commit, a pmd thread with no ports will not consume any CPU, so it seems easier to create all the threads at once. This will also make future commits easier. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2017-01-15 19:25:11 -08:00
Daniele Di Proietto	2788a1b138	dpif-netdev: Block pmd threads if there are no ports. There's no reason for a pmd thread to perform its main loop if there are no queues in its poll_list. This commit introduces a seq object on which the pmd thread can be blocked, if there are no queues. When the main thread wants to reload a pmd thread it must now change the seq object (in case it's blocked) and set 'reload' to true. This is useful to avoid wasting CPU cycles and is also necessary for a future commit. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2017-01-15 19:25:11 -08:00
Daniele Di Proietto	14e3e12ac3	dpif-netdev: Use a boolean instead of pmd->port_seq. There's no need for a sequence number, since the main thread has to wait for the pmd thread, so there's no chance that an update will be undetected. A seq object will be introduced for another purpose in the next commit, and changing this to boolean makes the code more readable. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2017-01-15 19:25:11 -08:00
Daniele Di Proietto	57eebbb4c3	dpif-netdev: Don't try to output on a device without txqs. Tunnel devices have 0 txqs and don't support netdev_send(). While netdev_send() simply returns EOPNOTSUPP, the XPS logic is still executed on output, and that might be confused by devices with no txqs. It seems better to have different structures in the fast path for ports that support netdev_{push,pop}_header (tunnel devices), and ports that support netdev_send. With this we can also remove a branch in netdev_send(). This is also necessary for a future commit, which starts DPDK devices without txqs. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2017-01-15 19:25:11 -08:00
Daniele Di Proietto	febf4a7a87	dpif-netdev: Take non_pmd_mutex to access tx cached ports. As documented in dp_netdev_pmd_thread, we must take non_pmd_mutex to access the tx port caches for the non pmd thread. Found by inspection. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2017-01-15 19:25:11 -08:00
Daniele Di Proietto	7c26997257	dpif-netdev: Fix memory leak. We keep all the per-port classifiers around, since they can be reused, but when a pmd thread is destroyed we should free them. Found using valgrind. Fixes: 3453b4d62a98("dpif-netdev: dpcls per in_port with sorted subtables") Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ben Pfaff <blp@ovn.org>	2017-01-15 19:25:11 -08:00
Jarno Rajahalme	f4b835bb0f	dpcls: Avoid one 8-byte chunk in subtable mask. This patch allows skipping the 8-byte chunk comprising dp_hash and in_port in the subtable mask when dp_hash is wildcarded. This will slightly speed up the hash computation as one expensive function call to hash_add64() can be skipped. For each new netdev flow we wildcard in_port in the mask, so in the typical case where dp_hash is also wildcarded, the resulting 8-byte chunk will not be part of the subtable mask. This manipulation of the mask is possible as the datapath classifier is explicitly selected based on the in_port value, so that all the datapath flows in the selected classifier have an exact match on that in_port value. Given this, it is safe to ignore the in_port value when doing a lookup in the chosen classifier. Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com> Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> Co-authored-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Co-authored-by: Jarno Rajahalme <jarno@ovn.org>	2017-01-10 14:11:02 -08:00
nickcooper-zhangtonghao	51c37a56d7	dpif-netdev: Uses the OVS_CORE_UNSPEC instead of magic numbers. This patch uses OVS_CORE_UNSPEC instead of "-1" for unpinned queues. More importantly, "-1" cast to unsigned int is equal to NON_PMD_CORE_ID, so this patch makes the two values distinct. Signed-off-by: nickcooper-zhangtonghao <nic@opencloud.tech> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2017-01-08 18:16:06 -08:00
Daniele Di Proietto	0f6a066f63	dpif: Return ENODEV from dpif_port_query_by_*() if there's no port. bridge_delete_or_reconfigure() deletes every interface that's not dumped by OFPROTO_PORT_FOR_EACH(). ofproto_dpif.c:port_dump_next(), used by OFPROTO_PORT_FOR_EACH, checks if the ofport is in the datapath by calling port_query_by_name(). If port_query_by_name() returns an error, the dump is interrupted. If port_query_by_name() returns ENODEV, the device doesn't exist and the dump can continue. port_query_by_name() for the userspace datapath returns ENOENT instead of ENODEV. This is expected by dpif_port_query_by_name(), but it's not handled correctly by port_dump_next(). dpif-netdev handles reconfiguration errors for an interface by deleting it from the datapath, so it's possible that a device is missing. When this happens we must make sure that port_dump_next() continues to dump other devices, otherwise they will be deleted and the two layers will have an inconsistent view. This commit fixes the problem by returning ENODEV from the userspace datapath if the port doesn't exist, and by documenting this clearly in the dpif interfaces. The problem was found while developing new code. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ben Pfaff <blp@ovn.org>	2017-01-06 15:12:44 -08:00
nickcooper-zhangtonghao	1dea14357f	ovs-vswitchd: Avoid segfault for "netdev" datapath. When the datapath, whose type is "netdev", processes packets in a userspace action, it may cause a segmentation fault. In dp_execute_userspace_action(), we pass NULL as the "wc" argument to dp_netdev_upcall(). In the dp_netdev_upcall() call tree, "wc" will be used. For example, dp_netdev_upcall() uses &wc->masks for debugging, and flow_wildcards_init_for_packet() uses "wc" if we disable megaflow, which is described in more detail below. Segmentation fault in flow_wildcards_init_for_packet: #0 0x0000000000468fe8 flow_wildcards_init_for_packet lib/flow.c:1275 #1 0x0000000000436c0b upcall_cb ofproto/ofproto-dpif-upcall.c:1231 #2 0x000000000045bd96 dp_netdev_upcall lib/dpif-netdev.c:3857 #3 0x0000000000461bf3 dp_execute_userspace_action lib/dpif-netdev.c:4388 #4 dp_execute_cb lib/dpif-netdev.c:4521 #5 0x0000000000486ae2 odp_execute_actions lib/odp-execute.c:538 #6 0x00000000004607f9 dp_netdev_execute_actions lib/dpif-netdev.c:4627 #7 packet_batch_per_flow_execute lib/dpif-netdev.c:3927 #8 dp_netdev_input__ lib/dpif-netdev.c:4229 #9 0x0000000000460ba8 dp_netdev_input lib/dpif-netdev.c:4238 #10 dp_netdev_process_rxq_port lib/dpif-netdev.c:2873 #11 0x000000000046126e dpif_netdev_run lib/dpif-netdev.c:3000 #12 0x000000000042baf5 type_run ofproto/ofproto-dpif.c:504 #13 0x00000000004192bf ofproto_type_run ofproto/ofproto.c:1687 #14 0x0000000000409965 bridge_run__ vswitchd/bridge.c:2875 #15 0x000000000040f145 bridge_run vswitchd/bridge.c:2938 #16 0x00000000004062e5 main vswitchd/ovs-vswitchd.c:111 Signed-off-by: nickcooper-zhangtonghao <nic@opencloud.tech> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-12-09 10:43:27 -08:00
Joe Stringer	8611f9a468	lib: Use nl_attr_get_odp_port(). This helper is a little tidier than the alternative. Use it treewide. Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Simon Horman <simon.horman@netronome.com>	2016-11-16 11:53:50 -08:00
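
Commit 51c37a56d7 above notes that "-1" cast to unsigned int equals NON_PMD_CORE_ID. The following is a minimal sketch of that collision, not code from the OVS tree. The macro names and values are assumptions made for illustration: the sketch supposes the non-pmd id is UINT32_MAX and the "unspecified core" sentinel is INT_MAX, and it assumes a 32-bit unsigned int. The helper collides_with_non_pmd() is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-ins for illustration only; check the OVS headers for the
 * real definitions of NON_PMD_CORE_ID and OVS_CORE_UNSPEC. */
#define SKETCH_NON_PMD_CORE_ID UINT32_MAX
#define SKETCH_CORE_UNSPEC     INT_MAX

/* Hypothetical helper: returns true if 'core_id', once converted to
 * unsigned int, is indistinguishable from the non-pmd thread's id. */
static bool
collides_with_non_pmd(int core_id)
{
    return (unsigned int) core_id == SKETCH_NON_PMD_CORE_ID;
}
```

Under these assumptions, collides_with_non_pmd(-1) is true, because converting -1 to unsigned int yields UINT_MAX, while collides_with_non_pmd(SKETCH_CORE_UNSPEC) is false. That is the distinction the commit message describes between an unpinned queue and the non-pmd thread.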

576 Commits