mir/ovs - ovs - Mike's Git repositories

mirror of https://github.com/openvswitch/ovs synced 2025-08-29 13:27:59 +00:00

Author	SHA1	Message	Date
Sriharsha Basavapatna via dev	57924fc91c	revalidator: Rebalance offloaded flows based on the pps rate This is the third patch in the patch-set to support dynamic rebalancing of offloaded flows. The dynamic rebalancing functionality is implemented in this patch. The ukeys that are not scheduled for deletion are obtained and passed as input to the rebalancing routine. The rebalancing is done in the context of revalidation leader thread, after all other revalidator threads are done with gathering rebalancing data for flows. For each netdev that is in OOR state, a list of flows - both offloaded and non-offloaded (pending) - is obtained using the ukeys. For each netdev that is in OOR state, the flows are grouped and sorted into offloaded and pending flows. The offloaded flows are sorted in descending order of pps-rate, while pending flows are sorted in ascending order of pps-rate. The rebalancing is done in two phases. In the first phase, we try to offload all pending flows and if that succeeds, the OOR state on the device is cleared. If some (or none) of the pending flows could not be offloaded, then we start replacing an offloaded flow that has a lower pps-rate than a pending flow, until there are no more pending flows with a higher rate than an offloaded flow. The flows that are replaced from the device are added into kernel datapath. A new OVS configuration parameter "offload-rebalance" is added to ovsdb. The default value of this is "false". To enable this feature, set the value of this parameter to "true", which provides packets-per-second rate based policy to dynamically offload and un-offload flows. Note: This option can be enabled only when 'hw-offload' policy is enabled. It also requires 'tc-policy' to be set to 'skip_sw'; otherwise, flow offload errors (specifically ENOSPC error this feature depends on) reported by an offloaded device are suppressed by TC-Flower kernel module. 
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Co-authored-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Reviewed-by: Sathya Perla <sathya.perla@broadcom.com> Reviewed-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-10-19 11:27:52 +02:00
Ilya Maximets	35fe9efb2f	dpif-netdev: Add vlan to mask for flow_put operation. Datapath flows in dpif-netdev classifier always have exact match mask set for vlan. We have to enable it for flow_put operation too in order to avoid flow modification failure due to classifier lookup with wrong hash. Found by OFtest. CC: Jan Scheurich <jan.scheurich@ericsson.com> Fixes: beb75a40fdc2 ("userspace: Switching of L3 packets in L2 pipeline") Reported-by: Ben Pfaff <blp@ovn.org> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2018-September/352579.html Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-10-09 10:26:39 -07:00
Kevin Traynor	e77c97b9d6	dpif-netdev: Add round-robin based rxq to pmd assignment. Prior to OVS 2.9 automatic assignment of Rxqs to PMDs (i.e. CPUs) was done by round-robin. That was changed in OVS 2.9 to ordering the Rxqs based on their measured processing cycles. This was to assign the busiest Rxqs to different PMDs, improving aggregate throughput. For the most part the new scheme should be better, but there could be situations where a user prefers a simple round-robin scheme because Rxqs from a single port are more likely to be spread across multiple PMDs, and/or traffic is very bursty/unpredictable. Add 'pmd-rxq-assign' config to allow a user to select round-robin based assignment. Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-09-14 11:45:05 +01:00
Gavi Teitz	a692410af0	dpctl: Expand the flow dump type filter Added new types to the flow dump filter, and allowed multiple filter types to be passed at once, as a comma separated list. The new types added are: * tc - specifies flows handled by the tc dp * non-offloaded - specifies flows not offloaded to the HW * all - specifies flows of all types The type list is now fully parsed by the dpctl, and a new struct was added to dpif which enables dpctl to define which types of dumps to provide, rather than passing the type string and having dpif parse it. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Acked-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-09-13 16:56:25 +02:00
Gavi Teitz	0d6b401cf6	dpif-netdev: Initialize dpif_flow attrs In a previous commit, the dpif_flow struct was expanded, with the 'offloaded' field being moved into a new struct which also includes a field for the dp layer the flow is handled on. The initialization of these fields was only done in dpif-netlink. This completes that commit, by initializing the fields in dpif-netdev as well. As the 'offloaded' field was previously ignored by dpif-netdev, the attrs are initialized to the default values of 'false' for the offloaded state, and 'ovs' for the dp layer. Fixes: d63ca5329ff9 ("dpctl: Properly reflect a rule's offloaded to HW state") Signed-off-by: Gavi Teitz <gavi@mellanox.com> Acked-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2018-09-13 16:56:25 +02:00
Justin Pettit	866bc7567a	dpif-netdev: Prevent unsafe access when retrieving meter stats. dpif_netdev_meter_get() retrieved a pointer to a meter entry without holding a lock. It's possible that another thread could have deleted that entry between retrieving the pointer and dereferencing the pointer. This makes the function hold the lock the entire time the meter entry is needed. Found by inspection. Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Flavio Leitner <fbl@sysclose.org>	2018-09-04 13:36:37 -07:00
Justin Pettit	d0db81eac8	dpif-netdev: Don't check if xcalloc() failed when creating meter. xcalloc() can't return null. Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Ben Pfaff <blp@ovn.org>	2018-09-04 13:36:37 -07:00
Vishal Deep Ajmera	9b4f08cdca	dpif-netdev: Avoid reordering of packets in a batch with same megaflow OVS reads packets in batches from a given port and packets in the batch are subjected to potentially 3 levels of lookups to identify the datapath megaflow entry (or flow) associated with the packet. Each megaflow entry has a dedicated buffer in which packets that match the flow classification criteria are collected. This buffer helps OVS perform batch processing for all packets associated with a given flow. Each packet in the received batch is first subjected to lookup in the Exact Match Cache (EMC). Each EMC entry will point to a flow. If the EMC lookup is successful, the packet is moved from the rx batch to the per-flow buffer. Packets that did not match any EMC entry are rearranged in the rx batch at the beginning and are now subjected to a lookup in the megaflow cache. Packets that match a megaflow cache entry are appended to the per-flow buffer. Packets that do not match any megaflow entry are subjected to slow-path processing through the upcall mechanism. This cannot change the order of packets as by definition upcall processing is only done for packets without matching megaflow entry. The EMC entry match fields encompass all potentially significant header fields, typically more than specified in the associated flow's match criteria. Hence, multiple EMC entries can point to the same flow. Given that per-flow batching happens at each lookup stage, packets belonging to the same megaflow can get re-ordered because some packets match EMC entries while others do not. The following example can illustrate the issue better. Consider following batch of packets (labelled P1 to P8) associated with a single TCP connection and associated with a single flow. Let us assume that packets with just the ACK bit set in TCP flags have been received in a prior batch also and a corresponding EMC entry exists. 1. P1 (TCP Flag: ACK) 2. P2 (TCP Flag: ACK) 3. P3 (TCP Flag: ACK) 4. 
P4 (TCP Flag: ACK, PSH) 5. P5 (TCP Flag: ACK) 6. P6 (TCP Flag: ACK) 7. P7 (TCP Flag: ACK) 8. P8 (TCP Flag: ACK) The megaflow classification criteria does not include TCP flags while the EMC match criteria does. Thus, all packets other than P4 match the existing EMC entry and are moved to the per-flow packet batch. Subsequently, packet P4 is moved to the same per-flow packet batch as a result of the megaflow lookup. Though the packets have all been correctly classified as being associated with the same flow, the packet order has not been preserved because of the per-flow batching performed during the EMC lookup stage. This packet re-ordering has performance implications for TCP applications. This patch preserves the packet ordering by performing the per-flow batching after both the EMC and megaflow lookups are complete. As an optimization, packets are flow-batched in emc processing till any packet in the batch has an EMC miss. A new flow map is maintained to keep the original order of packet along with flow information. Post fastpath processing, packets from flow map are appended to per-flow buffer. Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com> Co-authored-by: Venkatesan Pradeep <venkatesan.pradeep@ericsson.com> Signed-off-by: Venkatesan Pradeep <venkatesan.pradeep@ericsson.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-08-27 17:48:23 +01:00
Yi-Hung Wei	cd015a11c2	dpif: Support conntrack zone limit. This patch defines the dpif interface to support conntrack per zone limit. Basically, OVS users can use this interface to set, delete, and get the conntrack per zone limit for various dpif interfaces. The following patch will make use of the proposed interface to implement the feature. Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Justin Pettit <jpettit@ovn.org>	2018-08-17 09:30:55 -07:00
Justin Pettit	8101f03fcd	dpif: Don't pass in '*meter_id' to meter_set commands. The original intent of the API appears to be that the underlying DPIF implementation would choose a local meter id. However, neither of the existing datapath meter implementations (userspace or Linux) implemented that; they expected a valid meter id to be passed in, otherwise they returned an error. This commit follows the existing implementations and makes the API somewhat cleaner. Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2018-08-16 10:20:52 -07:00
Ilya Maximets	18e08953cf	dpif-netdev: Fix zero length keys insertion to EMC. 'key.len' should be calculated before inserting to EMC, otherwise resulting entry will match with any packet with the same hash. CC: Yipeng Wang <yipeng1.wang@intel.com> Fixes: 60d8ccae135f ("dpif-netdev: Add SMC cache after EMC cache") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Yipeng Wang <yipeng1.wang@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-08-08 22:06:21 +01:00
Justin Pettit	6508c845ad	dpif: Move common meter checks into the dpif layer. Another dpif provider will soon add support for meters, so move some of the common sanity checks up into the dpif layer so that each provider doesn't need to re-implement them. Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2018-07-30 13:00:49 -07:00
Justin Pettit	f603d7d262	Revert "dpif-netdev: Use compatible function type to fix broken build." Commit ab15e70eb587 ("dpctl: Expand the flow dump type filter") will be reverted, which this patch fixed, so it needs to be reverted as well. This reverts commit b10ac772218afd4f296db866f6b80258e1d1ca8a. CC: Gavi Teitz <gavi@mellanox.com> CC: Simon Horman <simon.horman@netronome.com> CC: Roi Dayan <roid@mellanox.com> CC: Aaron Conole <aconole@redhat.com> Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2018-07-25 14:17:33 -07:00
Aaron Conole	b10ac77221	dpif-netdev: Use compatible function type to fix broken build. The dpif_provder flow_dump_create function signature was changed, but the netdev dpif was not updated along with it. This generated a build error with the following warnings: libtool: compile: gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I ./include -I ./include -I ./lib -I ./lib -Wstrict-prototypes -Wall -Wextra -Wno-sign-compare -Wpointer-arith -Wformat -Wformat-security -Wswitch-enum -Wunused-parameter -Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition -Wmissing-prototypes -Wmissing-field-initializers -fno-strict-aliasing -Wshadow -Wno-null-pointer-arithmetic -Werror -Werror -g -O2 -MT lib/dpif-netdev.lo -MD -MP -MF lib/.deps/dpif-netdev.Tpo -c lib/dpif-netdev.c -o lib/dpif-netdev.o lib/dpif-netdev.c:6812:5: error: initialization from incompatible pointer type [-Werror] dpif_netdev_flow_dump_create, ^ lib/dpif-netdev.c:6812:5: error: (near initialization for 'dpif_netdev_class.flow_dump_create') [-Werror] Fixes: ab15e70eb587 ("dpctl: Expand the flow dump type filter") Cc: Gavi Teitz <gavi@mellanox.com> Cc: Roi Dayan <roid@mellanox.com> Cc: Simon Horman <simon.horman@netronome.com> Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-07-25 11:35:23 -07:00
Yipeng Wang	60d8ccae13	dpif-netdev: Add SMC cache after EMC cache This patch adds a signature match cache (SMC) after exact match cache (EMC). The difference between SMC and EMC is SMC only stores a signature of a flow thus it is much more memory efficient. With same memory space, EMC can store 8k flows while SMC can store 1M flows. It is generally beneficial to turn on SMC but turn off EMC when traffic flow count is much larger than EMC size. SMC cache will map a signature to a dp_netdev_flow index in flow_table. Thus, we add two new APIs in cmap for lookup key by index and lookup index by key. For now, SMC is an experimental feature that is turned off by default. One can turn it on using ovsdb options. Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com> Co-authored-by: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Acked-by: Billy O'Mahony <billy.o.mahony@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-07-24 17:01:03 +01:00
Justin Pettit	425a7b9eaf	dpif-netdev: Fix a couple of comments for dp_netdev_run_meter(). Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2018-07-06 14:23:27 -07:00
Yuanhan Liu	02bb2824e5	dpif-netdev: do hw flow offload in a thread Currently, the major trigger for hw flow offload is at upcall handling, which is actually in the datapath. Moreover, the hw offload installation and modification is not that lightweight. Meaning, if there are many flows being added or modified frequently, it could stall the datapath, which could result in packet loss. To diminish that, all those flow operations will be recorded and appended to a list. A thread is then introduced to process this list (to do the real flow offloading put/del operations). This could leave the datapath as lightweight as possible. Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org> Co-authored-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-07-06 10:32:52 +01:00
Yuanhan Liu	aab96ec4d8	dpif-netdev: retrieve flow directly from the flow mark So that we could skip some very costly CPU operations, including but not limited to miniflow_extract, emc lookup, dpcls lookup, etc. Thus, performance could be greatly improved. A PHY-PHY forwarding with 1000 mega flows (udp,tp_src=1000-1999) and 1 million streams (tp_src=1000-1999, tp_dst=2000-2999) shows more than a 260% performance boost. Note that though the heavy miniflow_extract is skipped, we still have to do per packet checking, because we have to check the tcp_flags. Co-authored-by: Finn Christensen <fc@napatech.com> Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org> Signed-off-by: Finn Christensen <fc@napatech.com> Co-authored-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-07-06 10:32:52 +01:00
Yuanhan Liu	241bad15d9	dpif-netdev: associate flow with a mark id Most modern NICs have the ability to bind a flow with a mark, so that every packet that matches such a flow will have that mark present in its descriptor. The basic idea of doing that is, when we receive packets later, we could directly get the flow from the mark. That could avoid some very costly CPU operations, including (but not limited to) miniflow_extract, emc lookup, dpcls lookup, etc. Thus, performance could be greatly improved. Thus, the major work of this patch is to associate a flow with a mark id (a uint32_t number). The association in netdev datapath is done by CMAP, while in hardware it's done by the rte_flow MARK action. One tricky thing in OVS-DPDK is that the flow tables are per-PMD. For the case there is only one phys port but with 2 queues, there could be 2 PMDs. In other words, even for a single mega flow (i.e. udp,tp_src=1000), there could be 2 different dp_netdev flows, one for each PMD. That could result in the same mega flow being offloaded twice in the hardware; worse, we may get 2 different marks and only the last one will work. To avoid that, a megaflow_to_mark CMAP is created. An entry will be added for the first PMD that wants to offload a flow. Later PMDs will see that such a megaflow is already offloaded, so the flow will not be offloaded to HW twice. Meanwhile, the mark to flow mapping becomes a 1:N mapping. That is what the mark_to_flow CMAP is for. When the first PMD wants to offload a flow, it allocates a new mark and performs the flow offload by reusing the ->flow_put method. When it succeeds, a "mark to flow" entry will be added. Later PMDs will get the corresponding mark from the above megaflow_to_mark CMAP. Then, another "mark to flow" entry will be added. 
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org> Co-authored-by: Finn Christensen <fc@napatech.com> Signed-off-by: Finn Christensen <fc@napatech.com> Co-authored-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-07-06 10:32:52 +01:00
Ben Pfaff	5a0e4aec1a	treewide: Convert leading tabs to spaces. It's always been OVS coding style to use spaces rather than tabs for indentation, but some tabs have snuck in over time. This commit converts them to spaces. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>	2018-06-11 15:32:00 -07:00
Ben Pfaff	fa37affad3	Embrace anonymous unions. Several OVS structs contain embedded named unions, like this: struct { ... union { ... } u; }; C11 standardized a feature that many compilers already implemented anyway, where an embedded union may be unnamed, like this: struct { ... union { ... }; }; This is more convenient because it allows the programmer to omit "u." in many places. OVS already used this feature in several places. This commit embraces it in several others. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org> Tested-by: Alin Gabriel Serdean <aserdean@ovn.org> Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>	2018-05-25 13:36:05 -07:00
Ilya Maximets	47e1b3b625	dpif-netdev: Free packets on TUNNEL_PUSH if should_steal. Unconditional return may cause packet leak in case of 'should_steal == true'. Additionally, removed redundant checking for depth level. CC: Sugesh Chandran <sugesh.chandran@intel.com> Fixes: 7c12dfc527a5 ("tunneling: Avoid datapath-recirc by combining recirc actions at xlate.") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokess <ian.stokes@intel.com>	2018-05-25 09:09:50 +01:00
Eelco Chaudron	606f665072	netdev-dpdk: Don't use PMD driver if not configured successfully When initialization of the DPDK PMD driver fails (dpdk_eth_dev_init()), the reconfigure_datapath() function will remove the port from dp_netdev, and the port is not used. Now when bridge_reconfigure() is called again, no changes to the previous failing netdev configuration are detected and therefore the ports gets added to dp_netdev and used uninitialized. This is causing exceptions... The fix has two parts to it. First in netdev-dpdk.c we remember if the DPDK port was started or not, and when calling netdev_dpdk_reconfigure() we also try re-initialization if the port was not already active. The second part of the change is in dpif-netdev.c where it makes sure netdev_reconfigure() is called if the port needs reconfiguration, as netdev_is_reconf_required() is only true until netdev_reconfigure() is called (even if it fails). Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Tested-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-05-25 09:09:50 +01:00
Darrell Ball	7d7ded7af7	odp-execute: Rename 'may_steal' to 'should_steal'. Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-05-23 11:36:47 -07:00
Jan Scheurich	7178fefbdf	dpif-netdev: Detection and logging of suspicious PMD iterations This patch enhances dpif-netdev-perf to detect iterations with suspicious statistics according to the following criteria: - iteration lasts longer than US_THR microseconds (default 250). This can be used to capture events where a PMD is blocked or interrupted for such a period of time that there is a risk for dropped packets on any of its Rx queues. - max vhost qlen exceeds a threshold Q_THR (default 128). This can be used to infer virtio queue overruns and dropped packets inside a VM, which are not visible in OVS otherwise. Such suspicious iterations can be logged together with their iteration statistics to be able to correlate them to packet drop or other events outside OVS. A new command is introduced to enable/disable logging at run-time and to adjust the above thresholds for suspicious iterations: ovs-appctl dpif-netdev/pmd-perf-log-set on | off [-b before] [-a after] [-e|-ne] [-us usec] [-q qlen] Turn logging on or off at run-time (on|off). -b before: The number of iterations before the suspicious iteration to be logged (default 5). -a after: The number of iterations after the suspicious iteration to be logged (default 5). -e: Extend logging interval if another suspicious iteration is detected before logging occurs. -ne: Do not extend logging interval (default). -q qlen: Suspicious vhost queue fill level threshold. Increase this to 512 if the Qemu supports 1024 virtio queue length. (default 128). -us usec: change the duration threshold for a suspicious iteration (default 250 us). Note: Logging of suspicious iterations itself consumes a considerable amount of processing cycles of a PMD which may be visible in the iteration history. In the worst case this can lead OVS to detect another suspicious iteration caused by logging. 
If more than 100 iterations around a suspicious iteration have been logged once, OVS falls back to the safe default values (-b 5/-a 5/-ne) to prevent logging itself from causing continuous further logging. Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Acked-by: Billy O'Mahony <billy.o.mahony@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-05-11 08:08:24 +01:00
Jan Scheurich	79f368756c	dpif-netdev: Detailed performance stats for PMDs This patch instruments the dpif-netdev datapath to record detailed statistics of what is happening in every iteration of a PMD thread. The collection of detailed statistics can be controlled by a new Open_vSwitch configuration parameter "other_config:pmd-perf-metrics". By default it is disabled. The run-time overhead, when enabled, is in the order of 1%. The covered metrics per iteration are: - cycles - packets - (rx) batches - packets/batch - max. vhostuser qlen - upcalls - cycles spent in upcalls This raw recorded data is used threefold: 1. In histograms for each of the following metrics: - cycles/iteration (log.) - packets/iteration (log.) - cycles/packet - packets/batch - max. vhostuser qlen (log.) - upcalls - cycles/upcall (log) The histogram bins are divided linearly or logarithmically. 2. A cyclic history of the above statistics for 999 iterations 3. A cyclic history of the cumulative/average values per millisecond wall clock for the last 1000 milliseconds: - number of iterations - avg. cycles/iteration - packets (Kpps) - avg. packets/batch - avg. max vhost qlen - upcalls - avg. cycles/upcall The gathered performance metrics can be printed at any time with the new CLI command ovs-appctl dpif-netdev/pmd-perf-show [-nh] [-it iter_len] [-ms ms_len] [-pmd core] [dp] The options are -nh: Suppress the histograms -it iter_len: Display the last iter_len iteration stats -ms ms_len: Display the last ms_len millisecond stats -pmd core: Display only the specified PMD The performance statistics are reset with the existing dpif-netdev/pmd-stats-clear command. 
The output always contains the following global PMD statistics, similar to the pmd-stats-show command: Time: 15:24:55.270 Measurement duration: 1.008 s pmd thread numa_id 0 core_id 1: Cycles: 2419034712 (2.40 GHz) Iterations: 572817 (1.76 us/it) - idle: 486808 (15.9 % cycles) - busy: 86009 (84.1 % cycles) Rx packets: 2399607 (2381 Kpps, 848 cycles/pkt) Datapath passes: 3599415 (1.50 passes/pkt) - EMC hits: 336472 ( 9.3 %) - Megaflow hits: 3262943 (90.7 %, 1.00 subtbl lookups/hit) - Upcalls: 0 ( 0.0 %, 0.0 us/upcall) - Lost upcalls: 0 ( 0.0 %) Tx packets: 2399607 (2381 Kpps) Tx batches: 171400 (14.00 pkts/batch) Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Acked-by: Billy O'Mahony <billy.o.mahony@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-05-11 08:08:24 +01:00
Jan Scheurich	8492adc270	netdev: Add optional qfill output parameter to rxq_recv() If the caller provides a non-NULL qfill pointer and the netdev implementation supports reading the rx queue fill level, the rxq_recv() function returns the remaining number of packets in the rx queue after reception of the packet burst to the caller. If the implementation does not support this, it returns -ENOTSUP instead. Reading the remaining queue fill level should not substantially slow down the recv() operation. A first implementation is provided for ethernet and vhostuser DPDK ports in netdev-dpdk.c. This output parameter will be used in the upcoming commit for PMD performance metrics to supervise the rx queue fill level for DPDK vhostuser ports. Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Acked-by: Billy O'Mahony <billy.o.mahony@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-05-11 08:08:24 +01:00
Ben Pfaff	f825fdd4ff	flow: Improve type-safety of MINIFLOW_GET_TYPE. Until now, this macro has blindly read the passed-in type's size, but that's unnecessarily risky. This commit changes it to verify that the passed-in type is the same size as the field and, on GCC and Clang, that the types are compatible. It also adds a version that does not check, for the one case where (currently) we deliberately read the wrong size, and updates a few uses to use more precise field names. Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com> Reviewed-by: Armando Migliaccio <armamig@gmail.com>	2018-03-31 11:31:51 -07:00
Justin Pettit	97bf8f478d	Don't shadow iterator values. Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2018-02-28 14:53:29 -08:00
Justin Pettit	e883448e3f	dp-packet: Add index to DP_PACKET_BATCH_FOR_EACH to prevent shadowing. Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2018-02-28 14:53:27 -08:00
Yi-Hung Wei	271e48a0e2	conntrack: Support conntrack flush by ct 5-tuple This patch adds support of flushing a conntrack entry specified by the conntrack 5-tuple in dpif-netdev. Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Darrell Ball <dlu998@gmail.com>	2018-02-14 13:59:09 -08:00
Ben Pfaff	0d71302e36	ofp-util, ofp-parse: Break up into many separate modules. ofp-util had been far too large and monolithic for a long time. This commit breaks it up into units that make some logical sense. It also moves the pieces of ofp-parse that were specific to each unit into the relevant unit. Most of this commit is just moving code around. Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>	2018-02-13 10:43:13 -08:00
Eric Garver	1fe178d251	dpif: Add support for OVS_ACTION_ATTR_CT_CLEAR This supports using the ct_clear action in the kernel datapath. To preserve compatibility with current ct_clear behavior on old kernels, we only pass this action down to the datapath if a probe reveals the datapath actually supports it. Signed-off-by: Eric Garver <e@erig.me> Acked-by: William Tu <u9012063@gmail.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Justin Pettit <jpettit@ovn.org>	2018-01-20 11:16:37 -08:00
Kevin Traynor	2a2c67b435	dpif-netdev: Add percentage of pmd/core used by each rxq. It is based on the length of history that is stored about an rxq (currently 1 min). $ ovs-appctl dpif-netdev/pmd-rxq-show pmd thread numa_id 0 core_id 4: isolated : false port: dpdkphy1 queue-id: 0 pmd usage: 70 % port: dpdkvhost0 queue-id: 0 pmd usage: 0 % pmd thread numa_id 0 core_id 6: isolated : false port: dpdkphy0 queue-id: 0 pmd usage: 64 % port: dpdkvhost1 queue-id: 0 pmd usage: 0 % These values are what would be used as part of rxq to pmd assignment due to a reconfiguration event e.g. adding pmds, adding rxqs or with the command: ovs-appctl dpif-netdev/pmd-rxq-rebalance Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Co-authored-by: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-01-17 18:11:28 +00:00
Kevin Traynor	4f5d13e241	dpif-netdev: Reset the rxq current cycle counter on reload. An rxq may have processing cycles counted in the current counter when a reload happens. That could temporarily create a small skew on the stats for an rxq. Reset the counter after reload. Fixes: 4809891b2e01 ("dpif-netdev: Count the rxq processing cycles for an rxq.") Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-01-17 18:11:28 +00:00
Ilya Maximets	c71ea3c4a7	dpif-netdev: Time based output batching. This allows to collect packets from more than one RX burst and send them together with a configurable intervals. 'other_config:tx-flush-interval' can be used to configure time that a packet can wait in output batch for sending. 'tx-flush-interval' has microsecond resolution. Tested-by: Jan Scheurich <jan.scheurich@ericsson.com> Acked-by: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-01-17 18:11:28 +00:00
Ilya Maximets	58ed6df048	dpif-netdev: Count cycles on per-rxq basis. Upcoming time-based output batching will allow collecting packets from different RX queues in a single output batch. Let's keep the list of RX queues for each output packet and collect cycles for them on send. Tested-by: Jan Scheurich <jan.scheurich@ericsson.com> Acked-by: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-01-17 18:11:28 +00:00
Ilya Maximets	05f9e707e1	dpif-netdev: Use microsecond granularity. Upcoming time-based output batching will require microsecond granularity for its flexible configuration. Acked-by: Jan Scheurich <jan.scheurich@ericsson.com> Acked-by: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-01-17 18:11:28 +00:00
Jan Scheurich	a19896abe5	dpif-netdev: Refactor cycle counting Simplify the historically grown TSC cycle counting in PMD threads. Cycles are currently counted for the following purposes: 1. Measure PMD utilization: PMD utilization is defined as the ratio of cycles spent in busy iterations (at least one packet received or sent) over the total number of cycles. This is already done in pmd_perf_start_iteration() and pmd_perf_end_iteration() based on a TSC timestamp saved in current iteration at start_iteration() and the actual TSC at end_iteration(). No dependency on intermediate cycle accounting. 2. Measure the processing load per RX queue: This comprises cycles spent on polling and processing packets received from the rx queue and the cycles spent on delayed sending of these packets to tx queues (with time-based batching). The previous scheme using cycles_count_start(), cycles_count_intermediate() and cycles_count_end(), originally introduced to simplify cycle counting and save calls to rte_get_tsc_cycles(), was rather obscuring things. Replace it with a nestable cycle_timer with start and stop functions that bracket a code segment to be timed. The timed code may contain arbitrary nested cycle_timers. The duration of nested timers is excluded from the outer timer. The caller must ensure that each call to cycle_timer_start() is followed by a call to cycle_timer_end(). Failure to do so will lead to assertion failure or a memory leak. The new cycle_timer is used to measure the processing cycles per rx queue. This is not yet strictly necessary but will be made use of in a subsequent commit. All cycle count functions and data are relocated to module dpif-netdev-perf. Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Billy O'Mahony <billy.o.mahony@intel.com> Signed-off: Ian Stokes <ian.stokes@intel.com>	2018-01-17 18:11:28 +00:00
Jan Scheurich	82a48ead4e	dpif-netdev: Refactor PMD performance into dpif-netdev-perf Add module dpif-netdev-perf to host all PMD performance-related data structures and functions in dpif-netdev. Refactor the PMD stats handling in dpif-netdev and delegate whatever possible into the new module, using clean interfaces to shield dpif-netdev from the implementation details. Accordingly, all PMD statistics members are moved from the main struct dp_netdev_pmd_thread into a dedicated member of type struct pmd_perf_stats. Include Darrell's prior refactoring of PMD stats contained in [PATCH v5,2/3] dpif-netdev: Refactor some pmd stats: 1. The cycles per packet counts are now based on packets received rather than packet passes through the datapath. 2. Packet counters are now kept for packets received and packets recirculated. These are kept as separate counters for maintainability reasons. The cost of incrementing these counters is negligible. These new counters are also displayed to the user. 3. A display statistic is added for the average number of datapath passes per packet. This should be useful for user debugging and understanding of packet processing. 4. The user visible 'miss' counter is used for successful upcalls, rather than the sum of successful and unsuccessful upcalls. Hence, this becomes what the user historically understands by OVS 'miss upcall'. The user display is annotated to make this clear as well. 5. The user visible 'lost' counter remains as failed upcalls, but is annotated to make it clear what the meaning is. 6. The enum pmd_stat_type is annotated to make the usage of the stats counters clear. 7. The subtable lookup stat is renamed to make it clear that it relates to masked lookups. 8. The PMD stats test is updated to handle the new user stats of packets received, packets recirculated and average number of datapath passes per packet. On top of that introduce a "-pmd <core>" option to the PMD info commands to filter the output for a single PMD. Made the pmd-stats-show output a bit more readable by adding a blank between colon and value. Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Co-authored-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Darrell Ball <dlu998@gmail.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Billy O'Mahony <billy.o.mahony@intel.com> Signed-off: Ian Stokes <ian.stokes@intel.com>	2018-01-17 18:11:28 +00:00
Darrell Ball	875075b362	dpctl conntrack: Add get number of connections. A get command is added for number of conntrack connections. This command is only supported in the userspace datapath at this time. Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com> Co-authored-by: Antonio Fischetti <antonio.fischetti@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-01-09 11:17:44 -08:00
Darrell Ball	c92339ad19	dpctl conntrack: Add get and set maxconns command. Get and set dpctl commands are added for conntrack maxconns. These commands are only supported in the userspace datapath at this time. Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com> Co-authored-by: Antonio Fischetti <antonio.fischetti@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-01-09 11:16:44 -08:00
Yi Yang	f59cb331c4	nsh: rework NSH netlink keys and actions This patch changes OVS_KEY_ATTR_NSH to nested attribute and adds three new NSH sub attribute keys: OVS_NSH_KEY_ATTR_BASE: for length-fixed NSH base header OVS_NSH_KEY_ATTR_MD1: for length-fixed MD type 1 context OVS_NSH_KEY_ATTR_MD2: for length-variable MD type 2 metadata Its intention is to align to NSH kernel implementation. NSH match fields, set and PUSH_NSH action all use the below nested attribute format: OVS_KEY_ATTR_NSH begin OVS_NSH_KEY_ATTR_BASE OVS_NSH_KEY_ATTR_MD1 OVS_KEY_ATTR_NSH end or OVS_KEY_ATTR_NSH begin OVS_NSH_KEY_ATTR_BASE OVS_NSH_KEY_ATTR_MD2 OVS_KEY_ATTR_NSH end In addition, NSH encap and decap actions are renamed as push_nsh and pop_nsh to meet action naming convention. Signed-off-by: Yi Yang <yi.y.yang@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-01-08 13:19:14 -08:00
Ben Pfaff	34944e81f0	Merge branch 'dpdk_merge' of https://github.com/istokes/ovs into HEAD	2018-01-02 07:45:17 -08:00
Ben Pfaff	b2befd5bb2	sparse: Add guards to prevent FreeBSD-incompatible #include order. FreeBSD insists that <sys/types.h> be included before <netinet/in.h> and that <netinet/in.h> be included before <arpa/inet.h>. This adds guards to the "sparse" headers to yield a warning if this order is violated. This commit also adjusts the order of many #includes to suit this requirement. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>	2017-12-22 12:58:02 -08:00
Ilya Maximets	cc4891f39d	dpif-netdev: Count sent packets and batches. New statistics for 'pmd-stats-show' command: average number of packets per output batch. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2017-12-20 21:07:46 +00:00
Ilya Maximets	b30896c969	netdev: Remove unused may_steal. Not needed anymore because 'may_steal' is already handled on the dpif-netdev layer and is always true. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2017-12-20 21:07:46 +00:00
Ilya Maximets	009e0033dc	dpif-netdev: Output packet batching. While processing an incoming batch of packets, they are scattered across many per-flow batches and sent separately. This becomes an issue when using more than a few flows. For example, if we have balanced-tcp OvS bonding with 2 ports, there will be 256 datapath internal flows for each dp_hash pattern. This will lead to scattering of a single received batch across all 256 of those per-flow batches and invoking send for each packet separately. This behaviour greatly degrades the overall performance of netdev_send because it cannot take advantage of vectorized transmit functions. But half (if 2 ports in bonding) of the datapath flows will have the same output actions. This means that we can collect them back in a single place and send them at once using a single call to netdev_send. This patch introduces a per-port packet batch for output packets for that purpose. The 'output_pkts' batch is thread local and located in the send port cache. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2017-12-20 21:07:46 +00:00
Ilya Maximets	b010be1760	dpif-netdev: Keep latest measured time for PMD thread. In the current implementation, the 'now' variable is updated once on each receive cycle and passed through the whole datapath via function arguments. It would be better to keep this variable inside the PMD thread structure to be able to get it at any time. Such a solution will save stack memory and simplify possible modifications to the current logic. This patch introduces a new structure 'dp_netdev_pmd_thread_ctx', contained by 'struct dp_netdev_pmd_thread', to store any processing context of this PMD thread. For now, only time and cycles are moved to that structure. It can be extended in the future. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2017-12-20 21:07:46 +00:00
Darrell Ball	bd7d93f8b4	conntrack: Allow specified alg port numbers. Algs can use variable control port numbers for servers. The main use case is a kind of feeble security measure; the thinking by some is that it obscures the alg traffic. It is really not very effective, but the kernel has this capability. This patch mimics the capability. Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Aaron Conole <aconole@redhat.com>	2017-12-11 14:14:11 -08:00
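
Commit a19896abe5 in the list above describes a nestable cycle timer in which the duration of nested timers is excluded from the outer timer. The following is a minimal C sketch of that accounting rule only, not the code from dpif-netdev-perf. The names `sketch_timer`, `sketch_timer_start()`, `sketch_timer_stop()` and the `fake_tsc` counter are hypothetical and chosen for illustration; a real PMD thread would read a hardware cycle counter instead of `fake_tsc`, and would keep the current-timer pointer per thread rather than in a global.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a hardware cycle counter (assumption). */
static uint64_t fake_tsc;

static uint64_t read_cycles(void)
{
    return fake_tsc;
}

/* One timer instance; instances form a stack through 'outer'. */
struct sketch_timer {
    uint64_t start;              /* Counter value when the timer started. */
    uint64_t nested;             /* Cycles consumed by nested timers. */
    struct sketch_timer *outer;  /* Enclosing timer, or NULL if outermost. */
};

/* Innermost running timer (a real design would keep this per thread). */
static struct sketch_timer *current_timer;

static void sketch_timer_start(struct sketch_timer *t)
{
    t->start = read_cycles();
    t->nested = 0;
    t->outer = current_timer;
    current_timer = t;
}

/* Stops 't' and returns its own cycles, excluding nested timers. */
static uint64_t sketch_timer_stop(struct sketch_timer *t)
{
    /* Timers must be stopped in reverse order of starting. */
    assert(current_timer == t);

    uint64_t total = read_cycles() - t->start;

    if (t->outer) {
        /* Charge the full span, nested time included, to the outer timer
         * so that the outer timer can subtract it later. */
        t->outer->nested += total;
    }
    current_timer = t->outer;
    return total - t->nested;
}
```

With an outer timer started at counter value 0, an inner timer running from 10 to 40, and the outer timer stopped at 50, the inner timer reports 30 cycles and the outer timer reports 20. Starting a timer without stopping it trips the assertion on the next mismatched stop, which mirrors the start/stop pairing requirement stated in the commit message.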

619 Commits