mir/ovs - ovs - Mike's Git repositories

mirror of https://github.com/openvswitch/ovs synced 2025-08-28 12:58:00 +00:00

Author	SHA1	Message	Date
Eelco Chaudron	edb66993f9	dpif-netdev-perf: Eliminate dead code. This patch eliminates a small piece of dead code. Fixes: 79f368756ce8 ("dpif-netdev: Detailed performance stats for PMDs") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com>	2024-08-29 11:28:41 +02:00
Kevin Traynor	3beff0a6b0	dpif-netdev-perf: Add metric averages when no iterations. pmd-perf-show with pmd-perf-metrics=true displays a histogram with averages. However, averages were not displayed when there are no iterations. They will be all zero so it is not hiding useful information but the stats look incomplete without them, especially when they are displayed for some PMD thread cores and not others. The histogram print is large and this is just an extra couple of lines, so might as well print them all the time to ensure that the user does not think there is something missing from the display. Before patch: Histograms cycles/it 499 0 716 0 1025 0 1469 0 <snip> After patch: Histograms cycles/it 499 0 716 0 1025 0 1469 0 <snip> --------------- cycles/it 0 Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-01-27 16:57:27 +01:00
Kevin Traynor	7db18054ff	dpif-netdev-perf: Remove not a number stat value. Some stats in pmd-perf-show don't check for divide by zero which results in not a number (-nan). This is a normal case for some of the stats when there are no Rx queues assigned to the PMD thread core. It is not obvious what -nan is to a user so add a check for divide by zero and set stat to 0 if present. Before patch: pmd thread numa_id 1 core_id 9: Iterations: 0 (-nan us/it) - Used TSC cycles: 0 ( 0.0 % of total cycles) - idle iterations: 0 ( -nan % of used cycles) - busy iterations: 0 ( -nan % of used cycles) After patch: pmd thread numa_id 1 core_id 9: Iterations: 0 (0.00 us/it) - Used TSC cycles: 0 ( 0.0 % of total cycles) - idle iterations: 0 ( 0.0 % of used cycles) - busy iterations: 0 ( 0.0 % of used cycles) Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-01-27 16:57:27 +01:00
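The divide-by-zero guard described in the commit above can be sketched as follows. This is a minimal illustration and not the code from the patch: the helper names `safe_percent` and `safe_ratio` are hypothetical, and only the behaviour (print 0 instead of -nan when the divisor is zero) is taken from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: percentage of 'part' within 'total'.
 * Returns 0.0 when 'total' is zero, so a PMD with no Rx queues
 * prints "0.0 %" instead of "-nan %". */
static double
safe_percent(uint64_t part, uint64_t total)
{
    assert(part <= total);   /* 'part' is expected to be a share of 'total'. */
    return total ? 100.0 * (double) part / (double) total : 0.0;
}

/* Hypothetical helper: 'num' per unit of 'den' (e.g. us per iteration).
 * Returns 0.0 when 'den' is zero. */
static double
safe_ratio(double num, uint64_t den)
{
    return den ? num / (double) den : 0.0;
}
```

With zero iterations both helpers return 0, which corresponds to the "After patch" output shown in the commit message.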
Kevin Traynor	de3bbdc479	dpif-netdev: Add PMD load based sleeping. Sleep for an incremental amount of time if none of the Rx queues assigned to a PMD have at least half a batch of packets (i.e. 16 pkts) on a polling iteration of the PMD. Upon detecting the threshold of >= 16 pkts on an Rxq, reset the sleep time to zero (i.e. no sleep). Sleep time will be increased on each iteration where the low load conditions remain up to a total of the max sleep time which is set by the user e.g: ovs-vsctl set Open_vSwitch . other_config:pmd-maxsleep=500 The default pmd-maxsleep value is 0, which means that no sleeps will occur and the default behaviour is unchanged from previously. Also add new stats to pmd-perf-show to get visibility of operation e.g. ... - sleep iterations: 153994 ( 76.8 % of iterations) Sleep time (us): 9159399 ( 59 us/iteration avg.) ... Reviewed-by: Robin Jarry <rjarry@redhat.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-01-12 18:56:05 +01:00
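The sleep policy described in the commit above (grow the sleep while every Rx queue stays under half a batch, reset to zero once any queue reaches 16 packets, cap at pmd-maxsleep) could look roughly like the sketch below. The function name `next_sleep_us` and the 10 us step are assumptions made for illustration; the commit message does not state the increment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HALF_BATCH_PKTS 16   /* Threshold quoted in the commit message. */
#define SLEEP_STEP_US   10   /* Assumed increment, not from the source. */

/* Hypothetical helper: computes the sleep time for the next iteration
 * from the per-Rxq packet counts seen in the current one. */
static uint64_t
next_sleep_us(uint64_t cur_sleep_us, uint64_t max_sleep_us,
              const int *rxq_pkts, int n_rxq)
{
    assert(rxq_pkts != NULL || n_rxq == 0);

    for (int i = 0; i < n_rxq; i++) {
        if (rxq_pkts[i] >= HALF_BATCH_PKTS) {
            return 0;        /* Load detected: stop sleeping. */
        }
    }

    /* Low load: sleep a bit longer, but never beyond the user's maximum. */
    uint64_t next = cur_sleep_us + SLEEP_STEP_US;
    return next > max_sleep_us ? max_sleep_us : next;
}
```

With the default maximum of 0 the result is always 0, which matches the statement that the default behaviour is unchanged.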
Ilya Maximets	e7e9973b80	dpif-netdev: Forwarding optimization for flows with a simple match. There are cases where users might want simple forwarding or drop rules for all packets received from a specific port, e.g.: "in_port=1,actions=2" "in_port=2,actions=IN_PORT" "in_port=3,vlan_tci=0x1234/0x1fff,actions=drop" "in_port=4,actions=push_vlan:0x8100,set_field:4196->vlan_vid,output:3" There are also cases where complex OpenFlow rules can be simplified down to datapath flows with very simple match criteria. In theory, for very simple forwarding, OVS doesn't need to parse packets at all in order to follow these rules. "Simple match" lookup optimization is intended to speed up packet forwarding in these cases. Design: Due to various implementation constraints userspace datapath has following flow fields always in exact match (i.e. it's required to match at least these fields of a packet even if the OF rule doesn't need that): - recirc_id - in_port - packet_type - dl_type - vlan_tci (CFI + VID) - in most cases - nw_frag - for ip packets Not all of these fields are related to packet itself. We already know the current 'recirc_id' and the 'in_port' before starting the packet processing. It also seems safe to assume that we're working with Ethernet packets. So, for the simple OF rule we need to match only on 'dl_type', 'vlan_tci' and 'nw_frag'. 'in_port', 'dl_type', 'nw_frag' and 13 bits of 'vlan_tci' can be combined in a single 64bit integer (mark) that can be used as a hash in hash map. We are using only VID and CFI from the 'vlan_tci', flows that need to match on PCP will not qualify for the optimization. Workaround for matching on non-existence of vlan updated to match on CFI and VID only in order to qualify for the optimization. CFI is always set by OVS if vlan is present in a packet, so there is no need to match on PCP in this case. 'nw_frag' takes 2 bits of PCP inside the simple match mark. 
New per-PMD flow table 'simple_match_table' introduced to store simple match flows only. 'dp_netdev_flow_add' adds flow to the usual 'flow_table' and to the 'simple_match_table' if the flow meets following constraints: - 'recirc_id' in flow match is 0. - 'packet_type' in flow match is Ethernet. - Flow wildcards contains only minimal set of non-wildcarded fields (listed above). If the number of flows for current 'in_port' in a regular 'flow_table' equals number of flows for current 'in_port' in a 'simple_match_table', we may use simple match optimization, because all the flows we have are simple match flows. This means that we only need to parse 'dl_type', 'vlan_tci' and 'nw_frag' to perform packet matching. Now we make the unique flow mark from the 'in_port', 'dl_type', 'nw_frag' and 'vlan_tci' and looking for it in the 'simple_match_table'. On successful lookup we don't need to run full 'miniflow_extract()'. Unsuccessful lookup technically means that we have no suitable flow in the datapath and upcall will be required. So, in this case EMC and SMC lookups are disabled. We may optimize this path in the future by bypassing the dpcls lookup too. Performance improvement of this solution on a 'simple match' flows should be comparable with partial HW offloading, because it parses same packet fields and uses similar flow lookup scheme. However, unlike partial HW offloading, it works for all port types including virtual ones. Performance results when compared to EMC: Test setup: virtio-user OVS virtio-user Testpmd1 ------------> pmd1 ------------> Testpmd2 (txonly) x<------ pmd2 <------------ (mac swap) Single stream of 64byte packets. Actions: in_port=vhost0,actions=vhost1 in_port=vhost1,actions=vhost0 Stats collected from pmd1 and pmd2, so there are 2 scenarios: Virt-to-Virt : Testpmd1 ------> pmd1 ------> Testpmd2. Virt-to-NoCopy : Testpmd2 ------> pmd2 --->x Testpmd1. 
Here the packet sent from pmd2 to Testpmd1 is always dropped, because the virtqueue is full since Testpmd1 is in txonly mode and doesn't receive any packets. This should be closer to the performance of a VM-to-Phy scenario. Test performed on machine with Intel Xeon CPU E5-2690 v4 @ 2.60GHz. Table below represents improvement in throughput when compared to EMC.
+----------------+------------------------+------------------------+
|                |    Default (-g -O2)    | "-Ofast -march=native" |
|   Scenario     +------------+-----------+------------+-----------+
|                |     GCC    |   Clang   |     GCC    |   Clang   |
+----------------+------------+-----------+------------+-----------+
| Virt-to-Virt   |   +18.9%   |   +25.5%  |   +10.8%   |   +16.7%  |
| Virt-to-NoCopy |   +24.3%   |   +33.7%  |   +14.9%   |   +22.0%  |
+----------------+------------+-----------+------------+-----------+
For Phy-to-Phy case performance improvement should be even higher, but it's not the main use-case for this functionality. Performance difference for the non-simple flows is within a margin of error. Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-01-07 20:32:20 +01:00
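The 64-bit "simple match mark" described in the commit above combines 'in_port', 'dl_type', 'nw_frag' and the 13 VID+CFI bits of 'vlan_tci'. A minimal sketch of such a packing follows. The exact bit layout, the function name `simple_match_mark` and the use of host byte order are assumptions for illustration; the commit message only states which fields are combined and that 'nw_frag' occupies two of the PCP bit positions.

```c
#include <assert.h>
#include <stdint.h>

#define VID_CFI_MASK 0x1fffu  /* 12 VID bits plus the CFI bit. */
#define PCP_SHIFT    13       /* PCP occupies the top 3 bits of the TCI. */

/* Hypothetical packing (layout assumed):
 *   bits 63..32  in_port
 *   bits 31..16  dl_type
 *   bits 14..13  nw_frag (placed where two of the PCP bits would be)
 *   bits 12..0   vlan_tci, VID and CFI only */
static uint64_t
simple_match_mark(uint32_t in_port, uint16_t dl_type,
                  uint8_t nw_frag, uint16_t vlan_tci)
{
    assert(nw_frag <= 3);     /* Only two bits are available. */

    return ((uint64_t) in_port << 32)
           | ((uint64_t) dl_type << 16)
           | ((uint64_t) (nw_frag & 0x3) << PCP_SHIFT)
           | (uint64_t) (vlan_tci & VID_CFI_MASK);
}
```

Because the PCP bits are masked out, two packets that differ only in VLAN priority produce the same mark, which is why flows matching on PCP cannot use this lookup.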
Harry van Haaren	dc39608d2a	dpif/stats: Add miniflow extract opt hits counter This commit adds a new counter to be displayed to the user when requesting datapath packet statistics. It counts the number of packets that are parsed and a miniflow built up from it by the optimized miniflow extract parsers. The ovs-appctl command "dpif-netdev/pmd-perf-show" now has an extra entry indicating if the optimized MFEX was hit: - MFEX Opt hits: 6786432 (100.0 %) Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 11:31:14 +01:00
Cian Ferriter	d76a719a7a	dpif-netdev: Add a partial HWOL PMD statistic. It is possible for packets traversing the userspace datapath to match a flow before hitting on EMC by using a mark ID provided by a NIC. Add a PMD statistic for this hit. Signed-off-by: Cian Ferriter <cian.ferriter@intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-09 17:13:55 +01:00
Lance Yang	96445fa4d9	dpif-netdev-perf: Fix using of uninitialized last_tsc. When compiling Open vSwitch on aarch64, the compiler will warn about an uninitialized variable in lib/dpif-netdev-perf.c. If the clock_gettime function in rdtsc_syscall fails, the member last_tsc of the uninitialized struct will be returned. In order to avoid the warnings, it is necessary to initialize the variable before using. Reviewed-by: Yanqin Wei <Yanqin.Wei@arm.com> Reviewed-by: Malvika Gupta <Malvika.Gupta@arm.com> Signed-off-by: Lance Yang <Lance.Yang@arm.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2019-12-13 18:45:10 +01:00
Ilya Maximets	1276e3db89	dpif-netdev-perf: Fix TSC frequency for non-DPDK case. Unlike 'rte_get_tsc_cycles()' which doesn't need any specific initialization, 'rte_get_tsc_hz()' could be used only after a successful call to 'rte_eal_init()'. 'rte_eal_init()' estimates the TSC frequency for later use by 'rte_get_tsc_hz()'. Fairly said, we're not allowed to use 'rte_get_tsc_cycles()' before initializing DPDK too, but it works this way for now and provides correct results. This patch provides TSC frequency estimation code that will be used in two cases: * DPDK is not compiled in, i.e. DPDK_NETDEV not defined. * DPDK compiled in but not initialized, i.e. other_config:dpdk-init=false This change is mostly useful for AF_XDP netdev support, i.e. allows to use dpif-netdev/pmd-perf-show command and various PMD perf metrics. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Acked-by: William Tu <u9012063@gmail.com>	2019-09-06 11:45:39 +03:00
Ilya Maximets	7f26e4114d	dpif-netdev-perf: Fix millisecond stats precision with slower TSC. Unlike x86 where TSC frequency usually matches with CPU frequency, other architectures could have much slower TSCs. For example, it's common for Arm SoCs to have 100 MHz TSC by default. In this case perf module will check for end of current millisecond each 10K cycles, i.e. 10 times per millisecond. This may not be enough to collect precise statistics. Fix that by taking current TSC frequency into account instead of hardcoding the number of cycles. CC: Jan Scheurich <jan.scheurich@ericsson.com> Fixes: 79f368756ce8 ("dpif-netdev: Detailed performance stats for PMDs") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2019-03-22 21:52:25 +00:00
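The arithmetic behind the commit above can be checked with a short sketch. A fixed interval of 10K cycles yields only 10 checks per millisecond on a 100 MHz TSC, while deriving the interval from the TSC frequency keeps the check rate constant. Both function names and the `checks_per_ms` parameter are hypothetical; the commit message does not give the constant used in the actual fix.

```c
#include <assert.h>
#include <stdint.h>

/* How many end-of-millisecond checks a fixed cycle interval gives. */
static uint64_t
checks_per_ms_fixed(uint64_t tsc_hz, uint64_t fixed_cycles)
{
    assert(fixed_cycles > 0);
    return tsc_hz / 1000 / fixed_cycles;
}

/* Hypothetical alternative: derive the interval from the TSC frequency
 * so that the number of checks per millisecond stays the same. */
static uint64_t
check_interval_cycles(uint64_t tsc_hz, uint64_t checks_per_ms)
{
    assert(checks_per_ms > 0);
    uint64_t interval = tsc_hz / 1000 / checks_per_ms;
    return interval ? interval : 1;   /* Never return a zero interval. */
}
```

For a 2.4 GHz TSC, 10K cycles corresponds to 240 checks per millisecond; keeping that rate on a 100 MHz TSC requires an interval of about 416 cycles.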
Ilya Maximets	2e97b8419c	dpif-netdev-perf: Fix double update of perf histograms. Real values of 'packets per batch' and 'cycles per upcall' already added to histograms in 'dpif-netdev' on receive. Adding the averages makes statistics wrong. We should not add to histograms values that never really appeared. For example, in current code following situation is possible: pmd thread numa_id 0 core_id 5: ... Rx packets: 83 (0 Kpps, 13873 cycles/pkt) ... - Upcalls: 3 ( 3.6 %, 248.6 us/upcall) Histograms packets/it pkts/batch upcalls/it cycles/upcall 1 83 1 166 1 3 ... 15848 2 19952 2 ... 50118 2 i.e. all the packets counted twice in 'pkts/batch' column and all the upcalls counted twice in 'cycles/upcall' column. CC: Jan Scheurich <jan.scheurich@ericsson.com> Fixes: 79f368756ce8 ("dpif-netdev: Detailed performance stats for PMDs") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2019-03-19 10:31:37 +00:00
Ilya Maximets	6794be1dee	dpif-netdev-perf: Clarify frequency number. 'dpif-netdev/pmd-perf-show' command prints the frequency number calculated from the total number of cycles spent for iterations for the measured period. This number could be confusing, because users may think that it should be equal to CPU frequency, especially on non-x86 systems where TSC frequency likely does not match with CPU one. Moreover, counted TSC cycles could differ from the HW TSC cycles in case of a large number of PMD reloads, because cycles spent outside of the main polling loop are not taken into account anywhere. In this case the frequency will not match even TSC frequency. Let's clarify the meaning in order to avoid this misunderstanding. 'Cycles' replaced with 'Used TSC cycles', which describes how many TSC cycles consumed by the main polling loop. % of the total TSC cycles now printed instead of GHz frequency, because GHz is unclear for understanding, especially without knowing the exact TSC frequency. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-10-12 15:22:54 +01:00
Ilya Maximets	21e9b77b88	dpif-netdev-perf: Print SMC statistics. Printing of the SMC hits missed in the 'dpif-netdev/pmd-perf-show' appctl command. CC: Yipeng Wang <yipeng1.wang@intel.com> Fixes: 60d8ccae135f ("dpif-netdev: Add SMC cache after EMC cache") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Yipeng Wang <yipeng1.wang@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-10-12 11:40:28 +01:00
Alin Gabriel Serdean	12028f5f4c	dpif-netdev-perf: Fix linker unresolved symbols on Windows MSVC complains: "libopenvswitch.lib(dpif-netdev.obj) : error LNK2019: unresolved external symbol pmd_perf_start_iteration referenced in function pmd_thread_main libopenvswitch.lib(dpif-netdev.obj) : error LNK2019: unresolved external symbol pmd_perf_end_iteration referenced in function pmd_thread_main" Remove inline keyword from the declaration of: `pmd_perf_start_iteration` and `pmd_perf_end_iteration` More on the subject: https://docs.microsoft.com/en-us/cpp/error-messages/tool-errors/function-inlining-problems Fixes: broken build on Windows Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2018-05-15 15:17:56 +03:00
Jan Scheurich	7178fefbdf	dpif-netdev: Detection and logging of suspicious PMD iterations This patch enhances dpif-netdev-perf to detect iterations with suspicious statistics according to the following criteria: - iteration lasts longer than US_THR microseconds (default 250). This can be used to capture events where a PMD is blocked or interrupted for such a period of time that there is a risk for dropped packets on any of its Rx queues. - max vhost qlen exceeds a threshold Q_THR (default 128). This can be used to infer virtio queue overruns and dropped packets inside a VM, which are not visible in OVS otherwise. Such suspicious iterations can be logged together with their iteration statistics to be able to correlate them to packet drop or other events outside OVS. A new command is introduced to enable/disable logging at run-time and to adjust the above thresholds for suspicious iterations: ovs-appctl dpif-netdev/pmd-perf-log-set on | off [-b before] [-a after] [-e|-ne] [-us usec] [-q qlen] Turn logging on or off at run-time (on|off). -b before: The number of iterations before the suspicious iteration to be logged (default 5). -a after: The number of iterations after the suspicious iteration to be logged (default 5). -e: Extend logging interval if another suspicious iteration is detected before logging occurs. -ne: Do not extend logging interval (default). -q qlen: Suspicious vhost queue fill level threshold. Increase this to 512 if the Qemu supports 1024 virtio queue length. (default 128). -us usec: change the duration threshold for a suspicious iteration (default 250 us). Note: Logging of suspicious iterations itself consumes a considerable amount of processing cycles of a PMD which may be visible in the iteration history. In the worst case this can lead OVS to detect another suspicious iteration caused by logging. 
If more than 100 iterations around a suspicious iteration have been logged once, OVS falls back to the safe default values (-b 5/-a 5/-ne) to avoid that logging itself causes continuous further logging. Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Acked-by: Billy O'Mahony <billy.o.mahony@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-05-11 08:08:24 +01:00
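The two criteria in the commit above (iteration longer than US_THR microseconds, or max vhost queue length above Q_THR) can be expressed as a small predicate. This is a sketch under assumptions: the function name `iteration_is_suspicious`, its parameters and the strict "greater than" comparisons are illustrative and follow the wording of the commit message rather than the actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEFAULT_US_THR 250   /* Default duration threshold, in microseconds. */
#define DEFAULT_Q_THR  128   /* Default vhost queue fill level threshold. */

/* Hypothetical predicate: flags an iteration whose duration (measured in
 * TSC cycles) or maximum vhost queue length exceeds its threshold. */
static bool
iteration_is_suspicious(uint64_t iter_cycles, uint64_t tsc_hz,
                        uint32_t max_vhost_qlen,
                        uint32_t us_thr, uint32_t q_thr)
{
    assert(tsc_hz > 0);

    /* Convert the microsecond threshold into TSC cycles. */
    uint64_t thr_cycles = tsc_hz * us_thr / 1000000;

    return iter_cycles > thr_cycles || max_vhost_qlen > q_thr;
}
```

On a 2.4 GHz TSC the default 250 us threshold corresponds to 600000 cycles.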
Jan Scheurich	79f368756c	dpif-netdev: Detailed performance stats for PMDs This patch instruments the dpif-netdev datapath to record detailed statistics of what is happening in every iteration of a PMD thread. The collection of detailed statistics can be controlled by a new Open_vSwitch configuration parameter "other_config:pmd-perf-metrics". By default it is disabled. The run-time overhead, when enabled, is in the order of 1%. The covered metrics per iteration are: - cycles - packets - (rx) batches - packets/batch - max. vhostuser qlen - upcalls - cycles spent in upcalls This raw recorded data is used threefold: 1. In histograms for each of the following metrics: - cycles/iteration (log.) - packets/iteration (log.) - cycles/packet - packets/batch - max. vhostuser qlen (log.) - upcalls - cycles/upcall (log.) The histograms bins are divided linear or logarithmic. 2. A cyclic history of the above statistics for 999 iterations 3. A cyclic history of the cumulative/average values per millisecond wall clock for the last 1000 milliseconds: - number of iterations - avg. cycles/iteration - packets (Kpps) - avg. packets/batch - avg. max vhost qlen - upcalls - avg. cycles/upcall The gathered performance metrics can be printed at any time with the new CLI command ovs-appctl dpif-netdev/pmd-perf-show [-nh] [-it iter_len] [-ms ms_len] [-pmd core] [dp] The options are -nh: Suppress the histograms -it iter_len: Display the last iter_len iteration stats -ms ms_len: Display the last ms_len millisecond stats -pmd core: Display only the specified PMD The performance statistics are reset with the existing dpif-netdev/pmd-stats-clear command. 
The output always contains the following global PMD statistics, similar to the pmd-stats-show command: Time: 15:24:55.270 Measurement duration: 1.008 s pmd thread numa_id 0 core_id 1: Cycles: 2419034712 (2.40 GHz) Iterations: 572817 (1.76 us/it) - idle: 486808 (15.9 % cycles) - busy: 86009 (84.1 % cycles) Rx packets: 2399607 (2381 Kpps, 848 cycles/pkt) Datapath passes: 3599415 (1.50 passes/pkt) - EMC hits: 336472 ( 9.3 %) - Megaflow hits: 3262943 (90.7 %, 1.00 subtbl lookups/hit) - Upcalls: 0 ( 0.0 %, 0.0 us/upcall) - Lost upcalls: 0 ( 0.0 %) Tx packets: 2399607 (2381 Kpps) Tx batches: 171400 (14.00 pkts/batch) Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Acked-by: Billy O'Mahony <billy.o.mahony@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-05-11 08:08:24 +01:00
Jan Scheurich	82a48ead4e	dpif-netdev: Refactor PMD performance into dpif-netdev-perf Add module dpif-netdev-perf to host all PMD performance-related data structures and functions in dpif-netdev. Refactor the PMD stats handling in dpif-netdev and delegate whatever possible into the new module, using clean interfaces to shield dpif-netdev from the implementation details. Accordingly, all PMD statistics members are moved from the main struct dp_netdev_pmd_thread into a dedicated member of type struct pmd_perf_stats. Include Darrell's prior refactoring of PMD stats contained in [PATCH v5,2/3] dpif-netdev: Refactor some pmd stats: 1. The cycles per packet counts are now based on packets received rather than packet passes through the datapath. 2. Packet counters are now kept for packets received and packets recirculated. These are kept as separate counters for maintainability reasons. The cost of incrementing these counters is negligible. These new counters are also displayed to the user. 3. A display statistic is added for the average number of datapath passes per packet. This should be useful for user debugging and understanding of packet processing. 4. The user visible 'miss' counter is used for successful upcalls, rather than the sum of successful and unsuccessful upcalls. Hence, this becomes what user historically understands by OVS 'miss upcall'. The user display is annotated to make this clear as well. 5. The user visible 'lost' counter remains as failed upcalls, but is annotated to make it clear what the meaning is. 6. The enum pmd_stat_type is annotated to make the usage of the stats counters clear. 7. The subtable lookup stats is renamed to make it clear that it relates to masked lookups. 8. The PMD stats test is updated to handle the new user stats of packets received, packets recirculated and average number of datapath passes per packet. On top of that introduce a "-pmd <core>" option to the PMD info commands to filter the output for a single PMD. 
Made the pmd-stats-show output a bit more readable by adding a blank between colon and value. Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Co-authored-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Darrell Ball <dlu998@gmail.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Billy O'Mahony <billy.o.mahony@intel.com> Signed-off: Ian Stokes <ian.stokes@intel.com>	2018-01-17 18:11:28 +00:00

17 Commits