mir/ovs - ovs - Mike's Git repositories


mirror of https://github.com/openvswitch/ovs synced 2025-08-29 05:18:13 +00:00

Author	SHA1	Message	Date
Cian Ferriter	f32bebc42c	dpif-avx512: Add support for simple match lookup. Perform scalar simple match lookup in AVX512 DPIF by reusing the simple match lookup functions. The simple match lookup is placed in a separate per-packet loop before the batch miniflow extract call, since miniflow extract can be skipped when simple match is being used. An unsuccessful simple match lookup means an upcall is required because there is no suitable flow in the datapath. Fall back to the scalar DPIF to do this upcall, just like AVX512 DPIF already does later when there are misses in the DPCLS. Signed-off-by: Cian Ferriter <cian.ferriter@intel.com> Tested-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2022-07-12 13:31:45 +01:00
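The ordering described in f32bebc42c (simple match first, miniflow extract only afterwards, scalar fallback on a miss) can be sketched in plain C. This is a minimal scalar sketch under stated assumptions, not the OVS AVX512 code: `struct pkt`, `struct flow`, `simple_match_lookup()` and `simple_match_batch()` are hypothetical, the one-entry "table" stands in for a real hash map, and the sketch assumes the whole batch is handed back to the scalar DPIF on the first miss.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_MAX 32   /* Assumed maximum packets per batch. */

/* Hypothetical flow handle and per-packet key. */
struct flow { int id; };
struct pkt { uint64_t mark; };   /* Precomputed simple match mark. */

/* Hypothetical one-entry table standing in for a hash map lookup. */
static struct flow known_flow = { 7 };
static const uint64_t known_mark = UINT64_C(0x0000000108000000);

static const struct flow *
simple_match_lookup(uint64_t mark)
{
    return mark == known_mark ? &known_flow : NULL;
}

/* Per-packet simple match loop, run before miniflow extraction so that
 * MFEX can be skipped on hits.  Returns false on the first miss: no
 * datapath flow matches, so the caller should fall back to the scalar
 * DPIF, which performs the upcall. */
static bool
simple_match_batch(const struct pkt *pkts, size_t n,
                   const struct flow *flows[])
{
    assert(n <= BATCH_MAX);
    for (size_t i = 0; i < n; i++) {
        flows[i] = simple_match_lookup(pkts[i].mark);
        if (!flows[i]) {
            return false;
        }
    }
    return true;
}
```

Only when this loop returns true would the caller skip miniflow extract entirely; on false it mirrors the existing DPCLS-miss path by deferring to the scalar implementation.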
David Marchand	fe171e4f10	dpif-netdev: Refactor AVX512 runtime checks. As described in the bugzilla below, cpu_has_isa code may be compiled with some AVX512 instructions in it, because cpu.c is built as part of libopenvswitchavx512. This is a problem when this function (which is supposed to probe for AVX512 instruction availability) is invoked from generic OVS code on older CPUs that don't support them. For the same reason, dpcls_subtable_avx512_gather_probe, dp_netdev_input_outer_avx512_probe, mfex_avx512_probe and mfex_avx512_vbmi_probe are potential runtime bombs and can't be built as part of libopenvswitchavx512 either. Move cpu.c into the "normal" libopenvswitch, and move the other helpers into generic OVS code. Note: - dpcls_subtable_avx512_gather_probe is split in two, because it also needs to do its own magic, - while moving those helpers, prefer direct calls to cpu_has_isa and avoid casts to intermediate integer variables when a simple boolean is enough. Fixes: 352b6c7116cd ("dpif-lookup: add avx512 gather implementation.") Fixes: abb807e27dd4 ("dpif-netdev: Add command to switch dpif implementation.") Fixes: 250ceddcc2d0 ("dpif-netdev/mfex: Add AVX512 based optimized miniflow extract") Fixes: b366fa2f4947 ("dpif-netdev: Call cpuid for x86 isa availability.") Reported-at: https://bugzilla.redhat.com/2100393 Reported-by: Ales Musil <amusil@redhat.com> Co-authored-by: Ales Musil <amusil@redhat.com> Signed-off-by: Ales Musil <amusil@redhat.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Acked-by: Ales Musil <amusil@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-06-29 11:27:09 +02:00
Cian Ferriter	2080979aed	dpif-netdev-avx512: Fix overflow of UINT32_C(1). UINT64_C(1) is required in this bitshift since batch_size can be 32, and UINT32_C(1) << 32 overflows the 32-bit type. Fixes: ba0a2619ca0c ("dpif-netdev-avx512: Fix ubsan shift error in bitmasks.") Signed-off-by: Cian Ferriter <cian.ferriter@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-04-27 20:08:10 +02:00
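Both shift fixes (this one and the UBSan fix below it) are easy to see in isolation. In C, `1 << 31` overflows a signed `int`, and shifting a 32-bit value by 32 is undefined because the count equals the type's width. The helpers below, `packet_bit()` and `batch_mask()`, are hypothetical illustrations, not OVS functions; they show the per-packet bit done with `UINT32_C(1)` and the full-batch mask done in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Bit for packet index i (0..31).  UINT32_C(1) makes the operand
 * unsigned, so a shift by 31 no longer overflows a signed int. */
static inline uint32_t
packet_bit(unsigned int i)
{
    assert(i < 32);
    return UINT32_C(1) << i;
}

/* Mask with the low batch_size bits set.  With batch_size == 32 a
 * 32-bit operand would be shifted by its full width (undefined), so
 * the shift is done in 64 bits and the result narrowed afterwards. */
static inline uint32_t
batch_mask(unsigned int batch_size)
{
    assert(batch_size <= 32);
    return (uint32_t) ((UINT64_C(1) << batch_size) - 1);
}
```

For example, `batch_mask(32)` yields `0xFFFFFFFF` and `packet_bit(31)` yields `0x80000000`, with no undefined behaviour for UBSan to report.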
Harry van Haaren	5db8aa39d9	dpif-netdev-avx512: Fix ubsan shift error in bitmasks. The code changes here handle (1 << i) shifts, where 'i' is the packet index in the batch, and 1 << 31 is an overflow of the signed '1'. Fixed by wrapping the 1 in UINT32_C(), ensuring the compiler knows the 1 is unsigned (and 32 bits). The Undefined Behaviour Sanitizer is now happy with the bit-shifts at runtime. Suggested-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-04-27 00:36:28 +02:00
Ilya Maximets	e7e9973b80	dpif-netdev: Forwarding optimization for flows with a simple match. There are cases where users might want simple forwarding or drop rules for all packets received from a specific port, e.g.:
  "in_port=1,actions=2"
  "in_port=2,actions=IN_PORT"
  "in_port=3,vlan_tci=0x1234/0x1fff,actions=drop"
  "in_port=4,actions=push_vlan:0x8100,set_field:4196->vlan_vid,output:3"
There are also cases where complex OpenFlow rules can be simplified down to datapath flows with very simple match criteria. In theory, for very simple forwarding, OVS doesn't need to parse packets at all in order to follow these rules. The "simple match" lookup optimization is intended to speed up packet forwarding in these cases. Design: Due to various implementation constraints, the userspace datapath always keeps the following flow fields in exact match (i.e., it is required to match at least these fields of a packet even if the OF rule doesn't need them):
  - recirc_id
  - in_port
  - packet_type
  - dl_type
  - vlan_tci (CFI + VID) - in most cases
  - nw_frag - for IP packets
Not all of these fields are related to the packet itself. We already know the current 'recirc_id' and 'in_port' before starting packet processing. It also seems safe to assume that we're working with Ethernet packets. So, for a simple OF rule we need to match only on 'dl_type', 'vlan_tci' and 'nw_frag'. 'in_port', 'dl_type', 'nw_frag' and 13 bits of 'vlan_tci' can be combined into a single 64-bit integer (mark) that can be used as a hash in a hash map. We use only VID and CFI from 'vlan_tci'; flows that need to match on PCP will not qualify for the optimization. The workaround for matching on the non-existence of a VLAN was updated to match on CFI and VID only, so that such flows qualify for the optimization. CFI is always set by OVS if a VLAN is present in a packet, so there is no need to match on PCP in this case. 'nw_frag' takes 2 bits of PCP inside the simple match mark.
A new per-PMD flow table, 'simple_match_table', is introduced to store simple match flows only. 'dp_netdev_flow_add' adds a flow to the usual 'flow_table', and also to the 'simple_match_table' if the flow meets the following constraints:
  - 'recirc_id' in the flow match is 0.
  - 'packet_type' in the flow match is Ethernet.
  - Flow wildcards contain only the minimal set of non-wildcarded fields (listed above).
If the number of flows for the current 'in_port' in the regular 'flow_table' equals the number of flows for the current 'in_port' in the 'simple_match_table', we may use the simple match optimization, because all the flows we have are simple match flows. This means that we only need to parse 'dl_type', 'vlan_tci' and 'nw_frag' to perform packet matching. We then build the unique flow mark from 'in_port', 'dl_type', 'nw_frag' and 'vlan_tci' and look it up in the 'simple_match_table'. On a successful lookup we don't need to run the full 'miniflow_extract()'. An unsuccessful lookup technically means that we have no suitable flow in the datapath and an upcall will be required, so in this case EMC and SMC lookups are disabled. We may optimize this path in the future by bypassing the dpcls lookup too. The performance improvement of this solution for 'simple match' flows should be comparable with partial HW offloading, because it parses the same packet fields and uses a similar flow lookup scheme. However, unlike partial HW offloading, it works for all port types, including virtual ones. Performance results when compared to EMC:
Test setup:
               virtio-user      OVS      virtio-user
    Testpmd1  ------------>    pmd1    ------------>  Testpmd2
    (txonly)          x<------ pmd2    <------------  (mac swap)
Single stream of 64-byte packets. Actions:
  in_port=vhost0,actions=vhost1
  in_port=vhost1,actions=vhost0
Stats were collected from pmd1 and pmd2, so there are 2 scenarios:
  Virt-to-Virt   :  Testpmd1 ------> pmd1 ------> Testpmd2.
  Virt-to-NoCopy :  Testpmd2 ------> pmd2 --->x Testpmd1.
Here the packet sent from pmd2 to Testpmd1 is always dropped, because the virtqueue is full: Testpmd1 is in txonly mode and doesn't receive any packets. This should be closer to the performance of a VM-to-Phy scenario. Tests were performed on a machine with an Intel Xeon CPU E5-2690 v4 @ 2.60GHz. The table below shows the improvement in throughput when compared to EMC.
+----------------+------------------------+------------------------+
|                |    Default (-g -O2)    | "-Ofast -march=native" |
|    Scenario    +------------+-----------+------------+-----------+
|                |    GCC     |   Clang   |    GCC     |   Clang   |
+----------------+------------+-----------+------------+-----------+
| Virt-to-Virt   |   +18.9%   |  +25.5%   |   +10.8%   |  +16.7%   |
| Virt-to-NoCopy |   +24.3%   |  +33.7%   |   +14.9%   |  +22.0%   |
+----------------+------------+-----------+------------+-----------+
For the Phy-to-Phy case the performance improvement should be even higher, but it's not the main use case for this functionality. The performance difference for non-simple flows is within the margin of error. Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-01-07 20:32:20 +01:00
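The mark construction and the two eligibility checks in e7e9973b80 can be sketched as below. This is an illustrative sketch, not the OVS source: the bit layout is a hypothetical host-byte-order one chosen to follow the description (in_port, dl_type, 2 bits of nw_frag placed in the PCP slot, then CFI and the 12-bit VID), whereas the real implementation may arrange bits differently per endianness. `struct flow_summary`, `struct port_flow_counts` and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 802.1Q TCI fields, host byte order. */
#define VLAN_VID_MASK  0x0fffu  /* 12-bit VLAN ID. */
#define VLAN_CFI       0x1000u  /* CFI bit, set by OVS when a VLAN is present. */
#define VLAN_PCP_SHIFT 13       /* PCP occupies bits 13..15 of the TCI. */
#define NW_FRAG_BITS   0x3u     /* nw_frag is a 2-bit field. */

/* Hypothetical host-order layout of the 64-bit simple match mark:
 *   bits 32..63  in_port
 *   bits 16..31  dl_type
 *   bits 13..14  nw_frag (stored in the unused PCP slot)
 *   bit  12      CFI
 *   bits  0..11  VID
 * PCP is dropped, so flows matching on PCP cannot use this scheme. */
static inline uint64_t
simple_match_mark(uint32_t in_port, uint16_t dl_type,
                  uint8_t nw_frag, uint16_t vlan_tci)
{
    assert(nw_frag <= NW_FRAG_BITS);
    return ((uint64_t) in_port << 32)
           | ((uint64_t) dl_type << 16)
           | ((uint64_t) (nw_frag & NW_FRAG_BITS) << VLAN_PCP_SHIFT)
           | (uint64_t) (vlan_tci & (VLAN_VID_MASK | VLAN_CFI));
}

/* Hypothetical summary of a flow, used to decide whether it is also
 * inserted into the simple match table. */
struct flow_summary {
    uint32_t recirc_id;
    bool is_ethernet;          /* packet_type is Ethernet. */
    bool only_minimal_fields;  /* No fields beyond the mandatory set. */
};

static inline bool
flow_is_simple_match(const struct flow_summary *f)
{
    return f->recirc_id == 0 && f->is_ethernet && f->only_minimal_fields;
}

/* Hypothetical per-port flow counters for the two tables. */
struct port_flow_counts {
    unsigned int n_flows;         /* Flows for this in_port in flow_table. */
    unsigned int n_simple_flows;  /* Same, in simple_match_table. */
};

/* Simple match is usable only if every flow on the port is simple.
 * With zero flows any lookup misses, which needs an upcall anyway. */
static inline bool
simple_match_enabled(const struct port_flow_counts *c)
{
    return c->n_flows == c->n_simple_flows;
}
```

Note how the rule `in_port=3,vlan_tci=0x1234/0x1fff` from the commit fits this scheme: the mask 0x1fff is exactly CFI plus VID, the 13 TCI bits the mark keeps, so a TCI of 0xF234 (PCP set) produces the same mark as 0x1234.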
David Marchand	b366fa2f49	dpif-netdev: Call cpuid for x86 isa availability. DPIF AVX512 optimizations currently rely on DPDK availability, even though they can be used without DPDK. Besides, checking the availability of an ISA only has to be done once, since it won't change while an OVS process runs. Resolve ISA availability in constructors by using a simplified query based on the cpuid API that comes from the compiler. Note: this also fixes the check on BMI2 availability: DPDK had a bug for this ISA, see https://git.dpdk.org/dpdk/commit/?id=aae3037ab1e0. Suggested-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-01-03 18:45:40 +01:00
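The "probe once in a constructor" idea from b366fa2f49 (and the later requirement in fe171e4f10 that the probe live in code built without AVX512 flags) can be sketched as follows. This sketch swaps in GCC/Clang's `__builtin_cpu_init()` and `__builtin_cpu_supports()` instead of the raw cpuid query the commit describes; `enum isa_feature`, `probe_isa()` and `has_isa()` are hypothetical names and not the OVS `cpu_has_isa` API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ISA list; not the OVS enum. */
enum isa_feature {
    ISA_AVX512F,
    ISA_AVX512BW,
    ISA_BMI2,
    ISA_MAX
};

static bool isa_available[ISA_MAX];

/* Runs once before main(): ISA support cannot change while the process
 * runs, so it is probed a single time and cached.  This file must be
 * compiled WITHOUT AVX512 flags so it is safe to run on older CPUs. */
static void __attribute__((constructor))
probe_isa(void)
{
#if defined(__x86_64__) || defined(__i386__)
    /* Required when the builtins are used from a constructor. */
    __builtin_cpu_init();
    isa_available[ISA_AVX512F] = __builtin_cpu_supports("avx512f");
    isa_available[ISA_AVX512BW] = __builtin_cpu_supports("avx512bw");
    isa_available[ISA_BMI2] = __builtin_cpu_supports("bmi2");
#endif
    /* On other architectures every entry stays false. */
}

/* Cheap lookup of the cached result, returning a plain boolean. */
static inline bool
has_isa(enum isa_feature isa)
{
    assert(isa < ISA_MAX);
    return isa_available[isa];
}
```

The design point is the same as in the commit: callers pay only an array read at runtime, and the probing code itself never contains instructions the CPU might not support.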
Ilya Maximets	20a4f546f7	dpif-netdev: Use PMD context to get the port for HW miss recovery. The last RX queue, from which the packet was received, is already stored in the PMD context. So, we can get the netdev from it without the expensive hash map lookup. In my V2V testing, this patch improves performance by about 3% when HW offload and experimental APIs are enabled. That narrows the performance difference from the case with the experimental API disabled to about 0.5%, which is well within the margin of error for that setup. Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Eli Britstein <elibr@nvidia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-12-09 22:46:47 +01:00
Harry van Haaren	dc39608d2a	dpif/stats: Add miniflow extract opt hits counter. This commit adds a new counter to be displayed to the user when requesting datapath packet statistics. It counts the number of packets that were parsed, and had a miniflow built from them, by the optimized miniflow extract parsers. The ovs-appctl command "dpif-netdev/pmd-perf-show" now has an extra entry indicating whether the optimized MFEX was hit: - MFEX Opt hits: 6786432 (100.0 %) Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 11:31:14 +01:00
Kumar Amber	3d8f47bc04	dpif-netdev: Add command line and function pointer for miniflow extract. This patch introduces the MFEX function pointers, which allow the user to switch between the different miniflow extract implementations that OVS provides, optimized for specific CPU ISAs. The user can query the miniflow extract variants available on that CPU with the following command: $ ovs-appctl dpif-netdev/miniflow-parser-get Similarly, a user can set the miniflow implementation with the following command: $ ovs-appctl dpif-netdev/miniflow-parser-set name This gives the user more performance and the flexibility to choose the miniflow implementation according to their needs. Signed-off-by: Kumar Amber <kumar.amber@intel.com> Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 09:56:58 +01:00
Cian Ferriter	d76a719a7a	dpif-netdev: Add a partial HWOL PMD statistic. It is possible for packets traversing the userspace datapath to match a flow before hitting the EMC by using a mark ID provided by a NIC. Add a PMD statistic for this hit. Signed-off-by: Cian Ferriter <cian.ferriter@intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-09 17:13:55 +01:00
Harry van Haaren	abb807e27d	dpif-netdev: Add command to switch dpif implementation. This commit adds a new command to allow the user to switch the active DPIF implementation at runtime. A probe function is executed before switching the DPIF implementation, to ensure the CPU is capable of running the ISA required. For example, the command below will switch to the AVX512-enabled DPIF, assuming that the runtime CPU is capable of running AVX512 instructions: $ ovs-appctl dpif-netdev/dpif-impl-set dpif_avx512 A new configuration flag is added to allow selection of the default DPIF. This is useful for running the unit tests against the available DPIF implementations without modifying each unit test. The design of the testing & validation for ISA-optimized DPIF implementations is based on the work already upstream for DPCLS. Note, however, that a DPCLS lookup has no state or side effects, allowing the auto-validator implementation to perform multiple lookups and provide consistent statistic counters. The DPIF component does have state, so running two implementations in parallel and comparing output is not a valid testing method, as there are changes in DPIF statistic counters (side effects). As a result, the DPIF is tested directly against the unit tests. Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Co-authored-by: Cian Ferriter <cian.ferriter@intel.com> Signed-off-by: Cian Ferriter <cian.ferriter@intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-09 17:13:24 +01:00
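The "probe, then switch" behaviour of `dpif-impl-set` can be sketched as a name-indexed table of function pointers. This is a simplified sketch, not the OVS code: the `int (*)(int)` input signature, the stand-in function bodies and `dpif_impl_set()` are hypothetical; the names `dpif_scalar` and `dpif_avx512` are assumed to match the implementation names the CLI accepts, and the AVX512 probe uses GCC/Clang's `__builtin_cpu_supports()` as an assumed stand-in for OVS's own check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signatures: real DPIF input functions take a PMD thread,
 * a packet batch and a port number. */
typedef int (*dpif_input_func)(int n_packets);
typedef bool (*dpif_probe_func)(void);

static int input_scalar(int n) { return n; }
static int input_avx512(int n) { return n; }   /* Stand-in body. */

static bool probe_always(void) { return true; }

static bool
probe_avx512(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f")
           && __builtin_cpu_supports("avx512bw");
#else
    return false;
#endif
}

struct dpif_impl {
    const char *name;
    dpif_probe_func probe;
    dpif_input_func input;
};

static const struct dpif_impl impls[] = {
    { "dpif_scalar", probe_always, input_scalar },
    { "dpif_avx512", probe_avx512, input_avx512 },
};

static dpif_input_func active_input = input_scalar;

/* Switches implementation by name.  Returns 0 on success, or -1 if the
 * name is unknown or the CPU fails the probe; the active function is
 * left unchanged on failure. */
static int
dpif_impl_set(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof impls / sizeof impls[0]; i++) {
        if (!strcmp(impls[i].name, name)) {
            if (!impls[i].probe()) {
                return -1;
            }
            active_input = impls[i].input;
            return 0;
        }
    }
    return -1;
}
```

Running the probe before swapping the pointer means a failed request never leaves the datapath pointing at code the CPU cannot execute.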
Harry van Haaren	9ac84a1a36	dpif-avx512: Add ISA implementation of dpif. This commit adds the AVX512 implementation of DPIF functionality, specifically the dp_netdev_input_outer_avx512 function. This function only handles outer packets (no recirculations), and is optimized to use the AVX512 ISA for packet batching and other DPIF work. Sparse is not able to handle the AVX512 intrinsics, causing compile-time failures, so it is disabled for this file. Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Co-authored-by: Cian Ferriter <cian.ferriter@intel.com> Signed-off-by: Cian Ferriter <cian.ferriter@intel.com> Co-authored-by: Kumar Amber <kumar.amber@intel.com> Signed-off-by: Kumar Amber <kumar.amber@intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-09 17:13:12 +01:00

12 Commits