This patch enables most of the tunnel tests in the testsuite, and adds a
large TCP transfer to a vxlan and geneve test to verify TSO
functionality. Some additional changes were required to accommodate these
changes with netdev-linux interfaces. The test for vlan over vxlan is
purposely not enabled as the traffic produced by this test gives
incorrect values in the vnet header.
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The netdev receiving packets is supposed to provide the flags
indicating if the L4 checksum was verified and it is OK or BAD,
otherwise the stack will check when appropriate by software.
If the packet comes with good checksum, then postpone the
checksum calculation to the egress device if needed.
When encapsulating a packet with that flag, set the checksum
of the inner L4 header, since inner checksum offload is not yet supported.
Calculate the L4 checksum when the packet is going to be sent
over a device that doesn't support the feature.
Linux tap devices allow enabling L3 and L4 offload, so this
patch enables the feature. However, Linux socket interface
remains disabled because the API doesn't allow enabling
those two features without enabling TSO too.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The netdev receiving packets is supposed to provide the flags
indicating if the IP checksum was verified and it is GOOD or BAD,
otherwise the stack will check when appropriate by software.
If the packet comes with good checksum, then postpone the
checksum calculation to the egress device if needed.
When encapsulating a packet with that flag, set the checksum
of the inner IP header, since inner checksum offload is not yet supported.
Calculate the IP checksum when the packet is going to be sent over
a device that doesn't support the feature.
Linux devices don't support IP checksum offload alone, so the
support is not enabled.
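As a rough illustration of the software fallback described above, the sketch below fills in an IPv4 header checksum using the standard Internet checksum (RFC 1071). The sketch_ names are hypothetical helpers for this example and are not the functions used in OVS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the OVS API: Internet checksum over 'len' bytes.
 * 'len' must be even, which always holds for an IPv4 header. */
static uint16_t
sketch_ip_csum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    assert(len % 2 == 0);
    for (size_t i = 0; i < len; i += 2) {
        sum += (uint32_t) data[i] << 8 | data[i + 1];
    }
    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}

/* Fill in the checksum field (bytes 10 and 11) of an IPv4 header, as a
 * stack might do before sending on a device without IP checksum offload. */
static void
sketch_ip_csum_fill(uint8_t *ip_hdr, size_t hdr_len)
{
    uint16_t csum;

    ip_hdr[10] = 0;
    ip_hdr[11] = 0;
    csum = sketch_ip_csum(ip_hdr, hdr_len);
    ip_hdr[10] = csum >> 8;
    ip_hdr[11] = csum & 0xff;
}
```

Running sketch_ip_csum() over a header whose checksum field is already correct returns 0, which is how a receiver can verify it.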
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
A typical NVGRE encapsulated packet starts with the ETH/IP/GRE
protocols. Miniflow extract will parse just the ETH and IP headers. The
GRE header will be processed later as part of the pop action. Add
support for parsing the ETH/IP headers in this scenario.
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
For packets which don't already have a hash calculated,
miniflow_hash_5tuple() calculates the hash of a packet
using the previously built miniflow.
This commit adds IPv6 profile specific hashing which
uses fixed offsets into the packet to improve hashing
performance.
Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Add AVX512 IPv6 optimized profiles for vlan/IPv6/UDP,
vlan/IPv6/TCP, IPv6/UDP and IPv6/TCP.
The MFEX autovalidation test case already has IPv6 support for
validating against the scalar mfex.
Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
The patch removes magic numbers from miniflow_bits
and calculates the bits at compile time. This also
makes it easier to handle any ABI changes.
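A minimal sketch of the idea, assuming that each miniflow map bit stands for one 64-bit unit of the flow structure. The struct and macro names below are cut-down stand-ins invented for this example, not the OVS definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for a flow structure; the real one has many more
 * fields. */
struct sketch_flow {
    uint64_t metadata;      /* 64-bit unit 0. */
    uint32_t in_port;       /* 64-bit unit 1. */
    uint32_t recirc_id;
    uint8_t dl_dst[6];      /* 64-bit unit 2. */
    uint8_t dl_src[6];      /* Starts in unit 2, ends in unit 3. */
    uint16_t dl_type;       /* 64-bit unit 3. */
    uint16_t vlan_tci;
    uint32_t nw_src;        /* 64-bit unit 4. */
    uint32_t nw_dst;
};

/* Index of the 64-bit unit that holds the start of FIELD. */
#define SKETCH_U64_IDX(FIELD) \
    (offsetof(struct sketch_flow, FIELD) / sizeof(uint64_t))

/* Map bit for FIELD, derived by the compiler instead of hard-coded. */
#define SKETCH_MF_BIT(FIELD) (UINT64_C(1) << SKETCH_U64_IDX(FIELD))

/* If the struct layout changes, the bits follow automatically; these
 * checks only document the layout used in this sketch. */
static_assert(SKETCH_MF_BIT(dl_dst) == UINT64_C(0x4), "unit 2 expected");
static_assert(SKETCH_MF_BIT(dl_type) == UINT64_C(0x8), "unit 3 expected");
static_assert(SKETCH_MF_BIT(nw_src) == UINT64_C(0x10), "unit 4 expected");
```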
Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
This patch removes the magic numbers for packet offsets and
minimum packet length and instead calculates them at
compile time.
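For illustration only, offsets and minimum lengths can be derived from header layouts with sizeof and offsetof. The simplified structs and SKETCH_ macros below are assumptions made for this example and do not match the OVS definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified header layouts for illustration; these are not OVS structs. */
struct sketch_eth {
    uint8_t dst[6];
    uint8_t src[6];
    uint16_t type;
};

struct sketch_ipv4 {
    uint8_t ver_ihl, tos;
    uint16_t tot_len, id, frag_off;
    uint8_t ttl, proto;
    uint16_t csum;
    uint32_t src, dst;
};

struct sketch_udp {
    uint16_t src, dst, len, csum;
};

/* Derived at compile time instead of being written as 23 and 42. */
#define SKETCH_OFF_IP_PROTO \
    (sizeof(struct sketch_eth) + offsetof(struct sketch_ipv4, proto))
#define SKETCH_MIN_LEN_IPV4_UDP \
    (sizeof(struct sketch_eth) + sizeof(struct sketch_ipv4) \
     + sizeof(struct sketch_udp))

static_assert(SKETCH_OFF_IP_PROTO == 23, "unexpected protocol offset");
static_assert(SKETCH_MIN_LEN_IPV4_UDP == 42, "unexpected minimum length");
```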
Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
As described in the bugzilla below, cpu_has_isa code may be compiled
with some AVX512 instructions in it, because cpu.c is built as part of
the libopenvswitchavx512.
This is a problem when this function (supposed to probe for AVX512
instructions availability) is invoked from generic OVS code, on older
CPUs that don't support them.
For the same reason, dpcls_subtable_avx512_gather_probe,
dp_netdev_input_outer_avx512_probe, mfex_avx512_probe and
mfex_avx512_vbmi_probe are potential runtime bombs and can't be
built as part of libopenvswitchavx512 either.
Move cpu.c to be part of the "normal" libopenvswitch,
and move the other helpers into generic OVS code.
Note:
- dpcls_subtable_avx512_gather_probe is split in two, because it also
needs to do its own magic,
- while moving those helpers, prefer direct calls to cpu_has_isa and
avoid cast to intermediate integer variables when a simple boolean
is enough.
Fixes: 352b6c7116cd ("dpif-lookup: add avx512 gather implementation.")
Fixes: abb807e27dd4 ("dpif-netdev: Add command to switch dpif implementation.")
Fixes: 250ceddcc2d0 ("dpif-netdev/mfex: Add AVX512 based optimized miniflow extract")
Fixes: b366fa2f4947 ("dpif-netdev: Call cpuid for x86 isa availability.")
Reported-at: https://bugzilla.redhat.com/2100393
Reported-by: Ales Musil <amusil@redhat.com>
Co-authored-by: Ales Musil <amusil@redhat.com>
Signed-off-by: Ales Musil <amusil@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: Ales Musil <amusil@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
__builtin_constant_p is only available in GCC and only in versions >= 4.
Use the same "#if __GNUC__ >= 4" check used in other parts of OVS for
this builtin.
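The guard could look roughly like the sketch below. SKETCH_IS_CONSTANT is a hypothetical wrapper name chosen for this example; the fallback of 0 is the conservative answer for compilers without the builtin.

```c
/* Hypothetical wrapper: nonzero only when the compiler can prove that EXPR
 * is a compile-time constant. Compilers that do not advertise GCC 4 or
 * newer always get 0. */
#if __GNUC__ >= 4
#define SKETCH_IS_CONSTANT(EXPR) __builtin_constant_p(EXPR)
#else
#define SKETCH_IS_CONSTANT(EXPR) 0
#endif
```

Callers can then pick a fast path only when SKETCH_IS_CONSTANT() is nonzero, and the generic path stays correct everywhere else.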
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Checking for each of the required AVX512 ISA separately will allow the
compiler to generate some AVX512 code where there is some support in the
compiler rather than only generating all AVX512 code when all of it is
supported or no AVX512 code at all.
For example, in GCC 4.9 where there is just support for AVX512F, this
patch will allow building the AVX512 DPIF.
Another example, in GCC 5 and 6, most AVX512 code can be generated, just
without AVX512VPOPCNTDQ support.
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
No instructions from the AVX512VL ISA are used.
Compilation for the AVX512F and AVX512BW ISA is already enabled in
lib/automake.mk for the dpif-netdev-lookup-avx512-gather.c file because
it's part of the libopenvswitchavx512.la library. They don't need to be
enabled at a function level.
Remove these unnecessary function-level compiler target attributes.
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
For packets which don't already have a hash calculated,
miniflow_hash_5tuple() calculates the hash of a packet
using the previously built miniflow.
This commit adds IPv4 profile specific hashing which
uses fixed offsets into the packet to improve hashing
performance.
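A minimal sketch of hashing from fixed packet offsets, assuming an untagged Ether()/IPv4()/UDP-or-TCP frame with a 20-byte IPv4 header. The SKETCH_ names are invented for this example, and FNV-1a is only a stand-in for whatever hash the datapath really uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed byte offsets into the frame; illustrative names, not OVS ones. */
enum {
    SKETCH_OFF_IP_PROTO = 14 + 9,
    SKETCH_OFF_IP_SRC   = 14 + 12,  /* Source and destination: 8 bytes. */
    SKETCH_OFF_L4_PORTS = 14 + 20,  /* Source and destination port: 4. */
    SKETCH_MIN_LEN      = 14 + 20 + 4,
};

/* FNV-1a over 'n' bytes, used here only as a stand-in hash function. */
static uint32_t
sketch_fnv1a(uint32_t hash, const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        hash = (hash ^ p[i]) * UINT32_C(16777619);
    }
    return hash;
}

/* Hash the 5-tuple straight from the packet bytes, without first walking
 * a previously built miniflow. */
static uint32_t
sketch_hash_5tuple_ipv4(const uint8_t *pkt, size_t len)
{
    uint32_t hash = UINT32_C(2166136261);

    assert(len >= SKETCH_MIN_LEN);
    hash = sketch_fnv1a(hash, pkt + SKETCH_OFF_IP_SRC, 8);
    hash = sketch_fnv1a(hash, pkt + SKETCH_OFF_IP_PROTO, 1);
    hash = sketch_fnv1a(hash, pkt + SKETCH_OFF_L4_PORTS, 4);
    return hash;
}
```

The offsets are only valid because the traffic profile has already matched, so the header sizes are known in advance.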
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Co-authored-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
The code changes here are to handle (1 << i) shifts where 'i' is the
packet index in the batch, and 1 << 31 is an overflow of the signed '1'.
Fixed by adding UINT32_C() around the 1 character, ensuring the compiler knows
the 1 is unsigned (and 32-bits). Undefined Behaviour sanitizer is now happy
with the bit-shifts at runtime.
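The pattern can be sketched as follows; sketch_batch_bit is a hypothetical helper name used only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bit for packet 'i' in a batch of up to 32 packets. */
static inline uint32_t
sketch_batch_bit(unsigned int i)
{
    assert(i < 32);
    /* A plain '1 << 31' shifts into the sign bit of a signed int, which is
     * undefined behaviour. UINT32_C(1) makes the operand unsigned. */
    return UINT32_C(1) << i;
}
```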
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This commit fixes the minimum packet size for the vlan/ipv4/tcp
traffic profile, which was previously incorrectly set.
This commit also disallows any fragmented IPv4 packets from being
matched in the optimized miniflow-extract, avoiding complexity of
handling fragmented packets and using scalar fallback instead.
The DF (don't fragment) bit is now ignored, and stripped from the
resulting miniflow.
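As a sketch of the fragment handling described above, assuming the 16-bit flags and fragment-offset field has already been converted to host byte order. The SKETCH_ names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_IP_DF      0x4000  /* Don't Fragment flag. */
#define SKETCH_IP_MF      0x2000  /* More Fragments flag. */
#define SKETCH_IP_OFFMASK 0x1fff  /* Fragment offset. */

/* A packet is a fragment if More Fragments is set or the offset is
 * nonzero. The DF bit alone does not make a packet a fragment. */
static bool
sketch_ipv4_is_fragment(uint16_t frag_off)
{
    return (frag_off & (SKETCH_IP_MF | SKETCH_IP_OFFMASK)) != 0;
}

/* Drop the DF bit so that it does not end up in the extracted result. */
static uint16_t
sketch_strip_df(uint16_t frag_off)
{
    return (uint16_t) (frag_off & ~SKETCH_IP_DF);
}
```

Packets for which sketch_ipv4_is_fragment() is true would be left to the scalar fallback.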
Fixes: aa85a25095 ("dpif-netdev/mfex: Add more AVX512 traffic profiles.")
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Tested-by: Kumar Amber <kumar.amber@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
This commit improves handling of packets where the allocated memory
is less than 64 bytes. For packets received from DPDK ports this
never matters, as an mbuf always pre-allocates enough space, however
this can occur in cases where a packet is received from a kernel interface
or injected by an OpenFlow controller. The fix is required to
ensure OVS doesn't overread the allocated memory, e.g.:
==49944==ERROR: AddressSanitizer: heap-buffer-overflow on address
0x6060000d8181 at pc 0x000001cb9d24 bp 0x7ffce3b385d0 sp 0x7ffce3b385c8
READ of size 64 at 0x6060000d8181 thread T0
#0 0x1cb9d23 in mfex_avx512_process lib/dpif-netdev-extract-avx512.c:491:26
#1 0x1cb9d23 in mfex_avx512_ip_udp lib/dpif-netdev-extract-avx512.c:625:1
#2 0x18786a1 in dpif_miniflow_extract_autovalidator lib/dpif-netdev-private-extract.c:277:29
#3 0x1cbca5c in dp_netdev_input_outer_avx512 lib/dpif-netdev-avx512.c:159:19
#4 0x1853048 in dp_netdev_process_rxq_port lib/dpif-netdev.c:4900:19
#5 0x1837c76 in dpif_netdev_run lib/dpif-netdev.c:6197:25
#6 0x1727a02 in type_run ofproto/ofproto-dpif.c:370:9
#7 0x16f6e07 in ofproto_type_run ofproto/ofproto.c:1778:31
#8 0x16c1a8b in bridge_run__ vswitchd/bridge.c:3245:9
#9 0x16bd2fd in bridge_run vswitchd/bridge.c:3310:5
#10 0x16db8fe in main vswitchd/ovs-vswitchd.c:127:9
#11 0x7fbc0c5b61a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)
#12 0xedabbd in _start (vswitchd/ovs-vswitchd+0xedabbd)
0x6060000d8181 is located 9 bytes to the right of 56-byte
region [0x6060000d8140,0x6060000d8178)
allocated by thread T0 here:
#0 0xf7b09f in malloc (vswitchd/ovs-vswitchd+0xf7b09f)
#1 0x1aff3b9 in xmalloc__ lib/util.c:137:15
#2 0x1aff3b9 in xmalloc lib/util.c:172:12
#3 0x1afe211 in process_command lib/unixctl.c:310:13
#4 0x1afe211 in run_connection lib/unixctl.c:344:17
#5 0x1afe211 in unixctl_server_run lib/unixctl.c:395:21
#6 0x16db918 in main vswitchd/ovs-vswitchd.c:128:9
#7 0x7fbc0c5b61a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)
The solution implemented uses a mask-to-zero if the available buffer
size is less than 64 bytes, and a branch for which type of load is used.
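The mask-to-zero idea can be modelled in portable scalar C as below. This is not the AVX512 code itself: it only illustrates how a per-byte mask derived from the available size keeps the load inside the buffer, and all sketch_ names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLOCK 64

/* One bit per readable byte, in the style of an AVX512 byte mask. The
 * branch avoids shifting a 64-bit value by 64, which is undefined. */
static uint64_t
sketch_load_mask(size_t avail)
{
    return avail >= SKETCH_BLOCK ? UINT64_MAX
                                 : (UINT64_C(1) << avail) - 1;
}

/* Scalar model of the load: bytes whose mask bit is clear are never read
 * from 'src' and come back as zero in 'dst'. */
static void
sketch_load_block(uint8_t dst[SKETCH_BLOCK], const uint8_t *src,
                  size_t avail)
{
    uint64_t mask;

    if (avail >= SKETCH_BLOCK) {
        memcpy(dst, src, SKETCH_BLOCK);   /* Full, unmasked load. */
        return;
    }
    mask = sketch_load_mask(avail);
    for (size_t i = 0; i < SKETCH_BLOCK; i++) {
        dst[i] = (mask >> i) & 1 ? src[i] : 0;
    }
}
```

With the 56-byte allocation from the report above, bytes 56 to 63 of the result are zero and nothing past the allocation is touched.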
Fixes: 250ceddcc2d0 ("dpif-netdev/mfex: Add AVX512 based optimized miniflow extract")
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
DPIF AVX512 optimizations currently rely on DPDK availability while
they can be used without DPDK.
Besides, checking for availability of some isa only has to be done once
and won't change while an OVS process runs.
Resolve isa availability in constructors by using a simplified query
based on cpuid API that comes from the compiler.
Note: this also fixes the check on BMI2 availability: DPDK had a bug
for this isa, see https://git.dpdk.org/dpdk/commit/?id=aae3037ab1e0.
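A rough sketch of resolving ISA availability once in a constructor, using the cpuid helper shipped with GCC and Clang in <cpuid.h>. The sketch_ names are hypothetical, the constructor attribute is a compiler extension, and the check for operating system support of AVX512 register state is left out for brevity.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

static bool sketch_has_bmi2;
static bool sketch_has_avx512f;
static bool sketch_isa_probed;

/* Runs once before main(). The answers cannot change while the process
 * runs, so later callers only read the cached flags. */
__attribute__((constructor))
static void
sketch_isa_init(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* CPUID leaf 7, subleaf 0: EBX bit 8 is BMI2, bit 16 is AVX512F. */
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        sketch_has_bmi2 = (ebx >> 8) & 1;
        sketch_has_avx512f = (ebx >> 16) & 1;
    }
#endif
    sketch_isa_probed = true;
}

/* Cheap query for use on hot paths. */
static bool
sketch_cpu_has_avx512f(void)
{
    return sketch_has_avx512f;
}
```

On non-x86 builds both flags simply stay false.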
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This commit tightens the requirements for processing TCP packets
in AVX512, ensuring that there are no TCP options by validating that
the "data offset" field of the TCP header is exactly equal to 5.
This ensures that the TCP header is not too short, and that it does
not contain extra options.
On the IP handling side, improve checks around total packet length.
Now the next protocol is included in the length checks, ensuring that
the IP header reported length is of appropriate size to contain the
next protocol (e.g. UDP requires 8 bytes, TCP requires 20). Note that
the inner protocol is always of a fixed size per profile, so it can be
set using the UDP_ and TCP_ HEADER_LEN defines.
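The two checks can be sketched in scalar form as below. The SKETCH_ defines and function names are assumptions made for this example, and the IPv4 total length is assumed to be in host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_IP_HEADER_LEN  20
#define SKETCH_UDP_HEADER_LEN 8
#define SKETCH_TCP_HEADER_LEN 20

/* 'tcp' points at the first byte of the TCP header. The data offset is the
 * high nibble of byte 12, counted in 32-bit words, so exactly 5 means a
 * 20-byte header with no options. */
static bool
sketch_tcp_has_no_options(const uint8_t *tcp)
{
    return (tcp[12] >> 4) == 5;
}

/* 'l4_len' is fixed per traffic profile, for example
 * SKETCH_UDP_HEADER_LEN or SKETCH_TCP_HEADER_LEN. */
static bool
sketch_ip_len_ok(uint16_t ip_tot_len, uint16_t l4_len)
{
    return ip_tot_len >= SKETCH_IP_HEADER_LEN + l4_len;
}
```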
Fixes: 250ceddcc2d0 ("dpif-netdev/mfex: Add AVX512 based optimized miniflow extract")
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This commit adds 3 new traffic profile implementations to the
existing avx512 miniflow extract infrastructure. The profiles added are:
- Ether()/IP()/TCP()
- Ether()/Dot1Q()/IP()/UDP()
- Ether()/Dot1Q()/IP()/TCP()
The design of the avx512 code here is for scalability to add more
traffic profiles, as well as enabling new CPU ISA. Note that an implementation
is primarily adding static const data, which the compiler then specializes
away when the profile specific function is declared below.
As a result, the code is relatively maintainable, and scalable for new
traffic profiles as well as new ISA, and does not lower performance
compared with manually written code for each profile/ISA.
Note that confidence in the correctness of each implementation is
achieved through autovalidation, unit tests with known packets, and
fuzz tested packets.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
This commit adds AVX512 implementations of miniflow extract.
By using the 64 bytes available in an AVX512 register, it is
possible to convert a packet to a miniflow data-structure in
a small number of instructions.
The implementation here probes for Ether()/IP()/UDP() traffic,
and builds the appropriate miniflow data-structure for packets
that match the probe.
The implementation here is auto-validated by the miniflow
extract autovalidator, hence its correctness can be easily
tested and verified.
Note that this commit is designed to easily allow addition of new
traffic profiles in a scalable way, without code duplication for
each traffic profile.
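As a scalar model of the probe for Ether()/IP()/UDP() traffic, a profile can be described as static const mask and pattern data that is compared against the start of the packet. The names and the exact set of bytes checked below are assumptions for this sketch, not the AVX512 implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PROBE_LEN 24
#define SKETCH_MIN_LEN   42     /* 14 Ethernet + 20 IPv4 + 8 UDP. */

/* Bytes of the packet that the probe looks at. */
static const uint8_t sketch_probe_mask[SKETCH_PROBE_LEN] = {
    [12] = 0xff, [13] = 0xff,   /* EtherType. */
    [14] = 0xff,                /* IPv4 version and header length. */
    [23] = 0xff,                /* IPv4 protocol. */
};

/* Values those bytes must have for the profile to match. */
static const uint8_t sketch_probe_data[SKETCH_PROBE_LEN] = {
    [12] = 0x08, [13] = 0x00,   /* IPv4. */
    [14] = 0x45,                /* Version 4, 20-byte header. */
    [23] = 17,                  /* UDP. */
};

/* A new profile would only need new mask and pattern tables. */
static bool
sketch_probe_ip_udp(const uint8_t *pkt, size_t len)
{
    if (len < SKETCH_MIN_LEN) {
        return false;
    }
    for (size_t i = 0; i < SKETCH_PROBE_LEN; i++) {
        if ((pkt[i] & sketch_probe_mask[i]) != sketch_probe_data[i]) {
            return false;
        }
    }
    return true;
}
```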
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>