mirror of https://github.com/openvswitch/ovs synced 2025-08-22 09:58:01 +00:00

90 Commits

Author SHA1 Message Date
David Marchand
cf7b86db1f dp-packet: Rework TCP segmentation.
Rather than mark packets with both an offload flag and a segmentation size,
simply rely on the netdev implementation which sets a segmentation size
when appropriate.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-19 21:03:09 +02:00
David Marchand
2956a61265 dp-packet: Rework L4 checksum offloads.
The DPDK mbuf API specifies four statuses for L4 checksums:
- RTE_MBUF_F_RX_L4_CKSUM_UNKNOWN: no information about the RX L4 checksum
- RTE_MBUF_F_RX_L4_CKSUM_BAD: the L4 checksum in the packet is wrong
- RTE_MBUF_F_RX_L4_CKSUM_GOOD: the L4 checksum in the packet is valid
- RTE_MBUF_F_RX_L4_CKSUM_NONE: the L4 checksum is not correct in the packet
  data, but the integrity of the L4 data is verified.
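The four statuses above fit in a two-bit field. The following is a minimal sketch of decoding such a field; the enum, mask and helper names are hypothetical and the bit positions are illustrative only, not the actual DPDK or OVS flag values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-bit status field.  The bit positions are illustrative
 * and are not the real DPDK or OVS flag values. */
enum l4_csum_status {
    L4_CSUM_UNKNOWN = 0,                      /* No information. */
    L4_CSUM_BAD     = 1u << 0,                /* Checksum in packet is wrong. */
    L4_CSUM_GOOD    = 1u << 1,                /* Checksum in packet is valid. */
    L4_CSUM_NONE    = (1u << 0) | (1u << 1),  /* Data verified, field not set. */
};

#define L4_CSUM_MASK ((1u << 0) | (1u << 1))

/* Extracts the status from a flags word that may carry other bits. */
static inline enum l4_csum_status
l4_csum_status_get(uint32_t flags)
{
    return (enum l4_csum_status) (flags & L4_CSUM_MASK);
}

/* True when the integrity of the L4 data has been verified. */
static inline bool
l4_csum_data_valid(uint32_t flags)
{
    enum l4_csum_status s = l4_csum_status_get(flags);

    return s == L4_CSUM_GOOD || s == L4_CSUM_NONE;
}

/* True when the checksum field still has to be filled before transmit. */
static inline bool
l4_csum_needs_fill(uint32_t flags)
{
    return l4_csum_status_get(flags) == L4_CSUM_NONE;
}
```

Encoding the fourth status as both bits set keeps the check for "data verified" down to two comparisons on the masked value.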

Similarly to the IP checksum offloads API, revise OVS L4 offloads API.

No information about the L4 protocol is provided by any netdev-*
implementation, so OVS needs to mark this L4 protocol during flow
extraction.

Rename current API for consistency with dp_packet_(inner_)?l4_checksum_.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-19 21:02:56 +02:00
David Marchand
3daf04a4c5 dp-packet: Rework IP checksum offloads.
As the packet traverses through OVS, offloading Tx flags must be carefully
evaluated and updated which results in a bit of complexity because of a
separate "outer" Tx offloading flag coming from DPDK API,
and a "normal"/"inner" Tx offloading flag.

On the other hand, the DPDK mbuf API specifies four statuses for
IP checksums:
- RTE_MBUF_F_RX_IP_CKSUM_UNKNOWN: no information about the RX IP checksum
- RTE_MBUF_F_RX_IP_CKSUM_BAD: the IP checksum in the packet is wrong
- RTE_MBUF_F_RX_IP_CKSUM_GOOD: the IP checksum in the packet is valid
- RTE_MBUF_F_RX_IP_CKSUM_NONE: the IP checksum is not correct in the
  packet data, but the integrity of the IP header is verified.

This patch changes the OVS API so that OVS code only tracks the status of
the checksum of the "current" L3 header and leaves the Tx flags aspect to
the netdev-* implementations.

With this API, the flow extraction can be cleaned up.

During packet processing, OVS can simply look for the IP checksum validity
(either good, or partial) before changing some IP header, and then mark
the checksum as partial.
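The rule in the paragraph above can be sketched as follows. The types and the function name are hypothetical and only model the status tracking; they are not the OVS dp_packet API.

```c
#include <stdbool.h>

/* Hypothetical status values mirroring the four statuses described above. */
enum ip_csum_status {
    IP_CSUM_UNKNOWN,
    IP_CSUM_BAD,
    IP_CSUM_GOOD,
    IP_CSUM_PARTIAL,
};

/* Hypothetical packet metadata: only the IP checksum status is modelled. */
struct pkt_sketch {
    enum ip_csum_status ip_csum;
};

/* Call before modifying an IP header field.
 *
 * If the checksum is known to be valid (good or partial), mark it partial
 * and let the transmit side resolve it.  Returns true when the caller must
 * instead update the checksum field in software, which leaves a bad
 * checksum bad and an unknown one unknown. */
static bool
ip_header_about_to_change(struct pkt_sketch *p)
{
    if (p->ip_csum == IP_CSUM_GOOD || p->ip_csum == IP_CSUM_PARTIAL) {
        p->ip_csum = IP_CSUM_PARTIAL;
        return false;
    }
    return true;
}
```

With this shape, a header rewrite costs one status check in the common case and defers the arithmetic until the packet is actually sent.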

In the conntrack case, when natting packets, the checksum status of the
inner part (ICMP error case) must be forced temporarily as unknown
to force checksum resolution.

When tunneling comes into play, the IP checksum status is bit-shifted for
future considerations in the processing if, for example, the tunnel
header gets decapsulated again, or in the netdev-* implementations that
support tunnel offloading.

Finally, netdev-* implementations only need to care about packets in
partial status: a good checksum does not need touching, a bad checksum
has been updated but kept as bad by OVS, and an unknown checksum belongs
either to an IPv6 packet or to an IPv4 packet that OVS updated too (keeping
it good or bad accordingly).

Rename current API for consistency with dp_packet_(inner_)?ip_checksum_.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-19 21:00:54 +02:00
David Marchand
67abd51540 dp-packet: Rework tunnel offloads.
Rather than set bits in the mbuf ol_flags field, that only makes sense
for netdev-dpdk ports, mark packet for tunnel offload in OVS offloads
API.

While at it, since there is nothing really "hardware" related, rename
current API for consistency with dp_packet_tunnel_ prefix.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-19 21:00:48 +02:00
David Marchand
e2200485c5 dp-packet: Expand offloads preparation helper.
Expand this helper to clearly separate the non tunnel case from the
tunnel one. This will make later changes easier to read.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-19 21:00:45 +02:00
David Marchand
d29ba0abdc dp-packet: Add OVS offloading API.
As a preparation for tracking inner checksums, separate Rx checksum
status from the DPDK ol_flags field.
To minimize the cost of translating from DPDK API to OVS API, simply map
OVS flags to DPDK Rx mbuf flags.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-19 21:00:34 +02:00
David Marchand
19ef1b1f0f dp-packet: Remove DPDK specific IP version.
Flagging packets with IP version is only needed at the netdev-dpdk level.

In most cases, OVS is already inspecting the IP header in packet data,
so maintaining such IP version metadata won't save many cycles
(given the cost of additional branches necessary for handling
outer/inner flags).

Cleanup OVS shared code and only set these flags in netdev-dpdk.c.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-19 20:59:22 +02:00
David Marchand
52fdeda11a dp-packet: Remove Linux specific L4 offloads.
As the virtio-net offload API is used for netdev-linux ports, but
provides no information about the potentially encapsulated protocol
concerned by a checksum request, specific information from this netdev-
specific implementation is propagated into OVS code, and must be
carefully evaluated when some tunnel gets decapsulated.

This induces a cost in "normal" processing path, while the netdev-linux
path is not performance critical.

This patch removes such specific information, yet tries harder to parse
the packet on the Rx side and set offload flags accordingly for non
encapsulated traffic. For encapsulated traffic, the inner
checksum is computed.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-19 20:59:04 +02:00
David Marchand
585c8088eb dpif-netdev: Enhance checksum coverage.
Enhance netdev-dummy:
- add debug log,
- split Rx and Tx aspects,
- add coverage for bad status,

Enhance unit tests:
- enable Tx offloads on the transmitting port,
- test L4 checksums for TCP and UDP (and partial status),
- test IPv6,

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-05-21 19:43:01 +02:00
David Marchand
d49994634e flow: Fix bad IP checksum flag.
flow_compose() can generate packets with bad IPv4 checksum, however the
associated Rx flags were not correctly set.
The usefulness of setting this metadata seems limited, yet fix this for
consistency.

Fixes: c62b4ac8f8da ("ovs-ofctl: Implement compose-packet --bare [--bad-csum].")
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-05-21 19:08:36 +02:00
David Marchand
c771758249 dpif-netdev: Preserve inner offloads on recirculation.
Rather than drop all pending Tx offloads on recirculation,
preserve inner offloads (and mark packet with outer Tx offloads)
when parsing the packet again.

Fixes: c6538b443984 ("dpif-netdev: Fix crash due to tunnel offloading on recirculation.")
Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
Reported-at: https://issues.redhat.com/browse/FDP-1144
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-02-13 21:32:15 +01:00
Mike Pattrick
2276c3a2c6 userspace: Support GRE TSO.
This patch extends the userspace datapath's tunnel TSO support from
only VXLAN and Geneve to GRE tunnels as well. There
is also a software fallback for cases where the egress netdev does not
support this feature.

Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-01-17 00:20:48 +01:00
Mike Pattrick
82c1028e37 Userspace: Software fallback for UDP encapsulated TCP segmentation.
When sending packets that are flagged as requiring segmentation to an
interface that does not support this feature, send the packet to the TSO
software fallback instead of dropping it.

Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2024-09-11 15:36:27 +02:00
Mike Pattrick
2ff8ed8de0 dp-packet: Correct IPv4 checksum calculation.
During the transition towards checksum offloading, the function to
handle software fallback of IPv4 checksums didn't account for the
options field.
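To illustrate why the options field matters, here is a minimal sketch of an IPv4 header checksum that takes its length from the IHL field rather than assuming a fixed 20 bytes. The function name is hypothetical and this is not the OVS implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Computes the IPv4 header checksum over the full header, options included.
 *
 * 'hdr' points to the header bytes in network order.  The header length is
 * IHL * 4, so a header with options (IHL > 5) is covered entirely.  The
 * checksum field itself (bytes 10 and 11) is treated as zero.  The result
 * is the 16-bit value to store big-endian in the checksum field. */
static uint16_t
ipv4_header_csum(const uint8_t *hdr)
{
    size_t len = (size_t) (hdr[0] & 0x0f) * 4;
    uint32_t sum = 0;

    for (size_t i = 0; i < len; i += 2) {
        if (i == 10) {
            continue;   /* Skip the checksum field. */
        }
        sum += ((uint32_t) hdr[i] << 8) | hdr[i + 1];
    }
    /* Fold the carries back in (ones' complement addition). */
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}
```

A version that always sums 20 bytes gives the same result for option-less headers, which is how such a bug can go unnoticed until traffic with IP options shows up.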

Fixes: 5d11c47d3ebe ("userspace: Enable IP checksum offloading by default.")
Reported-by: Jun Wang <junwang01@cestc.cn>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2024-July/053236.html
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2024-08-15 12:25:14 +02:00
David Marchand
c39a84c131 netdev-dpdk: Refactor tunnel checksum offloading.
All information required for checksum offloading can be deduced by
already tracked dp_packet l3_ofs, l4_ofs, inner_l3_ofs and inner_l4_ofs
fields.
Remove DPDK specific l[2-4]_len from generic OVS code.

netdev-dpdk code then fills mbuf specifics step by step:
- outer_l2_len and outer_l3_len are needed for tunneling (and below
  features),
- l2_len and l3_len are needed for IP and L4 checksum (and below features),
- l4_len and tso_segsz are needed when doing TSO,
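The deduction described above can be sketched as follows. The struct and function names are hypothetical, the offsets are assumed to be relative to the start of the frame, and the tunneled case assumes the DPDK convention that l2_len spans everything between the outer L3 header and the inner L3 header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the offsets tracked per packet, each relative to
 * the start of the frame. */
struct pkt_offsets {
    uint16_t l3_ofs;
    uint16_t l4_ofs;
    uint16_t inner_l3_ofs;
    uint16_t inner_l4_ofs;
};

/* Hypothetical subset of the mbuf length fields. */
struct mbuf_lens {
    uint16_t outer_l2_len;
    uint16_t outer_l3_len;
    uint16_t l2_len;
    uint16_t l3_len;
};

static struct mbuf_lens
lens_from_offsets(const struct pkt_offsets *o, bool tunneled)
{
    struct mbuf_lens m = { 0, 0, 0, 0 };

    if (tunneled) {
        m.outer_l2_len = o->l3_ofs;
        m.outer_l3_len = (uint16_t) (o->l4_ofs - o->l3_ofs);
        /* Outer L4 header + tunnel header + inner L2 header. */
        m.l2_len = (uint16_t) (o->inner_l3_ofs - o->l4_ofs);
        m.l3_len = (uint16_t) (o->inner_l4_ofs - o->inner_l3_ofs);
    } else {
        m.l2_len = o->l3_ofs;
        m.l3_len = (uint16_t) (o->l4_ofs - o->l3_ofs);
    }
    return m;
}
```

For a VXLAN packet with option-less IPv4 headers (offsets 14, 34, 64, 84), this yields outer_l2_len 14, outer_l3_len 20, l2_len 30 (UDP 8 + VXLAN 8 + Ethernet 14) and l3_len 20.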

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
2024-06-06 17:10:29 +01:00
Ilya Maximets
c6538b4439 dpif-netdev: Fix crash due to tunnel offloading on recirculation.
Recirculation involves re-parsing the packet from scratch and that
process is not aware of multiple header levels nor the inner/outer
offsets.  So, it overwrites offsets with new ones from the outermost
headers and sets offloading flags that change their meaning when
the packet is marked for tunnel offloading.

For example:

 1. TCP packet enters OVS.
 2. TCP packet gets encapsulated into UDP tunnel.
 3. Recirculation happens.
 4. Packet is re-parsed after recirculation with miniflow_extract()
    or similar function.
 5. Packet is marked for UDP checksumming because we parse the
    outermost set of headers.  But since it is tunneled, it means
    inner UDP checksumming.  And that makes no sense, because the
    inner packet is TCP.

This is causing packet drops due to malformed packets or even
assertions and crashes in the code that is trying to fixup checksums
for packets using incorrect metadata:

 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior

 lib/packets.c:2061:15: runtime error:
        member access within null pointer of type 'struct udp_header'

  0 0xbe5221 in packet_udp_complete_csum lib/packets.c:2061:15
  1 0x7e5662 in dp_packet_ol_send_prepare lib/dp-packet.c:638:9
  2 0x96ef89 in netdev_send lib/netdev.c:940:9
  3 0x818e94 in dp_netdev_pmd_flush_output_on_port lib/dpif-netdev.c:5577:9
  4 0x817606 in dp_netdev_pmd_flush_output_packets lib/dpif-netdev.c:5618:27
  5 0x81cfa5 in dp_netdev_process_rxq_port lib/dpif-netdev.c:5677:9
  6 0x7eefe4 in dpif_netdev_run lib/dpif-netdev.c:7001:25
  7 0x610e87 in type_run ofproto/ofproto-dpif.c:367:9
  8 0x5b9e80 in ofproto_type_run ofproto/ofproto.c:1879:31
  9 0x55bbb4 in bridge_run__ vswitchd/bridge.c:3281:9
 10 0x558b6b in bridge_run vswitchd/bridge.c:3346:5
 11 0x591dc5 in main vswitchd/ovs-vswitchd.c:130:9
 12 0x172b89 in __libc_start_call_main (/lib64/libc.so.6+0x27b89)
 13 0x172c4a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x27c4a)
 14 0x47eff4 in _start (vswitchd/ovs-vswitchd+0x47eff4)

Tests added for both IPv4 and IPv6 cases.  Though IPv6 test doesn't
trigger the issue it's better to have a symmetric test.

Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/053014.html
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2024-03-22 20:45:37 +01:00
Mike Pattrick
f81d782c19 netdev-native-tnl: Mark all vxlan/geneve packets as tunneled.
Previously some packets were excluded from the tunnel mark if they
weren't L4. However, this causes problems with multi encapsulated
packets like arp.

Due to these flags being set, additional checks are required in checksum
modification code.

Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2024-02-16 15:23:26 +01:00
Mike Pattrick
a2d4ad651d netdev-linux: Only repair IP checksum in IPv4.
Previously a change was added to the vnet prepend code to solve for the
case where no L4 checksum offloading was needed but the L3 checksum
hadn't been calculated. But the added check didn't properly account
for IPv6 traffic.

Fixes: 85bcbbed839a ("userspace: Enable tunnel tests with TSO.")
Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2024-02-16 15:23:25 +01:00
Mike Pattrick
bf921e5677 dp-packet: Validate correct offset for L4 inner size.
This patch fixes the correctness of dp_packet_inner_l4_size() when
checking for the existence of an inner L4 header. Previously it checked
for the outer L4 header.

This function is currently only used when a packet is already flagged
for tunneling, so an incorrect determination isn't possible as long as
the flags of the packet are correct.

Fixes: 85bcbbed839a ("userspace: Enable tunnel tests with TSO.")
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2024-02-13 20:51:43 +01:00
Mike Pattrick
96990ea1e4 dp-packet: Reset offload/offsets when clearing a packet.
The OVN test suite identified a bug in dp_packet_ol_send_prepare() where
a BFD packet flagged as double encapsulated would trigger a seg fault.
The problem surfaced because bfd_put_packet was reusing a packet
allocated on the stack that wasn't having its flags reset between calls.

This change will reset OL flags as well as the layer offsets in
data_clear(), which should fix this type of packet reuse issue in
general as long as data_clear() is called in between uses.

Fixes: 8b5fe2dc6080 ("userspace: Add Generic Segmentation Offloading.")
Reported-by: Dumitru Ceara <dceara@redhat.com>
Reported-at: https://issues.redhat.com/browse/FDP-300
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2024-01-26 16:31:56 +01:00
Ilya Maximets
bacd2c304a dp-packet: Avoid checks while preparing non-offloading packets.
Currently, dp_packet_ol_send_prepare() performs multiple checks for
each offloading flag separately.  That takes a noticeable amount of
extra cycles for packets that do not have any offloading flags set.

Skip most of the work if no checksumming flags are set.

The change improves performance of direct forwarding between two
virtio-user ports (V2V) by ~2.5 % and offsets all the negative
effects of TSO support introduced recently.

It adds an extra check to the offloading path, but that is not a
default configuration and it should also take a much smaller hit, since
it handles a lower number of larger packets.

Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2024-01-19 13:52:22 +01:00
Mike Pattrick
85bcbbed83 userspace: Enable tunnel tests with TSO.
This patch enables most of the tunnel tests in the testsuite, and adds a
large TCP transfer to a vxlan and geneve test to verify TSO
functionality. Some additional changes were required to accommodate these
changes with netdev-linux interfaces. The test for vlan over vxlan is
purposely not enabled as the traffic produced by this test gives
incorrect values in the vnet header.

Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2024-01-17 22:06:51 +01:00
Dexia Li
084c808729 userspace: Support VXLAN and GENEVE TSO.
For the userspace datapath, this patch provides VXLAN and Geneve tunnel TSO.
Only userspace VXLAN and Geneve tunnels are supported, along with tunnel
outer and inner checksum offload. If the netdev does not support the offload
features, there is a software fallback. If the netdev does not support VXLAN
and Geneve TSO, packets are dropped. Front-end devices can also disable the
offload features with ethtool.

Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Dexia Li <dexia.li@jaguarmicro.com>
Co-authored-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2024-01-17 22:06:45 +01:00
Mike Pattrick
9e3c842d57 dp-packet: Set checksum flags during software TSO.
When OVS needs to fall back on the software TSO implementation to segment
a packet, it currently doesn't guarantee that IP and TCP checksum
offload flags are set. However, it is possible that these are required.
This is true in the case of dp_netdev_upcall(), which clears these
flags.

This patch explicitly sets the appropriate flags when the segmentation
flag is removed, to guarantee that packets always end up with correct
checksums.

Fixes: 8b5fe2dc6080 ("userspace: Add Generic Segmentation Offloading.")
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2024-01-17 21:40:51 +01:00
Flavio Leitner
8b5fe2dc60 userspace: Add Generic Segmentation Offloading.
This provides a software implementation in the case
the egress netdev doesn't support segmentation in hardware.

The challenge here is to guarantee packet ordering in the
original batch that may be full of TSO packets. Each TSO
packet can go up to ~64kB, so with segment size of 1440
that means about 44 packets for each TSO. Each batch has
32 packets, so the total batch amounts to 1408 normal
packets.
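The estimate above can be reproduced with a short sketch. The helper names are hypothetical; the batch size of 32 and the 1440-byte segment size come from the message, and 63360 bytes is simply 44 full segments.

```c
#include <stddef.h>

#define BATCH_SIZE 32   /* Packets per batch, as stated above. */

static size_t
div_round_up(size_t n, size_t d)
{
    return (n + d - 1) / d;
}

/* Number of segments needed for one TSO packet's payload. */
static size_t
gso_segments(size_t payload_len, size_t segsz)
{
    return div_round_up(payload_len, segsz);
}

/* Number of output batches needed once every packet in the input batch
 * has been segmented.  Counting first lets the caller allocate all the
 * memory up front and then emit the batches in order. */
static size_t
gso_batches(const size_t *payload_lens, size_t n_pkts, size_t segsz)
{
    size_t total = 0;

    for (size_t i = 0; i < n_pkts; i++) {
        total += gso_segments(payload_lens[i], segsz);
    }
    return div_round_up(total, BATCH_SIZE);
}
```

A batch of 32 packets carrying 63360 bytes each gives 32 * 44 = 1408 segments, that is 44 output batches.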

The segmentation estimates the total number of packets
and then the total number of batches. Then allocate
enough memory and finally do the work.

Finally each batch is sent in order to the netdev.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-12-02 01:33:37 +01:00
Flavio Leitner
e0056018c4 userspace: Respect tso/gso segment size.
Currently OVS will calculate the segment size based on the
MTU of the egress port. That usually happens to be correct
when the ports share the same MTU, but that is not always true.

Therefore, if the segment size is provided, then use that and
make sure the oversized packets are dropped.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-12-02 00:56:36 +01:00
Mike Pattrick
3337e6d91c userspace: Enable L4 checksum offloading by default.
The netdev receiving packets is supposed to provide the flags
indicating if the L4 checksum was verified and it is OK or BAD,
otherwise the stack will check when appropriate by software.

If the packet comes with good checksum, then postpone the
checksum calculation to the egress device if needed.

When encapsulating a packet with that flag, set the checksum
of the inner L4 header since that is not yet supported.

Calculate the L4 checksum when the packet is going to be sent
over a device that doesn't support the feature.

Linux tap devices allow enabling L3 and L4 offload, so this
patch enables the feature. However, Linux socket interface
remains disabled because the API doesn't allow enabling
those two features without enabling TSO too.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-06-15 23:50:30 +02:00
Mike Pattrick
5d11c47d3e userspace: Enable IP checksum offloading by default.
The netdev receiving packets is supposed to provide the flags
indicating if the IP checksum was verified and it is GOOD or BAD,
otherwise the stack will check when appropriate by software.

If the packet comes with good checksum, then postpone the
checksum calculation to the egress device if needed.

When encapsulating a packet with that flag, set the checksum
of the inner IP header since that is not yet supported.

Calculate the IP checksum when the packet is going to be sent over
a device that doesn't support the feature.

Linux devices don't support IP checksum offload alone, so the
support is not enabled.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-06-15 23:49:51 +02:00
Mike Pattrick
4339e7b19f dp-packet: Allocate on cacheline boundary with DPDK.
UB Sanitizer report:
lib/dp-packet.h:587:22: runtime error: member access within misaligned
address 0x000001ecde10 for type 'struct dp_packet', which requires 64
byte alignment

    #0 in dp_packet_set_base lib/dp-packet.h:587
    #1 in dp_packet_use__ lib/dp-packet.c:46
    #2 in dp_packet_use lib/dp-packet.c:60
    #3 in dp_packet_init lib/dp-packet.c:126
    #4 in dp_packet_new lib/dp-packet.c:150
    [...]

Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-03 22:18:16 +01:00
Cian Ferriter
9855f35dd2 dpif-netdev/mfex: Add AVX512 NVGRE traffic profiles.
A typical NVGRE encapsulated packet starts with the ETH/IP/GRE
protocols.  Miniflow extract will parse just the ETH and IP headers. The
GRE header will be processed later as part of the pop action. Add
support for parsing the ETH/IP headers in this scenario.

Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2022-12-21 15:44:17 +00:00
David Marchand
0937209fc7 netdev-dpdk: Cleanup code when DPDK is disabled.
Remove one unused stub: netdev_dpdk_register() can't be called if DPDK
is disabled at build time.

Remove unneeded #ifdef in call to free_dpdk_buf.
Drop unneeded cast when calling free_dpdk_buf.

Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-11-30 13:58:15 +01:00
Emma Finn
eec8227614 odp-execute: Add auto validation function for actions.
This commit introduced the auto-validation function which
allows users to compare the batch of packets obtained from
different action implementations against the linear
action implementation.

The autovalidator function can be triggered at runtime using the
following command:

$ ovs-appctl odp-execute/action-impl-set autovalidator

Signed-off-by: Emma Finn <emma.finn@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2022-07-15 11:38:54 +01:00
Kumar Amber
b80f58cde2 dpif-netdev/mfex: Add ipv6 profile based hashing.
For packets which don't already have a hash calculated,
miniflow_hash_5tuple() calculates the hash of a packet
using the previously built miniflow.

This commit adds IPv6 profile specific hashing which
uses fixed offsets into the packet to improve hashing
performance.

Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2022-07-05 15:42:07 +01:00
Rosemarie O'Riorden
7e7083cc46 dpif-netdev: Replace loop iterating over packet batch with macro.
The function dp_netdev_pmd_flush_output_on_port() iterates over the
p->output_pkts batch directly, when it should be using the special
iterator macro, DP_PACKET_BATCH_FOR_EACH.

However, this wasn't possible because the macro could not accept
&p->output_pkts.

The addition of parentheses when BATCH is dereferenced allows the macro
to expand properly. Parenthesizing arguments in macros is good practice
to be able to handle whichever expressions are passed in.
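The effect of the parentheses can be shown with a reduced sketch. The struct and macro below are hypothetical stand-ins, not the actual OVS definitions.

```c
#include <stddef.h>

/* Hypothetical reduced batch and port types. */
struct batch {
    size_t count;
    int items[4];
};

struct port {
    struct batch output;
};

/* BATCH is parenthesized before being dereferenced.  Without the
 * parentheses, passing &p->output would expand to &p->output->count,
 * which parses as &(p->output->count) and does not compile because
 * 'output' is a struct, not a pointer. */
#define BATCH_FOR_EACH(IDX, BATCH) \
    for (size_t IDX = 0; IDX < (BATCH)->count; IDX++)

static int
port_sum(const struct port *p)
{
    int sum = 0;

    BATCH_FOR_EACH (i, &p->output) {
        sum += p->output.items[i];
    }
    return sum;
}
```

The same macro still accepts a plain pointer variable, so existing callers are unaffected by the added parentheses.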

Signed-off-by: Rosemarie O'Riorden <roriorden@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-05-04 21:18:08 +02:00
Kumar Amber
af864cedb0 dpif-netdev/mfex: Add ipv4 profile based hashing.
For packets which don't already have a hash calculated,
miniflow_hash_5tuple() calculates the hash of a packet
using the previously built miniflow.

This commit adds IPv4 profile specific hashing which
uses fixed offsets into the packet to improve hashing
performance.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Co-authored-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2022-05-04 14:24:04 +01:00
Ian Stokes
17346b3899 dpdk: Update to use DPDK v21.11.
This commit adds support for DPDK v21.11, it includes the following
changes.

1. ci: Install python elftools for DPDK 21.02.
2. ci: Update meson requirement for DPDK 21.05.
3. netdev-dpdk: Fix build with 21.05.
4. ci: Compile DPDK in non developer mode.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=242480&state=*

5. netdev-dpdk: Remove access to DPDK internals.
6. netdev-dpdk: Remove unused attribute from rte_flow rule.
7. netdev-dpdk: Fix mbuf macros namespace with 21.11-rc1.
8. netdev-dpdk: Fix vhost namespace with 21.11-rc2.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=271159&state=*

In addition documentation and DPDK unit tests were also updated in this
commit for use with DPDK v21.11.

For credit all authors of the original commits to 'dpdk-latest' with the above
changes have been added as co-authors for this commit.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Co-authored-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Tested-by: Seamus Ryan <seamus.ryan@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-12-09 18:40:14 +00:00
Tony van der Peet
7e6b41ac8d dpif-netdev: Fix crash when PACKET_OUT is metered.
When a PACKET_OUT has output port of OFPP_TABLE, and the rule
table includes a meter and this causes the packet to be deleted,
execute with a clone of the packet, restoring the original packet
if it is changed by the execution.

Add tests to verify the original issue is fixed, and that the fix
doesn't break tunnel processing.

Reported-by: Tony van der Peet <tony.vanderpeet@alliedtelesis.co.nz>
Signed-off-by: Tony van der Peet <tony.vanderpeet@alliedtelesis.co.nz>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-09-08 17:52:35 +02:00
Aaron Conole
640d4db788 ipf: Fix a use-after-free error, and remove the 'do_not_steal' flag.
As reported by Wang Liang, the way packets are passed to the ipf module
doesn't allow for use later on in reassembly.  Such packets may get
released anyway, such as during cleanup of tx processing.  Because the
ipf module lacks a way of forcing the dp_packet to be retained, it
will later reuse the packet.  Instead, just clone the packet and let the
ipf queue own the copy until the queue is destroyed.

After this change, there are no more in-tree users of the batch
'do_not_steal' flag.  Thus, we remove it as well.

Fixes: 4ea96698f667 ("Userspace datapath: Add fragmentation handling.")
Fixes: 0b3ff31d35f5 ("dp-packet: Add 'do_not_steal' packet batch flag.")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/382098.html
Reported-by: Wang Liang <wangliangrt@didiglobal.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Co-authored-by: Wang Liang <wangliangrt@didiglobal.com>
Signed-off-by: Wang Liang <wangliangrt@didiglobal.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-15 10:46:33 +02:00
Flavio Leitner
79349cbab0 flow: Support extra padding length.
Although not required, padding can be optionally added until
the packet length is MTU bytes. A packet with extra padding
currently fails sanity checks.

Vulnerability: CVE-2020-35498
Fixes: fa8d9001a624 ("miniflow_extract: Properly handle small IP packets.")
Reported-by: Joakim Hindersson <joakim.hindersson@elastx.se>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-02-10 14:59:55 +01:00
Ben Pfaff
3eec7fb075 pcap-file: Fix calculation of TCP payload length in tcp_reader_run().
The calculation in tcp_reader_run() failed to account for L2 padding.
This fixes the problem, by moving the existing function
tcp_payload_length() from a conntrack private header file into
dp-packet.h and renaming it to suit the dp_packet style.
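A minimal sketch of a payload length calculation that accounts for L2 padding follows. The struct is a hypothetical stand-in for the packet metadata, not the dp_packet layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the fields needed for the calculation. */
struct pkt_view {
    size_t size;           /* Bytes in the frame, trailing padding included. */
    size_t l4_ofs;         /* Offset of the TCP header from the frame start. */
    size_t l2_pad_size;    /* Trailing L2 padding bytes. */
    uint8_t tcp_data_off;  /* TCP data offset field, in 32-bit words. */
};

/* Returns the TCP payload length, excluding the TCP header and any
 * trailing L2 padding.  Returns 0 on inconsistent input. */
static size_t
tcp_payload_len(const struct pkt_view *p)
{
    size_t hdr_len = (size_t) p->tcp_data_off * 4;
    size_t l4_size;

    if (p->l4_ofs + p->l2_pad_size > p->size) {
        return 0;
    }
    l4_size = p->size - p->l4_ofs - p->l2_pad_size;
    return l4_size > hdr_len ? l4_size - hdr_len : 0;
}
```

For a bare TCP ACK padded to the 60-byte Ethernet minimum (14 + 20 + 20 bytes of headers, 6 bytes of padding), ignoring the padding would report 6 payload bytes; the calculation above reports 0.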

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
2021-02-02 09:59:31 -08:00
William Tu
29bb3093eb userspace: Enable TSO support for non-DPDK.
This patch enables TSO support for non-DPDK use cases, and
also adds the check-system-tso testsuite. Before TSO, we had to
disable checksum offload, allowing the kernel to calculate the
TCP/UDP packet checksum. With TSO, we can skip the checksum
validation by enabling checksum offload, and with large packet
sizes, we see better performance.

Consider container to container use cases:
  iperf3 -c (ns0) -> veth peer -> OVS -> veth peer -> iperf3 -s (ns1)
With this setup I got around 6 Gbps, similar to TSO with DPDK enabled.

Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
2020-05-14 07:21:34 -07:00
Flavio Leitner
c724012970 dp-packet: prefetch the next packet when cloning a batch.
There is a cache miss when accessing mbuf->data_off while cloning
a batch and using prefetch improved the throughput by ~2.3%.

Before: 13709416.30 pps
 After: 14031475.80 pps

Fixes: d48771848560 ("dp-packet: preserve headroom when cloning a pkt batch")
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2020-02-10 09:41:15 -08:00
Flavio Leitner
73858f9dbe netdev-linux: Prepend the std packet in the TSO packet
Usually TSO packets are close to 50k or 60k bytes long, so to
copy fewer bytes when receiving a packet from the kernel,
change the approach. Instead of extending the MTU-sized
packet received and appending the remaining TSO data from
the TSO buffer, allocate a TSO packet with enough headroom
to prepend the std packet data.

Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Suggested-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2020-02-06 11:37:23 -08:00
Flavio Leitner
29cf9c1b3b userspace: Add TCP Segmentation Offload support
Abbreviated as TSO, TCP Segmentation Offload is a feature which enables
the network stack to delegate the TCP segmentation to the NIC reducing
the per packet CPU overhead.

A guest using vhostuser interface with TSO enabled can send TCP packets
much bigger than the MTU, which saves CPU cycles normally used to break
the packets down to MTU size and to calculate checksums.

It also saves CPU cycles used to parse multiple packets/headers during
the packet processing inside virtual switch.

If the destination of the packet is another guest in the same host, then
the same big packet can be sent through a vhostuser interface skipping
the segmentation completely. However, if the destination is not local,
the NIC hardware is instructed to do the TCP segmentation and checksum
calculation.
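
As a rough illustration of the work being offloaded, the number of
segments a large TCP payload is split into is the payload length divided
by the maximum segment size, rounded up. 'tso_segment_count' is a
hypothetical helper, and the sizes in the usage note are example values
only.

```c
#include <assert.h>
#include <stddef.h>

/* Returns how many segments of at most 'mss' payload bytes are needed
 * to carry 'payload_len' bytes (ceiling division). */
static size_t
tso_segment_count(size_t payload_len, size_t mss)
{
    assert(mss > 0);
    return (payload_len + mss - 1) / mss;
}
```

For example, a 65000-byte payload with a 1460-byte segment size needs 45
segments, each of which would otherwise require its own header and
checksum computed in software.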

It is recommended to check if NIC hardware supports TSO before enabling
the feature, which is off by default. For additional information please
check the tso.rst document.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Tested-by: Ciara Loftus <ciara.loftus.intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2020-01-17 22:27:25 +00:00
Flavio Leitner
d487718485 dp-packet: preserve headroom when cloning a pkt batch
The headroom is useful if an additional header needs to be inserted
into the packet, so preserve the original headroom when cloning the batch.
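
A minimal sketch of a clone that keeps the source headroom, assuming a
simplified buffer type rather than the real dp_packet: the copy reuses
the same data offset, so a header can still be prepended to the clone
without reallocating. 'struct buf' and 'buf_clone_keep_headroom' are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer; 'data_off' is the headroom in front of the data. */
struct buf {
    unsigned char *base;   /* Start of the allocation. */
    size_t size;           /* Total allocated bytes. */
    size_t data_off;       /* Headroom: offset of the first valid byte. */
    size_t len;            /* Number of valid bytes. */
};

/* Clones 'src' into 'dst', keeping the same headroom as the source.
 * Returns 0 on success, -1 if the allocation fails. */
static int
buf_clone_keep_headroom(struct buf *dst, const struct buf *src)
{
    assert(src->size > 0);
    assert(src->data_off + src->len <= src->size);

    dst->base = malloc(src->size);
    if (!dst->base) {
        return -1;
    }
    dst->size = src->size;
    dst->data_off = src->data_off;   /* Same headroom as the original. */
    dst->len = src->len;
    memcpy(dst->base + dst->data_off, src->base + src->data_off, src->len);
    return 0;
}
```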

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Tested-by: Ciara Loftus <ciara.loftus.intel.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2020-01-17 17:01:24 +00:00
Ilya Maximets
9965fef8db dp-packet: Fix clearing/copying of memory layout flags.
'ol_flags' of DPDK mbuf could contain bits responsible for external
or indirect buffers which are not actually offload flags in a common
sense.  Clearing/copying of these flags could lead to memory leaks of
external memory chunks and crashes due to access to wrong memory.

OVS should not clear these flags while resetting offloads and also
should not copy them to the newly allocated packets.
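
The rule can be sketched with a bitmask, assuming made-up flag names and
bit positions rather than the real DPDK definitions: bits that describe
the memory layout are kept on reset and are never taken from the source
on copy. All identifiers below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define FLAG_RSS_HASH      (UINT64_C(1) << 0)   /* Offload information. */
#define FLAG_CSUM_GOOD     (UINT64_C(1) << 1)   /* Offload information. */
#define FLAG_EXTERNAL_BUF  (UINT64_C(1) << 61)  /* Memory layout. */
#define FLAG_INDIRECT_BUF  (UINT64_C(1) << 62)  /* Memory layout. */

#define MEM_LAYOUT_MASK    (FLAG_EXTERNAL_BUF | FLAG_INDIRECT_BUF)

/* Clears the offload bits but keeps the memory layout bits intact. */
static uint64_t
reset_offloads(uint64_t flags)
{
    return flags & MEM_LAYOUT_MASK;
}

/* Copies the offload bits from 'src_flags' while the destination keeps
 * its own memory layout bits. */
static uint64_t
copy_offloads(uint64_t dst_flags, uint64_t src_flags)
{
    return (dst_flags & MEM_LAYOUT_MASK) | (src_flags & ~MEM_LAYOUT_MASK);
}
```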

This change is required to support DPDK 19.11, as some drivers may
return mbufs with these flags set.  However, it might be good to do
the same for DPDK 18.11, because these flags are present and should
be taken into account.

Fixes: 03f3f9c0faf8 ("dpdk: Update to use DPDK 18.11.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
2019-11-28 16:50:10 +01:00
William Tu
0de1b42596 netdev-afxdp: add new netdev type for AF_XDP.
The patch introduces experimental AF_XDP support for OVS netdev.
AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket
type built upon the eBPF and XDP technology.  It aims to have comparable
performance to DPDK but cooperate better with the existing kernel networking
stack.  An AF_XDP socket receives and sends packets from an eBPF/XDP program
attached to the netdev, bypassing a couple of Linux kernel subsystems.
As a result, an AF_XDP socket shows much better performance than AF_PACKET.
For more details about AF_XDP, please see the Linux kernel's
Documentation/networking/af_xdp.rst. Note that by default, this feature is
not compiled in.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
2019-07-19 17:42:06 +03:00
Ilya Maximets
0f706b37d8 dp-packet: Add flow_mark support for non-DPDK case.
Additionally, new API call 'dp_packet_set_flow_mark' is needed
for packet clone. Mostly for dummy HWOL implementation.

Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2019-03-13 11:07:24 +00:00
Ilya Maximets
a47e2db209 dp-packet: Refactor offloading API.
1. No reason to have mbuf related APIs in a generic code.
2. Not only RSS/checksums should be invalidated in case of tunnel
   decapsulation or sending to 'ring' ports.

To fix the two issues above, a new function
'dp_packet_reset_offload' is introduced. To clean up and unify
the code and to simplify the addition of new offloading features to the
non-DPDK version of dp_packet, an 'ol_flags' bitmask is introduced.
Additionally, code complexity in 'dp_packet_clone_with_headroom' is
reduced by using already existing generic APIs.

Unfortunately, we still need to have a special case for mbuf
initialization inside 'dp_packet_init__()'.
'dp_packet_init_specific()' is introduced for this purpose as a generic
API for initialization of the implementation-specific fields.

Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2019-03-13 09:51:30 +00:00
Ilya Maximets
92330af529 dp-packet: Constantify offloading APIs.
Getters should have const arguments.

Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2019-02-27 22:28:34 +00:00