Rather than marking packets with an offload flag plus a segmentation
size, simply rely on the netdev implementation, which sets a
segmentation size when appropriate.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The DPDK mbuf API specifies four statuses for L4 checksums:
- RTE_MBUF_F_RX_L4_CKSUM_UNKNOWN: no information about the RX L4 checksum
- RTE_MBUF_F_RX_L4_CKSUM_BAD: the L4 checksum in the packet is wrong
- RTE_MBUF_F_RX_L4_CKSUM_GOOD: the L4 checksum in the packet is valid
- RTE_MBUF_F_RX_L4_CKSUM_NONE: the L4 checksum is not correct in the packet
data, but the integrity of the L4 data is verified.
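The four statuses fit in two bits, with NONE encoded as both the BAD and
GOOD bits set. A minimal sketch of how a consumer might decode them; the
macro names, bit positions and helper functions are hypothetical stand-ins,
not the DPDK definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-bit encoding of the four Rx L4 checksum statuses.
 * Bit positions are illustrative only; real code should use DPDK's
 * RTE_MBUF_F_RX_L4_CKSUM_* macros. As in DPDK, NONE is encoded as both
 * the BAD and GOOD bits set. */
#define RX_L4_CKSUM_BAD     (UINT64_C(1) << 0)
#define RX_L4_CKSUM_GOOD    (UINT64_C(1) << 1)
#define RX_L4_CKSUM_MASK    (RX_L4_CKSUM_BAD | RX_L4_CKSUM_GOOD)
#define RX_L4_CKSUM_UNKNOWN UINT64_C(0)
#define RX_L4_CKSUM_NONE    (RX_L4_CKSUM_BAD | RX_L4_CKSUM_GOOD)

/* True when the netdev gave no verdict, so software must verify. */
static bool
rx_l4_csum_needs_sw_check(uint64_t ol_flags)
{
    return (ol_flags & RX_L4_CKSUM_MASK) == RX_L4_CKSUM_UNKNOWN;
}

/* True when the L4 data can be trusted: either the checksum field is
 * valid (GOOD), or it is not but data integrity was verified (NONE). */
static bool
rx_l4_data_trusted(uint64_t ol_flags)
{
    uint64_t status = ol_flags & RX_L4_CKSUM_MASK;

    return status == RX_L4_CKSUM_GOOD || status == RX_L4_CKSUM_NONE;
}
```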
Similarly to the IP checksum offloads API, revise OVS L4 offloads API.
No information about the L4 protocol is provided by any netdev-*
implementation, so OVS needs to mark this L4 protocol during flow
extraction.
Rename current API for consistency with dp_packet_(inner_)?l4_checksum_.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
As the packet traverses OVS, Tx offloading flags must be carefully
evaluated and updated, which results in a bit of complexity because the
DPDK API has a separate "outer" Tx offloading flag in addition to the
"normal"/"inner" Tx offloading flag.
On the other hand, the DPDK mbuf API specifies four statuses for IP
checksums:
- RTE_MBUF_F_RX_IP_CKSUM_UNKNOWN: no information about the RX IP checksum
- RTE_MBUF_F_RX_IP_CKSUM_BAD: the IP checksum in the packet is wrong
- RTE_MBUF_F_RX_IP_CKSUM_GOOD: the IP checksum in the packet is valid
- RTE_MBUF_F_RX_IP_CKSUM_NONE: the IP checksum is not correct in the
packet data, but the integrity of the IP header is verified.
This patch changes the OVS API so that OVS code only tracks the status
of the checksum of the "current" L3 header and leaves the Tx flags
aspect to the netdev-* implementations.
With this API, the flow extraction can be cleaned up.
During packet processing, OVS can simply check the IP checksum validity
(either good or partial) before changing an IP header, and then mark
the checksum as partial.
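A minimal sketch of this rule for one IPv4 header field (the TTL). The
`ip_csum_status` enum and the `ipv4_set_ttl()` helper are hypothetical;
for untrusted statuses the sketch falls back to an RFC 1624 incremental
update, which keeps a good checksum good and a bad one bad.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-packet IP checksum status. */
enum ip_csum_status {
    IP_CSUM_UNKNOWN,
    IP_CSUM_BAD,
    IP_CSUM_GOOD,
    IP_CSUM_PARTIAL,   /* Left for the egress netdev (or fallback) to fill. */
};

/* Folds a 32-bit one's complement sum into 16 bits and complements it. */
static uint16_t
csum_fold(uint32_t sum)
{
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}

/* RFC 1071 checksum over 'len' bytes ('len' assumed even). */
static uint16_t
csum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t) data[i] << 8) | data[i + 1];
    }
    return csum_fold(sum);
}

/* RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m'). */
static uint16_t
csum_update16(uint16_t old_csum, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint16_t) ~old_csum;

    sum += (uint16_t) ~old_word;
    sum += new_word;
    return csum_fold(sum);
}

/* Sets the TTL of a raw IPv4 header (TTL at byte 8, checksum at 10-11). */
static void
ipv4_set_ttl(uint8_t *ip, enum ip_csum_status *status, uint8_t ttl)
{
    uint16_t old_word = (uint16_t) ((ip[8] << 8) | ip[9]);

    ip[8] = ttl;
    if (*status == IP_CSUM_GOOD || *status == IP_CSUM_PARTIAL) {
        /* Valid or already deferred: just defer it to egress. */
        *status = IP_CSUM_PARTIAL;
    } else {
        /* UNKNOWN or BAD: patch the checksum in place, so a good
         * checksum stays good and a bad one stays bad. */
        uint16_t new_word = (uint16_t) ((ip[8] << 8) | ip[9]);
        uint16_t old_csum = (uint16_t) ((ip[10] << 8) | ip[11]);
        uint16_t c = csum_update16(old_csum, old_word, new_word);

        ip[10] = (uint8_t) (c >> 8);
        ip[11] = (uint8_t) (c & 0xff);
    }
}
```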
In the conntrack case, when NATing packets, the checksum status of the
inner part (ICMP error case) must be temporarily forced to unknown to
force checksum resolution.
When tunneling comes into play, the IP checksum status is bit-shifted
so it can be considered later in the processing, for example if the
tunnel header gets decapsulated again, or by the netdev-*
implementations that support tunnel offloading.
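One way to picture the bit-shift: keep the status of the current header in
the low bits, shift it up one level on encapsulation and back down on
decapsulation. The two-bit layout and the function names below are
hypothetical and only illustrate the idea.

```c
#include <stdint.h>

/* Hypothetical layout: bits 0-1 hold the checksum status of the current
 * (outermost) IP header, bits 2-3 the status of the next inner one. */
#define CSUM_BITS      2u
#define CSUM_MASK      ((UINT32_C(1) << CSUM_BITS) - 1)
#define CSUM_ALL_MASK  ((CSUM_MASK << CSUM_BITS) | CSUM_MASK)

/* Encapsulation: the current header becomes the inner one. Its status
 * moves up one level and 'outer_status' describes the new outer header.
 * A status already stored at the inner level is discarded. */
static uint32_t
csum_status_encap(uint32_t flags, uint32_t outer_status)
{
    uint32_t cur = flags & CSUM_MASK;

    flags &= ~CSUM_ALL_MASK;
    return flags | (cur << CSUM_BITS) | (outer_status & CSUM_MASK);
}

/* Decapsulation: the inner header's status becomes current again. */
static uint32_t
csum_status_decap(uint32_t flags)
{
    uint32_t inner = (flags >> CSUM_BITS) & CSUM_MASK;

    flags &= ~CSUM_ALL_MASK;
    return flags | inner;
}
```

Unrelated bits outside `CSUM_ALL_MASK` are preserved across both operations.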
Finally, netdev-* implementations only need to care about packets in
partial status: a good checksum does not need touching, a bad checksum
has been updated but kept bad by OVS, and an unknown checksum either
belongs to an IPv6 packet (which has no header checksum) or, for IPv4,
has been updated by OVS too (keeping it good or bad accordingly).
Rename current API for consistency with dp_packet_(inner_)?ip_checksum_.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Rather than setting bits in the mbuf ol_flags field, which only makes
sense for netdev-dpdk ports, mark the packet for tunnel offload in the
OVS offloads API.
While at it, since there is nothing really "hardware" related, rename
current API for consistency with dp_packet_tunnel_ prefix.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Expand this helper to clearly separate the non-tunnel case from the
tunnel one. This will make later changes easier to read.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
As a preparation for tracking inner checksums, separate Rx checksum
status from the DPDK ol_flags field.
To minimize the cost of translating from DPDK API to OVS API, simply map
OVS flags to DPDK Rx mbuf flags.
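The point of the mapping is that translation becomes a single mask
operation. A sketch under the assumption that the OVS flags are defined
with the same bit values as the DPDK ones; the `DPDK_*` and `OVS_*` names
and values below are illustrative stand-ins, not the real headers.

```c
#include <stdint.h>

/* Stand-ins for DPDK's Rx checksum mbuf bits; real code would take the
 * RTE_MBUF_F_RX_* values from the DPDK headers. Values are illustrative. */
#define DPDK_RX_L4_CKSUM_BAD   (UINT64_C(1) << 3)
#define DPDK_RX_IP_CKSUM_BAD   (UINT64_C(1) << 4)
#define DPDK_RX_IP_CKSUM_GOOD  (UINT64_C(1) << 7)
#define DPDK_RX_L4_CKSUM_GOOD  (UINT64_C(1) << 8)

/* Hypothetical OVS Rx flags, defined to the very same bit values. */
#define OVS_RX_L4_CKSUM_BAD    DPDK_RX_L4_CKSUM_BAD
#define OVS_RX_IP_CKSUM_BAD    DPDK_RX_IP_CKSUM_BAD
#define OVS_RX_IP_CKSUM_GOOD   DPDK_RX_IP_CKSUM_GOOD
#define OVS_RX_L4_CKSUM_GOOD   DPDK_RX_L4_CKSUM_GOOD
#define OVS_RX_CKSUM_MASK      (OVS_RX_L4_CKSUM_BAD | OVS_RX_IP_CKSUM_BAD \
                                | OVS_RX_IP_CKSUM_GOOD | OVS_RX_L4_CKSUM_GOOD)

/* Translation is a single AND: no per-flag branches or bit moves. */
static uint64_t
ovs_rx_flags_from_mbuf(uint64_t mbuf_ol_flags)
{
    return mbuf_ol_flags & OVS_RX_CKSUM_MASK;
}
```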
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Flagging packets with the IP version is only needed at the netdev-dpdk
level. In most cases, OVS is already inspecting the IP header in packet
data, so maintaining such IP version metadata won't save many cycles
(given the cost of the additional branches needed to handle
outer/inner flags).
Clean up OVS shared code and only set these flags in netdev-dpdk.c.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Because the virtio-net offload API, used for netdev-linux ports,
provides no information about which (potentially encapsulated) protocol
a checksum request concerns, information specific to this netdev
implementation is propagated into OVS code and must be carefully
evaluated when a tunnel gets decapsulated.
This induces a cost in the "normal" processing path, while the
netdev-linux path is not performance critical.
This patch removes such specific information, and instead tries harder
to parse the packet on the Rx side and set offload flags accordingly
for non-encapsulated traffic. For encapsulated traffic, the inner
checksum is computed.
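A sketch of the Rx-side decision, assuming a parsed L4 offset and IP
protocol are available: a checksum request is trusted as partial only when
it starts exactly at the parsed (outermost) L4 header and points at the
TCP or UDP checksum field; anything else is resolved in software. The
`vnet_hdr_sketch` struct mirrors the Linux `struct virtio_net_hdr` layout
with fields assumed in host byte order; the enum and function are
hypothetical.

```c
#include <stdint.h>

/* Hypothetical mirror of the Linux 'struct virtio_net_hdr' layout, with
 * fields assumed already converted to host byte order. */
struct vnet_hdr_sketch {
    uint8_t flags;
    uint8_t gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;   /* Where checksumming starts. */
    uint16_t csum_offset;  /* Where to store it, relative to csum_start. */
};

#define VNET_F_NEEDS_CSUM 1   /* Same value as VIRTIO_NET_HDR_F_NEEDS_CSUM. */

enum rx_csum_mark {
    MARK_NONE,         /* No checksum request. */
    MARK_TCP_PARTIAL,  /* Defer the TCP checksum. */
    MARK_UDP_PARTIAL,  /* Defer the UDP checksum. */
    MARK_SW_CSUM,      /* Request targets another header: compute now. */
};

/* 'l4_ofs' and 'ip_proto' come from parsing the packet on Rx. */
static enum rx_csum_mark
classify_vnet_csum(const struct vnet_hdr_sketch *vh, uint16_t l4_ofs,
                   uint8_t ip_proto)
{
    if (!(vh->flags & VNET_F_NEEDS_CSUM)) {
        return MARK_NONE;
    }
    if (vh->csum_start == l4_ofs) {
        /* The TCP checksum field is at offset 16, UDP's at offset 6. */
        if (ip_proto == 6 && vh->csum_offset == 16) {
            return MARK_TCP_PARTIAL;
        }
        if (ip_proto == 17 && vh->csum_offset == 6) {
            return MARK_UDP_PARTIAL;
        }
    }
    return MARK_SW_CSUM;
}
```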
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Enhance netdev-dummy:
- add debug log,
- split Rx and Tx aspects,
- add coverage for bad status.
Enhance unit tests:
- enable Tx offloads on the transmitting port,
- test L4 checksums for TCP and UDP (and partial status),
- test IPv6.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
flow_compose() can generate packets with bad IPv4 checksum, however the
associated Rx flags were not correctly set.
The usefulness of setting this metadata seems limited, yet fix this for
consistency.
Fixes: c62b4ac8f8da ("ovs-ofctl: Implement compose-packet --bare [--bad-csum].")
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Rather than drop all pending Tx offloads on recirculation,
preserve inner offloads (and mark packet with outer Tx offloads)
when parsing the packet again.
Fixes: c6538b443984 ("dpif-netdev: Fix crash due to tunnel offloading on recirculation.")
Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
Reported-at: https://issues.redhat.com/browse/FDP-1144
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This patch extends the userspace datapath's support for tunnel TSO from
only VXLAN and Geneve to GRE tunnels as well. There is also a software
fallback for cases where the egress netdev does not support this
feature.
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
When sending packets that are flagged as requiring segmentation to an
interface that does not support this feature, send the packet to the TSO
software fallback instead of dropping it.
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
During the transition towards checksum offloading, the function to
handle software fallback of IPv4 checksums didn't account for the
options field.
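A minimal sketch of a software IPv4 header checksum that honours options:
the header length is taken from the IHL field instead of a fixed 20-byte
struct size. The function name is hypothetical; the first test vector is
the widely used example header whose checksum is 0xb861.

```c
#include <stddef.h>
#include <stdint.h>

/* Computes the IPv4 header checksum of a raw header, skipping the
 * checksum field itself (bytes 10-11). The length comes from the IHL
 * field (in 32-bit words), so options are covered; using a fixed
 * 20-byte header size would silently ignore them. */
static uint16_t
ipv4_hdr_csum(const uint8_t *ip)
{
    size_t hdr_len = (size_t) (ip[0] & 0x0f) * 4;
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < hdr_len; i += 2) {
        if (i != 10) {
            sum += ((uint32_t) ip[i] << 8) | ip[i + 1];
        }
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}
```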
Fixes: 5d11c47d3ebe ("userspace: Enable IP checksum offloading by default.")
Reported-by: Jun Wang <junwang01@cestc.cn>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2024-July/053236.html
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
All information required for checksum offloading can be deduced from
the already tracked dp_packet l3_ofs, l4_ofs, inner_l3_ofs and
inner_l4_ofs fields.
Remove DPDK specific l[2-4]_len from generic OVS code.
netdev-dpdk code then fills mbuf specifics step by step:
- outer_l2_len and outer_l3_len are needed for tunneling (and below
features),
- l2_len and l3_len are needed for IP and L4 checksum (and below features),
- l4_len and tso_segsz are needed when doing TSO.
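A sketch of the first two steps (l4_len and tso_segsz are left out),
assuming L2 starts at offset 0 of the frame. The structs are hypothetical
mirrors of the dp_packet offsets and of the rte_mbuf length fields; note
that for tunneled packets DPDK's l2_len spans the outer L4 header, the
tunnel header and the inner L2 header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the offsets tracked by dp_packet, in bytes from
 * the start of the frame (L2 assumed to start at offset 0). */
struct pkt_offsets {
    uint16_t l3_ofs, l4_ofs;               /* Outermost headers. */
    uint16_t inner_l3_ofs, inner_l4_ofs;   /* Valid only if 'tunnel'. */
    bool tunnel;
};

/* Hypothetical mirror of the rte_mbuf length fields being filled. */
struct mbuf_lens {
    uint16_t outer_l2_len, outer_l3_len;
    uint16_t l2_len, l3_len;
};

static void
fill_mbuf_lens(const struct pkt_offsets *p, struct mbuf_lens *m)
{
    if (p->tunnel) {
        m->outer_l2_len = p->l3_ofs;
        m->outer_l3_len = (uint16_t) (p->l4_ofs - p->l3_ofs);
        /* Outer L4 + tunnel header + inner L2. */
        m->l2_len = (uint16_t) (p->inner_l3_ofs - p->l4_ofs);
        m->l3_len = (uint16_t) (p->inner_l4_ofs - p->inner_l3_ofs);
    } else {
        m->outer_l2_len = 0;
        m->outer_l3_len = 0;
        m->l2_len = p->l3_ofs;
        m->l3_len = (uint16_t) (p->l4_ofs - p->l3_ofs);
    }
}
```

For an IPv4 VXLAN packet, the inner L3 header starts at 34 + 8 (UDP) + 8
(VXLAN) + 14 (inner Ethernet) = 64, so l2_len is 30.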
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Recirculation involves re-parsing the packet from scratch and that
process is not aware of multiple header levels nor the inner/outer
offsets. So, it overwrites offsets with new ones from the outermost
headers and sets offloading flags that change their meaning when
the packet is marked for tunnel offloading.
For example:
1. TCP packet enters OVS.
2. TCP packet gets encapsulated into UDP tunnel.
3. Recirculation happens.
4. Packet is re-parsed after recirculation with miniflow_extract()
or similar function.
5. Packet is marked for UDP checksumming because we parse the
outermost set of headers. But since it is tunneled, it means
inner UDP checksumming. And that makes no sense, because the
inner packet is TCP.
This causes packet drops due to malformed packets, or even assertions
and crashes in the code that tries to fix up checksums for packets
using incorrect metadata:
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
lib/packets.c:2061:15: runtime error:
member access within null pointer of type 'struct udp_header'
0 0xbe5221 in packet_udp_complete_csum lib/packets.c:2061:15
1 0x7e5662 in dp_packet_ol_send_prepare lib/dp-packet.c:638:9
2 0x96ef89 in netdev_send lib/netdev.c:940:9
3 0x818e94 in dp_netdev_pmd_flush_output_on_port lib/dpif-netdev.c:5577:9
4 0x817606 in dp_netdev_pmd_flush_output_packets lib/dpif-netdev.c:5618:27
5 0x81cfa5 in dp_netdev_process_rxq_port lib/dpif-netdev.c:5677:9
6 0x7eefe4 in dpif_netdev_run lib/dpif-netdev.c:7001:25
7 0x610e87 in type_run ofproto/ofproto-dpif.c:367:9
8 0x5b9e80 in ofproto_type_run ofproto/ofproto.c:1879:31
9 0x55bbb4 in bridge_run__ vswitchd/bridge.c:3281:9
10 0x558b6b in bridge_run vswitchd/bridge.c:3346:5
11 0x591dc5 in main vswitchd/ovs-vswitchd.c:130:9
12 0x172b89 in __libc_start_call_main (/lib64/libc.so.6+0x27b89)
13 0x172c4a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x27c4a)
14 0x47eff4 in _start (vswitchd/ovs-vswitchd+0x47eff4)
Tests are added for both IPv4 and IPv6 cases. Though the IPv6 test
doesn't trigger the issue, it's better to have a symmetric test.
Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/053014.html
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Previously, some packets were excluded from the tunnel mark if they
weren't L4. However, this causes problems with multi-encapsulated
packets such as ARP.
As these flags are set, additional checks are required in the checksum
modification code.
Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Previously, a change was added to the vnet prepend code to handle the
case where no L4 checksum offloading was needed but the L3 checksum
hadn't been calculated. However, the added check didn't properly
account for IPv6 traffic.
Fixes: 85bcbbed839a ("userspace: Enable tunnel tests with TSO.")
Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This patch fixes the correctness of dp_packet_inner_l4_size() when
checking for the existence of an inner L4 header. Previously it checked
for the outer L4 header.
This function is currently only used when a packet is already flagged
for tunneling, so an incorrect determination isn't possible as long as
the flags of the packet are correct.
Fixes: 85bcbbed839a ("userspace: Enable tunnel tests with TSO.")
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The OVN test suite identified a bug in dp_packet_ol_send_prepare() where
a BFD packet flagged as double encapsulated would trigger a segfault.
The problem surfaced because bfd_put_packet() was reusing a packet
allocated on the stack whose flags were not reset between calls.
This change resets OL flags as well as the layer offsets in
data_clear(), which should fix this type of packet reuse issue in
general, as long as data_clear() is called between uses.
Fixes: 8b5fe2dc6080 ("userspace: Add Generic Segmentation Offloading.")
Reported-by: Dumitru Ceara <dceara@redhat.com>
Reported-at: https://issues.redhat.com/browse/FDP-300
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Currently, dp_packet_ol_send_prepare() performs multiple checks for
each offloading flag separately. That takes a noticeable amount of
extra cycles for packets that do not have any offloading flags set.
Skip most of the work if no checksumming flags are set.
The change improves performance of direct forwarding between two
virtio-user ports (V2V) by ~2.5% and offsets all the negative
effects of the recently introduced TSO support.
It adds an extra check to the offloading path, but that is not the
default configuration, and the hit there should be much smaller
because such traffic consists of fewer, larger packets.
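A sketch of the early exit: one mask test covers the common case of a
packet with no checksum request, and the per-flag checks only run when
something is actually pending. The flag names and the `send_prepare()`
function are hypothetical; counting fixups stands in for the real work.

```c
#include <stdint.h>

/* Hypothetical Tx checksum request flags. */
#define TX_IP_CSUM    (UINT32_C(1) << 0)
#define TX_TCP_CSUM   (UINT32_C(1) << 1)
#define TX_UDP_CSUM   (UINT32_C(1) << 2)
#define TX_SCTP_CSUM  (UINT32_C(1) << 3)
#define TX_ANY_CSUM   (TX_IP_CSUM | TX_TCP_CSUM | TX_UDP_CSUM | TX_SCTP_CSUM)

/* Returns how many software checksum fixups the packet would need on a
 * netdev advertising 'netdev_caps'. */
static int
send_prepare(uint32_t ol_flags, uint32_t netdev_caps)
{
    uint32_t todo = ol_flags & TX_ANY_CSUM;
    int n = 0;

    if (!todo) {
        /* Common case: one test instead of one per flag. */
        return 0;
    }
    todo &= ~netdev_caps;   /* What the device can do is left to it. */
    n += !!(todo & TX_IP_CSUM);
    n += !!(todo & TX_TCP_CSUM);
    n += !!(todo & TX_UDP_CSUM);
    n += !!(todo & TX_SCTP_CSUM);
    return n;
}
```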
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This patch enables most of the tunnel tests in the testsuite, and adds a
large TCP transfer to a vxlan and geneve test to verify TSO
functionality. Some additional changes were required to accommodate these
changes with netdev-linux interfaces. The test for vlan over vxlan is
purposely not enabled as the traffic produced by this test gives
incorrect values in the vnet header.
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
For the userspace datapath, this patch provides VXLAN and Geneve tunnel
TSO. Only userspace VXLAN and Geneve tunnels are supported, along with
tunnel outer and inner checksum offloads. If a netdev does not support
the offload features, there is a software fallback. If a netdev does
not support VXLAN and Geneve TSO, packets will be dropped. Front-end
devices can also disable the offload features with ethtool.
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Dexia Li <dexia.li@jaguarmicro.com>
Co-authored-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
When OVS needs to fall back on the software TSO implementation to
segment a packet, it currently doesn't guarantee that the IP and TCP
checksum offload flags are set. However, these may be required.
This is the case in dp_netdev_upcall(), which clears these flags.
This patch explicitly sets the appropriate flags when the segmentation
flag is removed, to guarantee that packets always end up with correct
checksums.
Fixes: 8b5fe2dc6080 ("userspace: Add Generic Segmentation Offloading.")
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This provides a software implementation for the case where
the egress netdev doesn't support segmentation in hardware.
The challenge here is to guarantee packet ordering in the
original batch, which may be full of TSO packets. Each TSO
packet can be up to ~64kB, so with a segment size of 1440
that means about 44 packets for each TSO packet. Each batch has
32 packets, so the total batch amounts to 1408 normal
packets.
The segmentation code estimates the total number of packets
and then the total number of batches, then allocates
enough memory and finally does the work.
Each batch is then sent to the netdev in order.
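The estimate can be sketched as two ceiling divisions: segments per TSO
packet, then output batches for the whole input batch. The helper names
and `BATCH_CAPACITY` are hypothetical; the test reproduces the numbers
above (44 segments per packet, 1408 segments, hence 44 output batches).

```c
#include <stddef.h>

/* Hypothetical batch capacity, matching the 32-packet batches above. */
#define BATCH_CAPACITY 32

/* Segments needed for 'payload_len' bytes at 'segsz' bytes each. */
static size_t
tso_n_segments(size_t payload_len, size_t segsz)
{
    return (payload_len + segsz - 1) / segsz;   /* Ceiling division. */
}

/* Estimates how many output batches a batch of TSO packets expands to,
 * so that enough memory can be allocated before segmenting. */
static size_t
tso_n_batches(const size_t *payload_lens, size_t n_pkts, size_t segsz)
{
    size_t total = 0;

    for (size_t i = 0; i < n_pkts; i++) {
        total += tso_n_segments(payload_lens[i], segsz);
    }
    return (total + BATCH_CAPACITY - 1) / BATCH_CAPACITY;
}
```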
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Currently OVS will calculate the segment size based on the
MTU of the egress port. That usually happens to be correct
when the ports share the same MTU, but that is not always true.
Therefore, if the segment size is provided, use it and
make sure that oversized packets are dropped.
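A sketch of the selection and the drop check, assuming IPv4 and TCP
headers without options (40 bytes) for the MTU-derived fallback. Both
helpers and the constant are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed IPv4 + TCP header size without options. */
#define IPV4_TCP_HDRS_LEN 40

/* Prefers the segment size carried with the packet; otherwise derives
 * one from the egress MTU. */
static uint16_t
pick_segsz(uint16_t pkt_segsz, uint16_t egress_mtu)
{
    if (pkt_segsz) {
        return pkt_segsz;
    }
    return egress_mtu > IPV4_TCP_HDRS_LEN
           ? (uint16_t) (egress_mtu - IPV4_TCP_HDRS_LEN) : 0;
}

/* A segment (L3 + L4 headers + 'segsz' bytes of payload) must fit in the
 * egress MTU; otherwise the packet is dropped. */
static bool
segment_fits(uint16_t segsz, uint16_t l3_l4_hdrs_len, uint16_t egress_mtu)
{
    return (uint32_t) segsz + l3_l4_hdrs_len <= egress_mtu;
}
```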
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The netdev receiving packets is supposed to provide the flags
indicating if the L4 checksum was verified and it is OK or BAD,
otherwise the stack will check when appropriate by software.
If the packet comes with good checksum, then postpone the
checksum calculation to the egress device if needed.
When encapsulating a packet with that flag, set the checksum
of the inner L4 header, since that is not yet supported.
Calculate the L4 checksum when the packet is going to be sent
over a device that doesn't support the feature.
Linux tap devices allow enabling L3 and L4 offload, so this
patch enables the feature. However, the Linux socket interface
remains disabled because its API doesn't allow enabling
those two features without enabling TSO too.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The netdev receiving packets is supposed to provide the flags
indicating if the IP checksum was verified and it is GOOD or BAD,
otherwise the stack will check when appropriate by software.
If the packet comes with good checksum, then postpone the
checksum calculation to the egress device if needed.
When encapsulating a packet with that flag, set the checksum
of the inner IP header, since that is not yet supported.
Calculate the IP checksum when the packet is going to be sent over
a device that doesn't support the feature.
Linux devices don't support IP checksum offload alone, so the
support is not enabled.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
UB Sanitizer report:
lib/dp-packet.h:587:22: runtime error: member access within misaligned
address 0x000001ecde10 for type 'struct dp_packet', which requires 64
byte alignment
#0 in dp_packet_set_base lib/dp-packet.h:587
#1 in dp_packet_use__ lib/dp-packet.c:46
#2 in dp_packet_use lib/dp-packet.c:60
#3 in dp_packet_init lib/dp-packet.c:126
#4 in dp_packet_new lib/dp-packet.c:150
[...]
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
A typical NVGRE encapsulated packet starts with the ETH/IP/GRE
protocols. Miniflow extract will parse just the ETH and IP headers. The
GRE header will be processed later as part of the pop action. Add
support for parsing the ETH/IP headers in this scenario.
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Remove one unused stub: netdev_dpdk_register() can't be called if DPDK
is disabled at build time.
Remove unneeded #ifdef in call to free_dpdk_buf.
Drop unneeded cast when calling free_dpdk_buf.
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This commit introduces the auto-validation function, which
allows users to compare the batch of packets obtained from
different action implementations against the linear
action implementation.
The autovalidator function can be triggered at runtime using the
following command:
$ ovs-appctl odp-execute/action-impl-set autovalidator
Signed-off-by: Emma Finn <emma.finn@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
For packets which don't already have a hash calculated,
miniflow_hash_5tuple() calculates the hash of a packet
using the previously built miniflow.
This commit adds IPv6 profile specific hashing which
uses fixed offsets into the packet to improve hashing
performance.
Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
The function dp_netdev_pmd_flush_output_on_port() iterates over the
p->output_pkts batch directly, when it should be using the special
iterator macro, DP_PACKET_BATCH_FOR_EACH.
However, this wasn't possible because the macro could not accept
&p->output_pkts.
The addition of parentheses when BATCH is dereferenced allows the macro
to expand properly. Parenthesizing arguments in macros is good practice
to be able to handle whichever expressions are passed in.
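A small illustration of why the parentheses matter. The types, the
`BATCH_COUNT*` macros and `BATCH_FOR_EACH` are hypothetical stand-ins in
the spirit of DP_PACKET_BATCH_FOR_EACH, not the OVS definitions.

```c
#include <stddef.h>

/* Hypothetical types standing in for a packet batch and a port. */
struct batch {
    int count;
    int pkts[4];
};

struct port {
    struct batch output_pkts;
};

/* Unparenthesized: BATCH_COUNT_BAD(&p->output_pkts) would expand to
 * '&p->output_pkts->count', i.e. '&(p->output_pkts->count)', which does
 * not compile because 'output_pkts' is not a pointer. */
#define BATCH_COUNT_BAD(BATCH) BATCH->count

/* Parenthesized: expands to '((&p->output_pkts)->count)'. */
#define BATCH_COUNT(BATCH) ((BATCH)->count)

/* Hypothetical iterator built on the parenthesized accessor. */
#define BATCH_FOR_EACH(IDX, BATCH) \
    for (size_t IDX = 0; IDX < (size_t) BATCH_COUNT(BATCH); IDX++)

static int
sum_output(struct port *p)
{
    int sum = 0;

    BATCH_FOR_EACH (i, &p->output_pkts) {
        sum += p->output_pkts.pkts[i];
    }
    return sum;
}
```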
Signed-off-by: Rosemarie O'Riorden <roriorden@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
For packets which don't already have a hash calculated,
miniflow_hash_5tuple() calculates the hash of a packet
using the previously built miniflow.
This commit adds IPv4 profile specific hashing which
uses fixed offsets into the packet to improve hashing
performance.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Co-authored-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
This commit adds support for DPDK v21.11, it includes the following
changes.
1. ci: Install python elftools for DPDK 21.02.
2. ci: Update meson requirement for DPDK 21.05.
3. netdev-dpdk: Fix build with 21.05.
4. ci: Compile DPDK in non developer mode.
http://patchwork.ozlabs.org/project/openvswitch/list/?series=242480&state=*
5. netdev-dpdk: Remove access to DPDK internals.
6. netdev-dpdk: Remove unused attribute from rte_flow rule.
7. netdev-dpdk: Fix mbuf macros namespace with 21.11-rc1.
8. netdev-dpdk: Fix vhost namespace with 21.11-rc2.
http://patchwork.ozlabs.org/project/openvswitch/list/?series=271159&state=*
In addition documentation and DPDK unit tests were also updated in this
commit for use with DPDK v21.11.
For credit, all authors of the original commits to 'dpdk-latest' with the
above changes have been added as co-authors for this commit.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Co-authored-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Tested-by: Seamus Ryan <seamus.ryan@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
When a PACKET_OUT has an output port of OFPP_TABLE, and the rule
table includes a meter that causes the packet to be deleted,
execute with a clone of the packet, restoring the original packet
if it is changed by the execution.
Add tests to verify the original issue is fixed, and that the fix
doesn't break tunnel processing.
Reported-by: Tony van der Peet <tony.vanderpeet@alliedtelesis.co.nz>
Signed-off-by: Tony van der Peet <tony.vanderpeet@alliedtelesis.co.nz>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
As reported by Wang Liang, the way packets are passed to the ipf module
doesn't allow for use later on in reassembly. Such packets may get
released anyway, such as during cleanup of Tx processing. Because the
ipf module lacks a way of forcing the dp_packet to be retained, it
would later reuse the packet. Instead, just clone the packet and let the
ipf queue own the copy until the queue is destroyed.
After this change, there are no more in-tree users of the batch
'do_not_steal' flag. Thus, we remove it as well.
Fixes: 4ea96698f667 ("Userspace datapath: Add fragmentation handling.")
Fixes: 0b3ff31d35f5 ("dp-packet: Add 'do_not_steal' packet batch flag.")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/382098.html
Reported-by: Wang Liang <wangliangrt@didiglobal.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Co-authored-by: Wang Liang <wangliangrt@didiglobal.com>
Signed-off-by: Wang Liang <wangliangrt@didiglobal.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Although not required, padding can be optionally added until
the packet length is MTU bytes. A packet with extra padding
currently fails sanity checks.
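The relevant check can be sketched as "IP total length must not exceed
the received L3 bytes", rather than requiring equality, so trailing
padding passes. The helper below is hypothetical and is not the actual
miniflow_extract() code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical length check on a raw IPv4 header. 'l3_size' is the
 * number of bytes received after the L2 header. Trailing padding is
 * allowed, so the IP total length may be shorter than 'l3_size', but
 * never longer. */
static bool
ipv4_len_sane(const uint8_t *ip, size_t l3_size)
{
    size_t ihl, tot_len;

    if (l3_size < 20) {
        return false;
    }
    ihl = (size_t) (ip[0] & 0x0f) * 4;
    tot_len = ((size_t) ip[2] << 8) | ip[3];
    return ihl >= 20 && ihl <= tot_len && tot_len <= l3_size;
}
```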
Vulnerability: CVE-2020-35498
Fixes: fa8d9001a624 ("miniflow_extract: Properly handle small IP packets.")
Reported-by: Joakim Hindersson <joakim.hindersson@elastx.se>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The calculation in tcp_reader_run() failed to account for L2 padding.
This fixes the problem, by moving the existing function
tcp_payload_length() from a conntrack private header file into
dp-packet.h and renaming it to suit the dp_packet style.
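The idea behind the fix, as a sketch: derive the TCP payload length from
the IP total length instead of the frame size, so padding added after the
IP datagram is not counted. `tcp_payload_len()` here is a hypothetical
stand-in operating on a raw IPv4/TCP buffer, not the OVS helper.

```c
#include <stddef.h>
#include <stdint.h>

/* TCP payload length of a raw IPv4/TCP packet, derived from the IP total
 * length rather than from the frame size, so trailing L2 padding is not
 * counted as payload. */
static size_t
tcp_payload_len(const uint8_t *ip)
{
    size_t tot_len = ((size_t) ip[2] << 8) | ip[3];
    size_t ip_hdr_len = (size_t) (ip[0] & 0x0f) * 4;
    const uint8_t *tcp = ip + ip_hdr_len;
    size_t tcp_hdr_len = (size_t) (tcp[12] >> 4) * 4;   /* Data offset. */

    return tot_len >= ip_hdr_len + tcp_hdr_len
           ? tot_len - ip_hdr_len - tcp_hdr_len : 0;
}
```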
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
This patch enables TSO support for non-DPDK use cases, and
also adds a check-system-tso testsuite. Before TSO, we had to
disable checksum offload, letting the kernel calculate the
TCP/UDP packet checksum. With TSO, we can skip the checksum
validation by enabling checksum offload, and with large packet
sizes, we see better performance.
Consider container to container use cases:
iperf3 -c (ns0) -> veth peer -> OVS -> veth peer -> iperf3 -s (ns1)
I got around 6 Gbps, similar to TSO with DPDK enabled.
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
There is a cache miss when accessing mbuf->data_off while cloning
a batch, and using prefetch improved the throughput by ~2.3%.
Before: 13709416.30 pps
After: 14031475.80 pps
Fixes: d48771848560 ("dp-packet: preserve headroom when cloning a pkt batch")
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Usually TSO packets are close to 50k or 60k bytes long, so, to
copy fewer bytes when receiving a packet from the kernel,
change the approach. Instead of extending the received MTU-sized
packet and appending the remaining TSO data from
the TSO buffer, allocate a TSO packet with enough headroom
to prepend the std packet data.
Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Suggested-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Abbreviated as TSO, TCP Segmentation Offload is a feature which enables
the network stack to delegate TCP segmentation to the NIC, reducing
the per-packet CPU overhead.
A guest using vhostuser interface with TSO enabled can send TCP packets
much bigger than the MTU, which saves CPU cycles normally used to break
the packets down to MTU size and to calculate checksums.
It also saves CPU cycles used to parse multiple packets/headers during
the packet processing inside virtual switch.
If the destination of the packet is another guest in the same host, then
the same big packet can be sent through a vhostuser interface skipping
the segmentation completely. However, if the destination is not local,
the NIC hardware is instructed to do the TCP segmentation and checksum
calculation.
It is recommended to check if NIC hardware supports TSO before enabling
the feature, which is off by default. For additional information please
check the tso.rst document.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
The headroom is useful if the packet needs to insert an additional
header, so preserve the original headroom when cloning the batch.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
'ol_flags' of DPDK mbuf could contain bits responsible for external
or indirect buffers which are not actually offload flags in a common
sense. Clearing/copying of these flags could lead to memory leaks of
external memory chunks and crashes due to access to wrong memory.
OVS should not clear these flags while resetting offloads and also
should not copy them to the newly allocated packets.
This change is required to support DPDK 19.11, as some drivers may
return mbufs with these flags set. However, it might be good to do
the same for DPDK 18.11, because these flags are present and should
be taken into account.
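A sketch of the two rules, assuming the ownership bits can be isolated
with a mask. DPDK's real flags are RTE_MBUF_F_EXTERNAL and
RTE_MBUF_F_INDIRECT; the `BUF_F_*` names, their values and both helpers
below are hypothetical.

```c
#include <stdint.h>

/* Hypothetical ownership bits, in the spirit of DPDK's
 * RTE_MBUF_F_EXTERNAL and RTE_MBUF_F_INDIRECT; values are illustrative. */
#define BUF_F_EXTERNAL     (UINT64_C(1) << 61)
#define BUF_F_INDIRECT     (UINT64_C(1) << 62)
#define BUF_F_OWNERSHIP    (BUF_F_EXTERNAL | BUF_F_INDIRECT)

/* Clears offload state but keeps how the buffer's memory is owned. */
static uint64_t
reset_offloads(uint64_t ol_flags)
{
    return ol_flags & BUF_F_OWNERSHIP;
}

/* Copies offload state into a newly allocated packet without importing
 * the source's ownership bits (the new buffer keeps its own). */
static uint64_t
copy_offloads(uint64_t dst_flags, uint64_t src_flags)
{
    return (dst_flags & BUF_F_OWNERSHIP) | (src_flags & ~BUF_F_OWNERSHIP);
}
```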
Fixes: 03f3f9c0faf8 ("dpdk: Update to use DPDK 18.11.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
The patch introduces experimental AF_XDP support for OVS netdev.
AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket
type built upon the eBPF and XDP technology. It aims to have comparable
performance to DPDK but cooperate better with the kernel's existing
networking stack. An AF_XDP socket receives and sends packets from an
eBPF/XDP program attached to the netdev, bypassing a couple of Linux
kernel subsystems. As a result, an AF_XDP socket shows much better
performance than AF_PACKET.
For more details about AF_XDP, please see linux kernel's
Documentation/networking/af_xdp.rst. Note that by default, this feature is
not compiled in.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Additionally, a new API call 'dp_packet_set_flow_mark' is needed
for packet cloning, mostly for the dummy HWOL implementation.
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
1. No reason to have mbuf related APIs in a generic code.
2. Not only RSS/checksums should be invalidated in case of tunnel
decapsulation or sending to 'ring' ports.
To fix the two issues above, a new function 'dp_packet_reset_offload'
is introduced. To clean up/unify the code and simplify the addition of
new offloading features to the non-DPDK version of dp_packet, an
'ol_flags' bitmask is introduced. Additionally, code complexity in
'dp_packet_clone_with_headroom' is reduced by using already existing
generic APIs.
Unfortunately, we still need to have a special case for mbuf
initialization inside 'dp_packet_init__()'.
'dp_packet_init_specific()' is introduced for this purpose as a generic
API for initialization of the implementation-specific fields.
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>