2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-30 13:58:14 +00:00
Commit Graph

19 Commits

Author SHA1 Message Date
fang
e929e2c20d ipf: Cancel fragment pkt copy.
Canceling packet copying can better improve the performance of handling
fragmented packets. In 640d4db, pkt copying was added to fix the crash,
but the crash has been fixed in 7e6b41a, so there is no need to copy the
pkt any longer.

Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: fang <fangjiannan@cmss.chinamobile.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
2024-12-20 10:07:39 -05:00
Mike Pattrick
16f6885353 ipf: Handle common case of ipf defragmentation.
When conntrack is reassembling packet fragments, the same reassembly
context can be shared across multiple threads handling different packets
simultaneously. Once a full packet is assembled, it is added to a packet
batch for processing, in the case where there are multiple different pmd
threads accessing conntrack simultaneously, there is a race condition
where the reassembled packet may be added to an arbitrary batch even if
the current batch is available.

When this happens, the packet may be handled incorrectly as it is
inserted into a random openflow execution pipeline, instead of the
pipeline for that packets flow.

This change makes a best effort attempt to try to add the defragmented
packet to the current batch. directly. This should succeed most of the
time.

Fixes: 4ea96698f6 ("Userspace datapath: Add fragmentation handling.")
Reported-at: https://issues.redhat.com/browse/FDP-560
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
2024-06-05 10:38:52 -04:00
Mike Pattrick
3a6b8c8361 ipf: Only add fragments to batch of same dl_type.
When conntrack is reassembling packet fragments, the same reassembly
context can be shared across multiple threads handling different packets
simultaneously. Once a full packet is assembled, it is added to a packet
batch for processing, this is most likely the batch that added it in the
first place, but that isn't a guarantee.

The packets in these batches should be segregated by network protocol
version (ipv4 vs ipv6) for conntrack defragmentation to function
appropriately. However, there are conditions where we would add a
reassembled packet of one type to a batch of another.

This change introduces checks to make sure that reassembled or expired
fragments are only added to packet batches of the same type.

Fixes: 4ea96698f6 ("Userspace datapath: Add fragmentation handling.")
Reported-at: https://issues.redhat.com/browse/FDP-560
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
2024-06-05 10:35:32 -04:00
Mike Pattrick
5d11c47d3e userspace: Enable IP checksum offloading by default.
The netdev receiving packets is supposed to provide the flags
indicating if the IP checksum was verified and it is GOOD or BAD,
otherwise the stack will check when appropriate by software.

If the packet comes with good checksum, then postpone the
checksum calculation to the egress device if needed.

When encapsulate a packet with that flag, set the checksum
of the inner IP header since that is not yet supported.

Calculate the IP checksum when the packet is going to be sent over
a device that doesn't support the feature.

Linux devices don't support IP checksum offload alone, so the
support is not enabled.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-06-15 23:49:51 +02:00
Nobuhiro MIKI
349112f975 flow: Support rt_hdr in parse_ipv6_ext_hdrs().
Checks whether IPPROTO_ROUTING exists in the IPv6 extension headers.
If it exists, the first address is retrieved.

If NULL is specified for "frag_hdr" and/or "rt_hdr", those addresses in
the header are not reported to the caller. Of course, "frag_hdr" and
"rt_hdr" are properly parsed inside this function.

Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-03-29 21:41:28 +02:00
Adrian Moreno
e9bf5bffb0 list: use short version of safe loops if possible.
Using the SHORT version of the *_SAFE loops makes the code cleaner
and less error-prone. So, use the SHORT version and remove the extra
variable when possible.

In order to be able to use both long and short versions without changing
the name of the macro for all the clients, overload the existing name
and select the appropriate version depending on the number of arguments.

Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-30 16:59:02 +02:00
Aaron Conole
803ed12e31 ipf: release unhandled packets from the batch
Since 640d4db788 ("ipf: Fix a use-after-free error, ...") the ipf
framework unconditionally allocates a new dp_packet to track
individual fragments.  This prevents a use-after-free.  However, an
additional issue was present - even when the packet buffer is cloned,
if the ip fragment handling code keeps it, the original buffer is
leaked during the refill loop.  Even in the original processing code,
the hardcoded dnsteal branches would always leak a packet buffer from
the refill loop.

This can be confirmed with valgrind:

==717566== 16,672 (4,480 direct, 12,192 indirect) bytes in 8 blocks are definitely lost in loss record 390 of 390
==717566==    at 0x484086F: malloc (vg_replace_malloc.c:380)
==717566==    by 0x537BFD: xmalloc__ (util.c:137)
==717566==    by 0x537BFD: xmalloc (util.c:172)
==717566==    by 0x46DDD4: dp_packet_new (dp-packet.c:153)
==717566==    by 0x46DDD4: dp_packet_new_with_headroom (dp-packet.c:163)
==717566==    by 0x550AA6: netdev_linux_batch_rxq_recv_sock.constprop.0 (netdev-linux.c:1262)
==717566==    by 0x5512AF: netdev_linux_rxq_recv (netdev-linux.c:1511)
==717566==    by 0x4AB7E0: netdev_rxq_recv (netdev.c:727)
==717566==    by 0x47F00D: dp_netdev_process_rxq_port (dpif-netdev.c:4699)
==717566==    by 0x47FD13: dpif_netdev_run (dpif-netdev.c:5957)
==717566==    by 0x4331D2: type_run (ofproto-dpif.c:370)
==717566==    by 0x41DFD8: ofproto_type_run (ofproto.c:1768)
==717566==    by 0x40A7FB: bridge_run__ (bridge.c:3245)
==717566==    by 0x411269: bridge_run (bridge.c:3310)
==717566==    by 0x406E6C: main (ovs-vswitchd.c:127)

The fix is to delete the original packet when it isn't able to be
reinserted into the packet batch.  Subsequent valgrind runs show that
the packets are not leaked from the batch any longer.

Fixes: 640d4db788 ("ipf: Fix a use-after-free error, and remove the 'do_not_steal' flag.")
Fixes: 4ea96698f6 ("Userspace datapath: Add fragmentation handling.")
Reported-by: Wan Junjie <wanjunjie@bytedance.com>
Reported-at: https://github.com/openvswitch/ovs-issues/issues/226
Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: Wan Junjie <wanjunjie@bytedance.com>
Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
2021-10-12 18:25:44 +03:00
wenxu
ebcbb534e1 ipf: Fix only nat the first fragment in the reass process.
The ipf collect original fragment packets and reass a new pkt
to do the conntrack logic.  After finish the conntrack things
copy the ct meta info to each original packet and modify the
l4 header in the first fragment. It should modify the ip src/dst
info for all the fragments.

Fixes: 4ea96698f6 ("Userspace datapath: Add fragmentation handling.")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Co-authored-by: luke.li <luke.li@ucloud.cn>
Signed-off-by: luke.li <luke.li@ucloud.cn>
Reviewed-by: wangze <wangze712@gmail.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>

Test case:
Co-authored-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-09-15 22:01:25 +02:00
Paolo Valerio
2c597c8900 conntrack: add coverage counters for L3 bad checksum.
similarly to what already exists for L4, add conntrack_l3csum_err
and ipf_l3csum_err for L3.

Received packets with L3 bad checksum will increase respectively
ipf_l3csum_err if they are fragments and conntrack_l3csum_err
otherwise.

Although the patch basically covers IPv4, the names are kept generic.

Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-30 23:56:03 +02:00
Aaron Conole
640d4db788 ipf: Fix a use-after-free error, and remove the 'do_not_steal' flag.
As reported by Wang Liang, the way packets are passed to the ipf module
doesn't allow for use later on in reassembly.  Such packets may be get
released anyway, such as during cleanup of tx processing.  Because the
ipf module lacks a way of forcing the dp_packet to be retained, it
will later reuse the packet.  Instead, just clone the packet and let the
ipf queue own the copy until the queue is destroyed.

After this change, there are no more in-tree users of the batch
'do_not_steal' flag.  Thus, we remove it as well.

Fixes: 4ea96698f6 ("Userspace datapath: Add fragmentation handling.")
Fixes: 0b3ff31d35 ("dp-packet: Add 'do_not_steal' packet batch flag.")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/382098.html
Reported-by: Wang Liang <wangliangrt@didiglobal.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Co-authored-by: Wang Liang <wangliangrt@didiglobal.com>
Signed-off-by: Wang Liang <wangliangrt@didiglobal.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-15 10:46:33 +02:00
Peng He
81158b920f ipf: Avoid accessing to a freed rp.
if there are multiple pkts in the batch, the loop will access a
freed rp, which cause ovs crash.

Fixes: 4ea96698f6 ("Userspace datapath: Add fragmentation handling.")
Signed-off-by: Peng He <hepeng.0320@bytedance.com>
Acked-by: Mark Gray <mark.d.gray@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-01-13 16:05:21 +01:00
Flavio Leitner
29cf9c1b3b userspace: Add TCP Segmentation Offload support
Abbreviated as TSO, TCP Segmentation Offload is a feature which enables
the network stack to delegate the TCP segmentation to the NIC reducing
the per packet CPU overhead.

A guest using vhostuser interface with TSO enabled can send TCP packets
much bigger than the MTU, which saves CPU cycles normally used to break
the packets down to MTU size and to calculate checksums.

It also saves CPU cycles used to parse multiple packets/headers during
the packet processing inside virtual switch.

If the destination of the packet is another guest in the same host, then
the same big packet can be sent through a vhostuser interface skipping
the segmentation completely. However, if the destination is not local,
the NIC hardware is instructed to do the TCP segmentation and checksum
calculation.

It is recommended to check if NIC hardware supports TSO before enabling
the feature, which is off by default. For additional information please
check the tso.rst document.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Tested-by: Ciara Loftus <ciara.loftus.intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2020-01-17 22:27:25 +00:00
Li RongQing
0c3057d595 ipf: bail out when ipf state is COMPLETED
it is easy to crash ovs when a packet with same id
hits a list that already reassembled completedly
but have not been sent out yet, and this packet is
not duplicate with this hit ipf list due to bigger
offset

    1  0x00007f9fef0ae2d9 in __GI_abort () at abort.c:89
    2  0x0000000000464042 in ipf_list_state_transition at lib/ipf.c:545

Fixes: 4ea96698f6 ("Userspace datapath: Add fragmentation handling.")
Co-authored-by: Wang Li <wangli39@baidu.com>
Signed-off-by: Wang Li <wangli39@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2019-11-21 16:45:54 -08:00
Darrell Ball
206a26e555 ipf: More cleanup.
No functional changes here.

Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2019-02-25 09:44:52 -08:00
Darrell Ball
0caac1e917 ipf: Handle non-zero L2 padding for first fragments.
Fixes: 4ea96698f6 ("Userspace datapath: Add fragmentation handling.")
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2019-02-22 14:10:58 -08:00
Darrell Ball
9b5136c3de ipf: Check minimum fragment against L3 size.
Fixes: 4ea96698f6 ("Userspace datapath: Add fragmentation handling.")
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2019-02-22 14:08:36 -08:00
Darrell Ball
e53213365a ipf: Do not preallocate more than needed.
ipf_reassemble_v4_frags() and ipf_reassemble_v6_frags() are
preallocating more than needed for the reassembled packet.

Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2019-02-22 14:08:05 -08:00
Darrell Ball
6efa20f2e5 ipf: Misc Cleanup.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2019-02-22 14:07:46 -08:00
Darrell Ball
4ea96698f6 Userspace datapath: Add fragmentation handling.
Fragmentation handling is added for supporting conntrack.
Both v4 and v6 are supported.

After discussion with several people, I decided to not store
configuration state in the database to be more consistent with
the kernel in future, similarity with other conntrack configuration
which will not be in the database as well and overall simplicity.
Accordingly, fragmentation handling is enabled by default.

This patch enables fragmentation tests for the userspace datapath.

Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2019-02-14 14:18:56 -08:00