mir/ovs - ovs - Mike's Git repositories

mir/ovs

mirror of https://github.com/openvswitch/ovs synced 2025-08-30 13:58:14 +00:00

Author	SHA1	Message	Date
fang	e929e2c20d	ipf: Cancel fragment pkt copy. Canceling packet copying can better improve the performance of handling fragmented packets. In `640d4db`, pkt copying was added to fix the crash, but the crash has been fixed in `7e6b41a`, so there is no need to copy the pkt any longer. Acked-by: Mike Pattrick <mkp@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: fang <fangjiannan@cmss.chinamobile.com> Signed-off-by: Aaron Conole <aconole@redhat.com>	2024-12-20 10:07:39 -05:00
Mike Pattrick	16f6885353	ipf: Handle common case of ipf defragmentation. When conntrack is reassembling packet fragments, the same reassembly context can be shared across multiple threads handling different packets simultaneously. Once a full packet is assembled, it is added to a packet batch for processing, in the case where there are multiple different pmd threads accessing conntrack simultaneously, there is a race condition where the reassembled packet may be added to an arbitrary batch even if the current batch is available. When this happens, the packet may be handled incorrectly as it is inserted into a random openflow execution pipeline, instead of the pipeline for that packets flow. This change makes a best effort attempt to try to add the defragmented packet to the current batch. directly. This should succeed most of the time. Fixes: `4ea96698f6` ("Userspace datapath: Add fragmentation handling.") Reported-at: https://issues.redhat.com/browse/FDP-560 Signed-off-by: Mike Pattrick <mkp@redhat.com> Acked-by: Paolo Valerio <pvalerio@redhat.com> Acked-by: Simon Horman <horms@ovn.org> Signed-off-by: Aaron Conole <aconole@redhat.com>	2024-06-05 10:38:52 -04:00
Mike Pattrick	3a6b8c8361	ipf: Only add fragments to batch of same dl_type. When conntrack is reassembling packet fragments, the same reassembly context can be shared across multiple threads handling different packets simultaneously. Once a full packet is assembled, it is added to a packet batch for processing, this is most likely the batch that added it in the first place, but that isn't a guarantee. The packets in these batches should be segregated by network protocol version (ipv4 vs ipv6) for conntrack defragmentation to function appropriately. However, there are conditions where we would add a reassembled packet of one type to a batch of another. This change introduces checks to make sure that reassembled or expired fragments are only added to packet batches of the same type. Fixes: `4ea96698f6` ("Userspace datapath: Add fragmentation handling.") Reported-at: https://issues.redhat.com/browse/FDP-560 Signed-off-by: Mike Pattrick <mkp@redhat.com> Acked-by: Paolo Valerio <pvalerio@redhat.com> Acked-by: Simon Horman <horms@ovn.org> Signed-off-by: Aaron Conole <aconole@redhat.com>	2024-06-05 10:35:32 -04:00
Mike Pattrick	5d11c47d3e	userspace: Enable IP checksum offloading by default. The netdev receiving packets is supposed to provide the flags indicating if the IP checksum was verified and it is GOOD or BAD, otherwise the stack will check when appropriate by software. If the packet comes with good checksum, then postpone the checksum calculation to the egress device if needed. When encapsulate a packet with that flag, set the checksum of the inner IP header since that is not yet supported. Calculate the IP checksum when the packet is going to be sent over a device that doesn't support the feature. Linux devices don't support IP checksum offload alone, so the support is not enabled. Signed-off-by: Flavio Leitner <fbl@sysclose.org> Co-authored-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-06-15 23:49:51 +02:00
Nobuhiro MIKI	349112f975	flow: Support rt_hdr in parse_ipv6_ext_hdrs(). Checks whether IPPROTO_ROUTING exists in the IPv6 extension headers. If it exists, the first address is retrieved. If NULL is specified for "frag_hdr" and/or "rt_hdr", those addresses in the header are not reported to the caller. Of course, "frag_hdr" and "rt_hdr" are properly parsed inside this function. Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-29 21:41:28 +02:00
Adrian Moreno	e9bf5bffb0	list: use short version of safe loops if possible. Using the SHORT version of the *_SAFE loops makes the code cleaner and less error-prone. So, use the SHORT version and remove the extra variable when possible. In order to be able to use both long and short versions without changing the name of the macro for all the clients, overload the existing name and select the appropriate version depending on the number of arguments. Acked-by: Dumitru Ceara <dceara@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-03-30 16:59:02 +02:00
Aaron Conole	803ed12e31	ipf: release unhandled packets from the batch Since `640d4db788` ("ipf: Fix a use-after-free error, ...") the ipf framework unconditionally allocates a new dp_packet to track individual fragments. This prevents a use-after-free. However, an additional issue was present - even when the packet buffer is cloned, if the ip fragment handling code keeps it, the original buffer is leaked during the refill loop. Even in the original processing code, the hardcoded dnsteal branches would always leak a packet buffer from the refill loop. This can be confirmed with valgrind: ==717566== 16,672 (4,480 direct, 12,192 indirect) bytes in 8 blocks are definitely lost in loss record 390 of 390 ==717566== at 0x484086F: malloc (vg_replace_malloc.c:380) ==717566== by 0x537BFD: xmalloc__ (util.c:137) ==717566== by 0x537BFD: xmalloc (util.c:172) ==717566== by 0x46DDD4: dp_packet_new (dp-packet.c:153) ==717566== by 0x46DDD4: dp_packet_new_with_headroom (dp-packet.c:163) ==717566== by 0x550AA6: netdev_linux_batch_rxq_recv_sock.constprop.0 (netdev-linux.c:1262) ==717566== by 0x5512AF: netdev_linux_rxq_recv (netdev-linux.c:1511) ==717566== by 0x4AB7E0: netdev_rxq_recv (netdev.c:727) ==717566== by 0x47F00D: dp_netdev_process_rxq_port (dpif-netdev.c:4699) ==717566== by 0x47FD13: dpif_netdev_run (dpif-netdev.c:5957) ==717566== by 0x4331D2: type_run (ofproto-dpif.c:370) ==717566== by 0x41DFD8: ofproto_type_run (ofproto.c:1768) ==717566== by 0x40A7FB: bridge_run__ (bridge.c:3245) ==717566== by 0x411269: bridge_run (bridge.c:3310) ==717566== by 0x406E6C: main (ovs-vswitchd.c:127) The fix is to delete the original packet when it isn't able to be reinserted into the packet batch. Subsequent valgrind runs show that the packets are not leaked from the batch any longer. Fixes: `640d4db788` ("ipf: Fix a use-after-free error, and remove the 'do_not_steal' flag.") Fixes: `4ea96698f6` ("Userspace datapath: Add fragmentation handling.") Reported-by: Wan Junjie <wanjunjie@bytedance.com> Reported-at: https://github.com/openvswitch/ovs-issues/issues/226 Signed-off-by: Aaron Conole <aconole@redhat.com> Reviewed-by: David Marchand <david.marchand@redhat.com> Tested-by: Wan Junjie <wanjunjie@bytedance.com> Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>	2021-10-12 18:25:44 +03:00
wenxu	ebcbb534e1	ipf: Fix only nat the first fragment in the reass process. The ipf collect original fragment packets and reass a new pkt to do the conntrack logic. After finish the conntrack things copy the ct meta info to each original packet and modify the l4 header in the first fragment. It should modify the ip src/dst info for all the fragments. Fixes: `4ea96698f6` ("Userspace datapath: Add fragmentation handling.") Signed-off-by: wenxu <wenxu@ucloud.cn> Co-authored-by: luke.li <luke.li@ucloud.cn> Signed-off-by: luke.li <luke.li@ucloud.cn> Reviewed-by: wangze <wangze712@gmail.com> Acked-by: Aaron Conole <aconole@redhat.com> Acked-by: Paolo Valerio <pvalerio@redhat.com> Test case: Co-authored-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-09-15 22:01:25 +02:00
Paolo Valerio	2c597c8900	conntrack: add coverage counters for L3 bad checksum. similarly to what already exists for L4, add conntrack_l3csum_err and ipf_l3csum_err for L3. Received packets with L3 bad checksum will increase respectively ipf_l3csum_err if they are fragments and conntrack_l3csum_err otherwise. Although the patch basically covers IPv4, the names are kept generic. Signed-off-by: Paolo Valerio <pvalerio@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-06-30 23:56:03 +02:00
Aaron Conole	640d4db788	ipf: Fix a use-after-free error, and remove the 'do_not_steal' flag. As reported by Wang Liang, the way packets are passed to the ipf module doesn't allow for use later on in reassembly. Such packets may be get released anyway, such as during cleanup of tx processing. Because the ipf module lacks a way of forcing the dp_packet to be retained, it will later reuse the packet. Instead, just clone the packet and let the ipf queue own the copy until the queue is destroyed. After this change, there are no more in-tree users of the batch 'do_not_steal' flag. Thus, we remove it as well. Fixes: `4ea96698f6` ("Userspace datapath: Add fragmentation handling.") Fixes: `0b3ff31d35` ("dp-packet: Add 'do_not_steal' packet batch flag.") Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/382098.html Reported-by: Wang Liang <wangliangrt@didiglobal.com> Signed-off-by: Aaron Conole <aconole@redhat.com> Co-authored-by: Wang Liang <wangliangrt@didiglobal.com> Signed-off-by: Wang Liang <wangliangrt@didiglobal.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-06-15 10:46:33 +02:00
Peng He	81158b920f	ipf: Avoid accessing to a freed rp. if there are multiple pkts in the batch, the loop will access a freed rp, which cause ovs crash. Fixes: `4ea96698f6` ("Userspace datapath: Add fragmentation handling.") Signed-off-by: Peng He <hepeng.0320@bytedance.com> Acked-by: Mark Gray <mark.d.gray@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-01-13 16:05:21 +01:00
Flavio Leitner	29cf9c1b3b	userspace: Add TCP Segmentation Offload support Abbreviated as TSO, TCP Segmentation Offload is a feature which enables the network stack to delegate the TCP segmentation to the NIC reducing the per packet CPU overhead. A guest using vhostuser interface with TSO enabled can send TCP packets much bigger than the MTU, which saves CPU cycles normally used to break the packets down to MTU size and to calculate checksums. It also saves CPU cycles used to parse multiple packets/headers during the packet processing inside virtual switch. If the destination of the packet is another guest in the same host, then the same big packet can be sent through a vhostuser interface skipping the segmentation completely. However, if the destination is not local, the NIC hardware is instructed to do the TCP segmentation and checksum calculation. It is recommended to check if NIC hardware supports TSO before enabling the feature, which is off by default. For additional information please check the tso.rst document. Signed-off-by: Flavio Leitner <fbl@sysclose.org> Tested-by: Ciara Loftus <ciara.loftus.intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2020-01-17 22:27:25 +00:00
Li RongQing	0c3057d595	ipf: bail out when ipf state is COMPLETED it is easy to crash ovs when a packet with same id hits a list that already reassembled completedly but have not been sent out yet, and this packet is not duplicate with this hit ipf list due to bigger offset 1 0x00007f9fef0ae2d9 in __GI_abort () at abort.c:89 2 0x0000000000464042 in ipf_list_state_transition at lib/ipf.c:545 Fixes: `4ea96698f6` ("Userspace datapath: Add fragmentation handling.") Co-authored-by: Wang Li <wangli39@baidu.com> Signed-off-by: Wang Li <wangli39@baidu.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2019-11-21 16:45:54 -08:00
Darrell Ball	206a26e555	ipf: More cleanup. No functional changes here. Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2019-02-25 09:44:52 -08:00
Darrell Ball	0caac1e917	ipf: Handle non-zero L2 padding for first fragments. Fixes: `4ea96698f6` ("Userspace datapath: Add fragmentation handling.") Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2019-02-22 14:10:58 -08:00
Darrell Ball	9b5136c3de	ipf: Check minimum fragment against L3 size. Fixes: `4ea96698f6` ("Userspace datapath: Add fragmentation handling.") Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2019-02-22 14:08:36 -08:00
Darrell Ball	e53213365a	ipf: Do not preallocate more than needed. ipf_reassemble_v4_frags() and ipf_reassemble_v6_frags() are preallocating more than needed for the reassembled packet. Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2019-02-22 14:08:05 -08:00
Darrell Ball	6efa20f2e5	ipf: Misc Cleanup. Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2019-02-22 14:07:46 -08:00
Darrell Ball	4ea96698f6	Userspace datapath: Add fragmentation handling. Fragmentation handling is added for supporting conntrack. Both v4 and v6 are supported. After discussion with several people, I decided to not store configuration state in the database to be more consistent with the kernel in future, similarity with other conntrack configuration which will not be in the database as well and overall simplicity. Accordingly, fragmentation handling is enabled by default. This patch enables fragmentation tests for the userspace datapath. Signed-off-by: Darrell Ball <dlu998@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2019-02-14 14:18:56 -08:00

19 Commits