Canceling packet copying can better improve the performance of handling
fragmented packets. In 640d4db, pkt copying was added to fix the crash,
but the crash has been fixed in 7e6b41a, so there is no need to copy the
pkt any longer.
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: fang <fangjiannan@cmss.chinamobile.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
When conntrack is reassembling packet fragments, the same reassembly
context can be shared across multiple threads handling different packets
simultaneously. Once a full packet is assembled, it is added to a packet
batch for processing, in the case where there are multiple different pmd
threads accessing conntrack simultaneously, there is a race condition
where the reassembled packet may be added to an arbitrary batch even if
the current batch is available.
When this happens, the packet may be handled incorrectly as it is
inserted into a random openflow execution pipeline, instead of the
pipeline for that packets flow.
This change makes a best effort attempt to try to add the defragmented
packet to the current batch. directly. This should succeed most of the
time.
Fixes: 4ea96698f6 ("Userspace datapath: Add fragmentation handling.")
Reported-at: https://issues.redhat.com/browse/FDP-560
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
When conntrack is reassembling packet fragments, the same reassembly
context can be shared across multiple threads handling different packets
simultaneously. Once a full packet is assembled, it is added to a packet
batch for processing, this is most likely the batch that added it in the
first place, but that isn't a guarantee.
The packets in these batches should be segregated by network protocol
version (ipv4 vs ipv6) for conntrack defragmentation to function
appropriately. However, there are conditions where we would add a
reassembled packet of one type to a batch of another.
This change introduces checks to make sure that reassembled or expired
fragments are only added to packet batches of the same type.
Fixes: 4ea96698f6 ("Userspace datapath: Add fragmentation handling.")
Reported-at: https://issues.redhat.com/browse/FDP-560
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
The netdev receiving packets is supposed to provide the flags
indicating if the IP checksum was verified and it is GOOD or BAD,
otherwise the stack will check when appropriate by software.
If the packet comes with good checksum, then postpone the
checksum calculation to the egress device if needed.
When encapsulate a packet with that flag, set the checksum
of the inner IP header since that is not yet supported.
Calculate the IP checksum when the packet is going to be sent over
a device that doesn't support the feature.
Linux devices don't support IP checksum offload alone, so the
support is not enabled.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Checks whether IPPROTO_ROUTING exists in the IPv6 extension headers.
If it exists, the first address is retrieved.
If NULL is specified for "frag_hdr" and/or "rt_hdr", those addresses in
the header are not reported to the caller. Of course, "frag_hdr" and
"rt_hdr" are properly parsed inside this function.
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Using the SHORT version of the *_SAFE loops makes the code cleaner
and less error-prone. So, use the SHORT version and remove the extra
variable when possible.
In order to be able to use both long and short versions without changing
the name of the macro for all the clients, overload the existing name
and select the appropriate version depending on the number of arguments.
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Since 640d4db788 ("ipf: Fix a use-after-free error, ...") the ipf
framework unconditionally allocates a new dp_packet to track
individual fragments. This prevents a use-after-free. However, an
additional issue was present - even when the packet buffer is cloned,
if the ip fragment handling code keeps it, the original buffer is
leaked during the refill loop. Even in the original processing code,
the hardcoded dnsteal branches would always leak a packet buffer from
the refill loop.
This can be confirmed with valgrind:
==717566== 16,672 (4,480 direct, 12,192 indirect) bytes in 8 blocks are definitely lost in loss record 390 of 390
==717566== at 0x484086F: malloc (vg_replace_malloc.c:380)
==717566== by 0x537BFD: xmalloc__ (util.c:137)
==717566== by 0x537BFD: xmalloc (util.c:172)
==717566== by 0x46DDD4: dp_packet_new (dp-packet.c:153)
==717566== by 0x46DDD4: dp_packet_new_with_headroom (dp-packet.c:163)
==717566== by 0x550AA6: netdev_linux_batch_rxq_recv_sock.constprop.0 (netdev-linux.c:1262)
==717566== by 0x5512AF: netdev_linux_rxq_recv (netdev-linux.c:1511)
==717566== by 0x4AB7E0: netdev_rxq_recv (netdev.c:727)
==717566== by 0x47F00D: dp_netdev_process_rxq_port (dpif-netdev.c:4699)
==717566== by 0x47FD13: dpif_netdev_run (dpif-netdev.c:5957)
==717566== by 0x4331D2: type_run (ofproto-dpif.c:370)
==717566== by 0x41DFD8: ofproto_type_run (ofproto.c:1768)
==717566== by 0x40A7FB: bridge_run__ (bridge.c:3245)
==717566== by 0x411269: bridge_run (bridge.c:3310)
==717566== by 0x406E6C: main (ovs-vswitchd.c:127)
The fix is to delete the original packet when it isn't able to be
reinserted into the packet batch. Subsequent valgrind runs show that
the packets are not leaked from the batch any longer.
Fixes: 640d4db788 ("ipf: Fix a use-after-free error, and remove the 'do_not_steal' flag.")
Fixes: 4ea96698f6 ("Userspace datapath: Add fragmentation handling.")
Reported-by: Wan Junjie <wanjunjie@bytedance.com>
Reported-at: https://github.com/openvswitch/ovs-issues/issues/226
Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Tested-by: Wan Junjie <wanjunjie@bytedance.com>
Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
The ipf collect original fragment packets and reass a new pkt
to do the conntrack logic. After finish the conntrack things
copy the ct meta info to each original packet and modify the
l4 header in the first fragment. It should modify the ip src/dst
info for all the fragments.
Fixes: 4ea96698f6 ("Userspace datapath: Add fragmentation handling.")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Co-authored-by: luke.li <luke.li@ucloud.cn>
Signed-off-by: luke.li <luke.li@ucloud.cn>
Reviewed-by: wangze <wangze712@gmail.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Test case:
Co-authored-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
similarly to what already exists for L4, add conntrack_l3csum_err
and ipf_l3csum_err for L3.
Received packets with L3 bad checksum will increase respectively
ipf_l3csum_err if they are fragments and conntrack_l3csum_err
otherwise.
Although the patch basically covers IPv4, the names are kept generic.
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
As reported by Wang Liang, the way packets are passed to the ipf module
doesn't allow for use later on in reassembly. Such packets may be get
released anyway, such as during cleanup of tx processing. Because the
ipf module lacks a way of forcing the dp_packet to be retained, it
will later reuse the packet. Instead, just clone the packet and let the
ipf queue own the copy until the queue is destroyed.
After this change, there are no more in-tree users of the batch
'do_not_steal' flag. Thus, we remove it as well.
Fixes: 4ea96698f6 ("Userspace datapath: Add fragmentation handling.")
Fixes: 0b3ff31d35 ("dp-packet: Add 'do_not_steal' packet batch flag.")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/382098.html
Reported-by: Wang Liang <wangliangrt@didiglobal.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Co-authored-by: Wang Liang <wangliangrt@didiglobal.com>
Signed-off-by: Wang Liang <wangliangrt@didiglobal.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
if there are multiple pkts in the batch, the loop will access a
freed rp, which cause ovs crash.
Fixes: 4ea96698f6 ("Userspace datapath: Add fragmentation handling.")
Signed-off-by: Peng He <hepeng.0320@bytedance.com>
Acked-by: Mark Gray <mark.d.gray@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Abbreviated as TSO, TCP Segmentation Offload is a feature which enables
the network stack to delegate the TCP segmentation to the NIC reducing
the per packet CPU overhead.
A guest using vhostuser interface with TSO enabled can send TCP packets
much bigger than the MTU, which saves CPU cycles normally used to break
the packets down to MTU size and to calculate checksums.
It also saves CPU cycles used to parse multiple packets/headers during
the packet processing inside virtual switch.
If the destination of the packet is another guest in the same host, then
the same big packet can be sent through a vhostuser interface skipping
the segmentation completely. However, if the destination is not local,
the NIC hardware is instructed to do the TCP segmentation and checksum
calculation.
It is recommended to check if NIC hardware supports TSO before enabling
the feature, which is off by default. For additional information please
check the tso.rst document.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Tested-by: Ciara Loftus <ciara.loftus.intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
it is easy to crash ovs when a packet with same id
hits a list that already reassembled completedly
but have not been sent out yet, and this packet is
not duplicate with this hit ipf list due to bigger
offset
1 0x00007f9fef0ae2d9 in __GI_abort () at abort.c:89
2 0x0000000000464042 in ipf_list_state_transition at lib/ipf.c:545
Fixes: 4ea96698f6 ("Userspace datapath: Add fragmentation handling.")
Co-authored-by: Wang Li <wangli39@baidu.com>
Signed-off-by: Wang Li <wangli39@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
ipf_reassemble_v4_frags() and ipf_reassemble_v6_frags() are
preallocating more than needed for the reassembled packet.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Fragmentation handling is added for supporting conntrack.
Both v4 and v6 are supported.
After discussion with several people, I decided to not store
configuration state in the database to be more consistent with
the kernel in future, similarity with other conntrack configuration
which will not be in the database as well and overall simplicity.
Accordingly, fragmentation handling is enabled by default.
This patch enables fragmentation tests for the userspace datapath.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>