mir/ovs - ovs - Mike's Git repositories

mir/ovs

mirror of https://github.com/openvswitch/ovs synced 2025-08-22 18:07:40 +00:00

Author	SHA1	Message	Date
David Marchand	2956a61265	dp-packet: Rework L4 checksum offloads. The DPDK mbuf API specifies 4 status when it comes to L4 checksums: - RTE_MBUF_F_RX_L4_CKSUM_UNKNOWN: no information about the RX L4 checksum - RTE_MBUF_F_RX_L4_CKSUM_BAD: the L4 checksum in the packet is wrong - RTE_MBUF_F_RX_L4_CKSUM_GOOD: the L4 checksum in the packet is valid - RTE_MBUF_F_RX_L4_CKSUM_NONE: the L4 checksum is not correct in the packet data, but the integrity of the L4 data is verified. Similarly to the IP checksum offloads API, revise OVS L4 offloads API. No information about the L4 protocol is provided by any netdev-* implementation, so OVS needs to mark this L4 protocol during flow extraction. Rename current API for consistency with dp_packet_(inner_)?l4_checksum_. Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-19 21:02:56 +02:00
David Marchand	67abd51540	dp-packet: Rework tunnel offloads. Rather than set bits in the mbuf ol_flags field, that only makes sense for netdev-dpdk ports, mark packet for tunnel offload in OVS offloads API. While at it, since there is nothing really "hardware" related, rename current API for consistency with dp_packet_tunnel_ prefix. Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-19 21:00:48 +02:00
David Marchand	d29ba0abdc	dp-packet: Add OVS offloading API. As a preparation for tracking inner checksums, separate Rx checksum status from the DPDK ol_flags field. To minimize the cost of translating from DPDK API to OVS API, simply map OVS flags to DPDK Rx mbuf flags. Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-19 21:00:34 +02:00
David Marchand	19ef1b1f0f	dp-packet: Remove DPDK specific IP version. Flagging packets with IP version is only needed at the netdev-dpdk level. In most cases, OVS is already inspecting the IP header in packet data, so maintaining such IP version metadata won't save much cycles (given the cost of additional branches necessary for handling outer/inner flags). Cleanup OVS shared code and only set these flags in netdev-dpdk.c. Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-19 20:59:22 +02:00
David Marchand	52fdeda11a	dp-packet: Remove Linux specific L4 offloads. As the virtio-net offload API is used for netdev-linux ports, but provides no information about the potentially encapsulated protocol concerned by a checksum request, specific information from this netdev- specific implementation is propagated into OVS code, and must be carefully evaluated when some tunnel gets decapsulated. This induces a cost in "normal" processing path, while the netdev-linux path is not performance critical. This patch removes such specific information, yet try harder to parse the packet on the Rx side and set offload flags accordingly for non encapsulated traffic. For encapsulated traffic, the inner checksum is computed. Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-19 20:59:04 +02:00
David Marchand	e99ce7d5df	flow: Fix checksum offloads with simple match. Packets with L4 partial status for a simple match flow would not get L4 checksums offloads applied. This was not caught in unit tests, because packets from netdev-dummy (calling miniflow_extract) would get Tx flags set early, before parse_tcp_flags() got called during packet processing. Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-05-30 18:00:56 +02:00
David Marchand	5603b869a8	netdev-linux: Fix offloads for IPv6 UDP packets. Caught by code review, offloading checksum of IPv6 UDP packets was wrong as the IPv6 header used for the pseudo header checksum was wrong. Fixes: cb0cbffbe8fb ("netdev-linux: Favour inner packet for multi-encapsulated TSO.") Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-05-21 23:52:54 +02:00
Ilya Maximets	77ac0b28c8	netdev: Always clear struct ifreq before ioctl. It's not nice to send random stack memory to a kernel over ioctl: Uninitialized bytes in ioctl_common_pre at offset 20 inside [0x7fff8899f3c0, 40) WARNING: MemorySanitizer: use-of-uninitialized-value 0 0x1180b7 in af_inet_ioctl lib/socket-util-unix.c:417:15 1 0x1180ff in af_inet_ifreq_ioctl lib/socket-util-unix.c:428:13 2 0x11e4fd in netdev_linux_set_mtu lib/netdev-linux.c:2005:13 3 0xba250b in netdev_set_mtu lib/netdev.c:1132:30 4 0x59a19f in update_mtu_ofproto ofproto/ofproto.c:3042:18 5 0x596947 in update_mtu ofproto/ofproto.c:3024:5 6 0x5976d6 in ofport_install ofproto/ofproto.c:2617:5 7 0x572e96 in update_port ofproto/ofproto.c:2893:21 8 0x57636b in ofproto_port_add ofproto/ofproto.c:2208:17 9 0x4f9e0b in iface_do_create vswitchd/bridge.c:2203:13 10 0x4f7f43 in iface_create vswitchd/bridge.c:2246:13 11 0x4f7a63 in bridge_add_ports__ vswitchd/bridge.c:1225:21 12 0x4d8ea4 in bridge_add_ports vswitchd/bridge.c:1241:5 13 0x4cce3a in bridge_reconfigure vswitchd/bridge.c:952:9 14 0x4ca156 in bridge_run vswitchd/bridge.c:3439:9 15 0x5278e7 in main vswitchd/ovs-vswitchd.c:137:9 16 0x7f9def in __libc_start_call_main 17 0x7f9def in __libc_start_main@GLIBC_2.2.5 18 0x432b14 in _start (vswitchd/ovs-vswitchd+0x432b14) We do already initialize this structure for a few ioctls, let's do that for all of them. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-12-02 20:22:01 +01:00
Ilya Maximets	ea2bd0d6fd	netdev-linux: Fix unaligned access to rpl_rtnl_link_stats64. Compilers are allowed to make alignment assumptions based on the pointer type passed to memcpy(). We have to cast before using it to avoid misaligned reads: lib/netdev-linux.c:6738:41: runtime error: load of misaligned address 0x51b000008db4 for type 'const struct rpl_rtnl_link_stats64 ', which requires 8 byte alignment 0x51b000008db4: note: pointer points here cc 00 17 00 01 00 00 00 00 00 00 00 01 00 00 00 ^ 0 0xad2d8 in get_stats_via_netlink lib/netdev-linux.c:6738:17 1 0x9dcce in netdev_linux_get_stats lib/netdev-linux.c:2293:13 2 0xee5b8 in netdev_get_stats lib/netdev.c:1722:16 3 0xc07ad in iface_refresh_stats vswitchd/bridge.c:2795:5 4 0xbf4fc in iface_create vswitchd/bridge.c:2274:5 5 0xbf4fc in bridge_add_ports__ vswitchd/bridge.c:1225:21 6 0x989fc in bridge_add_ports vswitchd/bridge.c:1236:5 7 0x989fc in bridge_reconfigure vswitchd/bridge.c:952:9 8 0x9102a in bridge_run vswitchd/bridge.c:3439:9 9 0xc8976 in main vswitchd/ovs-vswitchd.c:137:9 10 0x2a1c9 in __libc_start_call_main 11 0x2a28a in __libc_start_main 12 0xb3ff4 in _start (vswitchd/ovs-vswitchd+0x734ff4) Reported by Clang 18 with UBsan on Ubuntu 24.04. Casting the original 'void ' pointer returned from nl_attr_get() call to 'struct rpl_rtnl_link_stats64 *' also technically constitutes an undefined behavior, since the resulting pointer is not correctly aligned for the referenced type, but compilers do not insist on that today. Fixes: c8c49a9db9f2 ("netdev-linux: Properly access 32-bit aligned rtnl_link_stats64 structs.") Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-08-14 21:58:25 +02:00
David Marchand	4ba65112dc	netdev-linux: Do not offload IP checksum. Offloading checksum with a virtio_net_hdr header works only for L4. Remove false claim that IP checksum offloading is supported. As a consequence, L3 checksum can be assumed to be correct when preparing a L4 offload request. Fixes: 3337e6d91c5b ("userspace: Enable L4 checksum offloading by default.") Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-08-13 15:38:30 +02:00
Eelco Chaudron	f2e4195b0c	netdev-linux: Fix uninitialized gso_type case. This patch fixes an uninitialized gso_type case in netdev_linux_prepend_vnet_hdr() by returning an error. Fixes: 3337e6d91c5b ("userspace: Enable L4 checksum offloading by default.") Acked-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2024-06-03 13:27:38 +02:00
Eelco Chaudron	18436bed85	netdev-linux: Fix possible int overflow in tc_add_matchall_policer(). Fix unintentional integer overflow reported by Coverity by adding the ULL suffix to the numerical literals used in the multiplications. Fixes: ed2300cca0d3 ("netdev-linux: Refactor put police action netlink message") Acked-by: Mike Pattrick <mkp@redhat.com> Acked-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2024-06-03 13:23:11 +02:00
Mike Pattrick	4d25656a23	netdev-linux: Initialize link speed in error conditions. Clang's static analyzer noted that the output from netdev_linux_get_speed_locked can be checked even if this function doesn't set any values. Now we always set those values to a sane default in all cases. Fixes: 19cffe30cfda ("netdev-linux: Avoid deadlock in netdev_get_speed.") Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-05-28 22:18:55 +02:00
Mike Pattrick	0c1b9b015d	netdev-linux: Return an error if device feature names are empty. When retrieving a list of features supported by a network card, return with an error code if the request completed without an error but the list contains zero entries. In practice this should never happen, but it does contribute to a detection in Clang's static analyzer. Fixes: 6c59c195266c ("netdev-linux: Use ethtool to detect offload support.") Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-05-28 22:18:55 +02:00
Roi Dayan	ce1663b5fe	netdev-linux: Fix ethtool_cmd is partly outside array bounds. Used Ubuntu with gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0 lib/netdev-linux.c: In function 'netdev_linux_construct': lib/netdev-linux.c:7003:15: error: array subscript 'struct ethtool_cmd[0]' is partly outside array bounds of ‘union <anonymous>[1]’ [-Werror=array-bounds] 7003 \| ecmd->cmd = cmd; \| ~~~~~~~~~~^~~~~ lib/netdev-linux.c:2411:7: note: while referencing ‘sset_info’ 2411 \| } sset_info; \| ^~~~~~~~~ Fixes: 6c59c195266c ("netdev-linux: Use ethtool to detect offload support.") Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-05-28 22:17:35 +02:00
Felix Huettner	6439d694ae	util: Support checking for kernel versions. Extract checking for a given kernel version to a separate function. It will be used also in the next patch. Acked-by: Mike Pattrick <mkp@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Felix Huettner <felix.huettner@mail.schwarz> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-04-05 20:21:19 +02:00
Simon Horman	b3ebc34a06	netdev-linux: Rename local variables as primary_*. Recently OVS adopted a policy of using the inclusive naming word list v1 [1, 2]. This patch partially addresses the use of the term master in the context of LAG devices by using the term primary instead: the local variables master_netdev and master_name are renamed as primary_netdev and primary_name. Related comments are also updated. No functional change intended. [1] df5e5cf ("Documentation: Add section on inclusive language.") [2] https://inclusivenaming.org/word-lists/ Signed-off-by: Simon Horman <horms@ovn.org> Acked-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com>	2024-03-06 10:12:12 +00:00
Simon Horman	f92b30a0ff	netdev-linux: Rename struct nedev_linux field as is_lag_primary. Recently OVS adopted a policy of using the inclusive naming word list v1 [1, 2]. This patch partially addresses the use of the term master in the context of LAG devices by using the term primary instead: the is_lag_master field of struct netdev_linux is renamed is_lag_primary. A related comment is also updated. No functional change intended. [1] df5e5cf ("Documentation: Add section on inclusive language.") [2] https://inclusivenaming.org/word-lists/ Signed-off-by: Simon Horman <horms@ovn.org> Acked-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com>	2024-03-06 10:10:58 +00:00
Mike Pattrick	a2d4ad651d	netdev-linux: Only repair IP checksum in IPv4. Previously a change was added to the vnet prepend code to solve for the case where no L4 checksum offloading was needed but the L3 checksum hadn't been calculated. But the added check didn't properly account for IPv6 traffic. Fixes: 85bcbbed839a ("userspace: Enable tunnel tests with TSO.") Reported-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-02-16 15:23:25 +01:00
Mike Pattrick	cb0cbffbe8	netdev-linux: Favour inner packet for multi-encapsulated TSO. Previously if an OVS configuration nested multiple layers of UDP tunnels like VXLAN or GENEVE on top of each other through netdev-linux interfaces, the vnet header would be incorrectly set to the outermost UDP tunnel layer instead of the intermediary tunnel layer. This resulted in the middle UDP tunnel not checksum offloading properly. Fixes: 85bcbbed839a ("userspace: Enable tunnel tests with TSO.") Reported-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-02-16 15:23:19 +01:00
Adrian Moreno	19cffe30cf	netdev-linux: Avoid deadlock in netdev_get_speed. netdev_linux_get_speed needs to lock netdev_linux->mutex, and so do the internal tc operations. Therefore, the former cannot be called from the latter. Create a lock-free version of netdev_linux_get_speed() and call it from tc operations. Also expand the unit test to cover queues where ceil is determined by the maximum link speed. Fixes: b8f8fad86435 ("netdev-linux: Use speed as max rate in tc classes.") Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2024-January/052912.html Reported-by: Daryl Wang <darylywang@google.com> Suggested-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-02-13 21:27:50 +01:00
Mike Pattrick	85bcbbed83	userspace: Enable tunnel tests with TSO. This patch enables most of the tunnel tests in the testsuite, and adds a large TCP transfer to a vxlan and geneve test to verify TSO functionality. Some additional changes were required to accommodate these changes with netdev-linux interfaces. The test for vlan over vxlan is purposely not enabled as the traffic produced by this test gives incorrect values in the vnet header. Acked-by: Simon Horman <horms@ovn.org> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-01-17 22:06:51 +01:00
Flavio Leitner	e0056018c4	userspace: Respect tso/gso segment size. Currently OVS will calculate the segment size based on the MTU of the egress port. That usually happens to be correct when the ports share the same MTU, but that is not always true. Therefore, if the segment size is provided, then use that and make sure the over sized packets are dropped. Signed-off-by: Flavio Leitner <fbl@sysclose.org> Co-authored-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Mike Pattrick <mkp@redhat.com> Acked-by: Simon Horman <horms@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-12-02 00:56:36 +01:00
Mike Pattrick	6c59c19526	netdev-linux: Use ethtool to detect offload support. Currently when userspace-tso is enabled, netdev-linux interfaces will indicate support for all offload flags regardless of interface configuration. This patch checks for which offload features are enabled during netdev construction. Signed-off-by: Mike Pattrick <mkp@redhat.com> Acked-by: Simon Horman <horms@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-12-02 00:56:36 +01:00
David Marchand	b561bbdc27	netdev-afxdp: Postpone libbpf logging helper registration. When using net/af_xdp DPDK driver along OVS native AF_XDP support, confusing logs are reported, like: netdev_dpdk\|INFO\|Device 'net_af_xdpp0,iface=ovs-p0' attached to DPDK dpif_netdev\|INFO\|PMD thread on numa_id: 0, core id: 11 created. dpif_netdev\|INFO\|There are 1 pmd threads on numa node 0 dpdk\|INFO\|Device with port_id=0 already stopped dpdk(pmd-c11/id:22)\|INFO\|PMD thread uses DPDK lcore 1. netdev_dpdk\|WARN\|Rx checksum offload is not supported on port 0 netdev_afxdp\|INFO\|libbpf: elf: skipping unrecognized data section(6) .xdp_run_config netdev_afxdp\|INFO\|libbpf: elf: skipping unrecognized data section(7) xdp_metadata netdev_afxdp\|INFO\|libbpf: elf: skipping unrecognized data section(7) xdp_metadata netdev_afxdp\|INFO\|libbpf: elf: skipping unrecognized data section(7) xdp_metadata This comes from the fact that netdev-afxdp unconditionnally registers a helper for logging libbpf messages. Making both net/af_xdp and netdev-afxdp work at the same time seems difficult, so at least, ensure that netdev-afxdp won't register this helper unless a netdev is actually allocated. Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Simon Horman <horms@ovn.org>	2023-11-21 09:40:11 +00:00
Jakob Meng	d614f2863f	netdev-afxdp: Sync and clean {get, set}_config() callbacks. For better usability, the function pairs get_config() and set_config() for netdevs should be symmetric: Options which are accepted by set_config() should be returned by get_config() and the latter should output valid options for set_config() only. This patch also moves key-value pairs which are not valid options from get_config() to the get_status() callback. The documentation in vswitchd/vswitch.xml for status columns has been updated accordingly. Reported-at: https://bugzilla.redhat.com/1949855 Signed-off-by: Jakob Meng <code@jakobmeng.de> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>	2023-11-14 11:03:28 +00:00
Adrian Moreno	07ce41da11	netdev-linux: Support 64-bit rates in tc policing. Use TCA_POLICE_RATE64 if the rate cannot be expressed using 32bits. This breaks the 32Gbps barrier. The new barrier is ~4Tbps caused by netdev's API expressing kbps rates using 32-bit integers. Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2137643 Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-07-17 20:03:54 +02:00
Adrian Moreno	68ac6e9db7	netdev-linux: Refactor nl_msg_put_act_police. In preparation for supporting 64-bit rates in tc policies, move the allocation and initialization of struct tc_police object inside nl_msg_put_act_police(). That way, the function is now called with the actual rates. Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-07-17 20:03:54 +02:00
Adrian Moreno	13e183da31	netdev-linux: Remove tc_matchall_fill_police. It is equivalent to tc_policer_init() so remove the duplicated function. Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-07-17 20:03:54 +02:00
Adrian Moreno	a86fea06fe	netdev-linux: Use 64-bit rates in htb tc classes. Currently, htb rates are capped at ~34Gbps because they are internally expressed as 32-bit fields. Move min and max rates to 64-bit fields and use TCA_HTB_RATE64 and TCA_HTB_CEIL64 to configure HTC classes to break this barrier. In order to test this, create a dummy tuntap device and set it's speed to a very high value so we can try adding a QoS queue with big rates. Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2137619 Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-07-17 20:03:54 +02:00
Adrian Moreno	7edfac5745	netdev-linux: Use 64bit rtab and burst calculations. tc uses these "rtab" tables to estimate the time (ticks) that it takes to send a packet of different sizes. In preparation for the introduction of 64-bit rates, add an argument to tc_put_rtab() to allow an external 64-bit rate. Also use 64bits for other burst buffer calculation functions. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-07-17 20:03:54 +02:00
Adrian Moreno	b8f8fad864	netdev-linux: Use speed as max rate in tc classes. Instead of relying on feature bits, use the speed value directly as maximum rate for htb and hfsc classes. There is still a limitation with the maximum rate that we can express with a 32-bit number in bytes/s (~ 34.3Gbps), but using the actual link speed instead of the feature bits, we can at least use an accurate maximum for some link speeds (such as 25Gbps) which are not supported by netdev's feature bits. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-07-17 20:03:54 +02:00
Adrian Moreno	6240c0b4c8	netdev: Add netdev_get_speed() to netdev API. Currently, the netdev's speed is being calculated by taking the link's feature bits (using netdev_get_features()) and transforming them into bps. This mechanism can be both inaccurate and difficult to maintain, mainly because we currently use the feature bits supported by OpenFlow which would have to be extended to support all new feature bits of all netdev implementations while keeping the OpenFlow API intact. In order to expose the link speed accurately for all current and future hardware, add a new netdev API call that allows the implementations to provide the current and maximum link speeds in Mbps. Internally, the logic to get the maximum supported speed still relies on feature bits so it might still get out of sync in the future. However, the maximum configurable speed is not used as much as the current speed and these feature bits are not exposed through the netdev interface so it should be easier to add more. Use this new function instead of netdev_get_features() where the link speed is needed. As a consequence of this patch, link speeds of cards is properly reported (internally in OVSDB) even if not supported by OpenFlow. A test verifies this behavior using a tap device. Also, in order to avoid using the old, this patch adds a checkpatch.py warning if the old API is used. Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2137567 Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-07-17 20:03:32 +02:00
Mike Pattrick	3337e6d91c	userspace: Enable L4 checksum offloading by default. The netdev receiving packets is supposed to provide the flags indicating if the L4 checksum was verified and it is OK or BAD, otherwise the stack will check when appropriate by software. If the packet comes with good checksum, then postpone the checksum calculation to the egress device if needed. When encapsulate a packet with that flag, set the checksum of the inner L4 header since that is not yet supported. Calculate the L4 checksum when the packet is going to be sent over a device that doesn't support the feature. Linux tap devices allows enabling L3 and L4 offload, so this patch enables the feature. However, Linux socket interface remains disabled because the API doesn't allow enabling those two features without enabling TSO too. Signed-off-by: Flavio Leitner <fbl@sysclose.org> Co-authored-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-06-15 23:50:30 +02:00
Frode Nordahl	106ef21860	tc: Fix crash on malformed reply from kernel. The tc module combines the use of the `tc_transact` helper function for communication with the in-kernel tc infrastructure with assertions on the reply data by `ofpbuf_at_assert` on the received data prior to further processing. With the presence of bugs on the kernel side, we need to treat the kernel as an unreliable service provider and replace assertions on the reply from it with checks to avoid a fatal crash of OVS. For the record, the symptom of the crash is this in the log: EMER\|include/openvswitch/ofpbuf.h:194: assertion offset + size <= b->size failed in ofpbuf_at_assert() And an excerpt of the backtrace looks like this: ofpbuf_at_assert (offset=16, size=20) at include/openvswitch/ofpbuf.h:194 tc_replace_flower at lib/tc.c:3223 netdev_tc_flow_put at lib/netdev-offload-tc.c:2096 netdev_flow_put at lib/netdev-offload.c:257 parse_flow_put at lib/dpif-netlink.c:2297 try_send_to_netdev at lib/dpif-netlink.c:2384 Reported-At: https://launchpad.net/bugs/2018500 Fixes: 5c039ddc64ff ("netdev-linux: Add functions to manipulate tc police action") Fixes: e7f6ba220e10 ("lib/tc: add ingress ratelimiting support for tc-offload") Fixes: f98e418fbdb6 ("tc: Add tc flower functions") Fixes: c1c9c9c4b636 ("Implement QoS framework.") Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-06-07 22:46:45 +02:00
Miika Petäjäniemi	a6195e2c42	netdev-linux: Add jitter parameter to the netem qos options. Adds jitter option to enable emulating latency fluctuation with netem. Submitted-at: https://github.com/openvswitch/ovs/pull/407 Signed-off-by: Miika Petäjäniemi <miika.petajaniemi@solita.fi> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-02-21 14:25:57 +01:00
wangchuanlei	e22e1f6725	dpctl: Add support to count upcall packets. Add support to count upcall packets per port, both succeed and failed, which is a better way to see how many packets upcalled on each interface. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: wangchuanlei <wangchuanlei@inspur.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-01-31 17:40:50 +01:00
Eelco Chaudron	e1e5eac5b0	tc: Add TCA_KIND flower to delete and get operation to avoid rtnl_lock(). A long long time ago, an effort was made to make tc flower rtnl_lock() free. However, on the OVS part we forgot to add the TCA_KIND "flower" attribute, which tell the kernel to skip the lock. This patch corrects this by adding the attribute for the delete and get operations. The kernel code calls tcf_proto_is_unlocked() to determine the rtnl_lock() is needed for the specific tc protocol. It does this in the tc_new_tfilter(), tc_del_tfilter(), and in tc_get_tfilter(). If the name is not set, tcf_proto_is_unlocked() will always return false. If set, the specific protocol is queried for unlocked support. Fixes: f98e418fbdb6 ("tc: Add tc flower functions") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-01-30 21:12:31 +01:00
Ilya Maximets	b22c4d8403	netdev: Assume default link speed to be 10 Gbps instead of 100 Mbps. 100 Mbps was a fair assumption 13 years ago. Modern days 10 Gbps seems like a good value in case no information is available otherwise. The change mainly affects QoS which is currently limited to 100 Mbps if the user didn't specify 'max-rate' and the card doesn't report the speed or OVS doesn't have a predefined enumeration for the speed reported by the NIC. Calculation of the path cost for STP/RSTP is also affected if OVS is unable to determine the link speed. Lower link speed adapters are typically good at reporting their speed, so chances for overshoot should be low. But newer high-speed adapters, for which there is no speed enumeration or if there are some other issues, will not suffer that much. Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-11-30 14:42:59 +01:00
Ilya Maximets	02be2c318c	netdev-linux: Fix inability to apply QoS on ports with custom qdiscs. tc_del_qdisc() function only removes qdiscs with handle '1:0'. If for some reason the interface has a qdisc with non-zero handle attached, tc_del_qdisc() will not delete it and subsequent tc_install() will fail to install a new qdisc. The problem is that Libvirt by default is setting noqueue qdisc for all tap interfaces it creates. This is done for performance reasons to ensure lockless xmit. The issue is causing non-working QoS in OpenStack setups since new versions of Libvirt started to use OVS to configure it. In the past, Libvirt configured TC on its own, bypassing OVS. Removing the handle value from the deletion request, so any qdisc can be removed. Changing the error checking to also pass ENOENT, since that is the error reported if only default qdisc is present. Alternative solution might be to use NLM_F_REPLACE, but that will be a larger change with a potential need of refactoring. Potential side effect of the change is that OVS may start removing qdiscs that it didn't remove before. Though it's not a new issue and 'linux-noop' QoS type should be used for ports that OVS should not touch. Otherwise, OVS owns qdiscs on all interfaces attached to it. While at it, adding more logs as errors are not logged in any way at the moment making the issue hard to debug. Reported-at: https://bugzilla.redhat.com/2138339 Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-October/052088.html Reported-at: https://github.com/openvswitch/ovs-issues/issues/268 Suggested-by: Slawek Kaplonski <skaplons@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-11-02 19:50:02 +01:00
Baowen Zheng	ffcb6f115f	netdev-linux: Allow meter to work in tc software datapath when tc-policy is specified Add tc action flags when adding police action to offload meter table. There is a restriction that the flag of skip_sw/skip_hw should be same for filter rule and the independent created tc actions the rule uses. In this case, if we configure the tc-policy as skip_hw, filter rule will be created with skip_hw flag and the police action according to meter table will have no action flag, then flower rule will fail to add to tc kernel system. To fix this issue, we will add tc action flag when adding police action to offload a meter table, so it will allow meter table to work in tc software datapath. Fixes: 5c039ddc64ff ("netdev-linux: Add functions to manipulate tc police action") Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Acked-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Simon Horman <simon.horman@corigine.com>	2022-11-01 10:18:16 +01:00
Jon Kohler	c0e053f6d1	netdev-linux: Skip some internal kernel stats gathering. For netdev_linux_update_via_netlink(), hint to the kernel that we do not need it to gather netlink internal stats when we want to update the netlink flags, as those stats are not rendered within OVS. Background: ovs-vswitchd can spend quite a bit of time blocked by the kernel during netlink calls, especially systems with many cores. This time is dominated by the kernel-side internal stats gathering mechanism in netlink, specifically: inet6_fill_link_af inet6_fill_ifla6_attrs __snmp6_fill_stats64 In Linux 4.4+, there exists a hint for netlink requests to not trigger the ipv6 stats gathering mechanism, which greatly reduces the amount of time that ovs-vswitchd is on CPU. Testing and Results: Tested booting 320 VM's and measuring OVS utilization with perf record, then visualized into a flamegraph using a patched version of ovs 2.14.2. Calls under bridge_run() seem to get hit the worst by this issue. Before bridge_run() == 11.3% of samples After bridge_run() == 3.4% of samples Note that there are at least two observed netlink calls under bridge_run that are still kernel stats heavy after this patch: Call 1: bridge_run -> netdev_run -> route_table_run -> route_table_reset -> ovs_router_insert -> ovs_router_insert__ -> get_src_addr -> netdev_ger_addr_list -> netdev_linux_get_addr_list -> getifaddrs Since the actual netlink call is coming from getifaddrs() in glibc, fixing would likely involve either duplicating glibc code in ovs source or patch glibc. Call 2: bridge_run -> iface_refresh_stats -> netdev_get_stats -> netdev_linux_get_stats -> get_stats_via_netlink This does use netlink based stats; however, it isn't immediately clear if just dropping the stats from inet6_fill_link_af would impact anything or not. Given this call is more intermittent, its of lesser concern. Acked-by: Greg Smith <gasmith@nutanix.com> Signed-off-by: Jon Kohler <jon@nutanix.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-09-09 20:53:54 +02:00
Vlad Buslov	d9268782af	netdev-linux: set correct action for packets that passed policer Referenced commit changed policer action type from TC_ACT_UNSPEC (continue) to TC_ACT_PIPE. However, since neither TC hardware offload layer nor mlx5 driver at the time validated action type and always assumed 'continue', the breakage wasn't caught until later validation code was added. The change also broke valid configuration when sending from offload-capable device to non-offload capable. For example, when sending from mlx5 VF to OvS bridge netdevice the traffic that passed matchall classifier with policer could no longer match the following flower rule in software: filter protocol all pref 1 matchall chain 0 filter protocol all pref 1 matchall chain 0 handle 0x1 in_hw (rule hit 7863) action order 1: police 0x1 rate 32Mbit burst 1000Kb mtu 64Kb action drop/pipe overhead 0b ref 1 bind 1 installed 17 sec firstused 17 sec Action statistics: Sent 152199634 bytes 102550 pkt (dropped 1315, overlimits 1315 requeues 0) Sent software 74612172 bytes 51275 pkt Sent hardware 77587462 bytes 51275 pkt backlog 0b 0p requeues 0 used_hw_stats delayed filter protocol ip pref 3 flower chain 0 filter protocol ip pref 3 flower chain 0 handle 0x1 dst_mac aa:94:1f:f2:f8:44 src_mac e4:00:01:08:00:02 eth_type ipv4 ip_flags nofrag not_in_hw action order 1: skbedit ptype host pipe index 1 ref 1 bind 1 installed 6 sec used 6 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 action order 2: mirred (Ingress Redirect to device br-ovs) stolen index 1 ref 1 bind 1 installed 6 sec used 6 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 cookie 401a9c8b3d403c62240d3eb5e21c1604 no_percpu Fix the issue by restoring matchall and basic policers action type to 'continue'. Fixes: c2567e533f8a ("add port-based ingress policing based packet-per-second rate-limiting") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Simon Horman <simon.horman@corigine.com>	2022-08-04 10:04:28 +01:00
Tao Liu	8166c066a7	netdev-linux: Do not touch LAG members if master is not attached to OVS. Bond master netdev may be created without a classification type, due to routing or tunneling code. If bond master is not attached to ovs, the ingress block on LAG members should not be updated. Simple reproducer: tc q ls dev net3 ingress ip a add 10.1.1.1/30 dev bond0 ip l set net3 master bond0 tc q ls dev net3 ingress Fixes: d22f8927c3c9 ("netdev-linux: monitor and offload LAG slaves to TC") Signed-off-by: Tao Liu <thomas.liu@ucloud.cn> Acked-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-07-26 12:48:14 +02:00
Ilya Maximets	0d153bffbf	netdev-linux: Fix leak of a tc police get/del reply. Direct leak of 64 byte(s) in 1 object(s) allocated from: 0 0x51b1d8 in malloc (vswitchd/ovs-vswitchd+0x51b1d8) 1 0xc81ded in xmalloc__ lib/util.c:137:15 2 0xc81ded in xmalloc lib/util.c:172:12 3 0xb32153 in ofpbuf_new lib/ofpbuf.c:168:24 4 0xd563e4 in nl_sock_transact lib/netlink-socket.c:1113:34 5 0xd56261 in nl_transact lib/netlink-socket.c:1812:13 6 0xd5e096 in tc_transact lib/tc.c:238:17 7 0xd01622 in tc_del_policer_action lib/netdev-linux.c:5807:13 8 0xd2e681 in meter_tc_del_policer lib/netdev-offload-tc.c:2827:15 9 0x94ec21 in meter_offload_del lib/netdev-offload.c:245:23 10 0xcc86c4 in dpif_netlink_meter_del lib/dpif-netlink.c:4288:9 11 0x86c595 in dpif_meter_del lib/dpif.c:2014:13 12 0x663439 in free_meter_id ofproto/ofproto-dpif.c:6789:5 13 0xb47518 in ovsrcu_call_postponed lib/ovs-rcu.c:346:13 14 0xb48031 in ovsrcu_postpone_thread lib/ovs-rcu.c:362:14 15 0xb5015a in ovsthread_wrapper lib/ovs-thread.c:422:12 16 0x7f86af4081ce in start_thread (/lib64/libpthread.so.0+0x81ce) Fixes: 5c039ddc64ff ("netdev-linux: Add functions to manipulate tc police action") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Simon Horman <simon.horman@corigine.com>	2022-07-14 14:58:10 +02:00
Jianbo Liu	5c039ddc64	netdev-linux: Add functions to manipulate tc police action Add helpers to add, delete and get stats of police action with the specified index. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Simon Horman <simon.horman@corigine.com>	2022-07-11 11:18:16 +02:00
Jianbo Liu	ed2300cca0	netdev-linux: Refactor put police action netlink message To reuse the code for manipulating police action, move the common initialization code to a function, and change PPS parameters as meter pktps is in unit of packet per second. null_police is redundant because either BPS or PPS, not both, can be configured in one message. So the police passed in to nl_msg_put_act_police can be reused as its rate is zero for PPS, and it also provides the index for police action to be created. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Simon Horman <simon.horman@corigine.com>	2022-07-11 11:17:44 +02:00
Dumitru Ceara	c8c49a9db9	netdev-linux: Properly access 32-bit aligned rtnl_link_stats64 structs. Detected by UB Sanitizer when running system tests: lib/netdev-linux.c:6250:26: runtime error: member access within misaligned address 0x00000229a204 for type 'const struct rpl_rtnl_link_stats64', which requires 8 byte alignment 0x00000229a204: note: pointer points here c4 00 17 00 01 00 00 00 00 00 00 00 01 00 00 00 ^ 00 00 00 00 6e 00 00 00 00 00 00 00 6e 00 00 00 0 0x89f10e in netdev_stats_from_rtnl_link_stats64 lib/netdev-linux.c:6250 1 0x89f10e in get_stats_via_netlink lib/netdev-linux.c:6298 2 0x8a039a in netdev_linux_get_stats lib/netdev-linux.c:2227 3 0x68e149 in netdev_get_stats lib/netdev.c:1599 4 0x407b21 in iface_refresh_stats vswitchd/bridge.c:2687 5 0x419eb6 in iface_create vswitchd/bridge.c:2134 6 0x419eb6 in bridge_add_ports__ vswitchd/bridge.c:1170 7 0x41f71c in bridge_add_ports vswitchd/bridge.c:1181 8 0x41f71c in bridge_reconfigure vswitchd/bridge.c:898 9 0x429f59 in bridge_run vswitchd/bridge.c:3331 10 0x430af3 in main vswitchd/ovs-vswitchd.c:129 11 0x7fbdfd43eb74 in __libc_start_main (/lib64/libc.so.6+0x27b74) 12 0x4072fd in _start (/root/ovs/vswitchd/ovs-vswitchd+0x4072fd) Signed-off-by: Dumitru Ceara <dceara@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-05-17 23:10:20 +02:00
Adrian Moreno	9e56549c2b	hmap: use short version of safe loops if possible. Using SHORT version of the *_SAFE loops makes the code cleaner and less error prone. So, use the SHORT version and remove the extra variable when possible for hmap and all its derived types. In order to be able to use both long and short versions without changing the name of the macro for all the clients, overload the existing name and select the appropriate version depending on the number of arguments. Acked-by: Dumitru Ceara <dceara@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-03-30 16:59:02 +02:00
Mike Pattrick	eb1ab5357b	netdev-linux: Use matchall classifier for ingress policing. Currently ingress policing uses the basic classifier to apply traffic control filters if hardware offload is not enabled, in which case it uses matchall. This change changes the behavior to always use matchall, and fall back onto basic if the kernel is built without matchall support. The system tests are modified to allow either basic or matchall classification on the ingestion filter, and to allow either 10000 or 10240 packets for the packet burst filter. 10000 is accurate for kernel 5.14 and the most recent iproute2, however, 10240 is left for compatibility with older kernels. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-01-12 15:23:29 +01:00

1 2 3 4 5 ...

418 Commits