2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-22 01:51:26 +00:00

20515 Commits

Author SHA1 Message Date
Ilya Maximets
83de251fa5 ipsec: libreswan: Remove old certs before importing new ones.
If started with --no-restart-ike-daemon, ovs-monitor-ipsec doesn't
clear the NSS database.  This is not a problem if the certificates do
not change while the monitor is down, because completely duplicate
entries cannot be added to the NSS database.  However, if the monitor
is stopped, then certificates change on disk and then the monitor is
started back, it will add new tunnel certificates alongside the old
ones and will fail to add the new CA certificate.  So, we'll end up
with multiple certificates for the same tunnel and the outdated CA
certificate.  This will not allow creating new connections as we'll
not be able to verify certificates of the new CA:

  # certutil -L -d sql:/var/lib/ipsec/nss

  Certificate Nickname             Trust Attributes
                                   SSL,S/MIME,JAR/XPI

  ovs_certkey_c04c352b             u,u,u
  ovs_cert_cacert                  CT,,
  ovs_certkey_c04c352b             u,u,u
  ovs_certkey_c04c352b             u,u,u
  ovs_certkey_c04c352b             u,u,u
  ovs_certkey_c04c352b             u,u,u
  ovs_certkey_c04c352b             u,u,u
  ovs_certkey_c04c352b             u,u,u

  pluto: "ovn-c04c35-0-out-1" #459: processing decrypted
   IKE_AUTH request containing SK{IDi,CERT,CERTREQ,IDr,AUTH,SA,
   TSi,TSr,N(USE_TRANSPORT_MODE)}
  pluto: "ovn-c04c35-0-out-1" #459: NSS: ERROR:
   IPsec certificate CN=c04c352b,OU=kind,O=ovnkubernetes,C=US invalid:
   SEC_ERROR_UNKNOWN_ISSUER: Peer's Certificate issuer is not recognized.
  pluto: "ovn-c04c35-0-out-1" #459: NSS: end certificate invalid

Fix that by always checking certificates in the NSS database before
importing the new one.  If they do not match, then remove the old
one from the NSS and add the new one.

We have to call deletion multiple times in order to remove all the
potential duplicates from previous runs.  This will be useful on
upgrade, but also may save us if one of the deletions ever fail for
any reason and we'll end up with a duplicate entry anyway.

One alternative might be to always clear the database, even if the
--no-restart-ike-daemon option is set, but there is a chance that
we'll refresh and ask to re-read secrets before we got all the tunnel
information from the database.  That may affect dataplane.  Even if
this is really not possible, the logic seems too far apart to rely on.
Also, Libreswan 4.6 seems to have some bug that prevents re-adding
deleted connections if we removed and re-add the same certificate
(newer versions don't have this issue), so it's better if we do not
touch certificates that didn't actually change if we're not restarting
the IKE daemon.

The clearing may seem redundant now, but it may still be useful to
clean up certificates for tunnels that disappeared while the monitor
was down.  Approach taken in this change doesn't cover this case.

Test is added to check the described scenario.  The 'on_exit' command
is converted to obtain the monitor PID at exit, since we're now killing
one monitor and starting another.

Fixes: fe5ff26a49f6 ("ovs-monitor-ipsec: Add option to not restart IKE daemon.")
Reported-at: https://issues.redhat.com/browse/FDP-1473
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-25 12:42:08 +02:00
Ilya Maximets
80d723736b cirrus: Update to FreeBSD 14.3 and 13.5.
13.5 was released a few months back and 14.3 just recently.  Older
point releases may become unavailable in gcloud in the near future.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-23 22:08:50 +02:00
David Marchand
6090603703 netdev-dpdk: Remove limit on maximum descriptors count.
Using larger rxq can be beneficial in highly bursty setups.
Remove the artificial limit on the count of descriptors in rxq and txq.
The device driver will limit the values in any case.

Reported-at: https://issues.redhat.com/browse/FDP-1415
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-20 21:17:02 +02:00
Ilya Maximets
edecb74043 python: idl: Don't notify the application on _Server database updates.
_Server database is not managed by the user and needed mostly for IDL
itself to see changes in the schema or cluster leadership.  However,
we're currently delivering notifications about changes in that database
confusing the application (the application didn't subscribe to this
database) and also we're increasing the change_seqno potentially
returning true for has_ever_connected() call even if we didn't really
get any real data yet or even connected to the right database.

In the tests these notifications can be seen as two events at the
beginning of every test with the notification enabled:

  000: event:create, row={}, uuid=<0>, updates=None
  000: event:create, row={}, uuid=<1>, updates=None

Tests only print the 'simple' table, so the content is omitted, but
the data is still there and the empty events are printed out.

We should not notify the application nor touch the change_seqno.
Tests updated accordingly.  Unfortunately, removing first two lines
from a test changes the numbers generated by the UUID filter, so the
rest of the test needs adjustments as well.

Fixes: c39751e44539 ("python: Monitor Database table to manage lifecycle of IDL client.")
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-20 21:17:02 +02:00
David Marchand
ab062d3cb4 netdev-dpdk: Adjust IPv4 checksum capability for vhost-user.
If no L4 checksum can be requested, OVS may as well compute IPv4
checksum when needed.

This allows a small optimization where the whole preparation step can
be skipped on a batch when a (vhost-user) DPDK port has no offload
capability.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-19 21:03:20 +02:00
David Marchand
dd443c1a7a netdev-dpdk: Stop relying on vhost-user Tx flags.
vhost-user legacy behavior has been to mark mbuf with Tx offload flags
based on what the virtio-net header contained (but provide no
Rx information, like IP checksum or L4 checksum validity).

Changing to the non legacy mode means that no code out of OVS should set
any RTE_MBUF_F_TX_* flag. Had a check accordingly.

Link: https://git.dpdk.org/dpdk/commit/?id=ca7036b4af3a
Reported-at: https://issues.redhat.com/browse/FDP-1147
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-19 21:03:15 +02:00
David Marchand
b8032fac2c dp-packet: Remove direct access to DPDK offloads.
Now that every use of ol_flags have been reworked, we can remove helper
and additional field in dp_packet when not building with DPDK.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-19 21:03:12 +02:00
David Marchand
cf7b86db1f dp-packet: Rework TCP segmentation.
Rather than mark with a offload flags + mark with a segmentation size,
simply rely on the netdev implementation which sets a segmentation size
when appropriate.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-19 21:03:09 +02:00
David Marchand
e36793e11f dp-packet: Resolve unknown checksums.
Now that IP and L4 checksum offloading don't require tweaking Tx flags,
update checksum status in parts of OVS that validate checksums (in case
of unknown status).

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-19 21:03:03 +02:00
David Marchand
2956a61265 dp-packet: Rework L4 checksum offloads.
The DPDK mbuf API specifies 4 status when it comes to L4 checksums:
- RTE_MBUF_F_RX_L4_CKSUM_UNKNOWN: no information about the RX L4 checksum
- RTE_MBUF_F_RX_L4_CKSUM_BAD: the L4 checksum in the packet is wrong
- RTE_MBUF_F_RX_L4_CKSUM_GOOD: the L4 checksum in the packet is valid
- RTE_MBUF_F_RX_L4_CKSUM_NONE: the L4 checksum is not correct in the packet
  data, but the integrity of the L4 data is verified.

Similarly to the IP checksum offloads API, revise OVS L4 offloads API.

No information about the L4 protocol is provided by any netdev-*
implementation, so OVS needs to mark this L4 protocol during flow
extraction.

Rename current API for consistency with dp_packet_(inner_)?l4_checksum_.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-19 21:02:56 +02:00
David Marchand
3daf04a4c5 dp-packet: Rework IP checksum offloads.
As the packet traverses through OVS, offloading Tx flags must be carefully
evaluated and updated which results in a bit of complexity because of a
separate "outer" Tx offloading flag coming from DPDK API,
and a "normal"/"inner" Tx offloading flag.

On the other hand, the DPDK mbuf API specifies 4 status when it comes to
IP checksums:
- RTE_MBUF_F_RX_IP_CKSUM_UNKNOWN: no information about the RX IP checksum
- RTE_MBUF_F_RX_IP_CKSUM_BAD: the IP checksum in the packet is wrong
- RTE_MBUF_F_RX_IP_CKSUM_GOOD: the IP checksum in the packet is valid
- RTE_MBUF_F_RX_IP_CKSUM_NONE: the IP checksum is not correct in the
  packet data, but the integrity of the IP header is verified.

This patch changes OVS API so that OVS code only tracks the status of
the checksum of the "current" L3 header and let the Tx flags aspect to
the netdev-* implementations.

With this API, the flow extraction can be cleaned up.

During packet processing, OVS can simply look for the IP checksum validity
(either good, or partial) before changing some IP header, and then mark
the checksum as partial.

In the conntrack case, when natting packets, the checksum status of the
inner part (ICMP error case) must be forced temporarily as unknown
to force checksum resolution.

When tunneling comes into play, IP checksums status is bit-shifted for
future considerations in the processing if, for example, the tunnel
header gets decapsulated again, or in the netdev-* implementations that
support tunnel offloading.

Finally, netdev-* implementations only need to care about packets in
partial status: a good checksum does not need touching, a bad checksum
has been updated by kept as bad by OVS, an unknown checksum is either
an IPv6 or if it was an IPv4, OVS updated it too (keeping it good or bad
accordingly).

Rename current API for consistency with dp_packet_(inner_)?ip_checksum_.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-19 21:00:54 +02:00
David Marchand
67abd51540 dp-packet: Rework tunnel offloads.
Rather than set bits in the mbuf ol_flags field, that only makes sense
for netdev-dpdk ports, mark packet for tunnel offload in OVS offloads
API.

While at it, since there is nothing really "hardware" related, rename
current API for consistency with dp_packet_tunnel_ prefix.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-19 21:00:48 +02:00
David Marchand
e2200485c5 dp-packet: Expand offloads preparation helper.
Expand this helper to clearly separate the non tunnel case from the
tunnel one. This will make later changes easier to read.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-19 21:00:45 +02:00
David Marchand
d29ba0abdc dp-packet: Add OVS offloading API.
As a preparation for tracking inner checksums, separate Rx checksum
status from the DPDK ol_flags field.
To minimize the cost of translating from DPDK API to OVS API, simply map
OVS flags to DPDK Rx mbuf flags.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-19 21:00:34 +02:00
David Marchand
19ef1b1f0f dp-packet: Remove DPDK specific IP version.
Flagging packets with IP version is only needed at the netdev-dpdk level.

In most cases, OVS is already inspecting the IP header in packet data,
so maintaining such IP version metadata won't save much cycles
(given the cost of additional branches necessary for handling
outer/inner flags).

Cleanup OVS shared code and only set these flags in netdev-dpdk.c.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-19 20:59:22 +02:00
David Marchand
52fdeda11a dp-packet: Remove Linux specific L4 offloads.
As the virtio-net offload API is used for netdev-linux ports, but
provides no information about the potentially encapsulated protocol
concerned by a checksum request, specific information from this netdev-
specific implementation is propagated into OVS code, and must be
carefully evaluated when some tunnel gets decapsulated.

This induces a cost in "normal" processing path, while the netdev-linux
path is not performance critical.

This patch removes such specific information, yet try harder to parse
the packet on the Rx side and set offload flags accordingly for non
encapsulated traffic. For encapsulated traffic, the inner
checksum is computed.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-19 20:59:04 +02:00
Terry Wilson
a86ae3c865 python: Add uuid/convert references to uuid for Row.__str__.
Row stringification happens a lot in client logs and it is far
more useful to have the logged Row's uuid printed. This also
adds converting referenced Row objects, and references within
set and map columns to UUIDs.

Signed-off-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-19 12:37:29 +02:00
Mark Michelson
8ee7ecb8a2 db-ctl-base: Allow retrieving rows of type OVSDB_TYPE_UUID.
The ctl_get_row() function attempts to match a user-provided string to a
particular database row. This works by comparing the user-provided
string to the values of columns provided by the ctl utility (e.g.
ovs-vsctl).

Before this commit, this comparison could only be made for columns of
type OVSDB_TYPE_INTEGER and OVSDB_TYPE_STRING. If a ctl utility provided
a column of a different type, then db-ctl-base.c would assert in
get_row_by_id().

This commit enhances the ability of ctl_get_row() to also retrieve rows
based on columns of type OVSDB_TYPE_UUID. The user-provided string is
converted to a UUID and compared against the column's value. If it
matches, then the row matches.

Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-19 12:36:29 +02:00
Aaron Conole
8a1a0ea7c0 AUTHORS: Add Changliang Wu.
Signed-off-by: Aaron Conole <aconole@redhat.com>
2025-06-13 14:09:11 -04:00
Changliang Wu
aea4734299 lldp: Fix out of bound write in chassisid_to_string.
snprintf will automatically write \0 at the end of the string,
and the last one byte will be out of bound.

create a new function ds_put_hex_with_delimiter,
instead of chassisid_to string and format_hex_arg.

Found in sanitize test.

Signed-off-by: Changliang Wu <changliang.wu@smartx.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
2025-06-13 14:06:55 -04:00
Mike Pattrick
614029aac0 conntrack: Allow inner NAT of related fragments.
Currently conntrack will refuse to extract metadata from fragmented
IPv4 packets. Usually the fragments would be processed by the ipf
module, but this isn't the case for ICMP related packets. The current
handling will result in these being incorrectly processed.

This patch checks for a frag offset instead of just frag flags, which is
similar to how conntrack handles fragments in the kernel.

Reported-at: https://issues.redhat.com/browse/FDP-136
Reported-by: Ales Musil <amusil@redhat.com>
Fixes: a489b16854b5 ("conntrack: New userspace connection tracker.")
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
2025-06-13 14:06:07 -04:00
Eelco Chaudron
ca9e67c801 daemon-unix: Handle potential negative values from sysconf().
Coverity reports that daemon_set_new_user() may receive a large
unsigned value from get_sysconf_buffer_size(), due to sysconf()
returning -1 and being cast to size_t.

Although this would likely lead to an allocation failure and abort,
it's better to handle the error in place.

Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2025-06-12 15:28:31 +02:00
Eelco Chaudron
99af7f3791 ovsdb: Fix Coverity leak warning by marking code as unreachable.
Coverity reports a memory leak on the 'error' variable in
ovsdb_trigger_try(). However, this code path is unreachable due to an
ovs_assert() in an earlier function call.

To make this clear to Coverity and silence the warning, the section is
explicitly marked as unreachable.

Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2025-06-10 17:07:06 +02:00
Eelco Chaudron
2c634482f2 raft: Fix resource leak from ignored ovsdb_log_write_and_free() error.
The Raft codebase includes calls to ovsdb_log_write_and_free() that
are incorrectly wrapped in ignore(). This causes potential error
resources to be leaked.

These calls should be wrapped in ovsdb_error_destroy() instead, to
ensure that any returned error objects are properly freed and do not
result in memory leaks.

Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2025-06-10 17:05:37 +02:00
Eelco Chaudron
b90304bfe7 ovsdb-server: Fix potential memory leak in parse_options().
When duplicate --config-file command-line arguments are passed,
the resources for previously specified file path were not freed.

This fix ensures unused resources are properly freed while
preserving the existing behavior of using the last configuration
file path specified.

Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2025-06-10 17:04:49 +02:00
Eelco Chaudron
d1bd62dae5 ofproto-dpif-upcall: Check odp_tun_key_from_attr() return value.
In the IPFIX and flow sample upcall handling, check the validity
of the tunnel key returned by odp_tun_key_from_attr(). If the
tunnel key is invalid, return an error.

This was reported by Coverity, but the change also improves
robustness and avoids undefined behavior in the case of malformed
tunnel attributes.

Fixes: 8b7ea2d48033 ("Extend OVS IPFIX exporter to export tunnel headers")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
2025-06-10 17:04:09 +02:00
Eelco Chaudron
88737f02ed ofproto-dpif-xlate: Fix memory leak in xlate_generic_encap_action().
This is not a real issue, as the initializer function,
rewrite_flow_push_nsh(), ensures it returns NULL on error.
However, cleaning this up improves code clarity and resolves
a Coverity warning about a potential memory leak.

Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2025-06-10 17:03:39 +02:00
Eelco Chaudron
8fca3f99cf lldp: Fix Coverity warning about resource leak in lldp test.
Coverity reported a potential resource leak in the LLDP test code.
While this condition should never occur in practice, since the test
would crash on out-of-memory, the warning is addressed by ensuring
the cleanup function is called on error paths.

Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2025-06-10 17:02:58 +02:00
Ales Musil
d283829477 sparse: Define new AVX10 includes added in GCC >= 15.
The GCC >=15 added new AVX10 header files, add defines for them as
sparse is not able to understand new types in those. This can be
seen with DPDK headers.

Tested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ales Musil <amusil@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-03 18:48:45 +02:00
Ales Musil
0e419d1b4f sparse: Add workaround for OpenSSL configuration.
sparse fails to process OpenSSL configuration header file in recent
OpenSSL version (3.2.x). Add workaround header that will disable
the problematic macro.

Signed-off-by: Ales Musil <amusil@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-02 17:30:02 +02:00
Ilya Maximets
8224cd47f3 tests: tunnel-push-pop: Fix occasional failure of the drop test.
Datapath port zero is normally taken by the 'datapath interface', i.e.
the ovs-dummy interface.  This makes it not possible to allocate port
zero for the p0 interface.  So, it will race with p1 for the number 1.
If p0 happens to be created first, it will take the 1 and p1 will get
the port 2 and then the test passes.  However, if p1 is created first,
then it will take the 1 and p0 will take the 2.  In this case the
test fails as the port name in the trace will be different.

Use '--names' to avoid this problem, but also fix the port numbers and
use the 'add_of_ports' macro instead of plain-coding the port addition.
The macro would've made the issue more obvious in the first place.

Fixes: 1015b13f054d ("ofproto-dpif-xlate: Add a drop action for native tunnel failure.")
Acked-by: Eli Britstein <elibr@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-06-02 16:03:58 +02:00
David Marchand
e99ce7d5df flow: Fix checksum offloads with simple match.
Packets with L4 partial status for a simple match flow would not get L4
checksums offloads applied.

This was not caught in unit tests, because packets from netdev-dummy
(calling miniflow_extract) would get Tx flags set early, before
parse_tcp_flags() got called during packet processing.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-05-30 18:00:56 +02:00
Kevin Traynor
48ce3a5a52 dpdk: Use DPDK 24.11.2 release.
Update the CI and docs to use DPDK 24.11.2.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
2025-05-30 16:48:44 +01:00
Roi Dayan
b42f9fde4a netdev-dpdk: Fix possible memory leak in vhost stats.
On error condition need to release the allocated structs.

Reported by Coverity.

Fixes: 3b29286db1c5 ("netdev-dpdk: Add per virtqueue statistics.")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
2025-05-30 14:22:23 +01:00
Eelco Chaudron
7e3a0b4961 AUTHORS: Add Yang Yang.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2025-05-30 11:09:29 +02:00
Yang Yang
60a2193000 perf-counter: Enable exclude_guest by default.
This patch sets exclude_guest to true by default in perf-counter.
Since ovsdb-server typically does not need guest context events,
this change avoids collecting unnecessary data and improves profiling
accuracy on the host.

Co-authored-by: Rongqing Li <lirongqing@baidu.com>
Signed-off-by: Yang Yang <yangyang92@baidu.com>
Signed-off-by: Rongqing Li <lirongqing@baidu.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2025-05-30 11:09:29 +02:00
Roi Dayan
2df25f970a util: Remove include of itself.
This is a redundant include.

Fixes: ee89ea7b477b ("json: Move from lib to include/openvswitch.")
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2025-05-30 10:29:58 +02:00
Roi Dayan
37848e2188 util: Ignore return code from str_to_uint().
Reported by Coverity.
  lib/util.c:795 Unchecked return value (CHECKED_RETURN):

As it's not really bug, wrap it with ignore().

Fixes: 9551e80befc0 ("tests: Use environment variable for default timeout.")
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2025-05-30 10:28:53 +02:00
Ilya Maximets
e180c431b9 tests: classifier: Add a stress test for prefixes reconfiguration.
This test is reusing the benchmark infrastructure, but it has some
pre-defined parameters, so it's easier to run in the test suite.

The benchmark code is adjusted to start another thread that does
prefix updates continuously in a loop and the lookup threads are
updated to be able to enter quiescent state periodically, so the
reconfiguration can proceed.

This test is a reproducer for the crashes fixed in the previous
commit.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-05-26 17:38:52 +02:00
Ilya Maximets
6a61a70fcb classifier: Fix race for prefix tree configuration.
The thread fence in the classifier is supposed to ensure that when the
subtable->trie_plen is updated, the actual prefix tree is ready to be
used.  On the write side in trie_init(), the fence is between the
tree configuration and the 'trie_plen' update.  On the reader's side
however, the fence is at the beginning of the classifier_lookup__(),
and both reads of the 'trie_plen' and the accesses to the tree itself
are happening afterwards.  And since both types of the reads are on
the same side of the fence, the fence is kind of pointless and doesn't
guarantee any memory ordering.  So, readers can be accessing partially
initialized prefix trees.

Another problem with the configuration is that cls->n_tries is updated
without any synchronization as well.  The comment on the fence says
that it also synchronizes for the cls->n_tries, but that doesn't make
a lot of sense.  In practice, cls->n_tries is read multiple times
throughout the classifier_lookup__() and each of these reads may give
a different value if there is a concurrent update, causing the reader
to access trees that are not initialized or in the middle of being
destroyed, leading to OVS crashes while the user updates the flow
table prefixes.

First thing that needs to be fixed here is to only read cls->n_tries
once to avoid obvious crashes with access to uninitialized trie_ctx[]
entries.

The second thing is that we need a proper memory synchronization that
will guarantee that our prefix trees are fully initialized when
readers access them.  In the current logic we would need to issue
a thread fence after every read of a subtable->trie_plen value, i.e.,
we'd need a fence per subtable lookup.  This would be very expensive
and wasteful, considering the prefix tree configuration normally
happens only once somewhere at startup.

What we can do instead is to convert cls->n_tries into atomic and use
it as a synchronization point:

  Writer (classifier_set_prefix_fields):

  1. Before making any changes, set cls->n_tries to zero.  Relaxed
     memory order can be used here, because we'll have a full memory
     barrier at the next step.
  2. ovsrcu_synchronize() to wait for all threads to stop using tries.
  3. Update tries while nobody is using them.
  4. Set cls->n_tries to a new value with memory_order_release.

  Reader (classifier_lookup):

  1. Read the cls->n_tries with the memory_order_acquire.
  2. Use that once read value throughout.

RCU in this scenario will ensure that every thread no longer uses the
prefix trees when we're about to change them.  The acquire-release
semantics on the cls->n_tries just saves us from calling the
ovsrcu_synchronize() the second time once we're done with the whole
reconfiguration.  We're just updating the number and making all the
previous changes visible on CPUs that acquire it.

Alternative solution might be to go full RCU and make the array of
trees itself RCU-protected.  This way we would not need to do any
extra RCU synchronization or managing the memory ordering.  However,
that would mean having multiple layers of RCU with trees and rules
in them potentially surviving multiple grace periods, which I would
like to avoid, if possible.

Previous code was also trying to be smart and not disable prefix tree
lookups for prefixes that are not changing.  We're sacrificing this
functionality in the name of simpler code.  Attempt to make that work
would either require a full conversion to RCU or a per-subtable
synchronization.  Lookups can be done without the prefix match
optimizations for a brief period of time.  This doesn't affect
correctness of the resulted datapath flows.

In the actual implementation instead of dropping cls->n_tries to zero
at step one, we keep the access to the first N tries that are not
going to change by setting the cls->n_tries to the index of the first
trie that will be updated.  So, we'll not be disabling all the prefix
match optimizations completely.

There was an attempt to solve this problem already in commit:
  a6117059904b ("classifier: Prevent tries vs n_tries race leading to NULL dereference.")
But it was focused on one particular crash and didn't take into account
a wider issue with the memory ordering on these trees in general.  The
changes made in that commit are mostly reverted as not needed anymore.

Fixes: f358a2cb2e54 ("lib/classifier: RCUify prefix trie code.")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2025-April/422765.html
Reported-by: Numan Siddique <numans@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-05-26 17:38:22 +02:00
Ilya Maximets
9234b9b40f tests: classifier: Fix the rule number check during trie verification.
Same rule can be in multiple prefix trees and so it is possible that
the total number of rules in all trees exceeds the total number of
rules in the classifier.  But the number of rules in a single prefix
tree still can't exceed the total number of rules in the classifier.
Move the check accordingly.

Note: checkpatch complains about usage of the assert(), but it is
everywhere in this file and so, not changing in just this one place.

Fixes: f358a2cb2e54 ("lib/classifier: RCUify prefix trie code.")
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-05-26 17:38:17 +02:00
Eelco Chaudron
f7711efc9d ovs-router: Fix potential resource leak in JSON output.
If we need to bail out of the JSON format loop, we do
not free the allocated resources. This fix moves the
allocation to after the check.

Fixes: d000ff1cd564 ("ovs-router: Add JSON output for 'ovs/route/show' command.")
Acked-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2025-05-26 08:55:32 +02:00
David Marchand
5603b869a8 netdev-linux: Fix offloads for IPv6 UDP packets.
Caught by code review, offloading checksum of IPv6 UDP packets was wrong
as the IPv6 header used for the pseudo header checksum was wrong.

Fixes: cb0cbffbe8fb ("netdev-linux: Favour inner packet for multi-encapsulated TSO.")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-05-21 23:52:54 +02:00
David Marchand
c852a8c767 netdev-native-tnl: Do not validate already checked checksum.
Bad packets were still being validated in software when decapsulating
a IP header. Trust decision taken wrt IP checksum offloading (checking
dp_packet_hwol_l3_csum_ipv4_ol()) and avoid revalidating a known
bad checksum.

While at it, add coverage counters so that checksum validation impact
can be monitored, and unit tests.

Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-05-21 23:45:51 +02:00
David Marchand
71f3dd3e9c conntrack: Fix embedded checksums in ICMP errors.
Helpers like packet_set_ipv4() resets IP csum flags.
Inspecting and natting embedded payload in an ICMP error is thus broken
if the "outer" IP header had some Rx checksum flags that made it
eligible to Tx IP checksum.
Reset temporarily any Tx checksum to force those helpers to resolve the
checksums.

Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-05-21 23:14:42 +02:00
David Marchand
4b00509ea1 conntrack: Do not validate already checked checksum.
Bad packets were still being validated in software when entering
conntrack.  Trust decision taken wrt IP checksum offloading (checking
dp_packet_hwol_l3_csum_ipv4_ol()) and avoid revalidating a known
bad checksum.

While at it, add coverage counters so that checksum validation impact
can be monitored, and unit tests.

Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-05-21 22:05:48 +02:00
David Marchand
8a7f1292d5 ipf: Consider checksum only for fragmented packets.
Currently, the fragment reassembly code marks any packets with bad
checksums as invalid and increments the ipf_l3csum_err counter.
This is confusing, because this happens even if these packets are not
fragmented.

Non-fragments should be ignored by the fragment reassembly engine and
be left for the main conntrack code.  It has its own logic for marking
packets with incorrect checksums and will increase the expected
conntrack_l3csum_err counter instead.

While at it, add coverage counters so that checksum validation impact
can be monitored.

Signed-off-by: David Marchand <david.marchand@redhat.com>
[i.maximets: re-worded the commit message]
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-05-21 20:58:29 +02:00
David Marchand
585c8088eb dpif-netdev: Enhance checksum coverage.
Enhance netdev-dummy:
- add debug log,
- split Rx and Tx aspects,
- add coverage for bad status,

Enhance unit tests:
- enable Tx offloads on the transmitting port,
- test L4 checksums for TCP and UDP (and partial status),
- test IPv6,

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-05-21 19:43:01 +02:00
David Marchand
d49994634e flow: Fix bad IP checksum flag.
flow_compose() can generate packets with bad IPv4 checksum, however the
associated Rx flags were not correctly set.
The usefulness of setting this metadata seems limited, yet fix this for
consistency.

Fixes: c62b4ac8f8da ("ovs-ofctl: Implement compose-packet --bare [--bad-csum].")
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2025-05-21 19:08:36 +02:00
Roi Dayan
261f02ba32 ovs-ctl: Allow to set custom core file size for ovs daemons.
Allow to set custom core file size with --ulimit-core argument.
This argument can be set in ovs config file
in rhel can set OPTIONS in /etc/sysconfig/openvswitch
in debian can set OVS_CTL_OPTS in /etc/default/openvswitch-switch.

Acked-by: Gaetan Rivet <gaetanr@nvidia.com>
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2025-05-21 17:56:57 +02:00