mir/ovs - ovs - Mike's Git repositories


mirror of https://github.com/openvswitch/ovs synced 2025-08-22 01:51:26 +00:00

Author	SHA1	Message	Date
Mike Pattrick	0add983b38	ovsdb: Use table indexes if available for ovsdb_query(). Currently all OVSDB database queries except for UUID lookups result in linear lookups over the entire table, even if an index is present. This patch modifies ovsdb_query() to attempt an index lookup first, if possible. If no matching indexes are present then a linear scan is still conducted. To test this, I set up an ovsdb database with a variable number of rows and timed the average of how long ovsdb-client took to query a single row. The first two tests involved a linear scan that didn't match any rows, so there was no overhead associated with sending or encoding output. The post-patch linear scan was a worst case scenario where the table did have an appropriate index but the conditions made its usage impossible. The indexed lookup test was for a matching row, which did also include overhead associated with a match. The results are included in the table below.
Rows                   | 100k | 200k | 300k | 400k | 500k
-----------------------+------+------+------+------+-----
Pre-patch linear scan  | 9ms  | 24ms | 37ms | 49ms | 61ms
Post-patch linear scan | 9ms  | 24ms | 38ms | 49ms | 61ms
Indexed lookup         | 3ms  | 3ms  | 3ms  | 3ms  | 3ms
I also tested the performance of ovsdb_query() by wrapping it in a loop and measuring the time it took to perform 1000 linear scans on 1, 10, 100k, and 200k rows. This test showed that the new index checking code did not slow down worst case lookups to a statistically detectable degree. Reported-at: https://issues.redhat.com/browse/FDP-590 Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-07-15 18:05:32 +02:00
Eelco Chaudron	5c4d60671c	dpif: Fix infinite netlink loop in dpif_execute_helper_cb. When a meter action is encountered and stored in the auxiliary structure, and subsequently, a non-meter action is processed within a nested list during callback execution, an infinite loop is triggered. This patch maintains the current behavior but stores all required meter actions in an ofpbuf for deferred execution. Reported-at: https://patchwork.ozlabs.org/project/openvswitch/patch/20250506022337.3242-1-danieldin186@gmail.com/ Fixes: 076caa2fb077 ("ofproto: Meter translation.") Acked-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2025-07-15 16:40:45 +02:00
Eelco Chaudron	50e1e57f81	utilities:gdb: Add GDB function to dump Netlink attributes. This commit adds the ovs_dump_nla GDB command, which allows dumping Netlink attributes. Here are some examples: (gdb) ovs_dump_nla 0x7f10e35d8858 172 ovs_check_pkt_len_attr (struct nlattr *) 0x7f10e35d8858:[OVS_CHECK_PKT_LEN_ATTR_PKT_... (struct nlattr *) 0x7f10e35d8860:[OVS_CHECK_PKT_LEN_ATTR_... (struct nlattr *) 0x7f10e35d88b0:[OVS_CHECK_PKT_LEN_ATTR_... (gdb) ovs_dump_nla 0x7f10e35d8858 172 (struct nlattr *) 0x7f10e35d8858: {nla_len = 6, nla_type ... (struct nlattr *) 0x7f10e35d8860: {nla_len = 80, nla_type ... ...len = 80, nla_type = 3}, nl_attr_get() = 0x7f10e35d88b4 (gdb) ovs_dump_nla 0x7f10e35d88b4 80 ovs_action_attr dump ... nla_type = 19}, nl_attr_get() = 0x7f10e35d88b8: 3f 01 00 00 ... Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2025-07-15 15:21:45 +02:00
Adrian Moreno	6d4044899e	docs: Specify retis dependency on USDT probes. Retis' "--ovs-track" option relies on USDT probes being available. Fixes: 22732c0e6770 ("tests: Add support for running system tests under retis.") Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-07-10 14:10:36 +02:00
Ilya Maximets	0d9dc8e9ca	dpif-netlink: Provide original upcall pid in 'execute' commands. When a packet enters kernel datapath and there is no flow to handle it, packet goes to userspace through a MISS upcall. With per-CPU upcall dispatch mechanism, we're using the current CPU id to select the Netlink PID on which to send this packet. This allows us to send packets from the same traffic flow through the same handler. The handler will process the packet, install required flow into the kernel and re-inject the original packet via OVS_PACKET_CMD_EXECUTE. While handling OVS_PACKET_CMD_EXECUTE, however, we may hit a recirculation action that will pass the (likely modified) packet through the flow lookup again. And if the flow is not found, the packet will be sent to userspace again through another MISS upcall. However, the handler thread in userspace is likely running on a different CPU core, and the OVS_PACKET_CMD_EXECUTE request is handled in the syscall context of that thread. So, when the time comes to send the packet through another upcall, the per-CPU dispatch will choose a different Netlink PID, and this packet will end up processed by a different handler thread on a different CPU. The process continues as long as there are new recirculations, each time the packet goes to a different handler thread before it is sent out of the OVS datapath to the destination port. In real setups the number of recirculations can go up to 4 or 5, sometimes more. There is always a chance to re-order packets while processing upcalls, because userspace will first install the flow and then re-inject the original packet. So, there is a race window when the flow is already installed and the second packet can match it inside the kernel and be forwarded to the destination before the first packet is re-injected. 
But the fact that packets are going through multiple upcalls handled by different userspace threads makes the reordering noticeably more likely, because we not only have a race between the kernel and a userspace handler (which is hard to avoid), but also between multiple userspace handlers. For example, let's assume that 10 packets got enqueued through a MISS upcall for handler-1, it will start processing them, will install the flow into the kernel and start re-injecting packets back, from where they will go through another MISS to handler-2. Handler-2 will install the flow into the kernel and start re-injecting the packets, while handler-1 continues to re-inject the last of the 10 packets, they will hit the flow installed by handler-2 and be forwarded without going to the handler-2, while handler-2 still re-injects the first of these 10 packets. Given multiple recirculations and misses, these 10 packets may end up completely mixed up on the output from the datapath. Let's provide the original upcall PID via the new netlink attribute OVS_PACKET_ATTR_UPCALL_PID. This way the upcall triggered during the execution will go to the same handler. Packets will be enqueued to the same socket and re-injected in the same order. This doesn't eliminate re-ordering as stated above, since we still have a race between the kernel and the handler thread, but it allows to eliminate races between multiple handlers. The openvswitch kernel module ignores unknown attributes for the OVS_PACKET_CMD_EXECUTE, so it's safe to provide it even on older kernels. Reported-at: https://issues.redhat.com/browse/FDP-1479 Link: https://lore.kernel.org/netdev/20250702155043.2331772-1-i.maximets@ovn.org/ Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-07-10 12:20:54 +02:00
Eelco Chaudron	0d5eece55c	mcast-snooping: Properly check MLD packet length. If an MLD packet is not large enough to contain the message-specific data, it may lead to a NULL pointer access. This patch fixes the issue by adding appropriate length checks. Fixes: 06994f879c9d ("mcast-snooping: Add Multicast Listener Discovery support") Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2025-07-07 16:01:56 +02:00
Ilya Maximets	22732c0e67	tests: Add support for running system tests under retis. Retis is very useful for debugging our system tests or debugging kernel issues through our system tests. This change adds a convenient way to run any kernel system test with the retis capture on the background. E.g.: make check-kernel OVS_TEST_WITH_RETIS=yes TESTSUITEFLAGS='167 -d' Retis 1.5 is required, since we're using ifdump profile, and it also will mount debugfs for us in case of running in a different namespace. It should be available in $PATH. In addition to just capturing the retis.data, we're also running the capture with --print to print all the events as they appear, and producing the sorted output in the end. This makes it easier to work across systems with different versions of retis and saves time for running the sort manually. The raw data is still available for advanced processing, if needed. Not specifying any particular collector, capturing everything that's enabled by default. OVS tracking is turned on by default. Since OVS tracking is used, it's required to start retis after the kernel datapath is created, otherwise it will fail to obtain the map of upcall PIDs. That's why we need to start it after the bridge is created. Only adding support for kernel-related test suites for now. For userspace test suites it may also be useful at some point, but currently that requires running without --ovs-track and isn't too important. Startup of the retis capture adds significant amount of time to each test, so not running it by default. Link: https://github.com/retis-org/retis Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-07-04 17:43:55 +02:00
Ilya Maximets	0491972828	seq: Fix deadlock with the time_init. There is an ABBA deadlock between time_init() and seq_wait(): Thread 1: poll_block() time_poll() time_init() pthread_once() <-- lock A do_time_init() seq_create() pthread_mutex_lock(seq_mutex) <-- lock B Thread 2: seq_wait(different seqno) pthread_mutex_lock(seq_mutex) <-- lock B poll_immediate_wake() poll_timer_wait() time_msec() time_init() pthread_once() <-- lock A This is likely the same deadlock Intel CI saw last year before the lab was shut down. The issue should not happen with normal applications as those would normally have the time module initialized early in the process before waiting on any sequence numbers, but it happens in the test-barrier application from time to time causing the test suite to hang. Fix that by making sure we're not calling poll_immediate_wake() under the seq_mutex. The time and seq modules are independent and it's hard to ensure the dependency without exporting some of their internals. Instead re-defining the prototype of the poll_immediate_wake_at(), adding the thread safety annotation, so we have some basic protection from this deadlock if the code ever changes. Compiler will warn on the prototype mismatch as well if it ever happens, so it's not a big problem. Having this prototype also gives us a spot in the code where we can place a comment explaining the locking order. Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/415436.html Reported-at: https://issues.redhat.com/browse/FDP-1493 Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-07-04 17:39:19 +02:00
David Marchand	1210864a63	netdev-dpdk: Remove unused macro for TSO offloads. This macro is a left over from previous implementation. Fixes: 3337e6d91c5b ("userspace: Enable L4 checksum offloading by default.") Acked-by: Mike Pattrick <mkp@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-07-01 12:57:20 +02:00
Ilya Maximets	3d2f64e5d6	ovsdb-idl: Add new functions to check the column type on the server. Currently, there is no convenient way to know what are the constraints for a particular column in the server's schema in C IDL. This is a problem, because clients may want to know how many elements are allowed in a certain column. For example, we recently increased the allowed number of prefixes configured in the Flow_Table table in OVS, but the client (ovn-controller) has no good way to know how many prefixes are actually supported in the schema of the currently running ovsdb-server. The IDL's code is generated from one schema version, while the actual server may be using newer or older one. If the client specifies too many prefixes, the transaction will fail, and there is also no good way to tell from the ovn-controller why exactly transaction failed. Currently used solution is to create another database connection just to intercept schema changes and parse the schema JSON manually inside the ovn-controller: `89e43f7528` While this approach works, it's not a clean solution. We have the server's schema on the CS level and we can provide the types to the application via IDL functions. This will allow ovn-controller to just use ovsrec_flow_table_prefixes_server_type(idl)->n_max instead of all the awkward schema parsing. Python IDL is more dynamic and has a different way of connecting where the user first obtains the schema and then initializes IDL with that schema. The parsed schema object with all the types is also available through the get_idl_schema() method. So, it is already possible to check the types there. Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-07-01 12:52:17 +02:00
Ilya Maximets	f5819e699e	json: Store short arrays in-place. Similarly to strings, 24 bytes that we have in 'struct json' can fit up to 3 JSON_ARRAY elements. And we can use separate storage types to count them. There are many small arrays in typical databases, for example, every UUID is a 2-element array. So, the change does have a noticeable performance impact. With 350MB OVN Northbound database with 12M atoms:
                     Before     After      Improvement
ovsdb-client dump    16.6 sec   14.9 sec   10.2 %
Compaction           13.4 sec   11.0 sec   17.9 %
Memory usage (RSS)   2.05 GB    1.90 GB     7.3 %
With 615MB OVN Southbound database with 23M atoms:
                     Before     After      Improvement
ovsdb-client dump    43.7 sec   40.5 sec    7.3 %
Compaction           32.5 sec   29.4 sec    9.5 %
Memory usage (RSS)   4.80 GB    4.46 GB     7.1 %
In the results above, 'ovsdb-client dump' is measuring how long it takes for the server to prepare and send a reply, 'Memory usage (RSS)' reflects the RSS of the ovsdb-server after loading the full database. ovn-heater tests report similar reduction in CPU and memory usage on heavy operations like compaction. Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-30 16:53:56 +02:00
Ilya Maximets	1de4a08c22	json: Use functions to access json arrays. Internal implementation of JSON array will be changed in the future commits. Add access functions that users can rely on instead of accessing the internals of 'struct json' directly and convert all the users. Structure fields are intentionally renamed to make sure that no code is using the old fields directly. json_array() function is removed, as not needed anymore. Added new functions: json_array_size(), json_array_at(), json_array_set() and json_array_pop(). These are enough to cover all the use cases within OVS. The change is fairly large, however, IMO, it's a much overdue cleanup that we need even without changing the underlying implementation. Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-30 16:53:56 +02:00
Ilya Maximets	9669b50f56	json: Store short strings in-place. The 'struct json' contains a union and the largest element of that union is 'struct json_array', which takes 24 bytes. It means that a lot of space in this structure remains unused whenever the type is not JSON_ARRAY. For example, the 'string' pointer used for JSON_STRING only takes 8 bytes on a 64-bit system leaving 24 - 8 = 16 bytes unused. There is also a 4-byte hole between the 'type' and the 'count'. A pretty common optimization technique for storing strings is to store short ones in place of the pointer and only allocate dynamically the larger strings that do not fit. In our case, we have even larger space of 24 bytes to work with. So, we could use all 24 bytes to store the strings (23 string bytes + '\0') and use the 4 byte unused space outside the union to store the storage type. This approach should allow us to save on memory allocation for short strings and also save on accesses to them, as the content will fit into the same cache line as the 'struct json' itself. In practice, large OVN databases tend to operate with quite large strings. For example, all the logical flow matches and actions in OVN Southbound database would not fit. However, this approach still improves performance with large OVN databases. With 350MB OVN Northbound database with 12M atoms:
                     Before     After      Improvement
ovsdb-client dump    18.6 sec   16.6 sec   10.7 %
Compaction           14.0 sec   13.4 sec    4.2 %
Memory usage (RSS)   2.28 GB    2.05 GB    10.0 %
With 615MB OVN Southbound database with 23M atoms:
                     Before     After      Improvement
ovsdb-client dump    46.1 sec   43.7 sec    5.2 %
Compaction           34.8 sec   32.5 sec    6.6 %
Memory usage (RSS)   5.29 GB    4.80 GB     9.3 %
In the results above, 'ovsdb-client dump' is measuring how long it takes for the server to prepare and send a reply, 'Memory usage (RSS)' reflects the RSS of the ovsdb-server after loading the full database. 
ovn-heater tests report similar reduction in CPU and memory usage on heavy operations like compaction. Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-30 16:53:56 +02:00
Ilya Maximets	6c48b29f52	json: Always use the json_string() method to access the strings. We'll be changing the way strings are stored, so the direct access will not be safe anymore. Change all the users to use the proper API as they should have been doing anyway. This also means splitting the handling of strings and serialized objects in most cases as they will be treated differently. The only code outside of json implementation for which direct access is preserved is substitute_uuids() in test-ovsdb.c. It's an unusual string manipulation that is only needed for the testing, so doesn't seem worthy adding a new API function. We could introduce something like json_string_replace() if this use case will appear somewhere else in the future. Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-30 16:53:56 +02:00
Dumitru Ceara	41a4a3723d	sparse/socket.h: Add AF_BRIDGE definition. OVN will be using AF_BRIDGE in the near future as part of the effort to add dynamic routing support for EVPN. Without these definitions compilation (with sparse enabled) fails in OVN. Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-25 12:45:04 +02:00
Ilya Maximets	83af8ee6f1	tests: ipsec: Adjust status checks for upcoming Libreswan 5.3. Future Libreswan will start also reporting a number of 'routed' connections: `8f754fe854` Need to adjust our parsing commands accordingly. Instead of just adding '.*' in the sed regex, making it more generic, so we can query 'routed' connections in the future without changing the macro and be more tolerant to future format changes. While at it, also changing some `` into $() to be more consistent with the rest of the file. Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-25 12:43:29 +02:00
Ilya Maximets	83de251fa5	ipsec: libreswan: Remove old certs before importing new ones. If started with --no-restart-ike-daemon, ovs-monitor-ipsec doesn't clear the NSS database. This is not a problem if the certificates do not change while the monitor is down, because completely duplicate entries cannot be added to the NSS database. However, if the monitor is stopped, then certificates change on disk and then the monitor is started back, it will add new tunnel certificates alongside the old ones and will fail to add the new CA certificate. So, we'll end up with multiple certificates for the same tunnel and the outdated CA certificate. This will not allow creating new connections as we'll not be able to verify certificates of the new CA: # certutil -L -d sql:/var/lib/ipsec/nss Certificate Nickname Trust Attributes SSL,S/MIME,JAR/XPI ovs_certkey_c04c352b u,u,u ovs_cert_cacert CT,, ovs_certkey_c04c352b u,u,u ovs_certkey_c04c352b u,u,u ovs_certkey_c04c352b u,u,u ovs_certkey_c04c352b u,u,u ovs_certkey_c04c352b u,u,u ovs_certkey_c04c352b u,u,u pluto: "ovn-c04c35-0-out-1" #459: processing decrypted IKE_AUTH request containing SK{IDi,CERT,CERTREQ,IDr,AUTH,SA, TSi,TSr,N(USE_TRANSPORT_MODE)} pluto: "ovn-c04c35-0-out-1" #459: NSS: ERROR: IPsec certificate CN=c04c352b,OU=kind,O=ovnkubernetes,C=US invalid: SEC_ERROR_UNKNOWN_ISSUER: Peer's Certificate issuer is not recognized. pluto: "ovn-c04c35-0-out-1" #459: NSS: end certificate invalid Fix that by always checking certificates in the NSS database before importing the new one. If they do not match, then remove the old one from the NSS and add the new one. We have to call deletion multiple times in order to remove all the potential duplicates from previous runs. This will be useful on upgrade, but also may save us if one of the deletions ever fail for any reason and we'll end up with a duplicate entry anyway. 
One alternative might be to always clear the database, even if the --no-restart-ike-daemon option is set, but there is a chance that we'll refresh and ask to re-read secrets before we got all the tunnel information from the database. That may affect dataplane. Even if this is really not possible, the logic seems too far apart to rely on. Also, Libreswan 4.6 seems to have some bug that prevents re-adding deleted connections if we removed and re-add the same certificate (newer versions don't have this issue), so it's better if we do not touch certificates that didn't actually change if we're not restarting the IKE daemon. The clearing may seem redundant now, but it may still be useful to clean up certificates for tunnels that disappeared while the monitor was down. Approach taken in this change doesn't cover this case. Test is added to check the described scenario. The 'on_exit' command is converted to obtain the monitor PID at exit, since we're now killing one monitor and starting another. Fixes: fe5ff26a49f6 ("ovs-monitor-ipsec: Add option to not restart IKE daemon.") Reported-at: https://issues.redhat.com/browse/FDP-1473 Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-25 12:42:08 +02:00
Ilya Maximets	80d723736b	cirrus: Update to FreeBSD 14.3 and 13.5. 13.5 was released a few months back and 14.3 just recently. Older point releases may become unavailable in gcloud in the near future. Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-23 22:08:50 +02:00
David Marchand	6090603703	netdev-dpdk: Remove limit on maximum descriptors count. Using larger rxq can be beneficial in highly bursty setups. Remove the artificial limit on the count of descriptors in rxq and txq. The device driver will limit the values in any case. Reported-at: https://issues.redhat.com/browse/FDP-1415 Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-20 21:17:02 +02:00
Ilya Maximets	edecb74043	python: idl: Don't notify the application on _Server database updates. _Server database is not managed by the user and needed mostly for IDL itself to see changes in the schema or cluster leadership. However, we're currently delivering notifications about changes in that database confusing the application (the application didn't subscribe to this database) and also we're increasing the change_seqno potentially returning true for has_ever_connected() call even if we didn't really get any real data yet or even connected to the right database. In the tests these notifications can be seen as two events at the beginning of every test with the notification enabled: 000: event:create, row={}, uuid=<0>, updates=None 000: event:create, row={}, uuid=<1>, updates=None Tests only print the 'simple' table, so the content is omitted, but the data is still there and the empty events are printed out. We should not notify the application nor touch the change_seqno. Tests updated accordingly. Unfortunately, removing first two lines from a test changes the numbers generated by the UUID filter, so the rest of the test needs adjustments as well. Fixes: c39751e44539 ("python: Monitor Database table to manage lifecycle of IDL client.") Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-20 21:17:02 +02:00
David Marchand	ab062d3cb4	netdev-dpdk: Adjust IPv4 checksum capability for vhost-user. If no L4 checksum can be requested, OVS may as well compute IPv4 checksum when needed. This allows a small optimization where the whole preparation step can be skipped on a batch when a (vhost-user) DPDK port has no offload capability. Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-19 21:03:20 +02:00
David Marchand	dd443c1a7a	netdev-dpdk: Stop relying on vhost-user Tx flags. vhost-user legacy behavior has been to mark mbuf with Tx offload flags based on what the virtio-net header contained (but provide no Rx information, like IP checksum or L4 checksum validity). Changing to the non legacy mode means that no code out of OVS should set any RTE_MBUF_F_TX_* flag. Add a check accordingly. Link: https://git.dpdk.org/dpdk/commit/?id=ca7036b4af3a Reported-at: https://issues.redhat.com/browse/FDP-1147 Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-19 21:03:15 +02:00
David Marchand	b8032fac2c	dp-packet: Remove direct access to DPDK offloads. Now that every use of ol_flags have been reworked, we can remove helper and additional field in dp_packet when not building with DPDK. Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-19 21:03:12 +02:00
David Marchand	cf7b86db1f	dp-packet: Rework TCP segmentation. Rather than marking packets with both an offload flag and a segmentation size, simply rely on the netdev implementation which sets a segmentation size when appropriate. Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-19 21:03:09 +02:00
David Marchand	e36793e11f	dp-packet: Resolve unknown checksums. Now that IP and L4 checksum offloading don't require tweaking Tx flags, update checksum status in parts of OVS that validate checksums (in case of unknown status). Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-19 21:03:03 +02:00
David Marchand	2956a61265	dp-packet: Rework L4 checksum offloads. The DPDK mbuf API specifies 4 statuses when it comes to L4 checksums:
- RTE_MBUF_F_RX_L4_CKSUM_UNKNOWN: no information about the RX L4 checksum
- RTE_MBUF_F_RX_L4_CKSUM_BAD: the L4 checksum in the packet is wrong
- RTE_MBUF_F_RX_L4_CKSUM_GOOD: the L4 checksum in the packet is valid
- RTE_MBUF_F_RX_L4_CKSUM_NONE: the L4 checksum is not correct in the packet data, but the integrity of the L4 data is verified.
Similarly to the IP checksum offloads API, revise OVS L4 offloads API. No information about the L4 protocol is provided by any netdev-* implementation, so OVS needs to mark this L4 protocol during flow extraction. Rename current API for consistency with dp_packet_(inner_)?l4_checksum_. Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-19 21:02:56 +02:00
David Marchand	3daf04a4c5	dp-packet: Rework IP checksum offloads. As the packet traverses through OVS, offloading Tx flags must be carefully evaluated and updated, which results in a bit of complexity because of a separate "outer" Tx offloading flag coming from DPDK API, and a "normal"/"inner" Tx offloading flag. On the other hand, the DPDK mbuf API specifies 4 statuses when it comes to IP checksums:
- RTE_MBUF_F_RX_IP_CKSUM_UNKNOWN: no information about the RX IP checksum
- RTE_MBUF_F_RX_IP_CKSUM_BAD: the IP checksum in the packet is wrong
- RTE_MBUF_F_RX_IP_CKSUM_GOOD: the IP checksum in the packet is valid
- RTE_MBUF_F_RX_IP_CKSUM_NONE: the IP checksum is not correct in the packet data, but the integrity of the IP header is verified.
This patch changes OVS API so that OVS code only tracks the status of the checksum of the "current" L3 header and leaves the Tx flags aspect to the netdev-* implementations. With this API, the flow extraction can be cleaned up. During packet processing, OVS can simply look for the IP checksum validity (either good, or partial) before changing some IP header, and then mark the checksum as partial. In the conntrack case, when natting packets, the checksum status of the inner part (ICMP error case) must be forced temporarily as unknown to force checksum resolution. When tunneling comes into play, IP checksums status is bit-shifted for future considerations in the processing if, for example, the tunnel header gets decapsulated again, or in the netdev-* implementations that support tunnel offloading. Finally, netdev-* implementations only need to care about packets in partial status: a good checksum does not need touching, a bad checksum has been updated but kept as bad by OVS, an unknown checksum belongs either to an IPv6 packet or, if it was an IPv4 packet, OVS updated it too (keeping it good or bad accordingly). Rename current API for consistency with dp_packet_(inner_)?ip_checksum_. 
Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-19 21:00:54 +02:00
David Marchand	67abd51540	dp-packet: Rework tunnel offloads. Rather than set bits in the mbuf ol_flags field, that only makes sense for netdev-dpdk ports, mark packet for tunnel offload in OVS offloads API. While at it, since there is nothing really "hardware" related, rename current API for consistency with dp_packet_tunnel_ prefix. Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-19 21:00:48 +02:00
David Marchand	e2200485c5	dp-packet: Expand offloads preparation helper. Expand this helper to clearly separate the non tunnel case from the tunnel one. This will make later changes easier to read. Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-19 21:00:45 +02:00
David Marchand	d29ba0abdc	dp-packet: Add OVS offloading API. As a preparation for tracking inner checksums, separate Rx checksum status from the DPDK ol_flags field. To minimize the cost of translating from DPDK API to OVS API, simply map OVS flags to DPDK Rx mbuf flags. Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-19 21:00:34 +02:00
David Marchand	19ef1b1f0f	dp-packet: Remove DPDK specific IP version. Flagging packets with IP version is only needed at the netdev-dpdk level. In most cases, OVS is already inspecting the IP header in packet data, so maintaining such IP version metadata won't save much cycles (given the cost of additional branches necessary for handling outer/inner flags). Cleanup OVS shared code and only set these flags in netdev-dpdk.c. Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-19 20:59:22 +02:00
David Marchand	52fdeda11a	dp-packet: Remove Linux specific L4 offloads. As the virtio-net offload API is used for netdev-linux ports, but provides no information about the potentially encapsulated protocol concerned by a checksum request, specific information from this netdev- specific implementation is propagated into OVS code, and must be carefully evaluated when some tunnel gets decapsulated. This induces a cost in "normal" processing path, while the netdev-linux path is not performance critical. This patch removes such specific information, yet try harder to parse the packet on the Rx side and set offload flags accordingly for non encapsulated traffic. For encapsulated traffic, the inner checksum is computed. Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-19 20:59:04 +02:00
Terry Wilson	a86ae3c865	python: Add uuid/convert references to uuid for Row.__str__. Row stringification happens a lot in client logs and it is far more useful to have the logged Row's uuid printed. This also adds converting referenced Row objects, and references within set and map columns to UUIDs. Signed-off-by: Terry Wilson <twilson@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-19 12:37:29 +02:00
Mark Michelson	8ee7ecb8a2	db-ctl-base: Allow retrieving rows of type OVSDB_TYPE_UUID. The ctl_get_row() function attempts to match a user-provided string to a particular database row. This works by comparing the user-provided string to the values of columns provided by the ctl utility (e.g. ovs-vsctl). Before this commit, this comparison could only be made for columns of type OVSDB_TYPE_INTEGER and OVSDB_TYPE_STRING. If a ctl utility provided a column of a different type, then db-ctl-base.c would assert in get_row_by_id(). This commit enhances the ability of ctl_get_row() to also retrieve rows based on columns of type OVSDB_TYPE_UUID. The user-provided string is converted to a UUID and compared against the column's value. If it matches, then the row matches. Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Mark Michelson <mmichels@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-19 12:36:29 +02:00
Aaron Conole	8a1a0ea7c0	AUTHORS: Add Changliang Wu. Signed-off-by: Aaron Conole <aconole@redhat.com>	2025-06-13 14:09:11 -04:00
Changliang Wu	aea4734299	lldp: Fix out of bound write in chassisid_to_string. snprintf will automatically write \0 at the end of the string, so the last byte will be written out of bounds. Create a new function, ds_put_hex_with_delimiter, to replace chassisid_to_string and format_hex_arg. Found in sanitize test. Signed-off-by: Changliang Wu <changliang.wu@smartx.com> Signed-off-by: Aaron Conole <aconole@redhat.com>	2025-06-13 14:06:55 -04:00
Mike Pattrick	614029aac0	conntrack: Allow inner NAT of related fragments. Currently conntrack will refuse to extract metadata from fragmented IPv4 packets. Usually the fragments would be processed by the ipf module, but this isn't the case for ICMP related packets. The current handling will result in these being incorrectly processed. This patch checks for a frag offset instead of just frag flags, which is similar to how conntrack handles fragments in the kernel. Reported-at: https://issues.redhat.com/browse/FDP-136 Reported-by: Ales Musil <amusil@redhat.com> Fixes: a489b16854b5 ("conntrack: New userspace connection tracker.") Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Aaron Conole <aconole@redhat.com>	2025-06-13 14:06:07 -04:00
Eelco Chaudron	ca9e67c801	daemon-unix: Handle potential negative values from sysconf(). Coverity reports that daemon_set_new_user() may receive a large unsigned value from get_sysconf_buffer_size(), due to sysconf() returning -1 and being cast to size_t. Although this would likely lead to an allocation failure and abort, it's better to handle the error in place. Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2025-06-12 15:28:31 +02:00
Eelco Chaudron	99af7f3791	ovsdb: Fix Coverity leak warning by marking code as unreachable. Coverity reports a memory leak on the 'error' variable in ovsdb_trigger_try(). However, this code path is unreachable due to an ovs_assert() in an earlier function call. To make this clear to Coverity and silence the warning, the section is explicitly marked as unreachable. Acked-by: Mike Pattrick <mkp@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2025-06-10 17:07:06 +02:00
Eelco Chaudron	2c634482f2	raft: Fix resource leak from ignored ovsdb_log_write_and_free() error. The Raft codebase includes calls to ovsdb_log_write_and_free() that are incorrectly wrapped in ignore(). This causes potential error resources to be leaked. These calls should be wrapped in ovsdb_error_destroy() instead, to ensure that any returned error objects are properly freed and do not result in memory leaks. Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.") Acked-by: Mike Pattrick <mkp@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2025-06-10 17:05:37 +02:00
Eelco Chaudron	b90304bfe7	ovsdb-server: Fix potential memory leak in parse_options(). When duplicate --config-file command-line arguments are passed, the resources for the previously specified file path were not freed. This fix ensures unused resources are properly freed while preserving the existing behavior of using the last configuration file path specified. Acked-by: Aaron Conole <aconole@redhat.com> Acked-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2025-06-10 17:04:49 +02:00
Eelco Chaudron	d1bd62dae5	ofproto-dpif-upcall: Check odp_tun_key_from_attr() return value. In the IPFIX and flow sample upcall handling, check the validity of the tunnel key returned by odp_tun_key_from_attr(). If the tunnel key is invalid, return an error. This was reported by Coverity, but the change also improves robustness and avoids undefined behavior in the case of malformed tunnel attributes. Fixes: 8b7ea2d48033 ("Extend OVS IPFIX exporter to export tunnel headers") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com>	2025-06-10 17:04:09 +02:00
Eelco Chaudron	88737f02ed	ofproto-dpif-xlate: Fix memory leak in xlate_generic_encap_action(). This is not a real issue, as the initializer function, rewrite_flow_push_nsh(), ensures it returns NULL on error. However, cleaning this up improves code clarity and resolves a Coverity warning about a potential memory leak. Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2025-06-10 17:03:39 +02:00
Eelco Chaudron	8fca3f99cf	lldp: Fix Coverity warning about resource leak in lldp test. Coverity reported a potential resource leak in the LLDP test code. While this condition should never occur in practice, since the test would crash on out-of-memory, the warning is addressed by ensuring the cleanup function is called on error paths. Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2025-06-10 17:02:58 +02:00
Ales Musil	d283829477	sparse: Define new AVX10 includes added in GCC >= 15. GCC >= 15 added new AVX10 header files. Add defines for them, as sparse is not able to understand the new types in them. This can be seen with DPDK headers. Tested-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ales Musil <amusil@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-03 18:48:45 +02:00
Ales Musil	0e419d1b4f	sparse: Add workaround for OpenSSL configuration. sparse fails to process the OpenSSL configuration header file in recent OpenSSL versions (3.2.x). Add a workaround header that will disable the problematic macro. Signed-off-by: Ales Musil <amusil@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-02 17:30:02 +02:00
Ilya Maximets	8224cd47f3	tests: tunnel-push-pop: Fix occasional failure of the drop test. Datapath port zero is normally taken by the 'datapath interface', i.e. the ovs-dummy interface. This makes it not possible to allocate port zero for the p0 interface. So, it will race with p1 for the number 1. If p0 happens to be created first, it will take the 1 and p1 will get the port 2 and then the test passes. However, if p1 is created first, then it will take the 1 and p0 will take the 2. In this case the test fails as the port name in the trace will be different. Use '--names' to avoid this problem, but also fix the port numbers and use the 'add_of_ports' macro instead of plain-coding the port addition. The macro would've made the issue more obvious in the first place. Fixes: 1015b13f054d ("ofproto-dpif-xlate: Add a drop action for native tunnel failure.") Acked-by: Eli Britstein <elibr@nvidia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-06-02 16:03:58 +02:00
David Marchand	e99ce7d5df	flow: Fix checksum offloads with simple match. Packets with L4 partial status for a simple match flow would not get L4 checksum offloads applied. This was not caught in unit tests, because packets from netdev-dummy (calling miniflow_extract) would get Tx flags set early, before parse_tcp_flags() got called during packet processing. Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-05-30 18:00:56 +02:00
Kevin Traynor	48ce3a5a52	dpdk: Use DPDK 24.11.2 release. Update the CI and docs to use DPDK 24.11.2. Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com>	2025-05-30 16:48:44 +01:00
Roi Dayan	b42f9fde4a	netdev-dpdk: Fix possible memory leak in vhost stats. On an error condition, the allocated structs need to be released. Reported by Coverity. Fixes: 3b29286db1c5 ("netdev-dpdk: Add per virtqueue statistics.") Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>	2025-05-30 14:22:23 +01:00
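
The daemon-unix entry above (ca9e67c801) describes sysconf() returning -1 and that value being cast to size_t. The sketch below illustrates the general pattern of checking the sign before converting. It is not the code from that commit: the helper names buffer_size_from_sysconf() and get_buffer_size() and the fallback parameter are hypothetical, chosen only for illustration.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not from OVS): converts a raw sysconf() result into
 * a usable buffer size.  sysconf() returns -1 on error or when the limit is
 * indeterminate; converting -1 to size_t would yield SIZE_MAX, so the sign
 * is checked while the value is still a signed long. */
static size_t
buffer_size_from_sysconf(long value, size_t fallback)
{
    return value > 0 ? (size_t) value : fallback;
}

/* Hypothetical wrapper: queries 'name' (for example _SC_GETPW_R_SIZE_MAX)
 * and falls back to 'fallback' bytes if no usable value is reported. */
static size_t
get_buffer_size(int name, size_t fallback)
{
    return buffer_size_from_sysconf(sysconf(name), fallback);
}
```

A caller could then allocate get_buffer_size(_SC_GETPW_R_SIZE_MAX, 1024) bytes without risking an enormous allocation request; the 1024-byte fallback is an arbitrary choice for this sketch.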


20531 Commits