mir/ovs - ovs - Mike's Git repositories

mir/ovs

mirror of https://github.com/openvswitch/ovs synced 2025-08-22 01:51:26 +00:00

Author	SHA1	Message	Date
Eli Britstein	270de5dfb8	ofproto: Move group-modify to mod_start instead of mod_finish. Upon modifying a group, the following steps occur: 1. ofproto_group_mod_start()->modify_group_start(): Find an old group object, create a new one. 2. ofproto_bump_tables_version() 3. ofproto_group_mod_finish(): Modify the new group object with buckets etc. At step #3, the new group object is already in use by revalidators, that may read incorrect data while being modified. Instead, move the group modification of the new object to step #1. Fixes: 0a8f6beb54ab ("ofproto-dpif: Fix dp_hash mapping after select group modification.") Acked-by: Gaetan Rivet <gaetanr@nvidia.com> Acked-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Eli Britstein <elibr@nvidia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-07-22 20:06:34 +02:00
Ilya Maximets	0d9dc8e9ca	dpif-netlink: Provide original upcall pid in 'execute' commands. When a packet enters kernel datapath and there is no flow to handle it, packet goes to userspace through a MISS upcall. With per-CPU upcall dispatch mechanism, we're using the current CPU id to select the Netlink PID on which to send this packet. This allows us to send packets from the same traffic flow through the same handler. The handler will process the packet, install required flow into the kernel and re-inject the original packet via OVS_PACKET_CMD_EXECUTE. While handling OVS_PACKET_CMD_EXECUTE, however, we may hit a recirculation action that will pass the (likely modified) packet through the flow lookup again. And if the flow is not found, the packet will be sent to userspace again through another MISS upcall. However, the handler thread in userspace is likely running on a different CPU core, and the OVS_PACKET_CMD_EXECUTE request is handled in the syscall context of that thread. So, when the time comes to send the packet through another upcall, the per-CPU dispatch will choose a different Netlink PID, and this packet will end up processed by a different handler thread on a different CPU. The process continues as long as there are new recirculations, each time the packet goes to a different handler thread before it is sent out of the OVS datapath to the destination port. In real setups the number of recirculations can go up to 4 or 5, sometimes more. There is always a chance to re-order packets while processing upcalls, because userspace will first install the flow and then re-inject the original packet. So, there is a race window when the flow is already installed and the second packet can match it inside the kernel and be forwarded to the destination before the first packet is re-injected. But the fact that packets are going through multiple upcalls handled by different userspace threads makes the reordering noticeably more likely, because we not only have a race between the kernel and a userspace handler (which is hard to avoid), but also between multiple userspace handlers. For example, let's assume that 10 packets got enqueued through a MISS upcall for handler-1, it will start processing them, will install the flow into the kernel and start re-injecting packets back, from where they will go through another MISS to handler-2. Handler-2 will install the flow into the kernel and start re-injecting the packets, while handler-1 continues to re-inject the last of the 10 packets, they will hit the flow installed by handler-2 and be forwarded without going to the handler-2, while handler-2 still re-injects the first of these 10 packets. Given multiple recirculations and misses, these 10 packets may end up completely mixed up on the output from the datapath. Let's provide the original upcall PID via the new netlink attribute OVS_PACKET_ATTR_UPCALL_PID. This way the upcall triggered during the execution will go to the same handler. Packets will be enqueued to the same socket and re-injected in the same order. This doesn't eliminate re-ordering as stated above, since we still have a race between the kernel and the handler thread, but it allows to eliminate races between multiple handlers. The openvswitch kernel module ignores unknown attributes for the OVS_PACKET_CMD_EXECUTE, so it's safe to provide it even on older kernels. Reported-at: https://issues.redhat.com/browse/FDP-1479 Link: https://lore.kernel.org/netdev/20250702155043.2331772-1-i.maximets@ovn.org/ Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-07-10 12:20:54 +02:00
Eelco Chaudron	d1bd62dae5	ofproto-dpif-upcall: Check odp_tun_key_from_attr() return value. In the IPFIX and flow sample upcall handling, check the validity of the tunnel key returned by odp_tun_key_from_attr(). If the tunnel key is invalid, return an error. This was reported by Coverity, but the change also improves robustness and avoids undefined behavior in the case of malformed tunnel attributes. Fixes: 8b7ea2d48033 ("Extend OVS IPFIX exporter to export tunnel headers") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com>	2025-06-10 17:04:09 +02:00
Eelco Chaudron	88737f02ed	ofproto-dpif-xlate: Fix memory leak in xlate_generic_encap_action(). This is not a real issue, as the initializer function, rewrite_flow_push_nsh(), ensures it returns NULL on error. However, cleaning this up improves code clarity and resolves a Coverity warning about a potential memory leak. Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2025-06-10 17:03:39 +02:00
Eli Britstein	1015b13f05	ofproto-dpif-xlate: Add a drop action for native tunnel failure. Upon tunnel output failure, due to routing failure for example, add an explicit drop action, so it will appear in the dp-flows for better visibility for that case. For those, add additional drop reasons. Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Eli Britstein <elibr@nvidia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-04-30 22:41:19 +02:00
Vladislav Odintsov	1724a293d0	ofproto: Log bond rebalancing stats once in rebalance run. Rebalancing stats are printed once in a run with INFO log level. Old detailed per-hash rebalancing stats now printed with DBG log level. Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2025-March/422028.html Suggested-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Vladislav Odintsov <vlodintsov@k2.cloud> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-04-02 13:46:49 +02:00
Eli Britstein	ce77927bfe	ofproto-dpif-xlate: Embed support check in put_drop_action. All the calls to put_drop_action checks first if it is supported. Instead, embed the check inside. Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Eli Britstein <elibr@nvidia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-04-02 12:01:54 +02:00
Eelco Chaudron	ba675897e4	ofproto-dpif: Fix spelling in comments and the support field macro. Acked-by: Simon Horman <horms@ovn.org> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Aaron Conole <aconole@redhat.com>	2025-03-28 10:42:39 -04:00
Ilya Maximets	0b686a29b7	ofproto-dpif: Fix dp_hash mapping after select group modification. When a new bucket is inserted to an existing group, OVS creates a new group with just that one bucket and then copies old buckets into it. The problem is that dp_hash mappings remain initialized for that one bucket and no traffic can be sent to any of the old buckets. If a bucket is removed, then OVS creates an empty new group and then copies old buckets into it, except for the removed one. Mappings are also not updated in this case and the group behaves as if it had no buckets at all. We need to re-hash all the buckets after the copy of the old buckets. ofproto-provider API already has a callback for this case, but it just wasn't implemented for ofproto-dpif. It wasn't necessary in the past, but it became necessary when the hash map construction was moved to the ofproto-dpif layer. No locking in this function is required, because the new group must not be visible during modification. It becomes visible only after the GROUP_MOD processing is finished. Fixes: 2e3fd24c7c44 ("ofproto-dpif: Improve dp_hash selection method for select groups") Reported-at: https://github.com/openvswitch/ovs-issues/issues/357 Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Aaron Conole <aconole@redhat.com>	2025-03-21 13:22:26 -04:00
Dima Chumak	1898112c8c	ofproto: Add JSON output for 'fdb/show' command. The 'fdb/show' command now supports machine-readable JSON output in addition to the plain-text output for humans. An example would be: ovs-appctl --format=json --pretty fdb/show br-phy [ { "static": true, "mac": "e4:8c:07:08:00:02", "port": 2, "vlan": 0 }, { "age": 14, "mac": "e4:8c:07:08:00:03", "port": 3, "vlan": 0 } ] Acked-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Dima Chumak <dchumak@nvidia.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2025-03-12 15:23:52 +01:00
Ilya Maximets	19b8941620	tunnels: Remove support for deprecated STT and LISP. STT and LISP tunnel types were deprecated and marked for removal in the following commits in the OVS 3.5 release: 3b37a6154a59 ("netdev-vport: Deprecate STT tunnel port type.") 8d7ac031c03d ("netdev-vport: Deprecate LISP tunnel port type.") Main reasons were that STT was rejected in upstream kernel and the LISP was never upstreamed as well and doesn't really have a supported implementation. Both protocols also appear to have lost their former relevance. Removing both now. While at it, also fixing some small documentation issues and comments. Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Alin Serdean <aserdean@ovn.org> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2025-02-28 17:19:41 +01:00
Eelco Chaudron	410e0f519f	ofproto-dpif-xlate: Fix source IP lookup for non-bridge ports. The commit "userspace: Enable non-bridge port as tunnel endpoint" did not account for the ovs_router_get_netdev_source_address() call. This patch ensures that the fix from "ofproto-dpif-xlate: Fix netdev native tunnel neighbor discovery" remains functional by properly calling the function with the output devices. Fixes: dc0bd12f5b04 ("userspace: Enable non-bridge port as tunnel endpoint.") Acked-by: Simon Horman <horms@ovn.org> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2025-02-26 13:17:55 +01:00
Eelco Chaudron	ce20ca0631	ofproto-dpif-xlate: Fix memory leak in xlate_generic_encap_action(). Ensure that xlate_generic_encap_action() does not return a buffer on error. Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2025-02-10 09:27:58 +01:00
Eelco Chaudron	f7c85a730b	ofproto: Fix potential NULL pointer dereference in ofproto_type_xx(). Ensure a valid class is found before calling it's function pointer. If it's not found, call ovs_assert(). Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2025-02-10 09:27:43 +01:00
Eelco Chaudron	9e784ed6c2	ofproto: Fix potential null-ptr dereference in meter_insert_rule(). Add an ovs_assert() in this case, as a valid meter should exists. Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2025-02-10 09:27:23 +01:00
Eelco Chaudron	5f6b8facad	ofproto-dpif-ipfix: Add NULL check to dpif_ipfix_set_options(). Coverity complains that we could potentially dereference a NULL pointer. To prevent this warning, add an ovs_assert(). Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2025-02-10 09:27:07 +01:00
Ales Musil	b272282abb	ofproto: Fix default pmd_id for ofproto/detrace. The system and netdev datapath have different default pmd_id, which resulted in empty output of ofproto/detrace with kernel datapath. Also indicate that the UFID or cache wasn't available. Make sure we use the correct default pmd_id when it's not specified as an argument. At the same time move slightly adjusted test into system tests so it is tested with both datapaths. Fixes: 600125b2c380 ("ofproto: Add ofproto/detrace command to map UFIDs to OpenFlow.") Signed-off-by: Ales Musil <amusil@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2024-12-12 10:57:39 +01:00
Ilya Maximets	464bc6f9c4	ofproto-dpif-upcall: Fix use of uninitialized missed dumps counter. The first time revalidator checks the value - it is not initialized, so we may end up marking valid flows for deletion. WARNING: MemorySanitizer: use-of-uninitialized-value 0 0x6ee9e9 in revalidator_sweep__ ofproto/ofproto-dpif-upcall.c:3003:25 1 0x6ed671 in revalidator_purge ofproto/ofproto-dpif-upcall.c:3056:5 2 0x6e997d in udpif_stop_threads ofproto/ofproto-dpif-upcall.c:566:17 3 0x6ecf05 in udpif_flush ofproto/ofproto-dpif-upcall.c:756:5 4 0x60323e in flush ofproto/ofproto-dpif.c:2020:9 5 0x56b10e in ofproto_flush__ ofproto/ofproto.c:1669:9 6 0x56a67b in ofproto_destroy ofproto/ofproto.c:1821:5 7 0x4c9012 in bridge_destroy vswitchd/bridge.c:3644:9 8 0x4c7c13 in bridge_exit vswitchd/bridge.c:556:9 9 0x5261a8 in main vswitchd/ovs-vswitchd.c:147:5 10 0x7fa0bb in __libc_start_call_main 11 0x7fa0bb in __libc_start_main@GLIBC_2.2.5 12 0x432b24 in _start (vswitchd/ovs-vswitchd+0x432b24) Fixes: 180ab2fd635e ("ofproto-dpif-upcall: Avoid stale ukeys leaks.") Acked-by: Mike Pattrick <mkp@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-11-29 18:32:52 +01:00
Ilya Maximets	2af7cef264	ofproto: Enable address prefix tracking for IPv6 by default. It is common to have flow tables with OpenFlow rules for both IPv4 and IPv6 traffic at the same time. One major example is OVN. Today only IPv4 addresses and L4 ports are prefix matched by default. Since recently, it's possible to turn on the optimization for all 4 fields at the same time (nw_dst, nw_src, ipv6_dst and ipv6_src), but it is an inconvenience for users. IPv6 configurations become more and more common in the real world, so having IPv6 tables not optimized by default is not good. Enable prefix tree lookups for IPv6 addresses by default in addition to IPv4. This change increases memory consumption slightly as well as takes a few extra cycles per flow to create and later iterate over additional prefix trees. However, IPv4 and IPv6 matches are mutually exclusive, so performance overhead should be minimal, especially in comparison with the benefits this configuration brings to IPv6 setups reducing the number of datapath flows by default. A new test added to check that defaults are working as expected. Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-11-27 19:20:04 +01:00
Mike Ovsiannikov	9fa73098c4	ofproto/ofproto: Initialize learn add rule flag. Initialize learn_adds_rule in order to prevent crash in ofproto_flow_mod_learn_finish() due to being invoked with uninitialized rules collections. Here is the stack trace of the failure / segmentation fault: orig_ofproto=orig_ofproto@entry=0x1d39220) at ofproto/ofproto-provider.h:509 xcache=<optimized out>, stats=0x7fff8e76eb10) at ofproto/ofproto-dpif.c:4966 ofproto/ofproto-dpif.c:5811 oh=<optimized out>) at ofproto/ofproto.c:3851 oh=<optimized out>, ofconn=0x20689a0) at ofproto/ofproto.c:8823 <handle_openflow>, ofconn=0x20689a0) at ofproto/connmgr.c:1329 handle_openflow=handle_openflow@entry=0x7f33b423e560 <handle_openflow>) at ofproto/connmgr.c:356 vswitchd/ovs-vswitchd.c:132 The learn.ofm structure appears to be valid at this point, except that learn_adds_rule and old_rules new_rules fields does not appear to be initialized: (gdb) p entry->learn.ofm[0] $3 = { temp_rule = 0x15fb520, criteria = { table_id = 1 '\001', cr = { node = { prev = 0x15bd570, next = { p = 0x15bd570 } }, priority = 31, cls_match = { p = 0x0 }, match = { { { flow = 0x21b77a0, mask = 0x21b77d0 }, flows = {0x21b77a0, 0x21b77d0} }, tun_md = 0x0 } }, version = 18446744073709551614, cookie = 0, cookie_mask = 0, out_port = 65535, out_group = 4294967295, include_hidden = false, include_readonly = false }, conjs = 0x0, n_conjs = 0, command = 2, modify_cookie = true, modify_may_add_flow = true, modify_keep_counts = true, event = 492535614, table_id = 1 '\001', version = 492535619, learn_adds_rule = 2, old_rules = { collection = { objs = 0x1, n = 492535617, capacity = 5, stub = {0x1d5b7f41, 0x30, 0x0, 0x0, 0x0} } }, new_rules = { collection = { objs = 0x0, n = 0, capacity = 0, stub = {0x0, 0x0, 0x0, 0x600000000, 0x0} } } } Signed-off-by: Mike Ovsiannikov <mike.ovsiannikov@nutanix.com> Signed-off-by: Aaron Conole <aconole@redhat.com>	2024-11-11 13:42:28 -05:00
Mike Pattrick	b3e08faf9a	bond: Always revalidate unbalanced bonds when active member changes. Currently a bond will not always revalidate when an active member changes. This can result in counter-intuitive behaviors like the fact that using ovs-appctl bond/set-active-member will cause the bond to revalidate but changing other_config:bond-primary will not trigger a revalidate in the bond. When revalidation is not set but the active member changes in an unbalanced bond, OVS may send traffic out of previously active member instead of the new active member. This change will always mark unbalanced bonds for revalidation if the active member changes. Reported-at: https://issues.redhat.com/browse/FDP-845 Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-10-30 13:09:34 +01:00
Mike Pattrick	49a249fb28	ofproto-dpif-upcall: Fix redundant mirror on metadata modification. Previously a commit attempted to reset the mirror context when packets were modified. However, this commit erroneously also reset the mirror context when only a packet's metadata was modified. An intermediate commit corrected this for tunnel metadata, but now that correction is extended to other forms of metadata as well. Fixes: feed7f677505 ("ofproto-dpif-upcall: Mirror packets that are modified.") Reported-at: https://issues.redhat.com/browse/FDP-699 Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Simon Horman <horms@ovn.org> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-10-30 12:58:40 +01:00
Ilya Maximets	cb64234786	ofproto-dpif: Improve load balancing in dp_hash select groups. ovn-kubernetes users observe uneven distribution of traffic among service backends. It gets noticeably worse when 9+ backends are configured. For example: Server connections server-6nbg7 634 server-72br8 326 <----- server-7b8rm 605 server-9tnz7 655 server-c84hw 596 server-mjskl 622 server-ptkcc 336 <----- server-rl7pk 592 server-xd2cb 634 Here we can see that out of 5000 connections total, servers '72br8' and 'ptkcc' received about 2 times less connections than any other server. With 10 backends, 4 servers will receive 2x less, etc. The situation is common, because kubernetes users frequently scale their applications with something like kubectl scale --replicas=10. Services are implemented as OVN load-balancers, which are implemented as OpenFlow select groups with a dp_hash selection method. Balancing is implemented by allocating a hash space (number of hashes is a power of 2) and distributing hashes between the group buckets using Webster's method taking into account bucket weights. This works well enough when buckets have different weights, because OVS needs to allocate larger hash space in order to accommodate the weight difference. However, in case of OVN load-balancers, all the buckets (backends) have the same weight. This leads to allocation of a smaller hash space just enough to fit all the backends (closest power of 2). For example, if we have 10 backends (buckets), then 16 hashes will be allocated. After hashes are distributed, we end up with 6 buckets having 2 hashes each and 4 buckets having 1 hash. So, each of the 6 buckets will receive on average 2x more traffic than each of the remaining 4. The problem can be mitigated by expanding the hash space and this is already done for cases with small number of buckets by limiting the hash space to have at least 16 hashes. However, it is just not large enough for 9 or 10 buckets to provide a good distribution. At the same time, blindly increasing the limit is also not a good solution as we do not want too large of a hash space for small number of buckets, simply because each hash is a separate datapath flow and having too many of them may cause performance issues. Approach taken in this change is to ensure that the hash space is at least 4 times larger than the number of buckets, but not larger than the maximum allowed (256). This provides a better distribution while not unnecessarily exploding number of datapath flows for services with not that many backends. Here is some data to demonstrate why the 4 was chosen as a coefficient: coeff. : 1 2 3 4 5 100 ------------------------------------------------------------------- AvgDiff : 43.1 % 27.1 % 18.3 % 15.1 % 13.6 % 10.9 % MaxDiff : 50.0 % 33.3 % 25.0 % 20.0 % 20.0 % 20.0 % AvgDev : 24.9 % 13.4 % 8.5 % 6.9 % 6.1 % 4.8 % MaxDev : 35.4 % 20.4 % 14.4 % 11.2 % 11.2 % 11.2 % --------+---------------------------------------------------------- 16 : 1 1 1 1 1 - 32 : 17 9 6 5 4 - 64 : 33 17 11 9 7 - 128 : 65 33 22 17 13 1 256 : 129 65 43 33 26 2 --------+---------------------------------------------------------- current proposed Table shows average and maximum load difference (Diff) between backends across groups with 1 to 64 equally weighted backends. And it shows average and maximum standard deviation (Dev) of load distribution for the same. For example, with a coefficient 2, the maximum difference between two backends will be 33% and the maximum standard deviation will be 20.4%. With the current logic (coefficient of 1) we have maximum difference as high as 50%, as shown with the example at the beginning, with the standard deviation of 35.4%. The bottom half of the table shows from how many backends we start to use a particular number of buckets. For example, with a coeff. 3 we will have 16 hashes for 1 to 5 buckets, 32 hashes for 6-10, 64 buckets for 11-21 and so on. According to the table, the number 4 is about where we achieve a good enough standard deviation for the load (11.2%) while still not creating too many hashes for cases with low number of backends. The standard deviation also doesn't go down that much with higher coefficient. The data is aggregated for up to 64 backends because with higher numbers we're getting close to the cap of 256 hashes, so deviation increases. However, the load difference with so many backends should have lower impact on a cluster in general. The change improves the cases with 64+ backends, but to get very good numbers we'll need to consider moving the upper limit for the hash space, which is a separate change to think about. Added test checks that standard deviation of the load distribution between buckets is below 12.5% for all cases up to 64 buckets. It's just a pretty number that I've chosen that should be IMO good enough. Note: when number of buckets is a power of 2, then we have an ideal distribution with zero deviation, because hashes can be evenly mapped to buckets. Note 2: The change doesn't affect distribution for 4 or less backends, since there are still minimum 16 hashes for those groups. Reported-at: https://issues.redhat.com/browse/FDP-826 Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-10-07 21:17:06 +02:00
Jonathan Davies	5cb19214e9	ofproto/bond: Preserve active bond member over restarts. The OVS DB has a bond_active_slave field. This gets read by port_configure_bond() into struct bond_settings' active_member_mac field. See commit 3e5aeeb5 ("bridge: Keep bond active slave selection across OVS restart") for the original rationale for preserving the active bond member. But since commit 30353934 ("ofproto/bond: Validate active-slave mac.") the bond_settings' active_member_mac field is ignored by bond_create(), which set bond->active_member_mac to eth_addr_zero. Instead, set it to the value of the bond_settings' active_member_mac so that the selection is preserved across OVS restarts. Also add a test that checks this behaviour. Fixes: 303539348848 ("ofproto/bond: Validate active-slave mac.") Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Jonathan Davies <jonathan.davies@nutanix.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2024-09-11 15:36:02 +02:00
Eelco Chaudron	252ee0f182	dpif: Fix flow put debug message match content. The odp_flow_format() function applies a wildcard mask when a mask for a given key was not present. However, in the context of installing flows in the datapath, the absence of a mask actually indicates that the key should be ignored, meaning it should not be masked at all. To address this inconsistency, odp_flow_format() now includes an option to skip formatting keys that are missing a mask. This was found during a debug session of the ‘datapath - ping between two ports on cvlan’ test case. The log message was showing the following: put[create] ufid:XX recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(3), skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0), eth(src=12:f6:8b:52:f9:75,dst=6e:48:c8:77:d3:8c),eth_type(0x88a8), vlan(vid=4094,pcp=0/0x0),encap(eth_type(0x8100), vlan(vid=100/0x0,pcp=0/0x0),encap(eth_type(0x0800), ipv4(src=10.2.2.2,dst=10.2.2.1,proto=1,tos=0,ttl=64,frag=no), icmp(type=0,code=0))), actions:2 Where it should have shown the below: put[create] ufid:XX recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(3), skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0), eth(src=12:f6:8b:52:f9:75,dst=6e:48:c8:77:d3:8c),eth_type(0x88a8), vlan(vid=4094,pcp=0/0x0),encap(eth_type(0x8100)), actions:2 Acked-by: Simon Horman <horms@ovn.org> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2024-09-09 10:22:16 +02:00
Eelco Chaudron	180ab2fd63	ofproto-dpif-upcall: Avoid stale ukeys leaks. It is observed in some environments that there are much more ukeys than actual DP flows. For example: $ ovs-appctl upcall/show system@ovs-system: flows : (current 7) (avg 6) (max 117) (limit 2125) offloaded flows : 525 dump duration : 1063ms ufid enabled : true 23: (keys 3612) 24: (keys 3625) 25: (keys 3485) The revalidator threads are busy revalidating the stale ukeys leading to high CPU and long dump duration. This patch tracks the number of consecutive missed dumps. If four dumps are missed in a row, it is assumed that the datapath flow no longer exists, and the ukey can be deleted. Reported-by: Roi Dayan <roid@nvidia.com> Co-authored-by: Han Zhou <hzhou@ovn.org> Co-authored-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Aaron Conole <aconole@redhat.com>	2024-08-30 07:56:58 -04:00
Mike Pattrick	70bc3baaad	ofproto-dpif-xlate: Add a recursion limit to tunnel address lookup. In deployments with multiple tunnels, it can be possible to enter an infinite loop where traffic generates an arp/nd lookup which is forwarded to a different tunnel, generating a new arp/nd packet that is send back to the first tunnel. Now the depth counter is incremented for new arp/nd packets, just as happens in xlate_table_action(). Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-08-09 22:19:51 +02:00
Ilya Maximets	f9078407a9	ofproto-dpif-xlate: Initialize observe_offset for sample actions. For some reason gcc 14.1.1 from Fedora 41 thinks that the variable may end up not initialized: ofproto/ofproto-dpif-xlate.c: In function 'compose_sample_action': ofproto/ofproto-dpif-xlate.c:3465:40: error: 'observe_offset' may be used uninitialized 3465 \| ctx->xout->last_observe_offset = observe_offset; \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~ ofproto/ofproto-dpif-xlate.c:3418:12: note: 'observe_offset' was declared here 3418 \| size_t observe_offset; \| ^~~~~~~~~~~~~~ We have an assertion in the code to ensure that at least one of the actions is present (userspace or psample), so the variable should actually be always initialized. Initialize explicitly just to silence the warning. Fixes: 516569d31fbf ("ofproto: xlate: Make sampled drops explicit.") Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-07-17 20:32:45 +02:00
Ilya Maximets	f973d9543e	ofproto-dpif-xlate: Remove misleading wc NULL check in packet mirror. 'wc' can't be NULL there and if it can we'd already crash a few lines before setting up vlan flags. The check is misleading as it makes people to assume that wc can be NULL. And it makes Coverity think the same: CID 1596572: (#1 of 1): Dereference after null check (FORWARD_NULL) 25. var_deref_op: Dereferencing null pointer ctx->wc. 14. var_compare_op: Comparing ctx->wc to null implies that ctx->wc might be null Remove the check. Fixes: 3b1882261c8b ("ofproto-dpif-mirror: Add support for pre-selection filter.") Acked-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-07-17 16:29:37 +02:00
Mike Pattrick	3b1882261c	ofproto-dpif-mirror: Add support for pre-selection filter. Currently a bridge mirror will collect all packets and tools like ovs-tcpdump can apply additional filters after they have already been duplicated by vswitchd. This can result in inefficient collection. This patch adds support to apply pre-selection to bridge mirrors, which can limit which packets are mirrored based on flow metadata. This significantly improves overall vswitchd performance during mirroring if only a subset of traffic is required. Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-07-16 00:28:23 +02:00
Mike Pattrick	04c090c61e	ofproto-dpif-mirror: Reduce number of function parameters. Previously the mirror_set() and mirror_get() functions took a large number of parameters, which was inefficient and difficult to read and extend. This patch moves most of the parameters into a struct. Acked-by: Simon Horman <horms@ovn.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-07-15 23:34:59 +02:00
Dumitru Ceara	600125b2c3	ofproto: Add ofproto/detrace command to map UFIDs to OpenFlow. It improves the debugging experience if we can easily get a list of OpenFlow rules and groups that contribute to the creation of a datapath flow. The suggested workflow is: a. dump datapath flows (along with UUIDs), this also prints the core IDs (PMD IDs) when applicable. $ ovs-appctl dpctl/dump-flows -m flow-dump from pmd on cpu core: 7 ufid:7460db8f..., recirc_id(0), .... b. dump related OpenFlow rules and groups: $ ovs-appctl ofproto/detrace ufid:7460db8f... pmd=7 cookie=0x12345678, table=0 priority=100,ip,in_port=2,nw_dst=10.0.0.2,actions=resubmit(,1) cookie=0x0, table=1 priority=200,actions=group:1 group_id=1,bucket=bucket_id:0,actions=ct(commit,table=2,nat(dst=20.0.0.2)) cookie=0x0, table=2 actions=output:1 The new command only shows rules and groups attached to ukeys that are in states UKEY_VISIBLE or UKEY_OPERATIONAL. That should be fine as all other ukeys should not be relevant for the use case presented above. This commit tries to mimic the output format of the ovs-ofctl dump-flows/dump-groups commands. Signed-off-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-07-15 19:12:52 +02:00
Adrian Moreno	1aa9e137fe	ofp-actions: Load data from fields in sample action. When sample action gets used as a way of sampling traffic with controller-generated metadata (i.e: obs_domain_id and obs_point_id), the controller will have to increase the number of flows to ensure each part of the pipeline contains the right metadata. As an example, if the controller decides to sample stateful traffic, it could store the computed metadata for each connection in the conntrack label. However, for established connections, a flow must be created for each different ct_label value with a sample action that contains a different hardcoded obs_domain and obs_point id. This patch adds a new version of the NXAST_RAW_SAMPLE* action (number 4) that supports specifying the observation point and domain using an OpenFlow field reference, so now the controller can express: sample(... obs_domain_id=NXM_NX_CT_LABEL[0..31], obs_point_id=NXM_NX_CT_LABEL[32..63] ... ) Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-07-15 11:31:09 +02:00
Adrian Moreno	c2e6836460	ofproto-dpif-xlate: Avoid allocating mf_subfield. "enum mf_subfield" (a 128byte object) is dynamically allocated a few times just to set it to an all-ones mask. Avoid dynamically allocating them by creating a static all-ones mask similar to what was done with "exact_match_mask". Acked-by: Eelco Chaudron <echaudro@redhat.com> Suggested-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-07-15 11:31:09 +02:00
Adrian Moreno	516569d31f	ofproto: xlate: Make sampled drops explicit. When the flow translation results in a datapath action list whose last action is an "observational" action, i.e: one generated for IPFIX, sFlow or local sampling applications, the packet is actually going to be dropped (and observed). In that case, add an explicit drop action so that drop statistics remain accurate. This behavior is controlled by a configurable boolean knob called "explicit_sampled_drops" Combine the "optimizations" and other odp_actions "tweaks" into a single function. Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-07-15 11:31:09 +02:00
Adrian Moreno	c10dbcec75	ofproto-dpif-xlate: Use psample for local sample. Use the newly added psample action to implement OpenFlow sample() actions with local sampling configuration if possible. A bit of refactoring in compose_sample_actions arguments helps make it a bit more readable. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-07-15 11:31:09 +02:00
Adrian Moreno	5b99ebc268	ofproto: Add ofproto-dpif-lsample. Add a new resource in ofproto-dpif and the corresponding API in ofproto_provider.h to represent and local sampling configuration. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-07-15 11:15:30 +02:00
Adrian Moreno	d0afbf0944	ofproto_dpif: Check for psample support. Only kernel datapath supports this action so add a function in dpif.c that checks for that. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-07-15 11:15:30 +02:00
Adrian Moreno	1a3bd96b4f	odp-util: Add support OVS_ACTION_ATTR_PSAMPLE. Add support for parsing and formatting the new action. Also, flag OVS_ACTION_ATTR_SAMPLE as requiring datapath assistance if it contains a nested OVS_ACTION_ATTR_PSAMPLE. The reason is that the sampling rate from the parent "sample" is made available to the nested "psample" by the kernel. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-07-14 17:19:52 +02:00
Adrian Moreno	d9de6b01c2	ofproto-dpif: Allow forcing dp features. Datapath features can be set with dpif/set-dp-features unixctl command. This command is not documented and therefore not supported in production but only useful for unit tests. A limitation was put in place originally to avoid enabling features at runtime that were disabled at boot time to avoid breaking the datapath in unexpected ways. But, considering users should not use this command and it should only be used for testing, we can assume whoever runs it knows what they are doing. Therefore, the limitation should be bypass-able. This patch adds a "--force" flag to the unixctl command to allow bypassing the mentioned limitation. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-07-14 17:19:52 +02:00
Mike Pattrick	d7e77143fb	tunnel: Allow UDP zero checksum with IPv6 tunnels. This patch adopts the proposed RFC 6935 by allowing null UDP checksums even if the tunnel protocol is IPv6. This is already supported by Linux through the udp6zerocsumtx tunnel option. It is disabled by default and IPv6 tunnels are flagged as requiring a checksum, but this patch enables the user to set csum=false on IPv6 tunnels. Acked-by: Simon Horman <horms@ovn.org> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-07-12 11:54:45 +02:00
Jakob Meng	4935e89325	ofproto: Add JSON output for 'dpif/show' command. The 'dpif/show' command now supports machine-readable JSON output in addition to the plain-text output for humans. An example would be: ovs-appctl --format json dpif/show Reported-at: https://bugzilla.redhat.com/1824861 Signed-off-by: Jakob Meng <code@jakobmeng.de> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-07-09 13:49:44 +02:00
Eelco Chaudron	361d7bce0f	ofproto-dpif: Define age as time_t in ofproto_unixctl_fdb_add(). Fix the warning from Coverity about potential truncation of the time_t value when copying to a local variable by changing the local variable's type to time_t. Fixes: ccc24fc88d59 ("ofproto-dpif: APIs and CLI option to add/delete static fdb entry.") Acked-by: Mike Pattrick <mkp@redhat.com> Acked-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2024-06-03 13:23:41 +02:00
Eelco Chaudron	f673d0cd5f	sflow: Fix check for disabled receive time. Changed sFlowRcvrTimeout to a uint32_t to avoid time_t warnings reported by Coverity. A uint32_t is more than large enough as this is a (seconds) tick counter and OVS is not even using this. Fixes: c72e245a0e2c ("Add InMon's sFlow Agent library to the build system.") Acked-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2024-06-03 13:23:41 +02:00
Eelco Chaudron	0c8e626401	utilities: Correct deletion reason in flow_reval_monitor.py. The flow_reval_monitor.py script incorrectly reported the reasons for FDR_PURGE and FDR_TOO_EXPENSIVE, as their descriptions were swapped. This patch rectifies the order using a dictionary to avoid similar problems in the future. In addition this patch also syncs the delete reason output of the script, with the comments in the code. Fixes: 86b9e653ef22 ("revalidator: Add a USDT probe during flow deletion with purge reason.") Acked-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>	2024-05-21 13:31:02 +02:00
Ilya Maximets	b91f6788c4	ofproto-dpif-trace: Fix access to an out-of-scope stack memory. While tracing NAT actions, pointer to the action may be stored in the recirculation node for future reference. However, while translating actions for the group bucket in xlate_group_bucket, the action list is allocated temporarily on stack. So, in case the group translation leads to NAT, the stack pointer can be stored in the recirculation node and accessed later by the tracing mechanism when this stack memory is long gone: ==396230==ERROR: AddressSanitizer: stack-use-after-return on address 0x191844 at pc 0x64222a bp 0xa5da10 sp 0xa5da08 READ of size 1 at 0x191844 thread T0 0 0x642229 in ofproto_trace_recirc_node ofproto/ofproto-dpif-trace.c:704:49 1 0x642229 in ofproto_trace ofproto/ofproto-dpif-trace.c:867:9 2 0x6434c1 in ofproto_unixctl_trace ofproto/ofproto-dpif-trace.c:489:9 3 0xc1e491 in process_command lib/unixctl.c:310:13 4 0xc1e491 in run_connection lib/unixctl.c:344:17 5 0xc1e491 in unixctl_server_run lib/unixctl.c:395:21 6 0x53eedf in main ovs/vswitchd/ovs-vswitchd.c:131:9 7 0x2be087 in __libc_start_call_main 8 0x2be14a in __libc_start_main@GLIBC_2.2.5 9 0x42dee4 in _start (vswitchd/ovs-vswitchd+0x42dee4) Address 0x191844 is located in stack of thread T0 at offset 68 in frame 0 0x6d391f in xlate_group_bucket ofproto/ofproto-dpif-xlate.c:4751 This frame has 3 object(s): [32, 1056) 'action_list_stub' (line 4760) <== Memory access at offset 68 is inside this variable [1184, 1248) 'action_list' (line 4761) [1280, 1344) 'action_set' (line 4762) SUMMARY: AddressSanitizer: stack-use-after-return ofproto/ofproto-dpif-trace.c:704:49 in ofproto_trace_recirc_node Fix that by copying the action. Fixes: d072d2de011b ("ofproto-dpif-trace: Improve NAT tracing.") Reported-by: Ales Musil <amusil@redhat.com> Reviewed-by: Adrian Moreno <amorenoz@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-05-03 14:16:20 +02:00
Lin Huang	120140f891	ofproto: Fix Coverity false positive. Coverity reports a false positive below: Ofproto_class_find__() may return NULL, and dereference it to cause segfault. This patch is made just to avoid false-positive Coverity report. Tested-by: Zhang YuHuang <zhangyuhuang@ruijie.com.cn> Signed-off-by: Lin Huang <linhuang@ruijie.com.cn> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-04-09 19:36:09 +02:00
Ilya Maximets	ed379a810a	ofproto-dpif-upcall: Fix ukey installation failure logs and counters. ukey_install() returns boolean signaling if the ukey was installed or not. Installation may fail for a few reasons: 1. Conflicting ukey. 2. Mutex contention while trying to replace existing ukey. 3. The same ukey already exists and active. Only the first case here signals an actual problem. Third one is a little odd for userspace datapath, but harmless. Second is the most common one that can easily happen during normal operation since other threads like revalidators may be currently working on this ukey preventing an immediate access. Since only the first case is actually worth logging and it already has its own log message, removing the 'upcall installation fails' warning from the upcall_cb(). This should fix most of the random failures of userspace system tests in CI. While at it, also fixing coverage counters. Mutex contention was mistakenly counted as a duplicate upcall. ukey contention for revalidators was counted only in one of two places. New counter added for the ukey contention on replace. We should not re-use existing upcall_ukey_contention counter for this, since it may lead to double counting. Fixes: 67f08985d769 ("upcall: Replace ukeys for deleted flows.") Fixes: 9cec8274ed9a ("ofproto-dpif-upcall: Add VLOG_WARN_RL logs for upcall_cb() error.") Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-04-06 00:09:53 +02:00
Eric Garver	3c8d069b9b	dpif: Probe support for OVS_ACTION_ATTR_DROP. Kernel support has been added for this action. As such, we need to probe the datapath for support. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Eric Garver <eric@garver.life> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-04-05 23:10:17 +02:00
Eric Garver	54d94f8f4d	dpif: Support atomic_bool field type. The next commit will convert a dp feature from bool to atomic_bool. As such we have to add support to the macros and functions. We must pass by reference instead of pass by value because all the atomic operations require a reference. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Eric Garver <eric@garver.life> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-04-05 23:10:17 +02:00

1 2 3 4 5 ...

2883 Commits