mir/ovs - ovs - Mike's Git repositories

mir/ovs

mirror of https://github.com/openvswitch/ovs synced 2025-08-30 05:47:55 +00:00

Author	SHA1	Message	Date
Ilya Maximets	07c27226ee	ovsdb: Monitor: Keep and maintain the initial change set. Change sets in OVSDB monitor are storing all the changes that happened between a particular transaction ID and now. Initial change set basically contains all the data. On each monitor request a new initial change set is created by creating an empty change set and adding all the database rows. Then it is converted into JSON reply and immediately untracked and destroyed. This is causing significant performance issues if many clients are requesting new monitors at the same time. For example, that is happening after database schema conversion, because conversion triggers cancellation of all monitors. After cancellation, every client sends a new monitor request. The server then creates a new initial change set, sends a reply, destroys initial change set and repeats that for each client. On a system with 200 MB database and 500 clients, cluster of 3 servers spends 20 minutes replying to all the clients (200 MB x 500 = 100 GB): timeval\|WARN\|Unreasonably long 1201525ms poll interval Of course, all the clients are already disconnected due to inactivity at this point. When they are re-connecting back, server accepts new connections one at a time, so inactivity probes will not be triggered anymore, but it still takes another 20 minutes to handle all the incoming connections. Let's keep the initial change set around for as long as the monitor itself exists. This will allow us to not construct a new change set on each new monitor request and even utilize the JSON cache in some cases. All that at a relatively small maintenance cost, since we'll need to commit changes to one extra change set on every transaction. Measured memory usage increase due to keeping around a shallow copy of a database is about 10%. Measured CPU usage difference during normal operation is negligible. With this change it takes only 30 seconds to send out all the monitor replies in the example above. So, it's a 40x performance improvement. On a more reasonable setup with 250 nodes, the process takes up to 8-10 seconds instead of 4-5 minutes. Conditional monitoring will benefit from this change as well, however results might be less impressive due to lack of JSON cache. Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-04-24 22:42:25 +02:00
Ilya Maximets	172c935ed9	ovsdb: Avoid converting database twice on an initiator. Cluster member, that initiates the schema conversion, converts the database twice. First time while verifying the possibility of the conversion, and the second time after reading conversion request back from the storage. Keep the converted database from the first time around and use it after reading the request back from the storage. This cuts in half the conversion CPU cost. Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-04-24 22:42:25 +02:00
Ilya Maximets	08449bb470	ovsdb: Perform conversion with no data for clustered databases. Currently, database schema conversion in case of clustered database produces a transaction record with both new schema and converted database data. So, the sequence of events is following: 1. Get the new schema. 2. Convert the database to a new schema. 3. Translate the newly converted database into JSON. 4. Write the schema + data JSON to the storage. 5. Destroy converted version of a database. 6. Read schema + data JSON from the storage and parse. 7. Create a new database from a parsed database data. 8. Replace current database with the new one. Most of these steps are very computationally expensive. Also, conversion to/from JSON is much more expensive than direct database conversion with ovsdb_convert() that can make use of shallow data copies. Instead of doing all that, let's make use of previously introduced ability to not write the converted data into the storage. The process will look like this then: 1. Get the new schema. 2. Convert the database to a new schema (to verify that it is possible). 3. Write the schema to the storage. 4. Destroy converted version of a database. 5. Read the new schema from the storage and parse. 6. Convert the database to a new schema. 7. Replace current database with the new one. Most of the operations here are performed on the small schema object, instead of the actual database data. Two remaining data operations (actual conversion) are noticeably faster than conversion to/from JSON due to reference counting and shallow data copies. Steps 4-6 can be optimized later to not convert twice on the process that initiates the conversion. The change results in following performance improvements in conversion of OVN_Southbound database schema from version 20.23.0 to 20.27.0 (measured on a single-server RAFT cluster with no clients): \| Before \| After +---------+-------------------+---------+------------------ DB size \| Total \| Max poll interval \| Total \| Max poll interval --------+---------+-------------------+---------+------------------ 542 MB \| 47 sec. \| 26 sec. \| 15 sec. \| 10 sec. 225 MB \| 19 sec. \| 10 sec. \| 6 sec. \| 4.5 sec. 542 MB database had 19.5 M atoms, 225 MB database had 7.5 M atoms. Overall performance improvement is about 3x. Also, note that before this change database conversion basically doubles the database file on disk. Now it only writes a small schema JSON. Since the change requires backward-incompatible database file format changes, documentation is updated on how to perform an upgrade. Handled the same way as we did for the previous incompatible format change in 2.15 (column diffs). Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-December/052140.html Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-04-24 22:42:25 +02:00
Ilya Maximets	4d6cdd8e0d	ovsdb: Allow conversion records with no data in a clustered storage. If the schema with no data was read from the clustered storage, it should mean a database conversion request. In general, we can get: 1. Just data --> Transaction record. 2. Schema + Data --> Database conversion or raft snapshot install. 3. Just schema --> New. Database conversion request. We cannot distinguish between conversion and snapshot installation request in the current implementation, so we will keep handling conversion with data in the same way as before, i.e. if data is provided, we should use it. ovsdb-tool is updated to handle this record type as well while converting cluster to standalone. This change doesn't introduce a way for such records to appear in the database. That will be added in the future commits targeting conversion speed increase. Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-04-24 22:40:18 +02:00
Ilya Maximets	a73b0206ba	ovsdb: Check for ephemeral columns before writing a new schema. Clustered databases do not support ephemeral columns, but ovsdb-server checks for them after the conversion result is read from the storage. It's much easier to recover if this constraint is checked before writing to the storage instead. It's not a big problem, because the check is always performed by the native ovsdb clients before sending a conversion request. But the server, in general, should not trust clients to do the right thing. Check in the update_schema() remains, because we shouldn't blindly trust the storage. Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-04-24 22:34:56 +02:00
Ilya Maximets	5575539f6c	ovsdb-tool: Fix cluster-to-standalone for DB conversion records. If database conversion happens, both schema and the new data are present in the database record. However, the schema is just silently ignored by ovsdb-tool cluster-to-standalone. This creates data inconsistency if the new data contains new columns, for example, so the resulting database file will not be readable, or data will be lost. Fix that by re-setting the database whenever a conversion record is found and actually writing a new schema that will match the actual data. The database file will not be that similar to the original, but there is no way to represent conversion in a standalone database file format otherwise. Fixes: 00de46f9ee42 ("ovsdb-tool: Convert clustered db to standalone db.") Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-04-24 22:34:49 +02:00
David Marchand	d70688a729	system-offloads-traffic: Fix tc ingress pps check for meter offload. Caught during some code review. SUPPORT_TC_INGRESS_PPS has been replaced with CHECK_TC_INGRESS_PPS(). Fixes: 5f0fdf5e2c2e ("test: Move check for tc ingress pps support to test script.") Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Simon Horman <simon.horman@corigine.com>	2023-04-19 09:14:54 +02:00
Paolo Valerio	9fa612959c	ovs-dpctl: Add new command dpctl/ct-[sg]et-sweep-interval. Since 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists with rculists.") the sweep interval changed as well as the constraints related to the sweeper. Being able to change the default reschedule time may be convenient in some conditions, like debugging. This patch introduces new commands allowing to get and set the sweep interval in ms. Signed-off-by: Paolo Valerio <pvalerio@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-04-06 22:59:25 +02:00
Ilya Maximets	75eae65602	github: Test building Fedora RPMs. Testing that RPMs can be built to catch possible spec file issues like missing dependencies. GitHub seems to have an agreement with Docker Hub about rate limiting of image downloads, so it should not affect us. We may switch to quay.io if that will ever become a problem in the future. Reviewed-by: David Marchand <david.marchand@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-04-06 22:49:41 +02:00
Ilya Maximets	7864b380d8	AUTHORS: Add Songtao Zhan. Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-04-06 22:45:27 +02:00
Songtao Zhan	8cba7a76d5	ovs-tcpdump: Stdout is shutdown before ovs-tcpdump exit. If there is a pipe behind ovs-tcpdump (such as ovs-tcpdump -i eth0 \| grep "192.168.1.1"), the child process (grep "192.168.1.1") may exit first and close the pipe when received SIGTERM. When farther process (ovs-tcpdump) exit, stdout is flushed into broken pipe, and then received a exception IOError. To avoid such problems, ovs-tcpdump first close stdout before exit. Signed-off-by: Songtao Zhan <zhanst1@chinatelecom.cn> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-04-06 22:43:23 +02:00
Aaron Conole	9d840923d3	ofproto-dpif-xlate: Always mask ip proto field. The ofproto layer currently treats nw_proto field as overloaded to mean both that a proper nw layer exists, as well as the value contained in the header for the nw proto. However, this is incorrect behavior as relevant standards permit that any value, including '0' should be treated as a valid value. Because of this overload, when the ofproto layer builds action list for a packet with nw_proto of 0, it won't build the complete action list that we expect to be built for the packet. That will cause a bad behavior where all packets passing the datapath will fall into an incomplete action set. The fix here is to unwildcard nw_proto, allowing us to preserve setting actions for protocols which we know have support for the actions we program. This means that a traffic which contains nw_proto == 0 cannot cause connectivity breakage with other traffic on the link. Reported-by: David Marchand <dmarchand@redhat.com> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2134873 Acked-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-04-06 13:17:15 +02:00
Lin Huang	e41bdb1761	conntrack-tp: Fix clang warning. Declaration of 'struct conn' will not be visible outside of this function. Declaration of 'struct conntrack' will not be visible outside of this function. Declaration of 'struct timeout_policy' will not be visible outside of this function. Signed-off-by: Lin Huang <linhuang@ruijie.com.cn> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-04-04 17:12:34 +02:00
Ilya Maximets	b535476680	AUTHORS: Add Faicker Mo. Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-04-03 20:41:23 +02:00
Faicker Mo	f9507c1ea4	netdev-offload-tc: Del ufid mapping if device not exist. The device may be deleted and added with ifindex changed. The tc rules on the device will be deleted if the device is deleted. The func tc_del_filter will fail when flow del. The mapping of ufid to tc will not be deleted. The traffic will trigger the same flow(with same ufid) to put to tc on the new device. Duplicated ufid mapping will be added. If the hashmap is expanded, the old mapping entry will be the first entry, and now the dp flow can't be deleted. Signed-off-by: Faicker Mo <faicker.mo@ucloud.cn> Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-04-03 20:39:33 +02:00
Daniel Alvarez Sanchez	daeab9548a	db-ctl-base: Partially revert b8bf410a5. The commit b8bf410a5 [0] broke the `ovs-vsctl add` command which now overwrites the value if it existed already. This patch reverts the code around the `cmd_add` function to restore the previous behavior. It also adds testing coverage for this functionality. [0] `b8bf410a5c` Fixes: b8bf410a5c94 ("db-ctl-base: Use partial map/set updates for last add/set commands.") Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2182767 Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Daniel Alvarez Sanchez <dalvarez@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-30 22:10:40 +02:00
Nobuhiro MIKI	0f34ecbd5a	vswitch.xml: Add description of SRv6 tunnel and related options. The description of SRv6 was missing in vswitch.xml, which is used to generate the man page, so this patch adds it. Fixes: 03fc1ad78521 ("userspace: Add SRv6 tunnel support.") Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-30 22:10:40 +02:00
Mike Pattrick	306583b568	netdev-tc-offloads: Fix misaligned 8 byte read. UB Sanitizer report: lib/netdev-offload-tc.c:1276:19: runtime error: load of misaligned address 0x7f74e801976c for type 'union ovs_u128', which requires 8 byte alignment 0 in netdev_tc_flow_dump_next lib/netdev-offload-tc.c:1276 1 in netdev_flow_dump_next lib/netdev-offload.c:303 2 in dpif_netlink_flow_dump_next lib/dpif-netlink.c:1921 [...] Fixes: 8f7620e6a406 ("netdev-tc-offloads: Implement netdev flow dump api using tc interface") Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-29 23:20:06 +02:00
Nobuhiro MIKI	7381fd440a	odp: Add SRv6 tunnel actions. This patch adds ODP actions for SRv6 and its tests. Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-29 22:16:04 +02:00
Nobuhiro MIKI	03fc1ad785	userspace: Add SRv6 tunnel support. SRv6 (Segment Routing IPv6) tunnel vport is responsible for encapsulation and decapsulation the inner packets with IPv6 header and an extended header called SRH (Segment Routing Header). See spec in: https://datatracker.ietf.org/doc/html/rfc8754 This patch implements SRv6 tunneling in userspace datapath. It uses `remote_ip` and `local_ip` options as with existing tunnel protocols. It also adds a dedicated `srv6_segs` option to define a sequence of routers called segment list. Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-29 22:16:04 +02:00
Nobuhiro MIKI	349112f975	flow: Support rt_hdr in parse_ipv6_ext_hdrs(). Checks whether IPPROTO_ROUTING exists in the IPv6 extension headers. If it exists, the first address is retrieved. If NULL is specified for "frag_hdr" and/or "rt_hdr", those addresses in the header are not reported to the caller. Of course, "frag_hdr" and "rt_hdr" are properly parsed inside this function. Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-29 21:41:28 +02:00
Nobuhiro MIKI	57b9fc50dd	tnl-ports: Support multiple nw_protos. In some tunnels, inner packet needs to support both IPv4 and IPv6. Therefore, this patch improves to allow two protocols to be tied together in one tunneling. Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-29 21:41:27 +02:00
Nobuhiro MIKI	0db74e0eb4	tests: Define new ADD_VETH_NS macro. The new ADD_VETH_NS macro creates two netns and connects them with a veth pair. We can use it for testing in a generic purpose. e.g. ADD_VETH_NS([ns1], [p1], [1.1.1.1/24], [ns2], [p2], [1.1.1.2/24]) Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-29 21:41:27 +02:00
Adrian Moreno	b354cee2e0	ovs-thread: Fix cpus not read for the first 10s. With the current implementation the available CPUs will not be read until 10s have passed since the system's boot. For systems that boot faster, this can make ovs-vswitchd create fewer handlers than necessary for some time. Fixes: 0d23948a598a ("ovs-thread: Detect changes in number of CPUs.") Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2180460 Suggested-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Mike Pattrick <mkp@redhat.com> Acked-by: Michael Santana <msantana@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-27 22:05:07 +02:00
Adrian Moreno	79f9367449	dpif-netlink: Always create at least 1 handler. Ensure at least 1 handler is created even if something goes wrong during cpu detection or prime numer calculation. Fixes: a5cacea5f988 ("handlers: Create additional handler threads when using CPU isolation.") Suggested-by: Aaron Conole <aconole@redhat.com> Acked-by: Mike Pattrick <mkp@redhat.com> Acked-by: Michael Santana <msantana@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-27 22:04:40 +02:00
Eelco Chaudron	d53ee36aa6	netdev-offload-tc: Fix parse_tc_flower_to_actions() reporting errors. parse_tc_flower_to_actions() was not reporting errors, which would cause parse_tc_flower_to_match() to ignore them. Fixes: dd03672f7bbb ("netdev-offload-tc: Move flower_to_match action handling to isolated function.") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-22 20:06:34 +01:00
Mike Pattrick	b3935cf90e	tests/mfex: Retain support for cryptography pre-v37. Prior to v37.0.0, CryptographyDeprecationWarning could not be imported from __init__.py resulting in: Traceback (most recent call last): File "mfex_fuzzy.py", line 9, in <module> category=cryptography.CryptographyDeprecationWarning, AttributeError: module 'cryptography' has no attribute 'CryptographyDeprecationWarning' This import was only added to __init__ to deprecate python3.6. Importing the exception from cryptography.utils is the compatible option. Fixes: c3ed0bf34b8a ("tests/mfex: Silence Blowfish/CAST5 deprecation warnings.") Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-22 20:02:13 +01:00
Aaron Conole	07cf5810de	dpdk: Allow retaining CAP_SYS_RAWIO privileges. Open vSwitch generally tries to let the underlying operating system managed the low level details of hardware, for example DMA mapping, bus arbitration, etc. However, when using DPDK, the underlying operating system yields control of many of these details to userspace for management. In the case of some DPDK port drivers, configuring rte_flow or even allocating resources may require access to iopl/ioperm calls, which are guarded by the CAP_SYS_RAWIO privilege on linux systems. These calls are dangerous, and can allow a process to completely compromise a system. However, they are needed in the case of some userspace driver code which manages the hardware (for example, the mlx implementation of backend support for rte_flow). Here, we create an opt-in flag passed to the command line to allow this access. We need to do this before ever accessing the database, because we want to drop all privileges asap, and cannot wait for a connection to the database to be established and functional before dropping. There may be distribution specific ways to do capability management as well (using for example, systemd), but they are not as universal to the vswitchd as a flag. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Aaron Conole <aconole@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Gaetan Rivet <gaetanr@nvidia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-22 18:56:02 +01:00
Ales Musil	e90a0727f1	vswitch: Add missing documentation for "ct_flush" capability. Fixes: 08146bf7d9b4 ("openflow: Add extension to flush CT by generic match.") Signed-off-by: Ales Musil <amusil@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-15 21:24:38 +01:00
Ales Musil	ebe98c587d	dpctl: Fix flush-conntrack with datapath as argument. Specifying datapath with "dpctl/flush-conntrack" didn't work as expected and caused error: ovs-dpctl: field system@ovs-system missing value (Invalid argument) To prevent that, check if we have datapath as first argument and use it accordingly. Also add couple of test cases to ensure that everything works as expected. Fixes: a9ae73b916ba ("ofp, dpif: Allow CT flush based on partial match.") Signed-off-by: Ales Musil <amusil@redhat.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-15 21:24:14 +01:00
Eelco Chaudron	a4cd2afea5	ofproto-dpif-upcall: Remove redundant time_msec() in revalidate(). Remove one of two consecutive time_msec() calls in the revalidate() function. We take the time stamp after udpif_get_n_flows(), to avoid any potential delays in getting the number of offloaded flows. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-15 21:23:14 +01:00
Eelco Chaudron	29720e378e	ofproto-dpif-upcall: Wait for valid hw flow stats before applying min-revalidate-pps. Depending on the driver implementation, it can take from 0.2 seconds up to 2 seconds before offloaded flow statistics are updated. This is true for both TC and rte_flow-based offloading. This is causing a problem with min-revalidate-pps, as old statistic values are used during this period. This fix will wait for at least 2 seconds, by default, before assuming no packets where received during this period. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-15 21:22:22 +01:00
Eelco Chaudron	51778134d4	system-traffic: Fix conntrack test cases which are failing with af_xdp. The recently added test cases below are not passing on the af_xdp datapath due to tcpdump not working on the OVS ports with this datapath. conntrack - ICMP related NAT with single port conntrack - ICMPv6 related NAT with single port conntrack - ICMP from different source related with NAT The tests are changed to attach tcpdump on the associated veth port in the netns. Tests are now passing with all datapaths (afxdp, kernel, userspace, and offloads). Fixes: 8bd688063078 ("system-traffic.at: Add icmp error tests while dnatting address and port.") Fixes: 0a7587034dc9 ("conntrack: Properly unNAT inner header of related traffic.") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Ales Musil <amusil@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-13 22:36:05 +01:00
Nobuhiro MIKI	49e534cd37	route-table: Retrieving the preferred source address from Netlink. We can use the "ip route add ... src ..." command to set the preferred source address for each entry in the kernel FIB. OVS has a mechanism to cache the FIB, but the preferred source address is ignored and calculated with its own logic. This patch resolves the difference between kernel FIB and OVS route table cache by retrieving the RTA_PREFSRC attribute of Netlink messages. Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-07 19:16:20 +01:00
Nobuhiro MIKI	b801f1aa00	ovs-router: Introduce src option in ovs/route/add command. When adding a route with ovs/route/add command, the source address in "ovs_router_entry" structure is always the FIRST address that the interface has. See "ovs_router_get_netdev_source_address" function for more information. If an interface has multiple ipv4 and/or ipv6 addresses, there are use cases where the user wants to control the source address. This patch therefore addresses this issue by adding a src parameter. Note that same constraints also exist when caching routes from Kernel FIB with Netlink, but are not dealt with in this patch. Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-07 18:29:16 +01:00
Nobuhiro MIKI	01acf09f74	ofproto: Fix man page for tunnel related commands. Fixed the manual page to indicate that both IPv4/IPv6 are supported. Also added missing pkt_mark on one side and fixed the "gw" and "bridge" notation quirks. Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-07 18:23:12 +01:00
Nobuhiro MIKI	915f084b9f	ovs-router: Cleanup parser for ovs/route/add command. This patch cleans up the parser to accept pkt_mark and gw in any order. pkt_mark and gw are normally expected to be specified exactly once. However, as with other tools, if specified multiple times, the last specification is used. Also, pkt_mark and gw have separate prefix strings so they can be parsed in any order. Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-07 18:23:01 +01:00
Nobuhiro MIKI	de6589799e	netdev-dummy: Support multiple IP addresses. This is useful in test cases where multiple IPv4/IPv6 addresses are assigned together. Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-07 18:22:40 +01:00
Ilya Maximets	f65d1951df	AUTHORS: Add Fangrui Song. Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-06 19:57:00 +01:00
Fangrui Song	71ca8393b7	treewide: Remove uses of ATOMIC_VAR_INIT. ATOMIC_VAR_INIT has a trivial definition `#define ATOMIC_VAR_INIT(value) (value)`, is deprecated in C17/C++20, and will be removed in newer standards in newer GCC/Clang (e.g. https://reviews.llvm.org/D144196). Signed-off-by: Fangrui Song <maskray@google.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-06 19:27:19 +01:00
Wilson Peng	e3c821f8ca	netdev-windows: Add checking when creating netdev with system type on Windows In the recent Antrea project testing, some port could not be created on Windows. When doing debug, our team found there is one case happening when multiple ports are waiting for be created with correct port number. Some system type port will be created netdev successfully and it will cause conflict as in the dpif side it will be internal type. So finally the port will be created failed and it could not be easily recovered. With the patch, on Windows the netdev creating will be blocked for system type when the ovs_tyep got on dpif is internal. More detailed case description is in the reported issue No.262 with link below. Reported-at:https://github.com/openvswitch/ovs-issues/issues/262 Signed-off-by: Wilson Peng <pweisong@vmware.com> Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>	2023-03-06 12:37:53 +01:00
Eelco Chaudron	bfc0d5da35	ofproto-dpif-upcall: Include hardware offloaded flows in total flows. The revalidator process uses the internal call udpif_get_n_flows() to get the total number of flows installed in the system. It uses this value for various decisions on flow installation and removal. With the tc offload this values is incorrect, as the hardware offloaded are not included. With rte_flow offload this is not a problem as dpif netdev keeps both in sync. This patch will include the hardware offloaded flows if the underlying dpif implementation is not syncing them. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-03 22:27:37 +01:00
Eelco Chaudron	4d69c19000	ofproto-dpif-upcall: Reset ukey's last stats value if the datapath changed. When the ukey's action set changes, it could cause the flow to use a different datapath, for example, when it moves from tc to kernel. This will cause the the cached previous datapath statistics to be used. This change will reset the cached statistics when a change in datapath is discovered. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-03-03 22:27:37 +01:00
Ilya Maximets	489553b1c2	classifier: Fix missing masks on a final stage with ports trie. Flow lookup doesn't include masks of the final stage in a resulting flow wildcards in case that stage had L4 ports match. Only the result of ports trie lookup is added to the mask. It might be sufficient in many cases, but it's not correct, because ports trie is not how we decided that the packet didn't match in this subtable. In fact, we used a full subtable mask in order to determine that, so all the subtable mask bits has to be added. Ports trie can still be used to adjust ports' mask, but it is not sufficient to determine that the packet didn't match. Assuming we have following 2 OpenFlow rules on the bridge: table=0, priority=10,tcp,tp_dst=80,tcp_flags=+psh actions=drop table=0, priority=0 actions=output(1) The first high priority rule supposed to drop all the TCP data traffic sent on port 80. The handshake, however, is allowed for forwarding. Both 'tcp_flags' and 'tp_dst' are on the final stage in the flow. Since the stage mask from that stage is not incorporated into the flow wildcards and only ports mask is getting updated, we have the following megaflow for the SYN packet that has no match on 'tcp_flags': $ ovs-appctl ofproto/trace br0 "in_port=br0,tcp,tp_dst=80,tcp_flags=syn" Megaflow: recirc_id=0,eth,tcp,in_port=LOCAL,nw_frag=no,tp_dst=80 Datapath actions: 1 If this flow is getting installed into datapath flow table, all the packets for port 80, regardless of TCP flags, will be forwarded. Incorporating all the looked at bits from the final stage into the stages map in order to get all the necessary wildcards. Ports mask has to be updated as a last step, because it doesn't cover the full 64-bit slot in the flowmap. With this change, in the example above, OVS is producing correct flow wildcards including match on TCP flags: Megaflow: recirc_id=0,eth,tcp,in_port=LOCAL,nw_frag=no,tp_dst=80,tcp_flags=-psh Datapath actions: 1 This way only -psh packets will be forwarded, as expected. This issue affects all other fields on stage 4, not only TCP flags. Tests included to cover tcp_flags, nd_target and ct_tp_src/dst. First two are frequently used, ct ones are sharing the same flowmap slot with L4 ports, so important to test. Before the pre-computation of stage masks, flow wildcards were updated during lookup, so there was no issue. The bits of the final stage was lost with introduction of 'stages_map'. Recent adjustment of segment boundaries exposed 'tcp_flags' to the issue. Reported-at: https://github.com/openvswitch/ovs-issues/issues/272 Fixes: ca44218515f0 ("classifier: Adjust segment boundary to execute prerequisite processing.") Fixes: fa2fdbf8d0c1 ("classifier: Pre-compute stage masks.") Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-02-28 18:57:20 +01:00
Paolo Valerio	8bd6880630	system-traffic.at: Add icmp error tests while dnatting address and port. The two tests verify, for both icmp and icmpv6, that the correct port translation happen in the inner packet in the case an error is received in the reply direction. Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-02-28 18:43:55 +01:00
Simon Horman	5f0fdf5e2c	test: Move check for tc ingress pps support to test script. Move check for tc ingress pps support to from aclocal to test script This has several problems: 1. Stderror from failing commands is output when executing various make targets. 2. There are various failure conditions that lead to veth0 and veth1 being created by not cleaned up. 3. The check seems to execute for many make targets. And it attempts to temporarily modify system state. This seems inappropriate. 4. veth0 and veth1 seem far too generic and could easily conflict with other parts of the system. All these problems are addressed by this patch. Signed-off-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Acked-by: Ilya Maximets <i.maximets@ovn.org>	2023-02-28 10:44:40 +01:00
Adrian Moreno	f1f278f5e1	ipfix: Make template and stats interval configurable. Add options to the IPFIX table configure the interval to send statistics and template information. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-02-27 21:18:59 +01:00
Ilya Maximets	b5313a8cec	ofproto: Fix re-creation of tunnel backing interfaces on restart. Tunnel OpenFlow ports do not exist in the datapath, instead there is a tunnel backing interface that serves all the tunnels of the same type. For example, if the geneve port 'my_tunnel' is added to OVS, it will create 'geneve_sys_6041' datapath port, if it doesn't already exist, and use this port as a tunnel output. However, while creating/opening a new datapath after re-start, ovs-vswitchd only has a list of names of OpenFlow interfaces. And it thinks that each datapath port, that is not on the list, is a stale port that needs to be removed. This is obviously not correct for tunnel backing interfaces that can serve multiple tunnel ports and do not match OpenFlow port names. This is causing removal and re-creation of all the tunnel backing interfaces in the datapath on OVS restart, causing disruption in existing connections. It's hard to tell by only having a name of the interface if this interface is a tunnel backing interface, or someone just named a normal interface this way. So, instead of trying to determine that, not removing any interfaces at all, while we don't know types of actual ports we need. Assuming that all the ports that are currently not in the list of OF ports are tunnel backing ports. Later, revalidation of tunnel backing ports in type_run() will determine which ports are still needed and which should be removed. It's OK to add even a non-tunnel stale ports into tnl_backers, they will be cleaned up the same way as stale tunnel backers. Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2023-February/052215.html Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-02-27 13:43:14 +01:00
Ilya Maximets	cf288fdfe2	AUTHORS: Add Liang Mancang and Viacheslav Galaktionov. Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-02-21 21:13:53 +01:00
Viacheslav Galaktionov	c156f9bc50	ofproto: Include flow cookies in bridge/dump-flows output. Cookies are an important part of flow descriptions and must be available to the end user. Signed-off-by: Viacheslav Galaktionov <viacheslav.galaktionov@arknetworks.am> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-02-21 21:13:43 +01:00

1 2 3 4 5 ...

19672 Commits