mirror of https://github.com/openvswitch/ovs synced 2025-08-30 13:58:14 +00:00
Commit Graph

19352 Commits

Author SHA1 Message Date
Eelco Chaudron
4056ae4875 ofp-flow: Skip flow reply if it exceeds the maximum message size.
Currently, if a flow reply results in a message that exceeds
the maximum reply size, it triggers an assertion in OVS. This can
happen when OVN uses OpenFlow15 to add large flows and they are then
read using OpenFlow10 with ovs-ofctl.

This patch prevents this and adds a test case to make sure the
code behaves as expected.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-31 21:11:58 +01:00
Paolo Valerio
77967b53fe conntrack: Check TCP state while testing established connections pick up.
When testing whether an established connection is picked up, it is
useful to verify that the protocol state matches the expectation,
that is, that it moves to ESTABLISHED. Otherwise, a code change could
break the TCP conn_update() in a way that makes it return
CT_UPDATE_VALID without moving to the correct state, leading to a
false positive.

Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Gaetan Rivet <grive@u256.net>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-31 21:09:33 +01:00
Ilya Maximets
6e13565dd3 ovsdb: transaction: Keep one entry in the transaction history.
If a single transaction exceeds the size of the whole database (e.g.,
a lot of rows got removed and new ones added), the transaction history
will be drained.  This leads to sending UUID_ZERO to the clients as
the last transaction id in the next monitor update, because the
monitor doesn't know the actual last transaction id.  On re-connect,
that will cause re-downloading of the whole database, since the
client's last_id will be out of sync.

One solution would be to store the last transaction ID separately
from the actual transactions, but that would require careful
management in cases where the database gets reset and the history
needs to be cleared.  Keep the one last transaction instead to avoid
the problem.  That should not be a big concern in terms of memory
consumption, because this last transaction will be removed from the
history once the next transaction appears.  It is also not a concern
for a fast re-sync, because this last transaction will not be used
for the monitor reply: either the client already has it, so there is
no need to send it, or it is a history miss.

The test is updated to not check the number of atoms if there is only
one transaction in the history.

Fixes: 317b1bfd7d ("ovsdb: Don't let transaction history grow larger than the database.")
Reported-at: https://bugzilla.redhat.com/2044621
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-31 21:05:20 +01:00
Ilya Maximets
3a05c63702 ovsdb-cs: Fix ignoring of the last id from the initial monitor reply.
Current code doesn't use the last id received in the initial monitor reply.
That may result in re-downloading the database content if the
re-connection happened after receiving the initial monitor reply,
but before receiving any other database updates.

Fixes: 1c337c43ac ("ovsdb-idl: Break into two layers.")
Reported-at: https://bugzilla.redhat.com/2044624
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-28 23:44:45 +01:00
Eelco Chaudron
dadd8357f2 ofproto-dpif: Fix issue with non-reversible actions on patch ports.
For patch ports, the is_last_action value is not propagated and is
always set to true. This causes non-reversible actions to modify the
packet, and the original content is not preserved when processing
the remaining actions.

This patch propagates the is_last_action flag for patch port related
actions. In addition, it fixes the general propagation of the
last-action flag to the individual actions.

It also fixes check_pkt_larger as a last action: it is a valid case
for the drop action, so it should not be skipped.

Fixes: feee58b95 ("ofproto-dpif-xlate: Keep track of the last action")
Fixes: 5b34f8fc3 ("Add a new OVS action check_pkt_larger")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-25 22:41:42 +01:00
David Marchand
0a395a52d6 NEWS: Fix some typos.
The 'experimantal' typo got copy/pasted a few times.

Fixes: be56e063d0 ("netdev-offload-dpdk: Support tunnel pop action.")
Fixes: e098c2f966 ("netdev-dpdk-offload: Add vxlan pattern matching function.")
Fixes: 7617d0583c ("netdev-offload-dpdk: Add support for matching on gre fields.")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-21 20:54:55 +01:00
Antonin Bas
5b3bb16b84 ovs-monitor-ipsec: Fix generated strongSwan ipsec.conf for IPv6.
Setting the local address to 0.0.0.0 (v4 address) while setting the
remote address to a v6 address results in an invalid configuration.

See https://github.com/strongswan/strongswan/discussions/821

Signed-off-by: Antonin Bas <antonin.bas@gmail.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-21 18:45:23 +01:00
David Marchand
8723063c3c system-dpdk: Fix MFEX logs check.
Some warning logs must be waived when using the net/pcap DPDK driver.
Those logs can also appear with other DPDK drivers (like mlx5), and
since the tests in system-dpdk are not testing MTU and Rx checksum, we
might as well ignore those warnings from OVS.

Fixes: d446dcb7e0 ("system-dpdk: Refactor common logs matching.")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2022-01-20 15:48:34 +00:00
Wilson Peng
0506efbd0a datapath-windows: Pickup Ct tuple as CT lookup key in function OvsCtSetupLookupCtx
CT marks loaded in a non-first commit are lost on ovs-windows.  On
Linux OVS, the same flows set the CT mark successfully.

Currently, ovs-windows creates a new CT entry with the flowKey
(extracted from the packet itself) even if the packet has already gone
through a DNAT action in the first round of flow processing, so the
ct-mark set on the previous conntrack entry is lost.  With this fix,
the CT tuple src/dst addresses stored in the flowKey are used if their
value is not zero and the zone in the flowKey is the same as the input
zone.

The fix also adjusts the function OvsProcessDeferredActions to make it
clearer.

// DNAT flow
cookie=0x1040000000000, duration=950.326s, table=EndpointDNAT, n_packets=0, n_bytes=0, priority=200,tcp,reg3=0xc0a8fa2b,reg4=0x20050/0x7ffff actions=ct(commit,table=AntreaPolicyEgressRule,zone=65520,nat(dst=192.168.250.43:80),exec(load:0x1->NXM_NX_CT_MARK[2])
// Append ct_mark flow
cookie=0x1000000000000, duration=11980.701s, table=SNATConntrackCommit, n_packets=6, n_bytes=396, priority=200,ct_state=+new+trk,ip,reg0=0x200/0x200,reg4=0/0xc00000 actions=load:0x3->NXM_NX_REG4[22..23],ct(commit,table=SNATConntrackCommit,zone=65520,exec(load:0x1->NXM_NX_CT_MARK[4],load:0x1->NXM_NX_CT_MARK[5]))
// SNAT flow
cookie=0x1000000000000, duration=11980.701s, table=SNATConntrackCommit, n_packets=6, n_bytes=396, priority=200,ct_state=+new+trk,ip,reg0=0x600/0x600,reg4=0xc00000/0xc00000 actions=ct(commit,table=L2Forwarding,zone=65521,nat(src=192.168.250.1),exec(load:0x1->NXM_NX_CT_MARK[2]))

Reported-at: https://github.com/openvswitch/ovs-issues/issues/237
Signed-off-by: Wilson Peng <pweisong@vmware.com>
Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
2022-01-20 02:55:15 +02:00
Ilya Maximets
c6f0b623e5 Prepare for post-2.17.0 (2.17.90).
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
2022-01-19 02:33:05 +01:00
Ilya Maximets
280d8de05f Prepare for 2.17.0.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
2022-01-19 02:33:05 +01:00
Gaetan Rivet
f20abde5a2 netdev-dpdk: Remove rte-flow API access locks.
The rte_flow DPDK API was made thread-safe [1] in release 20.11.
Now that the DPDK offload provider in OVS is thread safe, remove the
locks.

[1]: http://mails.dpdk.org/archives/dev/2020-October/184251.html

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-19 01:35:19 +01:00
Gaetan Rivet
b0b6b7b465 dpif-netdev: Use one or more offload threads.
Read the user configuration in the netdev-offload module to modify the
number of threads used to manage hardware offload requests.

This allows processing insertion, deletion and modification
concurrently.

The offload thread structure was modified to contain all needed
elements.  This structure is instantiated once per requested thread,
and each instance is used separately.

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-19 01:35:19 +01:00
Gaetan Rivet
7daa503468 dpif-netdev: Replace port mutex by rwlock.
The port mutex protects the netdev mapping, which can be changed by
port addition or port deletion. HW offload operations can be
considered read operations on the port mapping itself. Use a rwlock to
differentiate between read and write operations, allowing concurrent
queries and offload insertions.

Because offload queries, deletions, and reconfigure_datapath() calls
all take the rdlock, the deadlock fixed by [1] is still avoided, as
the rdlock side is recursive as prescribed by the POSIX standard.
Executing 'reconfigure_datapath()' only requires the rdlock, but it is
sometimes executed in contexts where the wrlock is taken
('do_add_port()' and 'do_del_port()').

This means that the deadlock described in [2] is still valid and should
be mitigated. The rdlock is taken using 'tryrdlock()' during offload
queries, keeping the current behavior.

[1]: 81e89d5c26 ("dpif-netdev: Make datapath port mutex recursive.")

[2]: 12d0edd75e ("dpif-netdev: Avoid deadlock with offloading during PMD
     thread deletion.").

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-19 01:35:19 +01:00
Gaetan Rivet
d85b9230ac dpif-netdev: Make megaflow and mark mappings thread objects.
In later commits hardware offloads are managed in several threads.
Each offload is managed by a thread determined by its flow's 'mega_ufid'.

As megaflow to mark and mark to flow mappings are 1:1 and 1:N
respectively, then a single mark exists for a single 'mega_ufid', and
multiple flows uses the same 'mega_ufid'. Because the managing thread will
be chosen using the 'mega_ufid', then each mapping does not need to be
shared with other offload threads.

The mappings are kept as cmap as upcalls will sometimes query them before
enqueuing orders to the offload threads.

To prepare this change, move the mappings within the offload thread
structure.

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-19 01:35:19 +01:00
Gaetan Rivet
ec4ac62588 dpif-netdev: Use lockless queue to manage offloads.
The dataplane threads (PMDs) send offloading commands to a dedicated
offload management thread. The current implementation uses a lock,
and benchmarks show high contention on the queue in some cases.

Under high contention, the mutex more often makes the locking
thread yield and wait, which requires a syscall. This should be
avoided in a userland dataplane.

The mpsc-queue can be used instead. It uses fewer cycles and has
lower latency. Benchmarks show better behavior as multiple
revalidators and one or multiple PMDs write to a single queue
while another thread polls it.

One trade-off with the new scheme, however, is that the queue must be
polled from the offload thread. Without a mutex, a cond_wait
cannot be used for signaling. The offload thread implements
an exponential backoff and sleeps in short increments when no
data is available. This makes the thread yield, at the price of
some latency to manage offloads after an inactivity period.
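The backoff described above can be sketched as follows; the constants
and function name here are illustrative, not the actual dpif-netdev
code:

```c
#include <stdint.h>

/* Sketch: compute the next sleep duration for an idle poll loop,
 * doubling from a floor up to a cap, and resetting to busy polling
 * as soon as work is found.  Bounds are assumptions for the sketch. */
#define BACKOFF_MIN_US 1
#define BACKOFF_MAX_US 1000

static uint32_t next_backoff_us(uint32_t cur_us, int had_work)
{
    if (had_work) {
        return 0;               /* Busy: poll again immediately. */
    }
    if (cur_us == 0) {
        return BACKOFF_MIN_US;  /* First idle iteration. */
    }
    cur_us *= 2;                /* Still idle: double the sleep. */
    return cur_us > BACKOFF_MAX_US ? BACKOFF_MAX_US : cur_us;
}
```

The cap bounds the latency penalty after an inactivity period, while
the doubling keeps the idle thread mostly asleep.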

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-19 01:35:19 +01:00
Gaetan Rivet
b3e029f7c1 netdev-offload-dpdk: Protect concurrent offload destroy/query.
The rte_flow API in DPDK is now thread-safe for insertion and deletion.
It is not, however, safe for concurrent queries while an offload is
being inserted or deleted.

Insertion is not an issue as the rte_flow handle will be published to
other threads only once it has been inserted in the hardware, so the
query will only be able to proceed once it is already available.

For the deletion path however, offload status queries can be made while
an offload is being destroyed. This would create race conditions and
use-after-free if not properly protected.

As a pre-step before removing the OVS-level locks on the rte_flow API,
mutually exclude offload query and deletion from concurrent execution.

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-19 01:35:19 +01:00
Gaetan Rivet
54dcf60e6f netdev-offload-dpdk: Lock rte_flow map access.
Add a lock to access the ufid to rte_flow map.  This will protect it
from concurrent write accesses when multiple threads attempt it.

At this point, the reason for taking the lock is no longer to fulfill
the needs of the DPDK offload implementation. Rewrite the comments
to reflect this change. The lock is still needed to protect against
changes to the netdev port mapping.

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-19 01:35:19 +01:00
Gaetan Rivet
7851e602c0 netdev-offload-dpdk: Use per-thread HW offload stats.
The implementation of hardware offload counters is currently meant to
be managed by a single thread. Use the offload thread pool API to
manage one counter per thread.

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-19 01:35:19 +01:00
Gaetan Rivet
5b0aa55776 dpif-netdev: Execute flush from offload thread.
When a port is deleted, its offloads must be flushed.  The operation
runs in the thread that initiated it.  Offload data is thus accessed
jointly by the port deletion thread(s) and the offload thread, which
complicates the data access model.

To simplify this model, as a pre-step toward introducing parallel
offloads, execute the flush operation in the offload thread.

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-19 01:35:19 +01:00
Gaetan Rivet
d68d2ed466 dpif-netdev: Introduce tagged union of offload requests.
Offload requests currently only support flow offloads.
As a pre-step before supporting an offload flush request,
modify the layout of an offload request item to become a tagged union.

Future offload types won't be forced to re-use the full flow offload
structure, which consumes a lot of memory.

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-19 01:35:19 +01:00
Gaetan Rivet
73ecf098d2 dpif-netdev: Use id-fpool for mark allocation.
Use the netdev-offload multithread API to allow multiple threads
to allocate marks concurrently.

Initialize the pool only once in a multithreaded context by using
the ovsthread_once type.

Use the id-fpool module for faster concurrent ID allocation.

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-19 01:35:19 +01:00
Gaetan Rivet
528a8ab627 dpif-netdev: Postpone flow offload item freeing.
Profiling the HW offload thread, the flow offload freeing takes
approximately 25% of the time. Most of this time is spent waiting on
the futex used by the libc free(), as it triggers a syscall and
reschedules the thread.

Avoid the syscall and its expensive context switch. Batch the freeing
of offload messages using RCU.

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-19 01:35:19 +01:00
Gaetan Rivet
55dc4ef176 dpif-netdev: Quiesce offload thread periodically.
Similar to what was done for the PMD threads [1], reduce the performance
impact of quiescing too often in the offload thread.

After each processed offload, the offload thread currently quiesces and
syncs with RCU. This synchronization can be lengthy and make the
thread unnecessarily slow.

Instead, attempt to quiesce at most every 10 ms. While the queue is
empty, the offload thread remains quiescent.
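The interval check can be sketched as below; the names are
illustrative, not the actual dpif-netdev code:

```c
#include <stdint.h>

/* Sketch: only report that a quiescent point is due when at least
 * QUIESCE_INTERVAL_MS have elapsed since the last one, instead of
 * quiescing after every processed offload. */
#define QUIESCE_INTERVAL_MS 10

static int should_quiesce(int64_t now_ms, int64_t *last_quiesce_ms)
{
    if (now_ms - *last_quiesce_ms < QUIESCE_INTERVAL_MS) {
        return 0;               /* Too soon: keep processing. */
    }
    *last_quiesce_ms = now_ms;  /* Record this quiescent point. */
    return 1;
}
```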

[1]: 81ac8b3b19 ("dpif-netdev: Do RCU synchronization at fixed interval
     in PMD main loop.")

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-19 01:35:19 +01:00
Gaetan Rivet
62c2d8a675 netdev-offload: Add multi-thread API.
Expose functions reporting user configuration of offloading threads, as
well as utility functions for multithreading.

This only exposes the configuration knob to the user; no datapath
implements the multi-thread request yet.

This will allow implementations to use this API for offload thread
management in relevant layers before enabling the actual dataplane
implementation.

The offload thread ID is lazily allocated and can as such be in a
different order than the offload thread start sequence.

The RCU thread will sometimes access hardware-offload objects from
a provider for reclamation purposes.  In that case, it will get
a default offload thread ID of 0. Care must be taken that using
this thread ID is safe concurrently with the offload threads.

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-19 01:35:19 +01:00
Gaetan Rivet
2eac33c6cc id-fpool: Module for fast ID generation.
The current id-pool module is slow to allocate the
next valid ID, and can be optimized when restricting
some properties of the pool.

Those restrictions are:

  * No ability to add a random ID to the pool.

  * A new ID is no longer the smallest possible ID.
    It is however guaranteed to be in the range of

       [floor, last_alloc + nb_user * cache_size + 1],

    where 'cache_size' is the number of IDs in each per-user
    cache.  It is defined by 'ID_FPOOL_CACHE_SIZE' as 64.

  * A user should never free an ID that is not allocated.
    No checks are done, and doing so will duplicate the spurious
    ID.  Refcounting or another memory management scheme should
    be used to ensure that an object and its ID are only freed once.

This allocator is designed to scale reasonably well in multithread
setup.  As it is aimed at being a faster replacement to the current
id-pool, a benchmark has been implemented alongside unit tests.

The benchmark is composed of 4 rounds: 'new', 'del', 'mix', and 'rnd'.
Respectively

  + 'new': only allocate IDs
  + 'del': only free IDs
  + 'mix': allocate, sequential free, then allocate ID.
  + 'rnd': allocate, random free, allocate ID.

Randomized freeing is done by swapping the latest allocated ID with any
ID from the range of currently allocated IDs, which is reminiscent of
the Fisher-Yates shuffle.  This evaluates freeing non-sequential IDs,
which is the more natural use-case.
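The swap-with-last scheme can be sketched as follows; the function
name is illustrative, and the (random) index selection is left to the
caller to keep the sketch deterministic:

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the benchmark's randomized-free order: remove the ID at
 * index 'i' and move the last live ID into its slot, keeping the
 * array of currently allocated IDs dense, as in a Fisher-Yates
 * shuffle. */
static uint32_t take_id_at(uint32_t *ids, size_t *n_ids, size_t i)
{
    uint32_t id = ids[i];

    ids[i] = ids[--*n_ids];  /* Swap with last, then shrink. */
    return id;
}
```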

For this specific round, the id-pool performance is such that a timeout
of 10 seconds is added to the benchmark:

   $ ./tests/ovstest test-id-fpool benchmark 10000 1
   Benchmarking n=10000 on 1 thread.
    type\thread:       1    Avg
   id-fpool new:       1      1 ms
   id-fpool del:       1      1 ms
   id-fpool mix:       2      2 ms
   id-fpool rnd:       2      2 ms
    id-pool new:       4      4 ms
    id-pool del:       2      2 ms
    id-pool mix:       6      6 ms
    id-pool rnd:     431    431 ms

   $ ./tests/ovstest test-id-fpool benchmark 100000 1
   Benchmarking n=100000 on 1 thread.
    type\thread:       1    Avg
   id-fpool new:       2      2 ms
   id-fpool del:       2      2 ms
   id-fpool mix:       3      3 ms
   id-fpool rnd:       4      4 ms
    id-pool new:      12     12 ms
    id-pool del:       5      5 ms
    id-pool mix:      16     16 ms
    id-pool rnd:  10000+     -1 ms

   $ ./tests/ovstest test-id-fpool benchmark 1000000 1
   Benchmarking n=1000000 on 1 thread.
    type\thread:       1    Avg
   id-fpool new:      15     15 ms
   id-fpool del:      12     12 ms
   id-fpool mix:      34     34 ms
   id-fpool rnd:      48     48 ms
    id-pool new:     276    276 ms
    id-pool del:     286    286 ms
    id-pool mix:     448    448 ms
    id-pool rnd:  10000+     -1 ms

Running only a performance test on the fast pool:

   $ ./tests/ovstest test-id-fpool perf 1000000 1
   Benchmarking n=1000000 on 1 thread.
    type\thread:       1    Avg
   id-fpool new:      15     15 ms
   id-fpool del:      12     12 ms
   id-fpool mix:      34     34 ms
   id-fpool rnd:      47     47 ms

   $ ./tests/ovstest test-id-fpool perf 1000000 2
   Benchmarking n=1000000 on 2 threads.
    type\thread:       1      2    Avg
   id-fpool new:      11     11     11 ms
   id-fpool del:      10     10     10 ms
   id-fpool mix:      24     24     24 ms
   id-fpool rnd:      30     30     30 ms

   $ ./tests/ovstest test-id-fpool perf 1000000 4
   Benchmarking n=1000000 on 4 threads.
    type\thread:       1      2      3      4    Avg
   id-fpool new:       9     11     11     10     10 ms
   id-fpool del:       5      6      6      5      5 ms
   id-fpool mix:      16     16     16     16     16 ms
   id-fpool rnd:      20     20     20     20     20 ms

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 19:30:17 +01:00
Gaetan Rivet
5396ba5b21 mpsc-queue: Module for lock-free message passing.
Add a lockless multi-producer/single-consumer (MPSC), linked-list based,
intrusive, unbounded queue that does not require deferred memory
management.

The queue is designed to improve this specific MPSC setup.  A benchmark
accompanies the unit tests to measure the difference in this configuration.
A single reader thread polls the queue while N writers enqueue elements
as fast as possible.  The mpsc-queue is compared against the regular ovs-list
as well as the guarded list.  The latter usually offers a slight improvement
by batching the element removal, but the mpsc-queue is faster.

The 'Avg' column is the average of the producer threads' times:

   $ ./tests/ovstest test-mpsc-queue benchmark 3000000 1
   Benchmarking n=3000000 on 1 + 1 threads.
    type\thread:  Reader      1    Avg
     mpsc-queue:     167    167    167 ms
     list(spin):      89     80     80 ms
    list(mutex):     745    745    745 ms
   guarded list:     788    788    788 ms

   $ ./tests/ovstest test-mpsc-queue benchmark 3000000 2
   Benchmarking n=3000000 on 1 + 2 threads.
    type\thread:  Reader      1      2    Avg
     mpsc-queue:      98     97     94     95 ms
     list(spin):     185    171    173    172 ms
    list(mutex):     203    199    203    201 ms
   guarded list:     269    269    188    228 ms

   $ ./tests/ovstest test-mpsc-queue benchmark 3000000 3
   Benchmarking n=3000000 on 1 + 3 threads.
    type\thread:  Reader      1      2      3    Avg
     mpsc-queue:      76     76     65     76     72 ms
     list(spin):     246    110    240    238    196 ms
    list(mutex):     542    541    541    539    540 ms
   guarded list:     535    535    507    511    517 ms

   $ ./tests/ovstest test-mpsc-queue benchmark 3000000 4
   Benchmarking n=3000000 on 1 + 4 threads.
    type\thread:  Reader      1      2      3      4    Avg
     mpsc-queue:      73     68     68     68     68     68 ms
     list(spin):     294    275    279    277    282    278 ms
    list(mutex):     346    309    287    345    302    310 ms
   guarded list:     378    319    334    378    351    345 ms

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 19:30:17 +01:00
Gaetan Rivet
5878b92522 ovs-atomic: Expose atomic exchange operation.
The atomic exchange operation is a useful primitive that should be
available as well.  Most compilers already expose or offer a way
to use it, but a single symbol needs to be defined.
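The C11 form of the primitive is shown below; OVS wraps this behind
its own ovs-atomic macros, so this is the standard-library equivalent,
not the OVS API:

```c
#include <stdatomic.h>

/* Atomically store a new value and return the previous one in a
 * single indivisible operation. */
static int exchange_int(_Atomic int *p, int new_value)
{
    return atomic_exchange(p, new_value);
}
```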

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 19:30:17 +01:00
Gaetan Rivet
83823ae328 dpif-netdev: Implement hardware offloads stats query.
In the netdev datapath, keep track of the enqueued offloads between
the PMDs and the offload thread.  Additionally, query each netdev
for their hardware offload counters.

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 15:12:01 +01:00
Gaetan Rivet
9ac3d951b4 mov-avg: Add a moving average helper structure.
Add a new module offering a helper to compute the Cumulative
Moving Average (CMA) and the Exponential Moving Average (EMA)
of a series of values.

Use the new helpers to add latency metrics in dpif-netdev.
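The two averages can be sketched with their textbook definitions; the
mov-avg module's actual naming and internal representation may differ:

```c
/* Cumulative moving average: incorporate sample 'x' into a CMA that
 * currently covers 'n_prev' samples. */
static double cma_update(double cma, double x, unsigned long long n_prev)
{
    return cma + (x - cma) / (double) (n_prev + 1);
}

/* Exponential moving average with smoothing factor alpha in (0, 1]:
 * recent samples weigh more as alpha grows. */
static double ema_update(double ema, double x, double alpha)
{
    return alpha * x + (1.0 - alpha) * ema;
}
```

The CMA weighs all samples equally, while the EMA decays old samples
geometrically, which suits latency metrics that should track recent
behavior.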

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 15:12:01 +01:00
Gaetan Rivet
e4543c7b17 dpif-netdev: Rename offload thread structure.
The offload management in userspace is done through a separate thread.
The naming of the structure holding the objects used for synchronization
with the dataplane is generic and nondescript.

Clarify the structure's purpose by renaming it.

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 15:12:01 +01:00
Gaetan Rivet
9ab104718b dpctl: Add function to read hardware offload statistics.
Expose a function to query datapath offload statistics.
This function is separate from the current one in netdev-offload
as it exposes more detailed statistics from the datapath, instead of
only from the netdev-offload provider.

Each datapath is meant to use the custom counters as it sees fit for its
handling of hardware offloads.

Call the new API from dpctl.

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 15:12:01 +01:00
Gaetan Rivet
0e6366c239 netdev-offload-dpdk: Implement hw-offload statistics read.
In the DPDK offload provider, keep track of inserted rte_flow objects
and report the count when queried.

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 15:12:01 +01:00
Gaetan Rivet
adbd4301a2 netdev-offload-dpdk: Use per-netdev offload metadata.
Add a per-netdev offload data field as part of netdev hw_info structure.
Use this field in netdev-offload-dpdk to map offload metadata (ufid to
rte_flow). Use flow API deinit ops to destroy the per-netdev metadata
when deallocating a netdev. Use RCU primitives to ensure coherency
during port deletion.

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 15:12:01 +01:00
Gaetan Rivet
1088f4e7fb netdev: Add flow API uninit function.
Add a new operation allowing flow API providers to uninitialize
themselves when the API is disassociated from a netdev.

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 15:12:01 +01:00
Gaetan Rivet
aec1081c7d tests: Add ovs-barrier unit test.
No unit test currently exists for the ovs-barrier type.
It is however a crucial building block and should be verified to work
as expected.

Create a simple test verifying the basic function of ovs-barrier.
Integrate the test as part of the test suite.

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 15:12:01 +01:00
Gaetan Rivet
59b8f9f8f4 dpif-netdev: Rename flow offload thread.
ovs_strlcpy silently fails to copy the thread name if it is too long.
Rename the flow offload thread to differentiate it from the main thread.

Fixes: 02bb2824e5 ("dpif-netdev: do hw flow offload in a thread")
Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 15:12:01 +01:00
Gaetan Rivet
6207205e58 ovs-thread: Fix barrier use-after-free.
When a thread is blocked on a barrier, there is no guarantee
regarding the moment it will resume, only that it will at some point in
the future.

One thread can resume first and proceed to destroy the barrier while
another thread has not yet awoken. When it finally does, the second
thread will attempt a seq_read() on the barrier seq that the first
thread has already destroyed, triggering a use-after-free.

Introduce an additional indirection layer within the barrier.
An internal barrier implementation holds all the elements necessary
for a thread to safely block and destroy. Whenever a barrier is
destroyed, the internal implementation is left available to any
still-blocked threads if necessary. A reference counter is used to
track threads still using the implementation.

Note that current uses of ovs-barrier are not affected: RCU and
revalidators will not destroy their barrier immediately after blocking
on it.
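The refcounted indirection can be sketched as below; this is a
simplified illustration with assumed names, not the ovs-thread code:

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Sketch: the internal implementation is refcounted, so whichever
 * thread drops the last reference reclaims it, even if the barrier
 * itself was already destroyed. */
struct barrier_impl {
    atomic_uint ref_cnt;
    /* seq, size, count, ... elided. */
};

static struct barrier_impl *impl_get(struct barrier_impl *impl)
{
    atomic_fetch_add(&impl->ref_cnt, 1);
    return impl;
}

/* Returns 1 if this call freed the implementation. */
static int impl_put(struct barrier_impl *impl)
{
    if (atomic_fetch_sub(&impl->ref_cnt, 1) == 1) {
        free(impl);  /* Last user reclaims the implementation. */
        return 1;
    }
    return 0;
}
```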

Fixes: d8043da718 ("ovs-thread: Implement OVS specific barrier.")
Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 15:12:01 +01:00
Kevin Traynor
1b9fd884f5 Documentation: Remove experimental tag for PMD ALB.
PMD Auto Load Balance was introduced as an experimental feature in OVS
2.11. It detects when the Rx queue to PMD assignments are no longer
balanced and a reassignment would improve the balance.

It is disabled by default, and can be enabled with:
$ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 13:00:09 +01:00
Kevin Traynor
09192a815e Documentation: Update PMD Auto Load Balance section.
Updates to the PMD Auto Load Balance section to make it more readable.

No change to the core content.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 13:00:09 +01:00
Kevin Traynor
5cc0524351 Documentation: Update PMD thread statistics.
'pmd-perf-show' gives some extra information and has nicer
formatting than 'pmd-stats-show'.

Let the user know they can use that as well to get PMD stats.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 12:26:36 +01:00
Kevin Traynor
f0adea3fce Documentation: Minor spelling and grammar fixes.
Some minor spelling and grammar fixes in pmd.rst.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 12:26:36 +01:00
Kevin Traynor
4da71121da Documentation: Fix Rx/Tx queue configuration section.
ovs-vsctl is used to configure physical Rx queues, not ovs-appctl.

The number of Tx queues is configured differently depending on whether
the port is physical or virtual; the present documentation does not
distinguish between them.

Fixes: 31d0dae22a ("doc: Add "PMD" topic document")
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 12:26:36 +01:00
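For example, Rx queues on a physical DPDK port are set with ovs-vsctl (the interface name dpdk0 here is only an assumption for illustration):

```
$ ovs-vsctl set Interface dpdk0 options:n_rxq=4
```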
Eelco Chaudron
85d3785e6a utilities: Add netlink flow operation USDT probes and upcall_cost script.
This patch adds a series of NetLink flow operation USDT probes.
These probes are in turn used in the upcall_cost Python script,
which, in addition to some kernel tracepoints, gives insight into
the time spent processing upcalls.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 00:46:30 +01:00
Eelco Chaudron
51ec98635e utilities: Add upcall USDT probe and associated script.
Added the dpif_recv:recv_upcall USDT probe, which is used by the
included upcall_monitor.py script. This script receives all upcall
packets sent by the kernel to ovs-vswitchd. By default, it shows
all upcall events, which look something like this:

 TIME               CPU  COMM      PID      DPIF_NAME          TYPE PKT_LEN FLOW_KEY_LEN
 5952147.003848809  2    handler4  1381158  system@ovs-system  0    98      132
 5952147.003879643  2    handler4  1381158  system@ovs-system  0    70      160
 5952147.003914924  2    handler4  1381158  system@ovs-system  0    98      152

It can also dump the packet and NetLink content, and if required,
the packets can also be written to a pcap file.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 00:46:30 +01:00
Eelco Chaudron
ff4c712d45 Documentation: Add USDT documentation and bpftrace example.
Add the USDT documentation and a bpftrace example using the
bridge run USDT probes.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 00:46:30 +01:00
Eelco Chaudron
512fab8f21 openvswitch: Define the OVS_STATIC_TRACE() macro.
This patch defines the OVS_STATIC_TRACE() macro and, as an
example, adds two of these probes to the bridge run loop.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 00:46:30 +01:00
Eelco Chaudron
191013cae9 configure: Add --enable-usdt-probes option to enable USDT probes.
Allow inclusion of User Statically Defined Trace (USDT) probes
in the OVS binaries using the --enable-usdt-probes option to the
./configure script.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 00:46:07 +01:00
Maxime Coquelin
844f141814 dpif-netdev.at: Add test for Tx packet steering.
This patch introduces a new test for Tx packet steering modes.
The first part validates the static mode by checking that all
packets are transmitted on a single queue (single PMD thread);
it then repeats the test with hash-based packet steering enabled,
ensuring packets are transmitted on both queues.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-17 18:07:00 +01:00
Maxime Coquelin
c18e707b2f dpif-netdev: Introduce hash-based Tx packet steering mode.
This patch adds a new hash Tx steering mode that distributes
traffic across all the Tx queues, regardless of the number of
PMD threads. It is useful for guests expecting traffic to be
distributed across all of their vCPUs.

The idea here is to re-use the 5-tuple hash of the packets,
already computed to build the flow batches (so it does not
provide flexibility in which fields are part of the hash).

There is also no user-configurable indirection table, given that
the feature is transparent to the guest. The queue selection is
just a modulo operation between the packet hash and the number
of Tx queues.

There are no (at least intentional) functional changes for the
existing XPS and static modes, and there should be no noticeable
performance change for them (only one more branch in the hot
path).

For the hash mode, performance could be impacted by locking when
multiple PMD threads are in use (as in XPS mode) and by the
second level of batching.

Regarding the batching, the existing Tx port output_pkts is not
modified, meaning that at most NETDEV_MAX_BURST packets can be
batched across all the Tx queues. A second level of batching is
done in dp_netdev_pmd_flush_output_on_port(), only for this hash
mode.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-17 18:07:00 +01:00