mir/ovs - ovs - Mike's Git repositories

mir/ovs

mirror of https://github.com/openvswitch/ovs synced 2025-08-31 06:15:47 +00:00

Author	SHA1	Message	Date
Frode Nordahl	36bad31829	json: Add yielding json create/destroy functions. Creating and destroying JSON objects may be time consuming. Add json_serialized_object_create_with_yield() and json_destroy_with_yield() functions that make use of the cooperative multitasking module to yield during processing, allowing time sensitive tasks in other parts of the program to be completed during processing. We keep these new functions private to OVS by adding a new lib/json.h header file. The include guard in the public include/openvswitch/json.h is updated to contain the OPENVSWITCH prefix to be in line with the other public header files, allowing us to use the non-prefixed version in our private lib/json.h. Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-01-17 14:41:18 +01:00
Frode Nordahl	3c8a4e942d	lib: Introduce cooperative multitasking module. One of the goals of Open vSwitch is to be as resource efficient as possible. Core parts of the program has been implemented as asynchronous state machines, and when absolutely necessary additional threads are used. Introduce cooperative multitasking module which allow us to interleave important processing with long running tasks while avoiding the additional resource consumption of threads and complexity of asynchronous state machines. We will use this module to ensure long running processing in the OVSDB server does not interfere with stable maintenance of the RAFT cluster in subsequent patches. Suggested-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2024-01-17 14:41:18 +01:00
Flavio Leitner	8b5fe2dc60	userspace: Add Generic Segmentation Offloading. This provides a software implementation in the case the egress netdev doesn't support segmentation in hardware. The challenge here is to guarantee packet ordering in the original batch that may be full of TSO packets. Each TSO packet can go up to ~64kB, so with segment size of 1440 that means about 44 packets for each TSO. Each batch has 32 packets, so the total batch amounts to 1408 normal packets. The segmentation estimates the total number of packets and then the total number of batches. Then allocate enough memory and finally do the work. Finally each batch is sent in order to the netdev. Signed-off-by: Flavio Leitner <fbl@sysclose.org> Co-authored-by: Mike Pattrick <mkp@redhat.com> Signed-off-by: Mike Pattrick <mkp@redhat.com> Acked-by: Simon Horman <horms@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-12-02 01:33:37 +01:00
Ilya Maximets	723cd4c9be	automake: Move build-aux EXTRA_DIST updates to their own file. Otherwise it's hard to keep track of all the scripts we have. Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-By: Ihar Hrachyshka <ihar@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-10-31 19:34:44 +01:00
Ilya Maximets	1776aa17a9	sflow: Always enable _BSD_SOURCE. sFlow library is using BSD-style types like u_char that require _BSD_SOURCE to be defined. Also adding _DEFAULT_SOURCE, because _BSD_SOURCE cannot be used without it with glibc > 2.19: error: "_BSD_SOURCE and _SVID_SOURCE are deprecated, use _DEFAULT_SOURCE" Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-08-23 13:46:14 +02:00
Ales Musil	a9ae73b916	ofp, dpif: Allow CT flush based on partial match. Currently, the CT can be flushed by dpctl only by specifying the whole 5-tuple. This is not very convenient when there are only some fields known to the user of CT flush. Add new struct ofp_ct_match which represents the generic filtering that can be done for CT flush. The match is done only on fields that are non-zero with exception to the icmp fields. This allows the filtering just within dpctl, however it is a preparation for OpenFlow extension. Reported-at: https://bugzilla.redhat.com/2120546 Signed-off-by: Ales Musil <amusil@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-01-16 19:55:21 +01:00
Ilya Maximets	1dcc490d44	netdev-afxdp: Allow building with libxdp and newer libbpf. AF_XDP functions was deprecated in libbpf 0.7 and moved to libxdp. Functions bpf_get/set_link_xdp_id() was deprecated in libbpf 0.8 and replaced with bpf_xdp_query_id() and bpf_xdp_attach/detach(). Updating configuration and source code to accommodate above changes and allow building OVS with AF_XDP support on newer systems: - Checking the version of libbpf by detecting availability of bpf_xdp_detach. - Checking availability of the libxdp in a system by looking for a library providing libxdp_strerror(), if libbpf is newer than 0.6. And checking for xsk.h header provided by libxdp-dev[el]. - Use xsk.h from libbpf if it is older than 0.7 and not linking with libxdp in this case as there are known incompatible versions of libxdp in distributions. - Check for the NEED_WAKEUP feature replaced with direct checking in the source code if XDP_USE_NEED_WAKEUP is defined. - Checking availability of bpf_xdp_query_id and bpf_xdp_detach and using them instead of deprecated APIs. Fall back to old functions if not found. - Dropped LIBBPF_LDADD variable as it makes library and function detection much harder without providing any actual benefits. AC_SEARCH_LIBS is used instead and it allows use of AC_CHECK_FUNCS. - Header includes moved around to files where they are actually used. - Removed libelf dependency as it is not really used. With these changes it should be possible to build OVS with either: - libbpf built from the kernel sources (5.19 or older). - libbpf < 0.7 provided in distributions. - libxdp and libbpf >= 0.7 provided in newer distributions. While it is technically possible to build with libbpf 0.7+ without libxdp at the moment we're not allowing that for a few reasons. First, required functions in libbpf are deprecated and can be removed in future releases. Second, support for all these combinations makes the detection code fairly complex. AFAIK, most of the distributions packaging libbpf 0.7+ do package libxdp as well. libxdp added as a build dependency for Fedora build since all supported versions of Fedora are packaging this library. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2023-01-03 16:06:21 +01:00
Dumitru Ceara	5a686267d3	lib: Add support for sets of UUIDs. Part of the uuidset implementation is taken from the OVN codebase where it was added via commit 0e77b3bcbfe2 ("ovn-northd-ddlog: New implementation of ovn-northd based on ddlog."). We now extend that, adding a few helpers and tests. Co-authored-by: Leonid Ryzhyk <lryzhyk@vmware.com> Signed-off-by: Leonid Ryzhyk <lryzhyk@vmware.com> Co-authored-by: Justin Pettit <jpettit@ovn.org> Signed-off-by: Justin Pettit <jpettit@ovn.org> Co-authored-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Dumitru Ceara <dceara@redhat.com> Reviewed-by: Ales Musil <amusil@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-09-27 01:17:56 +02:00
Emma Finn	398f80fffc	odp-execute: Add ISA implementation of pop_vlan action. This commit adds the AVX512 implementation of the pop_vlan action. Signed-off-by: Emma Finn <emma.finn@intel.com> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2022-07-15 11:40:37 +01:00
Emma Finn	1713fc0116	odp-execute: Add command to switch action implementation. This commit adds a new command to allow the user to switch the active action implementation at runtime. Usage: $ ovs-appctl odp-execute/action-impl-set scalar This commit also adds a new command to retrieve the list of available action implementations. This can be used by to check what implementations of actions are available and what implementation is active during runtime. Usage: $ ovs-appctl odp-execute/action-impl-show Added separate test-case for ovs-actions show/set commands: odp-execute - actions implementation Signed-off-by: Emma Finn <emma.finn@intel.com> Signed-off-by: Kumar Amber <kumar.amber@intel.com> Signed-off-by: Sunil Pai G <sunil.pai.g@intel.com> Co-authored-by: Kumar Amber <kumar.amber@intel.com> Co-authored-by: Sunil Pai G <sunil.pai.g@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2022-07-15 11:39:20 +01:00
Emma Finn	95e4a35b0a	odp-execute: Add function pointers to odp-execute for different action implementations. This commit introduces the initial infrastructure required to allow different implementations for OvS actions. The patch introduces action function pointers which allows user to switch between different action implementations available. This will allow for more performance and flexibility so the user can choose the action implementation to best suite their use case. Signed-off-by: Emma Finn <emma.finn@intel.com> Acked-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2022-07-15 11:38:16 +01:00
Kumar Amber	8cab30a9d2	dpif-netdev/mfex: Add AVX512 ipv6 traffic profiles. Add AVX512 Ipv6 optimized profile for vlan/IPv6/UDP and vlan/IPv6/TCP, IPv6/UDP and IPv6/TCP. MFEX autovalidaton test-case already has the IPv6 support for validating against the scalar mfex. Signed-off-by: Kumar Amber <kumar.amber@intel.com> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Cian Ferriter <cian.ferriter@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2022-07-05 15:41:27 +01:00
David Marchand	fe171e4f10	dpif-netdev: Refactor AVX512 runtime checks. As described in the bugzilla below, cpu_has_isa code may be compiled with some AVX512 instructions in it, because cpu.c is built as part of the libopenvswitchavx512. This is a problem when this function (supposed to probe for AVX512 instructions availability) is invoked from generic OVS code, on older CPUs that don't support them. For the same reason, dpcls_subtable_avx512_gather_probe, dp_netdev_input_outer_avx512_probe, mfex_avx512_probe and mfex_avx512_vbmi_probe are potential runtime bombs and can't either be built as part of libopenvswitchavx512. Move cpu.c to be part of the "normal" libopenvswitch. And move other helpers in generic OVS code. Note: - dpcls_subtable_avx512_gather_probe is split in two, because it also needs to do its own magic, - while moving those helpers, prefer direct calls to cpu_has_isa and avoid cast to intermediate integer variables when a simple boolean is enough, Fixes: `352b6c7116` ("dpif-lookup: add avx512 gather implementation.") Fixes: `abb807e27d` ("dpif-netdev: Add command to switch dpif implementation.") Fixes: `250ceddcc2` ("dpif-netdev/mfex: Add AVX512 based optimized miniflow extract") Fixes: `b366fa2f49` ("dpif-netdev: Call cpuid for x86 isa availability.") Reported-at: https://bugzilla.redhat.com/2100393 Reported-by: Ales Musil <amusil@redhat.com> Co-authored-by: Ales Musil <amusil@redhat.com> Signed-off-by: Ales Musil <amusil@redhat.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Acked-by: Ales Musil <amusil@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-06-29 11:27:09 +02:00
Cian Ferriter	cb1c640077	acinclude: Add seperate checks for AVX512 ISA. Checking for each of the required AVX512 ISA separately will allow the compiler to generate some AVX512 code where there is some support in the compiler rather than only generating all AVX512 code when all of it is supported or no AVX512 code at all. For example, in GCC 4.9 where there is just support for AVX512F, this patch will allow building the AVX512 DPIF. Another example, in GCC 5 and 6, most AVX512 code can be generated, just without AVX512VPOPCNTDQ support. Signed-off-by: Cian Ferriter <cian.ferriter@intel.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-05-30 23:12:51 +02:00
Cian Ferriter	fb85ae4340	automake.mk: Remove -mavx512dq CFLAG from AVX512 library. No instructions from the AVX512DQ ISA are used anywhere in OVS. Remove this unnecessary CFLAG. Signed-off-by: Cian Ferriter <cian.ferriter@intel.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-05-30 23:12:51 +02:00
Gaetan Rivet	2eac33c6cc	id-fpool: Module for fast ID generation. The current id-pool module is slow to allocate the next valid ID, and can be optimized when restricting some properties of the pool. Those restrictions are: * No ability to add a random ID to the pool. * A new ID is no more the smallest possible ID. It is however guaranteed to be in the range of [floor, last_alloc + nb_user * cache_size + 1]. where 'cache_size' is the number of ID in each per-user cache. It is defined as 'ID_FPOOL_CACHE_SIZE' to 64. * A user should never free an ID that is not allocated. No checks are done and doing so will duplicate the spurious ID. Refcounting or other memory management scheme should be used to ensure an object and its ID are only freed once. This allocator is designed to scale reasonably well in multithread setup. As it is aimed at being a faster replacement to the current id-pool, a benchmark has been implemented alongside unit tests. The benchmark is composed of 4 rounds: 'new', 'del', 'mix', and 'rnd'. Respectively + 'new': only allocate IDs + 'del': only free IDs + 'mix': allocate, sequential free, then allocate ID. + 'rnd': allocate, random free, allocate ID. Randomized freeing is done by swapping the latest allocated ID with any from the range of currently allocated ID, which is reminiscent of the Fisher-Yates shuffle. This evaluates freeing non-sequential IDs, which is the more natural use-case. For this specific round, the id-pool performance is such that a timeout of 10 seconds is added to the benchmark: $ ./tests/ovstest test-id-fpool benchmark 10000 1 Benchmarking n=10000 on 1 thread. type\thread: 1 Avg id-fpool new: 1 1 ms id-fpool del: 1 1 ms id-fpool mix: 2 2 ms id-fpool rnd: 2 2 ms id-pool new: 4 4 ms id-pool del: 2 2 ms id-pool mix: 6 6 ms id-pool rnd: 431 431 ms $ ./tests/ovstest test-id-fpool benchmark 100000 1 Benchmarking n=100000 on 1 thread. type\thread: 1 Avg id-fpool new: 2 2 ms id-fpool del: 2 2 ms id-fpool mix: 3 3 ms id-fpool rnd: 4 4 ms id-pool new: 12 12 ms id-pool del: 5 5 ms id-pool mix: 16 16 ms id-pool rnd: 10000+ -1 ms $ ./tests/ovstest test-id-fpool benchmark 1000000 1 Benchmarking n=1000000 on 1 thread. type\thread: 1 Avg id-fpool new: 15 15 ms id-fpool del: 12 12 ms id-fpool mix: 34 34 ms id-fpool rnd: 48 48 ms id-pool new: 276 276 ms id-pool del: 286 286 ms id-pool mix: 448 448 ms id-pool rnd: 10000+ -1 ms Running only a performance test on the fast pool: $ ./tests/ovstest test-id-fpool perf 1000000 1 Benchmarking n=1000000 on 1 thread. type\thread: 1 Avg id-fpool new: 15 15 ms id-fpool del: 12 12 ms id-fpool mix: 34 34 ms id-fpool rnd: 47 47 ms $ ./tests/ovstest test-id-fpool perf 1000000 2 Benchmarking n=1000000 on 2 threads. type\thread: 1 2 Avg id-fpool new: 11 11 11 ms id-fpool del: 10 10 10 ms id-fpool mix: 24 24 24 ms id-fpool rnd: 30 30 30 ms $ ./tests/ovstest test-id-fpool perf 1000000 4 Benchmarking n=1000000 on 4 threads. type\thread: 1 2 3 4 Avg id-fpool new: 9 11 11 10 10 ms id-fpool del: 5 6 6 5 5 ms id-fpool mix: 16 16 16 16 16 ms id-fpool rnd: 20 20 20 20 20 ms Signed-off-by: Gaetan Rivet <grive@u256.net> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-01-18 19:30:17 +01:00
Gaetan Rivet	5396ba5b21	mpsc-queue: Module for lock-free message passing. Add a lockless multi-producer/single-consumer (MPSC), linked-list based, intrusive, unbounded queue that does not require deferred memory management. The queue is designed to improve the specific MPSC setup. A benchmark accompanies the unit tests to measure the difference in this configuration. A single reader thread polls the queue while N writers enqueue elements as fast as possible. The mpsc-queue is compared against the regular ovs-list as well as the guarded list. The latter usually offers a slight improvement by batching the element removal, however the mpsc-queue is faster. The average is of each producer threads time: $ ./tests/ovstest test-mpsc-queue benchmark 3000000 1 Benchmarking n=3000000 on 1 + 1 threads. type\thread: Reader 1 Avg mpsc-queue: 167 167 167 ms list(spin): 89 80 80 ms list(mutex): 745 745 745 ms guarded list: 788 788 788 ms $ ./tests/ovstest test-mpsc-queue benchmark 3000000 2 Benchmarking n=3000000 on 1 + 2 threads. type\thread: Reader 1 2 Avg mpsc-queue: 98 97 94 95 ms list(spin): 185 171 173 172 ms list(mutex): 203 199 203 201 ms guarded list: 269 269 188 228 ms $ ./tests/ovstest test-mpsc-queue benchmark 3000000 3 Benchmarking n=3000000 on 1 + 3 threads. type\thread: Reader 1 2 3 Avg mpsc-queue: 76 76 65 76 72 ms list(spin): 246 110 240 238 196 ms list(mutex): 542 541 541 539 540 ms guarded list: 535 535 507 511 517 ms $ ./tests/ovstest test-mpsc-queue benchmark 3000000 4 Benchmarking n=3000000 on 1 + 4 threads. type\thread: Reader 1 2 3 4 Avg mpsc-queue: 73 68 68 68 68 68 ms list(spin): 294 275 279 277 282 278 ms list(mutex): 346 309 287 345 302 310 ms guarded list: 378 319 334 378 351 345 ms Signed-off-by: Gaetan Rivet <grive@u256.net> Reviewed-by: Eli Britstein <elibr@nvidia.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-01-18 19:30:17 +01:00
Gaetan Rivet	9ac3d951b4	mov-avg: Add a moving average helper structure. Add a new module offering a helper to compute the Cumulative Moving Average (CMA) and the Exponential Moving Average (EMA) of a series of values. Use the new helpers to add latency metrics in dpif-netdev. Signed-off-by: Gaetan Rivet <grive@u256.net> Reviewed-by: Eli Britstein <elibr@nvidia.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-01-18 15:12:01 +01:00
David Marchand	b366fa2f49	dpif-netdev: Call cpuid for x86 isa availability. DPIF AVX512 optimizations currently rely on DPDK availability while they can be used without DPDK. Besides, checking for availability of some isa only has to be done once and won't change while a OVS process runs. Resolve isa availability in constructors by using a simplified query based on cpuid API that comes from the compiler. Note: this also fixes the check on BMI2 availability: DPDK had a bug for this isa, see https://git.dpdk.org/dpdk/commit/?id=aae3037ab1e0. Suggested-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2022-01-03 18:45:40 +01:00
Ilya Maximets	c2fb5bdae6	ovs-actions: Convert man page from xml to rST. This way it's easier to show it on a website as it will be updated automatically along with the rest of the documentation. Sphinx doesn't render everything perfectly, but it looks good enough in both man and html versions. rST is a bit easier to read and it takes less space. Conversion performed manually since I didn't found any good tool that can actually make the process any faster. Along the way I replaced versions like x.y.90 with x.y+1, because it doesn't seem correct to me to refer non-released versions of OVS in the docs. Fixed a couple of small mistakes like duplicated paragraph and reference to a different section by incorrect name. Also removed bits of xml->nroff conversion code that is not needed anymore. Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Roi Dayan <roid@nvidia.com>	2021-08-31 20:56:47 +02:00
Mark Gray	b1e517bd2f	dpif-netlink: Introduce per-cpu upcall dispatch. The Open vSwitch kernel module uses the upcall mechanism to send packets from kernel space to user space when it misses in the kernel space flow table. The upcall sends packets via a Netlink socket. Currently, a Netlink socket is created for every vport. In this way, there is a 1:1 mapping between a vport and a Netlink socket. When a packet is received by a vport, if it needs to be sent to user space, it is sent via the corresponding Netlink socket. This mechanism, with various iterations of the corresponding user space code, has seen some limitations and issues: * On systems with a large number of vports, there is correspondingly a large number of Netlink sockets which can limit scaling. (https://bugzilla.redhat.com/show_bug.cgi?id=1526306) * Packet reordering on upcalls. (https://bugzilla.redhat.com/show_bug.cgi?id=1844576) * A thundering herd issue. (https://bugzilla.redhat.com/show_bug.cgi?id=1834444) This patch introduces an alternative, feature-negotiated, upcall mode using a per-cpu dispatch rather than a per-vport dispatch. In this mode, the Netlink socket to be used for the upcall is selected based on the CPU of the thread that is executing the upcall. In this way, it resolves the issues above as: a) The number of Netlink sockets scales with the number of CPUs rather than the number of vports. b) Ordering per-flow is maintained as packets are distributed to CPUs based on mechanisms such as RSS and flows are distributed to a single user space thread. c) Packets from a flow can only wake up one user space thread. Reported-at: https://bugzilla.redhat.com/1844576 Signed-off-by: Mark Gray <mark.d.gray@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-16 20:05:03 +02:00
Harry van Haaren	250ceddcc2	dpif-netdev/mfex: Add AVX512 based optimized miniflow extract This commit adds AVX512 implementations of miniflow extract. By using the 64 bytes available in an AVX512 register, it is possible to convert a packet to a miniflow data-structure in a small quantity instructions. The implementation here probes for Ether()/IP()/UDP() traffic, and builds the appropriate miniflow data-structure for packets that match the probe. The implementation here is auto-validated by the miniflow extract autovalidator, hence its correctness can be easily tested and verified. Note that this commit is designed to easily allow addition of new traffic profiles in a scalable way, without code duplication for each traffic profile. Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 11:31:42 +01:00
Kumar Amber	72dd22a0df	dpif-netdev: Add study function to select the best mfex function The study function runs all the available implementations of miniflow_extract and makes a choice whose hitmask has maximum hits and sets the mfex to that function. Study can be run at runtime using the following command: $ ovs-appctl dpif-netdev/miniflow-parser-set study Signed-off-by: Kumar Amber <kumar.amber@intel.com> Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 10:29:52 +01:00
Kumar Amber	3d8f47bc04	dpif-netdev: Add command line and function pointer for miniflow extract This patch introduces the MFEX function pointers which allows the user to switch between different miniflow extract implementations which are provided by the OVS based on optimized ISA CPU. The user can query for the available minflow extract variants available for that CPU by following commands: $ovs-appctl dpif-netdev/miniflow-parser-get Similarly an user can set the miniflow implementation by the following command : $ ovs-appctl dpif-netdev/miniflow-parser-set name This allows for more performance and flexibility to the user to choose the miniflow implementation according to the needs. Signed-off-by: Kumar Amber <kumar.amber@intel.com> Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 09:56:58 +01:00
Harry van Haaren	abb807e27d	dpif-netdev: Add command to switch dpif implementation. This commit adds a new command to allow the user to switch the active DPIF implementation at runtime. A probe function is executed before switching the DPIF implementation, to ensure the CPU is capable of running the ISA required. For example, the below code will switch to the AVX512 enabled DPIF assuming that the runtime CPU is capable of running AVX512 instructions: $ ovs-appctl dpif-netdev/dpif-impl-set dpif_avx512 A new configuration flag is added to allow selection of the default DPIF. This is useful for running the unit-tests against the available DPIF implementations, without modifying each unit test. The design of the testing & validation for ISA optimized DPIF implementations is based around the work already upstream for DPCLS. Note however that a DPCLS lookup has no state or side-effects, allowing the auto-validator implementation to perform multiple lookups and provide consistent statistic counters. The DPIF component does have state, so running two implementations in parallel and comparing output is not a valid testing method, as there are changes in DPIF statistic counters (side effects). As a result, the DPIF is tested directly against the unit-tests. Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Co-authored-by: Cian Ferriter <cian.ferriter@intel.com> Signed-off-by: Cian Ferriter <cian.ferriter@intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-09 17:13:24 +01:00
Harry van Haaren	9ac84a1a36	dpif-avx512: Add ISA implementation of dpif. This commit adds the AVX512 implementation of DPIF functionality, specifically the dp_netdev_input_outer_avx512 function. This function only handles outer (no re-circulations), and is optimized to use the AVX512 ISA for packet batching and other DPIF work. Sparse is not able to handle the AVX512 intrinsics, causing compile time failures, so it is disabled for this file. Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Co-authored-by: Cian Ferriter <cian.ferriter@intel.com> Signed-off-by: Cian Ferriter <cian.ferriter@intel.com> Co-authored-by: Kumar Amber <kumar.amber@intel.com> Signed-off-by: Kumar Amber <kumar.amber@intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-09 17:13:12 +01:00
Harry van Haaren	5930dfeeb2	dpif-netdev: Refactor to multiple header files. Split the very large file dpif-netdev.c and the datastructures it contains into multiple header files. Each header file is responsible for the datastructures of that component. This logical split allows better reuse and modularity of the code, and reduces the very large file dpif-netdev.c to be more managable. Due to dependencies between components, it is not possible to move component in smaller granularities than this patch. To explain the dependencies better, eg: DPCLS has no deps (from dpif-netdev.c file) FLOW depends on DPCLS (struct dpcls_rule) DFC depends on DPCLS (netdev_flow_key) and FLOW (netdev_flow_key) THREAD depends on DFC (struct dfc_cache) DFC_PROC depends on THREAD (struct pmd_thread) DPCLS lookup.h/c require only DPCLS DPCLS implementations require only dpif-netdev-lookup.h. - This change was made in 2.12 release with function pointers - This commit only refactors the name to "private-dpcls.h" netdev_flow_key_equal_mf() is renamed to emc_flow_key_equal_mf(). Rename functions specific to dpcls from netdev_* namespace to the dpcls_* namespace, as they are only used by dpcls code. 'inline' is added to the dp_netdev_flow_hash() when it is moved definition to fix a compiler error. One valid checkpatch issue with the use of the EMC_FOR_EACH_POS_WITH_HASH() macro was fixed. Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Co-authored-by: Cian Ferriter <cian.ferriter@intel.com> Signed-off-by: Cian Ferriter <cian.ferriter@intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-09 17:12:45 +01:00
Jaime Caamaño Ruiz	0c09952382	stream-ssl: Remove unsafe 1024 bit dh params Using 1024 bit params for DH is considered unsafe [1]. Additionally, from [2]: "Modern servers that do not support export ciphersuites are advised to either use SSL_CTX_set_tmp_dh() or alternatively, use the callback but ignore keylength and is_export and simply supply at least 2048-bit parameters in the callback." Additionally, using 1024 bit dh params may block clients running on recent openssl version from connecting given the stricter default security requirements of those new openssl versions. The error message for these clients looks like: error:141A318A:SSL routines:tls_process_ske_dhe:dh key too small:ssl/statem/statem_clnt.c:2150 As a workaround, this error can be suppressed tweaking the cipher list (--ssl-ciphers) to either 'HIGH:!aNULL:!MD5:@SECLEVEL=1' to reduce security requirements or 'HIGH:!aNULL:!MD5:!DH' to avoid using fixed param DH based ciphers. The first option is recommended though as it likely a fixed param DH cipher is the best possible option in that situation. [1] https://weakdh.org/ [2] https://www.openssl.org/docs/man1.1.1/man3/SSL_CTX_set_tmp_dh_callback.html Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2021-07-07 13:25:32 -07:00
Ilya Maximets	fae1ae0434	stream: Add record/replay functionality. For debugging purposes it is useful to be able to record all the incoming transactions and commands and replay them locally under debugger or with additional logging enabled. This patch introduces ability to record all the incoming stream data and replay it via new stream provider named 'stream-replay'. During the record phase all the incoming stream data written to special replay_* files in the application rundir. On replay phase instead of opening real streams application will open replay_* files and read all the incoming data directly from them. If enabled for ovsdb-server, for example, this allows to record all the connections and transactions from the big setup and replay them locally afterwards to debug the behaviour or test performance. To start application in recording mode there is a --record cmdline option. --replay is to replay previously recorded streams. Current version doesn't work well with time-based stream events like inactivity probes or any other events generated internally. This is a point for further improvement. Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Dumitru Ceara <dceara@redhat.com>	2021-06-07 21:03:16 +02:00
Ilya Maximets	610ac1e82c	ovs-replay: New library to create and manage replay files. This library provides interfaces to open replay files and read/write records. Will be used later for stream record/replay functionality, i.e. to record all the incoming connections and data and replay it later for debugging and performance analysis purposes. Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Dumitru Ceara <dceara@redhat.com>	2021-06-07 20:59:34 +02:00
Aidan Shribman	23f9ec9eb0	make: don't prompt during build When running repeated builds using `make build` you get prompts in cases the `mv` command is about to overwrite a file which is write-protect. This patch forced the `mv` w/o prompting for approval. Signed-off-by: Aidan Shribman <aidan.shribman@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2021-04-08 08:57:37 -07:00
Ben Pfaff	a5c067a8b9	ovsdb-cs: New module that factors out code from ovsdb-idl. This new module has a single direct user now. In the future, it will also be used by OVN. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ilya Maximets <i.maximets@ovn.org>	2021-01-21 15:09:05 -08:00
Mark Gray	d409f50062	python: Update build system to ensure dirs.py is created. Update build system to ensure dirs.py is created when it is a dependency for a build target. Also, update setup.py to check for that dependency. Fixes: `943c4a3250` ("python: set ovs.dirs variables with build system values") Signed-off-by: Mark Gray <mark.d.gray@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2020-11-26 12:18:57 +01:00
Mark Gray	943c4a3250	python: set ovs.dirs variables with build system values ovs/dirs.py should be auto-generated using the template ovs/dirs.py.template at build time. This will set the ovs.dirs python variables with a value specified by the environment or, if the environment variable is not set, from the build system. Signed-off-by: Mark Gray <mark.d.gray@redhat.com> Acked-By: Timothy Redaelli <tredaelli@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2020-11-16 15:47:43 +00:00
Harry van Haaren	ba5e311782	dpif-netdev/avx512: add -fPIC flag to enable shared builds In certain scenarios with OVS built with --enable-shared and DPDK enabled as shared build too, Position Independant Code is required to link the avx512.a file into the relocatable .so that it must be linked into. Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2020-08-05 15:57:34 +01:00
Harry van Haaren	4ed57c5028	dpif-netdev/avx512: avoid compiling avx512 code if binutils check fails This commit avoids compiling and linking of avx512 code into the vswitch_la library if the binutils check fails. This avoids compiling code into OVS that will not be executed due to binutils issue. Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2020-08-05 15:57:24 +01:00
David Marchand	9af9dbcec9	dpdk: Add commands to configure log levels. Enabling debug logs in dpdk can be a challenge to be sure of what is actually enabled, add commands to list and change those log levels. However, these commands do not help when tracking issues in dpdk init itself: dump log levels right after init. Example: $ ovs-appctl dpdk/log-list global log level is debug id 0: lib.eal, level is info id 1: lib.malloc, level is info id 2: lib.ring, level is info id 3: lib.mempool, level is info id 4: lib.timer, level is info id 5: pmd, level is info [...] id 37: pmd.net.bnxt.driver, level is notice id 38: pmd.net.e1000.init, level is notice id 39: pmd.net.e1000.driver, level is notice id 40: pmd.net.enic, level is info [...] $ ovs-appctl dpdk/log-set debug pmd.*:notice $ ovs-appctl dpdk/log-list global log level is debug id 0: lib.eal, level is debug id 1: lib.malloc, level is debug id 2: lib.ring, level is debug id 3: lib.mempool, level is debug id 4: lib.timer, level is debug id 5: pmd, level is debug [...] id 37: pmd.net.bnxt.driver, level is notice id 38: pmd.net.e1000.init, level is notice id 39: pmd.net.e1000.driver, level is notice id 40: pmd.net.enic, level is notice [...] Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2020-07-17 01:56:56 +02:00
Harry van Haaren	352b6c7116	dpif-lookup: add avx512 gather implementation. This commit adds an AVX-512 dpcls lookup implementation. It uses the AVX-512 SIMD ISA to perform multiple miniflow operations in parallel. To run this implementation, the "avx512f" and "bmi2" ISAs are required. These ISA checks are performed at runtime while probing the subtable implementation. If a CPU does not provide both "avx512f" and "bmi2", then this code does not execute. The avx512 code is built as a separate static library, with added CFLAGS to enable the required ISA features. By building only this static library with avx512 enabled, it is ensured that the main OVS core library is not using avx512, and that OVS continues to run as before on CPUs that do not support avx512. The approach taken in this implementation is to use the gather instruction to access the packet miniflow, allowing any miniflow blocks to be loaded into an AVX-512 register. This maximizes the usefulness of the register, and hence this implementation handles any subtable with up to miniflow 8 bits. Note that specialization of these avx512 lookup routines still provides performance value, as the hashing of the resulting data is performed in scalar code, and compile-time loop unrolling occurs when specialized to miniflow bits. This commit checks at configure time if the assembling in use has a known bug in assembling AVX512 code. If this bug is present, all AVX512 code is disabled. Checking the version string of the binutils or assembler is not a good method to detect the issue, as back ported fixes would not be reflected. Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2020-07-13 14:55:25 +01:00
Harry van Haaren	e90e115a01	dpif-netdev: implement subtable lookup validation. This commit refactors the existing dpif subtable function pointer infrastructure, and implements an autovalidator component. The refactoring of the existing dpcls subtable lookup function handling, making it more generic, and cleaning up how to enable more implementations in future. In order to ensure all implementations provide identical results, the autovalidator is added. The autovalidator itself implements the subtable lookup function prototype, but internally iterates over all other available implementations. The end result is that testing of each implementation becomes automatic, when the auto- validator implementation is selected. Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2020-07-13 14:54:08 +01:00
William Tu	2078901a4c	userspace: Add conntrack timeout policy support. Commit `1f16131837` ("ct-dpif, dpif-netlink: Add conntrack timeout policy support") adds conntrack timeout policy for kernel datapath. This patch enables support for the userspace datapath. I tested using the 'make check-system-userspace' which checks the timeout policies for ICMP and UDP cases. Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>	2020-05-01 08:22:45 -07:00
Flavio Leitner	29cf9c1b3b	userspace: Add TCP Segmentation Offload support Abbreviated as TSO, TCP Segmentation Offload is a feature which enables the network stack to delegate the TCP segmentation to the NIC reducing the per packet CPU overhead. A guest using vhostuser interface with TSO enabled can send TCP packets much bigger than the MTU, which saves CPU cycles normally used to break the packets down to MTU size and to calculate checksums. It also saves CPU cycles used to parse multiple packets/headers during the packet processing inside virtual switch. If the destination of the packet is another guest in the same host, then the same big packet can be sent through a vhostuser interface skipping the segmentation completely. However, if the destination is not local, the NIC hardware is instructed to do the TCP segmentation and checksum calculation. It is recommended to check if NIC hardware supports TSO before enabling the feature, which is off by default. For additional information please check the tso.rst document. Signed-off-by: Flavio Leitner <fbl@sysclose.org> Tested-by: Ciara Loftus <ciara.loftus.intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2020-01-17 22:27:25 +00:00
Ilya Maximets	db54e96720	bridge: Allow manual notifications about interfaces' updates. Sometimes interface updates could happen in a way ifnotifier is not able to catch. For example some heavy operations (device reset) in netdev-dpdk could require re-applying of the bridge configuration. For this purpose new manual notifier introduced. Its function 'if_notifier_manual_report()' could be called directly by the code that aware about changes. This new notifier is thread-safe. Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Eelco Chaudron <echaudro@redhat.com>	2019-12-18 01:23:50 +01:00
William Tu	0de1b42596	netdev-afxdp: add new netdev type for AF_XDP. The patch introduces experimental AF_XDP support for OVS netdev. AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket type built upon the eBPF and XDP technology. It is aims to have comparable performance to DPDK but cooperate better with existing kernel's networking stack. An AF_XDP socket receives and sends packets from an eBPF/XDP program attached to the netdev, by-passing a couple of Linux kernel's subsystems As a result, AF_XDP socket shows much better performance than AF_PACKET For more details about AF_XDP, please see linux kernel's Documentation/networking/af_xdp.rst. Note that by default, this feature is not compiled in. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>	2019-07-19 17:42:06 +03:00
Harry van Haaren	92c7c870d6	dpif-netdev: Split out generic lookup function This commit splits the generic hash-lookup-verify function to its own file, for cleaner seperation between optimized versions. Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Tested-by: Malvika Gupta <malvika.gupta@arm.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2019-07-19 12:22:01 +01:00
Harry van Haaren	f5ace7cd8a	dpif-netdev: Move dpcls lookup structures to .h This commit moves some data-structures to be available in the dpif-netdev-private.h header. This allows specific implementations of the subtable lookup function to include just that header file, and not require that the code exists in dpif-netdev.c Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Tested-by: Malvika Gupta <malvika.gupta@arm.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2019-07-19 12:21:37 +01:00
Ilya Maximets	4f746d526d	netdev-offload: Rename offload providers. Flow API providers renamed to be consistent with parent module 'netdev-offload' and look more like each other. '_rte_' replaced with more convenient '_dpdk_'. We'll have following structure: Common code: lib/netdev-offload-provider.h lib/netdev-offload.c lib/netdev-offload.h Providers: lib/netdev-offload-tc.c lib/netdev-offload-dpdk.c 'netdev-offload-dummy' still resides inside netdev-dummy, but it makes no much sence to move it out of there. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ben Pfaff <blp@ovn.org> Acked-by: Roi Dayan <roid@mellanox.com>	2019-06-11 09:39:36 +03:00
Ilya Maximets	b6cabb8f8f	netdev: Split up netdev offloading to separate module. New module 'netdev-offload' created to manage different flow API implementations. All the generic and provider independent code moved there from the 'netdev' module. Flow API providers further encapsulated. The only function that was changed is 'netdev_any_oor'. Now it uses offloading related hmap instead of common 'netdev_shash'. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ben Pfaff <blp@ovn.org> Acked-by: Roi Dayan <roid@mellanox.com>	2019-06-11 09:39:36 +03:00
Ilya Maximets	5fc5c50f3d	netdev: Dynamic per-port Flow API. Current issues with Flow API: * OVS calls offloading functions regardless of successful flow API initialization. (ex. on init_flow_api failure) * Static initilaization of Flow API for a netdev_class forbids having different offloading types for different instances of netdev with the same netdev_class. (ex. different vports in 'system' and 'netdev' datapaths at the same time) Solution: * Move Flow API from the netdev_class to netdev instance. * Make Flow API dynamic, i.e. probe the APIs and choose the suitable one. Side effects: * Flow API providers localized as possible in their modules. * Now we have an ability to make runtime checks. For example, we could check if particular device supports features we need, like if dpdk device supports RSS+MARK action. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Roi Dayan <roid@mellanox.com>	2019-06-11 09:39:36 +03:00
Justin Pettit	e6fc4e2ce9	Remove ESX references. Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2019-06-01 10:12:02 -07:00
Roni Bar Yanai	3d67b2d28e	netdev-dpdk: Move offloading code to a new file Hardware offloading code is moved to a new file called netdev-rte-offloads.c. The original offloading code is copied from the netdev-dpdk.c file to the new file, where future offloading code should be added as well. The copied code was refactored based on coding style. The netdev-dpdk.c file will remain unchanged as new offloading code is added. Co-authored-by: Ophir Munk <ophirmu@mellanox.com> Reviewed-by: Asaf Penso <asafp@mellanox.com> Signed-off-by: Roni Bar Yanai <roniba@mellanox.com> Signed-off-by: Ophir Munk <ophirmu@mellanox.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2019-03-19 14:33:27 +00:00

1 2 3 4 5 ...

360 Commits