mir/ovs - ovs - Mike's Git repositories

mirror of https://github.com/openvswitch/ovs synced 2025-08-30 05:47:55 +00:00

Author	SHA1	Message	Date
Ilya Maximets	43b7d960af	netdev-dummy: Silence the 'may be uninitialized' warning. GCC 11 with -O1 on Fedora 34 emits a false-positive warning like this: lib/netdev-dummy.c: In function ‘dummy_packet_stream_run’: lib/netdev-dummy.c:284:16: error: ‘n’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 284 \| if (retval == n && dp_packet_size(&s->rxbuf) > 2) { \| ^ This breaks the build with --enable-Werror. Initializing 'n' to avoid the warning. Acked-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-23 15:40:08 +02:00
Ben Pfaff	f05d6d623e	ofproto-dpif-xlate: Fix continuations with OF instructions in OF1.1+. Open vSwitch supports OpenFlow "instructions", which were introduced in OpenFlow 1.1 and act like restricted kinds of actions that can only appear in a particular order and particular circumstances. OVS did not support two of these instructions, "write_metadata" and "goto_table", properly in the case where they appeared in a flow that needed to be frozen for continuations. Both of these instructions had the problem that they couldn't be properly serialized into the stream of actions, because they're not actions. This commit fixes that problem in freeze_unroll_actions() by converting them into equivalent actions for serialization. goto_table had the additional problem that it was being serialized to the frozen stream even after it had been executed. This was already properly handled in do_xlate_actions() for resubmit, which is almost equivalent to goto_table, so this commit applies the same fix to goto_table. (The commit removes an assertion from the goto_table implementation, but there wasn't any real value in that assertion and I thought the code looked cleaner without it.) This commit adds tests that would have found these bugs. This includes adding a variant of each continuation test that uses OF1.3 for monitor/resume (which is necessary to trigger these bugs) plus specific tests for continuations with goto_table and write_metadata. It also improves the continuation test infrastructure to add more detail on the problem if a test fails. Signed-off-by: Ben Pfaff <blp@ovn.org> Reported-by: Grayson Wu <wgrayson@vmware.com> Reported-at: https://github.com/openvswitch/ovs-issues/issues/213 Discussed-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/386166.html Acked-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-22 12:26:20 -07:00
Wilson Peng	8e808e7f14	datapath-windows:Correct checksum for DNAT action While testing OVS-windows flows for the DNAT action, the checksum in the TCP header is set incorrectly when TCP offload is enabled by default. As a result, the packet will be dropped on the receiver Linux VM. >>>sample flow default configuration on both Windows VM and Linux VM (src=40.0.1.2,dst=10.150.0.1) --dnat--> (src=40.0.1.2,dst=30.1.0.2) Without the fix, for some TCP packets (40.0.1.2->30.1.0.2 with payload len 207) the TCP checksum will be the pseudo header checksum and the value is 0x01d6. With the fix the checksum will be 0x47ee, and the correct TCP checksum is seen on the receiver Linux VM. Signed-off-by: Wilson Peng<pweisong@vmware.com> Signed-off-by: Anand Kumar<kumaranand@vmware.com> Acked-by: Alin-Gabriel Serdean <aserdean@ovn.org> Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>	2021-07-21 12:59:06 +03:00
David Marchand	9547987526	Documentation: Remove duplicate words. This is a simple cleanup with a script of mine. Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2021-07-19 09:33:01 -07:00
Ilya Maximets	4703bc67b7	Prepare for post-2.16.0 (2.16.90). Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-16 21:52:39 +02:00
Ilya Maximets	45bd6d93f1	Prepare for 2.16.0. Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-16 21:52:23 +02:00
Ilya Maximets	298d4151f4	bond: Fix broken rebalancing after link state changes. There are 3 constraints for moving hashes from one member to another: 1. The load difference exceeds ~ 3% of the load of one member. 2. The difference in load between members exceeds 100,000 bytes. 3. Moving the hash reduces the load difference by more than 10%. In the current implementation, if one of the members transitions to the DOWN state, all hashes assigned to it will be moved to the other members. After that, if the member goes UP, it will wait for rebalancing to get hashes. But in case we have more than 10 equally loaded hashes, it will never meet constraint # 3, because each hash will handle less than 10% of the load. The situation gets worse when the number of flows grows and it is almost impossible to transfer any hash when all 256 hash records are used, which is very likely when we have few hundred/thousand flows. As a result, if one of the members goes down and back up while traffic flows, it will never be used to transmit packets again. This will not be fixed even if we completely stop the traffic and start it again, because the first two constraints will block rebalancing in the earlier stages, while we have low traffic volume. Moving a single hash if the destination does not have any hashes, as it was before commit c460a6a7bc75 ("ofproto/bond: simplifying the rebalancing logic"), will not help, because a single hash is not enough to make the difference in load less than 10% of the total load, and this member will handle only that one hash forever. To fix this, let's try to move multiple hashes at the same time to meet constraint # 3. The implementation includes sorting the "records" to be able to collect records with a cumulative load close enough to the ideal value. Acked-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-16 21:52:23 +02:00
Mark Gray	b1e517bd2f	dpif-netlink: Introduce per-cpu upcall dispatch. The Open vSwitch kernel module uses the upcall mechanism to send packets from kernel space to user space when it misses in the kernel space flow table. The upcall sends packets via a Netlink socket. Currently, a Netlink socket is created for every vport. In this way, there is a 1:1 mapping between a vport and a Netlink socket. When a packet is received by a vport, if it needs to be sent to user space, it is sent via the corresponding Netlink socket. This mechanism, with various iterations of the corresponding user space code, has seen some limitations and issues: * On systems with a large number of vports, there is correspondingly a large number of Netlink sockets which can limit scaling. (https://bugzilla.redhat.com/show_bug.cgi?id=1526306) * Packet reordering on upcalls. (https://bugzilla.redhat.com/show_bug.cgi?id=1844576) * A thundering herd issue. (https://bugzilla.redhat.com/show_bug.cgi?id=1834444) This patch introduces an alternative, feature-negotiated, upcall mode using a per-cpu dispatch rather than a per-vport dispatch. In this mode, the Netlink socket to be used for the upcall is selected based on the CPU of the thread that is executing the upcall. In this way, it resolves the issues above as: a) The number of Netlink sockets scales with the number of CPUs rather than the number of vports. b) Ordering per-flow is maintained as packets are distributed to CPUs based on mechanisms such as RSS and flows are distributed to a single user space thread. c) Packets from a flow can only wake up one user space thread. Reported-at: https://bugzilla.redhat.com/1844576 Signed-off-by: Mark Gray <mark.d.gray@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-16 20:05:03 +02:00
Mark Gray	485e3a13a6	dpif-netlink: Fix report_loss() message. Fixes: 1579cf677fcb ("dpif-linux: Implement the API functions to allow multiple handler threads read upcall.") Signed-off-by: Mark Gray <mark.d.gray@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-16 20:05:03 +02:00
Mark Gray	1325debb45	ofproto: Change type of n_handlers and n_revalidators. 'n_handlers' and 'n_revalidators' are declared as type 'size_t'. However, dpif_handlers_set() requires parameter 'n_handlers' as type 'uint32_t'. This patch fixes this type mismatch. Signed-off-by: Mark Gray <mark.d.gray@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-16 20:05:03 +02:00
David Marchand	3222a89d9a	dpif-netdev: Report overhead busy cycles per pmd. Users complained that per rxq pmd usage was confusing: summing those values per pmd would never reach 100% even if increasing traffic load beyond pmd capacity. This is because the dpif-netdev/pmd-rxq-show command only reports "pure" rxq cycles while some cycles are used in the pmd mainloop and adds up to the total pmd load. dpif-netdev/pmd-stats-show does report per pmd load usage. This load is measured since the last dpif-netdev/pmd-stats-clear call. On the other hand, the per rxq pmd usage reflects the pmd load on a 10s sliding window which makes it non trivial to correlate. Gather per pmd busy cycles with the same periodicity and report the difference as overhead in dpif-netdev/pmd-rxq-show so that we have all info in a single command. Example: $ ovs-appctl dpif-netdev/pmd-rxq-show pmd thread numa_id 1 core_id 3: isolated : true port: dpdk0 queue-id: 0 (enabled) pmd usage: 90 % overhead: 4 % pmd thread numa_id 1 core_id 5: isolated : false port: vhost0 queue-id: 0 (enabled) pmd usage: 0 % port: vhost1 queue-id: 0 (enabled) pmd usage: 93 % port: vhost2 queue-id: 0 (enabled) pmd usage: 0 % port: vhost6 queue-id: 0 (enabled) pmd usage: 0 % overhead: 6 % pmd thread numa_id 1 core_id 31: isolated : true port: dpdk1 queue-id: 0 (enabled) pmd usage: 86 % overhead: 4 % pmd thread numa_id 1 core_id 33: isolated : false port: vhost3 queue-id: 0 (enabled) pmd usage: 0 % port: vhost4 queue-id: 0 (enabled) pmd usage: 0 % port: vhost5 queue-id: 0 (enabled) pmd usage: 92 % port: vhost7 queue-id: 0 (enabled) pmd usage: 0 % overhead: 7 % Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 17:43:42 +01:00
Kevin Traynor	30bfba0249	tests: Add new test for cross-numa pmd rxq assignments. Add some tests to ensure that if there are numa local PMDs they are used for polling an rxq. Also check that if there are only numa non-local PMDs they will be used to poll the rxq but the user will be warned. Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Acked-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 16:52:09 +01:00
Kevin Traynor	6193e03267	dpif-netdev: Allow pin rxq and non-isolate PMD. Pinning an rxq to a PMD with pmd-rxq-affinity may be done for various reasons such as reserving a full PMD for an rxq, or to ensure that multiple rxqs from a port are handled on different PMDs. Previously pmd-rxq-affinity always isolated the PMD so no other rxqs could be assigned to it by OVS. There may be cases where there are unused cycles on those PMDs and the user would like other rxqs to also be able to be assigned to them by OVS. Add an option to pin the rxq and non-isolate the PMD. The default behaviour is unchanged, which is pin and isolate the PMD. In order to pin and non-isolate: ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-isolate=false Note this is available only with group assignment type, as pinning conflicts with the operation of the other rxq assignment algorithms. Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Acked-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 16:51:57 +01:00
Kevin Traynor	3dd050909a	dpif-netdev: Add group rxq scheduling assignment type. Add an rxq scheduling option that allows rxqs to be grouped on a pmd based purely on their load. The current default 'cycles' assignment sorts rxqs by measured processing load and then assigns them to a list of round robin PMDs. This helps to keep the rxqs that require most processing on different cores but as it selects the PMDs in round robin order, it equally distributes rxqs to PMDs. 'cycles' assignment has the advantage that it keeps the most loaded rxqs from being on the same core while keeping the rxqs spread across a broad range of PMDs to mitigate against changes to traffic pattern. 'cycles' assignment has the disadvantage that in order to make the trade off between optimising for current traffic load and mitigating against future changes, it tries to assign an equal number of rxqs per PMD in a round robin manner and this can lead to a less than optimal balance of the processing load. Now that PMD auto load balance can help mitigate against future changes in traffic patterns, a 'group' assignment can be used to assign rxqs based on their measured cycles and the estimated running total of the PMDs. In this case, there is no restriction about keeping an equal number of rxqs per PMD as it is purely load based. This means that one PMD may have a group of low load rxqs assigned to it while another PMD has one high load rxq assigned to it, as that is the best balance of their measured loads across the PMDs. Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Acked-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 16:51:47 +01:00
Kevin Traynor	4fb54652e0	dpif-netdev: Assign PMD for failed pinned rxqs. Previously, if pmd-rxq-affinity was used to pin an rxq to a core that was not in pmd-cpu-mask the rxq was not polled for and the user received a warning. This meant that no traffic would be received from that rxq. Now that pinned and non-pinned rxqs are assigned to PMDs in a common call to rxq scheduling, if an invalid core is selected in pmd-rxq-affinity the rxq can be assigned an available PMD (if any). A warning will still be logged as the requested core could not be used. Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Acked-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 16:51:35 +01:00
Kevin Traynor	0efefc4f98	dpif-netdev: Sort PMD list by core id for rxq scheduling. The list of PMDs is round robined through for the selection when assigning an rxq to a PMD. The list is based on a hash map, so there is no defined order. It means the same set of PMDs may get assigned different rxqs on different runs for no reason other than how the PMDs are stored in the hash map. This can be easily changed by sorting the PMDs by core id after they are extracted, so the PMDs will be used in a consistent order. Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Acked-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 16:51:22 +01:00
Kevin Traynor	58fed7e8d8	dpif-netdev: Make PMD auto load balance use common rxq scheduling. PMD auto load balance had its own separate implementation of the rxq scheduling that it used for dry runs. This was done because previously the rxq scheduling was not made reusable for a dry run. Apart from the code duplication (which is a good enough reason to replace it alone) this meant that if any further rxq scheduling changes or assignment types were added they would also have to be duplicated in the auto load balance code too. This patch replaces the current PMD auto load balance rxq scheduling code to reuse the common rxq scheduling code. The behaviour does not change from a user perspective, except the logs are updated to be more consistent. As the dry run will compare the pmd load variances for current and estimated assignments, new functions are added to populate the current assignments and use the rxq scheduling data structs for variance calculations. Now that the new rxq scheduling data structures are being used in PMD auto load balance, the older rr_* data structs and associated functions can be removed. Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Acked-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 16:51:12 +01:00
Kevin Traynor	f577c2d046	dpif-netdev: Rework rxq scheduling code. This reworks the current rxq scheduling code to break it into more generic and reusable pieces. The behaviour does not change from a user perspective, except the logs are updated to be more consistent. From an implementation view, there are some changes with mind to extending functionality. The high level reusable functions added in this patch are: - Generate a list of current numas and pmds - Perform rxq scheduling assignments into that list - Effect the rxq scheduling assignments so they are used Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Acked-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 16:51:01 +01:00
Vasu Dasari	ccc24fc88d	ofproto-dpif: APIs and CLI option to add/delete static fdb entry. Currently there is an option to add/flush/show ARP/ND neighbor. This covers L3 side. For L2 side, there is only fdb show command. This commit gives an option to add/del an fdb entry via ovs-appctl. CLI command looks like: To add: ovs-appctl fdb/add <bridge> <port> <vlan> <Mac> ovs-appctl fdb/add br0 p1 0 50:54:00:00:00:05 To del: ovs-appctl fdb/del <bridge> <vlan> <Mac> ovs-appctl fdb/del br0 0 50:54:00:00:00:05 Added two new APIs to provide convenient interface to add and delete static-macs. bool xlate_add_static_mac_entry(const struct ofproto_dpif , ofp_port_t in_port, struct eth_addr dl_src, int vlan); bool xlate_delete_static_mac_entry(const struct ofproto_dpif , struct eth_addr dl_src, int vlan); 1. Static entry should not age. To indicate that entry being programmed is a static entry, 'expires' field in 'struct mac_entry' will be set to a MAC_ENTRY_AGE_STATIC_ENTRY. A check for this value is made while deleting mac entry as part of regular aging process. 2. Another change to the mac-update logic, when a packet with same dl_src as that of a static-mac entry arrives on any port, the logic will not modify the expires field. 3. While flushing fdb entries, made sure static ones are not evicted. 4. Updated "ovs-appctl fdb/stats-show br0" to display number of static entries in switch Added following tests: ofproto-dpif - static-mac add/del/flush ofproto-dpif - static-mac mac moves Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048894.html Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1597752 Signed-off-by: Vasu Dasari <vdasari@gmail.com> Tested-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-16 16:21:02 +02:00
Rosemarie O'Riorden	ae2424696c	dpdk: Logs to announce removal of defaults for socket-mem and limit. Deprecate current OVS provided defaults for DPDK socket-mem and socket-limit that are planned to be removed in OVS 2.17. At that point DPDK defaults will be used instead. Warnings have been added to alert users in advance. Signed-off-by: Rosemarie O'Riorden <roriorde@redhat.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 14:00:31 +01:00
David Marchand	15329b728b	flow: Count and dump invalid IP packets. Skipping further processing of invalid IP packets helps avoid crashes but it does not help to figure out if the malformed packets are still present on the network. Add coverage counters for IPv4 and IPv6 sanity checks so that we know there are some invalid packets. Dump such whole packets in debug mode. Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-16 14:23:00 +02:00
Gaetan Rivet	6545977cee	ovs-rcu: Remove unused perthread mutex. A mutex is allocated, initialized and destroyed, without being used in the perthread structure. Signed-off-by: Gaetan Rivet <grive@u256.net> Acked-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-16 14:19:41 +02:00
Guzowski Adrian	cb4bff6ff8	Don't mangle shebangs when building DKMS RPM package. While building the package, some .in files are being subject to shebang substitution, but the process fails, because given scripts have placeholders in place of shebangs. In order to fix the issue, don't mangle shebangs in this specific package. Signed-off-by: Guzowski Adrian <adrian.guzowski@exatel.pl> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-16 14:18:10 +02:00
Ilya Maximets	1f38f9dcf2	AUTHORS: Add Adrian Guzowski. Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-16 14:13:59 +02:00
Guzowski Adrian	2abd8148cf	Add ability to override default Release suffix in RPM packages. In some cases, like building OvS packages in Jenkins, it may be useful to set a custom version suffix that will correspond with job's build number, etc. Currently, version number is explicitly set to 1. This change adds a define "release_number" that may be overridden during package building process: rpmbuild -ba --define="release_number X" ... Signed-off-by: Guzowski Adrian <adrian.guzowski@exatel.pl> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-16 14:11:12 +02:00
Terry Wilson	d28c5ca576	python: Add cooperative_yield() API method to Idl. When using eventlet monkey_patch()'d code, greenthreads can be blocked on connection for several seconds while the database contents are parsed. Eventlet recommends adding a sleep(0) call to cooperatively yield in cpu-bound code. asyncio code has asyncio.sleep(0). This patch adds an API method that defaults to doing nothing, but can be overridden to yield as needed. Signed-off-by: Terry Wilson <twilson@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-16 14:08:19 +02:00
Timothy Redaelli	487253d5b8	python: Update bundled sortedcontainers to 2.4.0. This is needed since the current bundled version doesn't work on Python 3.10+. Signed-off-by: Timothy Redaelli <tredaelli@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-16 13:57:21 +02:00
David Marchand	6c41bcb138	ci: Do not dump logs on error for GitHub Actions. GHA webui directly focuses on the last lines of a failing step. config and testsuite logs are attached as artifacts in GHA in case of failures, so dumping them just adds noise. Skip dumping those files. Travis is left untouched though. Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-16 13:54:25 +02:00
Eli Britstein	7ab851e1b8	dpif-netdev: Do not execute packet recovery without experimental support. rte_flow_get_restore_info() API is under experimental attribute. Using it has a performance impact that can be avoided for non-experimental compilation. Do not call it without experimental support. Reported-by: Cian Ferriter <cian.ferriter@intel.com> Signed-off-by: Eli Britstein <elibr@nvidia.com> Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-16 13:50:28 +02:00
Harry van Haaren	a72c1dfbd5	dpif/dpcls: limit count subtable search info logs This commit avoids many instances of "using subtable X for miniflow (x,y)" in the ovs-vswitchd log when using the DPCLS Autovalidator. This occurs when no specialized subtable is found, and the generic "_any" version of the avx512 subtable search implementation was used. This change logs the subtable usage once, avoiding duplicates. Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Signed-off-by: kumar Amber <kumar.amber@intel.com> Co-authored-by: kumar Amber <kumar.amber@intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 12:02:43 +01:00
Ian Stokes	26fbd1a1be	AUTHORS: Add Cian Ferriter. Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 11:56:24 +01:00
Ian Stokes	83aae83e65	AUTHORS: Add Amber Kumar. Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 11:51:10 +01:00
Harry van Haaren	aa85a25095	dpif-netdev/mfex: Add more AVX512 traffic profiles This commit adds 3 new traffic profile implementations to the existing avx512 miniflow extract infrastructure. The profiles added are: - Ether()/IP()/TCP() - Ether()/Dot1Q()/IP()/UDP() - Ether()/Dot1Q()/IP()/TCP() The design of the avx512 code here is for scalability to add more traffic profiles, as well as enabling CPU ISA. Note that an implementation is primarily adding static const data, which the compiler then specializes away when the profile specific function is declared below. As a result, the code is relatively maintainable, and scalable for new traffic profiles as well as new ISA, and does not lower performance compared with manually written code for each profile/ISA. Note that confidence in the correctness of each implementation is achieved through autovalidation, unit tests with known packets, and fuzz tested packets. Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 11:31:49 +01:00
Harry van Haaren	250ceddcc2	dpif-netdev/mfex: Add AVX512 based optimized miniflow extract This commit adds AVX512 implementations of miniflow extract. By using the 64 bytes available in an AVX512 register, it is possible to convert a packet to a miniflow data-structure in a small number of instructions. The implementation here probes for Ether()/IP()/UDP() traffic, and builds the appropriate miniflow data-structure for packets that match the probe. The implementation here is auto-validated by the miniflow extract autovalidator, hence its correctness can be easily tested and verified. Note that this commit is designed to easily allow addition of new traffic profiles in a scalable way, without code duplication for each traffic profile. Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 11:31:42 +01:00
Harry van Haaren	32f93dc5ed	dpdk: Add additional CPU ISA detection strings This commit enables OVS to at runtime check for more detailed AVX512 capabilities, specifically Byte and Word (BW) extensions, and Vector Bit Manipulation Instructions (VBMI). These instructions will be used in the CPU ISA optimized implementations of traffic profile aware miniflow extract. Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 11:31:29 +01:00
Harry van Haaren	dc39608d2a	dpif/stats: Add miniflow extract opt hits counter This commit adds a new counter to be displayed to the user when requesting datapath packet statistics. It counts the number of packets that are parsed and a miniflow built up from it by the optimized miniflow extract parsers. The ovs-appctl command "dpif-netdev/pmd-perf-show" now has an extra entry indicating if the optimized MFEX was hit: - MFEX Opt hits: 6786432 (100.0 %) Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 11:31:14 +01:00
Kumar Amber	50be6715c0	test/system-dpdk: Add unit test for mfex autovalidator Tests: 6: OVS-DPDK - MFEX Autovalidator 7: OVS-DPDK - MFEX Autovalidator Fuzzy 8: OVS-DPDK - MFEX Configuration Added a new directory to store the PCAP file used in the tests and a script to generate the fuzzy traffic type pcap to be used in fuzzy unit test. Signed-off-by: Kumar Amber <kumar.amber@intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 11:30:31 +01:00
Kumar Amber	a395b132b7	dpif-netdev: Add packet count and core id parameters for study This commit introduces an additional command line parameter for the mfex study function. If the user provides an additional packet count, it is used in study as the minimum number of packets which must be processed; otherwise a default value is chosen. Also introduces a third parameter for choosing a particular pmd core. $ ovs-appctl dpif-netdev/miniflow-parser-set study 500 3 Signed-off-by: Kumar Amber <kumar.amber@intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 11:11:13 +01:00
Kumar Amber	5324b54e60	dpif-netdev: Add configure to enable autovalidator at build time. This commit adds a new command to allow the user to enable the autovalidator by default at build time, thus allowing unit tests to be run with it by default. $ ./configure --enable-mfex-default-autovalidator Signed-off-by: Kumar Amber <kumar.amber@intel.com> Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 10:34:49 +01:00
Kumar Amber	5c5c98cec2	docs/dpdk/bridge: Add miniflow extract section. This commit adds a section to the dpdk/bridge.rst netdev documentation, detailing the added miniflow functionality. The newly added commands are documented, and sample output is provided. The use of auto-validator and special study function is also described in detail as well as running fuzzy tests. Signed-off-by: Kumar Amber <kumar.amber@intel.com> Co-authored-by: Cian Ferriter <cian.ferriter@intel.com> Signed-off-by: Cian Ferriter <cian.ferriter@intel.com> Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 10:32:41 +01:00
Kumar Amber	72dd22a0df	dpif-netdev: Add study function to select the best mfex function The study function runs all the available implementations of miniflow_extract and makes a choice whose hitmask has maximum hits and sets the mfex to that function. Study can be run at runtime using the following command: $ ovs-appctl dpif-netdev/miniflow-parser-set study Signed-off-by: Kumar Amber <kumar.amber@intel.com> Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 10:29:52 +01:00
Kumar Amber	dd3f5d86d9	dpif-netdev: Add auto validation function for miniflow extract This patch introduces the auto-validation function which allows users to compare the batch of packets obtained from different miniflow implementations against the linear miniflow extract and return a hitmask. The autovalidator function can be triggered at runtime using the following command: $ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator Signed-off-by: Kumar Amber <kumar.amber@intel.com> Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 09:57:19 +01:00
Kumar Amber	3d8f47bc04	dpif-netdev: Add command line and function pointer for miniflow extract This patch introduces the MFEX function pointers, which allow the user to switch between the different miniflow extract implementations that OVS provides, each optimized for a CPU ISA. The user can query the miniflow extract variants available for that CPU with the following command: $ ovs-appctl dpif-netdev/miniflow-parser-get Similarly, a user can set the miniflow implementation with the following command: $ ovs-appctl dpif-netdev/miniflow-parser-set name This gives the user more performance and the flexibility to choose the miniflow implementation according to their needs. Signed-off-by: Kumar Amber <kumar.amber@intel.com> Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 09:56:58 +01:00
Ilya Maximets	3e82604b7c	docs: Add documentation for ovsdb relay mode. Main documentation for the service model and tutorial with the use case and configuration examples. Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-15 22:38:52 +02:00
Ilya Maximets	e26bf9726f	ovsdb: Make clients aware of relay service model. Clients need to re-connect from a relay that has no connection with the database source. Also, a relay acts similarly to a follower in the clustered model from the consistency point of view, so it's not suitable for leader-only connections. Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-15 22:38:49 +02:00
Ilya Maximets	edcf441722	ovsdb: relay: Reflect connection status in _Server database. It might be important for clients to know that relay lost connection with the relay remote, so they could re-connect to other relay. Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-15 22:38:31 +02:00
Ilya Maximets	7964ffe7d2	ovsdb: relay: Add support for transaction forwarding. Current version of ovsdb relay allows to scale out read-only access to the primary database. However, many clients are not read-only but read-mostly. For example, ovn-controller. In order to scale out database access for this case ovsdb-server needs to process transactions that are not read-only. Relay is not allowed to do that, i.e. not allowed to modify the database, but it can act like a proxy and forward transactions that include database modifications to the primary server and forward replies back to a client. At the same time it may serve read-only transactions and monitor requests by itself greatly reducing the load on primary server. This configuration will slightly increase transaction latency, but it's not very important for read-mostly use cases. Implementation details: With this change instead of creating a trigger to commit the transaction, ovsdb-server will create a trigger for transaction forwarding. Later, ovsdb_relay_run() will send all new transactions to the relay source. Once the transaction reply is received from the relay source, ovsdb-relay module will update the state of the transaction forwarding with the reply. After that, trigger_run() will complete the trigger and jsonrpc_server_run() will send the reply back to the client. Since transaction reply from the relay source will be received after all the updates, client will receive all the updates before receiving the transaction reply as it is in a normal scenario with other database models. Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-15 22:38:07 +02:00
Ilya Maximets	026c77c58d	ovsdb: New ovsdb 'relay' service model. New database service model 'relay' that is needed to scale out read-mostly database access, e.g. ovn-controller connections to OVN_Southbound. In this service model ovsdb-server connects to existing OVSDB server and maintains in-memory copy of the database. It serves read-only transactions and monitor requests by its own, but forwards write transactions to the relay source. Key differences from the active-backup replication: - support for "write" transactions (next commit). - no on-disk storage. (probably, faster operation) - support for multiple remotes (connect to the clustered db). - doesn't try to keep connection as long as possible, but faster reconnects to other remotes to avoid missing updates. - No need to know the complete database schema beforehand, only the schema name. - can be used along with other standalone and clustered databases by the same ovsdb-server process. (doesn't turn the whole jsonrpc server to read-only mode) - supports modern version of monitors (monitor_cond_since), because based on ovsdb-cs. - could be chained, i.e. multiple relays could be connected one to another in a row or in a tree-like form. - doesn't increase availability. - cannot be converted to other service models or become a main active server. Some performance test results can be found here: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385825.html Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-15 22:38:03 +02:00
Ilya Maximets	b4cef64c83	ovsdb: row: Add support for xor-based row updates. This will be used to apply update3 type updates to ovsdb tables while processing updates for future ovsdb 'relay' service model. 'ovsdb_datum_apply_diff' is allowed to fail, so adding support to return this error. Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-15 22:37:46 +02:00
Ilya Maximets	85dbbe275b	ovsdb: table: Expose functions to execute operations on ovsdb tables. These functions will be used later for ovsdb 'relay' service model, so moving them to a common code. Warnings translated to ovsdb errors, caller in replication.c only printed inconsistency warnings, but mostly ignored them. Implementing the same logic by checking the error tag. Also ovsdb_execute_insert() previously printed incorrect warning about duplicate row while it was a syntax error in json. Fixing that by actually checking for the duplicate and reporting correct ovsdb error. Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-15 22:37:43 +02:00

18716 Commits