mir/ovs - ovs - Mike's Git repositories


mirror of https://github.com/openvswitch/ovs synced 2025-08-29 21:38:13 +00:00

Author	SHA1	Message	Date
Frode Nordahl	bfee9f6c01	netlink: Add support for parsing link layer address. Data retrieved from netlink and friends may include link layer address. Add type to nl_attr_type and min/max functions to allow use of nl_policy_parse with this type of data. While this will not be used by Open vSwitch itself at this time, sibling and derived projects want to use the great netlink library that OVS provides, and it is not possible to safely override the global nl_attr_type symbol at link time. Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2021-08-20 11:32:52 -07:00
Dumitru Ceara	daf627f459	ovsdb-cs: Perform forced reconnects without a backoff. The ovsdb-cs layer triggers a forced reconnect in various cases: - when an inconsistency is detected in the data received from the remote server. - when the remote server is running in clustered mode and transitioned to "follower", if the client is configured in "leader-only" mode. - when explicitly requested by upper layers (e.g., by the user application, through the IDL layer). In such cases it's desirable that reconnection should happen as fast as possible, without the current exponential backoff maintained by the underlying reconnect object. Furthermore, since 3c2d6274bcee ("raft: Transfer leadership before creating snapshots."), leadership changes inside the clustered database happen more often and, therefore, "leader-only" clients need to reconnect more often too. Forced reconnects call jsonrpc_session_force_reconnect() which will not reset backoff. To make sure clients reconnect as fast as possible in the aforementioned scenarios we first call the new API, jsonrpc_session_reset_backoff(), in ovsdb-cs, for sessions that are in state CS_S_MONITORING (i.e., the remote is likely still alive and functioning fine). jsonrpc_session_reset_backoff() resets the number of backoff-free reconnect retries to the number of remotes configured for the session, ensuring that all remotes are retried exactly once with backoff 0. This commit also updates the Python IDL and jsonrpc implementations. The Python IDL wasn't tracking the IDL_S_MONITORING state explicitly, we now do that too. Tests were also added to make sure the IDL forced reconnects happen without backoff. Reported-at: https://bugzilla.redhat.com/1977264 Suggested-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-23 17:29:36 +02:00
kumar Amber	69b2bdfd3f	system-dpdk.at: Fix module not found error for pyhton < 3.6. This fixes the flake8 error on Python versions older than 3.6, as ModuleNotFoundError is not available before 3.6. ../../tests/mfex_fuzzy.py:5:8: F821 undefined name 'ModuleNotFoundError' Makefile:5826: recipe for target 'flake8-check' failed Since it doesn't really make any sense to catch this exception, the try-except block is simply removed. Additionally, the check for scapy is replaced with a more reliable one. Imports are reordered, because standard imports should go first. Fixes: 50be6715c0 ("test/sytem-dpdk: Add unit test for mfex autovalidator") Signed-off-by: kumar Amber <kumar.amber@intel.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-23 17:25:46 +02:00
Ben Pfaff	f05d6d623e	ofproto-dpif-xlate: Fix continuations with OF instructions in OF1.1+. Open vSwitch supports OpenFlow "instructions", which were introduced in OpenFlow 1.1 and act like restricted kinds of actions that can only appear in a particular order and particular circumstances. OVS did not support two of these instructions, "write_metadata" and "goto_table", properly in the case where they appeared in a flow that needed to be frozen for continuations. Both of these instructions had the problem that they couldn't be properly serialized into the stream of actions, because they're not actions. This commit fixes that problem in freeze_unroll_actions() by converting them into equivalent actions for serialization. goto_table had the additional problem that it was being serialized to the frozen stream even after it had been executed. This was already properly handled in do_xlate_actions() for resubmit, which is almost equivalent to goto_table, so this commit applies the same fix to goto_table. (The commit removes an assertion from the goto_table implementation, but there wasn't any real value in that assertion and I thought the code looked cleaner without it.) This commit adds tests that would have found these bugs. This includes adding a variant of each continuation test that uses OF1.3 for monitor/resume (which is necessary to trigger these bugs) plus specific tests for continuations with goto_table and write_metadata. It also improves the continuation test infrastructure to add more detail on the problem if a test fails. Signed-off-by: Ben Pfaff <blp@ovn.org> Reported-by: Grayson Wu <wgrayson@vmware.com> Reported-at: https://github.com/openvswitch/ovs-issues/issues/213 Discussed-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/386166.html Acked-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-22 12:26:20 -07:00
Ilya Maximets	298d4151f4	bond: Fix broken rebalancing after link state changes. There are 3 constraints for moving hashes from one member to another: 1. The load difference exceeds ~3% of the load of one member. 2. The difference in load between members exceeds 100,000 bytes. 3. Moving the hash reduces the load difference by more than 10%. In the current implementation, if one of the members transitions to the DOWN state, all hashes assigned to it will be moved to the other members. After that, if the member goes UP, it will wait for rebalancing to get hashes. But if there are more than 10 equally loaded hashes, it will never meet constraint #3, because each hash will handle less than 10% of the load. The situation gets worse as the number of flows grows, and it is almost impossible to transfer any hash when all 256 hash records are used, which is very likely when we have a few hundred or a few thousand flows. As a result, if one of the members goes down and back up while traffic flows, it will never be used to transmit packets again. This will not be fixed even if we completely stop the traffic and start it again, because the first two constraints will block rebalancing in the earlier stages, while we have low traffic volume. Moving a single hash if the destination does not have any hashes, as it was before commit c460a6a7bc75 ("ofproto/bond: simplifying the rebalancing logic"), will not help, because a single hash is not enough to make the difference in load less than 10% of the total load, and this member will handle only that one hash forever. To fix this, let's try to move multiple hashes at the same time to meet constraint #3. The implementation includes sorting the "records" to be able to collect records with a cumulative load close enough to the ideal value. Acked-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-16 21:52:23 +02:00
David Marchand	3222a89d9a	dpif-netdev: Report overhead busy cycles per pmd. Users complained that per rxq pmd usage was confusing: summing those values per pmd would never reach 100% even when traffic load was increased beyond pmd capacity. This is because the dpif-netdev/pmd-rxq-show command only reports "pure" rxq cycles, while some cycles are used in the pmd mainloop and add up to the total pmd load. dpif-netdev/pmd-stats-show does report per pmd load usage. This load is measured since the last dpif-netdev/pmd-stats-clear call. On the other hand, the per rxq pmd usage reflects the pmd load on a 10s sliding window, which makes it non-trivial to correlate. Gather per pmd busy cycles with the same periodicity and report the difference as overhead in dpif-netdev/pmd-rxq-show so that we have all info in a single command. Example: $ ovs-appctl dpif-netdev/pmd-rxq-show pmd thread numa_id 1 core_id 3: isolated : true port: dpdk0 queue-id: 0 (enabled) pmd usage: 90 % overhead: 4 % pmd thread numa_id 1 core_id 5: isolated : false port: vhost0 queue-id: 0 (enabled) pmd usage: 0 % port: vhost1 queue-id: 0 (enabled) pmd usage: 93 % port: vhost2 queue-id: 0 (enabled) pmd usage: 0 % port: vhost6 queue-id: 0 (enabled) pmd usage: 0 % overhead: 6 % pmd thread numa_id 1 core_id 31: isolated : true port: dpdk1 queue-id: 0 (enabled) pmd usage: 86 % overhead: 4 % pmd thread numa_id 1 core_id 33: isolated : false port: vhost3 queue-id: 0 (enabled) pmd usage: 0 % port: vhost4 queue-id: 0 (enabled) pmd usage: 0 % port: vhost5 queue-id: 0 (enabled) pmd usage: 92 % port: vhost7 queue-id: 0 (enabled) pmd usage: 0 % overhead: 7 % Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 17:43:42 +01:00
Kevin Traynor	30bfba0249	tests: Add new test for cross-numa pmd rxq assignments. Add some tests to ensure that if there are numa local PMDs, they are used for polling an rxq. Also check that if there are only numa non-local PMDs, they will be used to poll the rxq, but the user will be warned. Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Acked-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 16:52:09 +01:00
Kevin Traynor	6193e03267	dpif-netdev: Allow pin rxq and non-isolate PMD. Pinning an rxq to a PMD with pmd-rxq-affinity may be done for various reasons, such as reserving a full PMD for an rxq, or to ensure that multiple rxqs from a port are handled on different PMDs. Previously pmd-rxq-affinity always isolated the PMD, so no other rxqs could be assigned to it by OVS. There may be cases where there are unused cycles on those PMDs and the user would like OVS to be able to assign other rxqs to them as well. Add an option to pin the rxq and non-isolate the PMD. The default behaviour is unchanged, which is pin and isolate the PMD. In order to pin and non-isolate: ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-isolate=false Note this is available only with group assignment type, as pinning conflicts with the operation of the other rxq assignment algorithms. Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Acked-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 16:51:57 +01:00
Kevin Traynor	3dd050909a	dpif-netdev: Add group rxq scheduling assignment type. Add an rxq scheduling option that allows rxqs to be grouped on a pmd based purely on their load. The current default 'cycles' assignment sorts rxqs by measured processing load and then assigns them to a list of round robin PMDs. This helps to keep the rxqs that require the most processing on different cores, but as it selects the PMDs in round robin order, it distributes rxqs equally across PMDs. 'cycles' assignment has the advantage that it keeps the most loaded rxqs off the same core while keeping the rxqs spread across a broad range of PMDs to mitigate against changes to the traffic pattern. 'cycles' assignment has the disadvantage that, in order to make the trade-off between optimising for the current traffic load and mitigating against future changes, it tries to assign an equal number of rxqs per PMD in a round robin manner, and this can lead to a less than optimal balance of the processing load. Now that PMD auto load balance can help mitigate future changes in traffic patterns, a 'group' assignment can be used to assign rxqs based on their measured cycles and the estimated running total of the PMDs. In this case, there is no restriction on keeping an equal number of rxqs per PMD, as it is purely load based. This means that one PMD may have a group of low load rxqs assigned to it while another PMD has one high load rxq assigned to it, as that is the best balance of their measured loads across the PMDs. Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Acked-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 16:51:47 +01:00
Kevin Traynor	4fb54652e0	dpif-netdev: Assign PMD for failed pinned rxqs. Previously, if pmd-rxq-affinity was used to pin an rxq to a core that was not in pmd-cpu-mask, the rxq was not polled and the user received a warning. This meant that no traffic would be received from that rxq. Now that pinned and non-pinned rxqs are assigned to PMDs in a common call to rxq scheduling, if an invalid core is selected in pmd-rxq-affinity, the rxq can be assigned to an available PMD (if any). A warning will still be logged, as the requested core could not be used. Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Acked-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 16:51:35 +01:00
Kevin Traynor	f577c2d046	dpif-netdev: Rework rxq scheduling code. This reworks the current rxq scheduling code to break it into more generic and reusable pieces. The behaviour does not change from a user perspective, except that the logs are updated to be more consistent. From an implementation point of view, some changes are made with extending functionality in mind. The high level reusable functions added in this patch are: - Generate a list of current numas and pmds - Perform rxq scheduling assignments into that list - Effect the rxq scheduling assignments so they are used Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Acked-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 16:51:01 +01:00
Vasu Dasari	ccc24fc88d	ofproto-dpif: APIs and CLI option to add/delete static fdb entry. Currently there is an option to add/flush/show ARP/ND neighbor. This covers the L3 side. For the L2 side, there is only the fdb show command. This commit gives an option to add/del an fdb entry via ovs-appctl. CLI command looks like: To add: ovs-appctl fdb/add <bridge> <port> <vlan> <Mac> ovs-appctl fdb/add br0 p1 0 50:54:00:00:00:05 To del: ovs-appctl fdb/del <bridge> <vlan> <Mac> ovs-appctl fdb/del br0 0 50:54:00:00:00:05 Added two new APIs to provide a convenient interface to add and delete static-macs. bool xlate_add_static_mac_entry(const struct ofproto_dpif *, ofp_port_t in_port, struct eth_addr dl_src, int vlan); bool xlate_delete_static_mac_entry(const struct ofproto_dpif *, struct eth_addr dl_src, int vlan); 1. Static entry should not age. To indicate that the entry being programmed is a static entry, the 'expires' field in 'struct mac_entry' will be set to MAC_ENTRY_AGE_STATIC_ENTRY. A check for this value is made while deleting mac entries as part of the regular aging process. 2. Another change is to the mac-update logic: when a packet with the same dl_src as that of a static-mac entry arrives on any port, the logic will not modify the expires field. 3. While flushing fdb entries, made sure static ones are not evicted. 4. Updated "ovs-appctl fdb/stats-show br0" to display the number of static entries in the switch. Added following tests: ofproto-dpif - static-mac add/del/flush ofproto-dpif - static-mac mac moves Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048894.html Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1597752 Signed-off-by: Vasu Dasari <vdasari@gmail.com> Tested-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-16 16:21:02 +02:00
Harry van Haaren	dc39608d2a	dpif/stats: Add miniflow extract opt hits counter This commit adds a new counter to be displayed to the user when requesting datapath packet statistics. It counts the number of packets that are parsed, and have a miniflow built from them, by the optimized miniflow extract parsers. The ovs-appctl command "dpif-netdev/pmd-perf-show" now has an extra entry indicating if the optimized MFEX was hit: - MFEX Opt hits: 6786432 (100.0 %) Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 11:31:14 +01:00
Kumar Amber	50be6715c0	test/sytem-dpdk: Add unit test for mfex autovalidator Tests: 6: OVS-DPDK - MFEX Autovalidator 7: OVS-DPDK - MFEX Autovalidator Fuzzy 8: OVS-DPDK - MFEX Configuration Added a new directory to store the PCAP file used in the tests and a script to generate the fuzzy traffic type pcap to be used in fuzzy unit test. Signed-off-by: Kumar Amber <kumar.amber@intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-16 11:30:31 +01:00
Ilya Maximets	7964ffe7d2	ovsdb: relay: Add support for transaction forwarding. The current version of ovsdb relay allows scaling out read-only access to the primary database. However, many clients are not read-only but read-mostly. For example, ovn-controller. In order to scale out database access for this case, ovsdb-server needs to process transactions that are not read-only. Relay is not allowed to do that, i.e. not allowed to modify the database, but it can act like a proxy and forward transactions that include database modifications to the primary server and forward replies back to a client. At the same time it may serve read-only transactions and monitor requests by itself, greatly reducing the load on the primary server. This configuration will slightly increase transaction latency, but it's not very important for read-mostly use cases. Implementation details: With this change, instead of creating a trigger to commit the transaction, ovsdb-server will create a trigger for transaction forwarding. Later, ovsdb_relay_run() will send all new transactions to the relay source. Once the transaction reply is received from the relay source, the ovsdb-relay module will update the state of the transaction forwarding with the reply. After that, trigger_run() will complete the trigger and jsonrpc_server_run() will send the reply back to the client. Since the transaction reply from the relay source will be received after all the updates, the client will receive all the updates before receiving the transaction reply, as it does in a normal scenario with other database models. Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-15 22:38:07 +02:00
Ilya Maximets	e93fc5db9b	ovsdb: storage: Allow setting the name for the unbacked storage. ovsdb_create() requires schema or storage to be nonnull, but in practice it requires either a schema name or a storage name to use as the database name. Only clustered storage has a name. This means that only a clustered database can be created without a schema. Change that by allowing unbacked storage to have a name. This way we can create a database with unbacked storage without a schema. This will be used in the next commits to create a database for the ovsdb 'relay' service model. Acked-by: Mark D. Gray <mark.d.gray@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-15 22:37:32 +02:00
Ilya Maximets	00dda78ed4	ovsdb-cs: Avoid unnecessary re-connections when updating remotes. If a new database server is added to the cluster, or if one of the database servers changes its IP address or port, then you need to update the list of remotes for the client. For example, if a new OVN_Southbound database server is added, you need to update the ovn-remote for the ovn-controller. However, in the current implementation, the ovsdb-cs module always closes the current connection and creates a new one. This can lead to a storm of re-connections if all ovn-controllers are updated simultaneously. They can also start re-downloading the database content, creating even more load on the database servers. Correct this by keeping an existing connection if it is still in the list of remotes after the update. The 'reconnect' module will report connection state updates, but that is OK since no real re-connection happened and we only updated the state of a new 'reconnect' instance. If required, re-connection can be forced after the update of remotes with ovsdb_cs_force_reconnect(). Acked-by: Dumitru Ceara <dceara@redhat.com> Acked-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-15 21:04:49 +02:00
Cian Ferriter	d76a719a7a	dpif-netdev: Add a partial HWOL PMD statistic. It is possible for packets traversing the userspace datapath to match a flow before hitting the EMC by using a mark ID provided by a NIC. Add a PMD statistic for this hit. Signed-off-by: Cian Ferriter <cian.ferriter@intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2021-07-09 17:13:55 +01:00
Paolo Valerio	61e48c2d1d	conntrack: Handle SNAT with all-zero IP address. This patch introduces, for the userspace datapath, the handling of rules like the following: ct(commit,nat(src=0.0.0.0),...) The kernel datapath already handles this case, which is particularly handy in scenarios like the following: Given A: 10.1.1.1, B: 192.168.2.100, C: 10.1.1.2 A opens a connection toward B on port 80, selecting 10000 as the source port. B's IP gets dnat'ed to C's IP (10.1.1.1:10000 -> 192.168.2.100:80). This will result in: tcp,orig=(src=10.1.1.1,dst=192.168.2.100,sport=10000,dport=80), reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=10000), protoinfo=(state=ESTABLISHED) A now tries to establish another connection with C using source port 10000, this time using C's IP address (10.1.1.1:10000 -> 10.1.1.2:80). This second connection, if processed by conntrack with no SNAT/DNAT involved, collides with the reverse tuple of the first connection, so the entry for this valid connection doesn't get created. With this commit, adding a SNAT rule with 0.0.0.0 for 10.1.1.1:10000 -> 10.1.1.2:80 allows the conn entry to be created: tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=10000,dport=80), reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=10001), protoinfo=(state=ESTABLISHED) tcp,orig=(src=10.1.1.1,dst=192.168.2.100,sport=10000,dport=80), reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=10000), protoinfo=(state=ESTABLISHED) The issue exists even in the opposite case (with A trying to connect to C using B's IP after establishing a direct connection from A to C). This commit refactors the relevant function so that both of the previously mentioned cases are handled. Suggested-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Paolo Valerio <pvalerio@redhat.com> Acked-by: Gaetan Rivet <grive@u256.net> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-08 23:49:34 +02:00
Paolo Valerio	1e19f9aa26	conntrack: Handle already natted packets. When a packet gets dnatted and then recirculated, it could be possible that it matches another rule that performs another nat action. The kernel datapath handles this situation by turning the second nat action into a no-op, so the packet is natted only once. In the userspace datapath instead, when the ct action gets executed, an initial lookup of the translated packet fails to retrieve the connection related to the packet, leading to the creation of a new entry in ct for the src nat action with a subsequent failure of the connection establishment. With the following flows: table=0,priority=30,in_port=1,ip,nw_dst=192.168.2.100,actions=ct(commit,nat(dst=10.1.1.2:80),table=1) table=0,priority=20,in_port=2,ip,actions=ct(nat,table=1) table=0,priority=10,ip,actions=resubmit(,2) table=0,priority=10,arp,actions=NORMAL table=0,priority=0,actions=drop table=1,priority=5,ip,actions=ct(commit,nat(src=10.1.1.240),table=2) table=2,in_port=ovs-l0,actions=2 table=2,in_port=ovs-r0,actions=1 Establishing a connection from 10.1.1.1 to 192.168.2.100, the outcome is: tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=4000,dport=80), reply=(src=10.1.1.2,dst=10.1.1.240,sport=80,dport=4000), protoinfo=(state=ESTABLISHED) tcp,orig=(src=10.1.1.1,dst=192.168.2.100,sport=4000,dport=80), reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=4000), protoinfo=(state=ESTABLISHED) With this patch applied the outcome is: tcp,orig=(src=10.1.1.1,dst=192.168.2.100,sport=4000,dport=80), reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=4000), protoinfo=(state=ESTABLISHED) The patch performs, for already natted packets, a lookup of the reverse key in order to retrieve the related entry. It also adds a test case that, besides testing the scenario, ensures that the other ct actions are executed. Reported-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Paolo Valerio <pvalerio@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-08 23:49:34 +02:00
Eelco Chaudron	e6ad4d8d9c	conntrack: Document all-zero IP SNAT behavior and add a test case. Currently, conntrack in the kernel has an undocumented feature referred to as all-zero IP address SNAT. Basically, when a source port collision is detected during the commit, the source port will be translated to an ephemeral port. If there is no collision, no SNAT is performed. This patchset documents this behavior and adds a self-test to verify it's not changing. In addition, a datapath feature flag is added for the all-zero IP SNAT case. This will help applications on top of OVS, like OVN, to determine whether this feature can be used. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Acked-by: Dumitru Ceara <dceara@redhat.com> Acked-by: Alin-Gabriel Serdean <aserdean@ovn.org> Acked-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-08 21:19:14 +02:00
Eelco Chaudron	355fef6f2c	ofproto-dpif-xlate: Avoid successive ct_clear datapath actions. Due to flow lookup optimizations, especially in the resubmit/clone cases, we might end up with multiple ct_clear actions, which are not necessary. This patch only adds the ct_clear action to the datapath if any ct state is tracked. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Timothy Redaelli <tredaelli@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-08 21:19:06 +02:00
Ilya Maximets	b7809111a6	odp-util: Stop key parsing if already oversized. We don't need to continue parsing if already oversized. This is not very important, but fuzzer times out while parsing very long flow. The check could be written as a single 'if' statement, but I found my variant much more readable. Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=35519 Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2021-07-07 23:39:28 +02:00
David Wilder	3da3cc1a0c	ovs-numa: Support non-contiguous numa nodes and offline CPU cores. This change removes the assumption that numa nodes and cores are numbered contiguously in linux. This change is required to support some Power systems. A check has been added to verify that cores are online, offline cores result in non-contiguously numbered cores. DPDK EAL option generation is updated to work with non-contiguous numa nodes. These options can be seen in the ovs-vswitchd.log. For example: a system containing only numa nodes 0 and 8 will generate the following: EAL ARGS: ovs-vswitchd --socket-mem 1024,0,0,0,0,0,0,0,1024 \ --socket-limit 1024,0,0,0,0,0,0,0,1024 -l 0 Tests for pmd and dpif-netdev have been updated to validate non-contiguous numbered nodes. Signed-off-by: David Wilder <dwilder@us.ibm.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-07 23:35:57 +02:00
Ilya Maximets	b57b062f5d	ofp-actions: Report an error if there are too many actions to parse. Not a very important fix, but fuzzer times out trying to test parsing of a huge number of actions. Fixing that by reporting an error as soon as ofpacts is oversized. It would be great to use ofpbuf_oversized() function instead of manual size checking, but ofpacts->header here always points to the last pushed action, so the value that ofpbuf_oversized() would check is always small. Adding a unit test for this, plus an extra test for too deep nesting. Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=20254 Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Alin-Gabriel Serdean <aserdean@ovn.org>	2021-07-07 22:48:05 +02:00
Tianyu Yuan	f686957c96	add test cases for ingress_policing_kpkts parameters Exercise OVS setting of ingress_policing_kpkts parameters using ovs-vsctl and verify that the correct values are stored on OVSDB. Verify the ingress_policing parameters with tc command. Also check offload and non-offload in tc software datapath based on tc filter type (matchall and basic). Skip test of pps if OVS or kernel does not support pps rate limit. Example invocation: make check TESTSUITEFLAGS='-k ingress_policing_kpkts' make check-offloads TESTSUITEFLAGS='-k ingress_policing_kpkts' Signed-off-by: Tianyu Yuan <tianyu.yuan@corigine.com> Signed-off-by: Simon Horman <simon.horman@netronome.com>	2021-07-01 20:44:25 +02:00
Aaron Conole	b6c5f30cfa	checkpatch: Ignore macro definitions of FOR_EACH. When defining a FOR_EACH macro, checkpatch freaks out and generates a control block whitespace error. Create an exception so that it doesn't generate errors for this case. Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-August/373509.html Reported-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-07-01 16:31:55 +02:00
Kevin Traynor	f0e4a7338c	tests: Add PMD auto load balance unit tests. These tests focus on enabling/disabling and user parameters. Co-Authored-by: David Marchand <david.marchand@redhat.com> Signed-off-by: David Marchand <david.marchand@redhat.com> Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Sunil Pai G <sunil.pai.g@intel.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-06-24 22:11:10 +02:00
Kevin Traynor	833f1b843d	pmd.at: Get next line number of log. Some tests get the current log line number so they can check that there is a new occurrence of a log entry after a command. 'tail' uses the line number as the starting line number. However, this will include the last line of the log before the command. To prevent any races on logs and possibly checking an existing log entry prior to a command here or in reuse of this method, get the next line number of the log and use that as the starting line for tail. Suggested-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-06-24 22:10:40 +02:00
Rosemarie O'Riorden	bd90524550	Remove Python 2 leftovers. Fixes: 1ca0323e7c29 ("Require Python 3 and remove support for Python 2.") Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=1949875 Signed-off-by: Rosemarie O'Riorden <roriorde@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-06-22 21:29:57 +02:00
Martin Varghese	e81ed94214	Fix redundant datapath set ethernet action with NSH Decap. When a decap action is applied on an NSH header encapsulating an Ethernet packet, a redundant set mac address action is programmed to the datapath. Fixes: f839892a206a ("OF support and translation of generic encap and decap") Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Acked-by: Jan Scheurich <jan.scheurich@ericsson.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-06-16 12:44:54 +02:00
Martin Varghese	c2999459d2	tests: Fixed L3 over patch port tests. The NORMAL action is replaced with output to the GRE port for sending L3 packets over the GRE tunnel, because the NORMAL action cannot be used with L3 packets. Fixes: d03d0cf2b71b ("tests: Extend PTAP unit tests with decap action") Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Acked-by: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-06-16 11:06:29 +02:00
Toms Atteka	cca40141a8	netlink: removed incorrect optimization. This optimization caused the FLOW_TNL_F_UDPIF flag not to be used in the hash calculation for Geneve tunnels when revalidating flows, which resulted in different cache hash values and incorrect behaviour. Added a test to prevent regression. CC: Jesse Gross <jesse@nicira.com> Fixes: 6728d578f64e ("dpif-netdev: Translate Geneve options per-flow, not per-packet.") Reported-at: https://github.com/vmware-tanzu/antrea/issues/897 Signed-off-by: Toms Atteka <cpp.code.lv@gmail.com> Acked-by: Ansis Atteka <aatteka@ovn.org>	2021-06-15 13:34:34 -05:00
Ilya Maximets	2afe31169a	odp-util: Return an error on actions overflow while parsing from string. There is no need to continue parsing once the actions are already oversized. This is not very important, but the fuzzer times out while parsing a very long list of actions. Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=29190 Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2021-06-14 21:19:00 +02:00
Ben Pfaff	5fe3ef1a0c	tests: Fix spelling error in test name. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ilya Maximets <i.maximets@ovn.org>	2021-06-14 11:21:23 -07:00
Ilya Maximets	c5a58ec155	python: idl: Allow retry even when using a single remote. As described in commit [1], it's possible that the remote IP is backed by a load balancer, so reconnecting to the same IP can lead to a connection to a different server. This case is supported by the C version of the IDL and should be supported in the same way by the Python implementation. [1] ca367fa5f8bb ("ovsdb-idl.c: Allows retry even when using a single remote.") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Dumitru Ceara <dceara@redhat.com>	2021-06-11 01:11:57 +02:00
Tao YunXiang	91cb55bc8a	system-traffic.at: add missing comma. Add missing comma. Signed-off-by: Tao YunXiang <taoyunxiang@cmss.chinamobile.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Cc: Joe Stringer <joe@ovn.org>	2021-06-10 16:10:30 -07:00
Ilya Maximets	4275b5b7fb	ovsdb-client: Integrate record/replay functionality. This is primarily to be able to test recording of client connections. Unit test added accordingly. Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Dumitru Ceara <dceara@redhat.com>	2021-06-07 21:03:16 +02:00
Ilya Maximets	0be15ad76f	ovsdb-server.at: Add unit test for record/replay. Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Dumitru Ceara <dceara@redhat.com>	2021-06-07 21:03:16 +02:00
Ben Pfaff	3012710ec2	tests: Fix PKIDIR checks in AT_SKIP. In Autotest, [xyz] just expands to xyz. To get [xyz] in output, we need [[xyz]] in input. I spotted this based on "expr" reporting an error in testsuite output. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Han Zhou <hzhou@ovn.org>	2021-06-02 09:33:21 -07:00
Ben Pfaff	5da031d6df	tests: Drop support for glibc before version 2.11. The "ldd" call here didn't work if libtool was involved and would print an error message. We could fix that, but the check is only needed for glibc earlier than 2.11. glibc 2.11 was released in 2009, so it should be safe to expect that testers are running it or a newer version. This is a crossport of a patch originally applied to OVN as commit 2870efff89337298. Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Numan Siddique <numans@ovn.org>	2021-06-01 11:55:38 -07:00
Adrian Moreno	0b3ff31d35	ofp_actions: Fix set_mpls_tc formatting. Apart from a cut-and-paste typo, the man page claims that mpls_labels can be provided in hexadecimal format but that's currently not the case. Fix mpls ofp-action formatting, add size checks on ofp-action parsing and add some unit tests. Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-05-19 12:42:16 +02:00
Ilya Maximets	7731d26144	dpif-netdev: Remove meter rate from the bucket size calculation. The implementation of meters is supposed to be a classic token bucket with 2 typical parameters: rate and burst size. Burst size in this scheme is the maximum number of bytes/packets that could pass without being rate limited. Recent changes to the userspace datapath made the meter implementation consistent with the kernel one, and this uncovered several issues. The main problem is that the maximum bucket size, for an unknown reason, accounts not only for the burst size, but also for the numerical value of the rate. This creates a lot of confusion around the behavior of meters. For example, if the rate is configured as 1000 pps and the burst size is set to 1, this should mean that the meter will tolerate bursts of 1 packet at most, i.e. not a single packet above the rate should pass the meter. However, the current implementation calculates the maximum bucket size as (rate + burst size), so the effective bucket size will be 1001. This means that the first 1000 packets will not be rate limited and the average rate might be twice as high as the configured rate. This also makes it practically impossible to configure a meter with a burst size lower than the rate, which might be a desirable configuration if the rate is high. The inability to configure low burst sizes, and the overall inability of a user to predict the maximum and average rate from the configured meter parameters without looking at the OVS and kernel code, might also be classified as a security issue, because drop meters are frequently used as a way of protecting against DoS attacks. This change removes the rate from the bucket size calculation, bringing it in line with the classic token bucket algorithm and making the rate and burst tolerance predictable from a user's perspective. The same change will be proposed for the kernel implementation. Unit tests are changed back to their correct version and enhanced. 
Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>	2021-05-18 22:11:14 +02:00
Mark Gray	5dce24d04d	ipsec: Fix race in system tests. This patch fixes an issue where, depending on timing fluctuations, a node has not fully loaded all of its connections before the other node begins to establish a connection. In this failure case, the "ovs-monitor-ipsec" instance on the "left" node may `ipsec auto --start` a connection which then gets rejected by the "right" side. Almost simultaneously, the "right" side may initiate a connection that gets rejected by the "left" side. This can happen because, for all tunnels except GRE, each node has two connections (an "in" connection and an "out" connection) that get added one after the other. If the "in" connection "starts" on both sides, the "out" connection from the other node may not be available yet, causing the connection to fail. At this point, "Libreswan" will wait to retry the connection. In the interim, the OVS system test times out. This race manifests itself more frequently in a virtualized environment. This patch resolves this issue by waiting for the "left" node to load all connections before starting the "right" side. This will cause the "left" side to fail to establish a connection with the "right" side (as the "right" side connections have not been loaded) but will cause the "right" side to succeed in establishing a connection, as all connections will have been loaded on the "left" side. Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/381857.html Fixes: 8fc62df8b135 ("ipsec: Introduce IPsec system tests for Libreswan.") Signed-off-by: Mark Gray <mark.d.gray@redhat.com> Tested-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-04-20 00:00:22 +02:00
Tianyu Yuan	44ea24427e	Add test cases for ingress_policing parameters. tests/ovs-vsctl.at: Add ingress_policing test in ovs-vsctl unit test. tests/system-offloads-traffic.at: Check ingress_policing with offloads enabled and disabled. Exercise OVS setting of ingress_policing parameters using ovs-vsctl and verify that the correct values are stored in OVSDB. Verify the ingress_policing parameters with the tc command. Also check offload and non-offload in the tc software datapath based on tc filter type (matchall and basic). Example invocations: make check TESTSUITEFLAGS='-k ingress_policing' and make check-offloads TESTSUITEFLAGS='-k ingress_policing' Signed-off-by: Tianyu Yuan <tianyu.yuan@corigine.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Louis Peens <louis.peens@netronome.com> Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>	2021-04-13 15:36:44 +02:00
Mark Gray	8fc62df8b1	ipsec: Introduce IPsec system tests for Libreswan. This patch adds system tests for OVS IPsec using Libreswan. If Libreswan is not present on the system, the tests will be skipped. These tests set up an underlay switch with bridge 'br0' to carry encrypted traffic between two emulated "nodes". Each "node" is a separate network namespace ('left' and 'right') and runs an instance of the Libreswan "pluto" daemon, ovs-monitor-ipsec, ovs-vswitchd and ovsdb-server. Each test sets up IPsec between the two emulated "nodes" using various configurations (currently tunnel type, IPv4/IPv6, authentication method, local_ip). After configuration, connectivity between the two nodes is tested and the underlay traffic is also inspected to ensure the traffic is encrypted. All IPsec system tests can be run by using the ipsec keyword: sudo make check-kernel TESTSUITEFLAGS='-k ipsec' Signed-off-by: Mark Gray <mark.d.gray@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-04-01 19:13:31 +02:00
Mark Gray	4ce8bb159e	system-common-macros: clean up veth device on test failure. 'on_exit' should be run directly after creation of veth device. Fixes: 119db2cb18a7 ("kmod-macros: Move some code to traffic-common-macros.") Signed-off-by: Mark Gray <mark.d.gray@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-04-01 19:13:31 +02:00
Dumitru Ceara	ac85cdb38c	ovsdb-idl: Mark arc sources as updated when destination is deleted. Considering two DB rows, 'a' from table A and 'b' from table B (with column 'ref_a' a reference to table A): a = {A._uuid=<U1>} b = {B._uuid=<U2>, B.ref_a=<U1>} When the IDL client processes an update that deletes row 'a', row 'b' is also marked as 'updated' if change tracking is enabled for table B. Fixes: 102781cc02c6 ("ovsdb-idl: Track changes for table references.") Signed-off-by: Dumitru Ceara <dceara@redhat.com> Acked-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-04-01 14:15:49 +02:00
Dumitru Ceara	95689f1668	ovsdb-idl: Preserve references for deleted rows. Considering two DB rows, 'a' from table A and 'b' from table B (with column 'ref_a' a reference to table A): a = {A._uuid=<U1>} b = {B._uuid=<U2>, B.ref_a=<U1>} Assuming both records are present in the IDL client's in-memory view of the database, depending on whether row 'b' is also deleted in the same transaction or not, deletion of row 'a' should generate the following tracked changes: 1. only row 'a' is deleted: - for table A: - deleted records: a = {A._uuid=<U1>} - for table B: - updated records: b = {B._uuid=<U2>, B.ref_a=[]} 2. row 'a' and row 'b' are deleted in the same update: - for table A: - deleted records: a = {A._uuid=<U1>} - for table B: - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>} To ensure this, we now delay reparsing row backrefs for deleted rows until all updates in the current run have been processed. Without this change, in scenario 2 above, the tracked changes for table B would be: - deleted records: b = {B._uuid=<U2>, B.ref_a=[]} In particular, for strong references, row 'a' can never be deleted in a transaction that happens strictly before row 'b' is deleted. In some cases [0] both rows are deleted in the same transaction and having B.ref_a=[] would violate the integrity of the database from the client's perspective. This would force the client to always validate that strong reference fields are non-NULL. This is not really an option because the information in the original reference is required for incrementally processing the record deletion. 
[0] with ovn-monitor-all=true, the following command triggers a crash in ovn-controller because a strong reference field becomes NULL: $ ovn-nbctl --wait=hv -- lr-add r -- lrp-add r rp 00:00:00:00:00:01 1.0.0.1/24 $ ovn-nbctl lr-del r Reported-at: https://bugzilla.redhat.com/1932642 Fixes: 72aeb243a52a ("ovsdb-idl: Tracking - preserve data for deleted rows.") Signed-off-by: Dumitru Ceara <dceara@redhat.com> Acked-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-04-01 14:15:49 +02:00
Dumitru Ceara	4c0d093b17	ovsdb-idl.at: Make test outputs more predictable. IDL tests need predictable output from test-ovsdb. This used to be done by first sorting the output of test-ovsdb and then applying uuidfilt to predictably translate UUIDs. This was not reliable enough in case test-ovsdb processes two or more insert/delete operations in the same iteration because the order of lines in the output depends on the automatically generated UUID values. To fix this we change the way test-ovsdb and test-ovsdb.py generate outputs and prepend the table name and tracking information before printing the contents of a row. All existing ovsdb-idl.at and ovsdb-cluster.at tests are updated to expect the new output format. Signed-off-by: Dumitru Ceara <dceara@redhat.com> Acked-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>	2021-04-01 13:53:20 +02:00

3402 Commits