Currently, if a flow reply results in a message that exceeds
the maximum reply size, OVS will assert. This can happen
when OVN uses OpenFlow15 to add large flows, and they are then read
with ovs-ofctl using OpenFlow10.
This patch prevents the assertion and adds a test case to make sure the
code behaves as expected.
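The core idea is to check whether a reply would overflow before encoding it, rather than asserting afterwards. The sketch below assumes that the relevant limit is the 16-bit 'length' field of the OpenFlow header; the helper name 'reply_can_append' is hypothetical and is not the OVS implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The 'length' field of the OpenFlow header is 16 bits wide, so a single
 * OpenFlow message cannot be larger than UINT16_MAX bytes. */
#define OFP_MAX_MSG_LEN ((size_t) UINT16_MAX)

/* Hypothetical helper: returns 0 if 'extra' more bytes still fit into a
 * reply that is currently 'cur_len' bytes long, or -1 otherwise, so that
 * the caller can fail gracefully instead of asserting. */
static int
reply_can_append(size_t cur_len, size_t extra)
{
    /* Written so that the subtraction can never wrap around. */
    if (extra > OFP_MAX_MSG_LEN || cur_len > OFP_MAX_MSG_LEN - extra) {
        return -1;
    }
    return 0;
}
```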
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
When testing if an established connection is picked up, it could be
useful to verify that the protocol state matches the expectation, that
is, it moves to ESTABLISHED. Code modifications may break the TCP
conn_update() in a way that it returns CT_UPDATE_VALID without moving
to the correct state, leading to a false positive.
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Gaetan Rivet <grive@u256.net>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
If a single transaction exceeds the size of the whole database (e.g.,
a lot of rows got removed and new ones added), the transaction history will
be drained. This leads to sending UUID_ZERO to the clients as the last
transaction id in the next monitor update, because the monitor doesn't
know what the actual last transaction id was. In case of a re-connect,
that will cause re-downloading of the whole database, since the
client's last_id will be out of sync.
One solution would be to store the last transaction ID separately
from the actual transactions, but that would require careful
management in cases where the database gets reset and the history needs
to be cleared. Instead, keep the last transaction in the history to avoid
the problem. That should not be a big concern in terms of memory
consumption, because this last transaction will be removed from the
history once the next transaction appears. This is also not a concern
for a fast re-sync, because this last transaction will not be used
for the monitor reply: either the client already has it, so there is no
need to send it, or it's a history miss.
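The trimming rule described above can be sketched as follows: drop the oldest transactions while the history is larger than the database, but never drop the last one. The fixed-size array, the 'struct history' layout and 'history_trim' are hypothetical simplifications, not the ovsdb data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of the transaction history. */
struct txn {
    size_t n_atoms;
};

struct history {
    struct txn txns[16];   /* Oldest transaction first. */
    size_t n_txns;
    size_t n_atoms;        /* Sum of 'n_atoms' over all transactions. */
};

/* Drops the oldest transactions while the history is larger than the
 * database, but always keeps the most recent one, so that its id can still
 * be reported as the last transaction id. */
static void
history_trim(struct history *h, size_t db_atoms)
{
    size_t drop = 0;

    while (h->n_txns - drop > 1 && h->n_atoms > db_atoms) {
        h->n_atoms -= h->txns[drop].n_atoms;
        drop++;
    }
    /* Shift the remaining transactions to the front. */
    for (size_t i = drop; i < h->n_txns; i++) {
        h->txns[i - drop] = h->txns[i];
    }
    h->n_txns -= drop;
}
```

Note that a single transaction larger than the database survives the trim, which is exactly the case the change targets.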
The test is updated to not check the number of atoms if there is only
one transaction in the history.
Fixes: 317b1bfd7dd3 ("ovsdb: Don't let transaction history grow larger than the database.")
Reported-at: https://bugzilla.redhat.com/2044621
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
For patch ports, the is_last_action value is not propagated and is
always set to true. This causes non-reversible actions to modify the
packet, and the original content is not preserved when processing
the remaining actions.
This patch propagates the is_last_action flag for patch port related
actions. In addition, it fixes the general propagation of the last
action flag to the individual actions.
Also fix check_pkt_larger as the last action: it is a valid case for the
drop action, so it should not be skipped.
Fixes: feee58b95 ("ofproto-dpif-xlate: Keep track of the last action")
Fixes: 5b34f8fc3 ("Add a new OVS action check_pkt_larger")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Some warning logs must be waived when using the net/pcap DPDK driver.
Those logs can affect different DPDK drivers (like mlx5), and since the
tests in system-dpdk do not test MTU or Rx checksum, we might as well
ignore those warnings from OVS.
Fixes: d446dcb7e03f ("system-dpdk: Refactor common logs matching.")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
The current id-pool module is slow to allocate the
next valid ID, and can be optimized by restricting
some properties of the pool.
Those restrictions are:
* No ability to add a random ID to the pool.
* A new ID is no longer guaranteed to be the smallest possible ID.
It is, however, guaranteed to be in the range of
[floor, last_alloc + nb_user * cache_size + 1],
where 'cache_size' is the number of IDs in each per-user
cache. 'ID_FPOOL_CACHE_SIZE' defines it as 64.
* A user should never free an ID that is not allocated.
No checks are done, and doing so will duplicate the spurious
ID. Refcounting or another memory management scheme should
be used to ensure an object and its ID are only freed once.
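A minimal single-threaded sketch of why per-user caches break the "smallest ID first" property: each user first reuses IDs from its own cache, then from a shared free list, and only then takes a fresh ID. All names here are hypothetical, the real module prefills caches in batches and needs locking for the shared parts, and this sketch does neither.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_SIZE 64     /* Same value as ID_FPOOL_CACHE_SIZE in the text. */
#define MAX_USERS 4
#define MAX_IDS 1024      /* Upper bound on ceiling - floor. */

struct user_cache {
    uint32_t ids[CACHE_SIZE];
    unsigned int n;
};

struct fpool {
    uint32_t next_id;           /* Smallest never-allocated ID. */
    uint32_t ceiling;           /* IDs are taken from [floor, ceiling). */
    uint32_t global[MAX_IDS];   /* IDs spilled from full user caches. */
    unsigned int n_global;
    struct user_cache users[MAX_USERS];
};

static void
fpool_init(struct fpool *p, uint32_t floor, uint32_t ceiling)
{
    assert(floor <= ceiling && ceiling - floor <= MAX_IDS);
    p->next_id = floor;
    p->ceiling = ceiling;
    p->n_global = 0;
    for (unsigned int i = 0; i < MAX_USERS; i++) {
        p->users[i].n = 0;
    }
}

/* Own cache first, then the shared list, then a fresh ID: the result is
 * therefore not necessarily the smallest free ID. */
static bool
fpool_new_id(struct fpool *p, unsigned int user, uint32_t *id)
{
    struct user_cache *c = &p->users[user];

    if (c->n > 0) {
        *id = c->ids[--c->n];
    } else if (p->n_global > 0) {
        *id = p->global[--p->n_global];
    } else if (p->next_id < p->ceiling) {
        *id = p->next_id++;
    } else {
        return false;   /* Pool exhausted. */
    }
    return true;
}

/* No check that 'id' is allocated: a double free would hand it out twice. */
static void
fpool_free_id(struct fpool *p, unsigned int user, uint32_t id)
{
    struct user_cache *c = &p->users[user];

    if (c->n == CACHE_SIZE) {
        /* Cache full: move it to the shared list.  This cannot overflow
         * 'global', since at most MAX_IDS IDs exist in total. */
        for (unsigned int i = 0; i < c->n; i++) {
            p->global[p->n_global++] = c->ids[i];
        }
        c->n = 0;
    }
    c->ids[c->n++] = id;
}
```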
This allocator is designed to scale reasonably well in multithreaded
setups. As it is aimed at being a faster replacement for the current
id-pool, a benchmark has been implemented alongside unit tests.
The benchmark is composed of 4 rounds: 'new', 'del', 'mix', and 'rnd'.
Respectively:
+ 'new': only allocate IDs
+ 'del': only free IDs
+ 'mix': allocate, sequential free, then allocate ID.
+ 'rnd': allocate, random free, allocate ID.
Randomized freeing is done by swapping the latest allocated ID with any
from the range of currently allocated IDs, which is reminiscent of the
Fisher-Yates shuffle. This evaluates freeing non-sequential IDs,
which is the more natural use case.
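The randomized freeing step can be sketched like this: pick a random index in the live range, swap it with the last live entry, and free that entry. 'struct free_log' and 'log_free' are hypothetical stand-ins for the pool's free function; they only record the order in which IDs are freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define LOG_MAX 64

/* Hypothetical stand-in for the pool's free function. */
struct free_log {
    uint32_t ids[LOG_MAX];
    size_t n;
};

static void
log_free(struct free_log *fl, uint32_t id)
{
    assert(fl->n < LOG_MAX);
    fl->ids[fl->n++] = id;
}

/* Frees the 'n' IDs in 'ids' in random order: each step picks a random
 * index in the live range [0, n), swaps it with the last live entry and
 * frees that entry, as in a Fisher-Yates shuffle. */
static void
free_ids_randomly(uint32_t *ids, size_t n, struct free_log *fl)
{
    while (n > 0) {
        size_t j = (size_t) rand() % n;  /* Modulo bias is fine here. */
        uint32_t tmp = ids[j];

        ids[j] = ids[n - 1];
        ids[n - 1] = tmp;
        log_free(fl, ids[n - 1]);
        n--;
    }
}
```

Each ID is freed exactly once, in an order that is a uniform-style random permutation (up to the quality of rand()).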
For the 'rnd' round, id-pool performance is poor enough that a timeout
of 10 seconds is added to the benchmark:
$ ./tests/ovstest test-id-fpool benchmark 10000 1
Benchmarking n=10000 on 1 thread.
type\thread: 1 Avg
id-fpool new: 1 1 ms
id-fpool del: 1 1 ms
id-fpool mix: 2 2 ms
id-fpool rnd: 2 2 ms
id-pool new: 4 4 ms
id-pool del: 2 2 ms
id-pool mix: 6 6 ms
id-pool rnd: 431 431 ms
$ ./tests/ovstest test-id-fpool benchmark 100000 1
Benchmarking n=100000 on 1 thread.
type\thread: 1 Avg
id-fpool new: 2 2 ms
id-fpool del: 2 2 ms
id-fpool mix: 3 3 ms
id-fpool rnd: 4 4 ms
id-pool new: 12 12 ms
id-pool del: 5 5 ms
id-pool mix: 16 16 ms
id-pool rnd: 10000+ -1 ms
$ ./tests/ovstest test-id-fpool benchmark 1000000 1
Benchmarking n=1000000 on 1 thread.
type\thread: 1 Avg
id-fpool new: 15 15 ms
id-fpool del: 12 12 ms
id-fpool mix: 34 34 ms
id-fpool rnd: 48 48 ms
id-pool new: 276 276 ms
id-pool del: 286 286 ms
id-pool mix: 448 448 ms
id-pool rnd: 10000+ -1 ms
Running only a performance test on the fast pool:
$ ./tests/ovstest test-id-fpool perf 1000000 1
Benchmarking n=1000000 on 1 thread.
type\thread: 1 Avg
id-fpool new: 15 15 ms
id-fpool del: 12 12 ms
id-fpool mix: 34 34 ms
id-fpool rnd: 47 47 ms
$ ./tests/ovstest test-id-fpool perf 1000000 2
Benchmarking n=1000000 on 2 threads.
type\thread: 1 2 Avg
id-fpool new: 11 11 11 ms
id-fpool del: 10 10 10 ms
id-fpool mix: 24 24 24 ms
id-fpool rnd: 30 30 30 ms
$ ./tests/ovstest test-id-fpool perf 1000000 4
Benchmarking n=1000000 on 4 threads.
type\thread: 1 2 3 4 Avg
id-fpool new: 9 11 11 10 10 ms
id-fpool del: 5 6 6 5 5 ms
id-fpool mix: 16 16 16 16 16 ms
id-fpool rnd: 20 20 20 20 20 ms
Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Add a lockless multi-producer/single-consumer (MPSC), linked-list based,
intrusive, unbounded queue that does not require deferred memory
management.
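For illustration, a lockless intrusive MPSC queue of this kind can be sketched with C11 atomics. The sketch below follows Dmitry Vyukov's well-known intrusive MPSC algorithm, which uses a stub node so that no deferred memory reclamation is needed; it is an assumption that the OVS mpsc-queue is structured the same way, and the names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct mpsc_node {
    _Atomic(struct mpsc_node *) next;
};

struct mpsc_queue {
    _Atomic(struct mpsc_node *) head;  /* Producers swap themselves in here. */
    struct mpsc_node *tail;            /* Owned by the single consumer. */
    struct mpsc_node stub;             /* Dummy node, never handed out. */
};

static void
mpsc_queue_init(struct mpsc_queue *q)
{
    atomic_init(&q->stub.next, NULL);
    atomic_init(&q->head, &q->stub);
    q->tail = &q->stub;
}

/* Safe to call from any number of threads concurrently. */
static void
mpsc_queue_push(struct mpsc_queue *q, struct mpsc_node *node)
{
    struct mpsc_node *prev;

    atomic_store_explicit(&node->next, NULL, memory_order_relaxed);
    prev = atomic_exchange_explicit(&q->head, node, memory_order_acq_rel);
    /* Until this store, 'node' is not yet reachable from 'prev'. */
    atomic_store_explicit(&prev->next, node, memory_order_release);
}

/* Single consumer only.  Returns NULL if the queue is empty, or if a
 * producer is in the middle of a push (the caller may retry later). */
static struct mpsc_node *
mpsc_queue_pop(struct mpsc_queue *q)
{
    struct mpsc_node *tail = q->tail;
    struct mpsc_node *next = atomic_load_explicit(&tail->next,
                                                  memory_order_acquire);

    if (tail == &q->stub) {
        if (!next) {
            return NULL;
        }
        q->tail = next;
        tail = next;
        next = atomic_load_explicit(&tail->next, memory_order_acquire);
    }
    if (next) {
        q->tail = next;
        return tail;
    }
    if (tail != atomic_load_explicit(&q->head, memory_order_acquire)) {
        return NULL;
    }
    /* 'tail' is the last node: re-insert the stub behind it so that
     * 'tail' can be handed out. */
    mpsc_queue_push(q, &q->stub);
    next = atomic_load_explicit(&tail->next, memory_order_acquire);
    if (next) {
        q->tail = next;
        return tail;
    }
    return NULL;
}
```

Because the queue is intrusive, callers embed 'struct mpsc_node' in their own structures and recover the container with offsetof(), so the queue itself never allocates or frees memory.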
The queue is designed to improve performance in the specific MPSC setup. A benchmark
accompanies the unit tests to measure the difference in this configuration.
A single reader thread polls the queue while N writers enqueue elements
as fast as possible. The mpsc-queue is compared against the regular ovs-list
as well as the guarded list. The latter usually offers a slight improvement
by batching the element removal; however, the mpsc-queue is faster.
The average is taken over each producer thread's time:
$ ./tests/ovstest test-mpsc-queue benchmark 3000000 1
Benchmarking n=3000000 on 1 + 1 threads.
type\thread: Reader 1 Avg
mpsc-queue: 167 167 167 ms
list(spin): 89 80 80 ms
list(mutex): 745 745 745 ms
guarded list: 788 788 788 ms
$ ./tests/ovstest test-mpsc-queue benchmark 3000000 2
Benchmarking n=3000000 on 1 + 2 threads.
type\thread: Reader 1 2 Avg
mpsc-queue: 98 97 94 95 ms
list(spin): 185 171 173 172 ms
list(mutex): 203 199 203 201 ms
guarded list: 269 269 188 228 ms
$ ./tests/ovstest test-mpsc-queue benchmark 3000000 3
Benchmarking n=3000000 on 1 + 3 threads.
type\thread: Reader 1 2 3 Avg
mpsc-queue: 76 76 65 76 72 ms
list(spin): 246 110 240 238 196 ms
list(mutex): 542 541 541 539 540 ms
guarded list: 535 535 507 511 517 ms
$ ./tests/ovstest test-mpsc-queue benchmark 3000000 4
Benchmarking n=3000000 on 1 + 4 threads.
type\thread: Reader 1 2 3 4 Avg
mpsc-queue: 73 68 68 68 68 68 ms
list(spin): 294 275 279 277 282 278 ms
list(mutex): 346 309 287 345 302 310 ms
guarded list: 378 319 334 378 351 345 ms
Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
No unit test currently exists for the ovs-barrier type.
It is, however, crucial as a building block and should be verified to work
as expected.
Create a simple test verifying the basic function of ovs-barrier.
Integrate the test as part of the test suite.
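A basic barrier test could look like the sketch below: in every round, each thread increments a shared counter before the barrier, and after the barrier every thread must observe exactly N increments for that round. This sketch uses POSIX pthread_barrier_t as a stand-in for the OVS ovs-barrier type and assumes a platform that provides POSIX barriers; it is not the test added by this patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define N_THREADS 4
#define N_ROUNDS 100

static pthread_barrier_t barrier;
static atomic_uint counter;
static atomic_bool failed;

static void *
barrier_worker(void *arg)
{
    (void) arg;
    for (unsigned int round = 1; round <= N_ROUNDS; round++) {
        atomic_fetch_add(&counter, 1);
        pthread_barrier_wait(&barrier);
        /* All threads incremented once in this round, and none can start
         * the next round before everyone passes the second wait. */
        if (atomic_load(&counter) != round * N_THREADS) {
            atomic_store(&failed, true);
        }
        pthread_barrier_wait(&barrier);
    }
    return NULL;
}

/* Returns true if the barrier behaved as expected in every round. */
static bool
run_barrier_test(void)
{
    pthread_t threads[N_THREADS];
    int err;

    atomic_store(&counter, 0);
    atomic_store(&failed, false);
    err = pthread_barrier_init(&barrier, NULL, N_THREADS);
    assert(err == 0);
    for (int i = 0; i < N_THREADS; i++) {
        err = pthread_create(&threads[i], NULL, barrier_worker, NULL);
        assert(err == 0);
    }
    for (int i = 0; i < N_THREADS; i++) {
        pthread_join(threads[i], NULL);
    }
    pthread_barrier_destroy(&barrier);
    return !atomic_load(&failed)
           && atomic_load(&counter) == N_ROUNDS * N_THREADS;
}
```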
Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This patch introduces a new test for Tx packet
steering modes. The first test validates the static mode
by checking that all packets are transmitted on a single
queue (single PMD thread); it then runs the same test with
hash-based packet steering enabled, ensuring packets
are transmitted on both queues.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This patch adds Rx and Tx per-queue statistics. It will be
used to test hash-based Tx packet steering. Only "bytes"
and "packets" per-queue custom statistics are added, as
there are no global "errors" counters in netdev-dummy.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The encap & decap actions are extended to support the MPLS packet type.
The encap & decap actions add and remove an MPLS header at the start of the
packet.
The existing PUSH MPLS & POP MPLS actions insert & remove the MPLS
header between the Ethernet header and the IP header. Though this behaviour
is fine for L3 VPN, where an IP packet is encapsulated inside an MPLS
tunnel, it does not satisfy the L2 VPN requirements. In L2 VPN, the
Ethernet packets must be encapsulated inside an MPLS tunnel.
In this change, the encap & decap actions are extended to support the MPLS
packet type. The encap & decap actions add and remove the MPLS header at the
start of the packet, as depicted below.
Encapsulation:
Actions - encap(mpls),encap(ethernet)
Incoming packet -> | ETH | IP | Payload |
1 Actions - encap(mpls) [Datapath action - ADD_MPLS:0x8847]
Outgoing packet -> | MPLS | ETH | IP | Payload |
2 Actions - encap(ethernet) [Datapath action - push_eth]
Outgoing packet -> | ETH | MPLS | ETH | IP | Payload |
Decapsulation:
Incoming packet -> | ETH | MPLS | ETH | IP | Payload |
Actions - decap(),decap(packet_type(ns=0,type=0))
1 Actions - decap() [Datapath action - pop_eth]
Outgoing packet -> | MPLS | ETH | IP | Payload|
2 Actions - decap(packet_type(ns=0,type=0)) [Datapath action - POP_MPLS:0x6558]
Outgoing packet -> | ETH | IP | Payload|
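The header that encap(mpls) places at the start of the packet is an MPLS label stack entry. As a reference point, the sketch below builds one using the generic RFC 3032 layout (20-bit label, 3-bit traffic class, bottom-of-stack bit, 8-bit TTL) in network byte order; the function names are hypothetical and this is not OVS code.

```c
#include <assert.h>
#include <stdint.h>

/* Builds a 32-bit MPLS label stack entry (host byte order) laid out as in
 * RFC 3032: label (20 bits) | TC (3 bits) | bottom-of-stack (1) | TTL (8). */
static uint32_t
mpls_lse(uint32_t label, uint8_t tc, int bos, uint8_t ttl)
{
    return ((label & 0xfffff) << 12)
           | ((uint32_t) (tc & 0x7) << 9)
           | ((uint32_t) (bos ? 1 : 0) << 8)
           | ttl;
}

/* Writes the entry in network byte order into the first 4 bytes of 'buf'. */
static void
mpls_lse_write(uint8_t buf[4], uint32_t lse)
{
    buf[0] = (uint8_t) (lse >> 24);
    buf[1] = (uint8_t) (lse >> 16);
    buf[2] = (uint8_t) (lse >> 8);
    buf[3] = (uint8_t) lse;
}
```

For example, label 100 with TC 0, bottom-of-stack set and TTL 64 is encoded as the bytes 00 06 41 40.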
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Currently, ingress policing uses the basic classifier to apply traffic
control filters if hardware offload is not enabled, and matchall if it
is. This change makes it always use matchall, and fall back to basic
if the kernel is built without matchall support.
The system tests are modified to allow either basic or matchall
classification on the ingress filter, and to allow either 10000 or
10240 packets for the packet burst filter. 10000 is accurate for kernel
5.14 and the most recent iproute2; however, 10240 is left for
compatibility with older kernels.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
tests/oss-fuzz/flow_extract_target.c:59:53:
error: too few arguments to function call, expected 4, have 1
uint16_t tcp_flags = parse_tcp_flags(&packet);
~~~~~~~~~~~~~~~ ^
Fixes: e7e9973b80d3 ("dpif-netdev: Forwarding optimization for flows with a simple match.")
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=43498
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
There are cases where users might want simple forwarding or drop rules
for all packets received from a specific port, e.g.:
"in_port=1,actions=2"
"in_port=2,actions=IN_PORT"
"in_port=3,vlan_tci=0x1234/0x1fff,actions=drop"
"in_port=4,actions=push_vlan:0x8100,set_field:4196->vlan_vid,output:3"
There are also cases where complex OpenFlow rules can be simplified
down to datapath flows with very simple match criteria.
In theory, for very simple forwarding, OVS doesn't need to parse
packets at all in order to follow these rules. "Simple match" lookup
optimization is intended to speed up packet forwarding in these cases.
Design:
Due to various implementation constraints, the userspace datapath always has
the following flow fields in exact match (i.e., it's required to
match at least these fields of a packet even if the OF rule doesn't
need that):
- recirc_id
- in_port
- packet_type
- dl_type
- vlan_tci (CFI + VID) - in most cases
- nw_frag - for ip packets
Not all of these fields are related to the packet itself. We already
know the current 'recirc_id' and the 'in_port' before starting the
packet processing. It also seems safe to assume that we're working
with Ethernet packets. So, for the simple OF rule we need to match
only on 'dl_type', 'vlan_tci' and 'nw_frag'.
'in_port', 'dl_type', 'nw_frag' and 13 bits of 'vlan_tci' can be
combined in a single 64-bit integer (mark) that can be used as a
hash in a hash map. We are using only VID and CFI from the 'vlan_tci',
so flows that need to match on PCP will not qualify for the optimization.
The workaround for matching on the non-existence of VLAN is updated to match on
CFI and VID only, in order to qualify for the optimization. CFI is
always set by OVS if a VLAN is present in a packet, so there is no need
to match on PCP in this case. 'nw_frag' takes 2 bits of PCP inside
the simple match mark.
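The mark layout described above can be sketched in host byte order as follows. The exact bit positions chosen here for 'nw_frag' inside the PCP bits, the use of host order, and the function name are assumptions for illustration; the datapath itself works on network-order fields.

```c
#include <assert.h>
#include <stdint.h>

#define VLAN_CFI_VID_MASK 0x1fff   /* CFI (bit 12) + VID (bits 0-11). */
#define VLAN_PCP_SHIFT 13          /* PCP occupies bits 13-15. */

/* Hypothetical sketch of the "simple match mark":
 *   bits 63..32: in_port
 *   bits 31..16: dl_type
 *   bits 15..13: PCP position, reused here for the 2-bit nw_frag
 *   bits 12..0 : CFI + VID taken from vlan_tci (PCP is dropped) */
static uint64_t
simple_match_mark(uint32_t in_port, uint16_t dl_type, uint8_t nw_frag,
                  uint16_t vlan_tci)
{
    uint16_t tci = (uint16_t) ((vlan_tci & VLAN_CFI_VID_MASK)
                               | ((nw_frag & 0x3) << VLAN_PCP_SHIFT));

    return ((uint64_t) in_port << 32) | ((uint64_t) dl_type << 16) | tci;
}
```

Since PCP is masked out, two packets that differ only in PCP produce the same mark, which is why flows matching on PCP cannot use this lookup.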
A new per-PMD flow table, 'simple_match_table', is introduced to store
simple match flows only. 'dp_netdev_flow_add' adds a flow to the
usual 'flow_table', and also to the 'simple_match_table' if the flow
meets the following constraints:
- 'recirc_id' in flow match is 0.
- 'packet_type' in flow match is Ethernet.
- Flow wildcards contain only the minimal set of non-wildcarded fields
(listed above).
If the number of flows for the current 'in_port' in the regular 'flow_table'
equals the number of flows for the current 'in_port' in the 'simple_match_table',
we may use simple match optimization, because all the flows we have
are simple match flows. This means that we only need to parse
'dl_type', 'vlan_tci' and 'nw_frag' to perform packet matching.
Now we build the unique flow mark from the 'in_port', 'dl_type',
'nw_frag' and 'vlan_tci' and look it up in the 'simple_match_table'.
On successful lookup we don't need to run full 'miniflow_extract()'.
Unsuccessful lookup technically means that we have no suitable flow
in the datapath and upcall will be required. So, in this case EMC and
SMC lookups are disabled. We may optimize this path in the future by
bypassing the dpcls lookup too.
The performance improvement of this solution on 'simple match' flows
should be comparable with partial HW offloading, because it parses the same
packet fields and uses a similar flow lookup scheme.
However, unlike partial HW offloading, it works for all port types
including virtual ones.
Performance results when compared to EMC:
Test setup:
virtio-user OVS virtio-user
Testpmd1 ------------> pmd1 ------------> Testpmd2
(txonly) x<------ pmd2 <------------ (mac swap)
Single stream of 64byte packets. Actions:
in_port=vhost0,actions=vhost1
in_port=vhost1,actions=vhost0
Stats collected from pmd1 and pmd2, so there are 2 scenarios:
Virt-to-Virt : Testpmd1 ------> pmd1 ------> Testpmd2.
Virt-to-NoCopy : Testpmd2 ------> pmd2 --->x Testpmd1.
Here the packet sent from pmd2 to Testpmd1 is always dropped, because
the virtqueue is full since Testpmd1 is in txonly mode and doesn't
receive any packets. This should be closer to the performance of a
VM-to-Phy scenario.
Test performed on a machine with an Intel Xeon CPU E5-2690 v4 @ 2.60GHz.
The table below shows the improvement in throughput compared to EMC.
+----------------+------------------------+------------------------+
| | Default (-g -O2) | "-Ofast -march=native" |
| Scenario +------------+-----------+------------+-----------+
| | GCC | Clang | GCC | Clang |
+----------------+------------+-----------+------------+-----------+
| Virt-to-Virt | +18.9% | +25.5% | +10.8% | +16.7% |
| Virt-to-NoCopy | +24.3% | +33.7% | +14.9% | +22.0% |
+----------------+------------+-----------+------------+-----------+
For Phy-to-Phy case performance improvement should be even higher, but
it's not the main use-case for this functionality. Performance
difference for the non-simple flows is within a margin of error.
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Add support for monitor_cond_since / update3 to python-ovs to
allow more efficient reconnections when connecting to clustered
OVSDB servers.
Signed-off-by: Terry Wilson <twilson@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Recently, there has been a lot of press about the "trojan source" attack,
where Unicode characters are used to obfuscate the true functionality of
code. This attack didn't affect OVS, but adding the check here will help
guard against it sneaking in later.
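The "trojan source" attack relies on Unicode bidirectional control characters. A C sketch of the same kind of check is shown below; the real check may be implemented differently (and in another language), and 'has_bidi_control' is a hypothetical name. It scans UTF-8 text for U+202A..U+202E and U+2066..U+2069.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the NUL-terminated UTF-8 string 's' contains a Unicode
 * bidirectional control character of the kind used by "trojan source":
 * U+202A..U+202E (E2 80 AA..AE) or U+2066..U+2069 (E2 81 A6..A9). */
static bool
has_bidi_control(const char *s)
{
    const unsigned char *p = (const unsigned char *) s;

    for (; *p; p++) {
        if (p[0] != 0xe2) {
            continue;
        }
        /* p[2] is only read when p[1] is not the terminating NUL. */
        if (p[1] == 0x80 && p[2] >= 0xaa && p[2] <= 0xae) {
            return true;    /* U+202A..U+202E */
        }
        if (p[1] == 0x81 && p[2] >= 0xa6 && p[2] <= 0xa9) {
            return true;    /* U+2066..U+2069 */
        }
    }
    return false;
}
```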
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Gaetan Rivet <grive@u256.net>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The netlink policy unit test contains test fixture data that is
subject to endianness and currently fails on big endian systems.
Store the fixture data in a struct to ensure proper byte order for
the header data.
Also fix improper style for sizeof with expressions.
Fixes: bfee9f6c0115 ("netlink: Add support for parsing link layer address.")
Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Move EAL logs and commonly ignored logs to a common macro.
Remove/update obsolete ones (like i40e [1], timer [2], EAL [3][4] logs).
Set log level for DPDK drivers to error only: the rationale is that we are
not testing DPDK drivers in system-dpdk.
Extend the regex on hugepage logs, since a check on hugepage availability is
already present on the OVS side, and, as a consequence, we don't care about
the warnings on availability for certain hugepage sizes.
Add log checks for MFEX tests that were missing them.
1: https://git.dpdk.org/dpdk/commit/?id=a075ce2b3e8c
2: https://git.dpdk.org/dpdk/commit/?id=c1077933d45b
3: https://git.dpdk.org/dpdk/commit/?id=e9b3d79b0696
4: https://git.dpdk.org/dpdk/commit/?id=c69150679891
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
In the case of a native tunnel with BFD enabled, if the MAC address of the
remote end's interface changes (e.g., because it got rebooted and the
MAC address is allocated dynamically), the BFD session will never be
re-established.
This happens because the local tunnel neigh entry doesn't get updated,
and the local end keeps sending BFD packets with the old destination
MAC address. This was not an issue until
b23ddcc57d41 ("tnl-neigh-cache: tighten arp and nd snooping.")
because ARP requests were snooped as well, avoiding the problem.
Fix this by snooping the incoming packets in the slow path, and
updating the neigh cache accordingly.
Fixes: b23ddcc57d41 ("tnl-neigh-cache: tighten arp and nd snooping.")
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2002430
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Gaetan Rivet <grive@u256.net>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
With the command, it is now possible to change the aging time of the
cache entries.
For the existing entries, the aging time is updated only if the
current expiration is greater than the new one. In any case, the next
refresh will set it to the new value.
This is intended mostly for debugging purposes.
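The update rule for existing entries can be sketched as below: an entry is only touched if it would otherwise outlive the new aging time, so shrinking the aging time takes effect immediately, while growing it waits for each entry's next refresh. The entry structure and function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache entry: 'expires' is an absolute time in ms. */
struct neigh_entry {
    long long expires;
};

/* Applies a new aging time to existing entries, lowering an entry's
 * expiration only if it currently expires later than 'now + aging_ms'. */
static void
apply_aging_time(struct neigh_entry *entries, size_t n,
                 long long now, long long aging_ms)
{
    long long new_expires = now + aging_ms;

    for (size_t i = 0; i < n; i++) {
        if (entries[i].expires > new_expires) {
            entries[i].expires = new_expires;
        }
    }
}
```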
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Gaetan Rivet <grive@u256.net>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Waiting only on the vhost user port to be ready is not enough since a
tap is also initialized by testpmd and is used to inject/receive packets
in/from the kernel.
Wait on the tap link status.
Fixes: 18db7ec5eb83 ("system-dpdk: Improve vhost-user ping tests reliability.")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This commit adds support for DPDK v21.11. It includes the following
changes:
1. ci: Install python elftools for DPDK 21.02.
2. ci: Update meson requirement for DPDK 21.05.
3. netdev-dpdk: Fix build with 21.05.
4. ci: Compile DPDK in non developer mode.
http://patchwork.ozlabs.org/project/openvswitch/list/?series=242480&state=*
5. netdev-dpdk: Remove access to DPDK internals.
6. netdev-dpdk: Remove unused attribute from rte_flow rule.
7. netdev-dpdk: Fix mbuf macros namespace with 21.11-rc1.
8. netdev-dpdk: Fix vhost namespace with 21.11-rc2.
http://patchwork.ozlabs.org/project/openvswitch/list/?series=271159&state=*
In addition, documentation and DPDK unit tests were also updated in this
commit for use with DPDK v21.11.
For credit, all authors of the original commits to 'dpdk-latest' with the above
changes have been added as co-authors for this commit.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Co-authored-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Tested-by: Seamus Ryan <seamus.ryan@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Instead of waiting 10 seconds for testpmd to start, this
patch makes use of OVS_WAIT_UNTIL() macro to wait for
the virtio device readiness notification in ovs-vswitchd
logs.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
It seems that on slow systems with high concurrency and CPU contention,
time/warp is not accurate enough for the ALB unit tests when only the
minimum time/warp needed to trigger the expected number of events is
used. This results in intermittent test failures.
As those tests just wait for a certain number of events to occur,
and there is no functional change during that time, do the time/warp
again with higher values.
With this, no failures are seen in several hundred runs.
Fixes: a83a406096e9 ("dpif-netdev: Sync PMD ALB state with user commands.")
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The next log line number should be updated to ensure that the
anticipated log has occurred again after more time has passed.
Fixes: a83a406096e9 ("dpif-netdev: Sync PMD ALB state with user commands.")
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The 'dataofs' field of the TCP header indicates the TCP header length
in 32-bit words. The length should be at least 20 bytes (dataofs >= 5)
and no greater than the TCP data length. This patch checks 'dataofs'
and skips parsing layer 4 fields when it is invalid.
This behavior is consistent with the openvswitch kernel module.
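The validity check can be sketched as below. It assumes the 16-bit data-offset/flags word of the TCP header has already been converted to host byte order; 'tcp_dataofs_valid' is a hypothetical name, not the function used by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TCP_HEADER_LEN 20   /* Minimum TCP header size in bytes. */

/* 'tcp_ctl' is the data-offset/flags word of the TCP header in host byte
 * order; its top 4 bits hold the header length in 32-bit words.
 * 'l4_size' is the number of bytes from the start of the TCP header to the
 * end of the packet.  Returns true if 20 <= header length <= l4_size. */
static bool
tcp_dataofs_valid(uint16_t tcp_ctl, size_t l4_size)
{
    size_t tcp_len = (size_t) (tcp_ctl >> 12) * 4;

    return tcp_len >= TCP_HEADER_LEN && tcp_len <= l4_size;
}
```

A data offset of 1, for example, gives a 4-byte header length, which is the "bad hdr length 4" case that tcpdump also rejects.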
Fixes: 5a51b2cd3483 ("lib/ofpbuf: Remove 'l7' pointer.")
Signed-off-by: lic121 <lic121@chinatelecom.cn>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Without this fix, flowgen.py generates bad TCP packets:
tcpdump reports "bad hdr length 4 - too short" for the pcap
generated by flowgen.py.
This patch corrects the endianness of the packet data.
Signed-off-by: lic121 <lic121@chinatelecom.cn>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Rows that refer to rows that were inserted in the current IDL run should
only be reparsed if they don't get deleted (become orphan) in the current
IDL run.
Fixes: 7b8aeadd60c8 ("ovsdb-idl: Re-parse backrefs of inserted rows only once.")
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
ovsdb_atom_string and json_string are basically the same data structure
and ovsdb-server frequently needs to convert one to another. We can
avoid that by using json_string from the beginning for all ovsdb
strings. So, the conversion turns into a simple json_clone(), i.e., an
increment of a reference counter. This change gives a moderate
performance boost in some scenarios, improves the code clarity and
may be useful for future development.
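The benefit of sharing one reference-counted representation is that a "conversion" no longer copies the string. The sketch below uses a hypothetical 'struct rc_string' as a stand-in for a json string; it is not the OVS json API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted string, standing in for a json string. */
struct rc_string {
    size_t refcnt;
    char *s;
};

static struct rc_string *
rc_string_create(const char *s)
{
    size_t len = strlen(s) + 1;
    struct rc_string *str = malloc(sizeof *str);
    char *copy = malloc(len);

    if (!str || !copy) {
        abort();    /* Sketch: treat allocation failure as fatal. */
    }
    memcpy(copy, s, len);
    str->refcnt = 1;
    str->s = copy;
    return str;
}

/* "Cloning" only bumps the counter: both users share the same buffer. */
static struct rc_string *
rc_string_clone(struct rc_string *str)
{
    str->refcnt++;
    return str;
}

static void
rc_string_unref(struct rc_string *str)
{
    if (str && --str->refcnt == 0) {
        free(str->s);
        free(str);
    }
}
```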
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The length field in the Data Link header of these packets should not include
the source and destination MACs or the length field itself.
Therefore, it should be 14 bytes less; otherwise, other network
tools like Wireshark complain:
Expert Info (Error/Malformed):
Length field value goes past the end of the payload
Additionally, fix the printing of the packet/flow configuration,
as it currently prints '%s=%s' strings without any real data.
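The corrected computation amounts to subtracting the 14-byte header (two 6-byte MACs plus the 2-byte length field) from the frame length. A sketch, with a hypothetical function name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_ADDR_LEN 6
#define ETH_HEADER_LEN 14   /* dst MAC + src MAC + length/type field. */

/* Writes the IEEE 802.3 length field for a frame of 'frame_len' bytes
 * (header included) into 'frame'.  The field counts only the bytes that
 * follow it, so both MACs and the field itself are excluded. */
static void
set_8023_length(uint8_t *frame, size_t frame_len)
{
    assert(frame_len >= ETH_HEADER_LEN);
    uint16_t len = (uint16_t) (frame_len - ETH_HEADER_LEN);

    frame[2 * ETH_ADDR_LEN] = (uint8_t) (len >> 8);   /* Network order. */
    frame[2 * ETH_ADDR_LEN + 1] = (uint8_t) len;
}
```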
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The match keyword "igmp" is not supported in ofp-parse, which means
that flow dumps cannot be restored. Previously a workaround was
added to ovs-save to avoid changing output in stable branches.
This patch changes the output to print igmp match in the accepted
ofp-parse format (ip,nw_proto=2) and print igmp_type/code as generic
tp_src/dst. Tests are added, and NEWS is updated to reflect this change.
The workaround in ovs-save is still included to ensure that flows
can be restored when upgrading an older ovs-vswitchd. This workaround
should be removed in later versions.
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Salvatore Daniele <sdaniele@redhat.com>
Co-authored-by: Salvatore Daniele <sdaniele@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Commit dc0bd12f5b04 removed the restriction that a tunnel endpoint must be a
bridge port. So, currently OVS has to check if the native tunnel needs
to be terminated regardless of the output port. Unfortunately, there
is a side effect: tnl_port_map_lookup() always adds at least a 'dl_dst'
match to the megaflow that ends up in the corresponding datapath flow.
And since tunneling works on the L3 level and is not restricted by any
particular bridge, this extra match criterion is added to every
datapath flow on every bridge, even if that bridge cannot be part of
tunnel processing.
For example, if OVS has at least one tunnel configured and we're
adding a completely separate bridge with 2 ports and simple rules
to forward packets between two ports, there still will be a match on
a destination mac address:
1. <create a tunnel configuration in OVS>
2. ovs-vsctl add-br br-non-tunnel -- set bridge br-non-tunnel datapath_type=netdev
3. ovs-vsctl add-port br-non-tunnel port0 \
-- add-port br-non-tunnel port1
4. ovs-ofctl del-flows br-non-tunnel
5. ovs-ofctl add-flow br-non-tunnel in_port=port0,actions=port1
6. ovs-ofctl add-flow br-non-tunnel in_port=port1,actions=port0
# ovs-appctl ofproto/trace br-non-tunnel in_port=port0
Flow: in_port=1,vlan_tci=0x0000,
dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,dl_type=0x0000
bridge("br-non-tunnel")
-----------------------
0. in_port=1, priority 32768
output:2
Final flow: unchanged
Megaflow: recirc_id=0,eth,in_port=1,dl_dst=00:00:00:00:00:00,dl_type=0x0000
                                    ^^^^^^^^^^^^^^^^^^^^^^^^
Datapath actions: 5
This increases the number of upcalls and installed datapath flows,
since a separate flow needs to be installed per destination MAC, reducing
the switching performance. This also blocks datapath performance
optimizations that are based on datapath flow simplicity.
In general, in order to be a tunnel endpoint, a port has to have an IP
address. Hence, native tunnel termination should be attempted only
for such ports. This avoids extra matches in most cases.
Fixes: dc0bd12f5b04 ("userspace: Enable non-bridge port as tunnel endpoint.")
Reported-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-October/388904.html
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Mike Pattrick <mkp@redhat.com>
xlate_check_pkt_larger() sets ctx->exit to 'true' at the end
causing the translation to stop. This results in incomplete
datapath rules.
For example, for the below OF rules configured on a bridge,
table=0,in_port=1 actions=load:0x1->NXM_NX_REG1[],resubmit(,1),
load:0x2->NXM_NX_REG1[],resubmit(,1),
load:0x3->NXM_NX_REG1[],resubmit(,1)
table=1,in_port=1,reg1=0x1 actions=check_pkt_larger(200)->NXM_NX_REG0[0],
resubmit(,4)
table=1,in_port=1,reg1=0x2 actions=output:2
table=1,in_port=1,reg1=0x3 actions=output:4
table=4,in_port=1 actions=output:3
The datapath flow should be:
check_pkt_len(size=200,gt(3),le(3)),2,4
But right now it is:
check_pkt_len(size=200,gt(3),le(3))
Actions after the first resubmit(,1) in the first flow in table 0
are never applied. This patch fixes this issue.
Fixes: 5b34f8fc3b38 ("Add a new OVS action check_pkt_larger")
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2018365
Reported-by: Ihar Hrachyshka <ihrachys@redhat.com>
Signed-off-by: Numan Siddique <numans@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Previously, when a user enabled PMD auto load balancer with
pmd-auto-lb="true", some conditions such as number of PMDs/RxQs
that were required for a rebalance to take place were checked.
If the configuration meant that a rebalance would not take place
then PMD ALB was logged as 'disabled' and not run.
Later, if the PMD/RxQ configuration changed whereby a rebalance
could be effective, PMD ALB was logged as 'enabled' and would run at
the appropriate time.
This worked OK from a functional point of view, but it is unintuitive for the
user reading the logs.
e.g. with one PMD (PMD ALB would not be effective)
User enables ALB, but logs say it is disabled because it won't run.
$ ovs-vsctl set open_vSwitch . other_config:pmd-auto-lb="true"
|dpif_netdev|INFO|PMD auto load balance is disabled
No dry run takes place.
Add more PMDs (PMD ALB may be effective).
$ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=50
|dpif_netdev|INFO|PMD auto load balance is enabled ...
Dry run takes place.
|dpif_netdev|DBG|PMD auto load balance performing dry run.
A better approach is to simply reflect back the user enable/disable
state in the logs and deal with whether the rebalance will be effective
when needed. That is the approach taken in this patch.
To cut down on unnecessary work, some basic checks are also made before
starting a PMD ALB dry run and debug logs can indicate this to the user.
e.g. with one PMD (PMD ALB would not be effective)
User enables ALB, and logs confirm the user has enabled it.
$ ovs-vsctl set open_vSwitch . other_config:pmd-auto-lb="true"
|dpif_netdev|INFO|PMD auto load balance is enabled...
No dry run takes place.
|dpif_netdev|DBG|PMD auto load balance nothing to do, not enough non-isolated PMDs or RxQs.
Add more PMDs (PMD ALB may be effective).
$ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=50
Dry run takes place.
|dpif_netdev|DBG|PMD auto load balance performing dry run.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Gaetan Rivet <grive@u256.net>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This patch adds a general way of viewing/configuring datapath
cache sizes, with an implementation for the netlink interface.
The ovs-dpctl/ovs-appctl show commands will display the
currently configured cache sizes:
$ ovs-dpctl show
system@ovs-system:
lookups: hit:25 missed:63 lost:0
flows: 0
masks: hit:282 total:0 hit/pkt:3.20
cache: hit:4 hit-rate:4.54%
caches:
masks-cache: size:256
port 0: ovs-system (internal)
port 1: br-int (internal)
port 2: genev_sys_6081 (geneve: packet_type=ptap)
port 3: br-ex (internal)
port 4: eth2
port 5: sw0p1 (internal)
port 6: sw0p3 (internal)
A specific cache can be configured as follows:
$ ovs-appctl dpctl/cache-set-size DP CACHE SIZE
$ ovs-dpctl cache-set-size DP CACHE SIZE
For example, to disable the cache:
$ ovs-dpctl cache-set-size system@ovs-system masks-cache 0
Setting cache size successful, new size 0.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
If a user frequently changes a lot of rows in a database, the transaction
history could grow much larger than the database itself. This wastes
a lot of memory and also makes monitor_cond_since slower than the
usual monitor_cond if the transaction id is old enough, because
reconstructing the changes from the history is slower than just
creating an initial database snapshot. This is also the case if the
user deleted a lot of data: the transaction history still holds all of
it while the database itself doesn't.
In the case of the current lb-per-service model in ovn-kubernetes, each
load-balancer is added to every logical switch/router. Such a
transaction touches more than half of the OVN_Northbound database,
and each of these transactions is added to the transaction history.
Since the transaction history depth is 100, in the worst case
it will hold 100 copies of the database, increasing memory consumption
dramatically. In tests with 3000 LBs and 120 LSs, memory goes up
to 3 GB, while staying at 30 MB if the transaction history is disabled
in the code.
Fixing that by keeping count of the number of ovsdb_atom's in the
database and not allowing the total number of atoms in the transaction
history to grow larger than this value. Counting atoms is fairly
cheap because we don't need to iterate over them, so it doesn't have
a significant performance impact. It would be ideal to measure the
size of individual atoms, but that would hurt performance.
Counting cells instead of atoms is not sufficient, because OVN
users are adding hundreds or thousands of atoms to a single cell,
so cells differ greatly in size.
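A minimal C sketch of the trimming rule, for illustration only: the
history is modeled as a ring of per-transaction atom counts, and the
oldest entries are dropped while the total exceeds the number of atoms
in the database. The names txn_history, history_add() and
history_trim() are hypothetical and do not correspond to the actual
ovsdb-server code; the 100-entry depth limit is the one mentioned above.

```c
#include <assert.h>
#include <stddef.h>

#define HISTORY_MAX 100   /* History depth limit. */

/* Hypothetical model: each entry only records how many atoms the
 * transaction holds. */
struct txn_entry {
    size_t n_atoms;
};

struct txn_history {
    struct txn_entry entries[HISTORY_MAX]; /* Ring buffer, oldest first. */
    size_t head;      /* Index of the oldest entry. */
    size_t n;         /* Number of stored entries. */
    size_t n_atoms;   /* Sum of atoms over all stored entries. */
};

/* Removes the oldest entry and subtracts its atoms from the total. */
static void
history_pop_oldest(struct txn_history *h)
{
    h->n_atoms -= h->entries[h->head].n_atoms;
    h->head = (h->head + 1) % HISTORY_MAX;
    h->n--;
}

/* Drops the oldest entries while the history holds more atoms than the
 * database itself ('db_atoms'). */
static void
history_trim(struct txn_history *h, size_t db_atoms)
{
    while (h->n > 0 && h->n_atoms > db_atoms) {
        history_pop_oldest(h);
    }
}

/* Appends a transaction with 'txn_atoms' atoms, then enforces both the
 * depth limit and the atom limit. */
static void
history_add(struct txn_history *h, size_t txn_atoms, size_t db_atoms)
{
    if (h->n == HISTORY_MAX) {
        history_pop_oldest(h);
    }
    size_t tail = (h->head + h->n) % HISTORY_MAX;
    h->entries[tail].n_atoms = txn_atoms;
    h->n++;
    h->n_atoms += txn_atoms;
    history_trim(h, db_atoms);
}
```

Keeping a running total means the limit check costs O(1) per
transaction, in line with the "counting atoms is fairly cheap" argument.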
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Currently, some patches have tags written incorrectly (with a space
instead of a dash), and this may prevent automated systems or CI
from detecting them correctly.
This commit adds a check to checkpatch to make sure the tags are
written with a dash and not with a space.
The tags checked by this commit are:
Acked-by, Reported-at, Reported-by, Requested-by, Reviewed-by, Submitted-at
and Suggested-by.
It's not necessary to add "Signed-off-by", since it's already checked in
checkpatch.
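A rough C illustration of such a check, assuming a simple prefix
comparison against the misspelled (space-separated) forms of the tags
listed above. checkpatch itself is a Python script and its actual
matching rules (for example, case sensitivity) may differ;
has_spaced_tag() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* The tags listed above, in their incorrect "space instead of dash"
 * form, e.g. "Acked by:" instead of "Acked-by:". */
static const char *const spaced_tags[] = {
    "Acked by:", "Reported at:", "Reported by:", "Requested by:",
    "Reviewed by:", "Submitted at:", "Suggested by:",
};

/* Returns true if 'line' starts with one of the tags written with a
 * space instead of a dash.  Case-sensitive in this sketch. */
static bool
has_spaced_tag(const char *line)
{
    for (size_t i = 0; i < sizeof spaced_tags / sizeof spaced_tags[0]; i++) {
        if (!strncmp(line, spaced_tags[i], strlen(spaced_tags[i]))) {
            return true;
        }
    }
    return false;
}
```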
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Currently, pyOpenSSL is half-deprecated upstream and has been removed
from some distributions (for example, CentOS Stream 9,
https://issues.redhat.com/browse/CS-336), but since OVS only
supports Python 3, it's possible to replace pyOpenSSL with the "ssl"
module ("import ssl") included in the Python 3 standard library.
Stream recv and send had to be split into _recv and _send, since SSLError
is a subclass of socket.error, so it was not possible to catch
SSLWantReadError and SSLWantWriteError in recv and send of SSLStream.
TCPstream._open cannot be used in SSLStream, since the Python ssl module
requires the SSL socket to be created before connecting it, so
SSLStream._open needs to create the socket, create the SSL socket and
then connect the SSL socket.
Reported-by: Timothy Redaelli <tredaelli@redhat.com>
Reported-at: https://bugzilla.redhat.com/1988429
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: Terry Wilson <twilson@redhat.com>
Tested-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The CT zone can be set from a field that is not included in the frozen
metadata. Consider the following example rules, which are typically seen
in OpenStack security group rules:
priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
The zone is set from the first rule's ct action. These two rules will
generate two megaflows: the first one uses zone=5 to query the CT module,
the second one sets the zone-id from the first megaflow and commits to CT.
The current implementation generates a megaflow that does not use
ct_zone=5 as a match, but commits directly to CT using zone=5, i.e., the
zone is emitted as an immediate value (Imm) rather than taken from a field.
Consider a situation where one changes the zone id (for example, to 15)
in the first rule but keeps the second rule unchanged. If there is
traffic hitting the two generated megaflows during this change, the
revalidator will revalidate all megaflows, but it will not change the
second megaflow: because zone=5 is recorded in the megaflow, xlate will
still translate the commit action into zone=5, and new traffic will
still be committed to CT in zone=5, not zone=15, resulting in traffic
drops and other issues.
Following the OVS set-field convention, if a field X is set by Y
(Y is a variable, not an Imm), we should also mask Y as a match
in the generated megaflow. An exception is when the zone-id is
set by a field that is included in the frozen state (i.e., regs) and this
upcall is a resume of a thawed xlate: the un-wildcarding can then be
skipped, as the recirc_id is a hash of the values in these fields and
changes whenever these fields change. When the recirc_id changes,
all megaflows with the old recirc_id will later become invalid.
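The decision described above can be sketched in C as follows. The enum,
the struct and ct_zone_needs_unwildcard() are hypothetical
simplifications, not the ofproto-dpif-xlate data structures; they only
encode the three cases: an immediate zone needs no extra match, a zone
taken from a field does, unless that field is part of the frozen state
and the upcall resumes a thawed translation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of where ct(zone=...) takes its value from. */
enum zone_src {
    ZONE_FROM_IMM,      /* e.g. ct(zone=5): a constant. */
    ZONE_FROM_FIELD,    /* e.g. ct(zone=NXM_NX_CT_ZONE[]). */
};

struct ct_zone_spec {
    enum zone_src src;
    bool field_in_frozen_state;  /* True for fields such as registers. */
};

/* Returns true if the zone's source field must be un-wildcarded, i.e.
 * added as a match to the generated megaflow. */
static bool
ct_zone_needs_unwildcard(const struct ct_zone_spec *spec, bool thawed)
{
    if (spec->src == ZONE_FROM_IMM) {
        return false;   /* A constant cannot go stale. */
    }
    if (spec->field_in_frozen_state && thawed) {
        /* The recirc_id already hashes this field's value, so a change
         * in the field yields a new recirc_id and new megaflows. */
        return false;
    }
    return true;
}
```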
Fixes: 07659514c3 ("Add support for connection tracking.")
Reported-by: Sai Su <susai.ss@bytedance.com>
Signed-off-by: Peng He <hepeng.0320@bytedance.com>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The source port is based on a packet hash, and the hash depends on the
chosen implementation. Masking it to avoid test failures with '-msse4.2'.
Fixes: 7e6b41ac8d9d ("dpif-netdev: Fix crash when PACKET_OUT is metered.")
Reported-by: Kumar Amber <kumar.amber@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Added two separate test cases for DPCLS and DPIF commands:
1018: PMD - dpcls configuration
1017: PMD - dpif configuration
These tests cover the commands used to get or set the dpcls and dpif
function pointers to different implementations, such as AVX512 or the
auto-validator, depending on the CPU ISA supported.
Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
ovsdb-server spends a lot of time cloning atoms for various reasons,
e.g., to create a diff of two rows or to clone a row to the transaction.
All atoms, except for strings, contain a simple value that can be
copied efficiently, but duplicating strings every time has a
significant performance impact.
Introducing a new reference-counted structure 'ovsdb_atom_string'
that avoids copying strings every time: cloning just increases a
reference counter.
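A minimal sketch of the idea, assuming a plain single-threaded
reference counter and a flexible array member for the characters. The
layout and the rc_string_* helper names are hypothetical; the real
'ovsdb_atom_string' may be structured differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted string. */
struct rc_string {
    size_t ref_cnt;
    char string[];   /* Flexible array member holding the characters. */
};

/* Allocates a new string with a single reference. */
static struct rc_string *
rc_string_create(const char *s)
{
    size_t len = strlen(s);
    struct rc_string *rs = malloc(sizeof *rs + len + 1);
    if (!rs) {
        return NULL;
    }
    rs->ref_cnt = 1;
    memcpy(rs->string, s, len + 1);
    return rs;
}

/* "Cloning" only bumps the counter instead of duplicating the bytes. */
static struct rc_string *
rc_string_ref(struct rc_string *rs)
{
    rs->ref_cnt++;
    return rs;
}

/* Drops one reference and frees the string when the last one is gone. */
static void
rc_string_unref(struct rc_string *rs)
{
    if (rs && --rs->ref_cnt == 0) {
        free(rs);
    }
}
```

Cloning an atom then costs one increment instead of a malloc() plus a
copy. The counter here is not atomic, so the sketch assumes
single-threaded use.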
This change increases transaction throughput in benchmarks, i.e., the
number of transactions that ovsdb-server can handle per second, by up to
2x for standalone databases and 3x for clustered databases.
It also noticeably reduces memory consumption of ovsdb-server.
The next step will be to consolidate this structure with json strings,
so we will not need to duplicate strings while converting database
objects to json and back.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
ovsdb_datum_apply_diff() is heavily used in ovsdb transactions, but
it is linear in the number of comparisons, and it also clones
all the atoms along the way. In most cases, the size of a diff is much
smaller than the size of the original datum, which allows performing
the same operation in-place with only O(diff->n * log2(old->n))
comparisons and O(old->n + diff->n) memory copies with memcpy.
Using this function while applying diffs read from the storage gives
a significant performance boost and allows executing many more
transactions per second.
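To illustrate the complexity bounds, here is a C sketch that applies a
set diff to sorted integer arrays, assuming the usual set-diff semantics
(elements present in the old set are removed, absent ones are added).
Each diff element is located with a binary search starting from the
last consumed position, and untouched runs are copied with memcpy().
For clarity it writes into a separate output array instead of working
in place; apply_set_diff() is a hypothetical name, not the OVS function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the first index in a[lo..n) whose value is >= key. */
static size_t
lower_bound(const int *a, size_t lo, size_t n, int key)
{
    size_t hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < key) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return lo;
}

/* Applies 'diff' to 'old' (both sorted, duplicate-free): elements of
 * 'diff' found in 'old' are removed, the others are added.  'out' must
 * have room for old_n + diff_n elements.  Returns the result size. */
static size_t
apply_set_diff(const int *old, size_t old_n,
               const int *diff, size_t diff_n, int *out)
{
    size_t pos = 0;    /* Next unconsumed element of 'old'. */
    size_t out_n = 0;

    for (size_t i = 0; i < diff_n; i++) {
        size_t idx = lower_bound(old, pos, old_n, diff[i]);

        /* Copy the untouched run of 'old' in one block. */
        memcpy(&out[out_n], &old[pos], (idx - pos) * sizeof *old);
        out_n += idx - pos;
        pos = idx;

        if (idx < old_n && old[idx] == diff[i]) {
            pos++;                    /* Present in 'old': remove it. */
        } else {
            out[out_n++] = diff[i];   /* Absent from 'old': add it. */
        }
    }
    /* Copy whatever is left after the last diff element. */
    memcpy(&out[out_n], &old[pos], (old_n - pos) * sizeof *old);
    return out_n + (old_n - pos);
}
```

With diff_n much smaller than old_n, the binary searches cost
O(diff_n * log2(old_n)) comparisons, while every element is copied at
most once.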
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
The ipf module collects the original fragment packets and reassembles
them into a new packet for the conntrack logic. After conntrack
finishes, it copies the ct metadata to each original packet and modifies
the L4 header in the first fragment. However, it should also modify the
IP src/dst info in all the fragments.
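A simplified C sketch of the corrected behavior. The struct frag layout
and frags_apply_nat() are hypothetical stand-ins for the real
ipf/dp_packet code, and checksum updates are omitted: the addresses
from the reassembled packet are written to every fragment, while the
L4 ports exist only in the first one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified fragment model. */
struct frag {
    uint32_t ip_src, ip_dst;   /* IPv4 addresses, host byte order. */
    uint16_t frag_offset;      /* 0 for the first fragment. */
    uint16_t l4_src, l4_dst;   /* Only meaningful in the first fragment. */
};

/* Propagates the conntrack/NAT result of the reassembled packet
 * 'reass' back to the 'n' original fragments. */
static void
frags_apply_nat(struct frag *frags, size_t n, const struct frag *reass)
{
    for (size_t i = 0; i < n; i++) {
        /* IP addresses are present in every fragment. */
        frags[i].ip_src = reass->ip_src;
        frags[i].ip_dst = reass->ip_dst;
        if (frags[i].frag_offset == 0) {
            /* Only the first fragment carries the L4 header. */
            frags[i].l4_src = reass->l4_src;
            frags[i].l4_dst = reass->l4_dst;
        }
    }
}
```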
Fixes: 4ea96698f667 ("Userspace datapath: Add fragmentation handling.")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Co-authored-by: luke.li <luke.li@ucloud.cn>
Signed-off-by: luke.li <luke.li@ucloud.cn>
Reviewed-by: wangze <wangze712@gmail.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Test case:
Co-authored-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
As Frode Nordahl points out in [0], it is possible for the
Python regex module to enter a case of catastrophic backtracking,
which causes oscillation between states and hangs the checkpatch
script.
One suggested solution for these cases is to use an anchor[1] in
the regex, which should force the backtracking to exit early.
However, when I tested this, it didn't seem to improve anything
(since the start is already anchored, and trying to anchor the
end results in the same hang).
Instead, we explicitly check that the line ends with '\\' before
trying to match on the 'if-inside-a-macro' check. A new check
is added to catch the case in checkpatch.
0: https://mail.openvswitch.org/pipermail/ovs-dev/2021-August/386881.html
1: https://stackoverflow.com/questions/22072406/preventing-any-backtracking-in-regex-past-a-specific-pattern
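A small C sketch of the same guard, using POSIX <regex.h> instead of
Python's re module. The pattern is a hypothetical stand-in for the real
'if-inside-a-macro' expression; the point is only that the regex is
never run on lines that do not end with a backslash.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/* Cheap pre-check: does the line end with a backslash? */
static bool
ends_with_backslash(const char *line)
{
    size_t n = strlen(line);
    return n > 0 && line[n - 1] == '\\';
}

/* Runs the (potentially expensive) regex only on macro continuation
 * lines.  The pattern below is a hypothetical example. */
static bool
is_if_inside_macro(const char *line)
{
    if (!ends_with_backslash(line)) {
        return false;   /* Skip the regex entirely. */
    }
    regex_t re;
    if (regcomp(&re, "^[[:space:]]*if[[:space:]]*\\(",
                REG_EXTENDED | REG_NOSUB)) {
        return false;
    }
    bool match = !regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return match;
}
```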
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
When a PACKET_OUT has an output port of OFPP_TABLE, and the rule
table includes a meter that causes the packet to be deleted,
execute with a clone of the packet, restoring the original packet
if it is changed by the execution.
Add tests to verify the original issue is fixed, and that the fix
doesn't break tunnel processing.
Reported-by: Tony van der Peet <tony.vanderpeet@alliedtelesis.co.nz>
Signed-off-by: Tony van der Peet <tony.vanderpeet@alliedtelesis.co.nz>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The current string serialization code puts characters one by one.
This is slow because the dynamic string needs to perform length checks
on every ds_put_char(), and it also prevents the compiler from using
better memory copy operations, i.e., copying several bytes at once.
Special symbols are rare in a typical database. Quotes are frequent,
but not too frequent. In databases created by ovn-kubernetes, for
example, there are usually at least 10 to 50 chars between quotes.
So, it's better to count characters that don't require escaping
and use a fast data copy for the whole sequential block.
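The block-copy approach can be sketched in C as follows. struct buf and
buf_put() are hypothetical stand-ins for the OVS dynamic string
(struct ds), and for brevity every control character is escaped as
\u00XX; the real serializer may use shorter escapes such as \n.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Minimal growable buffer standing in for 'struct ds'. */
struct buf {
    char *data;
    size_t len, cap;
};

/* Appends 'n' bytes with a single capacity check and memcpy(). */
static void
buf_put(struct buf *b, const char *s, size_t n)
{
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 16;
        while (b->len + n + 1 > cap) {
            cap *= 2;
        }
        char *p = realloc(b->data, cap);
        if (!p) {
            abort();
        }
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
}

/* Serializes 's' as a JSON string, copying each run of characters that
 * needs no escaping with one buf_put() call. */
static void
json_put_string(struct buf *b, const char *s)
{
    buf_put(b, "\"", 1);
    while (*s) {
        /* Count the block of characters that can be copied verbatim. */
        size_t n = 0;
        while (s[n] && s[n] != '"' && s[n] != '\\'
               && (unsigned char) s[n] >= 32) {
            n++;
        }
        buf_put(b, s, n);
        s += n;
        if (!*s) {
            break;
        }
        /* Escape the single special character that ended the block. */
        char esc[8];
        unsigned char c = (unsigned char) *s++;
        if (c == '"' || c == '\\') {
            esc[0] = '\\';
            esc[1] = (char) c;
            buf_put(b, esc, 2);
        } else {
            snprintf(esc, sizeof esc, "\\u%04x", c);
            buf_put(b, esc, 6);
        }
    }
    buf_put(b, "\"", 1);
}
```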
Testing with a synthetic benchmark (included) on my laptop shows the
following performance improvement:
     Size   Q  S       Before        After      Diff
----------------------------------------------------
   100000   0  0 :   0.227 ms     0.142 ms   -37.4 %
   100000   2  1 :   0.277 ms     0.186 ms   -32.8 %
   100000  10  1 :   0.361 ms     0.309 ms   -14.4 %
 10000000   0  0 :  22.720 ms    12.160 ms   -46.4 %
 10000000   2  1 :  27.470 ms    19.300 ms   -29.7 %
 10000000  10  1 :  37.950 ms    31.250 ms   -17.6 %
100000000   0  0 : 239.600 ms   126.700 ms   -47.1 %
100000000   2  1 : 292.400 ms   188.600 ms   -35.4 %
100000000  10  1 : 387.700 ms   321.200 ms   -17.1 %
Here, Q is the probability (%) that a character is a '\"', and
S is the probability (%) that it is a special character (< 32).
Testing with a scenario closer to the real world shows an overall
decrease of ~5-10 % in the time needed for database compaction. This
change also decreases CPU consumption in general, because string
serialization is used in many different places, including ovsdb
monitors and raft.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Numan Siddique <numans@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
FreeBSD tests in Cirrus CI are broken, and probably Windows tests too:
89. library.at:258: testing netlink policy ...
./library.at:259: ovstest test-netlink-policy ll_addr
--- /dev/null 2021-08-20 19:02:41.907547000 +0000
+++ /tmp/cirrus-ci-build/tests/testsuite.dir/at-groups/89/stderr
@@ -0,0 +1 @@
+ovstest: unknown command 'test-netlink-policy'; use --help for help
./library.at:259: exit code was 1, expected 0
89. library.at:258: 89. netlink policy (library.at:258): FAILED
'tests/test-netlink-policy.c' is built only on Linux, so the test
must be skipped on all other platforms to unblock CI.
Fixes: bfee9f6c0115 ("netlink: Add support for parsing link layer address.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Frode Nordahl <frode.nordahl@canonical.com>
This patch adds two new APIs to the ovsdb-idl client library,
ovsdb_idl_server_has_table() and ovsdb_idl_server_has_column(), to
query whether a table or a column is present in the IDL. This
patch also adds IDL helper functions, auto-generated from the
schema, which make it easier for clients to use them.
These APIs are required for scenarios where the server schema is old
and missing a table or column, and the client (built with a newer
schema version) executes a transaction involving the missing table or
column, which results in a continuous loop of transaction failures.
Related-Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1992705
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>