mirror of https://github.com/openvswitch/ovs synced 2025-08-29 13:27:59 +00:00

3402 Commits

Author SHA1 Message Date
Eelco Chaudron
4056ae4875 ofp-flow: Skip flow reply if it exceeds the maximum message size.
Currently, if a flow reply results in a message which exceeds
the maximum reply size, OVS hits an assertion. This would happen
when OVN uses OpenFlow15 to add large flows, and they get read
using OpenFlow10 with ovs-ofctl.

This patch prevents this and adds a test case to make sure the
code behaves as expected.
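
A minimal sketch of the skipping behaviour, not the actual ofp-flow code.
It relies only on the fact that the OpenFlow header carries a 16-bit
length, so a single message cannot exceed 65535 bytes.  The function name
and the bytes-based representation of a reply are hypothetical:

```python
# Largest value representable in the 16-bit OpenFlow header length field.
OFP_MAX_MSG_LEN = 0xFFFF


def split_encodable(replies, max_len=OFP_MAX_MSG_LEN):
    """Partition already-encoded replies into (sent, skipped) by size.

    'replies' is an iterable of bytes objects (hypothetical stand-in for
    encoded flow replies).  Oversized ones are skipped, not asserted on.
    """
    sent, skipped = [], []
    for msg in replies:
        (sent if len(msg) <= max_len else skipped).append(msg)
    return sent, skipped
```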

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-31 21:11:58 +01:00
Paolo Valerio
77967b53fe conntrack: Check TCP state while testing established connections pick up.
When testing if an established connection is picked up, it could be
useful to verify that the protocol state matches the expectation, that
is, it moves to ESTABLISHED, as there's a chance that code modifications
may break the TCP conn_update() in a way that it returns CT_UPDATE_VALID
without moving to the correct state leading to a false positive.

Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Gaetan Rivet <grive@u256.net>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-31 21:09:33 +01:00
Ilya Maximets
6e13565dd3 ovsdb: transaction: Keep one entry in the transaction history.
If a single transaction exceeds the size of the whole database (e.g.,
a lot of rows got removed and new ones added), transaction history will
be drained.  This leads to sending UUID_ZERO to the clients as the last
transaction id in the next monitor update, because monitor doesn't
know what was the actual last transaction id.  In case of a re-connect
that will cause re-downloading of the whole database, since the
client's last_id will be out of sync.

One solution would be to store the last transaction ID separately
from the actual transactions, but that would require careful
management in cases where the database gets reset and the history needs
to be cleared.  Keep the last transaction instead to avoid
the problem.  That should not be a big concern in terms of memory
consumption, because this last transaction will be removed from the
history once the next transaction appears.  This is also not a concern
for a fast re-sync, because this last transaction will not be used
for the monitor reply: either the client already has it, so there is no
need to send it, or it's a history miss.

The test is updated to not check the number of atoms if there is only
one transaction in the history.
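
The trimming rule can be sketched as follows.  This is an illustration
under assumptions, not the ovsdb-server code: the history is modelled as
a deque of per-transaction atom counts, and 'trim_history' and 'db_atoms'
are hypothetical names:

```python
from collections import deque


def trim_history(history, db_atoms):
    """Drop the oldest transactions while the history outweighs the database.

    'history' holds atom counts, oldest first.  At least one entry is
    always kept, so the last transaction id stays known.
    """
    total = sum(history)
    while len(history) > 1 and total > db_atoms:
        total -= history.popleft()
    return history
```

With a single huge transaction the history is reduced to that one entry
instead of being drained completely.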

Fixes: 317b1bfd7dd3 ("ovsdb: Don't let transaction history grow larger than the database.")
Reported-at: https://bugzilla.redhat.com/2044621
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-31 21:05:20 +01:00
Eelco Chaudron
dadd8357f2 ofproto-dpif: Fix issue with non-reversible actions on patch ports.
For patch ports, the is_last_action value is not propagated and is
always set to true. This causes non-reversible actions to modify the
packet, and the original content is not preserved when processing
the remaining actions.

This patch propagates the is_last_action flag for patch port related
actions. In addition, it also fixes a general last action propagation
to the individual actions.

Also fix check_pkt_larger when it is the last action: this is a valid
case for the drop action, so it should not be skipped.

Fixes: feee58b95 ("ofproto-dpif-xlate: Keep track of the last action")
Fixes: 5b34f8fc3 ("Add a new OVS action check_pkt_larger")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-25 22:41:42 +01:00
David Marchand
8723063c3c system-dpdk: Fix MFEX logs check.
Some warning logs must be waived when using the net/pcap DPDK driver.
Those logs can also affect other DPDK drivers (like mlx5).  Since the
tests in system-dpdk do not test MTU and Rx checksum, we might as well
ignore those warnings from OVS.

Fixes: d446dcb7e03f ("system-dpdk: Refactor common logs matching.")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2022-01-20 15:48:34 +00:00
Gaetan Rivet
2eac33c6cc id-fpool: Module for fast ID generation.
The current id-pool module is slow to allocate the
next valid ID, and can be optimized when restricting
some properties of the pool.

Those restrictions are:

  * No ability to add a random ID to the pool.

  * A new ID is no longer the smallest possible ID.
    It is however guaranteed to be in the range of

       [floor, last_alloc + nb_user * cache_size + 1].

    where 'cache_size' is the number of IDs in each per-user
    cache.  It is defined by 'ID_FPOOL_CACHE_SIZE' as 64.

  * A user should never free an ID that is not allocated.
    No checks are done and doing so will duplicate the spurious
    ID.  Refcounting or other memory management scheme should
    be used to ensure an object and its ID are only freed once.

This allocator is designed to scale reasonably well in a multithreaded
setup.  As it is aimed at being a faster replacement for the current
id-pool, a benchmark has been implemented alongside unit tests.

The benchmark is composed of 4 rounds: 'new', 'del', 'mix', and 'rnd'.
Respectively

  + 'new': only allocate IDs
  + 'del': only free IDs
  + 'mix': allocate, sequential free, then allocate ID.
  + 'rnd': allocate, random free, allocate ID.

Randomized freeing is done by swapping the latest allocated ID with any
from the range of currently allocated IDs, which is reminiscent of the
Fisher-Yates shuffle.  This evaluates freeing non-sequential IDs,
which is the more natural use-case.
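
A sketch of that swap-based random ordering, written in Python for
illustration only.  It is not the benchmark's code; 'random_free_order'
and its seed parameter are hypothetical:

```python
import random


def random_free_order(ids, seed=0):
    """Return the IDs in the order a swap-based random free would release them."""
    rng = random.Random(seed)
    ids = list(ids)
    order = []
    while ids:
        # Swap a random allocated ID into the last slot, then free the last slot.
        j = rng.randrange(len(ids))
        ids[j], ids[-1] = ids[-1], ids[j]
        order.append(ids.pop())
    return order
```

Every ID is freed exactly once, which matters because the fast pool does
not check for double frees.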

For this specific round, the id-pool performance is such that a timeout
of 10 seconds is added to the benchmark:

   $ ./tests/ovstest test-id-fpool benchmark 10000 1
   Benchmarking n=10000 on 1 thread.
    type\thread:       1    Avg
   id-fpool new:       1      1 ms
   id-fpool del:       1      1 ms
   id-fpool mix:       2      2 ms
   id-fpool rnd:       2      2 ms
    id-pool new:       4      4 ms
    id-pool del:       2      2 ms
    id-pool mix:       6      6 ms
    id-pool rnd:     431    431 ms

   $ ./tests/ovstest test-id-fpool benchmark 100000 1
   Benchmarking n=100000 on 1 thread.
    type\thread:       1    Avg
   id-fpool new:       2      2 ms
   id-fpool del:       2      2 ms
   id-fpool mix:       3      3 ms
   id-fpool rnd:       4      4 ms
    id-pool new:      12     12 ms
    id-pool del:       5      5 ms
    id-pool mix:      16     16 ms
    id-pool rnd:  10000+     -1 ms

   $ ./tests/ovstest test-id-fpool benchmark 1000000 1
   Benchmarking n=1000000 on 1 thread.
    type\thread:       1    Avg
   id-fpool new:      15     15 ms
   id-fpool del:      12     12 ms
   id-fpool mix:      34     34 ms
   id-fpool rnd:      48     48 ms
    id-pool new:     276    276 ms
    id-pool del:     286    286 ms
    id-pool mix:     448    448 ms
    id-pool rnd:  10000+     -1 ms

Running only a performance test on the fast pool:

   $ ./tests/ovstest test-id-fpool perf 1000000 1
   Benchmarking n=1000000 on 1 thread.
    type\thread:       1    Avg
   id-fpool new:      15     15 ms
   id-fpool del:      12     12 ms
   id-fpool mix:      34     34 ms
   id-fpool rnd:      47     47 ms

   $ ./tests/ovstest test-id-fpool perf 1000000 2
   Benchmarking n=1000000 on 2 threads.
    type\thread:       1      2    Avg
   id-fpool new:      11     11     11 ms
   id-fpool del:      10     10     10 ms
   id-fpool mix:      24     24     24 ms
   id-fpool rnd:      30     30     30 ms

   $ ./tests/ovstest test-id-fpool perf 1000000 4
   Benchmarking n=1000000 on 4 threads.
    type\thread:       1      2      3      4    Avg
   id-fpool new:       9     11     11     10     10 ms
   id-fpool del:       5      6      6      5      5 ms
   id-fpool mix:      16     16     16     16     16 ms
   id-fpool rnd:      20     20     20     20     20 ms

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 19:30:17 +01:00
Gaetan Rivet
5396ba5b21 mpsc-queue: Module for lock-free message passing.
Add a lockless multi-producer/single-consumer (MPSC), linked-list based,
intrusive, unbounded queue that does not require deferred memory
management.

The queue is designed to improve the specific MPSC setup.  A benchmark
accompanies the unit tests to measure the difference in this configuration.
A single reader thread polls the queue while N writers enqueue elements
as fast as possible.  The mpsc-queue is compared against the regular ovs-list
as well as the guarded list.  The latter usually offers a slight improvement
by batching the element removal, however the mpsc-queue is faster.

The average is computed over the producer threads' times:

   $ ./tests/ovstest test-mpsc-queue benchmark 3000000 1
   Benchmarking n=3000000 on 1 + 1 threads.
    type\thread:  Reader      1    Avg
     mpsc-queue:     167    167    167 ms
     list(spin):      89     80     80 ms
    list(mutex):     745    745    745 ms
   guarded list:     788    788    788 ms

   $ ./tests/ovstest test-mpsc-queue benchmark 3000000 2
   Benchmarking n=3000000 on 1 + 2 threads.
    type\thread:  Reader      1      2    Avg
     mpsc-queue:      98     97     94     95 ms
     list(spin):     185    171    173    172 ms
    list(mutex):     203    199    203    201 ms
   guarded list:     269    269    188    228 ms

   $ ./tests/ovstest test-mpsc-queue benchmark 3000000 3
   Benchmarking n=3000000 on 1 + 3 threads.
    type\thread:  Reader      1      2      3    Avg
     mpsc-queue:      76     76     65     76     72 ms
     list(spin):     246    110    240    238    196 ms
    list(mutex):     542    541    541    539    540 ms
   guarded list:     535    535    507    511    517 ms

   $ ./tests/ovstest test-mpsc-queue benchmark 3000000 4
   Benchmarking n=3000000 on 1 + 4 threads.
    type\thread:  Reader      1      2      3      4    Avg
     mpsc-queue:      73     68     68     68     68     68 ms
     list(spin):     294    275    279    277    282    278 ms
    list(mutex):     346    309    287    345    302    310 ms
   guarded list:     378    319    334    378    351    345 ms

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 19:30:17 +01:00
Gaetan Rivet
aec1081c7d tests: Add ovs-barrier unit test.
No unit test currently exists for the ovs-barrier type.
It is however crucial as a building block and should be verified to work
as expected.

Create a simple test verifying the basic function of ovs-barrier.
Integrate the test as part of the test suite.

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 15:12:01 +01:00
Maxime Coquelin
844f141814 dpif-netdev.at: Add test for Tx packet steering.
This patch introduces a new test for Tx packet
steering modes. The first part validates the static mode
by checking that all packets are transmitted on a single
queue (single PMD thread). The second part enables hash-based
packet steering and checks that packets are transmitted
on both queues.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-17 18:07:00 +01:00
Maxime Coquelin
e97112ce78 netdev-dummy: Introduce per rxq/txq statistics.
This patch adds Rx and Tx per-queue statistics. It will be
used to test hash-based Tx packet steering. Only "bytes",
and "packets" per-queue custom statistics are added, as
there are no global "errors" counters in netdev-dummy.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-17 18:07:00 +01:00
Martin Varghese
1917ace893 Encap & Decap actions for MPLS packet type.
The encap & decap actions are extended to support MPLS packet type.
Encap & decap actions add and remove the MPLS header at the start of
the packet.

The existing PUSH MPLS & POP MPLS actions insert & remove the MPLS
header between the ethernet header and the IP header. Though this
behaviour is fine for L3 VPN where an IP packet is encapsulated inside
an MPLS tunnel, it does not satisfy the L2 VPN requirements. In L2 VPN
the ethernet packets must be encapsulated inside an MPLS tunnel.

In this change the encap & decap actions are extended to support MPLS
packet type. The encap & decap actions add and remove the MPLS header at
the start of the packet as depicted below.

Encapsulation:

Actions - encap(mpls),encap(ethernet)

Incoming packet -> | ETH | IP | Payload |

1 Actions - encap(mpls) [Datapath action - ADD_MPLS:0x8847]

        Outgoing packet -> | MPLS | ETH | IP | Payload |

2 Actions - encap(ethernet) [Datapath action - push_eth]

        Outgoing packet -> | ETH | MPLS | ETH | IP | Payload |

Decapsulation:

Incoming packet -> | ETH | MPLS | ETH | IP | Payload |

Actions - decap(),decap(packet_type(ns=0,type=0))

1 Actions - decap() [Datapath action - pop_eth]

        Outgoing packet -> | MPLS | ETH | IP | Payload|

2 Actions - decap(packet_type(ns=0,type=0)) [Datapath action - POP_MPLS:0x6558]

        Outgoing packet -> | ETH  | IP | Payload|
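
The two sequences above can be modelled as operations on a list of
header names, outermost first.  This is a toy model for illustration,
with hypothetical 'encap' and 'decap' helpers and the inner IP header
kept explicit; it does not touch real packets or the OVS action code:

```python
def encap(stack, header):
    """Add 'header' at the start of the packet (outermost position)."""
    return [header] + stack


def decap(stack):
    """Remove the outermost header from the packet."""
    return stack[1:]


incoming = ["ETH", "IP", "Payload"]
# encap(mpls), encap(ethernet)
tunneled = encap(encap(incoming, "MPLS"), "ETH")
# decap(), decap(packet_type(ns=0,type=0))
restored = decap(decap(tunneled))
```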

Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-17 02:04:20 +01:00
Mike Pattrick
eb1ab5357b netdev-linux: Use matchall classifier for ingress policing.
Currently ingress policing uses the basic classifier to apply traffic
control filters unless hardware offload is enabled, in which case it
uses matchall. This change makes OVS always use matchall, and fall
back onto basic if the kernel is built without matchall support.

The system tests are modified to allow either basic or matchall
classification on the ingestion filter, and to allow either 10000 or
10240 packets for the packet burst filter. 10000 is accurate for kernel
5.14 and the most recent iproute2, however, 10240 is left for
compatibility with older kernels.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-12 15:23:29 +01:00
Ilya Maximets
356f362068 tests/oss-fuzz: Fix the arguments of parse_tcp_flags.
tests/oss-fuzz/flow_extract_target.c:59:53:
     error: too few arguments to function call, expected 4, have 1
         uint16_t tcp_flags = parse_tcp_flags(&packet);
                              ~~~~~~~~~~~~~~~        ^

Fixes: e7e9973b80d3 ("dpif-netdev: Forwarding optimization for flows with a simple match.")
Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=43498
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
2022-01-10 23:31:29 +01:00
Ilya Maximets
e7e9973b80 dpif-netdev: Forwarding optimization for flows with a simple match.
There are cases where users might want simple forwarding or drop rules
for all packets received from a specific port, e.g.:

  "in_port=1,actions=2"
  "in_port=2,actions=IN_PORT"
  "in_port=3,vlan_tci=0x1234/0x1fff,actions=drop"
  "in_port=4,actions=push_vlan:0x8100,set_field:4196->vlan_vid,output:3"

There are also cases where complex OpenFlow rules can be simplified
down to datapath flows with very simple match criteria.

In theory, for very simple forwarding, OVS doesn't need to parse
packets at all in order to follow these rules.  "Simple match" lookup
optimization is intended to speed up packet forwarding in these cases.

Design:

Due to various implementation constraints, the userspace datapath has
the following flow fields always in exact match (i.e. it's required to
match at least these fields of a packet even if the OF rule doesn't
need that):

  - recirc_id
  - in_port
  - packet_type
  - dl_type
  - vlan_tci (CFI + VID) - in most cases
  - nw_frag - for ip packets

Not all of these fields are related to packet itself.  We already
know the current 'recirc_id' and the 'in_port' before starting the
packet processing.  It also seems safe to assume that we're working
with Ethernet packets.  So, for the simple OF rule we need to match
only on 'dl_type', 'vlan_tci' and 'nw_frag'.

'in_port', 'dl_type', 'nw_frag' and 13 bits of 'vlan_tci' can be
combined in a single 64bit integer (mark) that can be used as a
hash in a hash map.  We are using only VID and CFI from the 'vlan_tci';
flows that need to match on PCP will not qualify for the optimization.
Workaround for matching on non-existence of vlan updated to match on
CFI and VID only in order to qualify for the optimization.  CFI is
always set by OVS if vlan is present in a packet, so there is no need
to match on PCP in this case.  'nw_frag' takes 2 bits of PCP inside
the simple match mark.
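
A sketch of how such a mark could be packed.  The VID and CFI masks are
the standard 802.1Q ones; the exact bit layout (in_port in the upper 32
bits, dl_type in the next 16, nw_frag at bit 13) is an assumption made
for illustration and may differ from the implementation:

```python
VLAN_VID_MASK = 0x0FFF   # 12-bit VLAN ID
VLAN_CFI = 0x1000        # CFI bit, set by OVS when a VLAN is present
NW_FRAG_SHIFT = 13       # assumed position: reuse the PCP bits of the TCI


def simple_match_mark(in_port, dl_type, nw_frag, vlan_tci):
    """Pack the simple-match fields into one 64-bit integer (assumed layout)."""
    tci = vlan_tci & (VLAN_VID_MASK | VLAN_CFI)   # PCP bits are dropped
    return (((in_port & 0xFFFFFFFF) << 32)
            | ((dl_type & 0xFFFF) << 16)
            | ((nw_frag & 0x3) << NW_FRAG_SHIFT)
            | tci)
```

Two packets that differ only in PCP produce the same mark, which is why
flows matching on PCP cannot use the optimization.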

A new per-PMD flow table 'simple_match_table' is introduced to store
simple match flows only.  'dp_netdev_flow_add' adds a flow to the
usual 'flow_table' and to the 'simple_match_table' if the flow
meets the following constraints:

  - 'recirc_id' in flow match is 0.
  - 'packet_type' in flow match is Ethernet.
  - Flow wildcards contains only minimal set of non-wildcarded fields
    (listed above).

If the number of flows for current 'in_port' in a regular 'flow_table'
equals number of flows for current 'in_port' in a 'simple_match_table',
we may use simple match optimization, because all the flows we have
are simple match flows.  This means that we only need to parse
'dl_type', 'vlan_tci' and 'nw_frag' to perform packet matching.
We then build the unique flow mark from the 'in_port', 'dl_type',
'nw_frag' and 'vlan_tci' and look it up in the 'simple_match_table'.
On successful lookup we don't need to run full 'miniflow_extract()'.

Unsuccessful lookup technically means that we have no suitable flow
in the datapath and upcall will be required.  So, in this case EMC and
SMC lookups are disabled.  We may optimize this path in the future by
bypassing the dpcls lookup too.
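
The decision logic described above, as a sketch.  The dictionaries
stand in for the per-PMD tables and per-port flow counters; 'lookup' and
its return values are hypothetical and only illustrate the control flow:

```python
def lookup(in_port, mark, flow_count, simple_count, simple_table):
    """Return ('simple', flow), ('upcall', None) or ('full', None).

    flow_count / simple_count: flows per in_port in the regular table and
    in the simple match table.  simple_table: mark -> flow.
    """
    if flow_count.get(in_port, 0) != simple_count.get(in_port, 0):
        # Some flows on this port are not simple: use the regular path
        # with full packet parsing.
        return ("full", None)
    flow = simple_table.get(mark)
    if flow is None:
        # No suitable flow in the datapath: an upcall is required.
        return ("upcall", None)
    return ("simple", flow)
```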

The performance improvement of this solution on 'simple match' flows
should be comparable with partial HW offloading, because it parses the
same packet fields and uses a similar flow lookup scheme.
However, unlike partial HW offloading, it works for all port types
including virtual ones.

Performance results when compared to EMC:

Test setup:

             virtio-user   OVS    virtio-user
  Testpmd1  ------------>  pmd1  ------------>  Testpmd2
  (txonly)       x<------  pmd2  <------------ (mac swap)

Single stream of 64byte packets.  Actions:
  in_port=vhost0,actions=vhost1
  in_port=vhost1,actions=vhost0

Stats collected from pmd1 and pmd2, so there are 2 scenarios:
Virt-to-Virt   :     Testpmd1 ------> pmd1 ------> Testpmd2.
Virt-to-NoCopy :     Testpmd2 ------> pmd2 --->x   Testpmd1.
Here the packet sent from pmd2 to Testpmd1 is always dropped, because
the virtqueue is full since Testpmd1 is in txonly mode and doesn't
receive any packets.  This should be closer to the performance of a
VM-to-Phy scenario.

Test performed on machine with Intel Xeon CPU E5-2690 v4 @ 2.60GHz.
Table below represents improvement in throughput when compared to EMC.

 +----------------+------------------------+------------------------+
 |                |    Default (-g -O2)    | "-Ofast -march=native" |
 |   Scenario     +------------+-----------+------------+-----------+
 |                |     GCC    |   Clang   |     GCC    |   Clang   |
 +----------------+------------+-----------+------------+-----------+
 | Virt-to-Virt   |    +18.9%  |   +25.5%  |    +10.8%  |   +16.7%  |
 | Virt-to-NoCopy |    +24.3%  |   +33.7%  |    +14.9%  |   +22.0%  |
 +----------------+------------+-----------+------------+-----------+

For Phy-to-Phy case performance improvement should be even higher, but
it's not the main use-case for this functionality.  Performance
difference for the non-simple flows is within a margin of error.

Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-07 20:32:20 +01:00
Terry Wilson
46d44cf3be python: idl: Add monitor_cond_since support.
Add support for monitor_cond_since / update3 to python-ovs to
allow more efficient reconnections when connecting to clustered
OVSDB servers.

Signed-off-by: Terry Wilson <twilson@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-06 16:45:56 +01:00
Mike Pattrick
0d1ffb7756 checkpatch: Detect "trojan source" attack.
Recently there has been a lot of press about the "trojan source" attack,
where Unicode characters are used to obfuscate the true functionality of
code. This attack didn't affect OVS, but adding the check here will help
guard against it sneaking in later.
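
A minimal sketch of such a check, assuming it looks for the Unicode
bidirectional control characters used by the attack (U+202A..U+202E and
U+2066..U+2069).  The names are hypothetical and the real checkpatch
rule may cover a different character set:

```python
# Bidirectional embedding, override and isolate controls.
BIDI_CONTROLS = {chr(c) for c in (*range(0x202A, 0x202F),
                                  *range(0x2066, 0x206A))}


def has_bidi_control(line):
    """Return True if the line contains a bidirectional control character."""
    return any(ch in BIDI_CONTROLS for ch in line)
```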

Signed-off-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Gaetan Rivet <grive@u256.net>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-04 19:14:11 +01:00
Frode Nordahl
2f2ae5b6bd tests: Fix endianness in netlink policy test fixtures.
The netlink policy unit test contains test fixture data that is
subject to endianness and currently fails on big endian systems.

Store the fixture data in a struct to ensure proper byte order for
the header data.

Also fix improper style for sizeof with expressions.

Fixes: bfee9f6c0115 ("netlink: Add support for parsing link layer address.")
Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-04 19:14:11 +01:00
David Marchand
d446dcb7e0 system-dpdk: Refactor common logs matching.
Move EAL logs and commonly ignored logs to a common macro.
Remove/update obsolete ones (like i40e [1], timer [2], EAL [3][4] logs).
Set log level for DPDK drivers to error only: the rationale is that we are
not testing DPDK drivers in system-dpdk.
Extend regex on hugepage logs since a check on hugepages availability is
already present on OVS side, and as a consequence, we don't care about
the warnings on availability for certain hugepage size.
Add logs checks for MFEX tests that were missing them.

1: https://git.dpdk.org/dpdk/commit/?id=a075ce2b3e8c
2: https://git.dpdk.org/dpdk/commit/?id=c1077933d45b
3: https://git.dpdk.org/dpdk/commit/?id=e9b3d79b0696
4: https://git.dpdk.org/dpdk/commit/?id=c69150679891

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-03 19:47:43 +01:00
Paolo Valerio
ec2aa2ab46 ofproto-dpif-xlate: Snoop ingress packets and update neigh cache if needed.
In case of native tunnel with bfd enabled, if the MAC address of the
remote end's interface changes (e.g. because it got rebooted, and the
MAC address is allocated dynamically), the BFD session will never be
re-established.

This happens because the local tunnel neigh entry doesn't get updated,
and the local end keeps sending BFD packets with the old destination
MAC address. This was not an issue until
b23ddcc57d41 ("tnl-neigh-cache: tighten arp and nd snooping.")
because ARP requests were snooped as well avoiding the problem.

Fix this by snooping the incoming packets in the slow path, and
updating the neigh cache accordingly.

Fixes: b23ddcc57d41 ("tnl-neigh-cache: tighten arp and nd snooping.")
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2002430
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Gaetan Rivet <grive@u256.net>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-12-17 20:32:13 +01:00
Paolo Valerio
02f95638a4 tnl-neigh-cache: Add tnl/neigh/aging command.
With this command it is now possible to change the aging time of the
cache entries.

For the existing entries the aging time is updated only if the
current expiration is greater than the new one. In any case, the next
refresh will set it to the new value.
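
The update rule for existing entries can be sketched as below.  The
function and parameter names are hypothetical and times are plain
numbers in arbitrary units:

```python
def apply_new_aging(expires, now, new_aging):
    """Shorten an entry's expiration if the new aging time expires sooner.

    An entry that already expires earlier than 'now + new_aging' is left
    alone; its next refresh picks up the new value.
    """
    return min(expires, now + new_aging)
```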

This is intended mostly for debugging purposes.

Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Gaetan Rivet <grive@u256.net>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-12-17 20:31:57 +01:00
David Marchand
b5d2dbdbb5 system-dpdk: Fix race in vhost-user tests.
Waiting only on the vhost user port to be ready is not enough since a
tap is also initialized by testpmd and is used to inject/receive packets
in/from the kernel.
Wait on the tap link status.

Fixes: 18db7ec5eb83 ("system-dpdk: Improve vhost-user ping tests reliability.")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-12-15 18:20:01 +01:00
Ian Stokes
17346b3899 dpdk: Update to use DPDK v21.11.
This commit adds support for DPDK v21.11, it includes the following
changes.

1. ci: Install python elftools for DPDK 21.02.
2. ci: Update meson requirement for DPDK 21.05.
3. netdev-dpdk: Fix build with 21.05.
4. ci: Compile DPDK in non developer mode.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=242480&state=*

5. netdev-dpdk: Remove access to DPDK internals.
6. netdev-dpdk: Remove unused attribute from rte_flow rule.
7. netdev-dpdk: Fix mbuf macros namespace with 21.11-rc1.
8. netdev-dpdk: Fix vhost namespace with 21.11-rc2.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=271159&state=*

In addition documentation and DPDK unit tests were also updated in this
commit for use with DPDK v21.11.

For credit all authors of the original commits to 'dpdk-latest' with the above
changes have been added as co-authors for this commit.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Co-authored-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Tested-by: Seamus Ryan <seamus.ryan@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-12-09 18:40:14 +00:00
Maxime Coquelin
18db7ec5eb system-dpdk: Improve vhost-user ping tests reliability.
Instead of waiting 10 seconds for testpmd to start, this
patch makes use of OVS_WAIT_UNTIL() macro to wait for
the virtio device readiness notification in ovs-vswitchd
logs.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-12-09 14:36:37 +01:00
Kevin Traynor
4a7b58163f alb.at: Increase time/warp.
It seems that on slow systems with high concurrency and CPU contention,
time/warp is not accurate enough for the ALB unit tests when using the
minimum time/warp needed to hit a given number of events. This results
in some intermittent test failures.

As those tests are just waiting for a certain number of events to occur
and there is no functional change during that time, let's do the
time/warp again with higher values.

With this no failures are seen in several hundred runs.

Fixes: a83a406096e9 ("dpif-netdev: Sync PMD ALB state with user commands.")
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-12-07 15:10:36 +01:00
Kevin Traynor
09c4449b2d alb.at: Check for log from correct line number.
The next log line number should be updated to ensure that the
anticipated log has occurred again after more time has passed.

Fixes: a83a406096e9 ("dpif-netdev: Sync PMD ALB state with user commands.")
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-12-07 15:10:36 +01:00
lic121
1f5749c790 flow: Consider dataofs when parsing TCP packets.
The 'dataofs' field of the TCP header indicates the TCP header length
in 32-bit words.  The length should be >= 20 bytes/4 and <= the TCP data
length.  This patch checks 'dataofs' and does not parse layer 4 fields
when a bad dataofs is encountered.

This behavior is consistent with the openvswitch kernel module.
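
The check can be illustrated with a short sketch operating on raw bytes.
It is not the flow-extraction code; 'tcp_header_len' is a hypothetical
helper that takes everything from the start of the TCP header to the end
of the packet:

```python
TCP_MIN_HEADER_LEN = 20


def tcp_header_len(tcp):
    """Return the TCP header length in bytes, or None if 'dataofs' is bad."""
    if len(tcp) < TCP_MIN_HEADER_LEN:
        return None
    dataofs = tcp[12] >> 4          # upper 4 bits of byte 12, in 32-bit words
    hdr_len = dataofs * 4
    if dataofs < TCP_MIN_HEADER_LEN // 4 or hdr_len > len(tcp):
        return None                 # caller should not parse layer 4 fields
    return hdr_len
```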

Fixes: 5a51b2cd3483 ("lib/ofpbuf: Remove 'l7' pointer.")
Signed-off-by: lic121 <lic121@chinatelecom.cn>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-12-03 23:45:26 +01:00
lic121
d4bed95963 tests/flowgen: Fix packet data endianness.
Without this fix, flowgen.py generates bad TCP packets.
tcpdump reports "bad hdr length 4 - too short" with the pcap
generated by flowgen.py.

This patch corrects the packet data endianness.

Signed-off-by: lic121 <lic121@chinatelecom.cn>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-12-03 23:20:15 +01:00
Dumitru Ceara
91e1ff5dde ovsdb-idl: Don't reparse orphaned rows.
Rows that refer to rows that were inserted in the current IDL run should
only be reparsed if they don't get deleted (become orphan) in the current
IDL run.

Fixes: 7b8aeadd60c8 ("ovsdb-idl: Re-parse backrefs of inserted rows only once.")
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-11-30 13:44:47 +01:00
Ilya Maximets
dec4291684 ovsdb-data: Consolidate ovsdb atom and json strings.
ovsdb_atom_string and json_string are basically the same data structure
and ovsdb-server frequently needs to convert one to another.  We can
avoid that by using json_string from the beginning for all ovsdb
strings.  So, the conversion turns into simple json_clone(), i.e.
increment of a reference counter.  This change gives a moderate
performance boost in some scenarios, improves the code clarity and
may be useful for future development.

Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-11-30 13:34:03 +01:00
Ilya Maximets
19aa70168b tests/flowgen: Fix length field of 802.2 data link header.
Length in Data Link Header for these packets should not include
source and destination MACs or the length field itself.

Therefore, it should be 14 bytes less, otherwise other network
tools like wireshark complain:

  Expert Info (Error/Malformed):
    Length field value goes past the end of the payload

Additionally fixing the printing of the packet/flow configuration,
as it currently prints '%s=%s' strings without any real data.
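
For illustration, a frame builder that applies the corrected length
rule.  This is a sketch, not the flowgen.py code, and the function name
is hypothetical:

```python
import struct


def build_8023_frame(dst, src, payload):
    """Build an 802.3 frame whose length field counts only the payload.

    The 6-byte destination MAC, 6-byte source MAC and the 2-byte length
    field itself (14 bytes in total) are excluded from the length.
    """
    return dst + src + struct.pack("!H", len(payload)) + payload
```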

Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-11-30 13:33:08 +01:00
Adrian Moreno
f88ee78e0a match: Do not print "igmp" match keyword.
The match keyword "igmp" is not supported in ofp-parse, which means
that flow dumps cannot be restored. Previously a workaround was
added to ovs-save to avoid changing output in stable branches.

This patch changes the output to print igmp match in the accepted
ofp-parse format (ip,nw_proto=2) and print igmp_type/code as generic
tp_src/dst. Tests are added, and NEWS is updated to reflect this change.

The workaround in ovs-save is still included to ensure that flows
can be restored when upgrading an older ovs-vswitchd. This workaround
should be removed in later versions.
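
A sketch of the new output format.  The helper is hypothetical, and
printing the values in decimal is an assumption; the only facts taken
from the change are the 'ip,nw_proto=2' prefix and the mapping of
igmp_type/code to tp_src/tp_dst:

```python
IPPROTO_IGMP = 2


def format_igmp_match(igmp_type=None, igmp_code=None):
    """Format an IGMP match in a form that ofp-parse accepts."""
    parts = ["ip", f"nw_proto={IPPROTO_IGMP}"]
    if igmp_type is not None:
        parts.append(f"tp_src={igmp_type}")
    if igmp_code is not None:
        parts.append(f"tp_dst={igmp_code}")
    return ",".join(parts)
```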

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Salvatore Daniele <sdaniele@redhat.com>
Co-authored-by: Salvatore Daniele <sdaniele@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-11-29 22:45:48 +01:00
Ilya Maximets
fb7a75e523 ofproto-dpif-xlate: Terminate native tunnels only on ports with IP addresses.
Commit dc0bd12f5b04 removed restriction that tunnel endpoint must be a
bridge port.  So, currently OVS has to check if the native tunnel needs
to be terminated regardless of the output port.  Unfortunately, there
is a side effect: tnl_port_map_lookup() always adds at least 'dl_dst'
match to the megaflow that ends up in the corresponding datapath flow.
And since tunneling works at the L3 level and is not restricted to any
particular bridge, this extra match criterion is added to every
datapath flow on every bridge even if that bridge cannot be part of
tunnel processing.

For example, if OVS has at least one tunnel configured and we're
adding a completely separate bridge with 2 ports and simple rules
to forward packets between two ports, there still will be a match on
a destination mac address:

 1. <create a tunnel configuration in OVS>
 2. ovs-vsctl add-br br-non-tunnel -- set bridge datapath_type=netdev
 3. ovs-vsctl add-port br-non-tunnel port0
           -- add-port br-non-tunnel port1
 4. ovs-ofctl del-flows br-non-tunnel
 5. ovs-ofctl add-flow br-non-tunnel in_port=port0,actions=port1
 6. ovs-ofctl add-flow br-non-tunnel in_port=port1,actions=port0

 # ovs-appctl ofproto/trace br-non-tunnel in_port=port0

 Flow: in_port=1,vlan_tci=0x0000,
       dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,dl_type=0x0000

 bridge("br-non-tunnel")
 -----------------------
  0. in_port=1, priority 32768
     output:2

 Final flow: unchanged
 Megaflow: recirc_id=0,eth,in_port=1,dl_dst=00:00:00:00:00:00,dl_type=0x0000
 Datapath actions: 5                 ^^^^^^^^^^^^^^^^^^^^^^^^

This increases the number of upcalls and installed datapath flows,
since separate flow needs to be installed per destination MAC, reducing
the switching performance.  This also blocks datapath performance
optimizations that are based on the datapath flow simplicity.

In general, in order to be a tunnel endpoint, a port has to have an IP
address.  Hence native tunnel termination should be attempted only
for such ports.  This avoids the extra matches in most cases.

Fixes: dc0bd12f5b04 ("userspace: Enable non-bridge port as tunnel endpoint.")
Reported-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-October/388904.html
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Mike Pattrick <mkp@redhat.com>
2021-11-19 17:25:18 +01:00
Numan Siddique
9fe0ce4f72 ofproto-dpif-xlate: Fix check_pkt_larger incomplete translation.
xlate_check_pkt_larger() sets ctx->exit to 'true' at the end
causing the translation to stop.  This results in incomplete
datapath rules.

For example, for the below OF rules configured on a bridge,

  table=0,in_port=1 actions=load:0x1->NXM_NX_REG1[[]],resubmit(,1),
                            load:0x2->NXM_NX_REG1[[]],resubmit(,1),
                            load:0x3->NXM_NX_REG1[[]],resubmit(,1)
  table=1,in_port=1,reg1=0x1 actions=check_pkt_larger(200)->NXM_NX_REG0[[0]],
                                     resubmit(,4)
  table=1,in_port=1,reg1=0x2 actions=output:2
  table=1,in_port=1,reg1=0x3 actions=output:4
  table=4,in_port=1 actions=output:3

The datapath flow should be:

  check_pkt_len(size=200,gt(3),le(3)),2,4

But right now it is:

  check_pkt_len(size=200,gt(3),le(3))

Actions after the first resubmit(,1) in the first flow in table 0
are never applied.  This patch fixes this issue.

Fixes: 5b34f8fc3b38 ("Add a new OVS action check_pkt_larger")
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2018365
Reported-by: Ihar Hrachyshka <ihrachys@redhat.com>
Signed-off-by: Numan Siddique <numans@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-11-17 22:49:51 +01:00
Kevin Traynor
a83a406096 dpif-netdev: Sync PMD ALB state with user commands.
Previously, when a user enabled PMD auto load balancer with
pmd-auto-lb="true", some conditions such as number of PMDs/RxQs
that were required for a rebalance to take place were checked.

If the configuration meant that a rebalance would not take place
then PMD ALB was logged as 'disabled' and not run.

Later, if the PMD/RxQ configuration changed whereby a rebalance
could be effective, PMD ALB was logged as 'enabled' and would run at
the appropriate time.

This worked ok from a functional view but it is unintuitive for the
user reading the logs.

e.g. with one PMD (PMD ALB would not be effective)

User enables ALB, but logs say it is disabled because it won't run.
$ ovs-vsctl set open_vSwitch . other_config:pmd-auto-lb="true"
|dpif_netdev|INFO|PMD auto load balance is disabled

No dry run takes place.

Add more PMDs (PMD ALB may be effective).
$ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=50
|dpif_netdev|INFO|PMD auto load balance is enabled ...

Dry run takes place.
|dpif_netdev|DBG|PMD auto load balance performing dry run.

A better approach is to simply reflect back the user enable/disable
state in the logs and deal with whether the rebalance will be effective
when needed. That is the approach taken in this patch.

To cut down on unnecessary work, some basic checks are also made before
starting a PMD ALB dry run and debug logs can indicate this to the user.

e.g. with one PMD (PMD ALB would not be effective)

User enables ALB, and logs confirm the user has enabled it.
$ ovs-vsctl set open_vSwitch . other_config:pmd-auto-lb="true"
|dpif_netdev|INFO|PMD auto load balance is enabled...

No dry run takes place.
|dpif_netdev|DBG|PMD auto load balance nothing to do, not enough non-isolated PMDs or RxQs.

Add more PMDs (PMD ALB may be effective).
$ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=50

Dry run takes place.
|dpif_netdev|DBG|PMD auto load balance performing dry run.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Gaetan Rivet <grive@u256.net>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-11-17 21:19:11 +01:00
lin huang
513ed65700 system-traffic.at: Fix typo in conntrack zones tests.
Signed-off-by: Lin Huang <linhuang@ruijie.com.cn>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-11-16 21:06:53 +01:00
Eelco Chaudron
9b20df73a6 dpctl: dpif: Allow viewing and configuring dp cache sizes.
This patch adds a general way of viewing/configuring datapath
cache sizes. With an implementation for the netlink interface.

The ovs-dpctl/ovs-appctl show commands will display the
current cache sizes configured:

 $ ovs-dpctl show
 system@ovs-system:
   lookups: hit:25 missed:63 lost:0
   flows: 0
   masks: hit:282 total:0 hit/pkt:3.20
   cache: hit:4 hit-rate:4.54%
   caches:
     masks-cache: size:256
   port 0: ovs-system (internal)
   port 1: br-int (internal)
   port 2: genev_sys_6081 (geneve: packet_type=ptap)
   port 3: br-ex (internal)
   port 4: eth2
   port 5: sw0p1 (internal)
   port 6: sw0p3 (internal)

A specific cache can be configured as follows:

 $ ovs-appctl dpctl/cache-set-size DP CACHE SIZE
 $ ovs-dpctl cache-set-size DP CACHE SIZE

For example to disable the cache do:

 $ ovs-dpctl cache-set-size system@ovs-system masks-cache 0
 Setting cache size successful, new size 0.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-11-08 21:48:05 +01:00
Ilya Maximets
317b1bfd7d ovsdb: Don't let transaction history grow larger than the database.
If user frequently changes a lot of rows in a database, transaction
history could grow way larger than the database itself.  This wastes
a lot of memory and also makes monitor_cond_since slower than
usual monitor_cond if the transaction id is old enough, because
re-construction of the changes from a history is slower than just
creation of initial database snapshot.  This is also the case if
user deleted a lot of data, so transaction history still holds all of
it while the database itself doesn't.

In case of current lb-per-service model in ovn-kubernetes, each
load-balancer is added to every logical switch/router.  Such a
transaction touches more than half of an OVN_Northbound database.
And each of these transactions is added to the transaction history.
Since transaction history depth is 100, in worst case scenario,
it will hold 100 copies of a database increasing memory consumption
dramatically.  In tests with 3000 LBs and 120 LSs, memory goes up
to 3 GB, while holding at 30 MB if transaction history is disabled in
the code.

Fixing that by keeping count of the number of ovsdb_atom's in the
database and not allowing the total number of atoms in transaction
history to grow larger than this value.  Counting atoms is fairly
cheap because we don't need to iterate over them, so it doesn't have
significant performance impact.  It would be ideal to measure the
size of individual atoms, but that will hit the performance.
Counting cells instead of atoms is not sufficient, because OVN
users are adding hundreds or thousands of atoms to a single cell,
so they are largely different in size.
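
A minimal sketch of the limiting rule described above, assuming the
history is a FIFO of (transaction id, atom count) pairs.  It is not the
ovsdb-server implementation; the class and method names are
hypothetical.  Only the depth of 100 comes from the message itself.

```python
from collections import deque

MAX_HISTORY_DEPTH = 100  # History depth mentioned in the message above.


class TxnHistory:
    """Hypothetical transaction history bounded by depth and atom count."""

    def __init__(self):
        self.entries = deque()  # (txn_id, n_atoms), oldest first.
        self.n_atoms = 0        # Total atoms held by all entries.

    def add(self, txn_id, n_atoms, db_n_atoms):
        """Record a transaction, then drop the oldest entries until the
        history is neither too deep nor larger (in atoms) than the
        database, whose current atom count is 'db_n_atoms'."""
        self.entries.append((txn_id, n_atoms))
        self.n_atoms += n_atoms
        while self.entries and (len(self.entries) > MAX_HISTORY_DEPTH
                                or self.n_atoms > db_n_atoms):
            _, dropped = self.entries.popleft()
            self.n_atoms -= dropped
```

Only per-transaction totals are tracked, so no atom is iterated over when
trimming.  Note that in this sketch a single transaction larger than the
database empties the history completely.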

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
2021-11-05 22:33:03 +01:00
Timothy Redaelli
c5d384f77b checkpatch: Check if some tags are wrongly written.
Currently, there are some patches with the tags wrongly written (with
a space instead of a dash) and this may prevent some automatic system or
CI from detecting them correctly.

This commit adds a check in checkpatch to be sure the tag is written
correctly with dash and not with space.

The tags supported by the commit are:
Acked-by, Reported-at, Reported-by, Requested-by, Reviewed-by, Submitted-at
and Suggested-by.

It's not necessary to add "Signed-off-by" since it's already checked in
checkpatch.
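
As an illustration only (not the code added to checkpatch; the names
below are hypothetical), a check for the tags listed above written with
a space instead of a dash might look like this:

```python
import re

TAGS = ["Acked-by", "Reported-at", "Reported-by", "Requested-by",
        "Reviewed-by", "Submitted-at", "Suggested-by"]

# Map the wrong spelling ("acked by") to the correct tag ("Acked-by").
_CANONICAL = {tag.replace("-", " ").lower(): tag for tag in TAGS}

# Matches a line starting with one of the tags spelled with a space.
_WRONG_TAG = re.compile(
    r"^\s*(%s):" % "|".join(re.escape(k) for k in _CANONICAL),
    re.IGNORECASE)


def misspelled_tag(line):
    """Return the correct spelling if 'line' starts with a tag written
    with a space instead of a dash, otherwise None."""
    m = _WRONG_TAG.match(line)
    if not m:
        return None
    return _CANONICAL[m.group(1).lower()]
```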

Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-11-04 22:26:00 +01:00
Timothy Redaelli
68543dd523 python: Replace pyOpenSSL with ssl.
Currently, pyOpenSSL is half-deprecated upstream and so it's removed on
some distributions (for example on CentOS Stream 9,
https://issues.redhat.com/browse/CS-336), but since OVS only
supports Python 3 it's possible to replace pyOpenSSL with "import ssl"
included in base Python 3.

Stream recv and send had to be split into _recv and _send, since SSLError
is a subclass of socket.error and so it was not possible to catch
SSLWantReadError and SSLWantWriteError in recv and send of SSLStream.

TCPstream._open cannot be used in SSLStream, since Python ssl module
requires the SSL socket to be created before connecting it, so
SSLStream._open needs to create the socket, create SSL socket and then
connect the SSL socket.
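
A short sketch of the two points above using only the standard 'ssl' and
'socket' modules.  It is not the python/ovs stream code; the helper
names are hypothetical.

```python
import socket
import ssl


def open_ssl_socket(context, server_hostname=None):
    """Create a TCP socket and wrap it in SSL *before* connecting.
    The caller connects the returned SSL socket afterwards."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    return context.wrap_socket(sock, do_handshake_on_connect=False,
                               server_hostname=server_hostname)


def call_nonblocking(op):
    """Run 'op' and classify the outcome.  The SSLWant* handlers must
    come first: both are subclasses of socket.error, so a generic
    'except socket.error' placed earlier would swallow them."""
    try:
        return op(), None
    except ssl.SSLWantReadError:
        return None, "want-read"
    except ssl.SSLWantWriteError:
        return None, "want-write"
    except socket.error:
        return None, "error"
```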

Reported-by: Timothy Redaelli <tredaelli@redhat.com>
Reported-at: https://bugzilla.redhat.com/1988429
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: Terry Wilson <twilson@redhat.com>
Tested-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-11-03 16:00:04 +01:00
Peng He
c1fdb83471 ofproto-dpif-xlate: Fix zone set from non-frozen-metadata fields.
CT zone could be set from a field that is not included in frozen
metadata. Consider the example rules which are typically seen in
OpenStack security group rules:

priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2

The zone is set from the first rule's ct action. These two rules will
generate two megaflows: the first one uses zone=5 to query the CT module,
the second one sets the zone-id from the first megaflow and commit to CT.

The current implementation will generate a megaflow that does not use
ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone is
set by an Imm not a field.

Consider a situation where one changes the zone id (for example to 15)
in the first rule but keeps the second rule unchanged. During
this change, there is traffic hitting the two generated megaflows. The
revalidator revalidates all megaflows, but it will
not change the second megaflow, because zone=5 is recorded in the
megaflow, so the xlate will still translate the commit action into zone=5,
and the new traffic will still commit to CT as zone=5, not zone=15,
resulting in traffic drops and other issues.

Just like OVS set-field convention, if a field X is set by Y
(Y is a variable not an Imm), we should also mask Y as a match
in the generated megaflow. An exception is that if the zone-id is
set by the field that is included in the frozen state (i.e. regs) and this
upcall is a resume of a thawed xlate, the un-wildcarding can be skipped,
as the recirc_id is a hash of the values in these fields, and it will change
following the changes of these fields. When the recirc_id changes,
all megaflows with the old recirc id will be invalid later.

Fixes: 07659514c3 ("Add support for connection tracking.")
Reported-by: Sai Su <susai.ss@bytedance.com>
Signed-off-by: Peng He <hepeng.0320@bytedance.com>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-10-13 22:17:30 +02:00
Ilya Maximets
01bca6dab9 tunnel-push-pop.at: Mask source port in tunnel header.
Source port is based on a packet hash and hash depends on a chosen
implementation.  Masking it to avoid test failures with '-msse4.2'.

Fixes: 7e6b41ac8d9d ("dpif-netdev: Fix crash when PACKET_OUT is metered.")
Reported-by: Kumar Amber <kumar.amber@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Alin-Gabriel Serdean <aserdean@ovn.org>
2021-10-12 21:41:27 +02:00
Kumar Amber
cc0a87b11c pmd.at: Add test-cases for DPCLS and DPIF commands.
Added 2 separate test-cases for DPCLS and DPIF commands:
1018: PMD - dpcls configuration
1017: PMD - dpif configuration

The above added tests are to test the commands which are used
to either get or set the dpcls and dpif function pointers to
various different implementations like AVX512 or auto-validator
based on different CPU ISA supported.

Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-10-12 17:49:11 +02:00
Ilya Maximets
429b114c5a ovsdb-data: Deduplicate string atoms.
ovsdb-server spends a lot of time cloning atoms for various reasons,
e.g. to create a diff of two rows or to clone a row to the transaction.
All atoms, except for strings, contain a simple value that can be
copied efficiently, but duplicating strings every time has a
significant performance impact.

Introducing a new reference-counted structure 'ovsdb_atom_string'
that allows to not copy strings every time, but just increase a
reference counter.
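
The idea can be sketched as follows.  This is purely illustrative: the
real structure is C, Python already manages object lifetimes on its own,
and the names below are hypothetical.

```python
class AtomString:
    """Hypothetical stand-in for a reference-counted string atom."""

    def __init__(self, text):
        self.text = text
        self.n_refs = 1  # The creator holds the first reference.


def clone_string(s):
    """'Copy' a string atom by taking another reference to it."""
    s.n_refs += 1
    return s


def unref_string(s):
    """Drop one reference.  Returns True when the last reference is
    gone, i.e. when a C implementation would free the storage."""
    s.n_refs -= 1
    return s.n_refs == 0
```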

This change allows to increase transaction throughput in benchmarks
up to 2x for standalone databases and 3x for clustered databases, i.e.
number of transactions that ovsdb-server can handle per second.
It also noticeably reduces memory consumption of ovsdb-server.

Next step will be to consolidate this structure with json strings,
so we will not need to duplicate strings while converting database
objects to json and back.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
2021-09-24 15:53:46 +02:00
Ilya Maximets
32b51326ef ovsdb-data: Add function to apply diff in-place.
ovsdb_datum_apply_diff() is heavily used in ovsdb transactions, but
it's linear in terms of number of comparisons.  And it also clones
all the atoms along the way.  In most cases size of a diff is much
smaller than the size of the original datum, this allows to perform
the same operation in-place with only O(diff->n * log2(old->n))
comparisons and O(old->n + diff->n) memory copies with memcpy.
Using this function while applying diffs read from the storage gives
a significant performance boost and allows to execute much more
transactions per second.
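
A minimal sketch of the binary-search part, under the assumption that
the datum is a sorted set of unique atoms and that applying a diff means
removing atoms already present and adding the missing ones.  It is not
the ovsdb-data code; the function name is hypothetical.  Python list
insertions and deletions each shift elements, so this sketch does not
reproduce the O(old->n + diff->n) memory-copy bound, only the
O(diff->n * log2(old->n)) comparisons.

```python
import bisect


def apply_diff_in_place(old, diff):
    """Apply 'diff' to the sorted list 'old' without cloning it.

    Each atom of the diff is located with a binary search: an atom that
    is already present gets removed, a missing one gets inserted at the
    position that keeps 'old' sorted."""
    for atom in diff:
        i = bisect.bisect_left(old, atom)
        if i < len(old) and old[i] == atom:
            del old[i]
        else:
            old.insert(i, atom)
```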

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
2021-09-24 15:01:38 +02:00
wenxu
ebcbb534e1 ipf: Fix only nat the first fragment in the reass process.
The ipf collects the original fragment packets and reassembles a new
packet to run the conntrack logic.  After conntrack finishes, it
copies the ct meta info to each original packet but modifies only the
l4 header in the first fragment.  It should also modify the ip src/dst
info for all the fragments.

Fixes: 4ea96698f667 ("Userspace datapath: Add fragmentation handling.")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Co-authored-by: luke.li <luke.li@ucloud.cn>
Signed-off-by: luke.li <luke.li@ucloud.cn>
Reviewed-by: wangze <wangze712@gmail.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>

Test case:
Co-authored-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-09-15 22:01:25 +02:00
Aaron Conole
00d3d4a7d3 checkpatch: Avoid catastrophic backtracking.
As Frode Nordahl points out in [0], it is possible for the
python regex module to enter a case of catastrophic backtracking
which causes oscillation between states and hangs the checkpatch
script.

One suggested solution to these cases is to use an anchor[1] in
the regex, which should force the backtrack to exit early.
However, when I tested this, it didn't seem to improve anything
(since the start is already anchored, and trying to anchor the
end results in the same hang).

Instead, we explicitly check that the line ends with '\\' before
trying to match on the 'if-inside-a-macro' check.  A new check
is added to catch the case in checkpatch.

0: https://mail.openvswitch.org/pipermail/ovs-dev/2021-August/386881.html
1: https://stackoverflow.com/questions/22072406/preventing-any-backtracking-in-regex-past-a-specific-pattern

Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-09-08 21:29:03 +02:00
Tony van der Peet
7e6b41ac8d dpif-netdev: Fix crash when PACKET_OUT is metered.
When a PACKET_OUT has output port of OFPP_TABLE, and the rule
table includes a meter and this causes the packet to be deleted,
execute with a clone of the packet, restoring the original packet
if it is changed by the execution.

Add tests to verify the original issue is fixed, and that the fix
doesn't break tunnel processing.

Reported-by: Tony van der Peet <tony.vanderpeet@alliedtelesis.co.nz>
Signed-off-by: Tony van der Peet <tony.vanderpeet@alliedtelesis.co.nz>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-09-08 17:52:35 +02:00
Ilya Maximets
748010ff30 json: Optimize string serialization.
Current string serialization code puts all characters one by one.
This is slow because dynamic string needs to perform length checks
on every ds_put_char() and it also doesn't allow the compiler to use
better memory copy operations, i.e. doesn't allow copying a few bytes
at once.

Special symbols are rare in a typical database.  Quotes are frequent,
but not too frequent.  In databases created by ovn-kubernetes, for
example, usually there are at least 10 to 50 chars between quotes.
So, it's better to count characters that don't require escaping
and use fast data copy for the whole sequential block.
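
The approach can be sketched like this.  It is not the lib/json.c code;
the function names and the exact escape table are assumptions made for
the illustration.

```python
# Short escapes; other control characters are written as \u00XX.
_SHORT_ESCAPES = {'"': '\\"', '\\': '\\\\', '\b': '\\b', '\f': '\\f',
                  '\n': '\\n', '\r': '\\r', '\t': '\\t'}


def needs_escape(ch):
    """True for quotes, backslashes and control characters (< 32)."""
    return ch == '"' or ch == '\\' or ord(ch) < 32


def serialize_string(s):
    """Serialize 's' as a JSON string, copying every run of characters
    that needs no escaping as one block instead of char by char."""
    out = ['"']
    start = 0  # Start of the current run of plain characters.
    for i, ch in enumerate(s):
        if needs_escape(ch):
            if i > start:
                out.append(s[start:i])  # Block copy of the plain run.
            out.append(_SHORT_ESCAPES.get(ch) or "\\u%04x" % ord(ch))
            start = i + 1
    if start < len(s):
        out.append(s[start:])
    out.append('"')
    return "".join(out)
```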

Testing with a synthetic benchmark (included) on my laptop shows
following performance improvement:

   Size      Q  S       Before       After       Diff
 -----------------------------------------------------
 100000      0  0 :    0.227 ms     0.142 ms   -37.4 %
 100000      2  1 :    0.277 ms     0.186 ms   -32.8 %
 100000      10 1 :    0.361 ms     0.309 ms   -14.4 %
 10000000    0  0 :   22.720 ms    12.160 ms   -46.4 %
 10000000    2  1 :   27.470 ms    19.300 ms   -29.7 %
 10000000    10 1 :   37.950 ms    31.250 ms   -17.6 %
 100000000   0  0 :  239.600 ms   126.700 ms   -47.1 %
 100000000   2  1 :  292.400 ms   188.600 ms   -35.4 %
 100000000   10 1 :  387.700 ms   321.200 ms   -17.1 %

Here Q - probability (%) for a character to be a '\"' and
S - probability (%) to be a special character ( < 32).

Testing with a closer to real world scenario shows overall decrease
of the time needed for database compaction by ~5-10 %.  And this
change also decreases CPU consumption in general, because string
serialization is used in many different places including ovsdb
monitors and raft.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Numan Siddique <numans@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
2021-08-31 19:04:08 +02:00
Ilya Maximets
7847bf89ea tests: Skip netlink policy test on non-Linux platforms.
FreeBSD tests in Cirrus CI are broken and, I guess, Windows tests too:

  89. library.at:258: testing netlink policy ...
  ./library.at:259: ovstest test-netlink-policy ll_addr
  --- /dev/null	2021-08-20 19:02:41.907547000 +0000
  +++ /tmp/cirrus-ci-build/tests/testsuite.dir/at-groups/89/stderr
  @@ -0,0 +1 @@
  +ovstest: unknown command 'test-netlink-policy'; use --help for help
  ./library.at:259: exit code was 1, expected 0
  89. library.at:258: 89. netlink policy (library.at:258): FAILED

'tests/test-netlink-policy.c' is built only on Linux, test
must be skipped on all other platforms to unblock CI.

Fixes: bfee9f6c0115 ("netlink: Add support for parsing link layer address.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Frode Nordahl <frode.nordahl@canonical.com>
2021-08-28 16:10:02 +02:00
Numan Siddique
7502849e95 ovsdb-idl: Add APIs to query if a table and a column is present.
This patch adds 2 new APIs in the ovsdb-idl client library
 - ovsdb_idl_server_has_table() and ovsdb_idl_server_has_column() to
query if a table and a column is present in the IDL or not.  This
patch also adds IDL helper functions which are auto generated from
the schema which makes it easier for the clients.

These APIs are required for scenarios where the server schema is old and
missing a table or column and the client (built with a new schema
version) does a transaction with the missing table or column.  This
results in a continuous loop of transaction failures.

Related-Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1992705
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-08-28 02:59:04 +02:00