mirror of https://github.com/openvswitch/ovs synced 2025-08-29 13:27:59 +00:00

19894 Commits

Author SHA1 Message Date
Mike Pattrick
531c17023c netdev-dummy: Allocate dummy_packet_stream on cacheline boundary.
UB Sanitizer report:

lib/netdev-dummy.c:197:15: runtime error: member access within
misaligned address 0x00000217a7f0 for type 'struct
dummy_packet_stream', which requires 64 byte alignment
              ^
    #0 dummy_packet_stream_init lib/netdev-dummy.c:197
    #1 dummy_packet_stream_create lib/netdev-dummy.c:208
    #2 dummy_packet_conn_set_config lib/netdev-dummy.c:436
    [...]

Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-03 22:18:15 +01:00
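The alignment complaint in the netdev-dummy report above can be checked with simple arithmetic. The following is a minimal sketch; the helper names `misalignment` and `round_up` are hypothetical and are not part of OVS. It only shows why the reported address fails a 64-byte alignment requirement.

```python
CACHE_LINE_SIZE = 64  # alignment required according to the sanitizer report


def misalignment(addr: int, alignment: int = CACHE_LINE_SIZE) -> int:
    """Return how many bytes addr lies past the previous aligned boundary."""
    return addr % alignment


def round_up(addr: int, alignment: int = CACHE_LINE_SIZE) -> int:
    """Return the first aligned address that is >= addr."""
    return (addr + alignment - 1) // alignment * alignment


reported = 0x00000217A7F0  # address taken from the report above
print(misalignment(reported))      # non-zero means misaligned
print(hex(round_up(reported)))     # next cacheline boundary
```

An allocator that always hands out addresses for which `misalignment()` is zero would satisfy the requirement; how OVS achieves that is in the commit itself, not in this sketch.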
Eelco Chaudron
b1f58f5072 netdev-offload-tc: Preserve tc statistics when flow gets modified.
When a flow gets modified, i.e. the actions are changed, the tc layer will
remove and re-add the flow. This causes all the counters to be reset.

This patch will remember the previous tc counters and adjust any requests
for statistics. This is done in a similar way as the rte_flow implementation.

It also updates the check_pkt_len tc test to purge the flows, so we do
not use existing updated tc flow counters, but start with a freshly
installed set of datapath flows.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-03 16:39:59 +01:00
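The idea described in the tc statistics commit above, remembering the counters seen before a remove and re-add and adding them to later readings, can be sketched as follows. This is an illustration only: the `Stats` and `OffloadedFlow` classes are hypothetical and do not mirror the actual netdev-offload-tc data structures.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    n_packets: int = 0
    n_bytes: int = 0


class OffloadedFlow:
    """Hypothetical flow whose lower-layer counters reset on modification."""

    def __init__(self) -> None:
        self.hw = Stats()      # counters as reported by the lower layer
        self.adjust = Stats()  # totals remembered from before the last re-add

    def hw_count(self, packets: int, nbytes: int) -> None:
        self.hw.n_packets += packets
        self.hw.n_bytes += nbytes

    def modify(self) -> None:
        # A modification is a remove plus re-add, which zeroes the lower
        # layer's counters, so remember what was counted so far.
        self.adjust.n_packets += self.hw.n_packets
        self.adjust.n_bytes += self.hw.n_bytes
        self.hw = Stats()

    def get_stats(self) -> Stats:
        # Report the fresh counters adjusted by the remembered totals.
        return Stats(self.hw.n_packets + self.adjust.n_packets,
                     self.hw.n_bytes + self.adjust.n_bytes)


flow = OffloadedFlow()
flow.hw_count(10, 1000)
flow.modify()
flow.hw_count(5, 500)
print(flow.get_stats())
```

With this bookkeeping a reader of the statistics sees a monotonically growing count even though the underlying counters restarted.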
Ilya Maximets
d6501c6605 sparse: Fix numa.h for libnuma >= 2.0.13.
Current numa.h header for sparse re-defines functions in a way
that breaks the header from libnuma 2.0.13+, because the original
issue was fixed in that version:
  25dcde021d

Sparse errors as a result:

  lib/netdev-afxdp.c: note: in included file (through include/sparse/numa.h):
  /usr/include/numa.h:346:26: error: macro "numa_get_interleave_mask_compat"
                                     passed 1 arguments, but takes just 0
  /usr/include/numa.h:376:26: error: macro "numa_get_membind_compat"
                                     passed 1 arguments, but takes just 0
  /usr/include/numa.h:406:26: error: macro "numa_get_run_node_mask_compat"
                                     passed 1 arguments, but takes just 0
  /usr/include/numa.h:347:1: error: Expected ; at end of declaration
  /usr/include/numa.h:347:1: error: got {
  /usr/include/numa.h:351:9: error: 'tp' has implicit type

It's hard to adjust defines to work with both versions of a header.
Just defining all the functions we actually use in OVS instead and
not including the original header.

Fixes: e8568993e062 ("netdev-afxdp: NUMA-aware memory allocation for XSK related memory.")
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-03 15:58:29 +01:00
Ilya Maximets
4fd2d46c01 AUTHORS: Add wangchuanlei.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-31 17:40:50 +01:00
wangchuanlei
e22e1f6725 dpctl: Add support to count upcall packets.
Add support to count upcall packets per port, both successful and failed,
which is a better way to see how many packets were upcalled on each interface.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: wangchuanlei <wangchuanlei@inspur.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-31 17:40:50 +01:00
Eelco Chaudron
e1e5eac5b0 tc: Add TCA_KIND flower to delete and get operation to avoid rtnl_lock().
A long long time ago, an effort was made to make tc flower
rtnl_lock() free. However, on the OVS part we forgot to add
the TCA_KIND "flower" attribute, which tells the kernel to skip
the lock. This patch corrects this by adding the attribute for
the delete and get operations.

The kernel code calls tcf_proto_is_unlocked() to determine whether the
rtnl_lock() is needed for the specific tc protocol. It does this
in the tc_new_tfilter(), tc_del_tfilter(), and in tc_get_tfilter().

If the name is not set, tcf_proto_is_unlocked() will always return
false. If set, the specific protocol is queried for unlocked support.

Fixes: f98e418fbdb6 ("tc: Add tc flower functions")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-30 21:12:31 +01:00
Simon Horman
3f85b11d50 system-offloads-traffic: Skip tests if nc is not present.
The following tests use the nc command and should be skipped if
nc is not present.

- "offloads - check interface meter offloading -  offloads disabled"
- "offloads - check interface meter offloading -  offloads enabled"

Fixes: 5660b89a309d ("dpif-netlink: Offloading meter to tc police action")
Reported-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2023-01-30 14:49:16 +01:00
Simon Horman
6e5661d17d system-traffic: Remove unnecessary dependency on nc.
The "conntrack - ICMP related to original direction" test does not
use nc and therefore does not need to be skipped if nc is not present.

Fixes: d0e4206230b3 ("tests: ICMP related to original direction test.")
Reported-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2023-01-30 14:49:16 +01:00
Ilya Maximets
9117f4d54f netdev-offload-tc: Fix misaligned access to ct label.
UndefinedBehaviorSanitizer:

  lib/netdev-offload-tc.c:1356:50: runtime error:
   member access within misaligned address 0x60700001a89c for type
   'const struct (unnamed struct at lib/netdev-offload-tc.c:1350:27)',
   which requires 8 byte alignment 0x60700001a89c: note: pointer points here
   24 00 04 00 01 00 00 05  00 00 0d 00 0a 00 00 00  00 00 00 00 ...
               ^
   0 0xd5d183 in parse_put_flow_ct_action lib/netdev-offload-tc.c:1356:50
   1 0xd5783f in netdev_tc_parse_nl_actions lib/netdev-offload-tc.c:2015:19
   2 0xd4027c in netdev_tc_flow_put lib/netdev-offload-tc.c:2355:11
   3 0x9666d7 in netdev_flow_put lib/netdev-offload.c:318:14
   4 0xcd4c0a in parse_flow_put lib/dpif-netlink.c:2297:11
   5 0xcd4c0a in try_send_to_netdev lib/dpif-netlink.c:2384:15
   6 0xcd4c0a in dpif_netlink_operate lib/dpif-netlink.c:2455:23
   7 0x87d40e in dpif_operate lib/dpif.c:1372:13
   8 0x6d43e9 in handle_upcalls ofproto/ofproto-dpif-upcall.c:1674:5
   9 0x6d43e9 in recv_upcalls ofproto/ofproto-dpif-upcall.c:905:9
   10 0x6cf6ea in udpif_upcall_handler ofproto/ofproto-dpif-upcall.c:801:13
   11 0xb6d7ea in ovsthread_wrapper lib/ovs-thread.c:423:12
   12 0x7f5ccf017801 in start_thread
   13 0x7f5ccefb744f in __GI___clone3

Fixes: 9221c721bec0 ("netdev-offload-tc: Add conntrack label and mark support")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-27 16:57:27 +01:00
Kevin Traynor
3beff0a6b0 dpif-netdev-perf: Add metric averages when no iterations.
pmd-perf-show with pmd-perf-metrics=true displays a histogram
with averages. However, averages were not displayed when there
are no iterations.

They will be all zero so it is not hiding useful information
but the stats look incomplete without them, especially when
they are displayed for some PMD thread cores and not others.

The histogram print is large and this is just an extra couple
of lines, so might as well print them all the time to ensure
that the user does not think there is something missing from
the display.

Before patch:
  Histograms
     cycles/it
     499       0
     716       0
     1025      0
     1469      0
  <snip>

After patch:
  Histograms
     cycles/it
     499       0
     716       0
     1025      0
     1469      0
  <snip>
  ---------------
     cycles/it
     0

Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-27 16:57:27 +01:00
Kevin Traynor
7db18054ff dpif-netdev-perf: Remove not a number stat value.
Some stats in pmd-perf-show don't check for divide by zero
which results in not a number (-nan).

This is a normal case for some of the stats when there are
no Rx queues assigned to the PMD thread core.

It is not obvious what -nan is to a user so add a check for
divide by zero and set stat to 0 if present.

Before patch:
pmd thread numa_id 1 core_id 9:

  Iterations:                    0  (-nan us/it)
  - Used TSC cycles:             0  (  0.0 % of total cycles)
  - idle iterations:             0  ( -nan % of used cycles)
  - busy iterations:             0  ( -nan % of used cycles)

After patch:
pmd thread numa_id 1 core_id 9:

  Iterations:                    0  (0.00 us/it)
  - Used TSC cycles:             0  (  0.0 % of total cycles)
  - idle iterations:             0  (  0.0 % of used cycles)
  - busy iterations:             0  (  0.0 % of used cycles)

Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-27 16:57:27 +01:00
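The divide-by-zero guard described in the "Remove not a number stat value" commit above can be sketched as below. The helpers `safe_div` and `format_iterations` are hypothetical and only loosely modeled on the output shown in the commit message. Note that in C a floating-point 0.0/0.0 yields NaN, while Python raises `ZeroDivisionError`, so the guard is needed in either language.

```python
def safe_div(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    if denominator == 0:
        return 0.0
    return numerator / denominator


def format_iterations(iterations: int, total_us: float) -> str:
    # Hypothetical formatting helper; the real output layout may differ.
    us_per_it = safe_div(total_us, iterations)
    return f"Iterations: {iterations:>20}  ({us_per_it:.2f} us/it)"


# A PMD thread with no Rx queues assigned reports zero iterations.
print(format_iterations(0, 0.0))
print(format_iterations(4, 10.0))
```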
Ilya Maximets
4f0a728a59 system-traffic.at: Skip the 'ICMP6 Related' test if nc is missing.
Test fails if 'nc' is not available, it should be skipped instead.

Fixes: b020a416e24c ("System Tests: Enhance NAT tests.")
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-27 16:57:27 +01:00
Eelco Chaudron
6ad35dd80e utilities: Add revalidator measurement script and needed USDT probes.
This patch adds a Python script that can be used to analyze the
revalidator runs by providing statistics (including some real time
graphs).

The USDT events can also be captured to a file and used for
later offline analysis.

The following blog explains the Open vSwitch revalidator
implementation and how this tool can help you understand what is
happening in your system.

https://developers.redhat.com/articles/2022/10/19/open-vswitch-revalidator-process-explained

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-27 16:57:27 +01:00
Robin Jarry
c3ed0bf34b tests/mfex: Silence Blowfish/CAST5 deprecation warnings.
On Fedora 37 (at least), MFEX unit tests are failing because of
deprecation warnings:
$ python3 tests/mfex_fuzzy.py test_traffic.pcap 2000
/usr/lib/python3.11/site-packages/scapy/layers/ipsec.py:471:
   CryptographyDeprecationWarning: Blowfish has been deprecated
  cipher=algorithms.Blowfish,
/usr/lib/python3.11/site-packages/scapy/layers/ipsec.py:485:
   CryptographyDeprecationWarning: CAST5 has been deprecated
  cipher=algorithms.CAST5,

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-27 16:13:07 +01:00
Han Zhou
e5b3cb9995 revalidator: Allow min-revalidator-pps to be 0.
Today the minimum value for this setting is 1. This patch allows it to
be 0, meaning pps is not checked at all and revalidation is always done.

This is particularly useful for environments where some of the
applications with long-lived connections may have very low traffic for
a certain period but periodically have a high rate of bursts. It is
desirable to keep the datapath flows instead of periodically deleting
them to avoid bursts of packet misses to userspace.

When setting to 0, there may be more datapath flows to be revalidated,
resulting in higher CPU cost of revalidator threads. This is the
downside but in certain cases this is still more desirable than packet
misses to user space.

Signed-off-by: Han Zhou <hzhou@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-27 16:09:10 +01:00
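The effect of allowing min-revalidator-pps to be 0, as described in the commit above, can be sketched with a deliberately simplified decision function. The function name and the single pps comparison are assumptions made for illustration; the real revalidator considers more inputs than this.

```python
def should_revalidate(flow_pps: float, min_revalidator_pps: int) -> bool:
    """Simplified, hypothetical model of the pps check.

    With a minimum of 0 the pps check is skipped and the flow is always
    revalidated (kept); otherwise a flow below the minimum rate is not.
    """
    if min_revalidator_pps == 0:
        return True
    return flow_pps >= min_revalidator_pps


# A long-lived but mostly idle connection, at 0.2 packets per second.
print(should_revalidate(0.2, 5))  # low-rate flow fails the check
print(should_revalidate(0.2, 0))  # check disabled, flow is kept
```

The trade-off noted in the commit message still applies: keeping more flows means more work for the revalidator threads.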
Ilya Maximets
ebaee44624 netdev-dpdk: Free mbufs in bulk.
rte_pktmbuf_free_bulk() function was introduced in 19.11 and became
stable in 21.11.  Use it to free arrays of mbufs instead of freeing
packets one by one.

In simple V2V testing with 64B packets, 2 PMD threads and bidirectional
traffic this change improves performance by 3.5 - 4.5 %.

Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-27 16:07:39 +01:00
Ilya Maximets
b7f540129b ovsdb: Don't convert unchanged columns during database conversion.
Column conversion involves converting it to json and back.  These are
heavy operations and completely unnecessary if the column type didn't
change.  Most of the time schema changes only add new columns/tables
without changing existing ones at all.  Clone the column instead to
save some time.

This will also save time while destroying the original database since
we will only need to reduce reference counters on unchanged datum
objects that were cloned instead of actually freeing them.

Additionally, moving the column lookup into a separate loop, so we
don't perform an shash lookup for each column of each row.

Testing with 440 MB OVN_Southbound database shows 70% speed up of the
ovsdb_convert() function.  Execution time reduced from 15 to 4.4
seconds, 3.5 of which is a post-conversion transaction replay.  Overall
time required for the online database conversion reduced from 37 to 25
seconds.

Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-27 15:53:33 +01:00
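The timings quoted in the ovsdb conversion commit above can be cross-checked with a few lines of arithmetic. This sketch only restates the numbers from the commit message; `reduction_pct` is a hypothetical helper.

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage by which a duration shrank."""
    return (before - after) / before * 100.0


# ovsdb_convert(): 15 s before, 4.4 s after.
print(round(reduction_pct(15, 4.4), 1))

# Whole online conversion: 37 s before, 25 s after.
print(round(reduction_pct(37, 25), 1))

# Of the 4.4 s, 3.5 s is the post-conversion transaction replay,
# which leaves this much for the conversion proper.
print(round(4.4 - 3.5, 1))
```

The first figure is consistent with the roughly 70% speed up stated in the message.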
Ilya Maximets
e0e4266a90 ovsdb-types: Add functions to compare types for equality.
Will be used in the next commit to optimize database conversion.

Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-27 15:53:06 +01:00
Kevin Traynor
948767a18d dpif-netdev: Set PMD load based sleep start/inc to 1 us.
Now that the timer slack for the PMD threads is reduced we can also
reduce the start/increment for PMD load based sleeping to match it.

This will further reduce initial sleep times making it more resilient
to interfaces that might be sensitive to large sleep times.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-23 17:23:28 +01:00
David Marchand
f62629a558 dpif-netdev: Set timer slack for PMD threads.
The default Linux timer slack groups timer expirations into 50 us intervals.

With some traffic patterns this can mean that returning to process
packets after a sleep takes too long and packets are dropped.

Add a helper to util.c and use it to reduce the timer slack
for PMD threads, so that sleeps with smaller resolutions can be done
to prevent sleeping for too long.

Fixes: de3bbdc479a9 ("dpif-netdev: Add PMD load based sleeping.")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2023-January/401121.html
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Co-authored-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-23 17:23:20 +01:00
David Marchand
e24b68fa70 netdev-dpdk: Fix deadlock due to virtqueue stats retrieval.
As Ilya reported, we have an ABBA deadlock between DPDK vq->access_lock
and OVS dev->mutex when OVS main thread refreshes statistics, while a
vring state change event is being processed for the same vhost port.

To break from this situation, move vring state change notifications
handling from the vhost-events DPDK thread to a dedicated thread
using a lockless queue.

Besides, for the case when a bogus/malicious guest is sending continuous
updates, add a counter of pending updates in the queue and warn if a
threshold of 1000 entries is reached.

Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2023-January/401101.html
Fixes: 3b29286db1c5 ("netdev-dpdk: Add per virtqueue statistics.")
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-19 20:32:04 +01:00
Ilya Maximets
7402dae8f4 ovsdb: Fix database statistics during the database replacement.
The counter for the number of atoms has to be re-set to the number from
the new database, otherwise the value will be incorrect.  For example,
this causes the atom counter to double after online conversion of
a clustered database.

Miscounting may also lead to increased memory consumption by the
transaction history or otherwise too aggressive transaction history
sweep.

Fixes: 317b1bfd7dd3 ("ovsdb: Don't let transaction history grow larger than the database.")
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-18 13:54:42 +01:00
Ilya Maximets
b02356ebbf Prepare for post-3.1.0 (3.1.90).
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-16 21:37:09 +01:00
Ilya Maximets
8986d4d556 Prepare for 3.1.0.
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-16 21:37:03 +01:00
Han Zhou
43266915a4 ovs-vsctl: Do not sent 'set_db_change_aware'.
ovs-vsctl's connections are short-lived, so it doesn't care about db
status changes.

Reported-by: Tobias Hofmann <tohofman@cisco.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2021-February/050914.html
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-16 20:44:41 +01:00
Han Zhou
8833e7c8ed ovsdb-idl: Provide API to disable set_db_change_aware request.
For ovsdb clients that are short-lived, e.g. when using
ovn-nbctl/ovn-sbctl to read some metrics from the OVN NB/SB server, they
don't really need to be aware of db changes, because they exit
immediately after getting the initial response for the requested data.
In such use cases, however, the clients still send 'set_db_change_aware'
request, which results in server side error logs when the server tries
to send out the response for the 'set_db_change_aware' request, because
at the moment the client that is supposed to receive the request has
already closed the connection and exited. E.g.:

2023-01-10T18:23:29.431Z|00007|jsonrpc|WARN|unix#3: receive error: Connection reset by peer
2023-01-10T18:23:29.431Z|00008|reconnect|WARN|unix#3: connection dropped (Connection reset by peer)

To avoid such problems, this patch provides an API to allow a client to
choose to not send the 'set_db_change_aware' request.

There was an earlier attempt to fix this [0], but it was not accepted
back then as discussed in the email [1]. It was also discussed in the
emails that an alternative approach is to use notification instead of
request, but that would require protocol changes and taking backward
compatibility into consideration. So this patch takes a different
approach and tries to keep the change small.

[0] http://patchwork.ozlabs.org/project/openvswitch/patch/1594380801-32134-1-git-send-email-dceara@redhat.com/

[1] https://mail.openvswitch.org/pipermail/ovs-discuss/2021-February/050919.html

Reported-by: Girish Moodalbail <gmoodalbail@nvidia.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-July/050343.html
Reported-by: Tobias Hofmann <tohofman@cisco.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2021-February/050914.html
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-16 20:14:10 +01:00
Ales Musil
08146bf7d9 openflow: Add extension to flush CT by generic match.
Add an extension that allows flushing connections from CT
by specifying fields that the connections should be
matched against. This allows matching on only some fields
of the connection, e.g. the source address for the orig direction.

Reported-at: https://bugzilla.redhat.com/2120546
Signed-off-by: Ales Musil <amusil@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-16 19:58:08 +01:00
Ales Musil
a9ae73b916 ofp, dpif: Allow CT flush based on partial match.
Currently, the CT can be flushed by dpctl only by specifying
the whole 5-tuple.  This is not very convenient when there are
only some fields known to the user of CT flush.  Add new struct
ofp_ct_match which represents the generic filtering that can
be done for CT flush. The match is done only on fields that are
non-zero, with the exception of the ICMP fields.

This allows the filtering just within dpctl, however it is a
preparation for OpenFlow extension.

Reported-at: https://bugzilla.redhat.com/2120546
Signed-off-by: Ales Musil <amusil@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-16 19:55:21 +01:00
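The partial-match rule from the CT flush commit above, where only non-zero fields of the filter take part in the comparison, can be sketched as below. The `ConnTuple` class and `matches` function are hypothetical, use empty strings as the "unset" value for addresses, and leave out the ICMP exception mentioned in the commit message.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ConnTuple:
    src: str = ""      # empty string stands in for an unset address
    dst: str = ""
    proto: int = 0
    src_port: int = 0
    dst_port: int = 0


def matches(conn: ConnTuple, flt: ConnTuple) -> bool:
    """Compare only the fields that are set (non-zero) in the filter."""
    for f in fields(ConnTuple):
        wanted = getattr(flt, f.name)
        if wanted and wanted != getattr(conn, f.name):
            return False
    return True


conns = [
    ConnTuple("10.0.0.1", "10.0.0.9", 6, 40000, 80),
    ConnTuple("10.0.0.2", "10.0.0.9", 6, 40001, 80),
    ConnTuple("10.0.0.1", "10.0.0.8", 17, 5000, 53),
]

# Flush everything that originates from 10.0.0.1, whatever the other fields.
to_flush = [c for c in conns if matches(c, ConnTuple(src="10.0.0.1"))]
print(len(to_flush))
```

An all-zero filter matches every connection in this model, while a fully specified one behaves like the old whole 5-tuple match.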
Kevin Traynor
de3bbdc479 dpif-netdev: Add PMD load based sleeping.
Sleep for an incremental amount of time if none of the Rx queues
assigned to a PMD have at least half a batch of packets (i.e. 16 pkts)
on a polling iteration of the PMD.

Upon detecting the threshold of >= 16 pkts on an Rxq, reset the
sleep time to zero (i.e. no sleep).

Sleep time will be increased on each iteration where the low load
conditions remain up to a total of the max sleep time which is set
by the user e.g:
ovs-vsctl set Open_vSwitch . other_config:pmd-maxsleep=500

The default pmd-maxsleep value is 0, which means that no sleeps
will occur and the default behaviour is unchanged from previously.

Also add new stats to pmd-perf-show to get visibility of operation
e.g.
...
   - sleep iterations:       153994  ( 76.8 % of iterations)
   Sleep time (us):         9159399  ( 59 us/iteration avg.)
...

Reviewed-by: Robin Jarry <rjarry@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-12 18:56:05 +01:00
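The sleep policy described in the PMD load based sleeping commit above can be sketched as a small state update. This is a simplified model: the function name and the `increment_us` parameter are assumptions, and only the 16 packet threshold, the reset to zero, and the user-set maximum come from the commit message.

```python
SLEEP_THRESHOLD_PKTS = 16  # half a batch, per the commit message


def next_sleep_us(current_us: int, max_rxq_pkts: int,
                  max_sleep_us: int, increment_us: int) -> int:
    """Return the sleep time to use after one polling iteration.

    max_rxq_pkts is the largest number of packets received from any Rx
    queue in this iteration; max_sleep_us models pmd-maxsleep.
    """
    if max_sleep_us == 0:
        return 0  # default: sleeping is disabled
    if max_rxq_pkts >= SLEEP_THRESHOLD_PKTS:
        return 0  # load detected, stop sleeping
    return min(current_us + increment_us, max_sleep_us)


sleep_us = 0
history = []
for pkts in [0, 3, 1, 0, 2, 0, 32, 0]:
    sleep_us = next_sleep_us(sleep_us, pkts, max_sleep_us=500,
                             increment_us=100)
    history.append(sleep_us)
print(history)
```

The sleep time ramps up while the load stays low, is capped at the maximum, and drops back to zero as soon as one queue delivers at least half a batch.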
Kevin Traynor
f4c8841351 util: Add non quiesce xnanosleep.
xnanosleep forces the thread into quiesce state in anticipation that
it will be sleeping for a considerable time and that the thread may
need to quiesce before the sleep is finished.

In some cases, a very short sleep may be requested and in that case
the overhead of going into quiesce state may be unnecessary.

To allow for those cases add a xnanosleep_no_quiesce() variant.

Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-12 12:21:20 +01:00
David Marchand
4de6b009cf Documentation: Remove link to obsolete sources.
This archive website disappeared.
On the other hand, the link to an obsolete dpif-provider man page
probably did not provide much info and we can simply mention the current
file.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-12 11:55:03 +01:00
David Marchand
68ff5e9811 Documentation: Remove reference to RST online editor.
rst.ninjs.org is not available anymore, but there are alternatives
listed in this doc.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-11 20:26:29 +01:00
David Marchand
8ef198425b Documentation: Fix link to Netperf.
netperf.org was shut down in favor of some HP related resources.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-11 20:01:01 +01:00
David Marchand
61e2259cf4 Documentation: Fix link to AppVeyor.
Sphinx linkcheck complains with:

Warning, treated as error:
.../Documentation/intro/install/windows.rst:1093:broken link:
	www.appveyor.com ()

Add a https scheme in link to AppVeyor website.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-11 20:01:01 +01:00
David Marchand
7e18ae63a6 Documentation: Fix link to iproute2 git repository.
iproute2 git repositories were split and moved around v4.15 [1].
It is time to fix the link in OVS documentation.

1: https://lore.kernel.org/netdev/20180129082052.0eb85e9b@xeon-e3/

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-11 19:59:47 +01:00
David Marchand
c7da49bc64 netdev-offload-dpdk: Fix transfer flows.
Following DPDK commit bd2a4d4b2e3a ("ethdev: forbid direction attribute
in transfer flow rules"), the ingress attribute presence is rejected for
transfer flows.

Fixes: a77c7796f23a ("dpdk: Update to use v22.11.1.")
Acked-by: Eli Britstein <elibr@nvidia.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-11 18:00:41 +01:00
Adrian Moreno
bd14aa31e3 tests: Add unit tests to rculist.
Low test coverage in this area caused some errors to remain unnoticed.
Add basic functional test of rculist.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-11 17:44:55 +01:00
Ilya Maximets
e5d92c1a54 cirrus: Update to use FreeBSD 12.4.
12.4 was released in December.  That means that 12.3
will become unavailable in the near future.  Updating.

Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-09 19:43:32 +01:00
Eelco Chaudron
264ae342dc system-dpdk: Fix error message in ping vhost-user ports.
In some environments, ovs-vswitchd gets shut down before the pkill of
testpmd has been completed, which results in the following error messages:

  Removing port 'dpdkvhostuser0' while vhost device still attached.
  To restore connectivity after re-adding of port, VM on socket '' must be restarted.

This patch will wait for the socket disconnect to be handled by the
vhost-user before shutting down OVS.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Co-authored-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-09 19:39:18 +01:00
David Marchand
c9e10ac57f netdev-dpdk: Drop coverage counter for vhost IRQs.
The vhost library now provides fine-grained statistics for guest
notifications:
- notifications for buffer reclaim by the guest,
- notifications for buffer availability to the guest,

Example before this patch:
$ ovs-appctl coverage/show |
  grep vhost_notification
vhost_notification         0.0/sec     0.000/sec        2.0283/sec   total: 7302

$ ovs-vsctl get interface vhost4 statistics |
  sed -e 's#[{}]##g' -e 's#, #\n#g' |
  grep guest_notifications
rx_q0_guest_notifications=66
tx_q0_guest_notifications=7236

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-09 18:15:21 +01:00
David Marchand
3b29286db1 netdev-dpdk: Add per virtqueue statistics.
The DPDK vhost-user library maintains more granular per queue stats
which can replace what OVS was providing for vhost-user ports.

The benefits for OVS:
- OVS can skip parsing packet sizes on the rx side,
- dev->stats_lock won't be taken in rx/tx code unless some packet is
  dropped,
- vhost-user is aware of which packets are transmitted to the guest,
  so per *transmitted* packet size stats can be reported,
- more internal stats from vhost-user may be exposed, without OVS
  needing to understand them,

Note: the vhost-user library does not provide global stats for a port.
The proposed implementation is to have the global stats (exposed via
netdev_get_stats()) computed by querying and aggregating all per queue
stats.
Since per queue stats are exposed via another netdev ops
(netdev_get_custom_stats()), this may lead to some race and small
discrepancies.
This issue might already affect other netdev classes.

Example:
$ ovs-vsctl get interface vhost4 statistics |
  sed -e 's#[{}]##g' -e 's#, #\n#g' |
  grep -v =0$
rx_1_to_64_packets=12
rx_256_to_511_packets=15
rx_65_to_127_packets=21
rx_broadcast_packets=15
rx_bytes=7497
rx_multicast_packets=33
rx_packets=48
rx_q0_good_bytes=242
rx_q0_good_packets=3
rx_q0_guest_notifications=3
rx_q0_multicast_packets=3
rx_q0_size_65_127_packets=2
rx_q0_undersize_packets=1
rx_q1_broadcast_packets=15
rx_q1_good_bytes=7255
rx_q1_good_packets=45
rx_q1_guest_notifications=45
rx_q1_multicast_packets=30
rx_q1_size_256_511_packets=15
rx_q1_size_65_127_packets=19
rx_q1_undersize_packets=11
tx_1_to_64_packets=36
tx_256_to_511_packets=45
tx_65_to_127_packets=63
tx_broadcast_packets=45
tx_bytes=22491
tx_multicast_packets=99
tx_packets=144
tx_q0_broadcast_packets=30
tx_q0_good_bytes=14994
tx_q0_good_packets=96
tx_q0_guest_notifications=96
tx_q0_multicast_packets=66
tx_q0_size_256_511_packets=30
tx_q0_size_65_127_packets=42
tx_q0_undersize_packets=24
tx_q1_broadcast_packets=15
tx_q1_good_bytes=7497
tx_q1_good_packets=48
tx_q1_guest_notifications=48
tx_q1_multicast_packets=33
tx_q1_size_256_511_packets=15
tx_q1_size_65_127_packets=21
tx_q1_undersize_packets=12

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-09 18:14:57 +01:00
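The aggregation of per queue stats into global port stats, described in the per virtqueue statistics commit above, can be illustrated with the numbers from its example output. The mapping from `*_qN_good_*` counters to the global `rx_packets`/`tx_bytes` style names is an assumption made for this sketch; it happens to agree with the example values.

```python
import re

# Per queue counters copied from the example output in the commit message.
per_queue = {
    "rx_q0_good_packets": 3, "rx_q1_good_packets": 45,
    "rx_q0_good_bytes": 242, "rx_q1_good_bytes": 7255,
    "tx_q0_good_packets": 96, "tx_q1_good_packets": 48,
    "tx_q0_good_bytes": 14994, "tx_q1_good_bytes": 7497,
}

QUEUE_STAT = re.compile(r"(rx|tx)_q\d+_good_(packets|bytes)")


def aggregate(stats: dict) -> dict:
    """Sum per queue 'good' counters into per direction totals."""
    totals = {}
    for name, value in stats.items():
        m = QUEUE_STAT.fullmatch(name)
        if m:
            key = f"{m.group(1)}_{m.group(2)}"
            totals[key] = totals.get(key, 0) + value
    return totals


print(aggregate(per_queue))
```

Because the per queue and global values are read through separate calls, a real system may show small discrepancies between them, as the commit message notes.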
Mike Pattrick
006e1c6dbf tc: Add support for TCA_STATS_PKT64.
Currently tc offload flow packet counters will roll over every ~4
billion packets. This is because the packet counter in struct
tc_stats provided by TCA_STATS_BASIC is a 32bit integer.

Now we check for the optional TCA_STATS_PKT64 attribute which provides
the full 64bit packet counter if the 32bit one has rolled over. Because
the TCA_STATS_PKT64 attribute may appear multiple times in a netlink
message, the method of parsing attributes was changed.

Fixes: f98e418fbdb6 ("tc: Add tc flower functions")
Reported-at: https://bugzilla.redhat.com/1776816
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-06 16:09:45 +01:00
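The rollover described in the TCA_STATS_PKT64 commit above follows from the width of the counter. The sketch below shows the wrap-around and a preference for the 64-bit value when it is available; `wrap32` and `packet_count` are hypothetical helpers and do not reflect how OVS parses the netlink attributes.

```python
U32_LIMIT = 2 ** 32  # a 32-bit counter wraps after this many packets


def wrap32(true_count: int) -> int:
    """Value a 32-bit counter would show for a given true packet count."""
    return true_count % U32_LIMIT


def packet_count(basic_packets32: int, pkt64=None) -> int:
    """Prefer the optional 64-bit counter; fall back to the 32-bit one."""
    return pkt64 if pkt64 is not None else basic_packets32


true_count = U32_LIMIT + 5            # just past the rollover point
print(U32_LIMIT)                      # roughly 4 billion
print(wrap32(true_count))             # what the 32-bit field reports
print(packet_count(wrap32(true_count), pkt64=true_count))
```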
Ilya Maximets
a7826d05b8 Documentation: Fix links in maintainers.rst.
GitHub and Sphinx are parsing links differently.  Sphinx knows about
the overall documentation structure and all the sections defined in
other docs, while GitHub is using direct rst 2 html conversion and
doesn't know any of that.  Sphinx wants links to sections in other
docs to be defined with a :doc: field, but GitHub can't parse that
and requires having a direct link to the other rST document.

The problem is that we have a top level MAINTAINERS.rst, that should
be parseable by GitHub, included in the maintainers.rst in the
main documentation section that is used by Sphinx to generate html,
pdf and other docs.  So, it's hard to make links work in both.

Working around that limitation by using rST substitutions for the
links.  Cutting off the substitutions for actual links and adding
:doc: links instead during the file inclusion for Sphinx.

Reported-by: Igor Zhukov <ivzhukov@sbercloud.ru>
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-06 16:07:58 +01:00
Ilya Maximets
1584062b99 Documentation: Fix links in the DPDK guide on physical ports.
The text enclosed in '<...>' is supposed to be an actual link and not the
name of the link.  This generates incorrect links that lead nowhere.

Also, a single underscore is supposed to be used for external links.

Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-06 16:04:21 +01:00
Ilya Maximets
461ab419ea treewide: Don't use non-portable '==' with test command.
'==' is not defined by POSIX and not supported by some shells.
This is causing test failures and potential other issues:

  ./tests/testsuite: 54: test: X2: unexpected operator
  ./tests/testsuite: 54: test: X157: unexpected operator
  ./tests/testsuite: 54: test: X116: unexpected operator
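
A minimal sketch of the portable form, assuming a POSIX shell.  The
helper name is_equal is hypothetical and is not taken from the OVS tree;
the 'X' prefix only mirrors the operands visible in the errors above.

```shell
#!/bin/sh
# Hypothetical helper (not from the OVS tree): compare two strings
# using only what POSIX specifies for test(1).
is_equal() {
    # POSIX defines '=' for string equality.  '==' is an extension
    # accepted by bash and some other shells, but not by every /bin/sh.
    test "X$1" = "X$2"
}

if is_equal 2 2; then
    echo "equal"
fi
```

Replacing '=' with '==' in the helper is what produces the
"unexpected operator" failure on shells that lack the extension.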

Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-December/052157.html
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-06 15:56:12 +01:00
Eelco Chaudron
182b9cb352 dpif: Fix tunnel key set for IPv6 tunnels with SLOW_ACTION.
The dpif_execute_helper_cb() function is supposed to add the
OVS_ACTION_ATTR_SET(OVS_KEY_ATTR_TUNNEL()) action to the
list of actions when passing it down to the kernel.

This function was only checking whether the IPv4 destination
address was set, not the IPv6 one as well. This patch fixes this
and includes a datapath test case.

Fixes: 076caa2fb077 ("ofproto: Meter translation.")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-06 15:42:15 +01:00
Eelco Chaudron
62e85106b4 utilities: Add USDT script to monitor dpif netlink execute message queuing.
This patch adds the dpif_nl_exec_monitor.py script that uses the
existing dpif_netlink_operate__:op_flow_execute USDT probe to show
all DPIF_OP_EXECUTE operations being queued for transmission over
the netlink interface.

Here is an example of truncated output:

Display DPIF_OP_EXECUTE operations being queued for transmission...
TIME               CPU  COMM             PID        NL_SIZE
3124.516679897     1    ovs-vswitchd     8219       180
    nlmsghdr  : len = 0, type = 36, flags = 1, seq = 0, pid = 0
    genlmsghdr: cmd = 3, version = 1, reserver = 0
    ovs_header: dp_ifindex = 21
      > Decode OVS_PACKET_ATTR_* TLVs:
      nla_len 46, nla_type OVS_PACKET_ATTR_PACKET[1], data: 00 00 00...
      nla_len 20, nla_type OVS_PACKET_ATTR_KEY[2], data: 08 00 02 00...
          > Decode OVS_KEY_ATTR_* TLVs:
          nla_len 8, nla_type OVS_KEY_ATTR_PRIORITY[2], data: 00 00...
          nla_len 8, nla_type OVS_KEY_ATTR_SKB_MARK[15], data: 00 00...
      nla_len 88, nla_type OVS_PACKET_ATTR_ACTIONS[3], data: 4c 00 03...
          > Decode OVS_ACTION_ATTR_* TLVs:
          nla_len 76, nla_type OVS_ACTION_ATTR_SET[3], data: 48 00...
                  > Decode OVS_TUNNEL_KEY_ATTR_* TLVs:
                  nla_len 12, nla_type OVS_TUNNEL_KEY_ATTR_ID[0], data:...
                  nla_len 20, nla_type OVS_TUNNEL_KEY_ATTR_IPV6_DST[13], ...
                  nla_len 5, nla_type OVS_TUNNEL_KEY_ATTR_TTL[4], data: 40
                  nla_len 4, nla_type OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT[5]...
                  nla_len 4, nla_type OVS_TUNNEL_KEY_ATTR_CSUM[6], data:
                  nla_len 6, nla_type OVS_TUNNEL_KEY_ATTR_TP_DST[10],...
                  nla_len 12, nla_type OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS[8],...
          nla_len 8, nla_type OVS_ACTION_ATTR_OUTPUT[1], data: 02 00 00 00
      - Dumping OVS_PACKET_ATR_PACKET data:
      ###[ Ethernet ]###
        dst       = 00:00:00:00:ec:01
        src       = 04:f4:bc:28:57:00
        type      = IPv4
      ###[ IP ]###
           version   = 4
           ihl       = 5
           tos       = 0x0
           len       = 50
           id        = 0
           flags     =
           frag      = 0
           ttl       = 127
           proto     = icmp
           chksum    = 0x2767
           src       = 10.0.0.1
           dst       = 10.0.0.100
           \options   \
      ###[ ICMP ]###
              type      = echo-request
              code      = 0
              chksum    = 0xf7f3
              id        = 0x0
              seq       = 0xc

Acked-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-06 14:42:14 +01:00
Ilya Maximets
9736b971b5 rhel: Enable AF_XDP by default in Fedora builds.
All supported versions of Fedora do package libxdp and libbpf, so it
makes sense to enable AF_XDP support.

Control files for Debian packaging are much less flexible, so it's hard
to enable AF_XDP builds while not breaking builds for versions of Ubuntu
and Debian that do not package libbpf or libxdp.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-03 16:06:30 +01:00
Ilya Maximets
e44e803431 acinclude.m4: Build with AF_XDP support by default if possible.
With this change we will try to detect all the netdev-afxdp
dependencies and enable AF_XDP support by default if they are
present at the build time.

The configuration script behaves in the following way:

 - ./configure --enable-afxdp

   Will check for AF_XDP dependencies and fail if they are
   not available.

 - ./configure --disable-afxdp

   Disables checking for AF_XDP.  Build will not support
   AF_XDP even if all dependencies are installed.

 - Just ./configure or ./configure --enable-afxdp=auto

   Will check for AF_XDP dependencies.  Will print a warning
   if they are not available, but will continue without AF_XDP
   support.  If dependencies are available on the system, this
   option is equivalent to --enable-afxdp.

'--disable-afxdp' is added to the Debian and Fedora package builds
to keep their behavior predictable.
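
The three modes above can be sketched as a small shell function.  This
is an illustration only, not the actual acinclude.m4 logic: the function
name afxdp_decide and its two arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual acinclude.m4 logic.
# $1: requested mode: "yes" (--enable-afxdp), "no" (--disable-afxdp),
#     or "auto" (plain ./configure or --enable-afxdp=auto).
# $2: "yes" if all AF_XDP dependencies were detected, "no" otherwise.
# Prints "yes" or "no" depending on whether AF_XDP support is built.
# Returns non-zero when configuration should fail.
afxdp_decide() {
    mode=$1
    deps=$2
    case "$mode" in
    no)
        # Dependencies are not even considered.
        echo "no"
        ;;
    yes)
        if test "$deps" = yes; then
            echo "yes"
        else
            echo "error: AF_XDP dependencies are not available" >&2
            return 1
        fi
        ;;
    auto)
        if test "$deps" = yes; then
            echo "yes"
        else
            # Warn, but continue without AF_XDP support.
            echo "warning: AF_XDP dependencies are not available" >&2
            echo "no"
        fi
        ;;
    *)
        echo "error: unknown mode '$mode'" >&2
        return 2
        ;;
    esac
}
```

For example, 'afxdp_decide auto no' prints a warning on stderr and "no"
on stdout, while 'afxdp_decide yes no' fails with a non-zero status.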

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-03 16:06:30 +01:00
Ilya Maximets
771a55825f Documentation/afxdp: Use packaged libbpf/libxdp for the build.
Necessary bits were removed from the kernel's libbpf in the 6.0 release,
so the instructions on how to build libbpf from kernel sources are
now incorrect.  Suggest using libbpf and libxdp packaged by
distributions instead.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-03 16:06:30 +01:00