mirror of https://github.com/openvswitch/ovs synced 2025-08-29 21:38:13 +00:00

19471 Commits

Author SHA1 Message Date
Ilya Maximets
8986d4d556 Prepare for 3.1.0.
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-16 21:37:03 +01:00
Han Zhou
43266915a4 ovs-vsctl: Do not send 'set_db_change_aware'.
ovs-vsctl's connections are short-lived, so it doesn't care about db
status changes.

Reported-by: Tobias Hofmann <tohofman@cisco.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2021-February/050914.html
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-16 20:44:41 +01:00
Han Zhou
8833e7c8ed ovsdb-idl: Provide API to disable set_db_change_aware request.
For ovsdb clients that are short-lived, e.g. when using
ovn-nbctl/ovn-sbctl to read some metrics from the OVN NB/SB server, they
don't really need to be aware of db changes, because they exit
immediately after getting the initial response for the requested data.
In such use cases, however, the clients still send 'set_db_change_aware'
request, which results in server side error logs when the server tries
to send out the response for the 'set_db_change_aware' request, because
at the moment the client that is supposed to receive the response has
already closed the connection and exited. E.g.:

2023-01-10T18:23:29.431Z|00007|jsonrpc|WARN|unix#3: receive error: Connection reset by peer
2023-01-10T18:23:29.431Z|00008|reconnect|WARN|unix#3: connection dropped (Connection reset by peer)

To avoid such problems, this patch provides an API to allow a client to
choose to not send the 'set_db_change_aware' request.

There was an earlier attempt to fix this [0], but it was not accepted
back then as discussed in the email [1]. It was also discussed in the
emails that an alternative approach is to use notification instead of
request, but that would require protocol changes and taking backward
compatibility into consideration. So this patch takes a different
approach and tries to keep the change small.

[0] http://patchwork.ozlabs.org/project/openvswitch/patch/1594380801-32134-1-git-send-email-dceara@redhat.com/

[1] https://mail.openvswitch.org/pipermail/ovs-discuss/2021-February/050919.html

Reported-by: Girish Moodalbail <gmoodalbail@nvidia.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-July/050343.html
Reported-by: Tobias Hofmann <tohofman@cisco.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2021-February/050914.html
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-16 20:14:10 +01:00
Ales Musil
08146bf7d9 openflow: Add extension to flush CT by generic match.
Add an extension that allows flushing connections from CT
by specifying fields that the connections should be
matched against. This allows matching only some fields
of the connection, e.g. the source address for the orig direction.

Reported-at: https://bugzilla.redhat.com/2120546
Signed-off-by: Ales Musil <amusil@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-16 19:58:08 +01:00
Ales Musil
a9ae73b916 ofp, dpif: Allow CT flush based on partial match.
Currently, the CT can be flushed by dpctl only by specifying
the whole 5-tuple.  This is not very convenient when there are
only some fields known to the user of CT flush.  Add new struct
ofp_ct_match which represents the generic filtering that can
be done for CT flush. The match is done only on fields that are
non-zero, with the exception of the icmp fields.

This allows the filtering just within dpctl, however it is a
preparation for OpenFlow extension.
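
A rough sketch of the partial-match idea, written in Python for brevity
(the change itself is in C). The field names and the ct_matches() helper
are hypothetical, and the icmp exception mentioned above is not modelled:
zero-valued filter fields are simply treated as wildcards.

```python
def ct_matches(conn, flt):
    """Return True if every non-zero filter field equals the connection's value."""
    return all(conn.get(field) == value
               for field, value in flt.items()
               if value)  # zero/empty filter fields act as wildcards


# Hypothetical connection entries, for illustration only.
conns = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "proto": 6, "sport": 1234, "dport": 80},
    {"src": "10.0.0.3", "dst": "10.0.0.2", "proto": 6, "sport": 5678, "dport": 80},
]

# Only the source address is known; every other field is left at zero.
flt = {"src": "10.0.0.1", "dst": 0, "proto": 0, "sport": 0, "dport": 0}

# "Flushing" here just means dropping the matching entries from the list.
remaining = [c for c in conns if not ct_matches(c, flt)]
```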

Reported-at: https://bugzilla.redhat.com/2120546
Signed-off-by: Ales Musil <amusil@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-16 19:55:21 +01:00
Kevin Traynor
de3bbdc479 dpif-netdev: Add PMD load based sleeping.
Sleep for an incremental amount of time if none of the Rx queues
assigned to a PMD have at least half a batch of packets (i.e. 16 pkts)
on a polling iteration of the PMD.

Upon detecting the threshold of >= 16 pkts on an Rxq, reset the
sleep time to zero (i.e. no sleep).

Sleep time will be increased on each iteration where the low load
conditions remain up to a total of the max sleep time which is set
by the user e.g:
ovs-vsctl set Open_vSwitch . other_config:pmd-maxsleep=500

The default pmd-maxsleep value is 0, which means that no sleeps
will occur and the default behaviour is unchanged from previously.
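
The sleep policy described above can be sketched as follows. This is an
illustration in Python, not the C implementation: the increment
SLEEP_STEP_US is a made-up value, and next_sleep_us() is a hypothetical
helper. Only the 16 packet threshold and the pmd-maxsleep cap come from
the text.

```python
BATCH_SIZE = 32               # half a batch is 16 packets, as stated above
HALF_BATCH = BATCH_SIZE // 2
SLEEP_STEP_US = 10            # hypothetical increment, not taken from the patch


def next_sleep_us(current_us, rxq_pkt_counts, max_sleep_us):
    """Return the sleep time to request after one polling iteration."""
    if max_sleep_us == 0:
        return 0              # default: sleeping is disabled
    if any(n >= HALF_BATCH for n in rxq_pkt_counts):
        return 0              # some Rxq is loaded: reset to no sleep
    # Low load on every Rxq: sleep a bit longer, capped at the user maximum.
    return min(current_us + SLEEP_STEP_US, max_sleep_us)
```

With pmd-maxsleep=500, repeated idle iterations grow the sleep time step
by step until it reaches 500 us, and a single iteration with 16 or more
packets on any Rxq brings it back to zero.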

Also add new stats to pmd-perf-show to get visibility of operation
e.g.
...
   - sleep iterations:       153994  ( 76.8 % of iterations)
   Sleep time (us):         9159399  ( 59 us/iteration avg.)
...

Reviewed-by: Robin Jarry <rjarry@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-12 18:56:05 +01:00
Kevin Traynor
f4c8841351 util: Add non quiesce xnanosleep.
xnanosleep forces the thread into quiesce state in anticipation that
it will be sleeping for a considerable time and that the thread may
need to quiesce before the sleep is finished.

In some cases, a very short sleep may be requested and in that case
the overhead of going into quiesce state may be unnecessary.

To allow for those cases, add an xnanosleep_no_quiesce() variant.

Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-12 12:21:20 +01:00
David Marchand
4de6b009cf Documentation: Remove link to obsolete sources.
This archive website disappeared.
On the other hand, the link to an obsolete dpif-provider man page
probably did not provide much info and we can simply mention the current
file.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-12 11:55:03 +01:00
David Marchand
68ff5e9811 Documentation: Remove reference to RST online editor.
rst.ninjs.org is not available anymore, but there are alternatives
listed in this doc.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-11 20:26:29 +01:00
David Marchand
8ef198425b Documentation: Fix link to Netperf.
netperf.org was shut down in favor of some HP related resources.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-11 20:01:01 +01:00
David Marchand
61e2259cf4 Documentation: Fix link to AppVeyor.
Sphinx linkcheck complains with:

Warning, treated as error:
.../Documentation/intro/install/windows.rst:1093:broken link:
	www.appveyor.com ()

Add a https scheme in link to AppVeyor website.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-11 20:01:01 +01:00
David Marchand
7e18ae63a6 Documentation: Fix link to iproute2 git repository.
iproute2 git repositories were split and moved around v4.15 [1].
It is time to fix the link in OVS documentation.

1: https://lore.kernel.org/netdev/20180129082052.0eb85e9b@xeon-e3/

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-11 19:59:47 +01:00
David Marchand
c7da49bc64 netdev-offload-dpdk: Fix transfer flows.
Following DPDK commit bd2a4d4b2e3a ("ethdev: forbid direction attribute
in transfer flow rules"), the ingress attribute presence is rejected for
transfer flows.

Fixes: a77c7796f23a ("dpdk: Update to use v22.11.1.")
Acked-by: Eli Britstein <elibr@nvidia.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-11 18:00:41 +01:00
Adrian Moreno
bd14aa31e3 tests: Add unit tests to rculist.
Low test coverage on this area caused some errors to remain unnoticed.
Add basic functional test of rculist.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-11 17:44:55 +01:00
Ilya Maximets
e5d92c1a54 cirrus: Update to use FreeBSD 12.4.
12.4 was released in December.  That means that 12.3
will become unavailable in the near future.  Updating.

Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-09 19:43:32 +01:00
Eelco Chaudron
264ae342dc system-dpdk: Fix error message in ping vhost-user ports.
In some environments, ovs-vswitchd gets shut down before the pkill of
testpmd has been completed, which results in the following error messages:

  Removing port 'dpdkvhostuser0' while vhost device still attached.
  To restore connectivity after re-adding of port, VM on socket '' must be restarted.

This patch will wait for the socket disconnect to be handled by the
vhost-user before shutting down OVS.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Co-authored-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-09 19:39:18 +01:00
David Marchand
c9e10ac57f netdev-dpdk: Drop coverage counter for vhost IRQs.
The vhost library now provides fine-grained statistics for guest
notifications:
- notifications for buffer reclaim by the guest,
- notifications for buffer availability to the guest.

Example before this patch:
$ ovs-appctl coverage/show |
  grep vhost_notification
vhost_notification         0.0/sec     0.000/sec        2.0283/sec   total: 7302

$ ovs-vsctl get interface vhost4 statistics |
  sed -e 's#[{}]##g' -e 's#, #\n#g' |
  grep guest_notifications
rx_q0_guest_notifications=66
tx_q0_guest_notifications=7236

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-09 18:15:21 +01:00
David Marchand
3b29286db1 netdev-dpdk: Add per virtqueue statistics.
The DPDK vhost-user library maintains more granular per queue stats
which can replace what OVS was providing for vhost-user ports.

The benefits for OVS:
- OVS can skip parsing packet sizes on the rx side,
- dev->stats_lock won't be taken in rx/tx code unless some packet is
  dropped,
- vhost-user is aware of which packets are transmitted to the guest,
  so per *transmitted* packet size stats can be reported,
- more internal stats from vhost-user may be exposed, without OVS
  needing to understand them,

Note: the vhost-user library does not provide global stats for a port.
The proposed implementation is to have the global stats (exposed via
netdev_get_stats()) computed by querying and aggregating all per queue
stats.
Since per queue stats are exposed via another netdev ops
(netdev_get_custom_stats()), this may lead to some race and small
discrepancies.
This issue might already affect other netdev classes.

Example:
$ ovs-vsctl get interface vhost4 statistics |
  sed -e 's#[{}]##g' -e 's#, #\n#g' |
  grep -v =0$
rx_1_to_64_packets=12
rx_256_to_511_packets=15
rx_65_to_127_packets=21
rx_broadcast_packets=15
rx_bytes=7497
rx_multicast_packets=33
rx_packets=48
rx_q0_good_bytes=242
rx_q0_good_packets=3
rx_q0_guest_notifications=3
rx_q0_multicast_packets=3
rx_q0_size_65_127_packets=2
rx_q0_undersize_packets=1
rx_q1_broadcast_packets=15
rx_q1_good_bytes=7255
rx_q1_good_packets=45
rx_q1_guest_notifications=45
rx_q1_multicast_packets=30
rx_q1_size_256_511_packets=15
rx_q1_size_65_127_packets=19
rx_q1_undersize_packets=11
tx_1_to_64_packets=36
tx_256_to_511_packets=45
tx_65_to_127_packets=63
tx_broadcast_packets=45
tx_bytes=22491
tx_multicast_packets=99
tx_packets=144
tx_q0_broadcast_packets=30
tx_q0_good_bytes=14994
tx_q0_good_packets=96
tx_q0_guest_notifications=96
tx_q0_multicast_packets=66
tx_q0_size_256_511_packets=30
tx_q0_size_65_127_packets=42
tx_q0_undersize_packets=24
tx_q1_broadcast_packets=15
tx_q1_good_bytes=7497
tx_q1_good_packets=48
tx_q1_guest_notifications=48
tx_q1_multicast_packets=33
tx_q1_size_256_511_packets=15
tx_q1_size_65_127_packets=21
tx_q1_undersize_packets=12
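
The aggregation of per queue stats into global stats can be illustrated
with the numbers above. This Python sketch assumes that the global
packet and byte counters correspond to the sum of the per queue "good"
counters, which is consistent with the example output but is not
confirmed by the text; aggregate_queue_stats() is a hypothetical helper.

```python
import re
from collections import defaultdict


def aggregate_queue_stats(lines):
    """Sum per-queue 'good' counters into per-direction totals."""
    totals = defaultdict(int)
    pattern = re.compile(r"^(rx|tx)_q\d+_good_(packets|bytes)=(\d+)$")
    for line in lines:
        m = pattern.match(line.strip())
        if m:
            direction, unit, value = m.groups()
            totals[f"{direction}_{unit}"] += int(value)
    return dict(totals)


# Per-queue counters copied from the example output above.
sample = """rx_q0_good_bytes=242
rx_q0_good_packets=3
rx_q1_good_bytes=7255
rx_q1_good_packets=45
tx_q0_good_bytes=14994
tx_q0_good_packets=96
tx_q1_good_bytes=7497
tx_q1_good_packets=48""".splitlines()

totals = aggregate_queue_stats(sample)
```

The resulting totals agree with the rx_packets, rx_bytes, tx_packets and
tx_bytes values shown in the same output.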

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-09 18:14:57 +01:00
Mike Pattrick
006e1c6dbf tc: Add support for TCA_STATS_PKT64.
Currently tc offload flow packet counters will roll over every ~4
billion packets. This is because the packet counter in struct
tc_stats provided by TCA_STATS_BASIC is a 32bit integer.

Now we check for the optional TCA_STATS_PKT64 attribute which provides
the full 64bit packet counter if the 32bit one has rolled over. Because
the TCA_STATS_PKT64 attribute may appear multiple times in a netlink
message, the method of parsing attributes was changed.
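
A small Python illustration of the rollover and of preferring the 64bit
value. flow_packets() is a hypothetical helper and does not mirror the
netlink parsing; it simply uses the 64bit counter whenever one is
supplied, which is an assumption of this sketch.

```python
U32_MASK = 0xFFFFFFFF


def flow_packets(basic_packets, pkt64=None):
    """Prefer the optional 64-bit counter; fall back to the 32-bit one."""
    return pkt64 if pkt64 is not None else basic_packets & U32_MASK


true_count = 2**32 + 5            # packets actually seen by the flow
wrapped = true_count & U32_MASK   # what a 32-bit counter would report
```

Here the 32bit counter reports 5 after wrapping, while the 64bit
attribute, when present, carries the full count.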

Fixes: f98e418fbdb6 ("tc: Add tc flower functions")
Reported-at: https://bugzilla.redhat.com/1776816
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-06 16:09:45 +01:00
Ilya Maximets
a7826d05b8 Documentation: Fix links in maintainers.rst.
GitHub and Sphinx are parsing links differently.  Sphinx knows about
the overall documentation structure and all the sections defined in
other docs, while GitHub is using direct rst 2 html conversion and
doesn't know any of that.  Sphinx wants links to sections in other
docs to be defined with a :doc: field, but GitHub can't parse that
and requires having a direct link to the other rST document.

The problem is that we have a top level MAINTAINERS.rst, that should
be parseable by GitHub, included in the maintainers.rst in the
main documentation section that is used by Sphinx to generate html,
pdf and other docs.  So, it's hard to make links work in both.

Working around that limitation by using rST substitutions for the
links.  Cutting off the substitutions for actual links and adding
:doc: links instead during the file inclusion for Sphinx.

Reported-by: Igor Zhukov <ivzhukov@sbercloud.ru>
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-06 16:07:58 +01:00
Ilya Maximets
1584062b99 Documentation: Fix links in the DPDK guide on physical ports.
The text enclosed in '<...>' is supposed to be an actual link and not the
name of the link.  This generates incorrect links that lead nowhere.

Also, a single underscore is supposed to be used for external links.

Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-06 16:04:21 +01:00
Ilya Maximets
461ab419ea treewide: Don't use non-portable '==' with test command.
'==' is not defined by POSIX and not supported by some shells.
This is causing test failures and potential other issues:

  ./tests/testsuite: 54: test: X2: unexpected operator
  ./tests/testsuite: 54: test: X157: unexpected operator
  ./tests/testsuite: 54: test: X116: unexpected operator

Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-December/052157.html
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-06 15:56:12 +01:00
Eelco Chaudron
182b9cb352 dpif: Fix tunnel key set for IPv6 tunnels with SLOW_ACTION.
The dpif_execute_helper_cb() function is supposed to add the
OVS_ACTION_ATTR_SET(OVS_KEY_ATTR_TUNNEL()) action to the
list of actions when passing it down to the kernel.

This function was only checking if the IPv4 destination
address was set, not the IPv6 one as well. This patch fixes this
and includes a datapath testcase.

Fixes: 076caa2fb077 ("ofproto: Meter translation.")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-06 15:42:15 +01:00
Eelco Chaudron
62e85106b4 utilities: Add USDT script to monitor dpif netlink execute message queuing.
This patch adds the dpif_nl_exec_monitor.py script that will use the
existing dpif_netlink_operate__:op_flow_execute USDT probe to show
all DPIF_OP_EXECUTE operations being queued for transmission over
the netlink interface.

Here is an example, truncated output:

Display DPIF_OP_EXECUTE operations being queued for transmission...
TIME               CPU  COMM             PID        NL_SIZE
3124.516679897     1    ovs-vswitchd     8219       180
    nlmsghdr  : len = 0, type = 36, flags = 1, seq = 0, pid = 0
    genlmsghdr: cmd = 3, version = 1, reserver = 0
    ovs_header: dp_ifindex = 21
      > Decode OVS_PACKET_ATTR_* TLVs:
      nla_len 46, nla_type OVS_PACKET_ATTR_PACKET[1], data: 00 00 00...
      nla_len 20, nla_type OVS_PACKET_ATTR_KEY[2], data: 08 00 02 00...
          > Decode OVS_KEY_ATTR_* TLVs:
          nla_len 8, nla_type OVS_KEY_ATTR_PRIORITY[2], data: 00 00...
          nla_len 8, nla_type OVS_KEY_ATTR_SKB_MARK[15], data: 00 00...
      nla_len 88, nla_type OVS_PACKET_ATTR_ACTIONS[3], data: 4c 00 03...
          > Decode OVS_ACTION_ATTR_* TLVs:
          nla_len 76, nla_type OVS_ACTION_ATTR_SET[3], data: 48 00...
                  > Decode OVS_TUNNEL_KEY_ATTR_* TLVs:
                  nla_len 12, nla_type OVS_TUNNEL_KEY_ATTR_ID[0], data:...
                  nla_len 20, nla_type OVS_TUNNEL_KEY_ATTR_IPV6_DST[13], ...
                  nla_len 5, nla_type OVS_TUNNEL_KEY_ATTR_TTL[4], data: 40
                  nla_len 4, nla_type OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT[5]...
                  nla_len 4, nla_type OVS_TUNNEL_KEY_ATTR_CSUM[6], data:
                  nla_len 6, nla_type OVS_TUNNEL_KEY_ATTR_TP_DST[10],...
                  nla_len 12, nla_type OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS[8],...
          nla_len 8, nla_type OVS_ACTION_ATTR_OUTPUT[1], data: 02 00 00 00
      - Dumping OVS_PACKET_ATR_PACKET data:
      ###[ Ethernet ]###
        dst       = 00:00:00:00:ec:01
        src       = 04:f4:bc:28:57:00
        type      = IPv4
      ###[ IP ]###
           version   = 4
           ihl       = 5
           tos       = 0x0
           len       = 50
           id        = 0
           flags     =
           frag      = 0
           ttl       = 127
           proto     = icmp
           chksum    = 0x2767
           src       = 10.0.0.1
           dst       = 10.0.0.100
           \options   \
      ###[ ICMP ]###
              type      = echo-request
              code      = 0
              chksum    = 0xf7f3
              id        = 0x0
              seq       = 0xc

Acked-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-06 14:42:14 +01:00
Ilya Maximets
9736b971b5 rhel: Enable AF_XDP by default in Fedora builds.
All supported versions of Fedora do package libxdp and libbpf, so it
makes sense to enable AF_XDP support.

Control files for debian packaging are much less flexible, so it's hard
to enable AF_XDP builds while not breaking builds for versions of Ubuntu
and Debian that do not package libbpf or libxdp.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-03 16:06:30 +01:00
Ilya Maximets
e44e803431 acinclude.m4: Build with AF_XDP support by default if possible.
With this change we will try to detect all the netdev-afxdp
dependencies and enable AF_XDP support by default if they are
present at the build time.

The configuration script behaves in the following way:

 - ./configure --enable-afxdp

   Will check for AF_XDP dependencies and fail if they are
   not available.

 - ./configure --disable-afxdp

   Disables checking for AF_XDP.  Build will not support
   AF_XDP even if all dependencies are installed.

 - Just ./configure or ./configure --enable-afxdp=auto

   Will check for AF_XDP dependencies.  Will print a warning
   if they are not available, but will continue without AF_XDP
   support.  If dependencies are available in a system, this
   option is equal to --enable-afxdp.
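
The three cases above amount to a small decision table. The Python
sketch below only restates that logic; afxdp_enabled() and the
'yes'/'no'/'auto' strings are hypothetical names, and the real check
lives in the m4 configure macros.

```python
def afxdp_enabled(option, deps_available):
    """Decide whether the build gets AF_XDP support.

    option: 'yes' (--enable-afxdp), 'no' (--disable-afxdp) or
            'auto' (plain ./configure or --enable-afxdp=auto).
    """
    if option == "no":
        return False                       # never check, never enable
    if option == "yes":
        if not deps_available:
            raise RuntimeError("AF_XDP dependencies are not available")
        return True
    if option == "auto":
        if not deps_available:
            print("warning: AF_XDP dependencies not found, "
                  "continuing without AF_XDP support")
        return deps_available
    raise ValueError(f"unknown option: {option!r}")
```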

'--disable-afxdp' added to the debian and fedora package builds
to keep predictable behavior.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-03 16:06:30 +01:00
Ilya Maximets
771a55825f Documentation/afxdp: Use packaged libbpf/libxdp for the build.
Necessary bits were removed from the kernel's libbpf in 6.0 release,
so the instructions on how to build libbpf from kernel sources are
now incorrect.  Suggest to use libbpf and libxdp packaged by
distributions instead.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-03 16:06:30 +01:00
Ilya Maximets
649dbc19ff github: Test AF_XDP build using libbpf instead of kernel sources.
AF_XDP bits were removed from kernel's libbpf in 6.0.  libbpf
and libxdp are now the primary way to build AF_XDP applications.
Most modern distributions are already packaging some version
of libbpf, so it's better to test building with it instead
of building an old unsupported kernel tree.

Ubuntu started packaging libxdp only in 22.10, so not using
it for now.

Kernel build infrastructure in CI scripts is not needed anymore.
Removed.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-03 16:06:30 +01:00
Ilya Maximets
b17cadff1d netdev-afxdp: Hide too large memset from sparse.
Sparse complains about 64M umem initialization.  Hide it from
the checker instead of disabling a warning globally.

SPARSE_FLAGS are kept in the CI script even though they are
empty at the moment.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-03 16:06:30 +01:00
Ilya Maximets
1dcc490d44 netdev-afxdp: Allow building with libxdp and newer libbpf.
AF_XDP functions were deprecated in libbpf 0.7 and moved to libxdp.
Functions bpf_get/set_link_xdp_id() were deprecated in libbpf 0.8
and replaced with bpf_xdp_query_id() and bpf_xdp_attach/detach().

Updating configuration and source code to accommodate above changes
and allow building OVS with AF_XDP support on newer systems:

 - Checking the version of libbpf by detecting availability
   of bpf_xdp_detach.

 - Checking availability of the libxdp in a system by looking
   for a library providing libxdp_strerror(), if libbpf is
   newer than 0.6.  And checking for xsk.h header provided by
   libxdp-dev[el].

 - Use xsk.h from libbpf if it is older than 0.7 and not linking
   with libxdp in this case as there are known incompatible
   versions of libxdp in distributions.

 - Check for the NEED_WAKEUP feature replaced with direct checking
   in the source code if XDP_USE_NEED_WAKEUP is defined.

 - Checking availability of bpf_xdp_query_id and bpf_xdp_detach
   and using them instead of deprecated APIs.  Fall back to old
   functions if not found.

 - Dropped LIBBPF_LDADD variable as it makes library and function
   detection much harder without providing any actual benefits.
   AC_SEARCH_LIBS is used instead and it allows use of AC_CHECK_FUNCS.

 - Header includes moved around to files where they are actually used.

 - Removed libelf dependency as it is not really used.

With these changes it should be possible to build OVS with either:

 - libbpf built from the kernel sources (5.19 or older).
 - libbpf < 0.7 provided in distributions.
 - libxdp and libbpf >= 0.7 provided in newer distributions.

While it is technically possible to build with libbpf 0.7+ without
libxdp at the moment we're not allowing that for a few reasons.
First, required functions in libbpf are deprecated and can be removed
in future releases.  Second, support for all these combinations makes
the detection code fairly complex.
AFAIK, most of the distributions packaging libbpf 0.7+ do package
libxdp as well.

libxdp added as a build dependency for Fedora build since all
supported versions of Fedora are packaging this library.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-03 16:06:21 +01:00
Ilya Maximets
0d8318db63 netdev-afxdp: Disable -Wfree-nonheap-object on receive.
GCC 11+ generates a warning:

  In file included from lib/netdev-linux-private.h:30,
                   from lib/netdev-afxdp.c:19:
  In function 'dp_packet_delete',
      inlined from 'dp_packet_delete' at lib/dp-packet.h:246:1,
      inlined from 'dp_packet_batch_add__' at lib/dp-packet.h:775:9,
      inlined from 'dp_packet_batch_add' at lib/dp-packet.h:783:5,
      inlined from 'netdev_afxdp_rxq_recv' at lib/netdev-afxdp.c:898:9:
  lib/dp-packet.h:260:9: warning: 'free' called on pointer
    '*umem.xpool.array' with nonzero offset [8, 2558044588346441168]
    [-Wfree-nonheap-object]
    260 |         free(b);
        |         ^~~~~~~

But it is a false positive since the code path is not possible.
In this call chain the packet will always have source DPBUF_AFXDP
and the free() will never be called.  GCC doesn't see that, because
initialization function dp_packet_use_afxdp() is part of a different
translation unit.

Disabling a warning in this particular place to avoid build failures.

Older versions of clang do not have the -Wfree-nonheap-object, so we
need to additionally guard the pragmas.  Clang is using GCC pragmas
and complains about unknown ones.

Reported-at: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108187
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-03 12:39:32 +01:00
Ilya Maximets
d83d7c4915 ci: Fix overriding OPTS provided from the yml.
For GCC builds we're overriding --disable-ssl or --enable-shared
options set up in the GHA yml file.

Fix that by adding to EXTRA_OPTS instead.

Fixes: 2581b0ad1159 ("travis: Combine kernel builds.")
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-03 12:39:28 +01:00
Cheng Li
46e04ec31b dpif-netdev: Calculate per numa variance.
Currently, pmd_rebalance_dry_run() calculates the overall variance of
all pmds regardless of their numa location. The overall result may
hide an imbalance within an individual numa.

Consider the following case. Numa0 is free because VMs on numa0
are not sending pkts, while numa1 is busy. Within numa1, pmd
workloads are not balanced. Obviously, moving 500 kpps of workload from
pmd 126 to pmd 62 will make numa1 much more balanced. For numa1
the variance improvement will be almost 100%, because after the rebalance
each pmd in numa1 holds the same workload (variance ~= 0). But the overall
variance improvement is only about 20%, which may not trigger auto_lb.

```
numa_id   core_id      kpps
      0        30         0
      0        31         0
      0        94         0
      0        95         0
      1       126      1500
      1       127      1000
      1        63      1000
      1        62       500
```

As auto_lb doesn't balance workload across numa nodes, it makes
more sense to calculate variance improvement per numa node.
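
The figures quoted above can be reproduced with a short Python
calculation. It assumes population variance and an improvement defined
as (before - after) / before; variance() and improvement_pct() are
illustrative helpers, not the functions used in dpif-netdev.

```python
def variance(loads):
    """Population variance of a list of per-pmd loads."""
    mean = sum(loads) / len(loads)
    return sum((x - mean) ** 2 for x in loads) / len(loads)


def improvement_pct(before, after):
    """Variance improvement in percent; 0 if there was nothing to improve."""
    vb, va = variance(before), variance(after)
    return 0.0 if vb == 0 else (vb - va) * 100 / vb


# kpps per pmd, taken from the table above (cores 30, 31, 94, 95 on numa0
# and 126, 127, 63, 62 on numa1).
numa0 = [0, 0, 0, 0]
numa1_before = [1500, 1000, 1000, 500]
numa1_after = [1000, 1000, 1000, 1000]   # 500 kpps moved from pmd 126 to pmd 62

overall = improvement_pct(numa0 + numa1_before, numa0 + numa1_after)
per_numa1 = improvement_pct(numa1_before, numa1_after)
```

Under these assumptions the overall improvement comes out at 20%, while
the improvement within numa1 alone is 100%.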

Signed-off-by: Cheng Li <lic121@chinatelecom.cn>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Co-authored-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 22:15:47 +01:00
Kevin Traynor
ad6e506fcb dpif-netdev: Rename pmd_info_show_rxq variables.
There are some similar readings taken for pmds and Rx queues
in this function and a few of the variable names are ambiguous.

Improve the readability of the code by updating some variables
names to indicate that they are readings related to the pmd.

Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 20:58:30 +01:00
Kevin Traynor
e9ab15f4f8 docs: Add documentation for pmd-rxq-show secs parameter.
Add description of new '-secs' parameter in docs. Also, add to NEWS as
it is a user facing change.

Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 20:58:10 +01:00
Kevin Traynor
526230bfab dpif-netdev: Make pmd-rxq-show time configurable.
pmd-rxq-show shows the Rx queue to pmd assignments as well as the
pmd usage of each Rx queue.

Up until now a tail length of 60 seconds pmd usage was shown
for each Rx queue, as this is the value used during rebalance
to avoid any spike effects.

When debugging or tuning, it is also convenient to display the
pmd usage of an Rx queue over a shorter time frame, so any changes
to config or traffic that impact pmd usage can be evaluated more quickly.

A parameter is added that allows the pmd-rxq-show pmd usage to
be shown for a shorter time frame. Values are rounded up to the
nearest 5 seconds as that is the measurement granularity and the value
used is displayed. e.g.

$ ovs-appctl dpif-netdev/pmd-rxq-show -secs 5
 Displaying last 5 seconds pmd usage %
 pmd thread numa_id 0 core_id 4:
   isolated : false
   port: dpdk0            queue-id:  0 (enabled)   pmd usage: 95 %
   overhead:  4 %

The default time frame has not changed and the maximum value
is limited to the maximum stored tail length (60 seconds).
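
The rounding and capping rule can be sketched in Python as below.
effective_secs() is a hypothetical helper; treating requests below one
interval as 5 seconds is an assumption of this sketch, not something the
text states.

```python
INTERVAL_S = 5    # measurement granularity
MAX_S = 60        # maximum stored tail length


def effective_secs(requested):
    """Round a requested time frame up to a multiple of 5 s, capped at 60 s."""
    rounded = -(-requested // INTERVAL_S) * INTERVAL_S   # ceiling division
    return min(max(rounded, INTERVAL_S), MAX_S)
```

For example, a request for 7 seconds would be displayed as 10 seconds,
and anything above 60 seconds would be shown as 60.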

Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 20:57:29 +01:00
David Marchand
9a86a3dd68 travis: Drop support.
Following a change in the terms of use, free Travis credits are really
too low for a realistic usage by OVS contributors.
As a consequence, testing OVS with Travis has been abandoned by most
(if not all) contributors to the project.

Drop the Travis configuration from our repository, clean references in
the documentation and move GHA specifics to the associated yml.

Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 20:23:42 +01:00
Eelco Chaudron
d5469cb743 Makefile: Add USDT scripts to make install and fedora/debian test rpm.
This change will install all the USDT scripts to the
{_datadir}/openvswitch/scripts/usdt directory with the
make install command.

In addition it will also add them to the Fedora
and Debian openvswitch-test rpm.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 20:10:58 +01:00
Dan Williams
685973a9f1 ovsdb-server: Don't log when memory-trim-on-compaction doesn't change.
But log at least once even if the value hasn't changed, for
informational purposes.
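
One way to picture the intended behaviour, as a Python sketch: log on
the first configuration pass and afterwards only when the value changes.
The TrimSetting class and the message text are hypothetical and do not
come from ovsdb-server.

```python
class TrimSetting:
    """Tracks a boolean option and records log lines only when useful."""

    def __init__(self):
        self.value = False        # assumed default
        self.logged_once = False
        self.log = []

    def set(self, new_value):
        changed = new_value != self.value
        self.value = new_value
        if changed or not self.logged_once:
            state = "enabled" if new_value else "disabled"
            self.log.append(f"memory trimming after compaction {state}")
            self.logged_once = True


setting = TrimSetting()
setting.set(False)   # unchanged, but logged once for information
setting.set(False)   # unchanged, silent
setting.set(True)    # changed, logged
```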

Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 20:02:50 +01:00
Adrian Moreno
863d2e1a8c python: Don't exit OFPFlow constructor.
Returning None in a constructor does not make sense and is just error
prone.  Removing what was a leftover from an attempt to handle a common
error case of trying to parse what is commonly outputted by ovs-ofctl.
This should be done by the caller anyway.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 19:48:07 +01:00
Adrian Moreno
22eb224386 tests: Verify flows in odp.at are parseable.
Create a small helper script and check that flows tested in odp.at are
parseable.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 19:18:57 +01:00
Adrian Moreno
fc3f918cb5 tests: Verify flows in ofp-actions are parseable.
Create a small helper script and check that flows used in ofp-actions.at
are parseable.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 19:18:38 +01:00
Adrian Moreno
c395e9810e python: Interpret free keys as output in clone.
clone-like actions can also output to ports by specifying the port name.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 18:36:02 +01:00
Adrian Moreno
542fdad701 python: Fix output=CONTROLLER action.
When CONTROLLER is used as free key, it means output=CONTROLLER which is
handled by decode_controller. However, it must output the KV in the
right format: "output": {"format": "CONTROLLER"}.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 18:36:02 +01:00
Adrian Moreno
1850e5e689 python: Support case-insensitive OpenFlow actions.
OpenFlow action names can be capitalized, so in order to support this,
add case-insensitive KVDecoders and use them in OpenFlow actions.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 18:36:02 +01:00
Adrian Moreno
75a6e8db9c python: Return list of actions for odp action clone.
Sometimes we don't want to return the result of a nested key-value
decoding as a dictionary but as a list of dictionaries. This happens
when we parse actions where keys can be repeated.

Refactor code that already takes that into account from ofp_act.py to
kv.py and use it for datapath action "clone".
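
Why a list is needed can be shown with a few lines of Python. The
decode_as_dict() and decode_as_list() helpers and the sample actions are
hypothetical; they only illustrate that a dictionary cannot hold
repeated keys.

```python
def decode_as_dict(pairs):
    """Collapse key-value pairs into one dictionary (repeated keys are lost)."""
    return dict(pairs)


def decode_as_list(pairs):
    """Keep every key-value pair, in order, as its own dictionary."""
    return [{key: value} for key, value in pairs]


# Hypothetical nested actions where "output" appears twice.
pairs = [("output", 1), ("ct", "commit"), ("output", 2)]
```

With these inputs the dictionary form keeps only the last "output",
while the list form preserves all three actions in order.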

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 18:36:02 +01:00
Adrian Moreno
d33e548fc7 python: Make key-value matching strict by default.
Currently, if a key is not found in the decoder information, we use the
default decoder which typically returns a string.

This not only means we can go out of sync with the C code without
noticing but it's also error prone as malformed flows could be parsed
without warning.

Make KeyValue parsing strict, raising an error if a decoder is not found
for a key.
This behaviour can be turned off globally by running 'KVDecoders.strict
= False' but it's generally not recommended. Also, if a KVDecoder does
need this default behavior, it can be explicitly configured by specifying
its default decoder.
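
A minimal Python sketch of strict key-value decoding follows. It is a
stand-in written for illustration: MiniKVDecoders is a hypothetical
class, not the KVDecoders implementation from the OVS python library,
and only the idea of a global 'strict' switch and a per-decoder default
is taken from the text above.

```python
class MiniKVDecoders:
    """Toy decoder table: maps keys to callables that decode their values."""

    strict = True   # global switch, analogous to the one described above

    def __init__(self, decoders, default=None):
        self._decoders = decoders
        self._default = default   # explicit per-instance fallback decoder

    def decode(self, key, value):
        decoder = self._decoders.get(key)
        if decoder is None:
            if self._default is not None:
                decoder = self._default
            elif MiniKVDecoders.strict:
                raise KeyError(f"no decoder found for key {key!r}")
            else:
                decoder = str     # old behaviour: fall back to a string
        return key, decoder(value)


strict_decoders = MiniKVDecoders({"priority": int})
lenient_decoders = MiniKVDecoders({"priority": int}, default=str)
```

In this sketch, strict_decoders.decode("bogus", "1") raises an error,
whereas lenient_decoders, which was given an explicit default decoder,
returns the value as a string.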

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 18:36:02 +01:00
Adrian Moreno
fe204743cb python: Add explicit decoders for all ofp actions.
We were silently relying on some ofp actions to be decoded by the
default decoder which would yield decent string values.

In order to be more safe and robust, add an explicit decoder for all
missing actions.

This patch also reworks the learn action decoding to make it more
explicit and verify all the fields specified in the learn action are
actually valid fields.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 18:36:02 +01:00
Adrian Moreno
3648fec08f python: Include aliases in ofp_fields.py.
We currently auto-generate a dictionary of field names and decoders.
However, sometimes fields can be specified by their canonical NXM or
OXM names.

Modify gen_ofp_field_decoders to also generate a dictionary of aliases
so it's easy to map OXM/NXM names to their fields and decoding
information.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 18:36:02 +01:00
Adrian Moreno
c627cfd9cb python: Fix datapath flow decoders.
Fix the following errors in odp decoding:
- Missing push_mpls action
- Typos in collector_set_id, tp_src/tp_dst and csum
- Missing two fields in vxlan match

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 18:36:02 +01:00