mirror of https://github.com/openvswitch/ovs synced 2025-08-29 21:38:13 +00:00

19672 Commits

Author SHA1 Message Date
Ilya Maximets
f5188ff214 daemon.at: Correctly terminate ovsdb process in a backtrace test.
In a backtrace test with monitor the child process will be re-started
after being killed.  The test doesn't wait for that to happen, so it
is possible that during the test cleanup the pid in a pid file is not
updated yet.  Hence, the on-exit hook will not kill the process.

This is causing issues in Cirrus CI, because gmake on FreeBSD waits for
all child processes to exit and that never happens.

Fix the issue by waiting for a new process.  It's also better to exit
gracefully instead of relying on the on-exit kill.

Fixes: 759a29dc2d97 ("backtrace: Extend the backtrace functionality.")
Acked-by: Ales Musil <amusil@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-19 18:38:17 +02:00
Ilya Maximets
24520a401e vswitchd: Wait for a bridge exit before replying to exit unixctl.
Before the cleanup option, the bridge_exit() call was fairly fast,
because it didn't include any particularly long operations.  However,
with the cleanup flag, this function destroys a lot of datapath
resources freeing a lot of memory, waiting on RCU and talking to
the kernel.  That may take a noticeable amount of time, especially
on a busy system or under profilers/sanitizers.  However, the unixctl
'exit' command replies instantly without waiting for any work to
actually be done.  This may cause system test failures or other
issues where scripts expect ovs-vswitchd to exit or destroy all the
datapath resources shortly after appctl call.

Fix that by waiting for the bridge_exit() before replying to the user.
At least, all the datapath resources will actually be destroyed by
the time ovs-appctl exits.

Also moving a structure from stack to global.  Seems cleaner this way.

Since we're not replying right away and it's technically possible
to have multiple clients requesting exit at the same time, storing
connections in an array.

Fixes: fe13ccdca6a2 ("vswitchd: Add --cleanup option to the 'appctl exit' command")
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-18 20:45:00 +02:00
Ilya Maximets
bffffd841f Prepare for post-3.2.0 (3.2.90).
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-17 20:26:13 +02:00
Ilya Maximets
f20980a19e Prepare for 3.2.0.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-17 20:26:08 +02:00
Adrian Moreno
07ce41da11 netdev-linux: Support 64-bit rates in tc policing.
Use TCA_POLICE_RATE64 if the rate cannot be expressed using 32bits.

This breaks the 32Gbps barrier. The new barrier is ~4Tbps caused by
netdev's API expressing kbps rates using 32-bit integers.
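The two limits mentioned above follow from plain integer arithmetic. The sketch below is illustrative only and is not taken from the patch: the helper names are hypothetical, and it assumes the legacy tc rate field holds bytes per second in 32 bits while the netdev API holds kbps in 32 bits, as the messages in this series describe. Under that assumption the old ceiling works out to about 34.36 Gbps, which the message rounds to "32Gbps".

```c
#include <stdint.h>

/* Hypothetical helper: largest rate, in bits per second, that fits in a
 * 32-bit field counting bytes per second. */
static uint64_t
max_bps_with_u32_bytes_per_sec(void)
{
    return (uint64_t) UINT32_MAX * 8;       /* About 34.36 Gbps. */
}

/* Hypothetical helper: largest rate, in bits per second, that fits in a
 * 32-bit field counting kilobits per second. */
static uint64_t
max_bps_with_u32_kbps(void)
{
    return (uint64_t) UINT32_MAX * 1000;    /* About 4.29 Tbps. */
}

/* Hypothetical helper: nonzero if a rate in bytes per second cannot be
 * expressed in 32 bits and so would need a 64-bit attribute such as
 * TCA_POLICE_RATE64. */
static int
rate_needs_64bit(uint64_t bytes_per_sec)
{
    return bytes_per_sec > UINT32_MAX;
}
```

For example, a 40 Gbps policer is 5,000,000,000 bytes/s, which does not fit in 32 bits, so rate_needs_64bit() reports that the 64-bit attribute would be required.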

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2137643
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-17 20:03:54 +02:00
Adrian Moreno
68ac6e9db7 netdev-linux: Refactor nl_msg_put_act_police.
In preparation for supporting 64-bit rates in tc policies, move the
allocation and initialization of struct tc_police object inside
nl_msg_put_act_police(). That way, the function is now called with the
actual rates.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-17 20:03:54 +02:00
Adrian Moreno
13e183da31 netdev-linux: Remove tc_matchall_fill_police.
It is equivalent to tc_policer_init() so remove the duplicated function.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-17 20:03:54 +02:00
Adrian Moreno
a86fea06fe netdev-linux: Use 64-bit rates in htb tc classes.
Currently, htb rates are capped at ~34Gbps because they are internally
expressed as 32-bit fields.

Move min and max rates to 64-bit fields and use TCA_HTB_RATE64 and
TCA_HTB_CEIL64 to configure HTB classes to break this barrier.

In order to test this, create a dummy tuntap device and set its
speed to a very high value so we can try adding a QoS queue with big
rates.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2137619
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-17 20:03:54 +02:00
Adrian Moreno
7edfac5745 netdev-linux: Use 64bit rtab and burst calculations.
tc uses these "rtab" tables to estimate the time (ticks) that it takes
to send a packet of different sizes. In preparation for the introduction
of 64-bit rates, add an argument to tc_put_rtab() to allow an external
64-bit rate.
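As a rough illustration of what such a rate table holds, the sketch below fills a 256-entry table with the time needed to send packets of increasing size at a 64-bit rate. It is a simplified model, not the code from tc_put_rtab(): the function names are hypothetical, the result is in nanoseconds rather than kernel ticks, and link-layer adjustments are left out.

```c
#include <stdint.h>

#define RTAB_SIZE 256

/* Time, in nanoseconds, to send 'size' bytes at 'rate' bytes per second.
 * 'rate' is 64-bit so that rates above ~34 Gbps can be represented. */
static uint64_t
xmit_time_ns(uint64_t rate, uint32_t size)
{
    return rate ? (uint64_t) size * 1000000000ULL / rate : 0;
}

/* Fills 'rtab' so that entry 'i' holds the transmission time of a packet of
 * (i + 1) << cell_log bytes, but never less than 'mpu' bytes.  Entries are
 * clamped to 32 bits, the width assumed here for a table entry. */
static void
fill_rtab(uint32_t rtab[RTAB_SIZE], uint64_t rate, unsigned int cell_log,
          uint32_t mpu)
{
    for (unsigned int i = 0; i < RTAB_SIZE; i++) {
        uint32_t size = (uint32_t) (i + 1) << cell_log;
        uint64_t ns;

        if (size < mpu) {
            size = mpu;
        }
        ns = xmit_time_ns(rate, size);
        rtab[i] = ns > UINT32_MAX ? UINT32_MAX : (uint32_t) ns;
    }
}
```

With cell_log = 3 and a rate of 5,000,000,000 bytes/s (40 Gbps), the last entry describes a 2048-byte packet and comes out at 409 ns.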

Also use 64bits for other burst buffer calculation functions.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-17 20:03:54 +02:00
Adrian Moreno
b8f8fad864 netdev-linux: Use speed as max rate in tc classes.
Instead of relying on feature bits, use the speed value directly as
maximum rate for htb and hfsc classes.

There is still a limitation with the maximum rate that we can express
with a 32-bit number in bytes/s (~ 34.3Gbps), but using the actual link speed
instead of the feature bits, we can at least use an accurate maximum for
some link speeds (such as 25Gbps) which are not supported by netdev's feature
bits.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-17 20:03:54 +02:00
Adrian Moreno
6240c0b4c8 netdev: Add netdev_get_speed() to netdev API.
Currently, the netdev's speed is being calculated by taking the link's
feature bits (using netdev_get_features()) and transforming them into
bps.

This mechanism can be both inaccurate and difficult to maintain, mainly
because we currently use the feature bits supported by OpenFlow which
would have to be extended to support all new feature bits of all netdev
implementations while keeping the OpenFlow API intact.

In order to expose the link speed accurately for all current and future
hardware, add a new netdev API call that allows the implementations to
provide the current and maximum link speeds in Mbps.

Internally, the logic to get the maximum supported speed still relies on
feature bits so it might still get out of sync in the future. However,
the maximum configurable speed is not used as much as the current speed
and these feature bits are not exposed through the netdev interface so
it should be easier to add more.

Use this new function instead of netdev_get_features() where the link
speed is needed.

As a consequence of this patch, link speeds of cards are properly
reported (internally in OVSDB) even if not supported by OpenFlow.
A test verifies this behavior using a tap device.

Also, in order to avoid using the old API, this patch adds a
checkpatch.py warning if the old API is used.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2137567
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-17 20:03:32 +02:00
Ilya Maximets
1ef3f4f78a AUTHORS: Add Felix Huettner.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-17 19:51:39 +02:00
Felix Huettner
5392f89fed relay: Allow setting probe interval.
Previously it was not possible to set the probe interval for the
connection from a relay to the backing ovsdb-server. With this change
it is now possible using the
`ovsdb-server/set-relay-source-probe-interval` command.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Felix Huettner <felix.huettner@mail.schwarz>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-17 19:51:39 +02:00
Kevin Traynor
ef4883a8df dpif-netdev: Remove pmd-sleep-max experimental tag.
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-15 00:17:05 +02:00
Kevin Traynor
bc6a6f82e5 dpif-netdev: Add pmd-sleep-show command.
Max requested sleep time and status for a PMD thread
is logged at start up or when changed, but it can be
convenient to have a command to dump this information
explicitly.

It is envisaged that this will be expanded for individual
pmds in the future, hence adding to dpif_netdev_pmd_info().

Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-15 00:17:05 +02:00
Kevin Traynor
395668a68d pmd.at: Add macro for checking pmd sleep max time and state.
This is just cosmetic. There is no change to the tests.

Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-15 00:11:24 +02:00
Kevin Traynor
023dcdc7a1 dpif-netdev: Rename pmd-maxsleep config option.
other_config:pmd-maxsleep is a config option to allow
PMD thread cores to sleep under low or no load conditions.

Rename it to 'pmd-sleep-max' to allow a more structured
name and so that additional options or commands can follow
the 'pmd-sleep-xyz' pattern.
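A rename like this is commonly handled by reading the new key first and falling back to the deprecated one with a warning. The sketch below shows that pattern with a tiny stand-in key/value table; the struct, the helper names, the "0" default and the choice that the new key wins when both are set are assumptions for illustration, not the actual OVS code.

```c
#include <stdio.h>
#include <string.h>

/* Stand-in for one other_config entry (hypothetical type). */
struct kv {
    const char *key;
    const char *value;
};

static const char *
kv_get(const struct kv *cfg, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++) {
        if (!strcmp(cfg[i].key, key)) {
            return cfg[i].value;
        }
    }
    return NULL;
}

/* Returns the value of 'pmd-sleep-max' if set.  Otherwise falls back to the
 * deprecated 'pmd-maxsleep', logging a warning and setting
 * '*used_deprecated'.  Returns "0" (assumed default) if neither is set. */
static const char *
get_pmd_sleep_max(const struct kv *cfg, size_t n, int *used_deprecated)
{
    const char *value = kv_get(cfg, n, "pmd-sleep-max");

    *used_deprecated = 0;
    if (!value) {
        value = kv_get(cfg, n, "pmd-maxsleep");
        if (value) {
            *used_deprecated = 1;
            fprintf(stderr, "warning: pmd-maxsleep is deprecated, "
                            "use pmd-sleep-max instead\n");
        }
    }
    return value ? value : "0";
}
```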

Use of other_config:pmd-maxsleep is deprecated to be
removed in a future release and will result in a warning.

Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-15 00:11:21 +02:00
Terry Wilson
4d55a364ff python: Add async DNS support.
This adds a Python version of the async DNS support added in:

771680d96 DNS: Add basic support for asynchronous DNS resolving

The above version uses the unbound C library, and this
implementation uses the SWIG-wrapped Python version of that.

In the event that the Python unbound library is not available,
a warning will be logged and the resolve() method will just
return None. For the case where inet_parse_active() is passed
an IP address, it will not try to resolve it, so existing
behavior should be preserved in the case that the unbound
library is unavailable.
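The "do not resolve literal IP addresses" behavior can be sketched in C with the POSIX inet_pton() call. This is only an illustration of the idea, not the Python code added by the patch, and is_ip_literal() is a hypothetical helper name.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Returns true if 'host' is already an IPv4 or IPv6 address literal, in
 * which case no DNS lookup is needed and the resolver can be skipped. */
static bool
is_ip_literal(const char *host)
{
    struct in_addr v4;
    struct in6_addr v6;

    return inet_pton(AF_INET, host, &v4) == 1
           || inet_pton(AF_INET6, host, &v6) == 1;
}
```

A caller would try the resolver only when is_ip_literal() returns false, so connections to plain IP addresses keep working even without a resolver library.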

Intentional differences from the C version are as follows:

  OVS_HOSTS_FILE environment variable can be set to override
  the system 'hosts' file. This is primarily to allow testing to
  be done without requiring network connectivity.

  Since resolution can still be done via hosts file lookup, DNS
  lookups are not disabled when resolv.conf cannot be loaded.

  The Python socket_util module has fallen behind its C equivalent.
  The bare minimum change was done to inet_parse_active() to support
  sync/async dns, as there is no equivalent to
  parse_sockaddr_components(), inet_parse_passive(), etc. A TODO
  was added to bring socket_util.py up to equivalency to the C
  version.

Signed-off-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-14 22:24:03 +02:00
Paolo Valerio
501f665a5a conntrack: Extract l4 information for SCTP.
Since a27d70a89 ("conntrack: add generic IP protocol support") all
the unrecognized IP protocols get handled using ct_proto_other ops
and are managed as L3 using 3 tuples.

This patch stores L4 information for SCTP in the conn_key so that
multiple conn instances, instead of one with ports zeroed, will be
created when there are multiple SCTP connections between two hosts.
It also performs crc32c check when not offloaded, and adds SCTP to
pat_enabled.

With this patch, given two SCTP associations between two hosts,
tracking the connection will result in:

sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=55884,dport=5201),
    reply=(src=10.1.1.1,dst=10.1.1.2,sport=5201,dport=12345),zone=1
sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=59874,dport=5202),
    reply=(src=10.1.1.1,dst=10.1.1.2,sport=5202,dport=12346),zone=1

instead of:

sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=0,dport=0),
    reply=(src=10.1.1.1,dst=10.1.1.2,sport=0,dport=0),zone=1
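For reference, the CRC32c checksum mentioned above can be computed with a short bitwise routine. The sketch below is a generic, unoptimized CRC32c (Castagnoli polynomial, reflected form) and is not the implementation used by the patch; how the result is placed into or compared with the SCTP header field is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32c: initial value 0xFFFFFFFF, reflected polynomial
 * 0x82F63B78, final XOR with 0xFFFFFFFF.  For SCTP the checksum is computed
 * over the packet with the checksum field set to zero. */
static uint32_t
crc32c(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; apply the polynomial if the low bit was set. */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The standard check value for this CRC is 0xE3069283 for the ASCII string "123456789". A table-driven or hardware-assisted variant would be preferable on a fast path.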

Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-13 21:22:41 +02:00
James Raphael Tiovalen
62f5aa42aa shash, simap, smap: Add assertions to *_count functions.
This commit adds assertions in the functions `shash_count`,
`simap_count`, and `smap_count` to ensure that the corresponding input
struct pointer is not NULL.

This ensures that if the return values of `shash_sort`, `simap_sort`,
or `smap_sort` are NULL, then the following for loops would not attempt
to access the pointer, which might result in segmentation faults or
undefined behavior.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: James Raphael Tiovalen <jamestiotio@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-13 17:29:32 +02:00
Viacheslav Galaktionov
a5fdc45b84 netdev-dpdk: Fix build with experimental API.
The set_error function is now used regardless of whether experimental APIs
are allowed or not, so it must be defined unconditionally.

Fixes: fc06ea9a1883 ("netdev-dpdk: Add custom rx-steering configuration.")
Acked-by: Ivan Malov <ivan.malov@arknetworks.am>
Signed-off-by: Viacheslav Galaktionov <viacheslav.galaktionov@arknetworks.am>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-13 17:28:14 +02:00
Mike Pattrick
4829506b2a ofproto-dpif-xlate: Reduce stack usage in recursive xlate functions.
Several xlate actions used in recursive translation currently store a
large amount of information on the stack. This can result in handler
threads quickly running out of stack space before
xlate_resubmit_resource_check() is able to terminate translation. This
patch reduces stack usage by over 3kb from several translation actions.

This patch also moves some trace functionality from do_xlate_actions
into its own function.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2104779
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2023-07-13 15:46:17 +02:00
Eelco Chaudron
f3e9d30041 AUTHORS: Add Chandan Somani.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2023-07-12 12:09:57 +02:00
Chandan Somani
799f697e51 checkpatch: Print subject field if misspelled or missing.
This narrows down spelling errors that are in the commit
subject. It also provides a subject if the subject line is
missing. The provisional subject is the name of the patch
file, which should provide some context about the patch.

Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Chandan Somani <csomani@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2023-07-12 12:01:09 +02:00
Chandan Somani
9a50170a80 checkpatch: Add suggestions to the spell checker.
This will be useful for correcting possible spelling
mistakes with ease. Suggestions limited to 3 at first,
but can be made configurable in the future.

Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Chandan Somani <csomani@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2023-07-12 12:01:09 +02:00
Chandan Somani
d25c6bd8df checkpatch: Reorganize flagged words using a list.
Single out flagged words and allow for more useful
details, like spelling suggestions.

Signed-off-by: Chandan Somani <csomani@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2023-07-12 12:01:09 +02:00
Ilya Maximets
f770b8c133 AUTHORS: Add James Raphael Tiovalen.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-12 00:31:40 +02:00
James Raphael Tiovalen
b2d45921a6 ovs-vsctl: Fix crash when routing is enabled.
In the case where routing is enabled, the bridge member of the
`vsctl_port` structs is not populated. This can cause a crash if we
attempt to access it. This patch fixes the crash by checking if the
bridge member is valid before attempting to access it. In the
`check_conflicts` function, we print both the port name and the bridge
name if routing is disabled and we only print the port name if routing
is enabled.
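The guard described above amounts to checking the optional pointer before dereferencing it. The sketch below uses simplified stand-in types and a hypothetical describe_port() helper; it is not the actual `vsctl_port` code.

```c
#include <stdio.h>

/* Simplified stand-ins for the real structures (hypothetical). */
struct bridge {
    const char *name;
};

struct port {
    const char *name;
    const struct bridge *bridge;   /* May be NULL, e.g. when not populated. */
};

/* Formats a description of 'port' into 'buf'.  The bridge name is included
 * only when the bridge pointer is populated, so a NULL bridge is never
 * dereferenced.  Returns the snprintf() result. */
static int
describe_port(const struct port *port, char *buf, size_t size)
{
    if (port->bridge) {
        return snprintf(buf, size, "port %s on bridge %s",
                        port->name, port->bridge->name);
    }
    return snprintf(buf, size, "port %s", port->name);
}
```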

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: James Raphael Tiovalen <jamestiotio@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-12 00:30:40 +02:00
James Raphael Tiovalen
e769387b42 file, monitor: Add null pointer assertions for old and new ovsdb_rows.
This commit adds non-null pointer assertions in some code that performs
some decisions based on old and new input ovsdb_rows.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: James Raphael Tiovalen <jamestiotio@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-12 00:29:08 +02:00
James Raphael Tiovalen
e71f1a2da1 ovsdb: Assert and check return values of ovsdb_table_schema_get_column.
This commit adds a few null pointer assertions and checks to some return
values of `ovsdb_table_schema_get_column`. If a null pointer is
encountered in these blocks, either the assertion will fail or the
control flow will now be redirected to alternative paths which will
output the appropriate error messages.

A few ovsdb-rbac and ovsdb-server tests are also updated to verify the
expected warning logs by adding said logs to the ALLOWLIST of the
OVSDB_SERVER_SHUTDOWN statements.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: James Raphael Tiovalen <jamestiotio@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-12 00:22:02 +02:00
Ilya Maximets
00782baac0 AUTHORS: Add Sayali Naval.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-12 00:14:54 +02:00
Sayali Naval
8e073791d4 bridge: Fix unexpected values for IPFIX enable-input/output-sampling.
As per the Open vSwitch Manual ovs-vsctl(8) the Bridge IPFIX parameters
can be passed as follows:

  ovs-vsctl -- set Bridge br0 ipfix=@i \
    --  --id=@i create  IPFIX targets=\"192.168.0.34:4739\" \
        obs_domain_id=123 obs_point_id=456 cache_active_timeout=60 \
        cache_max_flows=13 \
        other_config:enable-input-sampling=false \
        other_config:enable-output-sampling=false

where the default values are:

  enable_input_sampling: true
  enable_output_sampling: true

But in the existing code these 2 parameters take up unexpected values
in some scenarios:

  be_opts.enable_input_sampling = !smap_get_bool(&be_cfg->other_config,
                                        "enable-input-sampling", false);

  be_opts.enable_output_sampling = !smap_get_bool(&be_cfg->other_config,
                                        "enable-output-sampling", false);

Here, the function smap_get_bool is being used with a negation.

This returns expected values for the default case (since the above code
will negate the “false” we get from the smap_get_bool function and return the
value “true”) but unexpected values for the case where the sampling
value is passed through the CLI.
For example, if we pass "true" for other_config:enable-input-sampling
in the CLI, the above code will negate the “true” value we get from
the smap_get_bool function and return the value “false”. Same would be the
case for enable_output_sampling.
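The truth table behind this bug is easy to reproduce in isolation. In the sketch below, get_bool() is a stand-in loosely modelled on how a lookup such as smap_get_bool() is described here (default when the key is unset, parsed value otherwise); the function names are hypothetical, and the "fixed" variant, which drops the negation and uses a default of true, is one plausible correction rather than a quote of the actual patch.

```c
#include <stdbool.h>
#include <string.h>

/* Stand-in lookup: 'value' is NULL when the key is not configured. */
static bool
get_bool(const char *value, bool def)
{
    if (!value) {
        return def;
    }
    return def ? strcmp(value, "false") != 0 : !strcmp(value, "true");
}

/* Old behavior: negating a lookup whose default is false. */
static bool
sampling_enabled_buggy(const char *value)
{
    return !get_bool(value, false);
}

/* Assumed fix: no negation, default of true. */
static bool
sampling_enabled_fixed(const char *value)
{
    return get_bool(value, true);
}
```

With the buggy form, an unset key gives true (as intended), but "true" gives false and "false" gives true. With the fixed form, unset and "true" both give true, and "false" gives false.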

Acked-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Sayali Naval <sanaval@cisco.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-12 00:12:49 +02:00
Robin Jarry
fc06ea9a18 netdev-dpdk: Add custom rx-steering configuration.
Some control protocols are used to maintain link status between
forwarding engines (e.g. LACP). When the system is not sized properly,
the PMD threads may not be able to process all incoming traffic from the
configured Rx queues. When a signaling packet of such protocols is
dropped, it can cause link flapping, worsening the situation.

Use the rte_flow API to redirect these protocols into a dedicated Rx
queue. The assumption is made that the ratio between control protocol
traffic and user data traffic is very low and thus this dedicated Rx
queue will never get full. Re-program the RSS redirection table to only
use the other Rx queues.

The additional Rx queue will be assigned a PMD core like any other Rx
queue. Polling that extra queue may introduce increased latency and
a slight performance penalty for the benefit of preventing link flapping.

This feature must be enabled per port on specific protocols via the
rx-steering option. This option takes "rss" followed by a "+" separated
list of protocol names. It is only supported on ethernet ports. This
feature is experimental.
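The option syntax described above ("rss" followed by a "+" separated list of protocol names) could be parsed along the following lines. This is a hedged sketch, not the netdev-dpdk parser: the flag name and function are hypothetical, and only "lacp" is recognized, matching the single protocol the message lists as supported.

```c
#include <stdbool.h>
#include <string.h>

#define RX_STEERING_LACP (1u << 0)   /* Hypothetical flag bit. */

/* Parses "rss" or "rss+proto[+proto...]".  On success returns true and
 * stores the requested protocol flags in '*flags'.  Returns false for
 * unknown or empty protocol names, or if the string does not start with
 * "rss". */
static bool
parse_rx_steering(const char *arg, unsigned int *flags)
{
    *flags = 0;
    if (strncmp(arg, "rss", 3)) {
        return false;
    }
    arg += 3;

    while (*arg == '+') {
        size_t len;

        arg++;
        len = strcspn(arg, "+");
        if (len == 4 && !strncmp(arg, "lacp", 4)) {
            *flags |= RX_STEERING_LACP;
        } else {
            return false;
        }
        arg += len;
    }
    return *arg == '\0';
}
```

Here "rss" alone yields no flags (regular RSS), "rss+lacp" sets the LACP flag, and strings such as "rss+" or "rss+foo" are rejected.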

If the user has already configured multiple Rx queues on the port, an
additional one will be allocated for control packets. If the hardware
cannot satisfy the number of requested Rx queues, the last Rx queue will
be assigned for control plane. If only one Rx queue is available, the
rx-steering feature will be disabled. If the hardware does not support
the rte_flow matchers/actions, the rx-steering feature will be
completely disabled on the port and regular rss will be performed
instead.

It cannot be enabled when other-config:hw-offload=true as it may
conflict with the offloaded flows. Similarly, if hw-offload is enabled,
custom rx-steering will be forcibly disabled on all ports and replaced
by regular rss.

Example use:

 ovs-vsctl add-bond br-phy bond0 phy0 phy1 -- \
   set interface phy0 type=dpdk options:dpdk-devargs=0000:ca:00.0 -- \
   set interface phy0 options:rx-steering=rss+lacp -- \
   set interface phy1 type=dpdk options:dpdk-devargs=0000:ca:00.1 -- \
   set interface phy1 options:rx-steering=rss+lacp

As a starting point, only one protocol is supported: LACP. Other
protocols can be added in the future. NIC compatibility should be
checked.

To validate that this works as intended, I used a traffic generator to
generate random traffic slightly above the machine capacity at line rate
on a two-port bond interface. OVS is configured to receive traffic on
two VLANs and pop/push them in a br-int bridge based on tags set on
patch ports.

   +----------------------+
   |         DUT          |
   |+--------------------+|
   ||       br-int       || in_port=patch10,actions=mod_dl_src:$patch11,
   ||                    ||                         mod_dl_dst:$tgen0,
   ||                    ||                         output:patch10
   ||                    || in_port=patch11,actions=mod_dl_src:$patch10
   ||                    ||                         mod_dl_dst:$tgen0,
   || patch10    patch11 ||                         output:patch10
   |+---|-----------|----+|
   |    |           |     |
   |+---|-----------|----+|
   || patch00    patch01 ||
   ||  tag:10    tag:20  ||
   ||                    ||
   ||       br-phy       || default flow, action=NORMAL
   ||                    ||
   ||       bond0        || balance-slb, lacp=passive, lacp-time=fast
   ||    phy0   phy1     ||
   |+------|-----|-------+|
   +-------|-----|--------+
           |     |
   +-------|-----|--------+
   |     port0  port1     | balance L3/L4, lacp=active, lacp-time=fast
   |         lag          | mode trunk VLANs 10, 20
   |                      |
   |        switch        |
   |                      |
   |  vlan 10    vlan 20  |  mode access
   |   port2      port3   |
   +-----|----------|-----+
         |          |
   +-----|----------|-----+
   |   tgen0      tgen1   |  Random traffic that is properly balanced
   |                      |  across the bond ports in both directions.
   |  traffic generator   |
   +----------------------+

Without rx-steering, the bond0 links are randomly switching to
"defaulted" when one of the LACP packets sent by the switch is dropped
because the RX queues are full and the PMD threads did not process them
fast enough. When that happens, all traffic must go through a single
link which causes above line rate traffic to be dropped.

 ~# ovs-appctl lacp/show-stats bond0
 ---- bond0 statistics ----
 member: phy0:
   TX PDUs: 347246
   RX PDUs: 14865
   RX Bad PDUs: 0
   RX Marker Request PDUs: 0
   Link Expired: 168
   Link Defaulted: 0
   Carrier Status Changed: 0
 member: phy1:
   TX PDUs: 347245
   RX PDUs: 14919
   RX Bad PDUs: 0
   RX Marker Request PDUs: 0
   Link Expired: 147
   Link Defaulted: 1
   Carrier Status Changed: 0

When rx-steering is enabled, no LACP packet is dropped and the bond
links remain enabled at all times, maximizing the throughput. Neither
the "Link Expired" nor the "Link Defaulted" counters are incremented
anymore.

This feature may be considered as "QoS". However, it does not work by
limiting the rate of traffic explicitly. It only guarantees that some
protocols have a lower chance of being dropped because the PMD cores
cannot keep up with regular traffic.

The choice of protocols is limited on purpose. This is not meant to be
configurable by users. Some limited configurability could be considered
in the future but it would expose users to more potential issues if they
accidentally redirect all traffic into the isolated queue.

Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Robin Jarry <rjarry@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-10 15:49:44 +02:00
David Marchand
a5669fd51c netdev-dpdk: Drop TSO in case of conflicting virtio features.
At some point in OVS history, some virtio features were announced as
supported (ECN and UFO virtio features).

The userspace TSO code, which has been added later, does not support
those features and tries to disable them.

This breaks OVS upgrades: if an existing VM already negotiated such
features, their lack on reconnection to an upgraded OVS triggers a
vhost socket disconnection by Qemu.
This results in an endless loop because Qemu then retries with the same
set of virtio features.

This patch proposes to try and detect those vhost socket disconnections
and fall back to restoring the old virtio features (and disabling TSO for
this vhost port).

Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-07 18:05:48 +02:00
Gavin Li
b4c7009c20 system-offloads-traffic.at: Add vxlan gbp offload test.
Add a vxlan gbp offload test case:

  vxlan offloads with gbp extention - ping between two ports - offloads
enabled ok

Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2023-07-03 11:56:39 +02:00
Gavin Li
7f04588d78 netdev-tc-offloads: Probe for allowing vxlan gbp support.
Kernels that do not support vxlan gbp would treat the rule that has vxlan
gbp encap action or vxlan gbp id match differently, either reject it or
just skip the action/match and continue processing the known ones.

To solve the issue, probe and disallow inserting rules with vxlan gbp
action/match if kernel does not support it.

Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2023-07-03 11:56:39 +02:00
Gavin Li
a2a3f1983f tc: Add vxlan encap action with gbp option offload.
Add TC offload support for vxlan encap with gbp option.

Reviewed-by: Gavi Teitz <gavi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2023-07-03 11:56:39 +02:00
Gavin Li
256c1e5819 tc: Pass encap entirely to nl_msg_put_act_tunnel_key_set.
Most of the data members of struct tc_action{ } are defined as anonymous
struct in place. Instead of passing all members of an anonymous struct,
which is not flexible to new members being added, expose encap as named
struct and pass it entirely.

Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2023-07-03 11:56:39 +02:00
Gavin Li
a4332b5e68 tc: Add vxlan gbp option flower match offload.
Add TC offload support for filtering vxlan tunnels with gbp option.

Reviewed-by: Gavi Teitz <gavi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2023-07-03 11:56:39 +02:00
Gavin Li
c39d7d06f5 netlink: Add new function to add NLA_F_NESTED to nested netlink messages.
Linux kernel netlink module added NLA_F_NESTED flag checking for nested
netlink messages in 5.2. A nested message without the flag set will be
treated as a malformed one. The check is optional and is controlled by
message policy. To avoid this, add NLA_F_NESTED explicitly for all
nested netlink messages with a new function
nl_msg_start_nested_with_flag().
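At the bit level, the flag lives in the top bit of the 16-bit attribute type. The sketch below shows only that manipulation and is not the body of nl_msg_start_nested_with_flag(): the ATTR_* names and helpers are local stand-ins, with values chosen to match NLA_F_NESTED, NLA_F_NET_BYTEORDER and NLA_TYPE_MASK from <linux/netlink.h>.

```c
#include <stdint.h>

/* Local stand-ins mirroring the netlink attribute type flags. */
#define ATTR_F_NESTED        (1u << 15)
#define ATTR_F_NET_BYTEORDER (1u << 14)
#define ATTR_TYPE_MASK       (0xFFFFu & ~(ATTR_F_NESTED | ATTR_F_NET_BYTEORDER))

/* Returns 'type' with the nested flag set, as would be written into the
 * type field of an attribute that carries other attributes. */
static uint16_t
nested_type_with_flag(uint16_t type)
{
    return (uint16_t) (type | ATTR_F_NESTED);
}

/* Returns the attribute type with the flag bits stripped. */
static uint16_t
attr_base_type(uint16_t type)
{
    return (uint16_t) (type & ATTR_TYPE_MASK);
}

/* Returns nonzero if the nested flag is set in 'type'. */
static int
attr_is_nested(uint16_t type)
{
    return (type & ATTR_F_NESTED) != 0;
}
```

A receiver that validates strictly would reject a nested attribute for which attr_is_nested() is false, which is the failure mode the message describes.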

Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2023-07-03 11:56:39 +02:00
Gavin Li
31baa7781e odp-util: Extract vxlan gbp option encoding to a function.
Extract vxlan gbp option encoding to odp_encode_gbp_raw to be used in
following commits.

Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2023-07-03 11:56:39 +02:00
Gavin Li
8c3d5488da odp-util: Extract vxlan gbp option decoding to a function.
Extract vxlan gbp option decoding to odp_decode_gbp_raw to be used in
following commits.

Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2023-07-03 11:56:39 +02:00
Gavin Li
affb9b8183 tc: Pass tunnel entirely to tunnel option parse and put functions.
Tc flower tunnel key options were encoded in nl_msg_put_flower_tunnel_opts
and decoded in nl_parse_flower_tunnel_opts. Only geneve was supported.

To avoid adding more arguments to the function to support more vxlan
options in the future, change the function arguments to pass tunnel
entirely to it instead of continuing to add new arguments.

Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2023-07-03 11:56:39 +02:00
Ilya Maximets
c2433bdfc0 dpif-netdev: Lockless meters.
Current implementation of meters in the userspace datapath takes
the meter lock for every packet batch.  If more than one thread
hits the flow with the same meter, they will lock each other.

Replace the critical section with atomic operations to avoid
interlocking.  Meters themselves are RCU-protected, so it's safe
to access them without holding a lock.

Implementation does the following:

 1. Tries to advance the 'used' timer of the meter with atomic
    compare+exchange if it's smaller than 'now'.
 2. If the timer change succeeds, atomically update band buckets.
 3. Atomically update packet statistics for a meter.
 4. Go over buckets and try to atomically subtract the amount of
    packets or bytes, recording the highest exceeded band.
 5. Atomically update band statistics and drop packets.

Bucket manipulations are implemented with atomic compare+exchange
operations with extra checks, because bucket size should never
exceed the maximum and it should never go below zero.
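The bounded bucket updates described above can be sketched with C11 atomics. This is an illustration of the compare+exchange pattern only: OVS has its own atomic wrappers, and the function names and the single 64-bit bucket used here are assumptions, not the meter code from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Adds 'add' tokens to '*bucket' without ever exceeding 'max'. */
static void
bucket_refill(_Atomic uint64_t *bucket, uint64_t add, uint64_t max)
{
    uint64_t old = atomic_load(bucket);
    uint64_t new;

    do {
        /* Saturate at 'max'; written to avoid overflow in 'old + add'. */
        new = (add >= max || old > max - add) ? max : old + add;
    } while (!atomic_compare_exchange_weak(bucket, &old, new));
}

/* Tries to subtract 'amount' tokens.  Returns false, leaving the bucket
 * unchanged, if that would take it below zero. */
static bool
bucket_try_consume(_Atomic uint64_t *bucket, uint64_t amount)
{
    uint64_t old = atomic_load(bucket);

    do {
        if (old < amount) {
            return false;
        }
        /* On failure 'old' is reloaded with the current value and the
         * check above is repeated. */
    } while (!atomic_compare_exchange_weak(bucket, &old, old - amount));
    return true;
}
```

Because each thread retries only when another thread changed the bucket in between, no thread ever blocks holding a lock, which is the property the commit is after.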

Packet statistics may be momentarily inconsistent, i.e., number
of packets and the number of bytes may reflect different sets
of packets.  But it should be eventually consistent.  And the
difference at any given time should be just a few packets.

For the sake of reduced code complexity PKTPS meter tries to push
packets through the band one by one, even though they all have
the same weight.  This is also more fair if more than one thread
is passing packets through the same band at the same time.
Trying to predict the number of packets that can pass may also
cause extra atomic operations reducing the performance.

This implementation shows similar performance to the previous one,
but should scale better with more threads hitting the same meter.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Lin Huang <linhuang@ruijie.com.cn>
Tested-by: Zhang YuHuang <zhangyuhuang@ruijie.com.cn>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-07-01 00:35:18 +02:00
Han Zhou
2ece9c9ac1 ovsdb: raft: Fix RAFT paper link.
Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-06-30 00:03:36 +02:00
Paolo Valerio
9b4d2ad8e8 conntrack: Allow to dump userspace conntrack expectations.
The patch introduces a new command, ovs-appctl dpctl/dump-conntrack-exp,
that allows dumping the existing expectations for the userspace ct.

Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-06-29 22:20:43 +02:00
Kevin Traynor
34ace16cb8 tests: Add macro to common file.
get_log_next_line_num() was defined in alb.at.

As it may be useful in other test files, move to
ofproto-macros.at.

Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-06-29 22:13:55 +02:00
Dumitru Ceara
d56932aac6 checkpatch: Ignore yml files when checking line lengths.
As far as I can tell they're used mostly for CI job definitions and
these tend to result in long lines.

Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2023-June/405796.html
Suggested-by: Aaron Conole <aconole@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-06-28 12:39:31 +02:00
Eelco Chaudron
903294cde6 dpif: Add coverage counters for dpif_operate() failures.
Add additional error coverage counters for dpif operation failures.
This could help to quickly identify netlink problems when communicating
with the OVS kernel module.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2070630
Reviewed-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-06-23 15:04:55 +02:00
Simon Horman
c918670302 MAINTAINERS: Add Eelco Chaudron.
Eelco Chaudron was elected by the Open vSwitch committers yesterday.
This formalises his status as an Open vSwitch committer.

Welcome Eelco!

Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-06-21 15:02:47 +02:00