mirror of https://github.com/openvswitch/ovs synced 2025-08-30 05:47:55 +00:00

19352 Commits

Author SHA1 Message Date
Kevin Traynor
c591827ec0 dpif-netdev: Fix PMD auto load balance with pmd-rxq-isolate.
There are currently some checks for cross-numa polling cases to
ensure that they won't affect the accuracy of the PMD ALB.

If an rxq is pinned to a PMD thread core by the user it will not
be reassigned by OVS, so even if it is non-local numa polled it
will not impact PMD ALB accuracy.

To establish this, a check was made on whether the PMD thread core
was isolated or not. However, since other_config:pmd-rxq-isolate was
introduced, rxqs may be pinned but the PMD thread core not isolated.

It means that by setting pmd-rxq-isolate=false and doing non-local
numa pinning, PMD ALB may not run where it should.

If the PMD thread core is isolated we can skip individual rxq checks
but if not, we should check the individual rxqs for pinning before we
disallow PMD ALB.

Also, update function comments to make its operation clearer.

Fixes: 6193e03267c1 ("dpif-netdev: Allow pin rxq and non-isolate PMD.")
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-04-04 22:52:12 +02:00
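
The check described in c591827ec0 above can be sketched roughly as follows. This is a minimal illustration only: every type and function name here is hypothetical and simplified, and none of it is taken from dpif-netdev.c. It assumes that a cross-numa polled rxq matters for PMD ALB only when OVS could still reassign it, i.e. when it is not pinned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified types; not the real dpif-netdev structures. */
struct sketch_rxq {
    int numa_id;      /* NUMA node of the port that owns this rxq. */
    bool pinned;      /* True if the user pinned the rxq to a PMD core. */
};

struct sketch_pmd {
    int numa_id;                   /* NUMA node of the PMD thread core. */
    bool isolated;                 /* True if the core polls only pinned rxqs. */
    const struct sketch_rxq *rxqs; /* Rxqs currently polled by this core. */
    size_t n_rxqs;
};

/* Returns true if 'pmd' polls a cross-numa rxq that could still be
 * reassigned, which is the case where PMD ALB should be disallowed. */
bool
sketch_cross_numa_blocks_alb(const struct sketch_pmd *pmd)
{
    assert(pmd != NULL);

    if (pmd->isolated) {
        /* Every rxq on an isolated core is pinned, so skip the rxq checks. */
        return false;
    }
    for (size_t i = 0; i < pmd->n_rxqs; i++) {
        const struct sketch_rxq *rxq = &pmd->rxqs[i];

        /* A pinned rxq is never moved, so it cannot skew the estimate. */
        if (rxq->numa_id != pmd->numa_id && !rxq->pinned) {
            return true;
        }
    }
    return false;
}
```

With pmd-rxq-isolate=false, a core can be non-isolated while still polling a pinned non-local rxq; the per-rxq loop is what lets that configuration pass.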
Kevin Traynor
cdc9a196b1 pmd.at: Add tests for multi non-local numa pmds.
Ensure that if there are no local numa PMD thread
cores available that pmd cores from all other non-local
numas will be used.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-04-04 22:52:12 +02:00
Kevin Traynor
da6ce41d80 dpif-netdev: Fix non-local numa selection for more than two numas.
This issue only occurs when there are more than 2 numa nodes
and no local numa PMD thread cores available for an interface
rxq.

In the event of no PMD thread cores available on the local numa
for an rxq to be assigned to, a PMD thread core from a non-local
numa is selected.

If more than one non-local numa has PMD thread cores, they are
iterated in round-robin (RR) order and checked for non-isolated PMD
thread cores.

When successfully finding a non-local numa with available PMD
thread cores for an rxq, that numa was not being stored. It meant
if a similar situation occurred for a subsequent rxq, the same numa
would be selected again.

Store the last numa used when successfully finding a non-local numa
with available PMD thread cores, so the numa RR state is kept for subsequent
rxqs.

Fixes: f577c2d046b2 ("dpif-netdev: Rework rxq scheduling code.")
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-04-04 22:52:12 +02:00
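
The fix in da6ce41d80 above amounts to remembering where the round-robin stopped. A minimal sketch of that idea follows, with hypothetical names and a fixed-size array standing in for the real scheduling structures; it is not the dpif-netdev implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_NUMA 8

/* Hypothetical scheduler state; not the real dpif-netdev structures. */
struct sketch_sched {
    int n_numa;                          /* Number of NUMA nodes. */
    bool has_free_pmd[SKETCH_MAX_NUMA];  /* Node has a non-isolated PMD core. */
    int last_numa;                       /* Last node picked by the RR, or -1. */
};

/* Picks the next non-local NUMA node with an available PMD core, starting
 * after the node used last time.  Returns -1 if there is none. */
int
sketch_next_non_local_numa(struct sketch_sched *s, int local_numa)
{
    assert(s->n_numa > 0 && s->n_numa <= SKETCH_MAX_NUMA);
    assert(s->last_numa >= -1 && s->last_numa < s->n_numa);

    for (int i = 1; i <= s->n_numa; i++) {
        int numa = (s->last_numa + i) % s->n_numa;

        if (numa != local_numa && s->has_free_pmd[numa]) {
            /* Store the choice so the next rxq continues the RR from here.
             * Omitting this store is the bug the commit describes. */
            s->last_numa = numa;
            return numa;
        }
    }
    return -1;
}
```

With three nodes where only nodes 1 and 2 have free PMD cores, successive calls for rxqs local to node 0 alternate between nodes 1 and 2 instead of always returning node 1.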
Kevin Traynor
4b5c3b66aa dpif-netdev: Fix typo in function name.
Rename pmd_reblance_dry_run_needed() to
pmd_rebalance_dry_run_needed().

Fixes: a83a406096e9 ("dpif-netdev: Sync PMD ALB state with user commands.")
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-04-04 22:52:12 +02:00
Ilya Maximets
8ff9dec468 AUTHORS: Add Abhiram R N.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-04-04 22:52:12 +02:00
Abhiram R N
7539b4e45a netdev-vport: Register IFINDEX for ERSPAN device.
When enabling offload for ERSPAN, the following error is logged:

netdev_offload_tc|INFO|init: failed to get ifindex for erspan0:
                             Operation not supported
netdev_offload|INFO|erspan0: No suitable flow API found.

Adding the NETDEV_VPORT_GET_IFINDEX to ERSPAN device resolves this
error.

Signed-off-by: Abhiram R N <abhiramrn@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-04-04 21:49:27 +02:00
Mike Pattrick
3a3a763349 signals: Add support for sigdescr_np.
In glibc 2.32 sys_siglist is no longer exported. The MT-safe function
sigdescr_np() is now available for the same purpose.

Signed-off-by: Mike Pattrick <mkp@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-04-04 21:46:29 +02:00
Wentao Jia
e3de0bd82d python: idl: Set cond_changed to false if last id is zero.
After reconnection, cond_changed will be set to true, so poll will be
called and never block, causing high CPU load indefinitely.

Fixes: 46d44cf3be0d ("python: idl: Add monitor_cond_since support.")
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Wentao Jia <wentao.jia@easystack.cn>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-04-04 21:39:54 +02:00
Paolo Valerio
0027b3b46c ofproto-dpif-xlate: Fix NULL pointer dereference in xlate_normal().
Considering the following flows:

ovs-ofctl dump-flows br0
 cookie=0x0, table=0, priority=0 actions=NORMAL

and assuming a packet originated from packet-out in this way:

ovs-ofctl packet-out br0 \
    "in_port=controller,packet=<UDP packet>,action=ct(table=0)"

If in_port is OFPP_NONE or OFPP_CONTROLLER, this leads to a
NULL pointer (xport) dereference in xlate_normal().

Fix it by checking the xport pointer validity while deciding whether
it is a candidate for mac learning or not.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-04-04 21:29:37 +02:00
Wan Junjie
efa6f1f2e0 ofproto/ofproto-dpif: Fix dpif_type for userspace tunnels.
When we create two or more tunnels with the same type, only the first
tunnel will be added by dpif since they share the same datapath port.
Setting the dpif_type here clears the ioctl error logs.

Fixes: 4f19a78 ("netdev-vport: Fix userspace tunnel ioctl(SIOCGIFINDEX) info logs.")
Signed-off-by: Wan Junjie <wanjunjie@bytedance.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-04-04 20:52:16 +02:00
Adrian Moreno
b16270e69b sset: add SHORT version of SAFE loop macros.
Add SHORT version of SAFE loop macros and overload the current macro
name to keep backwards compatibility.

Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-30 16:59:03 +02:00
Adrian Moreno
7aff8a5117 sparse: bump recommended version and include headers.
It seems versions older than 0.6.2 generate false positives. Bump the
recommended version and make sure we use the right headers from the ovs
tree.

Suggested-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-30 16:59:03 +02:00
Adrian Moreno
b54067b24a idlc: support short version of SAFE macros.
In order to be consistent with the rest of the SAFE loop macros,
overload each of the generated *_SAFE macro with a SHORT version that
does not require the user to provide the NEXT variable.

Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-30 16:59:03 +02:00
Adrian Moreno
d293965d7b rculist: use multi-variable helpers for loop macros.
Use multi-variable iteration helpers to rewrite rculist loops macros.

There is an important behavior change compared with the previous
implementation: When the loop ends normally (i.e: not via "break;"),
the object pointer provided by the user is NULL. This is safer
because it's not guaranteed that it would end up pointing to a valid
address.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-30 16:59:03 +02:00
Adrian Moreno
745c80f52c hindex: remove the next variable in safe loops.
Using SHORT version of the *_SAFE loops makes the code cleaner and less
error prone. So, use the SHORT version and remove the extra variable
when possible for HINDEX_*_SAFE.

In order to be able to use both long and short versions without changing
the name of the macro for all the clients, overload the existing name
and select the appropriate version depending on the number of arguments.

Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-30 16:59:03 +02:00
Adrian Moreno
2d40277382 hindex: use multi-variable iterators.
Re-write hindex's loops using multi-variable helpers.

For safe loops, use the LONG version to maintain backwards
compatibility.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-30 16:59:03 +02:00
Adrian Moreno
ef39616486 cmap: use multi-variable iterators.
Re-write cmap's loops using multi-variable helpers.

For iterators based on cmap_cursor, we just need to make sure the NODE
variable is not used after the loop, so we set it to NULL.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-30 16:59:02 +02:00
Adrian Moreno
9e56549c2b hmap: use short version of safe loops if possible.
Using SHORT version of the *_SAFE loops makes the code cleaner and less
error prone. So, use the SHORT version and remove the extra variable
when possible for hmap and all its derived types.

In order to be able to use both long and short versions without changing
the name of the macro for all the clients, overload the existing name
and select the appropriate version depending on the number of arguments.

Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-30 16:59:02 +02:00
Adrian Moreno
860e69a8c3 hmap: implement UB-safe hmap pop iterator.
HMAP_FOR_EACH_POP iterator has an additional difficulty, which is the
use of two iterator variables of different types.

In order to re-write this loop in a UB-safe manner, create an iterator
struct to be used as loop variable.

Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-30 16:59:02 +02:00
Adrian Moreno
9e8d960a6b hmap: use multi-variable helpers for hmap loops.
Rewrite hmap's loops using multi-variable helpers.

For SAFE loops, use the LONG version of the multi-variable macros to
keep backwards compatibility.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-30 16:59:02 +02:00
Adrian Moreno
e9bf5bffb0 list: use short version of safe loops if possible.
Using the SHORT version of the *_SAFE loops makes the code cleaner
and less error-prone. So, use the SHORT version and remove the extra
variable when possible.

In order to be able to use both long and short versions without changing
the name of the macro for all the clients, overload the existing name
and select the appropriate version depending on the number of arguments.

Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-30 16:59:02 +02:00
Adrian Moreno
d4566085ed list: use multi-variable helpers for list loops.
Use multi-variable iteration helpers to rewrite non-safe loops.

There is an important behavior change compared with the previous
implementation: When the loop ends normally (i.e: not via "break;"), the
object pointer provided by the user is NULL. This is safer because it's
not guaranteed that it would end up pointing to a valid address.

For pop iterator, set the variable to NULL when the loop ends.

Clang-analyzer has successfully picked the potential null-pointer
dereference on the code that triggered this change (bond.c) and nothing
else has been detected.

For _SAFE loops, use the LONG version for backwards compatibility.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-30 16:59:02 +02:00
Adrian Moreno
5a2940978e util: add helpers to overload SAFE macro.
Having both LONG and SHORT versions of the SAFE macros with different
names is not very convenient. Add helpers that facilitate overloading
such macros using a single name.

In order to work around a known issue in MSVC [1], an indirection layer
has to be introduced.

[1]
https://developercommunity.visualstudio.com/t/-va-args-seems-to-be-trated-as-a-single-parameter/460154

Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-30 16:59:02 +02:00
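
Overloading a macro name on its argument count, as 5a2940978e above describes, is usually done by letting the arguments push the right implementation name into a fixed position. The following is a minimal sketch with hypothetical macro names, not the helpers added to util.h. The extra SKETCH_EXPAND step stands in for the kind of indirection the commit mentions for MSVC; this sketch has only been reasoned through for GCC/Clang-style preprocessors.

```c
#include <assert.h>

/* Extra expansion step, so that a __VA_ARGS__ list is rescanned and split
 * into separate arguments before the selected macro is applied. */
#define SKETCH_EXPAND(X) X

/* Returns its 4th argument.  The caller's arguments shift the candidate
 * names, so the argument count decides which name lands in that slot. */
#define SKETCH_PICK_(A1, A2, A3, NAME, ...) NAME

/* LONG version takes three arguments, SHORT version supplies the third. */
#define SKETCH_LONG(A, B, C) ((A) + (B) + (C))
#define SKETCH_SHORT(A, B)   SKETCH_LONG(A, B, 0)

/* Single public name: 3 arguments select LONG, 2 arguments select SHORT. */
#define SKETCH_ADD(...)                                                 \
    SKETCH_EXPAND(SKETCH_PICK_(__VA_ARGS__, SKETCH_LONG, SKETCH_SHORT,  \
                               SKETCH_INVALID)(__VA_ARGS__))

/* Both spellings resolve at preprocessing time. */
static_assert(SKETCH_ADD(1, 2) == 3, "two arguments use the SHORT version");
static_assert(SKETCH_ADD(1, 2, 3) == 6, "three arguments use the LONG version");
```

Existing callers that pass the extra argument keep compiling unchanged, which is the backwards-compatibility property the later hmap, hindex, list and sset commits rely on.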
Adrian Moreno
882689711f util: add safe multi-variable iterators.
Safe versions of the multi-variable iterator helpers declare an internal
variable to store the next value of the iterator temporarily.

Two versions of the macro are provided, one that still uses the NEXT
variable for backwards compatibility and a shorter version that does not
require the use of an additional variable provided by the user.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-30 16:59:02 +02:00
Adrian Moreno
187a602fa0 util: add multi-variable loop iterator macros.
Multi-variable loop iterators avoid potential undefined behavior by
using an internal iterator variable to perform the iteration and only
referencing the containing object (via OBJECT_CONTAINING) if the
iterator has been validated via the second expression of the for
statement.

That way, the user can easily implement a loop that never tries to
obtain the object containing NULL or stack-allocated non-contained
nodes.

When the loop ends normally (not via "break;") the user-provided
variable is set to NULL.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-30 16:59:02 +02:00
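
The idea in 187a602fa0 above can be illustrated with a small standalone loop macro. This is a sketch under simplifying assumptions: a singly linked list, an explicit TYPE argument instead of type inference, and hypothetical names throughout. It is not the macro set added to util.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical intrusive list node and containing object. */
struct sketch_node {
    struct sketch_node *next;
};

struct sketch_item {
    int value;
    struct sketch_node node;   /* Embedded list node. */
};

/* Recovers the enclosing object from a pointer to its embedded member. */
#define SKETCH_CONTAINER_OF(PTR, TYPE, MEMBER) \
    ((TYPE *) ((char *) (PTR) - offsetof(TYPE, MEMBER)))

/* Iterates with an internal node pointer 'it_'.  ITEM is derived from it
 * only after 'it_' has been checked in the loop condition, and ITEM is set
 * to NULL when the loop ends normally. */
#define SKETCH_FOR_EACH(ITEM, TYPE, MEMBER, HEAD)                       \
    for (struct sketch_node *it_ = (HEAD);                              \
         (it_ ? ((ITEM) = SKETCH_CONTAINER_OF(it_, TYPE, MEMBER), 1)    \
              : ((ITEM) = NULL, 0));                                    \
         it_ = it_->next)

/* Sums all values in the list starting at 'head' (which may be NULL). */
int
sketch_sum(struct sketch_node *head)
{
    struct sketch_item *item;
    int sum = 0;

    SKETCH_FOR_EACH (item, struct sketch_item, node, head) {
        sum += item->value;
    }
    /* Normal loop end: the user-provided variable is NULL, never a pointer
     * computed from a NULL or non-contained node. */
    assert(item == NULL);
    return sum;
}
```

Because the containing object is never computed from an unvalidated node, the loop avoids the pointer arithmetic on NULL that the older single-variable style could perform on its final iteration.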
Ilya Maximets
08e9e53373 ovsdb: raft: Fix inability to read the database with DNS host names.
Clustered OVSDB allows using DNS names as addresses of raft members.
However, if DNS resolution fails during the initial database read,
this causes a fatal failure and exit of the ovsdb-server process.

Also, if DNS name of a joining server is not resolvable for one of the
followers, this follower will reject append requests for a new server
to join until the name is successfully resolved.  This makes a follower
effectively non-functional while DNS is unavailable.

To fix the problem, relax the address verification: allow validation
to pass if only name resolution failed and the address is otherwise
valid.  This allows addresses to be added to the database, so
connections can be established later when DNS is available.

Additionally, fix the missed initialization of the dns-resolve module.
Without it, DNS requests are blocking.  This causes unexpected delays
in runtime.

Fixes: 771680d96fb6 ("DNS: Add basic support for asynchronous DNS resolving")
Reported-at: https://bugzilla.redhat.com/2055097
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-30 16:59:02 +02:00
Ilya Maximets
d96d14b147 openvswitch.h: Align uAPI definition with the kernel.
Upstream commit:
    commit 1926407a4ab0e59d5a27bed7b82029b356d80fa0
    Author: Ilya Maximets <i.maximets@ovn.org>
    Date:   Wed Mar 9 23:20:33 2022 +0100

    net: openvswitch: fix uAPI incompatibility with existing user space

    Few years ago OVS user space made a strange choice in the commit [1]
    to define types only valid for the user space inside the copy of a
    kernel uAPI header.  '#ifndef __KERNEL__' and another attribute was
    added later.

    This leads to the inevitable clash between user space and kernel types
    when the kernel uAPI is extended.  The issue was unveiled with the
    addition of a new type for IPv6 extension header in kernel uAPI.

    When kernel provides the OVS_KEY_ATTR_IPV6_EXTHDRS attribute to the
    older user space application, application tries to parse it as
    OVS_KEY_ATTR_PACKET_TYPE and discards the whole netlink message as
    malformed.  Since OVS_KEY_ATTR_IPV6_EXTHDRS is supplied along with
    every IPv6 packet that goes to the user space, IPv6 support is fully
    broken.

    Fixing that by bringing these user space attributes to the kernel
    uAPI to avoid the clash.  Strictly speaking this is not the problem
    of the kernel uAPI, but changing it is the only way to avoid breakage
    of the older user space applications at this point.

    These 2 types are explicitly rejected now since they should not be
    passed to the kernel.  Additionally, OVS_KEY_ATTR_TUNNEL_INFO moved
    out from the '#ifdef __KERNEL__' as there is no good reason to hide
    it from the userspace.  And it's also explicitly rejected now, because
    it's for in-kernel use only.

    Comments with warnings were added to avoid the problem coming back.

    (1 << type) converted to (1ULL << type) to avoid integer overflow on
    OVS_KEY_ATTR_IPV6_EXTHDRS, since it equals 32 now.

     [1] beb75a40fdc2 ("userspace: Switching of L3 packets in L2 pipeline")

    Fixes: 28a3f0601727 ("net: openvswitch: IPv6: Add IPv6 extension header support")
    Link: https://lore.kernel.org/netdev/3adf00c7-fe65-3ef4-b6d7-6d8a0cad8a5f@nvidia.com
    Link: beb75a40fd
    Reported-by: Roi Dayan <roid@nvidia.com>
    Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
    Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
    Acked-by: Aaron Conole <aconole@redhat.com>
    Link: https://lore.kernel.org/r/20220309222033.3018976-1-i.maximets@ovn.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Not adding OVS_KEY_ATTR_IPV6_EXTHDRS in this commit as this is not
necessary.  Will be added along with the actual userspace
implementation.

This change should help avoid incompatibility issues in the future.

Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-25 20:32:12 +01:00
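
The (1 << type) to (1ULL << type) conversion mentioned in the upstream commit quoted in d96d14b147 above can be shown in isolation. The helper names below are hypothetical; the only assumption carried over from the commit is that an attribute number can now be 32.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the mask bit for attribute number 'type'. */
uint64_t
sketch_attr_bit(unsigned int type)
{
    assert(type < 64);

    /* 1ULL forces a 64-bit shift.  A plain (1 << type) shifts an int, which
     * is undefined once 'type' reaches the width of int (typically 32). */
    return 1ULL << type;
}

/* Returns nonzero if attribute 'type' is present in 'mask'. */
int
sketch_attr_is_set(uint64_t mask, unsigned int type)
{
    return (mask & sketch_attr_bit(type)) != 0;
}
```

Any mask variable that accumulates these bits also has to be 64 bits wide, otherwise bit 32 is silently truncated on assignment.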
Ilya Maximets
9d86459516 system-traffic.at: Fix flaky DNAT load balancing test.
'conntrack - DNAT load balancing' test fails from time to time
because not all the group buckets are getting hit.

In short, the test creates a group with 3 buckets with the same
weight.  It creates 12 TCP sessions and expects that each bucket
will be used at least once.  However, there is a solid chance that
this will not happen.  The probability of having at least one
empty bucket is:

  C(3, 1) x (2/3)^N - C(3, 2) x (1/3)^N

Where N is the number of distinct TCP sessions.  For N=12, the
probability is about 0.023, i.e. there is a 2.3% chance for a
test to fail, which is not great for CI.

Increase the number of sessions to 50, which reduces the probability
of failure to about 4.7e-9.  In my testing, the configuration with
50 TCP sessions didn't fail after 6000 runs.  Should be good
enough for CI systems.

Fixes: 2c66ebe47a88 ("ofp-actions: Allow conntrack action in group buckets.")
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Michael Phelan <michael.phelan@intel.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-25 20:30:37 +01:00
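
The figures quoted in 9d86459516 above can be reproduced with a few lines of arithmetic. The sketch below evaluates the same inclusion-exclusion expression, C(3, 1) x (2/3)^N - C(3, 2) x (1/3)^N, and assumes, as the commit message implicitly does, that each session picks one of the 3 buckets independently and uniformly. The function names are hypothetical.

```c
#include <assert.h>

/* Integer power by repeated multiplication, to avoid depending on libm. */
static double
sketch_pow(double base, unsigned int n)
{
    double result = 1.0;

    while (n--) {
        result *= base;
    }
    return result;
}

/* Probability that at least one of 3 equally weighted buckets receives no
 * session out of 'n', with C(3, 1) = C(3, 2) = 3. */
double
sketch_p_empty_bucket(unsigned int n)
{
    assert(n >= 1);

    return 3.0 * sketch_pow(2.0 / 3.0, n) - 3.0 * sketch_pow(1.0 / 3.0, n);
}
```

Evaluating it for 12 sessions gives roughly 0.023, and for 50 sessions roughly 4.7e-9, in line with the numbers in the commit message.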
Ilya Maximets
5f76d0dede Set release date for 2.17.0.
Added a NEWS entry for OVSDB performance because it is user-visible.
It was not previously mentioned since it's an aggregated result of
various commits.

Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-24 17:35:58 +01:00
Eli Britstein
635cb95e0c dpif-netdev: Keep orig_in_port as a field of the flow.
A flow may be modified after its initial offload failed. In this case,
according to [1], the modification is handled as a flow add.
For a vport flow "add", the orig_in_port should be provided.
Keep that field in the flow struct, so it can be provided in the flow
modification use case.

[1] 0d25621e4d9f ("dpif-netdev: Fix flow modification after failure.")

Fixes: b5e6f6f6bfbe ("dpif-netdev: Provide orig_in_port in metadata for tunneled packets.")
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-22 22:12:24 +01:00
Ilya Maximets
2e2217c126 tests: Fix incorrect usage of OVS_WAIT_UNTIL.
OVS_WAIT_UNTIL() macro has only 2 arguments and doesn't check the
output of the command, but bonding and route tests are trying to use
it as if it were the AT_CHECK macro.  That makes checks in those tests
mostly useless, since they are not actually checking anything except
for command returning zero.

Introducing a new macro OVS_WAIT_UNTIL_EQUAL that will actually
perform the comparison with the desired output.  Using it for
the bonding and route tests and fixing all the caught incorrect
expected outputs along the way.

Adding an explicit argument check to the OVS_WAIT_UNTIL/WHILE to
avoid the problem in the future.

Fixes: b4e50218a0f8 ("bond: Add 'primary' interface concept for active-backup mode.")
Fixes: 9e11517e6ca6 ("ovs-router: Fix flushing of local routes.")
Acked-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-22 22:09:34 +01:00
Eelco Chaudron
31b467a751 odp-util: Fix output for tc to be equal to kernel.
When the same flow is programmed in the kernel and tc, they
look different due to the way they are translated. They take
the userspace approach by always including the packet type
attribute. To make the outputs the same, show the ethernet
header when the packet type is wildcarded, and not printed.

So without the fix the kernel would show (ovs-appctl dpctl/dump-flows):

  in_port(3),eth(),eth_type(0x0800),ipv4(frag=no), ..., actions:output

Whereas TC would show:

  in_port(3),eth_type(0x0800),ipv4(frag=no), ..., actions:output

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-21 00:37:41 +01:00
Eelco Chaudron
6d76cfc444 netdev-offload-tc: Fix IP and port ranges in flower returns.
When programming NAT rules OVS only sets the minimum value for a
single IP/port value. However, responses from flower will always
return min == max for single IP/port values. This is causing the
verification to fail as the request is different than the response.
To avoid this, we will update the response to match the request.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-21 00:31:24 +01:00
Eelco Chaudron
38298a877b netdev-offload-tc: Fix use of ICMP values instead of masks defines.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-21 00:31:24 +01:00
Eelco Chaudron
a039636950 netdev-offload-tc: Always include conntrack information to tc.
Regardless of the traffic type, if requested, the conntrack information
should be included to keep the datapath and tc rules in sync.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-21 00:31:24 +01:00
Eelco Chaudron
db40eb79ec netdev-offload-tc: Check for valid netdev ifindex in flow_put.
Verify that the returned ifindex by netdev_get_ifindex() is valid.
This might not be the case in the ERSPAN port scenario, which can
not be offloaded.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-21 00:31:24 +01:00
Eelco Chaudron
b4868ee163 netdev-offload-tc: Set the correct VLAN_VID and VLAN_PCP masks.
This change will set the correct VID and PCP masks, as well as the
ethernet type mask.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-21 00:31:24 +01:00
Eelco Chaudron
2bdf5b288c netdev-offload-tc: Add debug logs on tc rule verify failures.
This patch adds more detailed debug logs on tc verify failures to
ease debugging the actual cause after the fact.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-21 00:31:24 +01:00
Chris Mi
920ec5761e tc: Keep header rewrite actions order.
Currently, tc merges all header rewrite actions into one tc pedit
action. So the header rewrite actions order is lost. Save each header
rewrite action into one tc pedit action to keep the order. And only
append one tc csum action to the last pedit action of a series.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-18 17:41:14 +01:00
Suneetha Kalahasthi
be93ce40e1 faq: Update OVS/DPDK version table for OVS 2.15/2.16
Update the FAQ to reflect the latest DPDK versions for OVS branches 2.15 and 2.16.

Signed-off-by: Suneetha Kalahasthi <suneetha.kalahasthi@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2022-03-14 09:34:34 +00:00
Kumar Amber
c44876b9e4 system-dpdk: Fix mfex autovalidator tests.
AVX512 DPIF must be active in order for the MFEX AutoValidator to be executed.
If the DPIF-AVX512 is not available, the unit test is skipped, as the
scalar DPIF does not use the MFEX function-pointer based optimizations.

Fixes: 50be6715c083 ("test/sytem-dpdk: Add unit test for mfex autovalidator")
Suggested-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-11 21:17:48 +01:00
Mike Pattrick
3bd593917c ofp-prop: Silence the 'may be uninitialized' warning.
GCC 11.2.1-2.2 emits false-positive warnings like:

lib/ofp-packet.c: In function 'ofputil_decode_packet_in':
lib/ofp-packet.c:155:25: warning: 'reason' may be used
    uninitialized [-Wmaybe-uninitialized]
lib/ofp-packet.c: In function 'ofputil_decode_packet_in_private':
lib/ofp-packet.c:886:27: warning: 'value' may be used
    uninitialized [-Wmaybe-uninitialized]

Modifying callers of ofpprop_parse_* functions to always check
the return value before using the value from these functions.

Signed-off-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-11 21:16:02 +01:00
Dumitru Ceara
b1e783dde4 tests: Ignore log about failing to set NETLINK_EXT_ACK.
Since 4a6a4734622e ("netlink-socket: Log extack error messages in
netlink transactions."), tests fail on older systems that don't support
NETLINK_EXT_ACK.  It's not really an issue, so we can just ignore the
log.

Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-11 21:14:51 +01:00
Ilya Maximets
8d480c5cec ovsdb-cluster.at: Avoid test failures due to different hashing.
Depending on compiler flags and CPU architecture, different hash
functions are used.  That impacts the order of tables and columns
in database representation making ovsdb report different columns
in the warning about ephemeral-to-persistent conversion.

Stripping out changing parts of the message to avoid the issue.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-11 21:13:33 +01:00
Cian Ferriter
c356f6c0b9 dpif-netdev: Simplify atomic function pointer stores.
The same pattern for atomic stores and initialization was used for the
DPIF and MFEX function pointers declared in struct dp_netdev_pmd_thread.
Simplify this pattern for all stores to 'miniflow_extract_opt' and
'netdev_input_func'.

Also replace the first store to 'miniflow_extract_opt', which was an
atomic_store_relaxed(), with atomic_init().

Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Kumar Amber <kumar.amber@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-11 21:07:34 +01:00
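
The distinction drawn in c356f6c0b9 above, initializing an atomic function pointer once and using relaxed stores afterwards, looks roughly like this with plain C11 atomics. This is a sketch only: OVS uses its own atomic wrappers, and the type and function names here are hypothetical rather than the members of struct dp_netdev_pmd_thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical input function type and two interchangeable variants. */
typedef int (*sketch_input_func)(int);

int sketch_scalar(int x) { return x + 1; }
int sketch_vector(int x) { return x + 2; }

struct sketch_thread {
    _Atomic(sketch_input_func) input_func;
};

/* First write: the object is not yet shared, so atomic_init() is enough. */
void
sketch_thread_init(struct sketch_thread *t)
{
    atomic_init(&t->input_func, sketch_scalar);
}

/* Later writes may race with the polling thread, so use an atomic store. */
void
sketch_thread_set(struct sketch_thread *t, sketch_input_func func)
{
    atomic_store_explicit(&t->input_func, func, memory_order_relaxed);
}

/* Reader side: load the pointer once, then call through the local copy. */
int
sketch_thread_run(struct sketch_thread *t, int x)
{
    sketch_input_func func
        = atomic_load_explicit(&t->input_func, memory_order_relaxed);

    assert(func != NULL);
    return func(x);
}
```

Relaxed ordering is sufficient in this sketch because only the pointer value itself is exchanged; no other data is published through it.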
Gaetan Rivet
f77dbc1eb2 ofproto: Use xlate map for uuid lookups.
The ofproto map 'all_ofproto_dpifs_by_uuid' does not support
concurrent accesses. It is however read by upcall handler threads
and written by the main thread at the same time.

Additionally, handler threads will change the ams_seq while
an ofproto is being destroyed, triggering crashes with the
following backtrace:

(gdb) bt
  hmap_next (hmap.h:398)
  seq_wake_waiters (seq.c:326)
  seq_change_protected (seq.c:134)
  seq_change (seq.c:144)
  ofproto_dpif_send_async_msg (ofproto_dpif.c:263)
  process_upcall (ofproto_dpif_upcall.c:1782)
  recv_upcalls (ofproto_dpif_upcall.c:1026)
  udpif_upcall_handler (ofproto/ofproto_dpif_upcall.c:945)
  ovsthread_wrapper (ovs_thread.c:734)

To solve both issues, remove the 'all_ofproto_dpifs_by_uuid'.
Instead, another map already storing ofprotos in xlate can be used.

During an ofproto destruction, its reference is removed from the current
xlate xcfg. Such change is committed only after all threads have quiesced
at least once during xlate_txn_commit(). This wait ensures that the
removal is seen by all threads, making it impossible for a thread to
still hold a reference while the destruction proceeds.

Furthermore, the xlate maps are copied during updates instead of
being written in place. It is thus correct to read xcfg->xbridges while
inserting or removing from new_xcfg->xbridges.

Finally, now that ofproto_dpifs lookups are done through xcfg->xbridges,
it is important to use a high level of entropy. The previous key was the
hashed ofproto pointer, which has fewer random bits than the uuid key used in
'all_ofproto_dpifs_by_uuid'. To solve this, use the ofproto uuid as the key
in xbridges as well, improving entropy.

Fixes: fcb9579be3c7 ("ofproto: Add 'ofproto_uuid' and 'ofp_in_port' to user action cookie.")
Suggested-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Tested-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Gaetan Rivet <grive@u256.net>
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Co-authored-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-07 18:12:19 +01:00
Ilya Maximets
ba4ec29141 AUTHORS: Add Hongzhi Guo.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-07 18:12:19 +01:00
Peng He
b46fd37abe ofproto: Add refcount to ofproto to fix ofproto use-after-free.
From hepeng:
https://patchwork.ozlabs.org/project/openvswitch/patch/20200717015041.82746-1-hepeng.0320@bytedance.com/#2487473

also from guohongzhi <guohongzhi1@huawei.com>:
http://patchwork.ozlabs.org/project/openvswitch/patch/20200306130555.19884-1-guohongzhi1@huawei.com/

also from a discussion on the mailing list about mixing the use of RCU and
refcount, with Ilya Maximets, William Tu, Ben Pfaff, and Gaëtan Rivet.

A summary, as quoted from Ilya:

"
RCU for ofproto was introduced for one
and only one reason - to avoid freeing ofproto while rules are still
alive.  This was done in commit f416c8d61601 ("ofproto: RCU postpone
rule destruction.").  The goal was to allow using rules without
refcounting them within a single grace period.  And that forced us
to postpone destruction of the ofproto for a single grace period.
Later commit 39c9459355b6 ("Use classifier versioning.") made it
possible for rules to be alive for more than one grace period, so
the commit made ofproto wait for 2 grace periods by double postponing.
As we can see now, that wasn't enough and we have to wait for more
than 2 grace periods in certain cases.
"

In short, the ofproto should have a longer lifetime than the rule: if
the rule lasts for more than 2 grace periods, the ofproto should live
longer to ensure rule->ofproto is valid. It's hard to predict how long
an ofproto should live, thus we need to use a refcount on ofproto to make
things easy. The controversial part is that we have already used RCU postpone
to delay ofproto destruction. If we have to add a refcount, is it simpler to
use just the refcount without RCU postpone?

IMO, going back to the pure refcount solution is more
complicated than mixing both.

Gaëtan Rivet asked some questions on guohongzhi's v2 patch:

During ofproto_rule_create, should we use ofproto_ref
or ofproto_try_ref? How can we make sure the ofproto is alive?

By using RCU, ofproto has three states:

state 1: alive, with refcount >= 1
state 2: dying, with refcount == 0, however pointer is valid
state 3: dead, memory freed, pointer might be dangling.

Without using RCU, there is no state 2, thus, we have to be very careful
every time we see an ofproto pointer. In contrast, with RCU, we can be sure
that the pointer is valid at least for this grace period, so we can just
check if the ofproto is dying by using ofproto_try_ref.

This shows that by mixing the use of RCU and refcount we can save a lot of
work worrying about whether the ofproto pointer is dangling.

In short, the RCU part makes sure the ofproto is alive when we use it,
and the refcount part makes sure it lives long enough.
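
The three states above can be sketched with C11 atomics. This is only an
illustration under stated assumptions, not the code of this patch: 'struct obj'
and the obj_*() helpers are hypothetical stand-ins for ofproto and its
ref/try_ref/unref functions, and the RCU-postponed free is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an ofproto; not the real OVS structure. */
struct obj {
    atomic_uint ref_cnt;    /* >= 1: alive (state 1), 0: dying (state 2). */
};

static void
obj_init(struct obj *o)
{
    atomic_init(&o->ref_cnt, 1);
}

/* Unconditional reference: only valid if the caller already knows that
 * the object is alive, e.g. because it holds a reference itself. */
static void
obj_ref(struct obj *o)
{
    unsigned int old = atomic_fetch_add(&o->ref_cnt, 1);

    assert(old > 0);
    (void) old;
}

/* Conditional reference: succeeds only while the count is non-zero.
 * With RCU the pointer stays valid during the current grace period, so
 * reading the counter of a dying object is safe. */
static bool
obj_try_ref(struct obj *o)
{
    unsigned int cnt = atomic_load(&o->ref_cnt);

    while (cnt > 0) {
        /* On failure 'cnt' is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&o->ref_cnt, &cnt, cnt + 1)) {
            return true;
        }
    }
    return false;
}

/* Returns true if the last reference was dropped.  The caller would then
 * postpone the actual free by a grace period (state 2 -> state 3). */
static bool
obj_unref(struct obj *o)
{
    return atomic_fetch_sub(&o->ref_cnt, 1) == 1;
}
```

In this sketch, a reader that finds the pointer through an RCU-protected
structure would call obj_try_ref() and give up if it returns false, while a
caller that already owns a reference could use obj_ref() directly.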

In this patch, I have merged guohongzhi's patch and mine, and fixed
them according to the previous comments.

Acked-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Gaetan Rivet <grive@u256.net>
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Tested-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Peng He <hepeng.0320@bytedance.com>
Co-authored-by: Hongzhi Guo <guohongzhi1@huawei.com>
Signed-off-by: Hongzhi Guo <guohongzhi1@huawei.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-07 18:08:46 +01:00
Mohammad Heib
7baed8fe6b ovs-monitor-ipsec: Add list-commands command.
Currently, the ovs-python unixctl implements the list-commands
operation as a 'help' command, which doesn't match the ovs-appctl
man page and can confuse end users who want to check
the supported operations of ovs-monitor-ipsec.

This patch adds a list-commands alias for the 'help' operation.

Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Mohammad Heib <mheib@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-07 13:19:54 +01:00
lic121
a9f5ee1199 ofproto-dpif: Trigger revalidation when ipfix config set.
Currently, ipfix config creation/deletion doesn't trigger dpif backer
revalidation. This is not expected, as we need the revalidation
to commit ipfix into xlate, so that xlate can generate ipfix
actions.

This patch covers only creation/deletion of the ipfix config.
One more patch will follow to cover ipfix option changes.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: lic121 <lic121@chinatelecom.cn>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-04 20:21:56 +01:00