mirror of https://github.com/openvswitch/ovs synced 2025-08-29 21:38:13 +00:00

19894 Commits

Author SHA1 Message Date
Ilya Maximets
649dbc19ff github: Test AF_XDP build using libbpf instead of kernel sources.
AF_XDP bits were removed from the kernel's libbpf in 6.0.  libbpf
and libxdp are now the primary way to build AF_XDP applications.
Most modern distributions already package some version
of libbpf, so it's better to test building with it instead
of building an old unsupported kernel tree.

Ubuntu started packaging libxdp only in 22.10, so not using
it for now.

Kernel build infrastructure in CI scripts is not needed anymore.
Removed.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-03 16:06:30 +01:00
Ilya Maximets
b17cadff1d netdev-afxdp: Hide too large memset from sparse.
Sparse complains about 64M umem initialization.  Hide it from
the checker instead of disabling a warning globally.

SPARSE_FLAGS are kept in the CI script even though they are
empty at the moment.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-03 16:06:30 +01:00
Ilya Maximets
1dcc490d44 netdev-afxdp: Allow building with libxdp and newer libbpf.
AF_XDP functions were deprecated in libbpf 0.7 and moved to libxdp.
Functions bpf_get/set_link_xdp_id() were deprecated in libbpf 0.8
and replaced with bpf_xdp_query_id() and bpf_xdp_attach/detach().

Updating configuration and source code to accommodate the above changes
and allow building OVS with AF_XDP support on newer systems:

 - Checking the version of libbpf by detecting availability
   of bpf_xdp_detach.

 - Checking availability of the libxdp in a system by looking
   for a library providing libxdp_strerror(), if libbpf is
   newer than 0.6.  And checking for xsk.h header provided by
   libxdp-dev[el].

 - Use xsk.h from libbpf if it is older than 0.7 and not linking
   with libxdp in this case as there are known incompatible
   versions of libxdp in distributions.

 - Check for the NEED_WAKEUP feature replaced with direct checking
   in the source code if XDP_USE_NEED_WAKEUP is defined.

 - Checking availability of bpf_xdp_query_id and bpf_xdp_detach
   and using them instead of deprecated APIs.  Fall back to old
   functions if not found.

 - Dropped LIBBPF_LDADD variable as it makes library and function
   detection much harder without providing any actual benefits.
   AC_SEARCH_LIBS is used instead and it allows use of AC_CHECK_FUNCS.

 - Header includes moved around to files where they are actually used.

 - Removed libelf dependency as it is not really used.

With these changes it should be possible to build OVS with either:

 - libbpf built from the kernel sources (5.19 or older).
 - libbpf < 0.7 provided in distributions.
 - libxdp and libbpf >= 0.7 provided in newer distributions.

While it is technically possible to build with libbpf 0.7+ without
libxdp at the moment we're not allowing that for a few reasons.
First, required functions in libbpf are deprecated and can be removed
in future releases.  Second, support for all these combinations makes
the detection code fairly complex.
AFAIK, most of the distributions packaging libbpf 0.7+ do package
libxdp as well.

libxdp added as a build dependency for Fedora build since all
supported versions of Fedora are packaging this library.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-03 16:06:21 +01:00
Ilya Maximets
0d8318db63 netdev-afxdp: Disable -Wfree-nonheap-object on receive.
GCC 11+ generates a warning:

  In file included from lib/netdev-linux-private.h:30,
                   from lib/netdev-afxdp.c:19:
  In function 'dp_packet_delete',
      inlined from 'dp_packet_delete' at lib/dp-packet.h:246:1,
      inlined from 'dp_packet_batch_add__' at lib/dp-packet.h:775:9,
      inlined from 'dp_packet_batch_add' at lib/dp-packet.h:783:5,
      inlined from 'netdev_afxdp_rxq_recv' at lib/netdev-afxdp.c:898:9:
  lib/dp-packet.h:260:9: warning: 'free' called on pointer
    '*umem.xpool.array' with nonzero offset [8, 2558044588346441168]
    [-Wfree-nonheap-object]
    260 |         free(b);
        |         ^~~~~~~

But it is a false positive since the code path is not possible.
In this call chain the packet will always have source DPBUF_AFXDP
and the free() will never be called.  GCC doesn't see that, because
initialization function dp_packet_use_afxdp() is part of a different
translation unit.

Disabling a warning in this particular place to avoid build failures.

Older versions of clang do not have the -Wfree-nonheap-object, so we
need to additionally guard the pragmas.  Clang is using GCC pragmas
and complains about unknown ones.

Reported-at: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108187
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-03 12:39:32 +01:00
Ilya Maximets
d83d7c4915 ci: Fix overriding OPTS provided from the yml.
For GCC builds we're overriding --disable-ssl or --enable-shared
options set up in the GHA yml file.

Fix that by adding to EXTRA_OPTS instead.

Fixes: 2581b0ad1159 ("travis: Combine kernel builds.")
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-03 12:39:28 +01:00
Cheng Li
46e04ec31b dpif-netdev: Calculate per numa variance.
Currently, pmd_rebalance_dry_run() calculates the overall variance of
all pmds regardless of their numa location. The overall result may
hide an imbalance within an individual numa node.

Consider the following case. Numa0 is free because VMs on numa0
are not sending pkts, while numa1 is busy. Within numa1, pmd
workloads are not balanced. Obviously, moving 500 kpps of workload from
pmd 126 to pmd 62 will make numa1 much more balanced. For numa1
the variance improvement will be almost 100%, because after the rebalance
each pmd in numa1 holds the same workload (variance ~= 0). But the overall
variance improvement is only about 20%, which may not trigger auto_lb.

```
numa_id   core_id      kpps
      0        30         0
      0        31         0
      0        94         0
      0        95         0
      1       126      1500
      1       127      1000
      1        63      1000
      1        62       500
```

auto_lb doesn't balance workload across numa nodes, so it makes
more sense to calculate the variance improvement per numa node.
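
The figures above can be reproduced with a short sketch. It uses the
population variance from Python's statistics module; the helper names and
the exact improvement formula are assumptions for illustration, not the
code in pmd_rebalance_dry_run().

```python
from statistics import pvariance

# kpps per pmd core, grouped by numa node (values from the table above).
before = {0: {30: 0, 31: 0, 94: 0, 95: 0},
          1: {126: 1500, 127: 1000, 63: 1000, 62: 500}}

# Rebalance described above: move 500 kpps from pmd 126 to pmd 62.
after = {numa: dict(cores) for numa, cores in before.items()}
after[1][126] -= 500
after[1][62] += 500


def improvement(old, new):
    """Variance improvement in percent (assumed formula)."""
    var_old, var_new = pvariance(old), pvariance(new)
    return 0.0 if var_old == 0 else (var_old - var_new) * 100 / var_old


def all_loads(loads):
    """Flatten the per-numa mapping into one list of kpps values."""
    return [kpps for cores in loads.values() for kpps in cores.values()]


overall = improvement(all_loads(before), all_loads(after))
per_numa = {numa: improvement(list(before[numa].values()),
                              list(after[numa].values()))
            for numa in before}

print(f"overall improvement: {overall:.0f}%")
for numa, pct in per_numa.items():
    print(f"numa {numa} improvement: {pct:.0f}%")
```

The idle numa0 pulls the overall mean down, so the overall variance stays
high after the move even though numa1 is fully balanced.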

Signed-off-by: Cheng Li <lic121@chinatelecom.cn>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Co-authored-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 22:15:47 +01:00
Kevin Traynor
ad6e506fcb dpif-netdev: Rename pmd_info_show_rxq variables.
There are some similar readings taken for pmds and Rx queues
in this function and a few of the variable names are ambiguous.

Improve the readability of the code by updating some variable
names to indicate that they are readings related to the pmd.

Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 20:58:30 +01:00
Kevin Traynor
e9ab15f4f8 docs: Add documentation for pmd-rxq-show secs parameter.
Add description of new '-secs' parameter in docs. Also, add to NEWS as
it is a user facing change.

Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 20:58:10 +01:00
Kevin Traynor
526230bfab dpif-netdev: Make pmd-rxq-show time configurable.
pmd-rxq-show shows the Rx queue to pmd assignments as well as the
pmd usage of each Rx queue.

Up until now, pmd usage over a tail length of 60 seconds was shown
for each Rx queue, as this is the value used during rebalance
to avoid any spike effects.

When debugging or tuning, it is also convenient to display the
pmd usage of an Rx queue over a shorter time frame, so any changes in
config or traffic that impact pmd usage can be evaluated more quickly.

A parameter is added that allows pmd-rxq-show stats pmd usage to
be shown for a shorter time frame. Values are rounded up to the
nearest 5 seconds as that is the measurement granularity and the value
used is displayed. e.g.

$ ovs-appctl dpif-netdev/pmd-rxq-show -secs 5
 Displaying last 5 seconds pmd usage %
 pmd thread numa_id 0 core_id 4:
   isolated : false
   port: dpdk0            queue-id:  0 (enabled)   pmd usage: 95 %
   overhead:  4 %

The default time frame has not changed and the maximum value
is limited to the maximum stored tail length (60 seconds).
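
The rounding rule can be sketched as follows. The function name is
hypothetical, and treating requests below one interval as one interval and
clamping larger requests to 60 seconds are assumptions about how
out-of-range values are handled.

```python
import math

MEASUREMENT_INTERVAL_SECS = 5   # measurement granularity stated above
MAX_TAIL_SECS = 60              # maximum stored tail length stated above


def effective_secs(requested):
    """Round up to the next multiple of 5 s and cap the result at 60 s."""
    intervals = math.ceil(requested / MEASUREMENT_INTERVAL_SECS)
    rounded = intervals * MEASUREMENT_INTERVAL_SECS
    # Assumption: never report less than one measurement interval.
    return max(MEASUREMENT_INTERVAL_SECS, min(rounded, MAX_TAIL_SECS))


for secs in (5, 7, 31, 90):
    print(f"-secs {secs} -> displaying last {effective_secs(secs)} seconds")
```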

Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 20:57:29 +01:00
David Marchand
9a86a3dd68 travis: Drop support.
Following a change in the terms of use, free Travis credits are really
too low for a realistic usage by OVS contributors.
As a consequence, testing OVS with Travis has been abandoned by most
(if not all) contributors to the project.

Drop the Travis configuration from our repository, clean references in
the documentation and move GHA specifics to the associated yml.

Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 20:23:42 +01:00
Eelco Chaudron
d5469cb743 Makefile: Add USDT scripts to make install and fedora/debian test rpm.
This change will install all the USDT scripts to the
{_datadir}/openvswitch/scripts/usdt directory with the
make install command.

In addition it will also add them to the Fedora
and Debian openvswitch-test rpm.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 20:10:58 +01:00
Dan Williams
685973a9f1 ovsdb-server: Don't log when memory-trim-on-compaction doesn't change.
But log at least once even if the value hasn't changed, for
informational purposes.

Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 20:02:50 +01:00
Adrian Moreno
863d2e1a8c python: Don't exit OFPFlow constructor.
Returning None in a constructor does not make sense and is just error
prone.  Removing what was a leftover from an attempt to handle a common
error case of trying to parse what is commonly outputted by ovs-ofctl.
This should be done by the caller anyway.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 19:48:07 +01:00
Adrian Moreno
22eb224386 tests: Verify flows in odp.at are parseable.
Create a small helper script and check that flows tested in odp.at are
parseable.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 19:18:57 +01:00
Adrian Moreno
fc3f918cb5 tests: Verify flows in ofp-actions are parseable.
Create a small helper script and check that flows used in ofp-actions.at
are parseable.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 19:18:38 +01:00
Adrian Moreno
c395e9810e python: Interpret free keys as output in clone.
clone-like actions can also output to ports by specifying the port name.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 18:36:02 +01:00
Adrian Moreno
542fdad701 python: Fix output=CONTROLLER action.
When CONTROLLER is used as free key, it means output=CONTROLLER which is
handled by decode_controller. However, it must output the KV in the
right format: "output": {"format": "CONTROLLER"}.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 18:36:02 +01:00
Adrian Moreno
1850e5e689 python: Support case-insensitive OpenFlow actions.
OpenFlow action names can be capitalized, so in order to support this,
add case-insensitive KVDecoders and use them in OpenFlow actions.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 18:36:02 +01:00
Adrian Moreno
75a6e8db9c python: Return list of actions for odp action clone.
Sometimes we don't want to return the result of a nested key-value
decoding as a dictionary but as a list of dictionaries. This happens
when we parse actions where keys can be repeated.

Refactor code that already takes that into account from ofp_act.py to
kv.py and use it for datapath action "clone".
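
A minimal illustration of why a dictionary is not enough when keys repeat.
The decoded pairs below are made-up sample data, not output of the real
parser.

```python
# Hypothetical (key, value) pairs decoded from the inside of a clone() action.
pairs = [
    ("set", "ipv4(ttl=63)"),
    ("set", "eth(src=00:00:00:00:00:01)"),
    ("output", 2),
]

# As a dictionary, the second "set" silently overwrites the first one.
as_dict = dict(pairs)

# As a list of single-entry dictionaries, order and repeats are preserved.
as_list = [{key: value} for key, value in pairs]

print(len(as_dict), "entries in the dict")
print(len(as_list), "entries in the list")
```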

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 18:36:02 +01:00
Adrian Moreno
d33e548fc7 python: Make key-value matching strict by default.
Currently, if a key is not found in the decoder information, we use the
default decoder which typically returns a string.

This not only means we can go out of sync with the C code without
noticing but it's also error prone as malformed flows could be parsed
without warning.

Make KeyValue parsing strict, raising an error if a decoder is not found
for a key.
This behaviour can be turned off globally by running 'KVDecoders.strict
= False' but it's generally not recommended. Also, if a KVDecoder does
need this default behavior, it can be explicitly configured by specifying
its default decoder.
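
The strict lookup can be sketched like this. StrictDecoders is a
hypothetical stand-in written for this example; it only mirrors the
behaviour described above and is not the KVDecoders class from the OVS
Python library.

```python
class StrictDecoders:
    """Toy decoder table with a global strict switch (illustration only)."""

    strict = True  # global switch, analogous to the one described above

    def __init__(self, decoders, default=None):
        self._decoders = decoders
        self._default = default  # explicit per-instance fallback decoder

    def decode(self, key, value):
        decoder = self._decoders.get(key, self._default)
        if decoder is None:
            if StrictDecoders.strict:
                raise KeyError(f"no decoder found for key {key!r}")
            decoder = str  # lenient mode: keep the raw string
        return key, decoder(value)


strict_table = StrictDecoders({"priority": int})
print(strict_table.decode("priority", "100"))

try:
    strict_table.decode("priorty", "100")  # misspelled key
except KeyError as err:
    print("rejected:", err)

# A table that really wants a fallback configures it explicitly.
lenient_table = StrictDecoders({"priority": int}, default=str)
print(lenient_table.decode("priorty", "100"))
```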

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 18:36:02 +01:00
Adrian Moreno
fe204743cb python: Add explicit decoders for all ofp actions.
We were silently relying on some ofp actions to be decoded by the
default decoder which would yield decent string values.

In order to be more safe and robust, add an explicit decoder for all
missing actions.

This patch also reworks the learn action decoding to make it more
explicit and verify all the fields specified in the learn action are
actually valid fields.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 18:36:02 +01:00
Adrian Moreno
3648fec08f python: Include aliases in ofp_fields.py.
We currently auto-generate a dictionary of field names and decoders.
However, sometimes fields can be specified by their canonical NXM or
OXM names.

Modify gen_ofp_field_decoders to also generate a dictionary of aliases
so it's easy to map OXM/NXM names to their fields and decoding
information.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 18:36:02 +01:00
Adrian Moreno
c627cfd9cb python: Fix datapath flow decoders.
Fix the following errors in odp decoding:
- Missing push_mpls action
- Typos in collector_set_id, tp_src/tp_dst and csum
- Missing two fields in vxlan match

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-21 18:36:02 +01:00
Cian Ferriter
9855f35dd2 dpif-netdev/mfex: Add AVX512 NVGRE traffic profiles.
A typical NVGRE encapsulated packet starts with the ETH/IP/GRE
protocols.  Miniflow extract will parse just the ETH and IP headers. The
GRE header will be processed later as part of the pop action. Add
support for parsing the ETH/IP headers in this scenario.

Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2022-12-21 15:44:17 +00:00
Cian Ferriter
363cc26839 dpif-netdev/dpcls: Specialize 8, 1 and 5, 2 signatures.
The subtable signatures being specialized here were found in an NVGRE
tunnel scenario.

Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2022-12-21 14:53:52 +00:00
Emma Finn
a879beb4db odp-execute: Add ISA implementation of set_masked IPv6 action
This commit adds support for the AVX512 implementation of the
ipv6_set_addrs action as well as an AVX512 implementation of
updating the L4 checksums.

Here are some relative performance numbers for this patch:
+-----------------------------+----------------+
| Actions                     | AVX with patch |
+-----------------------------+----------------+
| ipv6_src                    | 1.14x          |
+-----------------------------+----------------+
| ipv6_src + ipv6_dst         | 1.40x          |
+-----------------------------+----------------+
| ipv6_label                  | 1.14x          |
+-----------------------------+----------------+
| mod_ipv6 4 x field          | 1.43x          |
+-----------------------------+----------------+

Signed-off-by: Emma Finn <emma.finn@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2022-12-21 14:48:41 +00:00
Ilya Maximets
c1daeb4b41 AUTHORS: Add Qian Chen.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-20 18:02:01 +01:00
Qian Chen
7490f281f0 lldp: Fix bugs when parsing malformed AutoAttach.
The OVS LLDP implementation includes support for the AutoAttach standard, which
the 'upstream' lldpd project does not include.  As part of adding this
support, the message parsing for these TLVs did not include proper length
checks for the LLDP_TLV_AA_ELEMENT_SUBTYPE and the
LLDP_TLV_AA_ISID_VLAN_ASGNS_SUBTYPE elements.  The result is that a message
without a proper boundary will cause an overread of memory, and lead to
undefined results, including crashes or other unidentified behavior.

The fix is to introduce proper bounds checking for these elements.  Introduce
a unit test to ensure that we have some proper rejection in this code
base in the future.
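
The kind of check involved can be sketched with a generic TLV walker. This
is an illustration of length validation only, under the assumption of the
standard LLDP TLV header (7-bit type, 9-bit length); it does not reproduce
the AutoAttach subtype checks added by this patch.

```python
import struct


def parse_tlvs(payload):
    """Split an LLDP payload into (type, value) pairs, validating lengths."""
    tlvs = []
    offset = 0
    while offset < len(payload):
        if len(payload) - offset < 2:
            raise ValueError("truncated TLV header")
        (header,) = struct.unpack_from("!H", payload, offset)
        tlv_type, tlv_len = header >> 9, header & 0x1FF
        offset += 2
        remaining = len(payload) - offset
        if tlv_len > remaining:
            # Without this check the value slice would run past the buffer
            # in C; here we reject the message instead.
            raise ValueError(f"TLV type {tlv_type} claims {tlv_len} bytes, "
                             f"only {remaining} remain")
        tlvs.append((tlv_type, payload[offset:offset + tlv_len]))
        offset += tlv_len
    return tlvs


good = struct.pack("!H", (127 << 9) | 3) + b"abc"
bad = struct.pack("!H", (127 << 9) | 10) + b"abc"

print(parse_tlvs(good))
try:
    parse_tlvs(bad)
except ValueError as err:
    print("rejected:", err)
```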

Fixes: be53a5c447c3 ("auto-attach: Initial support for Auto-Attach standard")
Signed-off-by: Qian Chen <cq674350529@163.com>
Co-authored-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-20 17:26:25 +01:00
Adrian Moreno
0d23948a59 ovs-thread: Detect changes in number of CPUs.
Currently, things like the number of handler and revalidator threads are
calculated based on the number of available CPUs. However, this number
is considered static and only calculated once, hence ignoring events
such as cpus being hotplugged, switched on/off or affinity mask
changing.

On the other hand, checking the number of available CPUs multiple times
per second seems like an overkill.
Affinity should not change that often and, even if it does, the impact
of destroying and recreating all the threads so often is probably a
price too expensive to pay.

I tested the impact of updating the threads every 5 seconds and saw
an impact in the main loop duration of <1% and a worst-case scenario
impact in throughput of < 5% [1]. This patch sets the default period to
10 seconds just to be safer.

[1] Tested in the worst-case scenario of disabling the kernel cache
(other_config:flow-size=0), modifying ovs-vswitchd's affinity so the
number of handlers goes up and down every 5 seconds and calculating the
difference in netperf's ops/sec.
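
The periodic re-check can be sketched as a small cache. The class and its
names are hypothetical; only the 10 second default period comes from the
text above.

```python
import os
import time


class CpuCount:
    """Cache the number of usable CPUs, refreshing at most once per period."""

    def __init__(self, period_secs=10.0, clock=time.monotonic):
        self._period = period_secs
        self._clock = clock
        self._next_check = None
        self._count = None

    @staticmethod
    def _query():
        # sched_getaffinity() honours the affinity mask but is not
        # available on every platform; fall back to the plain CPU count.
        if hasattr(os, "sched_getaffinity"):
            return len(os.sched_getaffinity(0))
        return os.cpu_count() or 1

    def get(self):
        now = self._clock()
        if self._next_check is None or now >= self._next_check:
            self._count = self._query()
            self._next_check = now + self._period
        return self._count


cpus = CpuCount()
print("usable CPUs:", cpus.get())
```

Callers that size thread pools would call get() from their main loop and
only rebuild threads when the returned value changes.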

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-20 13:56:46 +01:00
Mike Pattrick
d34245ea15 ovs-ctl: Allow inclusion of hugepages in coredumps.
Add a new --dump-hugepages option to ovs-ctl to enable the addition
of hugepages to the core dump filter.

Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-20 13:46:42 +01:00
Eelco Chaudron
c82f496c3b dpif-netdev: Use unmasked key when adding datapath flows.
The datapath supports installing wider flows, and OVS relies on
this behavior. For example if ipv4(src=1.1.1.1/192.0.0.0,
dst=1.1.1.2/192.0.0.0) exists, a wider flow (smaller mask) of
ipv4(src=192.1.1.1/128.0.0.0,dst=192.1.1.2/128.0.0.0) is allowed
to be added.

However, if we try to add a wildcard rule, the installation fails:

# ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
  ipv4(src=1.1.1.1/192.0.0.0,dst=1.1.1.2/192.0.0.0,frag=no)" 2
# ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
  ipv4(src=192.1.1.1/0.0.0.0,dst=49.1.1.2/0.0.0.0,frag=no)" 2
ovs-vswitchd: updating flow table (File exists)

The reason is that the key used to determine if the flow is already
present in the system uses the original key ANDed with the mask.
This results in the IP address not being part of the (miniflow) key,
i.e., being substituted with an all-zero value. When doing the actual
lookup, this results in the key wrongfully matching the first flow,
and therefore the flow does not get installed. The solution is to use
the unmasked key for the existence check, the same way this is handled
in the "slow" dpif_flow_put() case.
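
The effect of masking can be checked with the source addresses from the
example above. This sketch only models the ipv4 src field and uses
hypothetical helper names; the real lookup operates on miniflows.

```python
from ipaddress import IPv4Address


def masked(addr, mask):
    """Return addr ANDed with mask."""
    return IPv4Address(int(IPv4Address(addr)) & int(IPv4Address(mask)))


def matches(installed_key, installed_mask, lookup_key):
    """True if lookup_key hits a flow installed with the given key/mask."""
    return (masked(lookup_key, installed_mask)
            == masked(installed_key, installed_mask))


# First flow from the example: src=1.1.1.1/192.0.0.0.
installed = ("1.1.1.1", "192.0.0.0")

# Second flow from the example: src=192.1.1.1/0.0.0.0.
new_key, new_mask = "192.1.1.1", "0.0.0.0"
new_masked_key = str(masked(new_key, new_mask))

print("masked key:", new_masked_key)
print("masked key hits first flow:", matches(*installed, new_masked_key))
print("unmasked key hits first flow:", matches(*installed, new_key))
```

The masked key collapses to all zeros and therefore hits the first flow,
while the unmasked key does not.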

OVS relies on the fact that overlapping flows can exist if one is a
superset of the other. Note that this is only true when the same set
of actions is applied. This is due to how the revalidator process
works. During revalidation, OVS removes too generic flows from the
datapath to avoid incorrect matches but allows too narrow flows to
stay in the datapath to avoid the data plane disruption and also to
avoid constant flow deletions if the datapath ignores wildcards on
certain fields/bits.  See flow_wildcards_has_extra() check in the
revalidate_ukey__() function.

The problem here is that we have a too narrow flow installed, and now
OpenFlow rules got changed, so the actual flow should be more generic.
Revalidators will not remove the narrow flow, and we will eventually get
an upcall on the packet that doesn't match the narrow flow, but we will
not be able to install a more generic flow because after masking with
the new wider mask, the key matches on the narrow flow, so we get EEXIST.

Fixes: beb75a40fdc2 ("userspace: Switching of L3 packets in L2 pipeline")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-20 13:07:17 +01:00
Eelco Chaudron
79e7756a5d utilities: Add a GDB macro to dump hmap structures.
Add a new GDB macro called ovs_dump_hmap, which can be used to dump any
hmap structure. For example:

  (gdb) ovs_dump_hmap "&'all_bridges.lto_priv.0'" "struct bridge" "node"
  (struct bridge *) 0x55ec43069c70
  (struct bridge *) 0x55ec430428a0
  (struct bridge *) 0x55ec430a55f0

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-20 13:02:53 +01:00
David Marchand
bf8fa1fe41 dpdk: Fix typo in v22.11.1 tarball extract example.
There was a small typo that slipped in when updating to v22.11.1 tag.

Fixes: a77c7796f23a ("dpdk: Update to use v22.11.1.")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-20 13:01:53 +01:00
Timothy Redaelli
1ea0fa4ad7 rhel: Avoid creating an empty database file.
In 59e8cb8a053d ("rhel: Move conf.db to /var/lib/openvswitch, using symlinks.")
conf.db is created as an empty file in /var/lib/openvswitch if it doesn't
exist, but this prevents ovsdb-server from starting.

This commit changes the previous behaviour to set
/var/lib/openvswitch owner to openvswitch:hugetlbfs, if built with
dpdk, or openvswitch:openvswitch.

Fixes: 59e8cb8a053d ("rhel: Move conf.db to /var/lib/openvswitch, using symlinks.")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2022-December/400045.html
Reported-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-20 12:59:00 +01:00
Emma Finn
69e71bf791 odp-execute: Add check for L4 header size.
This patch adds a check for the L4 header size to the avx512
implementation of the ipv4 action.

Fixes: 92eb03f7b03a ("odp-execute: Add ISA implementation of set_masked IPv4 action")
Signed-off-by: Emma Finn <emma.finn@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-14 12:44:30 +01:00
Dumitru Ceara
a787fbbf9d ovsdb-cs: Consider default conditions implicitly acked.
When initializing a monitor table the default monitor condition is
[True] which matches the behavior of the server (to send all rows of
that table).  There's no need to include this default condition in the
initial monitor request so we can consider it implicitly acked by the
server.

This fixes the incorrect (one too large) expected condition sequence
number reported by ovsdb_idl_set_condition() when the application is
trying to set a [True] condition for a new table.

Reported-by: Numan Siddique <numans@ovn.org>
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-13 18:52:10 +01:00
Emma Finn
739bcf2263 odp-execute: Fix ipv4 missing clearing of connection tracking fields.
This patch adds clearing of connection tracking fields to the
avx512 implementation of the ipv4 action. This patch also extends
the actions autovalidator to include a compare for packet metadata.

Fixes: 92eb03f7b03a ("odp-execute: Add ISA implementation of set_masked IPv4 action")
Signed-off-by: Emma Finn <emma.finn@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-07 19:08:12 +01:00
Ilya Maximets
481555587f faq: Update some wording since kernel module is already removed.
The kernel module was removed in 3.0 release, but the faq page
still talks about that in a future tense.

Fixes: 3476bd3932b0 ("Documentation: Remove kernel module documentation.")
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-07 19:06:06 +01:00
Adrian Moreno
6bc92db366 rculist: Use rculist_back_protected to access prev.
The .prev member of a rculist should not be used directly by users
because it's not rcu-safe. A convenient fake mutex (rculist_fake_mutex)
helps ensure that in conjunction with clang's thread safety
extensions.

Only writers with exclusive access to the rculist should access .prev
via some of the provided *_protected() accessors.

Use rculist_back_protected() in REVERSE_PROTECTED iterators to avoid
clang's compilation warning.

Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-06 16:21:54 +01:00
Ilya Maximets
e83dad6e53 ovsdb: Count weak reference objects.
OVSDB creates a separate object for each weak reference in order to
track them and there could be a significant amount of these objects
in the database.

We also had problems with the number of these objects growing out of
bounds recently.  So, adding them to a memory report seems to be
a good thing.

Counting them globally to cover all the copied instances in transactions
and the transaction history (even though there should be none).
It's also hard to count them per-database, because weak references
are stored on destination rows and can be destroyed either while
destroying the destination row or while removing the reference from
the source row.  Also, not all the involved functions have direct
access to the database object.  So, there is no single clear place
where counters should be updated.

Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-06 16:21:54 +01:00
Daniel Ding
093915e04a vswitch.ovsschema: Set bfd_status to ephemeral.
When openvswitch is restarted, the old bfd status is kept in the
database until ovs-vswitchd is running.  And if ovs-vswitchd
is under high workload, updating the bfd status will be deferred,
which is not what we expect.

Signed-off-by: Daniel Ding <zhihui.ding@easystack.cn>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-06 16:21:54 +01:00
Ilya Maximets
b8bf410a5c db-ctl-base: Use partial map/set updates for last add/set commands.
Currently, the command to add one item to a large set generates a
transaction with the full new content of that set plus a 'wait'
operation for the full old content of that set.  So, if we're adding
one new load-balancer into a load-balancer group in OVN using
ovn-nbctl, the transaction will include all the existing load-balancers
from that group twice.

IDL supports partial updates for sets and maps.  The problem with that
is changes are not visible to the IDL user until the transaction
is committed.  That will cause problems for chained ctl commands.
However, we still can optimize the very last command in the list.
It makes sense to do, since it's a common case for manual invocations.

Updating the 'add' command as well as 'set' for a case where we're
actually adding one new element to the map.

One downside is that we can't check the set size without examining
it and checking for duplicates, so allowing the transaction to be
sent and constraints to be checked on the server side in that case.

Not touching 'remove' operation for now, since removals may have
different type, e.g. if elements from the map are removed by the key.
The function will likely need to be fully re-written to accommodate
all the corner cases.
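
The difference in transaction size can be sketched by building both
request shapes by hand. The table, column and row names and the UUIDs are
made up for illustration; the operation layout follows the OVSDB protocol
operations 'wait', 'update' and 'mutate', and is not a capture of what
ovn-nbctl actually sends.

```python
import json
import uuid

# Hypothetical data: a group that already references 1000 load balancers.
existing = [["uuid", str(uuid.uuid4())] for _ in range(1000)]
new_lb = ["uuid", str(uuid.uuid4())]
where = [["name", "==", "lbg0"]]

# Full update: wait on the old set contents, then write the whole new set.
full = [
    {"op": "wait", "table": "Load_Balancer_Group", "where": where,
     "columns": ["load_balancer"], "until": "==", "timeout": 0,
     "rows": [{"load_balancer": ["set", existing]}]},
    {"op": "update", "table": "Load_Balancer_Group", "where": where,
     "row": {"load_balancer": ["set", existing + [new_lb]]}},
]

# Partial update: one mutation that inserts only the new element.
partial = [
    {"op": "mutate", "table": "Load_Balancer_Group", "where": where,
     "mutations": [["load_balancer", "insert", ["set", [new_lb]]]]},
]

full_size = len(json.dumps(full))
partial_size = len(json.dumps(partial))
print(f"full update: {full_size} bytes, partial update: {partial_size} bytes")
```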

Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-12-06 16:21:54 +01:00
Ian Stokes
a77c7796f2 dpdk: Update to use v22.11.1.
This commit adds support for DPDK v22.11.1. It includes the following
changes:

1. ci: Reduce DPDK compilation time.
2. system-dpdk: Update vhost tests to be compatible with DPDK 22.07.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=316528

3. system-dpdk: Update vhost tests to be compatible with DPDK 22.07.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=311332

4. netdev-dpdk: Report device bus specific information.
5. netdev-dpdk: Drop reference to Rx header split.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=321808

In addition documentation was also updated in this commit for use with
DPDK v22.11.1.

The Debian shared DPDK compilation test is removed as part of this patch
due to a packaging requirement. Once DPDK v22.11.1 is available in Debian
repositories it should be re-enabled in OVS.

For credit, all authors of the original commits to 'dpdk-latest' with the
above changes have been added as co-authors for this commit.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Co-authored-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Sunil Pai G <sunil.pai.g@intel.com>
Co-authored-by: Sunil Pai G <sunil.pai.g@intel.com>
Tested-by: Michael Phelan <michael.phelan@intel.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2022-12-06 15:06:28 +00:00
Numan Siddique
55b9507e68 ovsdb-idl: Add the support to specify the uuid for row insert.
ovsdb-server allows the OVSDB clients to specify the uuid for
the row inserts [1].  Both the C IDL client library and Python
IDL are missing this feature.  This patch adds this support.

In C IDL, for each schema table, a new function is generated -
<schema_table>insert_persistent_uuid(txn, uuid), which can
be used by the clients to persist the uuid.

ovs-vsctl and other derivatives of ctl now support the same
in the generic 'create' command with the option "--id=<UUID>".

In Python IDL, the uuid to persist can be specified in
the Transaction.insert() function.

[1] - a529e3cd1f ("ovsdb-server: Allow OVSDB clients to specify the UUID for inserted rows.")

Acked-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-11-30 15:15:57 +01:00
Ilya Maximets
954ae38a12 odp-util: Fix reporting unknown keys as keys with bad length.
check_attr_len() currently reports all unknown keys as keys with bad
length.  For example, IPv6 extension headers are printed out like this
in flow dumps:

  eth_type(0x86dd),ipv6(...)
  (bad key length 2, expected -1)(00 00/(bad mask length 2, expected -1)(00 00),
  icmpv6(type=0/0,code=0/0)

However, since the key is unknown, the length check on it makes no
sense and should be ignored.  This will allow the unknown key to be
caught later by the format_unknown_key() function and printed in a
more user-friendly way:

  eth_type(0x86dd),ipv6(...),key32(00 00/00 00),icmpv6(type=0/0,code=0/0)

'32' here is the actual index of the key attribute, so we know
that it is unknown attribute #32 with the value/mask pair printed
out inside the parentheses.
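
The behaviour described above can be illustrated with a small Python
sketch.  This is not the OVS C code: the table contents, the sentinel
value and the names expected_len() and format_key() are assumptions made
for illustration only.  It shows the idea that the length check is
applied only to keys with a known expected length, while unknown keys
fall through to a generic "keyN(value/mask)" form.

```python
ATTR_LEN_UNKNOWN = -1  # assumed sentinel: no expected length is known

# Hypothetical table: attribute index -> expected payload length in bytes.
EXPECTED_LEN = {6: 2, 12: 2}


def expected_len(attr):
    """Return the expected length for 'attr', or the unknown sentinel."""
    return EXPECTED_LEN.get(attr, ATTR_LEN_UNKNOWN)


def format_key(attr, key, mask):
    """Format one key/mask pair, checking length only for known keys."""
    want = expected_len(attr)
    if want == ATTR_LEN_UNKNOWN:
        # Unknown key: a length check makes no sense, print it generically.
        return "key%d(%s/%s)" % (attr, key.hex(" "), mask.hex(" "))
    if len(key) != want:
        return "(bad key length %d, expected %d)" % (len(key), want)
    return "attr%d(%s/%s)" % (attr, key.hex(" "), mask.hex(" "))
```

With these assumptions, an unknown attribute 32 carrying two zero bytes
is formatted as key32(00 00/00 00) instead of a bad-length report.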

Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-11-30 15:12:54 +01:00
Timothy Redaelli
cd475f9765 ovs-dpctl-top: Fix ovs-dpctl-top via pipe.
Currently it's not possible to use ovs-dpctl-top via a pipe (e.g.
ovs-dpctl dump-flows | ovs-dpctl-top --script --verbose), since Python 3
doesn't allow opening a file (stdin in our case) in binary mode without
buffering enabled.

This commit changes the behaviour in order to directly pass stdin to
flows_read instead of re-opening it without buffering.
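
A minimal Python sketch of the approach, assuming a simplified
flows_read() that only collects non-empty lines (the function in
ovs-dpctl-top does more and has a different signature), and a
hypothetical wrapper read_flows_from_pipe():

```python
import io
import sys


def flows_read(stream):
    """Simplified stand-in: return the non-empty lines of 'stream'."""
    return [line.rstrip("\n") for line in stream if line.strip()]


def read_flows_from_pipe(stream=None):
    """Read flows from 'stream', defaulting to stdin.

    The stream object is passed through as-is rather than being
    re-opened from its file descriptor without buffering.
    """
    if stream is None:
        stream = sys.stdin
    return flows_read(stream)


if __name__ == "__main__":
    # io.StringIO stands in for piped input in this example.
    for flow in read_flows_from_pipe(io.StringIO("flow-a\n\nflow-b\n")):
        print(flow)
```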

Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-11-30 15:10:49 +01:00
Timothy Redaelli
59e8cb8a05 rhel: Move conf.db to /var/lib/openvswitch, using symlinks.
conf.db is by default at /etc/openvswitch, but it should be at
/var/lib/openvswitch like on Debian or like ovnnb_db.db and ovnsb_db.db.

If conf.db already exists in /etc/openvswitch then it's moved to
/var/lib/openvswitch.
Symlinks are created for conf.db and .conf.db.~lock~ into /etc/openvswitch
for backward compatibility.

Reported-at: https://bugzilla.redhat.com/1830857
Reported-by: Yedidyah Bar David <didi@redhat.com>
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-11-30 14:59:08 +01:00
Ilya Maximets
b22c4d8403 netdev: Assume default link speed to be 10 Gbps instead of 100 Mbps.
100 Mbps was a fair assumption 13 years ago.  These days, 10 Gbps seems
like a good value in case no other information is available.

The change mainly affects QoS which is currently limited to 100 Mbps if
the user didn't specify 'max-rate' and the card doesn't report the
speed or OVS doesn't have a predefined enumeration for the speed
reported by the NIC.

Calculation of the path cost for STP/RSTP is also affected if OVS is
unable to determine the link speed.

Lower link speed adapters are typically good at reporting their speed,
so chances for overshoot should be low.  Newer high-speed adapters,
for which there is no speed enumeration or whose speed cannot be
determined for some other reason, will no longer suffer as much.
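
The fallback order described above can be sketched in Python as follows.
The names DEFAULT_SPEED_BPS and effective_max_rate() are hypothetical and
do not correspond to functions in OVS; the sketch only shows that the
default applies when neither the user nor the NIC supplies a value.

```python
DEFAULT_SPEED_BPS = 10 * 1000 * 1000 * 1000  # 10 Gbps, the new default


def effective_max_rate(max_rate=None, reported_speed=None):
    """Pick the rate limit in bits per second.

    Preference order: the user-configured 'max-rate', then the speed
    reported by the NIC, then the default.  None or 0 means "not known".
    """
    if max_rate:
        return max_rate
    if reported_speed:
        return reported_speed
    return DEFAULT_SPEED_BPS
```

For example, a card that reports no speed and has no 'max-rate' set is
treated as a 10 Gbps link, while a card reporting 1 Gbps keeps 1 Gbps.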

Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-11-30 14:42:59 +01:00
David Marchand
d240f72ad2 netdev-dpdk: Cleanup mempool selection code.
Propagating per_port_memory value through a DPDK netdev creation gives
the false impression its value is somehow contextual to the creation.

On the contrary, this parameter value is set once and for all at
OVS initialization time.

Simplify the code and directly access the local boolean.

Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-11-30 13:58:34 +01:00
David Marchand
126e6046eb netdev-dpdk: Move DPDK netdev related configuration.
vhost related configuration and per port memory are netdev-dpdk
configuration items.

dpdk-stub.c and netdev-dpdk.c are never linked together, so we can move
those bits out of the generic dpdk code.

The dpdk_* accessors for those configuration items are then not needed
anymore and we can simply reference local variables.

Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-11-30 13:58:28 +01:00