2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-31 14:25:26 +00:00
Commit Graph

19672 Commits

Author SHA1 Message Date
Liang Mancang
b0d9a1efcc conntrack: Fix conntrack_clean may access the same exp_list each time.
when a exp_list contains more than the clean_end's number of nodes,
and these nodes will not expire immediately. Then, every times we
call conntrack_clean, it use the same next_sweep to get exp_list.

Actually, we should add i every times after we call ct_sweep.

Fixes: 3d9c1b855a ("conntrack: Replace timeout based expiration lists with rculists.")
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Liang Mancang <liangmc1@chinatelecom.cn>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-21 21:02:20 +01:00
Mike Pattrick
71e5669af6 ovs-actions: Correct typo in ovs-actions man page.
There was a minor typo in the ovs-actions man page.

Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-21 21:02:20 +01:00
Adrian Moreno
2b1c706565 ofproto-ipfix: Use per-domain template timeouts.
IPFIX templates have to be sent for each Observation Domain ID.
Currently, a timer is kept at each dpif_ipfix_exporter to send them.
This works fine for per-bridge sampling where there is only one
Observation Domain ID per exporter. However, this is does not work for
per-flow sampling where more than one Observation Domain IDs can be
specified by the controller. In this case, ovs-vswitchd will only send
template information for one (arbitrary) DomainID.

Fix per-flow sampling by using an hmap to keep a timer for each
Observation Domain ID.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-21 14:42:37 +01:00
Russell Bryant
564dc0f2cc MAINTAINERS: Move myself to emeritus status.
I have not been active in OVS development in long enough that I should
move to emeritus status.

Signed-off-by: Russell Bryant <russell@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-21 14:26:19 +01:00
Ilya Maximets
481e3fa690 AUTHORS: Add Miika Petäjäniemi.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-21 14:26:19 +01:00
Miika Petäjäniemi
a6195e2c42 netdev-linux: Add jitter parameter to the netem qos options.
Adds jitter option to enable emulating latency fluctuation with netem.

Submitted-at: https://github.com/openvswitch/ovs/pull/407
Signed-off-by: Miika Petäjäniemi <miika.petajaniemi@solita.fi>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-21 14:25:57 +01:00
Ilya Maximets
f3c6cb907c AUTHORS: Add Vladislav Odintsov.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-20 20:01:14 +01:00
Vladislav Odintsov
4f27d5a024 utilities: Add support to set umask in ovs-ctl.
This patch adds new ovs-ctl options to pass umask configuration to allow
OVS daemons set requested socket permissions on group.  Previous
behaviour (if using with systemd service unit) created sockets with 0750
permissions mask (group has no write permission).

Write permission for group is reasonable in usecase, where ovs-vswitchd
or ovsdb-server runs as a non-privileged user:group (say,
openvswitch:openvswitch) and it is needed to access unix socket from
process running as another non-privileged user.  In this case
administrator has to add that user to openvswitch group and can connect
to OVS sockets from a process running under that user.

Two new ovs-ctl options --ovsdb-server-umask and --ovs-vswitchd-umask
were added to manage umask values for appropriate daemons.  This is
useful for systemd users: both ovs-vswitchd and ovsdb-server systemd
units read options from single /etc/sysconfig/openvswitch configuration
file.  So, with separate options it is possible to set umask only for
specific daemon.

OPTIONS="--ovsdb-server-umask=0002"

in /etc/openvswitch/sysconfig file will set umask to 0002 value before
starting only ovsdb-server, while

OPTIONS="--ovs-vswitchd-umask=0002"

will set umask to ovs-vswitchd daemon.

Previous behaviour (not setting umask) is left as default.

Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2023-January/401501.html
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-20 19:58:35 +01:00
Ilya Maximets
1f47d73996 Set release date for 3.1.0.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-16 17:25:05 +01:00
Eelco Chaudron
cd1cf6a24b test: Remove duplicate test from system-offloads-traffic.at.
Remove the "offloads - simulated flow action update" test case, as
it's covered by the "datapath - simulated flow action update" test.

Fixes: b1f58f5072 ("netdev-offload-tc: Preserve tc statistics when flow gets modified.")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-15 20:30:21 +01:00
Eelco Chaudron
6c24851f43 ofproto-dpif-upcall: Use last known stats ukey stats on revalidate missed dp flows.
Instead of using all zero stats when executing a revalidate for missed
dp flows, use the last known stats to avoid odd statistics being used.

As these zero stats are stored in the ukey, the next time revalidate_ukey()
is called the delta between the new stats and the zero stats is used, which
would cause an additional increase in total packets/bytes.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Michael Santana <msantana@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-15 19:43:17 +01:00
Ales Musil
0a7587034d conntrack: Properly unNAT inner header of related traffic.
The inner header was not handled properly.
Simplify the code which allows proper handling
of the inner headers.

Reported-at: https://bugzilla.redhat.com/2137754
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ales Musil <amusil@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-13 19:17:18 +01:00
David Marchand
fc3d5e1dad sparse: Fix build with DPDK and GCC 12.
rte_vect.h pulls some AVX512 instrinsics headers added in GCC 12 [1]
trigger a lot of warnings:

libtool: compile:  env "REAL_CC=ccache gcc" "CHECK=sparse -Wsparse-error
	-I ../include/sparse -I ../include -m64 -I /usr/local/include
	" cgcc -target=x86_64 -target=host_os_specs -D__MMX__=1
	-D__MMX_WITH_SSE__=1 -D__SSE2_MATH__=1 -D__SSE_MATH__=1
	-D__SSE__=1 -D__SSE2__=1 -DHAVE_CONFIG_H -I. -I.. -I ../include
	-I ./include -I ../lib -I ./lib -Wstrict-prototypes -Wall
	-Wextra -Wno-sign-compare -Wpointer-arith -Wformat
	-Wformat-security -Wswitch-enum -Wunused-parameter
	-Wbad-function-cast -Wcast-align -Wstrict-prototypes
	-Wold-style-definition -Wmissing-prototypes
	-Wmissing-field-initializers -fno-strict-aliasing
	-Wswitch-bool -Wlogical-not-parentheses -Wsizeof-array-argument
	-Wbool-compare -Wshift-negative-value -Wduplicated-cond -Wshadow
	-Wmultistatement-macros -Wcast-align=strict -mssse3
	-I/home/dmarchan/git/pub/dpdk.org/22.11/install/include
	-include rte_config.h -I/usr/local/include -Werror
	-D_FILE_OFFSET_BITS=64 -g -O2 -MT lib/bfd.lo -MD -MP
	-MF lib/.deps/bfd.Tpo -c ../lib/bfd.c -o lib/bfd.o
../lib/bfd.c: note: in included file (through
	/usr/lib/gcc/x86_64-redhat-linux/12//include/immintrin.h,
	/usr/lib/gcc/x86_64-redhat-linux/12//include/x86intrin.h, ...):
/usr/lib/gcc/x86_64-redhat-linux/12//include/avx512fp16intrin.h:38:9:
	error: '_Float16' has implicit type
/usr/lib/gcc/x86_64-redhat-linux/12//include/avx512fp16intrin.h:38:18:
	error: Expected ; at end of declaration
/usr/lib/gcc/x86_64-redhat-linux/12//include/avx512fp16intrin.h:38:18:
	error: got __v8hf
/usr/lib/gcc/x86_64-redhat-linux/12//include/avx512fp16intrin.h:62:41:
	error: Expected ; at end of statement
/usr/lib/gcc/x86_64-redhat-linux/12//include/avx512fp16intrin.h:62:41:
	error: got {
/usr/lib/gcc/x86_64-redhat-linux/12//include/avx512fp16intrin.h:420:32:
	error: Expected ) in expression
/usr/lib/gcc/x86_64-redhat-linux/12//include/avx512fp16intrin.h:420:32:
	error: got __A
/usr/lib/gcc/x86_64-redhat-linux/12//include/avx512fp16intrin.h:2271:61:
	error: Expected ) in function call
/usr/lib/gcc/x86_64-redhat-linux/12//include/avx512fp16intrin.h:2271:61:
	error: got __A
/usr/lib/gcc/x86_64-redhat-linux/12//include/avx512fp16intrin.h:2279:61:
	error: Expected ) in function call
/usr/lib/gcc/x86_64-redhat-linux/12//include/avx512fp16intrin.h:2279:61:
	error: got __A
/usr/lib/gcc/x86_64-redhat-linux/12//include/avx512fp16intrin.h:2328:50:
	error: Expected ) in function call
[...]

Besides, the list of headers by rte_memcpy.h is now out of sync with DPDK.
OVS takes care to include the right headers in its sources.
Simply make this header self-sufficient.

1: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a68412117fa4

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-10 20:09:29 +01:00
Frode Nordahl
5f219af8b3 ovsdb-server: Fix handling of DNS name for listener configuration.
Commit 08e9e53373 fixed proper initialization of the dns-resolve
module, and made DNS resolution asynchronous.

A side effect of that change revealed a long standing logic bug
which broke ovsdb-server listener configuration using DNS names.

Previously this worked because the DNS resolution would block,
now that DNS resolution is asynchronous the code before this
change would assume the error from jsonrpc_pstream_open meant
the remote was a specification for an active outgoing
connection, even when that was not the case.

To fix this a couple of changes was made to socket-util:
1) Pass optional result of dns resolution from inet_parse_passive.

When (re-)configuring listeners that use DNS names, we may need
to know whether the provided connection string is invalid or if
the provided DNS name has finished resolving.

2) Check dns resolution status in inet_open_passive.

If the connection string is valid, and contains a DNS name,
inet_open_passive will now return -EAGAIN if dns resolution
failed.  DNS resolution failure may either mean the asynchronous
resolver has not completed yet, or that the name does not resolve.

Reported-at: https://bugs.launchpad.net/bugs/1998781
Fixes: 08e9e53373 ("ovsdb: raft: Fix inability to read the database with DNS host names.")
Fixes: 771680d96f ("DNS: Add basic support for asynchronous DNS resolving")
Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-10 20:09:29 +01:00
Eelco Chaudron
7bb0c33d78 tests: Comment currently failing TC system-traffic tests.
I commented the three remaining failures when running tc with the
system-traffic tests. In addition I ran the following test to verify
we did not see any failures with recheck enabled:

  for i in {1..50}; do make check-offloads || \
    make check-offloads TESTSUITEFLAGS="--recheck" || break; \
    echo "ALL_50_OK: $i"; done;

Unfortunately, a bunch of test cases showed occasional failures.
For now, they are excluded from the test cases and need further
investigation. They are:

  datapath - truncate and output to gre tunnel
  datapath - truncate and output to gre tunnel by simulated packets

These tests where executed on a Fedora37 machine with the kernel
6.1.5-200.fc37.x86_64 installed.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-09 12:12:11 +01:00
Eelco Chaudron
594d1fee5b tests: Fix reading of OpenFlow byte counters in GRE test cases.
With some datapaths, read TC, it takes a bit longer to update the
OpenFlow statistics. Rather than adding an additional delay, try
to read the counters multiple times until we get the desired value.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-09 12:11:51 +01:00
Eelco Chaudron
a08a589ce4 netdev-offload-tc: If the flow has not been used, report it as such.
If a tc flow was installed but has not yet been used, report it as such.

In addition, add a delay to the "IGMP - flood under normal action" test
case to make it work with many repetitions. This delay is also present
in other ICMP/IGMP tests.

Fixes: f98e418fbd ("tc: Add tc flower functions")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-09 00:32:55 +01:00
Eelco Chaudron
d57299fc6c odp-util: Make odp_flow_key_from_flow__ nlattr order the same as the kernel.
Make the order of the Netlink attributes for odp_flow_key_from_flow__()
the same as the kernel will return them.

This will make sure the attributes displayed in the dpctl/dump-flows
output appear in the same order for all datapath.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-09 00:32:55 +01:00
Eelco Chaudron
25b6f5585b test: Fix 'conntrack - Multiple ICMP traverse' for tc case.
tc does not include ethernet header length in packet byte count.
This fix will allow the packets that go trough tc to be 14 bytes less.

This difference in the TC implementation is already described in
tc-offload.rst.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-09 00:32:55 +01:00
Eelco Chaudron
3655ddb4f5 test: Tc does not support conntrack timeout, skip the related test.
The tc conntrack implementation does not support the timeout option.
The current implementation is silently ignoring the timeout option
by adding a general conntrack entry.

This patch will skip the related test by overriding the support macro.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-09 00:32:55 +01:00
Eelco Chaudron
b292cce2ff netdev-offload-tc: Conntrack ALGs are not supported with tc.
tc does not support conntrack ALGs. Even worse, with tc enabled, they
should not be used/configured at all. This is because even though TC
will ignore the rules with ALG configured, i.e., they will flow through
the kernel module, return traffic might flow through a tc conntrack
rule, and it will not invoke the ALG helper.

Fixes: 576126a931 ("netdev-offload-tc: Add conntrack support")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-09 00:32:55 +01:00
Eelco Chaudron
7a176f9636 test: Flush datapath when changing rules on the fly.
Flush datapath flows as TC flows take some more time to be flushed out.
The flush speeds this up.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-09 00:25:42 +01:00
Eelco Chaudron
564d09ef53 netdev-offload-tc: Fix tc conntrack force commit support.
tc was not setting the OVS_CT_ATTR_FORCE_COMMIT flag when a forced
commit was requested. This patch will fix this.

Fixes: 576126a931 ("netdev-offload-tc: Add conntrack support")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-09 00:25:36 +01:00
Eelco Chaudron
3209287103 test: Do not use MPLS implicit null label in test cases.
TC flower does not allow the push of the implicit null labels (RFC3032).
Avoid the use of such labels in the MPLS test cases.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-08 22:14:30 +01:00
Eelco Chaudron
f68e757ef1 tests: Include working system-traffic tests into the system-offloads-testsuite.
Include and run the system-traffic.at tests as part of the system offload
testsuite. Exclude all the tests that will not run without any special
modifications.

Lowered log level for "recirc_id sharing not supported" message, so tests
will not fail with older kernels. This is not an error level message, but
should be debug, like all other, EOPNOTSUPP, related log messages.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-08 22:14:22 +01:00
Peng He
5dfc8309d9 ofproto-dpif-upcall: New ukey needs to take the old ukey's dump seq.
The userspace datapath manages all the magaflows by a cmap. The cmap
data structure will grow/shrink during the datapath processing and it
will re-position megaflows. This might result in two revalidator threads
might process a same megaflow during one dump stage.

Consider a situation that, revalidator 1 processes a megaflow A, and
decides to delete it from the datapath, at the mean time, this megaflow
A is also queued in the process batch of revalidator 2. Normally it's ok
for revalidators to process the same megaflow multiple times, as the
dump_seq shows it's already dumped and the stats will not be contributed
twice.

Assume that right after A is deleted, a PMD thread generates again
a new megaflow B which has the same match and action of A. The ukey
of megaflow B will replace the one of megaflow A. Now the ukey B is
new to the revalidator system and its dump seq is 0.

Now since the dump seq of ukey B is 0, when processing megaflow A,
the revalidator 2 will not identify this megaflow A has already been
dumped by revalidator 1 and will contribute the old megaflow A's stats
again, this results in an inconsistent stats between ukeys and megaflows.

To fix this, the newly generated the ukey B should take the dump_seq
of the replaced ukey A to avoid a same megaflow being revalidated
twice in one dump stage.

We observe in the production environment, the OpenFlow rules' stats
sometimes are amplified compared to the actual value.

Signed-off-by: Peng He <hepeng.0320@bytedance.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-07 14:21:38 +01:00
Mike Pattrick
e85e8a7541 hash: Avoid 64bit crc intrinsics on 32bit aligned data.
UB Sanitizer report:

lib/hash.h:219:17: runtime error: load of misaligned address
0x7ffc164a88b4 for type 'const uint64_t', which requires 8 byte
alignment

    #0 in hash_words_inline lib/hash.h:219
    #1 in hash_words lib/hash.h:297
    [...]

Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-03 22:18:16 +01:00
Mike Pattrick
4339e7b19f dp-packet: Allocate on cacheline boundary with DPDK.
UB Sanitizer report:
lib/dp-packet.h:587:22: runtime error: member access within misaligned
address 0x000001ecde10 for type 'struct dp_packet', which requires 64
byte alignment

    #0 in dp_packet_set_base lib/dp-packet.h:587
    #1 in dp_packet_use__ lib/dp-packet.c:46
    #2 in dp_packet_use lib/dp-packet.c:60
    #3 in dp_packet_init lib/dp-packet.c:126
    #4 in dp_packet_new lib/dp-packet.c:150
    [...]

Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-03 22:18:16 +01:00
Mike Pattrick
531c17023c netdev-dummy: Allocate dummy_packet_stream on cacheline boundary.
UB Sanitizer report:

lib/netdev-dummy.c:197:15: runtime error: member access within
misaligned address 0x00000217a7f0 for type 'struct
dummy_packet_stream', which requires 64 byte alignment
              ^
    #0 dummy_packet_stream_init lib/netdev-dummy.c:197
    #1 dummy_packet_stream_create lib/netdev-dummy.c:208
    #2 dummy_packet_conn_set_config lib/netdev-dummy.c:436
    [...]

Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-03 22:18:15 +01:00
Eelco Chaudron
b1f58f5072 netdev-offload-tc: Preserve tc statistics when flow gets modified.
When a flow gets modified, i.e. the actions are changes, the tc layer will
remove, and re-add the flow. This is causing all the counters to be reset.

This patch will remember the previous tc counters and adjust any requests
for statistics. This is done in a similar way as the rte_flow implementation.

It also updates the check_pkt_len tc test to purge the flows, so we do
not use existing updated tc flow counters, but start with fresh installed
set of datapath flows.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-03 16:39:59 +01:00
Ilya Maximets
d6501c6605 sparse: Fix numa.h for libnuma >= 2.0.13.
Current numa.h header for sparse re-defines functions in a way
that breaks the header from libnuma 2.0.13+, because the original
issue was fixed in that version:
  25dcde021d

Sparse errors as a result:

  lib/netdev-afxdp.c: note: in included file (through include/sparse/numa.h):
  /usr/include/numa.h:346:26: error: macro "numa_get_interleave_mask_compat"
                                     passed 1 arguments, but takes just 0
  /usr/include/numa.h:376:26: error: macro "numa_get_membind_compat"
                                     passed 1 arguments, but takes just 0
  /usr/include/numa.h:406:26: error: macro "numa_get_run_node_mask_compat"
                                     passed 1 arguments, but takes just 0
  /usr/include/numa.h:347:1: error: Expected ; at end of declaration
  /usr/include/numa.h:347:1: error: got {
  /usr/include/numa.h:351:9: error: 'tp' has implicit type

It's hard to adjust defines to work with both versions of a header.
Just defining all the functions we actually use in OVS instead and
not including the original header.

Fixes: e8568993e0 ("netdev-afxdp: NUMA-aware memory allocation for XSK related memory.")
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-03 15:58:29 +01:00
Ilya Maximets
4fd2d46c01 AUTHORS: Add wangchuanlei.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-31 17:40:50 +01:00
wangchuanlei
e22e1f6725 dpctl: Add support to count upcall packets.
Add support to count upcall packets per port, both succeed and failed,
which is a better way to see how many packets upcalled on each interface.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: wangchuanlei <wangchuanlei@inspur.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-31 17:40:50 +01:00
Eelco Chaudron
e1e5eac5b0 tc: Add TCA_KIND flower to delete and get operation to avoid rtnl_lock().
A long long time ago, an effort was made to make tc flower
rtnl_lock() free. However, on the OVS part we forgot to add
the TCA_KIND "flower" attribute, which tell the kernel to skip
the lock. This patch corrects this by adding the attribute for
the delete and get operations.

The kernel code calls tcf_proto_is_unlocked() to determine the
rtnl_lock() is needed for the specific tc protocol. It does this
in the tc_new_tfilter(), tc_del_tfilter(), and in tc_get_tfilter().

If the name is not set, tcf_proto_is_unlocked() will always return
false. If set, the specific protocol is queried for unlocked support.

Fixes: f98e418fbd ("tc: Add tc flower functions")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-30 21:12:31 +01:00
Simon Horman
3f85b11d50 system-offloads-traffic: Skip tests if nc is not present.
The following tests use the nc command and should be skipped if
nc is not present.

- "offloads - check interface meter offloading -  offloads disabled"
- "offloads - check interface meter offloading -  offloads enabled"

Fixes: 5660b89a30 ("dpif-netlink: Offloading meter to tc police action")
Reported-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2023-01-30 14:49:16 +01:00
Simon Horman
6e5661d17d system-traffic: Remove unnecessary dependency on nc.
The conntrack - ICMP related to original direction" test does not
use nc and therefore does not need to be skipped if nc is not present.

Fixes: d0e4206230 ("tests: ICMP related to original direction test.")
Reported-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2023-01-30 14:49:16 +01:00
Ilya Maximets
9117f4d54f netdev-offload-tc: Fix misaligned access to ct label.
UndefinedBehaviorSanitizer:

  lib/netdev-offload-tc.c:1356:50: runtime error:
   member access within misaligned address 0x60700001a89c for type
   'const struct (unnamed struct at lib/netdev-offload-tc.c:1350:27)',
   which requires 8 byte alignment 0x60700001a89c: note: pointer points here
   24 00 04 00 01 00 00 05  00 00 0d 00 0a 00 00 00  00 00 00 00 ...
               ^
   0 0xd5d183 in parse_put_flow_ct_action lib/netdev-offload-tc.c:1356:50
   1 0xd5783f in netdev_tc_parse_nl_actions lib/netdev-offload-tc.c:2015:19
   2 0xd4027c in netdev_tc_flow_put lib/netdev-offload-tc.c:2355:11
   3 0x9666d7 in netdev_flow_put lib/netdev-offload.c:318:14
   4 0xcd4c0a in parse_flow_put lib/dpif-netlink.c:2297:11
   5 0xcd4c0a in try_send_to_netdev lib/dpif-netlink.c:2384:15
   6 0xcd4c0a in dpif_netlink_operate lib/dpif-netlink.c:2455:23
   7 0x87d40e in dpif_operate lib/dpif.c:1372:13
   8 0x6d43e9 in handle_upcalls ofproto/ofproto-dpif-upcall.c:1674:5
   9 0x6d43e9 in recv_upcalls ofproto/ofproto-dpif-upcall.c:905:9
   10 0x6cf6ea in udpif_upcall_handler ofproto/ofproto-dpif-upcall.c:801:13
   11 0xb6d7ea in ovsthread_wrapper lib/ovs-thread.c:423:12
   12 0x7f5ccf017801 in start_thread
   13 0x7f5ccefb744f in __GI___clone3

Fixes: 9221c721be ("netdev-offload-tc: Add conntrack label and mark support")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-27 16:57:27 +01:00
Kevin Traynor
3beff0a6b0 dpif-netdev-perf: Add metric averages when no iterations.
pmd-perf-show with pmd-perf-metrics=true displays a histogram
with averages. However, averages were not displayed when there
is no iterations.

They will be all zero so it is not hiding useful information
but the stats look incomplete without them, especially when
they are displayed for some PMD thread cores and not others.

The histogram print is large and this is just an extra couple
of lines, so might as well print them all the time to ensure
that the user does not think there is something missing from
the display.

Before patch:
  Histograms
     cycles/it
     499       0
     716       0
     1025      0
     1469      0
  <snip>

After patch:
  Histograms
     cycles/it
     499       0
     716       0
     1025      0
     1469      0
  <snip>
  ---------------
     cycles/it
     0

Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-27 16:57:27 +01:00
Kevin Traynor
7db18054ff dpif-netdev-perf: Remove not a number stat value.
Some stats in pmd-perf-show don't check for divide by zero
which results in not a number (-nan).

This is a normal case for some of the stats when there are
no Rx queues assigned to the PMD thread core.

It is not obvious what -nan is to a user so add a check for
divide by zero and set stat to 0 if present.

Before patch:
pmd thread numa_id 1 core_id 9:

  Iterations:                    0  (-nan us/it)
  - Used TSC cycles:             0  (  0.0 % of total cycles)
  - idle iterations:             0  ( -nan % of used cycles)
  - busy iterations:             0  ( -nan % of used cycles)

After patch:
pmd thread numa_id 1 core_id 9:

  Iterations:                    0  (0.00 us/it)
  - Used TSC cycles:             0  (  0.0 % of total cycles)
  - idle iterations:             0  (  0.0 % of used cycles)
  - busy iterations:             0  (  0.0 % of used cycles)

Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-27 16:57:27 +01:00
Ilya Maximets
4f0a728a59 system-traffic.at: Skip the 'ICMP6 Related' test if nc is missing.
Test fails is 'nc' is not available, it should be skipped instead.

Fixes: b020a416e2 ("System Tests: Enhance NAT tests.")
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-27 16:57:27 +01:00
Eelco Chaudron
6ad35dd80e utilities: Add revalidator measurement script and needed USDT probes.
This patch adds a Python script that can be used to analyze the
revalidator runs by providing statistics (including some real time
graphs).

The USDT events can also be captured to a file and used for
later offline analysis.

The following blog explains the Open vSwitch revalidator
implementation and how this tool can help you understand what is
happening in your system.

https://developers.redhat.com/articles/2022/10/19/open-vswitch-revalidator-process-explained

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-27 16:57:27 +01:00
Robin Jarry
c3ed0bf34b tests/mfex: Silence Blowfish/CAST5 deprecation warnings.
On Fedora 37 (at least), MFEX unit tests are failing because of
deprecation warnings:
$ python3 tests/mfex_fuzzy.py test_traffic.pcap 2000
/usr/lib/python3.11/site-packages/scapy/layers/ipsec.py:471:
   CryptographyDeprecationWarning: Blowfish has been deprecated
  cipher=algorithms.Blowfish,
/usr/lib/python3.11/site-packages/scapy/layers/ipsec.py:485:
   CryptographyDeprecationWarning: CAST5 has been deprecated
  cipher=algorithms.CAST5,

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-27 16:13:07 +01:00
Han Zhou
e5b3cb9995 revalidator: Allow min-revalidator-pps to be 0.
Today the minimum value for this setting is 1. This patch allows it to
be 0, meaning not checking pps at all, and always do revalidation.

This is particularly useful for environments where some of the
applications with long-lived connections may have very low traffic for
certain period but have high rate of burst periodically. It is desirable
to keep the datapath flows instead of periodically deleting them to
avoid burst of packet miss to userspace.

When setting to 0, there may be more datapath flows to be revalidated,
resulting in higher CPU cost of revalidator threads. This is the
downside but in certain cases this is still more desirable than packet
misses to user space.

Signed-off-by: Han Zhou <hzhou@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-27 16:09:10 +01:00
Ilya Maximets
ebaee44624 netdev-dpdk: Free mbufs in bulk.
rte_pktmbuf_free_bulk() function was introduced in 19.11 and became
stable in 21.11.  Use it to free arrays of mbufs instead of freeing
packets one by one.

In simple V2V testing with 64B packets, 2 PMD threads and bidirectional
traffic this change improves performance by 3.5 - 4.5 %.

Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-27 16:07:39 +01:00
Ilya Maximets
b7f540129b ovsdb: Don't convert unchanged columns during database conversion.
Column conversion involves converting it to json and back.  These are
heavy operations and completely unnecessary if the column type didn't
change.  Most of the time schema changes only add new columns/tables
without changing existing ones at all.  Clone the column instead to
save some time.

This will also save time while destroying the original database since
we will only need to reduce reference counters on unchanged datum
objects that were cloned instead of actually freeing them.

Additionally, moving the column lookup into a separate loop, so we
don't perform an shash lookup for each column of each row.

Testing with 440 MB OVN_Southbound database shows 70% speed up of the
ovsdb_convert() function.  Execution time reduced from 15 to 4.4
seconds, 3.5 of which is a post-conversion transaction replay.  Overall
time required for the online database conversion reduced from 37 to 25
seconds.

Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-27 15:53:33 +01:00
Ilya Maximets
e0e4266a90 ovsdb-types: Add functions to compare types for equality.
Will be used in the next commit to optimize database conversion.

Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-27 15:53:06 +01:00
Kevin Traynor
948767a18d dpif-netdev: Set PMD load based sleep start/inc to 1 us.
Now that the timer slack for the PMD threads is reduced we can also
reduce the start/increment for PMD load based sleeping to match it.

This will further reduce initial sleep times making it more resilient
to interfaces that might be sensitive to large sleep times.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-23 17:23:28 +01:00
David Marchand
f62629a558 dpif-netdev: Set timer slack for PMD threads.
The default Linux timer slack groups timer expires into 50 uS intervals.

With some traffic patterns this can mean that returning to process
packets after a sleep takes too long and packets are dropped.

Add a helper to util.c and set use it to reduce the timer slack
for PMD threads, so that sleeps with smaller resolutions can be done
to prevent sleeping for too long.

Fixes: de3bbdc479 ("dpif-netdev: Add PMD load based sleeping.")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2023-January/401121.html
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Co-authored-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-23 17:23:20 +01:00
David Marchand
e24b68fa70 netdev-dpdk: Fix deadlock due to virtqueue stats retrieval.
As Ilya reported, we have a ABBA deadlock between DPDK vq->access_lock
and OVS dev->mutex when OVS main thread refreshes statistics, while a
vring state change event is being processed for a same vhost port.

To break from this situation, move vring state change notifications
handling from the vhost-events DPDK thread to a dedicated thread
using a lockless queue.

Besides, for the case when a bogus/malicious guest is sending continuous
updates, add a counter of pending updates in the queue and warn if a
threshold of 1000 entries is reached.

Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2023-January/401101.html
Fixes: 3b29286db1 ("netdev-dpdk: Add per virtqueue statistics.")
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-19 20:32:04 +01:00
Ilya Maximets
7402dae8f4 ovsdb: Fix database statistics during the database replacement.
The counter for the number of atoms has to be re-set to the number from
the new database, otherwise the value will be incorrect.  For example,
this is causing the atom counter doubling after online conversion of
a clustered database.

Miscounting may also lead to increased memory consumption by the
transaction history or otherwise too aggressive transaction history
sweep.

Fixes: 317b1bfd7d ("ovsdb: Don't let transaction history grow larger than the database.")
Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-18 13:54:42 +01:00