mirror of https://github.com/openvswitch/ovs synced 2025-08-30 22:05:19 +00:00
Commit Graph

19672 Commits

Author SHA1 Message Date
Robin Jarry
07f6d6a0cb Add editorconfig file.
EditorConfig is a file format and collection of text editor plugins for
maintaining consistent coding styles between different editors and IDEs.

Initialize the file following the coding rules in
Documentation/internals/contributing/coding-style.rst and add exceptions
declared in build-aux/initial-tab-allowed-files. Only enforce rules for
*.c and *.h files. Other files should use the default indenting rules
from text editors.

In order for this file to be taken into account (unless they use an
editor with built-in EditorConfig support), developers will have to
install a plugin.

Notes:

* All matching rules are considered. The last matching rule's properties
  will override the previous ones.
* The max_line_length property is only supported by a limited number of
  EditorConfig plugins. It will be ignored if unsupported.

Link: https://editorconfig.org/
Link: https://github.com/editorconfig/editorconfig-emacs
Link: https://github.com/editorconfig/editorconfig-vim
Link: https://github.com/editorconfig/editorconfig/wiki/EditorConfig-Properties#max_line_length
Signed-off-by: Robin Jarry <rjarry@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-06-20 15:28:05 +02:00
Mike Pattrick
3337e6d91c userspace: Enable L4 checksum offloading by default.
The netdev receiving packets is supposed to provide the flags
indicating if the L4 checksum was verified and it is OK or BAD,
otherwise the stack will check when appropriate by software.

If the packet comes with good checksum, then postpone the
checksum calculation to the egress device if needed.

When encapsulating a packet with that flag, set the checksum
of the inner L4 header since that is not yet supported.

Calculate the L4 checksum when the packet is going to be sent
over a device that doesn't support the feature.

Linux tap devices allow enabling L3 and L4 offload, so this
patch enables the feature. However, Linux socket interface
remains disabled because the API doesn't allow enabling
those two features without enabling TSO too.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-06-15 23:50:30 +02:00
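
The software fallback mentioned in the L4 checksum commit above can be illustrated with a small Python sketch (OVS itself is written in C). It computes the RFC 1071 Internet checksum, but only when the egress device cannot do so itself. The names `internet_checksum`, `resolve_l4_checksum`, and `egress_offloads` are hypothetical and not OVS APIs, and the pseudo-header that real TCP/UDP checksums also cover is left out for brevity.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement checksum of data (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def resolve_l4_checksum(payload: bytes, egress_offloads: bool):
    """Hypothetical egress decision: return None when the egress device
    will fill in the checksum, otherwise compute it in software."""
    if egress_offloads:
        return None
    return internet_checksum(payload)
```

A receiver can verify a packet the same way: running `internet_checksum` over the data with the checksum appended yields 0 when nothing was corrupted.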
Mike Pattrick
5d11c47d3e userspace: Enable IP checksum offloading by default.
The netdev receiving packets is supposed to provide the flags
indicating if the IP checksum was verified and it is GOOD or BAD,
otherwise the stack will check when appropriate by software.

If the packet comes with good checksum, then postpone the
checksum calculation to the egress device if needed.

When encapsulating a packet with that flag, set the checksum
of the inner IP header since that is not yet supported.

Calculate the IP checksum when the packet is going to be sent over
a device that doesn't support the feature.

Linux devices don't support IP checksum offload alone, so the
support is not enabled.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-06-15 23:49:51 +02:00
Mike Pattrick
4433cc6860 dpif-netdev: Show netdev offloading flags.
This patch modifies netdev_get_status to include information about
checksum offload status by port, allowing the user to gain insight into
where checksum offloading is active.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-06-15 15:44:57 +02:00
Mike Pattrick
22df63c384 Documentation: Document netdev offload.
Document the implementation of netdev hardware offloading
in userspace datapath.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-06-15 15:44:53 +02:00
Eelco Chaudron
e3ba0be48c seq: Make read of the current value atomic.
Make the read of the current seq->value atomic, i.e., not needing to
acquire the global mutex when reading it. On 64-bit systems, this
incurs no overhead, and it will avoid the mutex and potentially
a system call.

For incrementing the value followed by waking up the threads, we are
still taking the mutex, so the current behavior is not changing. The
seq_read() behavior is already defined as, "Returns seq's current
sequence number (which could change immediately)". So the change
should not impact the current behavior.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-06-12 18:48:26 +02:00
Ilya Maximets
04f854f938 fatal-signal: Don't share signal fds/handles with forked process.
The signal_fds pipe and wevent are a mechanism to wake up the process
after it received a signal and stored the number for the future
processing.  They are not intended for inter-process communication.
However, in the current code, descriptors are not closed on fork().

The main scenario where we use fork() is a monitor process.  Monitor
doesn't actually use poll loops and doesn't wait on the descriptor.
But when a child process is killed, it (child) sends a byte to itself,
then it wakes up due to POLLIN on the pipe and terminates itself after
processing all the callbacks.  The byte stays unread.  And the pipe is
still open in the monitor process.  When child dies, the monitor wakes
up and forks again.  New child inherits the same pipe that still
contains unread data.  This data is never read, so the child will
constantly wake itself up for no reason.

Interestingly enough raise(SIGSEGV) doesn't immediately kill the
process.  The execution continues until the end of a signal handler,
so we're still able to write a byte to a pipe even in this case.
Presumably because we don't have SA_NODEFER.

Fix the issue by re-creating the pipe/event on fork.  This way
every new child will have its own notification channel and will
not wake up any other processes.

There was already an attempt to fix the issue, but it didn't get a
follow up (see the reported-at tag).  This is an alternative solution.

Fixes: ff8decf1a3 ("daemon: Add support for process monitoring and restart.")
Reported-at: https://patchwork.ozlabs.org/project/openvswitch/patch/20221019093147.2072-1-lifengqi@inspur.com/
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-06-09 14:15:11 +02:00
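
The fix in the fatal-signal commit above, re-creating the wake-up pipe after fork, can be sketched in Python on a POSIX system. This is only a model of the idea: `WakeupPipe` and its methods are hypothetical names, while `os.pipe`, `os.set_blocking`, and `os.register_at_fork` are standard-library calls.

```python
import os


class WakeupPipe:
    """Self-pipe used only to wake up the owning process."""

    def __init__(self):
        self._open()

    def _open(self):
        self.rfd, self.wfd = os.pipe()
        os.set_blocking(self.rfd, False)

    def notify(self):
        """Queue one byte, as a signal handler would."""
        os.write(self.wfd, b"\x00")

    def pending(self) -> bool:
        """Drain the pipe; return True if any byte was waiting."""
        try:
            return bool(os.read(self.rfd, 64))
        except BlockingIOError:
            return False

    def recreate(self):
        """Drop the inherited descriptors and open a fresh, empty pipe."""
        os.close(self.rfd)
        os.close(self.wfd)
        self._open()


wakeup = WakeupPipe()
# In a forked child, replace the pipe inherited from the parent so that
# unread bytes left by a previous child cannot wake the new one.
if hasattr(os, "register_at_fork"):
    os.register_at_fork(after_in_child=wakeup.recreate)
```

Because the child gets its own pipe, a byte that a dying sibling left unread stays in the old pipe and never reaches the new process.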
Ilya Maximets
469e98e16d ovsdb: monitor: Destroy initial change set when new columns added.
Initial change set is preserved for as long as the monitor itself.
However, if a new client has a condition on a column that is not
one of the monitored columns, this column will be added to the
monitor via ovsdb_monitor_condition_bind().  This new column, however,
doesn't exist in the initial change set.  That will cause ovsdb-server
to malfunction or crash trying to access non-existent column during
condition evaluation:

 ERROR: AddressSanitizer: heap-buffer-overflow
 READ of size 4 at 0x606000006780 thread T0
     0 ovsdb_clause_evaluate ovsdb/condition.c:328:26
     1 ovsdb_condition_match_any_clause ovsdb/condition.c:441:13
     2 ovsdb_condition_empty_or_match_any ovsdb/condition.h:84:13
     3 ovsdb_monitor_row_update_type_condition ovsdb/monitor.c:892:28
     4 ovsdb_monitor_compose_row_update2 ovsdb/monitor.c:1058:12
     5 ovsdb_monitor_compose_update ovsdb/monitor.c:1172:24
     6 ovsdb_monitor_get_update ovsdb/monitor.c:1276:24
     7 ovsdb_jsonrpc_monitor_create ovsdb/jsonrpc-server.c:1505:12
     8 ovsdb_jsonrpc_session_got_request ovsdb/jsonrpc-server.c:1030:21
     9 ovsdb_jsonrpc_session_run ovsdb/jsonrpc-server.c:572:17
    10 ovsdb_jsonrpc_session_run_all ovsdb/jsonrpc-server.c:602:21
    11 ovsdb_jsonrpc_server_run ovsdb/jsonrpc-server.c:417:9
    12 main_loop ovsdb/ovsdb-server.c:222:9
    13 main ovsdb/ovsdb-server.c:500:5
    14 __libc_start_call_main
    15 __libc_start_main@GLIBC_2.2.5
    16 _start (ovsdb/ovsdb-server+0x473034)

 Located 0 bytes after 64-byte region [0x606000006740,0x606000006780)
 allocated by thread T0 here:
     0 malloc (ovsdb/ovsdb-server+0x50dc82)
     1 xmalloc__ lib/util.c:140:15
     2 xmalloc lib/util.c:175:12
     3 clone_monitor_row_data ovsdb/monitor.c:336:12
     4 ovsdb_monitor_changes_update ovsdb/monitor.c:1384:23
     5 ovsdb_monitor_get_initial ovsdb/monitor.c:1535:21
     6 ovsdb_jsonrpc_monitor_create ovsdb/jsonrpc-server.c:1502:9
     7 ovsdb_jsonrpc_session_got_request ovsdb/jsonrpc-server.c:1030:21
     8 ovsdb_jsonrpc_session_run ovsdb/jsonrpc-server.c:572:17
     9 ovsdb_jsonrpc_session_run_all ovsdb/jsonrpc-server.c:602:21
    10 ovsdb_jsonrpc_server_run ovsdb/jsonrpc-server.c:417:9
    11 main_loop ovsdb/ovsdb-server.c:222:9
    12 main ovsdb/ovsdb-server.c:500:5
    13 __libc_start_call_main
    14 __libc_start_main@GLIBC_2.2.5
    15 _start (ovsdb/ovsdb-server+0x473034)

Fix that by destroying the initial change set every time new columns
are added to the monitor.  This will trigger re-generation of the
change set and it will contain all the necessary columns afterwards.

Fixes: 07c27226ee ("ovsdb: Monitor: Keep and maintain the initial change set.")
Reported-by: Han Zhou <hzhou@ovn.org>
Acked-by: Han Zhou <hzhou@ovn.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-06-09 14:11:38 +02:00
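
The monitor commit above fixes a cache that goes stale when its shape changes. The toy Python model below shows the same idea; the `Monitor` class and its methods are hypothetical and far simpler than ovsdb-server's data structures. Cached rows hold exactly the monitored columns, so adding a column must discard the cache.

```python
class Monitor:
    """Toy model: rows are dicts; the cached initial change set stores one
    tuple per row with exactly the currently monitored columns."""

    def __init__(self, rows, columns):
        self.rows = rows
        self.columns = list(columns)
        self.initial = None  # cached initial change set

    def get_initial(self):
        """Return the cached change set, generating it if needed."""
        if self.initial is None:
            self.initial = [tuple(row[c] for c in self.columns)
                            for row in self.rows]
        return self.initial

    def add_column(self, column):
        if column not in self.columns:
            self.columns.append(column)
            # Cached tuples lack the new column; drop them so the next
            # get_initial() call regenerates the change set.
            self.initial = None
```

Without the reset in `add_column`, a reader indexing the new column in an old tuple would run past its end, which is the out-of-bounds read the AddressSanitizer report shows in the C code.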
Ales Musil
759a29dc2d backtrace: Extend the backtrace functionality.
Use the backtrace functions provided by libc.  This allows us to get
a backtrace that is independent of the current memory map of the
process, which in turn can be used for debugging and tracing purposes.
The backtrace is not 100% accurate due to various optimizations, most
notably "-fomit-frame-pointer" and LTO.  As a result, the reported line
in the source file might not correspond to the real line.  However, it
should be able to pinpoint at least the function where the backtrace
was called.

The implementation is determined during compilation based on available
libraries.  Libunwind has higher priority if both methods are available
to keep the compatibility with current behavior.

The backtrace is not marked as signal safe; however, the backtrace manual
page gives a more detailed explanation of why that might be the case [0].
Load the "libgcc" or equivalent in advance within the "fatal_signal_init"
which should ensure that subsequent calls to backtrace* do not call
malloc and are signal safe.

The typical backtrace will look similar to the one below:
 /lib64/libopenvswitch-3.1.so.0(backtrace_capture+0x1e) [0x7fc5db298dfe]
 /lib64/libopenvswitch-3.1.so.0(log_backtrace_at+0x57) [0x7fc5db2999e7]
 /lib64/libovsdb-3.1.so.0(ovsdb_txn_complete+0x7b) [0x7fc5db56247b]
 /lib64/libovsdb-3.1.so.0(ovsdb_txn_propose_commit_block+0x8d) [0x7fc5db563a8d]
 ovsdb-server(+0xa661) [0x562cfce2e661]
 ovsdb-server(+0x7e39) [0x562cfce2be39]
 /lib64/libc.so.6(+0x27b4a) [0x7fc5db048b4a]
 /lib64/libc.so.6(__libc_start_main+0x8b) [0x7fc5db048c0b]
 ovsdb-server(+0x8c35) [0x562cfce2cc35]

backtrace.h elaborates on how to effectively get the line information
associated with the addresses presented in the backtrace.

[0]
backtrace() and backtrace_symbols_fd() don't call malloc() explicitly,
but they are part of libgcc, which gets loaded dynamically when first
used.  Dynamic loading usually triggers a call to malloc(3).  If you
need certain calls to these two functions to not allocate memory (in
signal handlers, for example), you need to make sure libgcc is loaded
beforehand

Reported-at: https://bugzilla.redhat.com/2177760
Signed-off-by: Ales Musil <amusil@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-06-08 20:30:42 +02:00
David Marchand
474a179aff cpu: Fix cpuid check for some AMD processors.
Some venerable AMD processors do not support querying extended features
(EAX=7) with cpuid.
In this case, it is not a programmatic error and the runtime check should
simply report that the ISA is unsupported.

Reported-by: Davide Repetto <red@idp.it>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2211747
Fixes: b366fa2f49 ("dpif-netdev: Call cpuid for x86 isa availability.")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-06-07 23:08:44 +02:00
Frode Nordahl
106ef21860 tc: Fix crash on malformed reply from kernel.
The tc module combines the use of the `tc_transact` helper
function for communication with the in-kernel tc infrastructure
with assertions on the reply data by `ofpbuf_at_assert` on the
received data prior to further processing.

With the presence of bugs on the kernel side, we need to treat
the kernel as an unreliable service provider and replace assertions
on the reply from it with checks to avoid a fatal crash of OVS.

For the record, the symptom of the crash is this in the log:
  EMER|include/openvswitch/ofpbuf.h:194:
      assertion offset + size <= b->size failed in ofpbuf_at_assert()

And an excerpt of the backtrace looks like this:
  ofpbuf_at_assert (offset=16, size=20) at include/openvswitch/ofpbuf.h:194
  tc_replace_flower  at lib/tc.c:3223
  netdev_tc_flow_put at lib/netdev-offload-tc.c:2096
  netdev_flow_put    at lib/netdev-offload.c:257
  parse_flow_put     at lib/dpif-netlink.c:2297
  try_send_to_netdev at lib/dpif-netlink.c:2384

Reported-At: https://launchpad.net/bugs/2018500
Fixes: 5c039ddc64 ("netdev-linux: Add functions to manipulate tc police action")
Fixes: e7f6ba220e ("lib/tc: add ingress ratelimiting support for tc-offload")
Fixes: f98e418fbd ("tc: Add tc flower functions")
Fixes: c1c9c9c4b6 ("Implement QoS framework.")
Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-06-07 22:46:45 +02:00
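
The tc commit above replaces assertions with checks on untrusted replies. A minimal Python sketch of that pattern follows, assuming a reply that should carry 20 bytes at offset 16 as in the backtrace. `try_at` and `parse_reply` are hypothetical helpers, not the OVS functions.

```python
def try_at(buf: bytes, offset: int, size: int):
    """Return buf[offset:offset + size], or None if the range is not
    fully contained in buf."""
    if offset < 0 or size < 0 or offset + size > len(buf):
        return None
    return buf[offset:offset + size]


def parse_reply(reply: bytes):
    """Hypothetical parser: expects 20 bytes of payload at offset 16.

    Returns the payload, or None for a malformed reply so the caller can
    report an error instead of aborting the process."""
    payload = try_at(reply, 16, 20)
    if payload is None:
        return None
    return payload
```

The caller now decides how to handle a short reply, whereas an assertion on the same condition would terminate the whole daemon.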
Ilya Maximets
64cdc290ef appveyor: Silence the git clone of pthreads4w.
Git by default reports progress on stderr.  This doesn't fail
the build, but upsets the powershell:

 git : Cloning into 'c:\pthreads4w-code'...
 At line:3 char:1
 + git clone https://git.code.sf.net/p/pthreads4w/code c:\pthreads4w-cod ...
 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     + CategoryInfo          : NotSpecified:
            (Cloning into 'c:\pthreads4w-code'...:String) [], RemoteException
     + FullyQualifiedErrorId : NativeCommandError

Silence the git clone to avoid the warning.

Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-06-02 10:57:54 +02:00
Robin Jarry
8bcc6d694c netdev-dpdk: Fix warning with gcc 13.
GCC now reports uninitialized warnings from function return values.

../lib/netdev-dpdk.c: In function 'netdev_dpdk_mempool_configure':
../lib/netdev-dpdk.c:964:22: warning: 'dmp' may be used uninitialized [-Wmaybe-uninitialized]
  964 |         dev->dpdk_mp = dmp;
      |         ~~~~~~~~~~~~~^~~~~
../lib/netdev-dpdk.c:854:21: note: 'dmp' was declared here
  854 |     struct dpdk_mp *dmp, *next;
      |                     ^~~

NB: this looks like a false positive, gcc 13 probably fails to see the link
between reuse and dmp in dpdk_mp_get().

Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Robin Jarry <rjarry@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-06-02 10:56:57 +02:00
David Marchand
359cabbd6e netdev-offload: Fix some typos.
Caught while reviewing code.

Fixes: aca2f8a8a6 ("netdev-offload-dpdk: Implement HW miss packet recover for vport.")
Fixes: 241bad15d9 ("dpif-netdev: associate flow with a mark id")
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-05-31 22:04:11 +02:00
Ilya Maximets
ef1da757f0 ovsdb: condition: Process condition changes incrementally.
In most cases, after the condition change request, the new condition
is the same as the old one plus or minus a few clauses.  Today, ovsdb-server
will evaluate every database row against all the old clauses and then
against all the new clauses in order to tell if an update should be
generated.

For example, every time a new port is added, ovn-controller adds two
new clauses to conditions for a Port_Binding table.  And this condition
may grow significantly in size making addition of every new port
heavier on the server side.

The difference between conditions is not larger and, likely,
significantly smaller than old and new conditions combined.  And if
the row doesn't match clauses that are different between old and new
conditions, that row should not be part of the update. It either
matches both old and new, or it doesn't match either of them.

If the row matches some clauses in the difference, then we need to
perform a full match against old and new in order to tell if it
should be added/removed/modified.  This is necessary because different
clauses may select same rows.

Let's generate the condition difference and use it to avoid evaluation
of all the clauses for rows not affected by the condition change.

Testing shows 70% reduction in total CPU time in ovn-heater's 120-node
density-light test with conditional monitoring.  Average CPU usage
during the test phase went down from frequent 100% spikes to just 6-8%.

Note: This will not help with new connections, or re-connections,
or new monitor requests after database conversion.  ovsdb-server will
still evaluate every database row against every clause in the condition
in these cases.  So, it's still important to not have too many clauses
in conditions for large tables.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-05-31 21:31:38 +02:00
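
The reasoning in the condition commit above can be checked with a small Python model. It makes simplifying assumptions: a condition is a set of hypothetical `(column, value)` equality clauses, a row matches if any clause matches, an empty condition matches nothing, and only insert/delete updates are modeled. None of these names come from ovsdb-server.

```python
def matches(row, clauses):
    """True if the row satisfies at least one (column, value) clause."""
    return any(row.get(col) == val for col, val in clauses)


def update_type(row, old, new):
    """Full evaluation of one row against both conditions."""
    was, now = matches(row, old), matches(row, new)
    if was == now:
        return None
    return "insert" if now else "delete"


def incremental_updates(rows, old, new):
    """Run the full evaluation only for rows touched by the difference."""
    diff = old ^ new  # clauses present in exactly one of the conditions
    updates = {}
    for key, row in rows.items():
        if not matches(row, diff):
            continue  # matches both old and new, or neither: no update
        kind = update_type(row, old, new)
        if kind:
            updates[key] = kind
    return updates
```

Rows that match nothing in `diff` are skipped after evaluating only the few changed clauses. Rows that do match still get the full check, because a clause common to both conditions may already select them.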
Ilya Maximets
d56366bfa0 tests: Check ovsdb-server logs in OVSDB tests.
Many OVSDB tests are not checking the server log for warnings or
errors.  Some are not even using the log file.  It's mostly OK as we're
usually checking the user-visible behavior.  But it would also be nice
to detect some internal warnings if there are some.

Move the OVSDB_SERVER_SHUTDOWN macro to a common place, add a call
to check_logs to it, and make OVSDB tests use this macro.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-05-31 21:30:10 +02:00
Russell Bryant
1335af2f55 MAINTAINERS.rst: Move several people to emeritus status
The following document discusses emeritus committer status:

https://docs.openvswitch.org/en/latest/internals/committer-emeritus-status/

There are several people who I would guess consider themselves
emeritus committers but have not formally declared it. Those moved to
emeritus status in this commit have either explicitly communicated
their desire to move or have both not been active in the last year and
have not yet replied to this patch.

It is easy to re-add people in the future should any emeritus
committer desire to become active again.

Per our policies, a vote of the majority of current committers (or
the list of maintainers prior to this change) is required to move a
committer to emeritus status.

Signed-off-by: Russell Bryant <russell@ovn.org>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Acked-by: Ansis Atteka <ansisatteka@gmail.com>
Acked-by: Daniele Di Proietto <daniele.di.proietto@gmail.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Simon Horman <horms@verge.net.au>
Acked-by: Thomas Graf <tgraf@tgraf.ch>
Acked-by: William Tu <u9012063@gmail.com>
CC: Andy Zhou <azhou@ovn.org>
CC: Gurucharan Shetty <guru@ovn.org>
CC: Ian Stokes <istokes@ovn.org>
CC: Jarno Rajahalme <jarno@ovn.org>
CC: YAMAMOTO Takashi <yamamoto@midokura.com>
2023-05-30 14:09:12 -04:00
Timothy Redaelli
e3d0e84ed3 utilities/bashcomp: Fix PS1 generation on new bash.
The current implementation used to extract PS1 prompt for ovs-vsctl is
broken on recent Bash releases.
Starting from Bash 4.4 it's possible to use @P expansion in order to get
the quoted PS1 directly.

This commit makes the two bash completion files use @P expansion in order
to get the quoted PS1 on Bash >= 4.4.

Reported-at: https://bugzilla.redhat.com/2170344
Reported-by: Martin Necas <mnecas@redhat.com>
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-05-29 20:28:49 +02:00
David Marchand
c3e410a03a netdev-offload-dpdk: Fix crash in debug log.
The offload thread calling ufid_to_rte_flow_disassociate() may be the
last one holding a reference on the netdev and physdev.
So displaying information about them might trigger a crash when
removing a physical port.

Fixes: faf71e4922 ("netdev-dpdk: Print port name in offload API messages.")
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-05-29 19:22:18 +02:00
Kevin Traynor
9dad8dfd1e netdev-dpdk: Check rx/tx descriptor sizes for device.
By default OVS configures 2048 descriptors for tx and rx queues
on DPDK devices. It also allows the user to configure those values.

If the values used are not acceptable to the device then queue setup
would fail.

The device exposes its max/min/alignment requirements and OVS
applies some limits also. Use these to ensure an acceptable value
is used for the number of descriptors on a device tx/rx.

If the default or user value is not acceptable, adjust to a suitable
value and log.

Reported-at: https://bugzilla.redhat.com/2119876
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-05-26 19:59:52 +02:00
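
The adjustment described in the descriptor-size commit above can be sketched as a clamp followed by alignment. The function name, its parameters, and the choice to round up before falling back to rounding down are assumptions made for illustration. They are not taken from netdev-dpdk.c.

```python
def adjust_descriptors(requested: int, dev_min: int, dev_max: int,
                       align: int) -> int:
    """Return a descriptor count within [dev_min, dev_max] that is a
    multiple of align.

    Assumes dev_min <= dev_max, align >= 1, and that at least one
    multiple of align lies inside the range."""
    n = max(dev_min, min(requested, dev_max))
    n = -(-n // align) * align  # round up to a multiple of align
    if n > dev_max:
        n -= align              # rounding up overshot; step back down
    return n
```

For example, a request for 2048 descriptors on a device limited to 1024 yields 1024, and a caller would log the adjustment when the returned value differs from the requested one.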
Kevin Traynor
0af352b6df netdev-dpdk: Remove requested descriptors from get_config.
There is no need to display 'requested_rx/tx_descriptors' and
'configured_rx/tx_descriptors' as they will be the same.

It is simpler to just have a single 'n_rxq/txq_desc' value.

Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-05-26 19:59:52 +02:00
Balazs Nemeth
59c9084105 ofproto-dpif-upcall: Don't set statistics to 0 when they jump back.
The only way that stats->{n_packets,n_bytes} would decrease is due to an
overflow, or if there are bugs in how statistics are handled. In the
past, there were multiple issues that caused a jump backward. A
workaround was in place to set the statistics to 0 in that case. When
this happened while the revalidator was under heavy load, the workaround
had an unintended side effect where should_revalidate returned false
causing the flow to be removed because the metric it calculated was
based on a bogus value. Since many of those bugs have now been
identified and resolved, there is no need to set the statistics to 0. In
addition, the (unlikely) overflow still needs to be handled
appropriately. If an unexpected jump does happen, just log it as a
warning.

Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-05-26 19:58:36 +02:00
Ilya Maximets
0826de990c stream-ssl: Disable alerts on unexpected EOF.
OpenSSL 3.0 enabled alerts for unexpected EOF by default.  It is supposed
to alert the application whenever the connection is terminated without
a proper close_notify.  And that should allow applications to take
actions to protect themselves from a potential TLS truncation attack.
This is what it looks like in the log:

 |stream_ssl|WARN|SSL_read: error:0A000126:SSL routines::unexpected eof while reading
 |jsonrpc|WARN|ssl:127.0.0.1:34288: receive error: Input/output error
 |reconnect|WARN|ssl:127.0.0.1:34288: connection dropped (Input/output error)

The problem is that clients based on OVS libraries do not wait for
the proper termination if it didn't happen right away.  It means that
chances to have alerts on the server side for every single disconnection
are very high.

None of the high level protocols supported by OVS daemons can carry
state between re-connections, e.g., there are no session cookies or
anything like that.  So, the TLS truncation attack is not applicable.

Disable the alert to avoid unnecessary warnings in the log.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-05-26 14:52:54 +02:00
Frode Nordahl
d51a4ef0a6 tests: layer3-tunnels: Skip bareudp tests if not supported by kernel.
The bareudp tests depend on specific kernel configuration to
succeed.  Skip the test if the feature is not enabled in the
running kernel.

Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-05-25 19:45:26 +02:00
Ilya Maximets
68d6d2777f AUTHORS: Add yangchang.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-05-25 19:45:09 +02:00
yangchang
263fcdfdb8 ovs-fields: Modify the width of tpa and spa.
Arp_spa and arp_tpa are IP addresses, their width should be 32 bits.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: yangchang <yangchang@chinatelecom.cn>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-05-25 19:44:44 +02:00
Nobuhiro MIKI
701c2dbfb8 userspace: Add new option srv6_flowlabel in SRv6 tunnel.
It supports flowlabel based load balancing by controlling the flowlabel
of outer IPv6 header, which is already implemented in Linux kernel as
seg6_flowlabel sysctl [1].

[1]: https://docs.kernel.org/networking/seg6-sysctl.html

Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-05-25 17:08:32 +02:00
Nobuhiro MIKI
f328fd4892 netdev-native-tnl: Add ipv6_label param in netdev_tnl_ip_build_header.
For tunnels such as SRv6, some popular vendor appliances support
IPv6 flowlabel based load balancing. In preparation for OVS to
support it, this patch modifies the encapsulation to allow IPv6
flowlabel to be configured.

Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-05-25 15:45:41 +02:00
Nobuhiro MIKI
eb8c19ebac netdev-native-tnl: Add ipv6_label param in netdev_tnl_push_ip_header.
For tunnels such as SRv6, some popular vendor appliances support
IPv6 flowlabel based load balancing. In preparation for OVS to
support it, this patch modifies the encapsulation to allow IPv6
flowlabel to be configured.

Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-05-25 15:45:41 +02:00
Ilya Maximets
ce8828a372 netdev-vport: RCU-fy tunnel config.
Tunnel config can be accessed by multiple threads at the same time and
it is supposed to be protected by the netdev_vport mutex.  However,
many functions are getting direct access to it via netdev API without
taking the mutex, creating a potential for various race conditions.

Fix that by protecting the tunnel config with RCU.  The whole structure
is replaced on configuration changes.  Individual fields are never
updated and the structure itself is constant.  This way it can be safely
used by different threads within RCU grace period.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-05-25 15:45:35 +02:00
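
The replace-the-whole-structure approach in the tunnel config commit above can be modeled in Python, which has no RCU primitive. In this sketch `TunnelConfig`, `Vport`, and their fields are hypothetical. Python's reference counting stands in for the RCU grace period, since an old configuration object stays alive for as long as a reader still holds it.

```python
import threading
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TunnelConfig:
    """Immutable snapshot; individual fields are never updated in place."""
    dst_port: int
    ttl: int


class Vport:
    def __init__(self, config: TunnelConfig):
        self._config = config
        self._write_lock = threading.Lock()  # serializes writers only

    def get_config(self) -> TunnelConfig:
        """Readers take a reference without locking."""
        return self._config

    def set_config(self, **changes):
        """Writers build a new object and publish it in one assignment."""
        with self._write_lock:
            self._config = replace(self._config, **changes)
```

A reader that fetched the configuration before an update keeps seeing a consistent old snapshot, while later readers see the new one.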
Ilya Maximets
0c4b299ebb smap: Make argument of smap_add_ipv6 constant.
The address is not getting modified inside.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-05-25 15:45:28 +02:00
Ilya Maximets
be6f096fbe netdev-vport: Fix unsafe handling of GRE sequence number.
GRE sequence number is maintained as part of the tunnel config.
This triggers tunnel reconfiguration every time set_tunnel_config()
is called, because memset over tunnel config will never be equal to
the new config constructed from database options.

Also, the sequence number is incremented non-atomically without holding
a mutex on tunnel push, which may lead to corruption if multiple
threads are sending packets to the same tunnel.

Fix that by moving sequence number to the netdev_vport structure
instead and using an atomic counter.

Fixes: 0ffff49753 ("userspace: add gre sequence number support.")
Fixes: 7dc18ae96d ("userspace: add erspan tunnel support.")
Fixes: 3c6d05a02e ("userspace: Add GTP-U support.")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-05-25 15:45:08 +02:00
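
The GRE commit above makes the sequence number increment indivisible. The Python sketch below shows why that matters when several threads push packets to one tunnel. A lock stands in for the C atomic counter, and `SeqCounter` and `send_many` are hypothetical names.

```python
import threading


class SeqCounter:
    """Per-port sequence number kept outside the tunnel configuration."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def next(self) -> int:
        """Return the current value and advance it as one indivisible step."""
        with self._lock:
            value = self._value
            # GRE sequence numbers are 32 bits wide, so wrap around.
            self._value = (value + 1) & 0xFFFFFFFF
            return value


def send_many(counter: SeqCounter, n: int, out: list):
    """Simulate one thread pushing n packets to the tunnel."""
    out.extend(counter.next() for _ in range(n))
```

With the read and the increment performed as one step, no two senders can obtain the same sequence number, however the threads interleave.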
Frode Nordahl
8045c0f8de tests: dpdk: Pass --no-pci to tests that do not use physical ports.
At present, the system-dpdk-testsuite makes assumptions about
environment configuration, and will error out if DPDK compatible
interfaces not configured for DPDK are present in the system with
a message like:

EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:00:03.0 (socket -1)
eth_virtio_pci_init(): Failed to init PCI device
EAL: Requested device 0000:00:03.0 cannot be used

The system-dpdk-testsuite is useful even with no DPDK PHY
available, as the tests requiring a PHY will skip gracefully when
none is present.

This patch extends the OVS_DPDK_START and OVS_DPDK_START_VSWITCHD
macros to allow passing in values that will be set in
`other_config:dpdk-extra` before the test runs.

Tests that do not use physical ports are also extended to pass
the `--no-pci` argument.

We will use this patch in a follow-up, enabling more elaborate
Debian autopkgtests for Open vSwitch.

Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-05-19 13:10:31 +02:00
Russell Bryant
5cb543bc59 MAINTAINERS.rst: Make myself an active maintainer
I am currently an emeritus committer, but I would like to become
active again for a short period of time to work through some
governance issues preventing us from updating our committers list
following our approved policies for doing so.

Signed-off-by: Russell Bryant <russell@ovn.org>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
2023-05-18 09:46:39 -04:00
Stefan Hoffmann
965c2955e6 test-stream: Add ssl tests for stream open block.
This tests stream.c and stream.py with ssl connection at
CHECK_STREAM_OPEN_BLOCK.
For the tests, ovsdb needs to be built with libssl.

Signed-off-by: Stefan Hoffmann <stefan.hoffmann@cloudandheat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-05-11 21:41:03 +02:00
Stefan Hoffmann
f3f3be682d tests-ovsdb: Switch OVSDB_START_IDLTEST to macro.
Define the bash function as a macro now. Later we can extend this macro for
other use cases.

Signed-off-by: Stefan Hoffmann <stefan.hoffmann@cloudandheat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-05-11 21:41:03 +02:00
Ilya Maximets
64e4cca5c4 AUTHORS: Add Zhiqi Chen.
Additionally re-sorted part of the list that was particularly
not ordered.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-05-11 21:13:29 +02:00
Zhiqi Chen
ffb8b743bb dpctl: Fix dereferencing null pointer in parse_ct_limit_zones().
A command with an empty string following "dpctl/ct-get-limits zone=",
such as "ovs-appctl dpctl/ct-get-limits zone=", will cause
parse_ct_limit_zones() to dereference a null pointer.

Signed-off-by: Zhiqi Chen <chenzhiqi.123@bytedance.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-05-11 21:11:36 +02:00
Eelco Chaudron
cd608cf96e netdev-offload: Fix deadlock/recursive use of the netdev_hmap_rwlock rwlock.
When doing performance testing with OVS v3.1 we ran into a deadlock
situation with the netdev_hmap_rwlock read/write lock. After some
debugging, it was discovered that the netdev_hmap_rwlock read lock
was taken recursively, in the following sequence of events:

 netdev_ports_flow_get()
   It takes the read lock, while it walks all the ports
   in the port_to_netdev hmap and calls:
   - netdev_flow_get() which will call:
     - netdev_tc_flow_get() which will call:
       - netdev_ifindex_to_odp_port()
          This function also takes the same read lock to
          walk the ifindex_to_port hmap.

In OVS a read/write lock does not support recursive readers. For details
see the comments in ovs-thread.h. If you do this, it will lock up,
mainly because OVS sets the PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP
attribute on the lock.

The solution with this patch is to use two separate read/write
locks, with an order guarantee to avoid another potential deadlock.

Fixes: 9fe21a4fc1 ("netdev-offload: replace netdev_hmap_mutex to netdev_hmap_rwlock")
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2182541
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-05-10 21:57:05 +02:00
Yunjian Wang
14773af4b2 ofproto-dpif-xlate: Fix use-after-free when xlate_actions().
Currently, bundle->cvlans and xbundle->cvlans are pointing to the
same memory location. This can cause issues if the main thread
modifies bundle->cvlans and frees it while the revalidator thread
is still accessing xbundle->cvlans. This leads to a use-after-free
error.
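
One common way to avoid this kind of sharing is to give each owner its
own copy of the bitmap. The sketch below assumes a 4096-bit VLAN bitmap
stored as 64-bit words; bitmap_copy() and bitmap_test() are hypothetical
helpers, not the OVS functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { VLAN_BITS = 4096, VLAN_WORDS = VLAN_BITS / 64 };

/* Hypothetical helper: returns a private heap copy of 'src', or NULL if
 * 'src' is NULL.  The caller owns the result and must free() it. */
uint64_t *
bitmap_copy(const uint64_t *src)
{
    if (!src) {
        return NULL;
    }
    uint64_t *dst = malloc(VLAN_WORDS * sizeof *dst);
    assert(dst != NULL);
    memcpy(dst, src, VLAN_WORDS * sizeof *dst);
    return dst;
}

/* Hypothetical helper: tests bit 'vid'; a NULL bitmap has no bits set. */
int
bitmap_test(const uint64_t *bm, unsigned int vid)
{
    return bm && vid < VLAN_BITS && ((bm[vid / 64] >> (vid % 64)) & 1);
}
```

With separate copies, freeing the original no longer invalidates the
pointer that the other thread reads.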

AddressSanitizer: heap-use-after-free on address 0x615000007b08 at
                        pc 0x0000004ede1e bp 0x7f3120ee0310 sp 0x7f3120ee0300
READ of size 8 at 0x615000007b08 thread T25 (revalidator25)
    0 0x4ede1d in bitmap_is_set lib/bitmap.h:91
    1 0x4fcb26 in xbundle_allows_cvlan ofproto/ofproto-dpif-xlate.c:2028
    2 0x4fe279 in input_vid_is_valid ofproto/ofproto-dpif-xlate.c:2294
    3 0x502abf in xlate_normal ofproto/ofproto-dpif-xlate.c:3051
    4 0x5164dc in xlate_output_action ofproto/ofproto-dpif-xlate.c:5361
    5 0x522576 in do_xlate_actions ofproto/ofproto-dpif-xlate.c:7047
    6 0x52a751 in xlate_actions ofproto/ofproto-dpif-xlate.c:8061
    7 0x4e2b66 in xlate_key ofproto/ofproto-dpif-upcall.c:2212
    8 0x4e2e13 in xlate_ukey ofproto/ofproto-dpif-upcall.c:2227
    9 0x4e345d in revalidate_ukey__ ofproto/ofproto-dpif-upcall.c:2276
    10 0x4e3f85 in revalidate_ukey ofproto/ofproto-dpif-upcall.c:2395
    11 0x4e7ac5 in revalidate ofproto/ofproto-dpif-upcall.c:2858
    12 0x4d9ed3 in udpif_revalidator ofproto/ofproto-dpif-upcall.c:1010
    13 0x7cd92e in ovsthread_wrapper lib/ovs-thread.c:423
    14 0x7f312ff01f3a  (/usr/lib64/libpthread.so.0+0x8f3a)
    15 0x7f312fc8f51f in clone (/usr/lib64/libc.so.6+0xf851f)

0x615000007b08 is located 8 bytes inside of 512-byte region
                                        [0x615000007b00,0x615000007d00)
freed by thread T0 here:
    0 0x7f3130378ad8 in free (/usr/lib64/libasan.so.4+0xe0ad8)
    1 0x49044e in bundle_set ofproto/ofproto-dpif.c:3431
    2 0x444f92 in ofproto_bundle_register ofproto/ofproto.c:1455
    3 0x40e6c9 in port_configure vswitchd/bridge.c:1300
    4 0x40bcfd in bridge_reconfigure vswitchd/bridge.c:921
    5 0x41f1a9 in bridge_run vswitchd/bridge.c:3313
    6 0x42d4fb in main vswitchd/ovs-vswitchd.c:132
    7 0x7f312fbbcc86 in __libc_start_main (/usr/lib64/libc.so.6+0x25c86)

previously allocated by thread T0 here:
    0 0x7f3130378e70 in __interceptor_malloc
    1 0x8757fe in xmalloc__ lib/util.c:140
    2 0x8758da in xmalloc lib/util.c:175
    3 0x875927 in xmemdup lib/util.c:188
    4 0x475f63 in bitmap_clone lib/bitmap.h:79
    5 0x47797c in vlan_bitmap_clone lib/vlan-bitmap.h:40
    6 0x49048d in bundle_set ofproto/ofproto-dpif.c:3433
    7 0x444f92 in ofproto_bundle_register ofproto/ofproto.c:1455
    8 0x40e6c9 in port_configure vswitchd/bridge.c:1300
    9 0x40bcfd in bridge_reconfigure vswitchd/bridge.c:921
    10 0x41f1a9 in bridge_run vswitchd/bridge.c:3313
    11 0x42d4fb in main vswitchd/ovs-vswitchd.c:132
    12 0x7f312fbbcc86 in __libc_start_main (/usr/lib64/libc.so.6+0x25c86)

Fixes: fed8962aff ("Add new port VLAN mode "dot1q-tunnel"")
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-05-10 21:40:41 +02:00
David Marchand
1a1b3106d9 ci: Separate DPDK from OVS build.
Let's separate DPDK compilation from the rest of OVS build:
- this avoids multiple jobs building DPDK in parallel, which especially
  affects builds in the dpdk-latest branch,
- we separate concerns about DPDK build requirements from OVS build
  requirements, like python dependencies,
- building DPDK does not depend on how we will link OVS against it, so we
  can use a single cache entry regardless of DPDK_SHARED option.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-05-05 19:19:06 +02:00
Ilya Maximets
46240314ac ovsdb-idl.at: Fix write-changed-only tests without change tracking.
The '-w' command line argument is not passed to test-ovsdb in the
OVSDB_CHECK_IDL_WRITE_CHANGED_ONLY_C, so it just repeats normal
tests without testing the feature.

Adding the flag.  And using the long version of the flag to make
things more obvious and harder to overlook.  Swapping the argument
in the other working test as well, just for consistency.

Fixes: d94cd0d3ee ("ovsdb-idl: Support write-only-changed IDL monitor mode.")
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-05-04 18:17:39 +02:00
Roi Dayan
77d8228985 tc: Fix cleaning chains.
Sometimes there is a need to clean empty chains as done in
delete_chains_from_netdev().  The cited commit doesn't remove
the chain completely, which causes adding ingress_block later to fail.
This can be reproduced by adding a bond as an OVS port, which makes OVS
use ingress_block for it.
While at it, add the name of the failing netdev to the log.

Fixes: e1e5eac5b0 ("tc: Add TCA_KIND flower to delete and get operation to avoid rtnl_lock().")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-04-28 19:01:14 +02:00
Ilya Maximets
572e89f418 AUTHORS: Add Stefan, Luca and Max.
Also, slightly re-sort the list to fix the order.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-04-26 14:54:41 +02:00
Stefan Hoffmann
b456b1a02f python-stream: Handle SSL error in do_handshake.
In some cases, when the ovsdb server or relay gets restarted, ovsdb python
clients may keep the local socket open. Instead of reconnecting, a lot of
failures will be logged.
This can be reproduced with ssl connections to the server/relay and
restarting it, so it has the same IP after restart.

This patch catches the Exceptions at do_handshake to recreate the
connection on the client side.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Stefan Hoffmann <stefan.hoffmann@cloudandheat.com>
Signed-off-by: Luca Czesla <luca.czesla@mail.schwarz>
Signed-off-by: Max Lamprecht <max.lamprecht@mail.schwarz>
Co-authored-by: Luca Czesla <luca.czesla@mail.schwarz>
Co-authored-by: Max Lamprecht <max.lamprecht@mail.schwarz>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-04-26 14:49:49 +02:00
Yunjian Wang
c3559dffcb dpif-netlink: Fix memory leak dpif_netlink_open().
In the specific call to dpif_netlink_dp_transact() (line 398) in
dpif_netlink_open(), the 'dp' content is not being used in the branch
when no error is returned (starting line 430). Furthermore, the 'dp'
and 'buf' variables are overwritten later in this same branch when a
new netlink request is sent (line 437), which results in a memory leak.
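
The leak pattern and its fix can be sketched as follows. The functions
transact() and release() and the live_buffers counter are hypothetical
stand-ins for a request that allocates a reply buffer; they are not the
OVS API. The point is only that the first reply is released before the
pointer is reused.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of reply buffers currently allocated (for illustration). */
static int live_buffers;

/* Hypothetical request: allocates and returns a reply buffer. */
char *
transact(void)
{
    char *buf = malloc(1024);
    assert(buf != NULL);
    live_buffers++;
    return buf;
}

/* Hypothetical counterpart that frees a reply buffer. */
void
release(char *buf)
{
    if (buf) {
        free(buf);
        live_buffers--;
    }
}

/* Sends two requests through the same pointer.  Returns the number of
 * buffers still allocated afterwards. */
int
open_with_probe(void)
{
    char *buf = transact();   /* First reply: its content is not used. */
    release(buf);             /* Free it before 'buf' is overwritten. */
    buf = transact();         /* Second request reuses the pointer. */
    release(buf);
    return live_buffers;
}
```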

Reported by Address Sanitizer.

Indirect leak of 1024 byte(s) in 1 object(s) allocated from:
  0  0x7fe09d3bfe70 in __interceptor_malloc (/usr/lib64/libasan.so.4+0xe0e70)
  1  0x8759be in xmalloc__ lib/util.c:140
  2  0x875a9a in xmalloc lib/util.c:175
  3  0x7ba0d2 in ofpbuf_init lib/ofpbuf.c:141
  4  0x7ba1d6 in ofpbuf_new lib/ofpbuf.c:169
  5  0x9057f9 in nl_sock_transact lib/netlink-socket.c:1113
  6  0x907a7e in nl_transact lib/netlink-socket.c:1817
  7  0x8b5abe in dpif_netlink_dp_transact lib/dpif-netlink.c:5007
  8  0x89a6b5 in dpif_netlink_open lib/dpif-netlink.c:398
  9  0x5de16f in do_open lib/dpif.c:348
  10 0x5de69a in dpif_open lib/dpif.c:393
  11 0x5de71f in dpif_create_and_open lib/dpif.c:419
  12 0x47b918 in open_dpif_backer ofproto/ofproto-dpif.c:764
  13 0x483e4a in construct ofproto/ofproto-dpif.c:1658
  14 0x441644 in ofproto_create ofproto/ofproto.c:556
  15 0x40ba5a in bridge_reconfigure vswitchd/bridge.c:885
  16 0x41f1a9 in bridge_run vswitchd/bridge.c:3313
  17 0x42d4fb in main vswitchd/ovs-vswitchd.c:132
  18 0x7fe09cc03c86 in __libc_start_main (/usr/lib64/libc.so.6+0x25c86)

Fixes: b841e3cd4a ("dpif-netlink: Fix feature negotiation for older kernels.")
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-04-25 21:58:14 +02:00
Yunjian Wang
8d59ab31d2 ofp-parse: Check ranges on string to uint32_t conversion.
An unnecessary overflow would occur when the 'value' is larger than
4294967295. So it's required to check ranges to avoid uint32_t overflow.
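
A range check of this kind can be sketched with strtoull(). The helper
parse_u32() is hypothetical and is not the ofp-parse code; it parses into
a wider type first and rejects anything above UINT32_MAX.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parses a decimal string into '*out'.  Returns
 * false on empty input, trailing garbage, a sign, or a value that does
 * not fit in uint32_t. */
bool
parse_u32(const char *s, uint32_t *out)
{
    assert(s != NULL && out != NULL);

    /* Require a leading digit so that strtoull() never sees a sign or
     * whitespace (it would silently negate a "-N" input). */
    if (!isdigit((unsigned char) s[0])) {
        return false;
    }

    errno = 0;
    char *end;
    unsigned long long value = strtoull(s, &end, 10);
    if (errno || *end != '\0' || value > UINT32_MAX) {
        return false;
    }
    *out = (uint32_t) value;
    return true;
}
```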

Reported-by: Nan Zhou <zhounan14@huawei.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-04-25 21:54:47 +02:00
Songtao Zhan
3fa0fc5824 util: Fix an issue that thread name cannot be set.
The name of the current thread consists of a name with a maximum
length of 16 bytes and a thread ID. The final name may be longer
than 16 bytes. If the name is longer than 16 bytes, the thread
name will fail to be set.
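
One way to keep the combined name within the limit is to shorten the
base name so that the ID always fits. This is a sketch under that
assumption, not necessarily what the patch does; format_thread_name() is
a hypothetical helper, and the 16-byte limit includes the terminating
NUL.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { THREAD_NAME_MAX = 16 };   /* Limit in bytes, including the NUL. */

/* Hypothetical helper: writes "<base><id>" into 'out', truncating
 * 'base' as needed so that the result fits in THREAD_NAME_MAX bytes. */
void
format_thread_name(char out[THREAD_NAME_MAX], const char *base,
                   unsigned int id)
{
    char suffix[THREAD_NAME_MAX];
    /* A 32-bit unsigned value prints as at most 10 digits. */
    int suffix_len = snprintf(suffix, sizeof suffix, "%u", id);
    assert(suffix_len > 0 && suffix_len < THREAD_NAME_MAX);

    /* Bytes left for the base name once the ID and NUL are reserved. */
    int room = THREAD_NAME_MAX - 1 - suffix_len;
    snprintf(out, THREAD_NAME_MAX, "%.*s%s", room, base, suffix);
}
```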

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Songtao Zhan <zhanst1@chinatelecom.cn>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-04-25 21:54:47 +02:00
Nobuhiro MIKI
36c8c101cd doc: Fix the list of supported tunnels in README.
Without distinguishing between IPv4 and IPv6, such as GRE and GRE-IPv6,
nine types of tunneling are currently supported.

Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-04-25 21:45:04 +02:00
Faicker Mo
70ba6e97db learning-switch: Fix coredump of OpenFlow15 learning-switch.
The OpenFlow15 Packet-Out message contains the match instead of the
in_port.  The flow.tunnel.metadata.tab is not initialized but is used
in the loop of tun_metadata_to_nx_match.

The coredump gdb backtrace is:
 0  memcpy_from_metadata (dst=0x2f060, src=0x30880, loc=0x10) at lib/tun-metadata.c:467
 1  metadata_loc_from_match_read (match=0x30598, is_masked=<..>,
                                  mask=0x30838, idx=0, map=0x0)
        at lib/tun-metadata.c:865
 2  metadata_loc_from_match_read (is_masked=<...>, mask=0x30838, idx=0,
                                  match=0x30598, map=0x0)
        at lib/tun-metadata.c:854
 3  tun_metadata_to_nx_match (b=0x892260, oxm=OFP15_VERSION, match=0x30598)
        at lib/tun-metadata.c:888
 4  nx_put_raw (b=0x892260, oxm=OFP15_VERSION, match=0x30598,
                cookie=<...>, cookie=0, cookie_mask=<...>, cookie_mask=0)
        at lib/nx-match.c:1186
 5  oxm_put_match (b=0x892260, match=0x30598, version=OFP15_VERSION)
        at lib/nx-match.c:1343
 6  ofputil_encode_packet_out (po=0x30580, protocol=<...>) at lib/ofp-packet.c:1226
 7  process_packet_in (sw=0x891d70, oh=<...>) at lib/learning-switch.c:619
 8  lswitch_process_packet (msg=0x892210, sw=0x891d70) at lib/learning-switch.c:374
 9  lswitch_run (sw=0x891d70) at lib/learning-switch.c:324
 10 main (argc=<...>, argv=<...>) at utilities/ovs-testcontroller.c:180

Fix that by initializing the flow metadata.

Fixes: 35eb6326d5 ("ofp-util: Add flow metadata to ofputil_packet_out")
Signed-off-by: Faicker Mo <faicker.mo@ucloud.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-04-25 18:20:32 +02:00