This commit adds non-null pointer assertions in code that makes
decisions based on the old and new input ovsdb_rows.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: James Raphael Tiovalen <jamestiotio@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This commit adds a few null pointer assertions and checks to some return
values of `ovsdb_table_schema_get_column`. If a null pointer is
encountered in these blocks, either the assertion will fail or the
control flow will now be redirected to alternative paths which will
output the appropriate error messages.
A few ovsdb-rbac and ovsdb-server tests are also updated to verify the
expected warning logs by adding said logs to the ALLOWLIST of the
OVSDB_SERVER_SHUTDOWN statements.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: James Raphael Tiovalen <jamestiotio@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
As per the Open vSwitch Manual ovs-vsctl(8) the Bridge IPFIX parameters
can be passed as follows:
ovs-vsctl -- set Bridge br0 ipfix=@i \
-- --id=@i create IPFIX targets=\"192.168.0.34:4739\" \
obs_domain_id=123 obs_point_id=456 cache_active_timeout=60 \
cache_max_flows=13 \
other_config:enable-input-sampling=false \
other_config:enable-output-sampling=false
where the default values are:
enable_input_sampling: true
enable_output_sampling: true
But in the existing code these 2 parameters take up unexpected values
in some scenarios:
be_opts.enable_input_sampling = !smap_get_bool(&be_cfg->other_config,
"enable-input-sampling", false);
be_opts.enable_output_sampling = !smap_get_bool(&be_cfg->other_config,
"enable-output-sampling", false);
Here, the result of smap_get_bool() is negated.
This yields the expected value in the default case (the code negates
the "false" returned by smap_get_bool() and produces "true"), but an
unexpected value when the sampling option is passed through the CLI.
For example, if we pass "true" for other_config:enable-input-sampling
in the CLI, the above code negates the "true" returned by
smap_get_bool() and produces "false". The same applies to
enable_output_sampling.
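The effect of the negation can be reproduced with a small stand-alone
sketch. get_bool() below is a hypothetical stand-in for smap_get_bool()
(it is not the OVS implementation, and it compares case-sensitively for
brevity), and sampling_direct() is only one possible way to correct the
pattern, assuming the documented default of "true":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for smap_get_bool(): 'value' is the configured
 * string, or NULL when the key is not set. */
static bool
get_bool(const char *value, bool def)
{
    if (!value) {
        return def;
    }
    return def ? strcmp(value, "false") != 0 : strcmp(value, "true") == 0;
}

/* Pattern quoted above: default 'false', result negated. */
static bool
sampling_negated(const char *value)
{
    return !get_bool(value, false);
}

/* One possible correction (an assumption): default 'true', no negation. */
static bool
sampling_direct(const char *value)
{
    return get_bool(value, true);
}

static void
sampling_example(void)
{
    assert(sampling_negated(NULL));       /* Unset: enabled, as documented. */
    assert(!sampling_negated("true"));    /* Explicit "true": disabled. */
    assert(sampling_negated("false"));    /* Explicit "false": enabled. */

    assert(sampling_direct(NULL));        /* Unset: enabled. */
    assert(sampling_direct("true"));      /* Explicit "true": enabled. */
    assert(!sampling_direct("false"));    /* Explicit "false": disabled. */
}
```

The negated variant only agrees with the documentation while the key
is unset; any explicit value is inverted.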
Acked-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Sayali Naval <sanaval@cisco.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Some control protocols are used to maintain link status between
forwarding engines (e.g. LACP). When the system is not sized properly,
the PMD threads may not be able to process all incoming traffic from the
configured Rx queues. When a signaling packet of such protocols is
dropped, it can cause link flapping, worsening the situation.
Use the rte_flow API to redirect these protocols into a dedicated Rx
queue. The assumption is made that the ratio between control protocol
traffic and user data traffic is very low and thus this dedicated Rx
queue will never get full. Re-program the RSS redirection table to only
use the other Rx queues.
The additional Rx queue will be assigned a PMD core like any other Rx
queue. Polling that extra queue may introduce increased latency and
a slight performance penalty in exchange for preventing link flapping.
This feature must be enabled per port on specific protocols via the
rx-steering option. This option takes "rss" followed by a "+" separated
list of protocol names. It is only supported on ethernet ports. This
feature is experimental.
If the user has already configured multiple Rx queues on the port, an
additional one will be allocated for control packets. If the hardware
cannot satisfy the number of requested Rx queues, the last Rx queue will
be assigned for control plane. If only one Rx queue is available, the
rx-steering feature will be disabled. If the hardware does not support
the rte_flow matchers/actions, the rx-steering feature will be
completely disabled on the port and regular rss will be performed
instead.
It cannot be enabled when other-config:hw-offload=true as it may
conflict with the offloaded flows. Similarly, if hw-offload is enabled,
custom rx-steering will be forcibly disabled on all ports and replaced
by regular rss.
Example use:
ovs-vsctl add-bond br-phy bond0 phy0 phy1 -- \
set interface phy0 type=dpdk options:dpdk-devargs=0000:ca:00.0 -- \
set interface phy0 options:rx-steering=rss+lacp -- \
set interface phy1 type=dpdk options:dpdk-devargs=0000:ca:00.1 -- \
set interface phy1 options:rx-steering=rss+lacp
As a starting point, only one protocol is supported: LACP. Other
protocols can be added in the future. NIC compatibility should be
checked.
To validate that this works as intended, I used a traffic generator to
generate random traffic slightly above the machine capacity at line rate
on a two-port bond interface. OVS is configured to receive traffic on
two VLANs and pop/push them in a br-int bridge based on tags set on
patch ports.
+----------------------+
|         DUT          |
|+--------------------+|
||       br-int       ||  in_port=patch10,actions=mod_dl_src:$patch11,
||                    ||                          mod_dl_dst:$tgen0,
||                    ||                          output:patch10
||                    ||  in_port=patch11,actions=mod_dl_src:$patch10,
||                    ||                          mod_dl_dst:$tgen0,
||  patch10  patch11  ||                          output:patch10
|+---|-----------|----+|
|    |           |     |
|+---|-----------|----+|
||  patch00  patch01  ||
||  tag:10    tag:20  ||
||                    ||
||       br-phy       ||  default flow, action=NORMAL
||                    ||
||       bond0        ||  balance-slb, lacp=passive, lacp-time=fast
||    phy0   phy1     ||
|+------|-----|-------+|
+-------|-----|--------+
        |     |
+-------|-----|--------+
|     port0  port1     |  balance L3/L4, lacp=active, lacp-time=fast
|         lag          |  mode trunk VLANs 10, 20
|                      |
|        switch        |
|                      |
|  vlan 10    vlan 20  |  mode access
|   port2      port3   |
+-----|----------|-----+
      |          |
+-----|----------|-----+
|   tgen0      tgen1   |  Random traffic that is properly balanced
|                      |  across the bond ports in both directions.
|  traffic generator   |
+----------------------+
Without rx-steering, the bond0 links are randomly switching to
"defaulted" when one of the LACP packets sent by the switch is dropped
because the RX queues are full and the PMD threads did not process them
fast enough. When that happens, all traffic must go through a single
link which causes above line rate traffic to be dropped.
~# ovs-appctl lacp/show-stats bond0
---- bond0 statistics ----
member: phy0:
TX PDUs: 347246
RX PDUs: 14865
RX Bad PDUs: 0
RX Marker Request PDUs: 0
Link Expired: 168
Link Defaulted: 0
Carrier Status Changed: 0
member: phy1:
TX PDUs: 347245
RX PDUs: 14919
RX Bad PDUs: 0
RX Marker Request PDUs: 0
Link Expired: 147
Link Defaulted: 1
Carrier Status Changed: 0
When rx-steering is enabled, no LACP packet is dropped and the bond
links remain enabled at all times, maximizing the throughput. Neither
the "Link Expired" nor the "Link Defaulted" counters are incremented
anymore.
This feature may be considered as "QoS". However, it does not work by
limiting the rate of traffic explicitly. It only guarantees that some
protocols have a lower chance of being dropped because the PMD cores
cannot keep up with regular traffic.
The choice of protocols is limited on purpose. This is not meant to be
configurable by users. Some limited configurability could be considered
in the future, but it would expose users to more potential issues if
they accidentally redirect all traffic to the isolated queue.
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Robin Jarry <rjarry@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
At some point in OVS history, some virtio features were announced as
supported (ECN and UFO virtio features).
The userspace TSO code, which was added later, does not support
those features and tries to disable them.
This breaks OVS upgrades: if an existing VM already negotiated such
features, their lack on reconnection to an upgraded OVS triggers a
vhost socket disconnection by Qemu.
This results in an endless loop because Qemu then retries with the same
set of virtio features.
This patch proposes to detect those vhost socket disconnections and
fall back to restoring the old virtio features (and disabling TSO for
this vhost port).
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Add a vxlan gbp offload test case:
vxlan offloads with gbp extention - ping between two ports - offloads
enabled ok
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Kernels that do not support vxlan gbp treat a rule that has a vxlan
gbp encap action or vxlan gbp id match differently: they either reject
it or skip the action/match and continue processing the known ones.
To solve the issue, probe and disallow inserting rules with vxlan gbp
action/match if the kernel does not support it.
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Add TC offload support for vxlan encap with gbp option.
Reviewed-by: Gavi Teitz <gavi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Most of the data members of struct tc_action{ } are defined as anonymous
structs in place. Instead of passing all members of an anonymous struct,
which is not flexible when new members are added, expose encap as a named
struct and pass it entirely.
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
The Linux kernel netlink module added NLA_F_NESTED flag checking for
nested netlink messages in 5.2. A nested message without the flag set
will be treated as malformed. The check is optional and is controlled
by the message policy. To avoid this, add NLA_F_NESTED explicitly for
all nested netlink messages with a new function
nl_msg_start_nested_with_flag().
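As a rough illustration of what setting the flag means on the wire,
the sketch below builds a nested attribute header by hand. It is not
the OVS implementation of nl_msg_start_nested_with_flag(): the
sk_-prefixed names and both helpers are hypothetical, the constants
are local mirrors of the ones in <linux/netlink.h>, and padding of the
children is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of definitions from <linux/netlink.h>, repeated so that
 * the sketch builds without kernel headers. */
#define SK_NLA_F_NESTED (1u << 15)
#define SK_NLA_HDRLEN 4

struct sk_nlattr {
    uint16_t nla_len;
    uint16_t nla_type;
};

/* Hypothetical helper: appends the header of a nested attribute of 'type'
 * at 'buf + *len' with NLA_F_NESTED set and returns the header offset, so
 * that the length can be patched once the children are in place. */
static size_t
sk_start_nested_with_flag(uint8_t *buf, size_t *len, uint16_t type)
{
    struct sk_nlattr hdr = { 0, (uint16_t) (type | SK_NLA_F_NESTED) };
    size_t offset = *len;

    memcpy(buf + offset, &hdr, sizeof hdr);
    *len += SK_NLA_HDRLEN;
    return offset;
}

/* Hypothetical helper: patches nla_len (the first header field) to cover
 * the header and everything appended after it. */
static void
sk_end_nested(uint8_t *buf, size_t len, size_t offset)
{
    uint16_t nla_len = (uint16_t) (len - offset);

    memcpy(buf + offset, &nla_len, sizeof nla_len);
}

static void
sk_nested_example(void)
{
    uint8_t buf[64];
    size_t len = 0;
    size_t offset = sk_start_nested_with_flag(buf, &len, 5);
    struct sk_nlattr hdr;

    memset(buf + len, 0, 8);    /* Pretend 8 bytes of child attributes. */
    len += 8;
    sk_end_nested(buf, len, offset);

    memcpy(&hdr, buf, sizeof hdr);
    assert(hdr.nla_len == 12);
    assert(hdr.nla_type & SK_NLA_F_NESTED);
    assert((hdr.nla_type & ~SK_NLA_F_NESTED) == 5);
}
```

The attribute type itself is unchanged; the flag occupies the top bit
of nla_type, which is why a strict kernel policy can tell flagged and
unflagged nested attributes apart.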
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Extract vxlan gbp option encoding to odp_encode_gbp_raw to be used in
following commits.
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Extract vxlan gbp option decoding to odp_decode_gbp_raw to be used in
following commits.
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Tc flower tunnel key options were encoded in nl_msg_put_flower_tunnel_opts
and decoded in nl_parse_flower_tunnel_opts. Only geneve was supported.
To support more vxlan options in the future without adding a new
argument for each of them, change the function arguments to pass the
whole tunnel instead.
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Current implementation of meters in the userspace datapath takes
the meter lock for every packet batch. If more than one thread
hits the flow with the same meter, they will lock each other.
Replace the critical section with atomic operations to avoid
interlocking. Meters themselves are RCU-protected, so it's safe
to access them without holding a lock.
Implementation does the following:
1. Tries to advance the 'used' timer of the meter with atomic
compare+exchange if it's smaller than 'now'.
2. If the timer change succeeds, atomically update band buckets.
3. Atomically update packet statistics for a meter.
4. Go over buckets and try to atomically subtract the amount of
packets or bytes, recording the highest exceeded band.
5. Atomically update band statistics and drop packets.
Bucket manipulations are implemented with atomic compare+exchange
operations with extra checks, because bucket size should never
exceed the maximum and it should never go below zero.
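The bounded bucket updates could look roughly like the sketch below.
It is written with C11 <stdatomic.h> rather than the OVS atomic
wrappers, and the function names are hypothetical; it only shows the
compare+exchange loops with the "never above the maximum, never below
zero" checks, not the actual meter code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Adds 'tokens' to the bucket, saturating at 'max'. */
static void
bucket_refill(_Atomic uint64_t *bucket, uint64_t tokens, uint64_t max)
{
    uint64_t old = atomic_load_explicit(bucket, memory_order_relaxed);
    uint64_t next;

    do {
        /* Written to avoid overflow in 'old + tokens'. */
        next = (old >= max || tokens > max - old) ? max : old + tokens;
    } while (!atomic_compare_exchange_weak_explicit(
                 bucket, &old, next,
                 memory_order_relaxed, memory_order_relaxed));
}

/* Tries to take 'cost' tokens.  Returns false and leaves the bucket
 * untouched if that would take it below zero. */
static bool
bucket_try_take(_Atomic uint64_t *bucket, uint64_t cost)
{
    uint64_t old = atomic_load_explicit(bucket, memory_order_relaxed);

    do {
        if (old < cost) {
            return false;
        }
        /* On failure, 'old' is reloaded and the check is repeated. */
    } while (!atomic_compare_exchange_weak_explicit(
                 bucket, &old, old - cost,
                 memory_order_relaxed, memory_order_relaxed));
    return true;
}

static void
bucket_example(void)
{
    _Atomic uint64_t bucket = 5;

    bucket_refill(&bucket, 10, 8);          /* 5 + 10 saturates at 8. */
    assert(atomic_load(&bucket) == 8);
    assert(bucket_try_take(&bucket, 3));    /* 8 -> 5. */
    assert(!bucket_try_take(&bucket, 6));   /* Not enough tokens. */
    assert(atomic_load(&bucket) == 5);
}
```

A plain atomic add or subtract would not be enough here, since the
bounds check and the update have to happen as one step.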
Packet statistics may be momentarily inconsistent, i.e., the number
of packets and the number of bytes may reflect different sets
of packets. But they should be eventually consistent, and the
difference at any given time should be just a few packets.
For the sake of reduced code complexity PKTPS meter tries to push
packets through the band one by one, even though they all have
the same weight. This is also more fair if more than one thread
is passing packets through the same band at the same time.
Trying to predict the number of packets that can pass may also
cause extra atomic operations reducing the performance.
This implementation shows similar performance to the previous one,
but should scale better with more threads hitting the same meter.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Lin Huang <linhuang@ruijie.com.cn>
Tested-by: Zhang YuHuang <zhangyuhuang@ruijie.com.cn>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This patch introduces a new command, ovs-appctl dpctl/dump-conntrack-exp,
that allows dumping the existing expectations for the userspace ct.
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
get_log_next_line_num() was defined in alb.at.
As it may be useful in other test files, move to
ofproto-macros.at.
Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
As far as I can tell they're used mostly for CI job definitions and
these tend to result in long lines.
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2023-June/405796.html
Suggested-by: Aaron Conole <aconole@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Add additional error coverage counters for dpif operation failures.
This could help to quickly identify netlink problems when communicating
with the OVS kernel module.
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2070630
Reviewed-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Eelco Chaudron was elected by the Open vSwitch committers yesterday.
This formalises his status as an Open vSwitch committer.
Welcome Eelco!
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
EditorConfig is a file format and collection of text editor plugins for
maintaining consistent coding styles between different editors and IDEs.
Initialize the file following the coding rules in
Documentation/internals/contributing/coding-style.rst and add exceptions
declared in build-aux/initial-tab-allowed-files. Only enforce rules for
*.c and *.h files. Other files should use the default indenting rules
from text editors.
In order for this file to be taken into account (unless they use an
editor with built-in EditorConfig support), developers will have to
install a plugin.
Notes:
* All matching rules are considered. The last matching rule's properties
will override the previous ones.
* The max_line_length property is only supported by a limited number of
EditorConfig plugins. It will be ignored if unsupported.
Link: https://editorconfig.org/
Link: https://github.com/editorconfig/editorconfig-emacs
Link: https://github.com/editorconfig/editorconfig-vim
Link: https://github.com/editorconfig/editorconfig/wiki/EditorConfig-Properties#max_line_length
Signed-off-by: Robin Jarry <rjarry@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The netdev receiving packets is supposed to provide the flags
indicating if the L4 checksum was verified and it is OK or BAD,
otherwise the stack will check when appropriate by software.
If the packet comes with good checksum, then postpone the
checksum calculation to the egress device if needed.
When encapsulating a packet with that flag, set the checksum
of the inner L4 header since that is not yet supported.
Calculate the L4 checksum when the packet is going to be sent
over a device that doesn't support the feature.
Linux tap devices allow enabling L3 and L4 offload, so this
patch enables the feature. However, Linux socket interface
remains disabled because the API doesn't allow enabling
those two features without enabling TSO too.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The netdev receiving packets is supposed to provide the flags
indicating if the IP checksum was verified and it is GOOD or BAD,
otherwise the stack will check when appropriate by software.
If the packet comes with good checksum, then postpone the
checksum calculation to the egress device if needed.
When encapsulating a packet with that flag, set the checksum
of the inner IP header since that is not yet supported.
Calculate the IP checksum when the packet is going to be sent over
a device that doesn't support the feature.
Linux devices don't support IP checksum offload alone, so the
support is not enabled.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This patch modifies netdev_get_status to include information about
checksum offload status by port, allowing the user to gain insight into
where checksum offloading is active.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Co-authored-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Make the read of the current seq->value atomic, i.e., not needing to
acquire the global mutex when reading it. On 64-bit systems, this
incurs no overhead, and it will avoid the mutex and potentially
a system call.
For incrementing the value followed by waking up the threads, we are
still taking the mutex, so the current behavior is not changing. The
seq_read() behavior is already defined as, "Returns seq's current
sequence number (which could change immediately)". So the change
should not impact the current behavior.
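A minimal sketch of the idea, using C11 atomics and a per-object mutex
instead of the OVS wrappers and the global seq mutex. All sk_ names
are hypothetical, and the writer simply increments the value, which is
a simplification:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

struct sk_seq {
    pthread_mutex_t mutex;      /* Serializes writers and wakeups. */
    _Atomic uint64_t value;     /* Readable without the mutex. */
};

/* Lock-free read: the result may be stale by the time it is used, which
 * matches the "could change immediately" contract. */
static uint64_t
sk_seq_read(struct sk_seq *seq)
{
    return atomic_load_explicit(&seq->value, memory_order_acquire);
}

/* Writers still take the mutex, so the update and the wakeup of waiters
 * stay serialized as before. */
static void
sk_seq_change(struct sk_seq *seq)
{
    pthread_mutex_lock(&seq->mutex);
    atomic_fetch_add_explicit(&seq->value, 1, memory_order_release);
    /* A real implementation would wake up the waiting threads here. */
    pthread_mutex_unlock(&seq->mutex);
}

static void
sk_seq_example(void)
{
    struct sk_seq seq = { PTHREAD_MUTEX_INITIALIZER, 1 };
    uint64_t before = sk_seq_read(&seq);

    sk_seq_change(&seq);
    assert(sk_seq_read(&seq) == before + 1);
}
```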
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The signal_fds pipe and wevent are a mechanism to wake up the process
after it received a signal and stored the number for the future
processing. They are not intended for inter-process communication.
However, in the current code, descriptors are not closed on fork().
The main scenario where we use fork() is a monitor process. Monitor
doesn't actually use poll loops and doesn't wait on the descriptor.
But when a child process is killed, it (child) sends a byte to itself,
then it wakes up due to POLLIN on the pipe and terminates itself after
processing all the callbacks. The byte stays unread. And the pipe is
still open in the monitor process. When child dies, the monitor wakes
up and forks again. New child inherits the same pipe that still
contains unread data. This data is never read, so the child will
constantly wake itself up for no reason.
Interestingly enough raise(SIGSEGV) doesn't immediately kill the
process. The execution continues until the end of a signal handler,
so we're still able to write a byte to a pipe even in this case.
Presumably because we don't have SA_NODEFER.
Fix the issue by re-creating the pipe/event on fork. This way
every new child will have its own notification channel and will
not wake up any other processes.
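The re-creation step can be sketched with plain POSIX calls as below.
wakeup_pipe_recreate() is a hypothetical helper meant to be called in
the child right after fork(); it is not the code added by this patch,
and the Windows event case is not covered.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Replaces the inherited pipe in 'fds' with a fresh non-blocking one.
 * Returns 0 on success or a positive errno value, in which case 'fds'
 * is left untouched. */
static int
wakeup_pipe_recreate(int fds[2])
{
    int new_fds[2];

    if (pipe(new_fds) < 0) {
        return errno;
    }
    for (int i = 0; i < 2; i++) {
        int flags = fcntl(new_fds[i], F_GETFL);

        if (flags < 0 || fcntl(new_fds[i], F_SETFL, flags | O_NONBLOCK) < 0) {
            int error = errno;

            close(new_fds[0]);
            close(new_fds[1]);
            return error;
        }
    }
    /* Only this process's copies are closed; the parent keeps its own. */
    close(fds[0]);
    close(fds[1]);
    fds[0] = new_fds[0];
    fds[1] = new_fds[1];
    return 0;
}

/* Returns true if a stale byte written to the old pipe is not visible
 * through the re-created one. */
static bool
wakeup_pipe_example(void)
{
    int fds[2];
    char byte = 0;
    bool empty = false;

    if (pipe(fds) < 0) {
        return false;
    }
    if (write(fds[1], &byte, 1) == 1 && !wakeup_pipe_recreate(fds)) {
        ssize_t n = read(fds[0], &byte, 1);

        empty = n < 0 && errno == EAGAIN;
    }
    close(fds[0]);
    close(fds[1]);
    return empty;
}
```

The example stays within a single process to keep it short: the unread
byte remains in the old pipe, and a read on the new one finds nothing
pending.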
There was already an attempt to fix the issue, but it didn't get a
follow up (see the reported-at tag). This is an alternative solution.
Fixes: ff8decf1a318 ("daemon: Add support for process monitoring and restart.")
Reported-at: https://patchwork.ozlabs.org/project/openvswitch/patch/20221019093147.2072-1-lifengqi@inspur.com/
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Initial change set is preserved for as long as the monitor itself.
However, if a new client has a condition on a column that is not
one of the monitored columns, this column will be added to the
monitor via ovsdb_monitor_condition_bind(). This new column, however,
doesn't exist in the initial change set. That will cause ovsdb-server
to malfunction or crash trying to access non-existent column during
condition evaluation:
ERROR: AddressSanitizer: heap-buffer-overflow
READ of size 4 at 0x606000006780 thread T0
0 ovsdb_clause_evaluate ovsdb/condition.c:328:26
1 ovsdb_condition_match_any_clause ovsdb/condition.c:441:13
2 ovsdb_condition_empty_or_match_any ovsdb/condition.h:84:13
3 ovsdb_monitor_row_update_type_condition ovsdb/monitor.c:892:28
4 ovsdb_monitor_compose_row_update2 ovsdb/monitor.c:1058:12
5 ovsdb_monitor_compose_update ovsdb/monitor.c:1172:24
6 ovsdb_monitor_get_update ovsdb/monitor.c:1276:24
7 ovsdb_jsonrpc_monitor_create ovsdb/jsonrpc-server.c:1505:12
8 ovsdb_jsonrpc_session_got_request ovsdb/jsonrpc-server.c:1030:21
9 ovsdb_jsonrpc_session_run ovsdb/jsonrpc-server.c:572:17
10 ovsdb_jsonrpc_session_run_all ovsdb/jsonrpc-server.c:602:21
11 ovsdb_jsonrpc_server_run ovsdb/jsonrpc-server.c:417:9
12 main_loop ovsdb/ovsdb-server.c:222:9
13 main ovsdb/ovsdb-server.c:500:5
14 __libc_start_call_main
15 __libc_start_main@GLIBC_2.2.5
16 _start (ovsdb/ovsdb-server+0x473034)
Located 0 bytes after 64-byte region [0x606000006740,0x606000006780)
allocated by thread T0 here:
0 malloc (ovsdb/ovsdb-server+0x50dc82)
1 xmalloc__ lib/util.c:140:15
2 xmalloc lib/util.c:175:12
3 clone_monitor_row_data ovsdb/monitor.c:336:12
4 ovsdb_monitor_changes_update ovsdb/monitor.c:1384:23
5 ovsdb_monitor_get_initial ovsdb/monitor.c:1535:21
6 ovsdb_jsonrpc_monitor_create ovsdb/jsonrpc-server.c:1502:9
7 ovsdb_jsonrpc_session_got_request ovsdb/jsonrpc-server.c:1030:21
8 ovsdb_jsonrpc_session_run ovsdb/jsonrpc-server.c:572:17
9 ovsdb_jsonrpc_session_run_all ovsdb/jsonrpc-server.c:602:21
10 ovsdb_jsonrpc_server_run ovsdb/jsonrpc-server.c:417:9
11 main_loop ovsdb/ovsdb-server.c:222:9
12 main ovsdb/ovsdb-server.c:500:5
13 __libc_start_call_main
14 __libc_start_main@GLIBC_2.2.5
15 _start (ovsdb/ovsdb-server+0x473034)
Fix that by destroying the initial change set every time new columns
are added to the monitor. This will trigger re-generation of the
change set and it will contain all the necessary columns afterwards.
Fixes: 07c27226ee96 ("ovsdb: Monitor: Keep and maintain the initial change set.")
Reported-by: Han Zhou <hzhou@ovn.org>
Acked-by: Han Zhou <hzhou@ovn.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Use the backtrace functions provided by libc. This allows us to get a
backtrace that is independent of the current memory map of the
process, which in turn can be used for debugging/tracing purposes.
The backtrace is not 100% accurate due to various optimizations, most
notably "-fomit-frame-pointer" and LTO. As a result, the line in the
source file might not correspond to the real line. However, it
should be able to pinpoint at least the function where the backtrace
was called.
The implementation is determined during compilation based on available
libraries. Libunwind has higher priority if both methods are available
to keep the compatibility with current behavior.
The backtrace functions are not marked as signal safe; the backtrace
manual page explains in more detail why that might be the case [0].
Load the "libgcc" or equivalent in advance within the "fatal_signal_init"
which should ensure that subsequent calls to backtrace* do not call
malloc and are signal safe.
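The preloading trick can be sketched as follows, assuming glibc's
<execinfo.h>. The function names and the frame limit are hypothetical;
this is not the code in fatal_signal_init().

```c
#include <assert.h>
#include <execinfo.h>
#include <unistd.h>

#define SK_BACKTRACE_MAX 32

/* Call once during startup: the first backtrace() call may load libgcc
 * dynamically (and allocate memory), so trigger that before any signal
 * handler needs a backtrace.  Returns the number of frames captured. */
static int
sk_backtrace_preload(void)
{
    void *frames[SK_BACKTRACE_MAX];

    return backtrace(frames, SK_BACKTRACE_MAX);
}

/* Intended for use in a signal handler: backtrace_symbols_fd() writes
 * the symbolized frames directly to 'fd' without returning allocated
 * strings. */
static void
sk_backtrace_dump(int fd)
{
    void *frames[SK_BACKTRACE_MAX];
    int n = backtrace(frames, SK_BACKTRACE_MAX);

    backtrace_symbols_fd(frames, n, fd);
}
```

A typical use would be calling sk_backtrace_preload() from the
initialization path and sk_backtrace_dump(STDERR_FILENO) from the
handler.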
The typical backtrace will look similar to the one below:
/lib64/libopenvswitch-3.1.so.0(backtrace_capture+0x1e) [0x7fc5db298dfe]
/lib64/libopenvswitch-3.1.so.0(log_backtrace_at+0x57) [0x7fc5db2999e7]
/lib64/libovsdb-3.1.so.0(ovsdb_txn_complete+0x7b) [0x7fc5db56247b]
/lib64/libovsdb-3.1.so.0(ovsdb_txn_propose_commit_block+0x8d) [0x7fc5db563a8d]
ovsdb-server(+0xa661) [0x562cfce2e661]
ovsdb-server(+0x7e39) [0x562cfce2be39]
/lib64/libc.so.6(+0x27b4a) [0x7fc5db048b4a]
/lib64/libc.so.6(__libc_start_main+0x8b) [0x7fc5db048c0b]
ovsdb-server(+0x8c35) [0x562cfce2cc35]
backtrace.h elaborates on how to effectively get the line information
associated with the addresses presented in the backtrace.
[0]
backtrace() and backtrace_symbols_fd() don't call malloc() explicitly,
but they are part of libgcc, which gets loaded dynamically when first
used. Dynamic loading usually triggers a call to malloc(3). If you
need certain calls to these two functions to not allocate memory (in
signal handlers, for example), you need to make sure libgcc is loaded
beforehand
Reported-at: https://bugzilla.redhat.com/2177760
Signed-off-by: Ales Musil <amusil@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Some venerable AMD processors do not support querying extended features
(EAX=7) with cpuid.
In this case, it is not a programmatic error and the runtime check should
simply report that the ISA is unsupported.
Reported-by: Davide Repetto <red@idp.it>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2211747
Fixes: b366fa2f4947 ("dpif-netdev: Call cpuid for x86 isa availability.")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The tc module combines the use of the `tc_transact` helper
function for communication with the in-kernel tc infrastructure
with assertions on the reply data by `ofpbuf_at_assert` on the
received data prior to further processing.
With the presence of bugs on the kernel side, we need to treat
the kernel as an unreliable service provider and replace assertions
on the reply from it with checks to avoid a fatal crash of OVS.
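The difference between asserting and checking can be sketched as
below. struct sk_buf and both functions are hypothetical and much
simpler than ofpbuf; returning EPROTO for a truncated reply is an
assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct sk_buf {
    const uint8_t *data;
    size_t size;
};

/* Returns a pointer to 'size' bytes at 'offset' in 'b', or NULL if that
 * range is not fully contained in the buffer (no assertion). */
static const void *
sk_buf_at(const struct sk_buf *b, size_t offset, size_t size)
{
    if (offset > b->size || size > b->size - offset) {
        return NULL;
    }
    return b->data + offset;
}

/* Hypothetical reply parser: treats a short reply as a protocol error
 * reported to the caller instead of a fatal assertion failure. */
static int
sk_parse_reply(const struct sk_buf *reply, size_t hdr_len, size_t body_len,
               const void **body)
{
    *body = sk_buf_at(reply, hdr_len, body_len);
    return *body ? 0 : EPROTO;
}

static void
sk_parse_example(void)
{
    static const uint8_t bytes[36];
    struct sk_buf truncated = { bytes, 32 };
    struct sk_buf complete = { bytes, 36 };
    const void *body;

    /* Same offset and size as in the crash: 16 + 20 > 32. */
    assert(sk_parse_reply(&truncated, 16, 20, &body) == EPROTO);
    assert(sk_parse_reply(&complete, 16, 20, &body) == 0);
    assert(body == bytes + 16);
}
```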
For the record, the symptom of the crash is this in the log:
EMER|include/openvswitch/ofpbuf.h:194:
assertion offset + size <= b->size failed in ofpbuf_at_assert()
And an excerpt of the backtrace looks like this:
ofpbuf_at_assert (offset=16, size=20) at include/openvswitch/ofpbuf.h:194
tc_replace_flower at lib/tc.c:3223
netdev_tc_flow_put at lib/netdev-offload-tc.c:2096
netdev_flow_put at lib/netdev-offload.c:257
parse_flow_put at lib/dpif-netlink.c:2297
try_send_to_netdev at lib/dpif-netlink.c:2384
Reported-At: https://launchpad.net/bugs/2018500
Fixes: 5c039ddc64ff ("netdev-linux: Add functions to manipulate tc police action")
Fixes: e7f6ba220e10 ("lib/tc: add ingress ratelimiting support for tc-offload")
Fixes: f98e418fbdb6 ("tc: Add tc flower functions")
Fixes: c1c9c9c4b636 ("Implement QoS framework.")
Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Git reports progress on stderr by default. This doesn't fail
the build, but upsets PowerShell:
git : Cloning into 'c:\pthreads4w-code'...
At line:3 char:1
+ git clone https://git.code.sf.net/p/pthreads4w/code c:\pthreads4w-cod ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified:
(Cloning into 'c:\pthreads4w-code'...:String) [], RemoteException
+ FullyQualifiedErrorId : NativeCommandError
Silence the git clone to avoid the warning.
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
GCC now reports uninitialized warnings from function return values.
../lib/netdev-dpdk.c: In function 'netdev_dpdk_mempool_configure':
../lib/netdev-dpdk.c:964:22: warning: 'dmp' may be used uninitialized [-Wmaybe-uninitialized]
964 | dev->dpdk_mp = dmp;
| ~~~~~~~~~~~~~^~~~~
../lib/netdev-dpdk.c:854:21: note: 'dmp' was declared here
854 | struct dpdk_mp *dmp, *next;
| ^~~
NB: this looks like a false positive, gcc 13 probably fails to see the link
between reuse and dmp in dpdk_mp_get().
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Robin Jarry <rjarry@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Caught while reviewing code.
Fixes: aca2f8a8a6b6 ("netdev-offload-dpdk: Implement HW miss packet recover for vport.")
Fixes: 241bad15d99a ("dpif-netdev: associate flow with a mark id")
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
In most cases, after the condition change request, the new condition
is the same as the old one, plus or minus a few clauses. Today,
ovsdb-server will evaluate every database row against all the old
clauses and then against all the new clauses in order to tell if an
update should be generated.
For example, every time a new port is added, ovn-controller adds two
new clauses to conditions for a Port_Binding table. And this condition
may grow significantly in size making addition of every new port
heavier on the server side.
The difference between conditions is not larger and, likely,
significantly smaller than old and new conditions combined. And if
the row doesn't match clauses that are different between old and new
conditions, that row should not be part of the update. It either
matches both old and new, or it doesn't match either of them.
If the row matches some clauses in the difference, then we need to
perform a full match against old and new in order to tell if it
should be added/removed/modified. This is necessary because different
clauses may select same rows.
Let's generate the condition difference and use it to avoid evaluation
of all the clauses for rows not affected by the condition change.
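The filtering idea can be illustrated with a toy model in which a
clause is a single integer and a row matches a clause when the two are
equal. Everything below (the sk_ names, the model itself) is
hypothetical and far simpler than the real ovsdb conditions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sk_cond {
    const int *clauses;
    size_t n;
};

/* In this toy model one helper serves both as "row matches any clause"
 * and as "condition contains this clause". */
static bool
sk_cond_contains(const struct sk_cond *c, int v)
{
    for (size_t i = 0; i < c->n; i++) {
        if (c->clauses[i] == v) {
            return true;
        }
    }
    return false;
}

/* Stores the clauses present in exactly one of 'old_c' and 'new_c' into
 * 'diff', which must have room for old_c->n + new_c->n entries. */
static size_t
sk_cond_diff(const struct sk_cond *old_c, const struct sk_cond *new_c,
             int *diff)
{
    size_t n = 0;

    for (size_t i = 0; i < old_c->n; i++) {
        if (!sk_cond_contains(new_c, old_c->clauses[i])) {
            diff[n++] = old_c->clauses[i];
        }
    }
    for (size_t i = 0; i < new_c->n; i++) {
        if (!sk_cond_contains(old_c, new_c->clauses[i])) {
            diff[n++] = new_c->clauses[i];
        }
    }
    return n;
}

enum sk_update { SK_NONE, SK_INSERT, SK_DELETE };

/* Rows that match nothing in the difference are skipped; only the rest
 * get the full old/new evaluation, counted in '*full_evals'. */
static enum sk_update
sk_row_update(const struct sk_cond *old_c, const struct sk_cond *new_c,
              const struct sk_cond *diff, int row, size_t *full_evals)
{
    bool was, is;

    if (!sk_cond_contains(diff, row)) {
        return SK_NONE;
    }
    (*full_evals)++;
    was = sk_cond_contains(old_c, row);
    is = sk_cond_contains(new_c, row);
    return was == is ? SK_NONE : is ? SK_INSERT : SK_DELETE;
}

static void
sk_cond_example(void)
{
    static const int old_clauses[] = { 1, 2, 3 };
    static const int new_clauses[] = { 2, 3, 4 };
    struct sk_cond old_c = { old_clauses, 3 };
    struct sk_cond new_c = { new_clauses, 3 };
    int diff_clauses[6];
    struct sk_cond diff = { diff_clauses, 0 };
    size_t full_evals = 0;

    diff.n = sk_cond_diff(&old_c, &new_c, diff_clauses);
    assert(diff.n == 2);
    assert(sk_row_update(&old_c, &new_c, &diff, 1, &full_evals) == SK_DELETE);
    assert(sk_row_update(&old_c, &new_c, &diff, 2, &full_evals) == SK_NONE);
    assert(sk_row_update(&old_c, &new_c, &diff, 4, &full_evals) == SK_INSERT);
    assert(sk_row_update(&old_c, &new_c, &diff, 5, &full_evals) == SK_NONE);
    assert(full_evals == 2);
}
```

Only the two rows touched by the changed clauses pay for the full
evaluation; the others are rejected after checking the two-clause
difference.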
Testing shows 70% reduction in total CPU time in ovn-heater's 120-node
density-light test with conditional monitoring. Average CPU usage
during the test phase went down from frequent 100% spikes to just 6-8%.
Note: This will not help with new connections, or re-connections,
or new monitor requests after database conversion. ovsdb-server will
still evaluate every database row against every clause in the condition
in these cases. So, it's still important to not have too many clauses
in conditions for large tables.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Many OVSDB tests are not checking the server log for warnings or
errors. Some are not even using the log file. It's mostly OK as we're
usually checking the user-visible behavior. But it would also be nice
to detect some internal warnings if there are some.
Move the OVSDB_SERVER_SHUTDOWN macro to the common place, add
a call to check_logs to it and make the OVSDB tests use this macro.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The following document discusses emeritus committer status:
https://docs.openvswitch.org/en/latest/internals/committer-emeritus-status/
There are several people who I would guess consider themselves
emeritus committers but have not formally declared it. Those moved to
emeritus status in this commit have either explicitly communicated
their desire to move or have both not been active in the last year and
have not yet replied to this patch.
It is easy to re-add people in the future should any emeritus
committer desire to become active again.
Per our policies, a vote of the majority of current committers (or
the list of maintainers prior to this change) is required to move a
committer to emeritus status.
Signed-off-by: Russell Bryant <russell@ovn.org>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Acked-by: Ansis Atteka <ansisatteka@gmail.com>
Acked-by: Daniele Di Proietto <daniele.di.proietto@gmail.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Simon Horman <horms@verge.net.au>
Acked-by: Thomas Graf <tgraf@tgraf.ch>
Acked-by: William Tu <u9012063@gmail.com>
CC: Andy Zhou <azhou@ovn.org>
CC: Gurucharan Shetty <guru@ovn.org>
CC: Ian Stokes <istokes@ovn.org>
CC: Jarno Rajahalme <jarno@ovn.org>
CC: YAMAMOTO Takashi <yamamoto@midokura.com>
The current implementation used to extract the PS1 prompt for ovs-vsctl
is broken on recent Bash releases.
Starting from Bash 4.4 it's possible to use the @P expansion to get
the quoted PS1 directly.
This commit makes the two bash completion files use the @P expansion to
get the quoted PS1 on Bash >= 4.4.
Reported-at: https://bugzilla.redhat.com/2170344
Reported-by: Martin Necas <mnecas@redhat.com>
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The offload thread calling ufid_to_rte_flow_disassociate() may be the
last one holding a reference on the netdev and physdev.
So displaying information about them might trigger a crash when
removing a physical port.
Fixes: faf71e492263 ("netdev-dpdk: Print port name in offload API messages.")
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
By default OVS configures 2048 descriptors for tx and rx queues
on DPDK devices. It also allows the user to configure those values.
If the values used are not acceptable to the device then queue setup
would fail.
The device exposes its max/min/alignment requirements and OVS also
applies some limits. Use these to ensure an acceptable value
is used for the number of descriptors on a device tx/rx.
If the default or user value is not acceptable, adjust to a suitable
value and log.
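A minimal sketch of such an adjustment is shown here. It is not the
netdev-dpdk code: struct desc_limits and adjust_desc_count() are
hypothetical names, the limits are assumed to be already combined from
the device and from OVS, and the aligned minimum is assumed not to
exceed the aligned maximum.

```c
#include <stdint.h>

/* Hypothetical limits for one queue direction (rx or tx). */
struct desc_limits {
    uint16_t min;    /* Smallest acceptable number of descriptors. */
    uint16_t max;    /* Largest acceptable number of descriptors. */
    uint16_t align;  /* Required multiple; 0 means no requirement. */
};

/* Returns the acceptable descriptor count closest to 'requested':
 * rounded up to the alignment, then clamped to the aligned [min, max]. */
static uint32_t
adjust_desc_count(uint32_t requested, const struct desc_limits *lim)
{
    uint64_t align = lim->align ? lim->align : 1;
    uint64_t lo = (lim->min + align - 1) / align * align;  /* min, up. */
    uint64_t hi = lim->max / align * align;                /* max, down. */
    uint64_t n = (requested + align - 1) / align * align;

    if (n < lo) {
        n = lo;
    }
    if (n > hi) {
        n = hi;
    }
    return (uint32_t) n;
}
```

A caller would compare the result with the requested value and log a
message when the two differ.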
Reported-at: https://bugzilla.redhat.com/2119876
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
There is no need to display 'requested_rx/tx_descriptors' and
'configured_rx/tx_descriptors' as they will be the same.
It is simpler to just have a single 'n_rxq/txq_desc' value.
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The only way that stats->{n_packets,n_bytes} would decrease is due to an
overflow, or if there are bugs in how statistics are handled. In the
past, there were multiple issues that caused a jump backward. A
workaround was in place to set the statistics to 0 in that case. When
this happened while the revalidator was under heavy load, the workaround
had an unintended side effect where should_revalidate returned false
causing the flow to be removed because the metric it calculated was
based on a bogus value. Since many of those bugs have now been
identified and resolved, there is no need to set the statistics to 0. In
addition, the (unlikely) overflow still needs to be handled
appropriately. If an unexpected jump does happen, just log it as a
warning.
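One way to picture the new behavior is the sketch below. It rests on
assumptions: stats_delta() is a hypothetical helper, not a function from
ofproto-dpif-upcall, and it relies on unsigned 64-bit subtraction being
modulo 2^64, so a single counter overflow still yields the right delta.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns how much a counter advanced from 'prev' to 'cur'.  The
 * subtraction wraps modulo 2^64, so a single overflow gives the correct
 * delta.  '*jumped_back' is set when 'cur' is below 'prev', which the
 * caller should only log as a warning instead of zeroing the stats. */
static uint64_t
stats_delta(uint64_t prev, uint64_t cur, bool *jumped_back)
{
    *jumped_back = cur < prev;
    return cur - prev;
}
```

For example, going from UINT64_MAX - 1 to 3 is reported as a backward
jump with a delta of 5, which is what a real overflow would have added.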
Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
OpenSSL 3.0 enabled alerts for unexpected EOF by default. It is supposed
to alert the application whenever the connection is terminated without
a proper close_notify. That should allow applications to take
action to protect themselves from a potential TLS truncation attack.
This is how it looks in the log:
|stream_ssl|WARN|SSL_read: error:0A000126:SSL routines::unexpected eof while reading
|jsonrpc|WARN|ssl:127.0.0.1:34288: receive error: Input/output error
|reconnect|WARN|ssl:127.0.0.1:34288: connection dropped (Input/output error)
The problem is that clients based on OVS libraries do not wait for
the proper termination if it doesn't happen right away. This means that
the chances of getting alerts on the server side for every single
disconnection are very high.
None of the high level protocols supported by OVS daemons can carry
state between re-connections, e.g., there are no session cookies or
anything like that. So, the TLS truncation attack is not applicable.
Disable the alert to avoid unnecessary warnings in the log.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The bareudp tests depend on specific kernel configuration to
succeed. Skip the test if the feature is not enabled in the
running kernel.
Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
arp_spa and arp_tpa are IP addresses, so their width should be 32 bits.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: yangchang <yangchang@chinatelecom.cn>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
It supports flowlabel based load balancing by controlling the flowlabel
of outer IPv6 header, which is already implemented in Linux kernel as
seg6_flowlabel sysctl [1].
[1]: https://docs.kernel.org/networking/seg6-sysctl.html
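As an illustration, the three behaviors described for the kernel sysctl
in [1] could be modelled like this. The enum, its names and
outer_flow_label() are hypothetical and are not the OVS option names;
masking a flow hash to 20 bits is only a stand-in for however the label
is really computed.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_FLOW_LABEL_MASK 0x000fffffu  /* Flow label is 20 bits. */

/* Values mirror the seg6_flowlabel sysctl described in [1]. */
enum example_flowlabel_mode {
    EXAMPLE_FL_ZERO = -1,    /* Always set the outer flow label to 0. */
    EXAMPLE_FL_COPY = 0,     /* Copy from an inner IPv6 header, else 0. */
    EXAMPLE_FL_COMPUTE = 1,  /* Derive the label from a flow hash. */
};

static uint32_t
outer_flow_label(enum example_flowlabel_mode mode, bool inner_is_ipv6,
                 uint32_t inner_label, uint32_t flow_hash)
{
    switch (mode) {
    case EXAMPLE_FL_COPY:
        return inner_is_ipv6 ? inner_label & EXAMPLE_FLOW_LABEL_MASK : 0;
    case EXAMPLE_FL_COMPUTE:
        return flow_hash & EXAMPLE_FLOW_LABEL_MASK;
    case EXAMPLE_FL_ZERO:
    default:
        return 0;
    }
}
```

In the compute mode, packets of different inner flows get different
outer flow labels, which is what lets label-aware appliances spread them
across paths.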
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
For tunnels such as SRv6, some popular vendor appliances support
IPv6 flowlabel based load balancing. In preparation for OVS to
support it, this patch modifies the encapsulation to allow IPv6
flowlabel to be configured.
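For reference, the flow label occupies the low 20 bits of the first
32-bit word of the IPv6 header, after the 4-bit version and the 8-bit
traffic class. The sketch below shows how a configured label could be
written into that word; the helper name and macro are hypothetical, and
the word is assumed to be in host byte order (a real header stores it in
network byte order).

```c
#include <stdint.h>

#define EXAMPLE_IPV6_LABEL_MASK 0x000fffffu  /* Low 20 bits of the word. */

/* Returns 'vtc_flow' (version, traffic class and flow label, in host
 * byte order) with the flow label replaced by the low 20 bits of
 * 'label'.  Version and traffic class are left untouched. */
static uint32_t
example_ipv6_set_flow_label(uint32_t vtc_flow, uint32_t label)
{
    return (vtc_flow & ~EXAMPLE_IPV6_LABEL_MASK)
           | (label & EXAMPLE_IPV6_LABEL_MASK);
}
```

Bits of 'label' above the 20th are silently dropped in this sketch; a
caller might prefer to reject such values instead.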
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>