In a backtrace test with monitor the child process will be re-started
after being killed. The test doesn't wait for that to happen, so it
is possible that during the test cleanup the pid in a pid file is not
updated yet. Hence, the on-exit hook will not kill the process.
This is causing issues in Cirrus CI, because gmake on FreBSD waits for
all child processes to exit and that never happens.
Fix the issue by waiting for a new process. It's also better to exit
gracefully instead of relying on the on-exit kill.
Fixes: 759a29dc2d97 ("backtrace: Extend the backtrace functionality.")
Acked-by: Ales Musil <amusil@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Before the cleanup option, the bridge_exit() call was fairly fast,
because it didn't include any particularly long operations. However,
with the cleanup flag, this function destroys a lot of datapath
resources freeing a lot of memory, waiting on RCU and talking to
the kernel. That may take a noticeable amount of time, especially
on a busy system or under profilers/sanitizers. However, the unixctl
'exit' command replies instantly without waiting for any work to
actually be done. This may cause system test failures or other
issues where scripts expect ovs-vswitchd to exit or destroy all the
datapath resources shortly after appctl call.
Fix that by waiting for the bridge_exit() before replying to the user.
At least, all the datapath resources will actually be destroyed by
the time ovs-appctl exits.
Also moving a structure from stack to global. Seems cleaner this way.
Since we're not replying right away and it's technically possible
to have multiple clients requesting exit at the same time, storing
connections in an array.
Fixes: fe13ccdca6a2 ("vswitchd: Add --cleanup option to the 'appctl exit' command")
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Use TCA_POLICE_RATE64 if the rate cannot be expressed using 32bits.
This breaks the 32Gbps barrier. The new barrier is ~4Tbps caused by
netdev's API expressing kbps rates using 32-bit integers.
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2137643
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
In preparation for supporting 64-bit rates in tc policies, move the
allocation and initialization of struct tc_police object inside
nl_msg_put_act_police(). That way, the function is now called with the
actual rates.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
It is equivalent to tc_policer_init() so remove the duplicated function.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Currently, htb rates are capped at ~34Gbps because they are internally
expressed as 32-bit fields.
Move min and max rates to 64-bit fields and use TCA_HTB_RATE64 and
TCA_HTB_CEIL64 to configure HTC classes to break this barrier.
In order to test this, create a dummy tuntap device and set it's
speed to a very high value so we can try adding a QoS queue with big
rates.
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2137619
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
tc uses these "rtab" tables to estimate the time (ticks) that it takes
to send a packet of different sizes. In preparation for the introduction
of 64-bit rates, add an argument to tc_put_rtab() to allow an external
64-bit rate.
Also use 64bits for other burst buffer calculation functions.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Instead of relying on feature bits, use the speed value directly as
maximum rate for htb and hfsc classes.
There is still a limitation with the maximum rate that we can express
with a 32-bit number in bytes/s (~ 34.3Gbps), but using the actual link speed
instead of the feature bits, we can at least use an accurate maximum for
some link speeds (such as 25Gbps) which are not supported by netdev's feature
bits.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Currently, the netdev's speed is being calculated by taking the link's
feature bits (using netdev_get_features()) and transforming them into
bps.
This mechanism can be both inaccurate and difficult to maintain, mainly
because we currently use the feature bits supported by OpenFlow which
would have to be extended to support all new feature bits of all netdev
implementations while keeping the OpenFlow API intact.
In order to expose the link speed accurately for all current and future
hardware, add a new netdev API call that allows the implementations to
provide the current and maximum link speeds in Mbps.
Internally, the logic to get the maximum supported speed still relies on
feature bits so it might still get out of sync in the future. However,
the maximum configurable speed is not used as much as the current speed
and these feature bits are not exposed through the netdev interface so
it should be easier to add more.
Use this new function instead of netdev_get_features() where the link
speed is needed.
As a consequence of this patch, link speeds of cards is properly
reported (internally in OVSDB) even if not supported by OpenFlow.
A test verifies this behavior using a tap device.
Also, in order to avoid using the old, this patch adds a checkpatch.py
warning if the old API is used.
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2137567
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Previously it was not possible to set the probe interval for the
connection from a relay to the backing ovsdb-server. With this change
it is now possible using the
`ovsdb-server/set-relay-source-probe-interval` command.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Felix Huettner <felix.huettner@mail.schwarz>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Max requested sleep time and status for a PMD thread
is logged at start up or when changed, but it can be
convenient to have a command to dump this information
explicitly.
It is envisaged that this will be expanded for individual
pmds in the future, hence adding to dpif_netdev_pmd_info().
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This is just cosmetic. There is no change to the tests.
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
other_config:pmd-maxsleep is a config option to allow
PMD thread cores to sleep under low or no load conditions.
Rename it to 'pmd-sleep-max' to allow a more structured
name and so that additional options or command can follow
the 'pmd-sleep-xyz' pattern.
Use of other_config:pmd-maxsleep is deprecated to be
removed in a future release and will result in a warning.
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This adds a Python version of the async DNS support added in:
771680d96 DNS: Add basic support for asynchronous DNS resolving
The above version uses the unbound C library, and this
implimentation uses the SWIG-wrapped Python version of that.
In the event that the Python unbound library is not available,
a warning will be logged and the resolve() method will just
return None. For the case where inet_parse_active() is passed
an IP address, it will not try to resolve it, so existing
behavior should be preserved in the case that the unbound
library is unavailable.
Intentional differences from the C version are as follows:
OVS_HOSTS_FILE environment variable can bet set to override
the system 'hosts' file. This is primarily to allow testing to
be done without requiring network connectivity.
Since resolution can still be done via hosts file lookup, DNS
lookups are not disabled when resolv.conf cannot be loaded.
The Python socket_util module has fallen behind its C equivalent.
The bare minimum change was done to inet_parse_active() to support
sync/async dns, as there is no equivalent to
parse_sockaddr_components(), inet_parse_passive(), etc. A TODO
was added to bring socket_util.py up to equivalency to the C
version.
Signed-off-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Since a27d70a89 ("conntrack: add generic IP protocol support") all
the unrecognized IP protocols get handled using ct_proto_other ops
and are managed as L3 using 3 tuples.
This patch stores L4 information for SCTP in the conn_key so that
multiple conn instances, instead of one with ports zeroed, will be
created when there are multiple SCTP connections between two hosts.
It also performs crc32c check when not offloaded, and adds SCTP to
pat_enabled.
With this patch, given two SCTP association between two hosts,
tracking the connection will result in:
sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=55884,dport=5201),
reply=(src=10.1.1.1,dst=10.1.1.2,sport=5201,dport=12345),zone=1
sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=59874,dport=5202),
reply=(src=10.1.1.1,dst=10.1.1.2,sport=5202,dport=12346),zone=1
instead of:
sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=0,dport=0),
reply=(src=10.1.1.1,dst=10.1.1.2,sport=0,dport=0),zone=1
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This commit adds assertions in the functions `shash_count`,
`simap_count`, and `smap_count` to ensure that the corresponding input
struct pointer is not NULL.
This ensures that if the return values of `shash_sort`, `simap_sort`,
or `smap_sort` are NULL, then the following for loops would not attempt
to access the pointer, which might result in segmentation faults or
undefined behavior.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: James Raphael Tiovalen <jamestiotio@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The set_error function is now used regardless of whether experimental APIs
are allowed or not, so it must be defined unconditionally.
Fixes: fc06ea9a1883 ("netdev-dpdk: Add custom rx-steering configuration.")
Acked-by: Ivan Malov <ivan.malov@arknetworks.am>
Signed-off-by: Viacheslav Galaktionov <viacheslav.galaktionov@arknetworks.am>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Several xlate actions used in recursive translation currently store a
large amount of information on the stack. This can result in handler
threads quickly running out of stack space despite before
xlate_resubmit_resource_check() is able to terminate translation. This
patch reduces stack usage by over 3kb from several translation actions.
This patch also moves some trace function from do_xlate_actions into its
own function.
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2104779
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
This narrows down spelling errors that are in the commit
subject. It also provides a subject if the subject line is
missing. The provisional subject is the name of the patch
file, which should provide some context about the patch.
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Chandan Somani <csomani@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
This will be useful for correcting possible spelling
mistakes with ease. Suggestions limited to 3 at first,
but can be made configurable in the future.
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Chandan Somani <csomani@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Single out flagged words and allow for more useful
details, like spelling suggestions.
Signed-off-by: Chandan Somani <csomani@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
In the case where routing is enabled, the bridge member of the
`vsctl_port` structs is not populated. This can cause a crash if we
attempt to access it. This patch fixes the crash by checking if the
bridge member is valid before attempting to access it. In the
`check_conflicts` function, we print both the port name and the bridge
name if routing is disabled and we only print the port name if routing
is enabled.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: James Raphael Tiovalen <jamestiotio@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This commit adds non-null pointer assertions in some code that performs
some decisions based on old and new input ovsdb_rows.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: James Raphael Tiovalen <jamestiotio@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This commit adds a few null pointer assertions and checks to some return
values of `ovsdb_table_schema_get_column`. If a null pointer is
encountered in these blocks, either the assertion will fail or the
control flow will now be redirected to alternative paths which will
output the appropriate error messages.
A few ovsdb-rbac and ovsdb-server tests are also updated to verify the
expected warning logs by adding said logs to the ALLOWLIST of the
OVSDB_SERVER_SHUTDOWN statements.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: James Raphael Tiovalen <jamestiotio@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
As per the Open vSwitch Manual ovs-vsctl(8) the Bridge IPFIX parameters
can be passed as follows:
ovs-vsctl -- set Bridge br0 ipfix=@i \
-- --id=@i create IPFIX targets=\"192.168.0.34:4739\" \
obs_domain_id=123 obs_point_id=456 cache_active_timeout=60 \
cache_max_flows=13 \
other_config:enable-input-sampling=false \
other_config:enable-output-sampling=false
where the default values are:
enable_input_sampling: true
enable_output_sampling: true
But in the existing code these 2 parameters take up unexpected values
in some scenarios:
be_opts.enable_input_sampling = !smap_get_bool(&be_cfg->other_config,
"enable-input-sampling", false);
be_opts.enable_output_sampling = !smap_get_bool(&be_cfg->other_config,
"enable-output-sampling", false);
Here, the function smap_get_bool is being used with a negation.
This returns expected values for the default case (since the above code
will negate “false” we get from smap_get bool function and return the
value “true”) but unexpected values for the case where the sampling
value is passed through the CLI.
For example, if we pass "true" for other_config:enable-input-sampling
in the CLI, the above code will negate the “true” value we get from
the smap_bool function and return the value “false”. Same would be the
case for enable_output_sampling.
Acked-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Sayali Naval <sanaval@cisco.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Some control protocols are used to maintain link status between
forwarding engines (e.g. LACP). When the system is not sized properly,
the PMD threads may not be able to process all incoming traffic from the
configured Rx queues. When a signaling packet of such protocols is
dropped, it can cause link flapping, worsening the situation.
Use the rte_flow API to redirect these protocols into a dedicated Rx
queue. The assumption is made that the ratio between control protocol
traffic and user data traffic is very low and thus this dedicated Rx
queue will never get full. Re-program the RSS redirection table to only
use the other Rx queues.
The additional Rx queue will be assigned a PMD core like any other Rx
queue. Polling that extra queue may introduce increased latency and
a slight performance penalty at the benefit of preventing link flapping.
This feature must be enabled per port on specific protocols via the
rx-steering option. This option takes "rss" followed by a "+" separated
list of protocol names. It is only supported on ethernet ports. This
feature is experimental.
If the user has already configured multiple Rx queues on the port, an
additional one will be allocated for control packets. If the hardware
cannot satisfy the number of requested Rx queues, the last Rx queue will
be assigned for control plane. If only one Rx queue is available, the
rx-steering feature will be disabled. If the hardware does not support
the rte_flow matchers/actions, the rx-steering feature will be
completely disabled on the port and regular rss will be performed
instead.
It cannot be enabled when other-config:hw-offload=true as it may
conflict with the offloaded flows. Similarly, if hw-offload is enabled,
custom rx-steering will be forcibly disabled on all ports and replaced
by regular rss.
Example use:
ovs-vsctl add-bond br-phy bond0 phy0 phy1 -- \
set interface phy0 type=dpdk options:dpdk-devargs=0000:ca:00.0 -- \
set interface phy0 options:rx-steering=rss+lacp -- \
set interface phy1 type=dpdk options:dpdk-devargs=0000:ca:00.1 -- \
set interface phy1 options:rx-steering=rss+lacp
As a starting point, only one protocol is supported: LACP. Other
protocols can be added in the future. NIC compatibility should be
checked.
To validate that this works as intended, I used a traffic generator to
generate random traffic slightly above the machine capacity at line rate
on a two ports bond interface. OVS is configured to receive traffic on
two VLANs and pop/push them in a br-int bridge based on tags set on
patch ports.
+----------------------+
| DUT |
|+--------------------+|
|| br-int || in_port=patch10,actions=mod_dl_src:$patch11,
|| || mod_dl_dst:$tgen0,
|| || output:patch10
|| || in_port=patch11,actions=mod_dl_src:$patch10
|| || mod_dl_dst:$tgen0,
|| patch10 patch11 || output:patch10
|+---|-----------|----+|
| | | |
|+---|-----------|----+|
|| patch00 patch01 ||
|| tag:10 tag:20 ||
|| ||
|| br-phy || default flow, action=NORMAL
|| ||
|| bond0 || balance-slb, lacp=passive, lacp-time=fast
|| phy0 phy1 ||
|+------|-----|-------+|
+-------|-----|--------+
| |
+-------|-----|--------+
| port0 port1 | balance L3/L4, lacp=active, lacp-time=fast
| lag | mode trunk VLANs 10, 20
| |
| switch |
| |
| vlan 10 vlan 20 | mode access
| port2 port3 |
+-----|----------|-----+
| |
+-----|----------|-----+
| tgen0 tgen1 | Random traffic that is properly balanced
| | across the bond ports in both directions.
| traffic generator |
+----------------------+
Without rx-steering, the bond0 links are randomly switching to
"defaulted" when one of the LACP packets sent by the switch is dropped
because the RX queues are full and the PMD threads did not process them
fast enough. When that happens, all traffic must go through a single
link which causes above line rate traffic to be dropped.
~# ovs-appctl lacp/show-stats bond0
---- bond0 statistics ----
member: phy0:
TX PDUs: 347246
RX PDUs: 14865
RX Bad PDUs: 0
RX Marker Request PDUs: 0
Link Expired: 168
Link Defaulted: 0
Carrier Status Changed: 0
member: phy1:
TX PDUs: 347245
RX PDUs: 14919
RX Bad PDUs: 0
RX Marker Request PDUs: 0
Link Expired: 147
Link Defaulted: 1
Carrier Status Changed: 0
When rx-steering is enabled, no LACP packet is dropped and the bond
links remain enabled at all times, maximizing the throughput. Neither
the "Link Expired" nor the "Link Defaulted" counters are incremented
anymore.
This feature may be considered as "QoS". However, it does not work by
limiting the rate of traffic explicitly. It only guarantees that some
protocols have a lower chance of being dropped because the PMD cores
cannot keep up with regular traffic.
The choice of protocols is limited on purpose. This is not meant to be
configurable by users. Some limited configurability could be considered
in the future but it would expose to more potential issues if users are
accidentally redirecting all traffic in the isolated queue.
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Robin Jarry <rjarry@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
At some point in OVS history, some virtio features were announced as
supported (ECN and UFO virtio features).
The userspace TSO code, which has been added later, does not support
those features and tries to disable them.
This breaks OVS upgrades: if an existing VM already negotiated such
features, their lack on reconnection to an upgraded OVS triggers a
vhost socket disconnection by Qemu.
This results in an endless loop because Qemu then retries with the same
set of virtio features.
This patch proposes to try and detect those vhost socket disconnection
and fallback restoring the old virtio features (and disabling TSO for
this vhost port).
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Add a vxlan gbp offload test case:
vxlan offloads with gbp extention - ping between two ports - offloads
enabled ok
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Kernels that do not support vxlan gbp would treat the rule that has vxlan
gbp encap action or vxlan gbp id match differently, either reject it or
just skip the action/match and continue processing the knowing ones.
To solve the issue, probe and disallow inserting rules with vxlan gbp
action/match if kernel does not support it.
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Add TC offload support for vxlan encap with gbp option.
Reviewed-by: Gavi Teitz <gavi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Most of the data members of struct tc_action{ } are defined as anonymous
struct in place. Instead of passing all members of an anonymous struct,
which is not flexible to new members being added, expose encap as named
struct and pass it entirely.
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Linux kernel netlink module added NLA_F_NESTED flag checking for nested
netlink messages in 5.2. A nested message without the flag set will be
treated as malformatted one. The check is optional and is controlled by
message policy. To avoid this, add NLA_F_NESTED explicitly for all
nested netlink messages with a new function
nl_msg_start_nested_with_flag().
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Extract vxlan gbp option encoding to odp_encode_gbp_raw to be used in
following commits.
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Extract vxlan gbp option decoding to odp_decode_gbp_raw to be used in
following commits.
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Tc flower tunnel key options were encoded in nl_msg_put_flower_tunnel_opts
and decoded in nl_parse_flower_tunnel_opts. Only geneve was supported.
To avoid adding more arguments to the function to support more vxlan
options in the future, change the function arguments to pass tunnel
entirely to it instead of keep adding new arguments.
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Current implementation of meters in the userspace datapath takes
the meter lock for every packet batch. If more than one thread
hits the flow with the same meter, they will lock each other.
Replace the critical section with atomic operations to avoid
interlocking. Meters themselves are RCU-protected, so it's safe
to access them without holding a lock.
Implementation does the following:
1. Tries to advance the 'used' timer of the meter with atomic
compare+exchange if it's smaller than 'now'.
2. If the timer change succeeds, atomically update band buckets.
3. Atomically update packet statistics for a meter.
4. Go over buckets and try to atomically subtract the amount of
packets or bytes, recording the highest exceeded band.
5. Atomically update band statistics and drop packets.
Bucket manipulations are implemented with atomic compare+exchange
operations with extra checks, because bucket size should never
exceed the maximum and it should never go below zero.
Packet statistics may be momentarily inconsistent, i.e., number
of packets and the number of bytes may reflect different sets
of packets. But it should be eventually consistent. And the
difference at any given time should be in just few packets.
For the sake of reduced code complexity PKTPS meter tries to push
packets through the band one by one, even though they all have
the same weight. This is also more fair if more than one thread
is passing packets through the same band at the same time.
Trying to predict the number of packets that can pass may also
cause extra atomic operations reducing the performance.
This implementation shows similar performance to the previous one,
but should scale better with more threads hitting the same meter.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Lin Huang <linhuang@ruijie.com.cn>
Tested-by: Zhang YuHuang <zhangyuhuang@ruijie.com.cn>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The patch introduces a new commands ovs-appctl dpctl/dump-conntrack-exp
that allows to dump the existing expectations for the userspace ct.
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
get_log_next_line_num() was defined in alb.at.
As it may be useful in other test files, move to
ofproto-macros.at.
Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
As far as I can tell they're used mostly for CI job definitions and
these tend to result in long lines.
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2023-June/405796.html
Suggested-by: Aaron Conole <aconole@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Add additional error coverage counters for dpif operation failures.
This could help to quickly identify netlink problems when communicating
with the OVS kernel module.
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2070630
Reviewed-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Eelco Chaudron was elected by the Open vSwitch committers yesterday.
This formalises his status as an Open vSwitch committer.
Welcome Eelco!
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>