When doing performance testing with OVS v3.1 we ran into a deadlock
situation with the netdev_hmap_rwlock read/write lock. After some
debugging, it was discovered that the netdev_hmap_rwlock read lock
was taken recursively. And well in the following sequence of events:
netdev_ports_flow_get()
It takes the read lock, while it walks all the ports
in the port_to_netdev hmap and calls:
- netdev_flow_get() which will call:
- netdev_tc_flow_get() which will call:
- netdev_ifindex_to_odp_port()
This function also takes the same read lock to
walk the ifindex_to_port hmap.
In OVS a read/write lock does not support recursive readers. For details
see the comments in ovs-thread.h. If you do this, it will lock up,
mainly due to OVS setting the PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP
attribute to the lock.
The solution with this patch is to use two separate read/write
locks, with an order guarantee to avoid another potential deadlock.
Fixes: 9fe21a4fc1 ("netdev-offload: replace netdev_hmap_mutex to netdev_hmap_rwlock")
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2182541
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Cited commit introduced a flag in dpif-netdev level, to optimize
performance and avoid hw_miss_packet_recover() for devices with no such
support.
However, there is a race condition between traffic processing and
assigning a 'flow_api' object to the netdev. In such case, EOPNOTSUPP is
returned by netdev_hw_miss_packet_recover() in netdev-offload.c layer
because 'flow_api' is not yet initialized. As a result, the flag is
falsely disabled, and subsequent packets won't be recovered, though they
should.
In order to fix it, move the flag to be in netdev-offload layer, to
avoid that race.
Fixes: 6e50c16518 ("dpif-netdev: Avoid hw_miss_packet_recover() for devices with no support.")
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Add API to offload meter to HW, and the corresponding functions to call
the meter callbacks from all the registered flow API providers.
The interfaces are like those related to meter in dpif_class, in order
to pass necessary info to HW.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Expose functions reporting user configuration of offloading threads, as
well as utility functions for multithreading.
This will only expose the configuration knob to the user, while no
datapath will implement the multiple thread request.
This will allow implementations to use this API for offload thread
management in relevant layers before enabling the actual dataplane
implementation.
The offload thread ID is lazily allocated and can as such be in a
different order than the offload thread start sequence.
The RCU thread will sometime access hardware-offload objects from
a provider for reclamation purposes. In such case, it will get
a default offload thread ID of 0. Care must be taken that using
this thread ID is safe concurrently with the offload threads.
Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Add a new operation for flow API providers to
uninitialize when the API is disassociated from a netdev.
Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
netdev_set_dpif_type() can only be used with a normalized dpif type
as an argument, which is a constant static string derived from a type
of a dpif_class or a constant string "system". Usage of a same
constant string allows netdev-offload module to compare types by
simply comparing pointers.
OTOH, 'br->ofproto->type' is a dynamic string that:
a. Can be NULL.
b. Even if not NULL and equal, can be a different dynamically
allocated string.
Both these qualities breaks assumptions made by all other modules
related to HW offload, breaking the functionality.
Fix that by moving netdev_set_dpif_type() to dpif.c and calling with
a correct constant string as an argument.
The call moved from bridge.c to dpif.c, because we need to have access
to the dpif class, but bridge.c should not.
Not trying to set the dpif_type inside the netdev_ports_insert(),
because it's used now outside the offloading context. So, it's
cleaner to move the netdev_set_dpif_type() call outside of the
netdev-offload module.
Additionally removed the redundant call from the netdev_ports_insert()
and refactored the function, since it doesn't need an extra argument
anymore.
Fixes: 4f19a78a61 ("netdev-vport: Fix userspace tunnel ioctl(SIOCGIFINDEX) info logs.")
Reported-by: Roi Dayan <roid@nvidia.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-December/390117.html
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Lin Huang <linhuang@ruijie.com.cn>
Acked-by: Roi Dayan <roid@nvidia.com>
When the HW offload involves multiple flows, like in tunnel decap path,
it is possible that not all flows in the path are offloaded, resulting
in partial processing in HW. In order to proceed with rest of the
processing in SW, the packet state has to be recovered as if it was
processed in SW from the beginning. In the case of tunnel decap,
potential state to recover could be the outer tunneling layer to
metadata.
Add an API for that.
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Tested-by: Marko Kovacevic <marko.kovacevic@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The n_offloaded_flows counter is saved in dpif, and this is the first
one when ofproto is created. When flow operation is done by ovs-appctl
commands, such as, dpctl/add-flow, a new dpif is opened, and the
n_offloaded_flows in it can't be used. So, instead of using counter,
the number of offloaded flows is queried from each netdev, then sum
them up. To achieve this, a new API is added in netdev_flow_api to get
how many flows assigned to a netdev.
In order to get better performance, this number is calculated directly
from tc_to_ufid hmap for netdev-offload-tc, because flow dumping by tc
takes much time if there are many flows offloaded.
Fixes: af06184705 ("dpif-netlink: Count the number of offloaded rules")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
There is no real difference between the 'class' and 'type' in the
context of common lookup operations inside netdev-offload module
because it only checks the value of pointers without using the
value itself. However, 'type' has some meaning and can be used by
offload provides on the initialization phase to check if this type
of Flow API in pair with the netdev type could be used in particular
datapath type. For example, this is needed to check if Linux flow
API could be used for current tunneling vport because it could be
used only if tunneling vport belongs to system datapath, i.e. has
backing linux interface.
This is needed to unblock tunneling offloads in userspace datapath
with DPDK flow API.
Acked-by: Eli Britstein <elibr@mellanox.com>
Acked-by: Roni Bar Yanai <roniba@mellanox.com>
Acked-by: Ophir Munk <ophirmu@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
In order to improve revalidator performance by minimizing unnecessary
copying of data, extend netdev-offloads to support terse dump mode. Extend
netdev_flow_api->flow_dump_create() with 'terse' bool argument. Implement
support for terse dump in functions that convert netlink to flower and
flower to match. Set flow stats "used" value based on difference in number
of flow packets because lastuse timestamp is not included in TC terse dump.
Kernel API support is implemented in following patch.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
All the kmap lookup operations netdev_ports_flow_del, netdev_ports_get
netdev_ifindex_to_odp_port should protected by rdlock without
affect each other in the handlers and revalidators
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Ben Pfaff <blp@ovn.org>
New module 'netdev-offload' created to manage different flow API
implementations. All the generic and provider independent code moved
there from the 'netdev' module.
Flow API providers further encapsulated.
The only function that was changed is 'netdev_any_oor'.
Now it uses offloading related hmap instead of common 'netdev_shash'.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Roi Dayan <roid@mellanox.com>