By default OVS configures 2048 descriptors for tx and rx queues
on DPDK devices. It also allows the user to configure those values.
If the values used are not acceptable to the device then queue setup
would fail.
The device exposes it's max/min/alignment requirements and OVS
applies some limits also. Use these to ensure an acceptable value
is used for the number of descriptors on a device tx/rx.
If the default or user value is not acceptable, adjust to a suitable
value and log.
Reported-at: https://bugzilla.redhat.com/2119876
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
There is no need to display 'requested_rx/tx_descriptors' and
'configured_rx/tx_descriptors' as they will be the same.
It is simpler to just have a single 'n_rxq/txq_desc' value.
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
rte_pktmbuf_free_bulk() function was introduced in 19.11 and became
stable in 21.11. Use it to free arrays of mbufs instead of freeing
packets one by one.
In simple V2V testing with 64B packets, 2 PMD threads and bidirectional
traffic this change improves performance by 3.5 - 4.5 %.
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
As Ilya reported, we have a ABBA deadlock between DPDK vq->access_lock
and OVS dev->mutex when OVS main thread refreshes statistics, while a
vring state change event is being processed for a same vhost port.
To break from this situation, move vring state change notifications
handling from the vhost-events DPDK thread to a dedicated thread
using a lockless queue.
Besides, for the case when a bogus/malicious guest is sending continuous
updates, add a counter of pending updates in the queue and warn if a
threshold of 1000 entries is reached.
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2023-January/401101.html
Fixes: 3b29286db1c5 ("netdev-dpdk: Add per virtqueue statistics.")
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The vhost library now provides finegrained statistics for guest
notifications:
- notifications for buffer reclaim by the guest,
- notifications for buffer availability to the guest,
Example before this patch:
$ ovs-appctl coverage/show |
grep vhost_notification
vhost_notification 0.0/sec 0.000/sec 2.0283/sec total: 7302
$ ovs-vsctl get interface vhost4 statistics |
sed -e 's#[{}]##g' -e 's#, #\n#g' |
grep guest_notifications
rx_q0_guest_notifications=66
tx_q0_guest_notifications=7236
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The DPDK vhost-user library maintains more granular per queue stats
which can replace what OVS was providing for vhost-user ports.
The benefits for OVS:
- OVS can skip parsing packet sizes on the rx side,
- dev->stats_lock won't be taken in rx/tx code unless some packet is
dropped,
- vhost-user is aware of which packets are transmitted to the guest,
so per *transmitted* packet size stats can be reported,
- more internal stats from vhost-user may be exposed, without OVS
needing to understand them,
Note: the vhost-user library does not provide global stats for a port.
The proposed implementation is to have the global stats (exposed via
netdev_get_stats()) computed by querying and aggregating all per queue
stats.
Since per queue stats are exposed via another netdev ops
(netdev_get_custom_stats()), this may lead to some race and small
discrepancies.
This issue might already affect other netdev classes.
Example:
$ ovs-vsctl get interface vhost4 statistics |
sed -e 's#[{}]##g' -e 's#, #\n#g' |
grep -v =0$
rx_1_to_64_packets=12
rx_256_to_511_packets=15
rx_65_to_127_packets=21
rx_broadcast_packets=15
rx_bytes=7497
rx_multicast_packets=33
rx_packets=48
rx_q0_good_bytes=242
rx_q0_good_packets=3
rx_q0_guest_notifications=3
rx_q0_multicast_packets=3
rx_q0_size_65_127_packets=2
rx_q0_undersize_packets=1
rx_q1_broadcast_packets=15
rx_q1_good_bytes=7255
rx_q1_good_packets=45
rx_q1_guest_notifications=45
rx_q1_multicast_packets=30
rx_q1_size_256_511_packets=15
rx_q1_size_65_127_packets=19
rx_q1_undersize_packets=11
tx_1_to_64_packets=36
tx_256_to_511_packets=45
tx_65_to_127_packets=63
tx_broadcast_packets=45
tx_bytes=22491
tx_multicast_packets=99
tx_packets=144
tx_q0_broadcast_packets=30
tx_q0_good_bytes=14994
tx_q0_good_packets=96
tx_q0_guest_notifications=96
tx_q0_multicast_packets=66
tx_q0_size_256_511_packets=30
tx_q0_size_65_127_packets=42
tx_q0_undersize_packets=24
tx_q1_broadcast_packets=15
tx_q1_good_bytes=7497
tx_q1_good_packets=48
tx_q1_guest_notifications=48
tx_q1_multicast_packets=33
tx_q1_size_256_511_packets=15
tx_q1_size_65_127_packets=21
tx_q1_undersize_packets=12
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This commit add support to for DPDK v22.11.1, it includes the following
changes.
1. ci: Reduce DPDK compilation time.
2. system-dpdk: Update vhost tests to be compatible with DPDK 22.07.
http://patchwork.ozlabs.org/project/openvswitch/list/?series=316528
3. system-dpdk: Update vhost tests to be compatible with DPDK 22.07.
http://patchwork.ozlabs.org/project/openvswitch/list/?series=311332
4. netdev-dpdk: Report device bus specific information.
5. netdev-dpdk: Drop reference to Rx header split.
http://patchwork.ozlabs.org/project/openvswitch/list/?series=321808
In addition documentation was also updated in this commit for use with
DPDK v22.11.1.
The Debian shared DPDK compilation test is removed as part of this patch
due to a packaging requirement. Once DPDK v22.11.1 is available in Debian
repositories it should be re-enabled in OVS.
For credit all authors of the original commits to 'dpdk-latest' with the
above changes have been added as co-authors for this commit
Signed-off-by: David Marchand <david.marchand@redhat.com>
Co-authored-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Sunil Pai G <sunil.pai.g@intel.com>
Co-authored-by: Sunil Pai G <sunil.pai.g@intel.com>
Tested-by: Michael Phelan <michael.phelan@intel.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Propagating per_port_memory value through a DPDK netdev creation gives
the false impression its value is somehow contextual to the creation.
On the contrary, this parameter value is set once and for all at
OVS initialization time.
Simplify the code and directly access the local boolean.
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
vhost related configuration and per port memory are netdev-dpdk
configuration items.
dpdk-stub.c and netdev-dpdk.c are never linked together, so we can move
those bits out of the generic dpdk code.
The dpdk_* accessors for those configuration items are then not needed
anymore and we can simply reference local variables.
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Packets that could not be transmitted because the TXQ are full should be
taken into account in the global ovs_tx_failure_drops as it was the case
before commit 29b94e12d57d ("netdev-dpdk: Refactor the DPDK transmit
path.").
netdev_dpdk_eth_tx_burst() returns the number of packets that were *not*
transmitted. Add that number to stats.tx_failure_drops and only include
the packets that were dropped in previous steps afterwards.
Fixes: 29b94e12d57d ("netdev-dpdk: Refactor the DPDK transmit path.")
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Robin Jarry <rjarry@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Since DPDK 21.05, the representor identifier now handles a relative VF
offset. The legacy representor ID seems only valid in certain cases
(first dpdk port).
Link: https://github.com/DPDK/dpdk/commit/cebf7f17159a8
Signed-off-by: Robin Jarry <rjarry@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Mempools may currently be shared between DPDK ports based
on port MTU and NUMA. With some hint from the user we can
increase the sharing on MTU and hence reduce memory
consumption in many cases.
For example, a port with MTU 9000, uses a mempool with an
mbuf size based on 9000 MTU. A port with MTU 1500, uses a
different mempool with an mbuf size based on 1500 MTU.
In this case, assuming same NUMA, both these ports could
share the 9000 MTU mempool.
The user must give a hint as order of creation of ports and
setting of MTUs may vary and we need to ensure that upgrades
from older OVS versions do not require more memory.
This scheme can also prevent multiple mempools being created
for cases where a port is added picking up a default MTU and
an appropriate mempool, but later has it's MTU changed to a
different value requiring a different mempool.
Example usage:
$ ovs-vsctl --no-wait set Open_vSwitch . \
other_config:shared-mempool-config=9000,1500:1,6000:1
Port added on NUMA 0:
* MTU 1500, use mempool based on 9000 MTU
* MTU 5000, use mempool based on 9000 MTU
* MTU 9000, use mempool based on 9000 MTU
* MTU 9300, use mempool based on 9300 MTU (existing behaviour)
Port added on NUMA 1:
* MTU 1500, use mempool based on 1500 MTU
* MTU 5000, use mempool based on 6000 MTU
* MTU 9000, use mempool based on 9000 MTU
* MTU 9300, use mempool based on 9300 MTU (existing behaviour)
Default behaviour is unchanged and mempools are still only created
when needed.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Currently mempools for vhost are being assigned before the vhost device
is added. In some cases this may be just reusing an existing mempool but
in others it can require creation of a mempool.
For multi-NUMA, the NUMA info of the vhost port is not known until a
device is added to the port, so on multi-NUMA systems the initial NUMA
node for the mempool is a best guess based on vswitchd affinity.
When a device is added to the vhost port, the NUMA info can be checked
and if the guess was incorrect a mempool on the correct NUMA node
created.
For multi-NUMA, the current scheme can have the effect of creating a
mempool on a NUMA node that will not be needed and at least for a certain
time period requires more memory on a NUMA node.
It is also difficult for a user trying to provision memory on different
NUMA nodes, if they are not sure which NUMA node the initial mempool
for a vhost port will be on.
For single NUMA, even though the mempool will be on the correct NUMA, it
is assigned ahead of time and if a vhost device was not added, it could
also be using uneeded memory.
This patch delays the creation of the mempool for a vhost port until the
vhost device is added.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Dropped packets were not counted as tx_dropped when a DPDK netdev is
down (like after calling netdev-dpdk/set-admin-state dpdk1 down).
Fixes: 3b1fb0779b87 ("netdev-dpdk: Don't call rte_dev_stop() in update_flags().")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
A lock annotation was left behind after removing the nonpmd mutex.
Remove it.
Fixes: 1166b0d82043 ("netdev-dpdk: Remove useless nonpmd_mempool_mutex.")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This patch split out the common code between vhost and
dpdk transmit paths to shared functions to simplify the
code and fix an issue.
The issue is that the packet coming from non-DPDK device
and egressing on a DPDK device currently skips the hwol
preparation.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Co-authored-by: Mike Pattrick <mkp@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Using SHORT version of the *_SAFE loops makes the code cleaner and less
error prone. So, use the SHORT version and remove the extra variable
when possible for hmap and all its derived types.
In order to be able to use both long and short versions without changing
the name of the macro for all the clients, overload the existing name
and select the appropriate version depending on the number of arguments.
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Using the SHORT version of the *_SAFE loops makes the code cleaner
and less error-prone. So, use the SHORT version and remove the extra
variable when possible.
In order to be able to use both long and short versions without changing
the name of the macro for all the clients, overload the existing name
and select the appropriate version depending on the number of arguments.
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The rte_mempool_avail_count() and rte_mempool_in_use_count() DPDK API
can tell us the usage of the mempool. It could be helpful for debug
on any memleak in the mempool.
Add a line in the cmd's output:
count: avail (118988), in use (12084)
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Wan Junjie <wanjunjie@bytedance.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This patch updates OVS to use DPDK RTE_ETH namespaces.
DPDK commit 295968d17407 ("ethdev: add namespace") [0] added RTE_ETH
namespaces for ethdev enums and macros in DPDK 21.11.
As compatibility for the older names was kept and they were not officially
deprecated in DPDK 21.11, there was no impact to OVS and OVS did not have
to be updated.
In future DPDK releases the older names will be deprecated and that will
cause build warnings for OVS. They may also be removed from DPDK at some
point.
There is no immediate need to update OVS to use the new namespaces while
DPDK 21.11 is being used but at the same time there is no need to wait
until it becomes an issue either. So might as well align with the
updated names in DPDK 21.11.
[0] http://git.dpdk.org/dpdk/commit/?id=295968d1740760337e16b0d7914875c5cac52850
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
The rte_flow DPDK API was made thread-safe [1] in release 20.11.
Now that the DPDK offload provider in OVS is thread safe, remove the
locks.
[1]: http://mails.dpdk.org/archives/dev/2020-October/184251.html
Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
When troubleshooting multiqueue setups, having per queue statistics helps
checking packets repartition in rx and tx queues.
Per queue statistics are exported by most DPDK drivers (with capability
RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS).
OVS only filters DPDK statistics, there is nothing to request in DPDK API.
So the only change is to extend the filter on xstats.
Querying statistics with
$ ovs-vsctl get interface dpdk0 statistics |
sed -e 's#[{}]##g' -e 's#, #\n#g'
and comparing gives:
@@ -13,7 +13,12 @@
rx_phy_crc_errors=0
rx_phy_in_range_len_errors=0
rx_phy_symbol_errors=0
+rx_q0_bytes=0
rx_q0_errors=0
+rx_q0_packets=0
+rx_q1_bytes=0
rx_q1_errors=0
+rx_q1_packets=0
rx_wqe_errors=0
tx_broadcast_packets=0
tx_bytes=0
@@ -27,3 +32,13 @@
tx_pp_rearm_queue_errors=0
tx_pp_timestamp_future_errors=0
tx_pp_timestamp_past_errors=0
+tx_q0_bytes=0
+tx_q0_packets=0
+tx_q1_bytes=0
+tx_q1_packets=0
+tx_q2_bytes=0
+tx_q2_packets=0
+tx_q3_bytes=0
+tx_q3_packets=0
+tx_q4_bytes=0
+tx_q4_packets=0
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
When changing number of Rx or Tx queues, per queue basic stats can be
renumbered in DPDK ethdev layer [1].
OVS maintains an internal xstats IDs cache that was refreshed when a
cached id was not valid anymore (in netdev_dpdk_get_custom_stats) or if
a new DPDK port was created.
This did not handle changes of Rx/Tx queues count.
For example, with a mlx5 port:
$ ovs-vsctl set interface dpdk0 options:n_rxq=2
$ ovs-vsctl get interface dpdk0 statistics |
sed -e 's#[{}]##g' -e 's#, #\n#g' |
grep rx_q._errors
rx_q0_errors=0
Move the cache filling after reconfiguring and starting the port.
There is no need to flush the cache in netdev_dpdk_get_custom_stats.
While at it, the xstats code can be cleaned up:
- remove wrong or Lapalissade comments,
- don't check x*alloc return value,
- expect that consecutive calls to xstats API return the same number of
elements,
- only write to dev-> when all DPDK calls succeeded,
- add missing lock annotations to netdev_dpdk_clear_xstats and
netdev_dpdk_get_xstat_name,
1: https://git.dpdk.org/dpdk/tree/lib/librte_ethdev/rte_ethdev.c?h=v20.11#n2696
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-November/389456.html
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Add the acceptance of GRE devices to netdev_dpdk_flow_api_supported() API,
to allow offloading of DPDK GRE devices.
Signed-off-by: Nir Anteby <nanteby@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This commit adds support for DPDK v21.11, it includes the following
changes.
1. ci: Install python elftools for DPDK 21.02.
2. ci: Update meson requirement for DPDK 21.05.
3. netdev-dpdk: Fix build with 21.05.
4. ci: Compile DPDK in non developer mode.
http://patchwork.ozlabs.org/project/openvswitch/list/?series=242480&state=*
5. netdev-dpdk: Remove access to DPDK internals.
6. netdev-dpdk: Remove unused attribute from rte_flow rule.
7. netdev-dpdk: Fix mbuf macros namespace with 21.11-rc1.
8. netdev-dpdk: Fix vhost namespace with 21.11-rc2.
http://patchwork.ozlabs.org/project/openvswitch/list/?series=271159&state=*
In addition documentation and DPDK unit tests were also updated in this
commit for use with DPDK v21.11.
For credit all authors of the original commits to 'dpdk-latest' with the above
changes have been added as co-authors for this commit.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Co-authored-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Emma Finn <emma.finn"intel.com>
Tested-by: Seamus Ryan <seamus.ryan@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
In the future, virtio may support RSS.
In any case, it is safer to rely on exposed capabilities rather than
matching on driver names.
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Michael Santana <msantana@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
OVS has support for using policing to enforce a rate limit in
kilobits per second. This is configured using OVSDB. f.e.
$ ovs-vsctl set interface tap0 ingress_policing_rate=1000
$ ovs-vsctl set interface tap0 ingress_policing_burst=100
This patch adds a related feature, allowing policing to enforce a rate
limit in kilo-packets per second. This is also configured using OVSDB.
$ ovs-vsctl set interface tap0 ingress_policing_kpkts_rate=1000
$ ovs-vsctl set interface tap0 ingress_policing_kpkts_burst=100
The kilo-bit and kilo-packet rate limits may be used separately or in
combination.
Add separate action for BPS and PPS in netlink message.
Revise code and change action result to pipe to allow
traffic pipe into second action.
This patch implements the feature for:
* OVSDB (northbound API)
* TC policer when used both with and without TC offload (kernel API)
Signed-off-by: Yong Xu <yong.xu@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Add the acceptance of vxlan devices to netdev_dpdk_flow_api_supported()
API, to allow offloading of DPDK vxlan devices.
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Tested-by: Marko Kovacevic <marko.kovacevic@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
As a pre-step towards tunnel offloads, introduce DPDK APIs.
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Tested-by: Marko Kovacevic <marko.kovacevic@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
shinfo is used to store reference counter and free callback
of an external buffer, but it is stored in mbuf if the mbuf
has tailroom for it.
This is wrong because the mbuf (and its data) can be freed
before the external buffer, for example:
pkt2 = rte_pktmbuf_alloc(mp);
rte_pktmbuf_attach(pkt2, pkt);
rte_pktmbuf_free(pkt);
After this, pkt is freed, but it still contains shinfo, which
is referenced by pkt2.
This sequence of operations is possible inside DPDK e.g., while
performing TSO operations for 'net_tap' PMD.
Fix this by always storing shinfo at the tail of external buffer.
Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Co-authored-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Yi Yang <yangyi01@inspur.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
In some cloud topologies, using DPDK VF representors in guest requires
configuring a VF before it is assigned to the guest.
A first basic option for such configuration is setting the VF MAC
address. Add a key 'dpdk-vf-mac' to the 'options' column of the Interface
table.
This option can be used as such:
$ ovs-vsctl add-port br0 dpdk-rep0 -- set Interface dpdk-rep0 type=dpdk \
options:dpdk-vf-mac=00:11:22:33:44:55
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Eli Britstein <elibr@nvidia.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Gaetan Rivet <grive@u256.net>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
It is possible to set the MAC address of DPDK ports by calling
rte_eth_dev_default_mac_addr_set(). OvS does not actually call
this function for non-internal ports, but the implementation is
exposed to be used in a later commit.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Gaetan Rivet <grive@u256.net>
Support for vhost-user dequeue zero-copy was deprecated in OVS 2.14 with
the aim of removing it for OVS 2.15.
OVS only supports zero copy for vhost client mode, as such it will cease
to function due to DPDK commit [1]
Also DPDK is set to remove zero-copy functionality in DPDK 20.11 as
referenced by commit [2]
As such remove support from OVS.
[1] 715070ea10e6 ("vhost: prevent zero-copy with incompatible client mode")
[2] d21003c9dafa ("doc: announce removal of vhost zero-copy dequeue")
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Since DPDK 19.11 [1], it is not allowed to set any RX mq mode for virtio
driver.
[1] 13b3137f3b
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@suse.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Dequeue zero-copy is no longer supported for vhost-user client mode
in DPDK due to commit [1].
In addition to this, zero-copy mode has been proposed to be marked
deprecated in [2] with removal in the next DPDK LTS release.
This commit deprecates support for vhost-user dequeue zero-copy in OVS
with its removal expected in the next OVS release.
[1] 715070ea10e6 ("vhost: prevent zero-copy with incompatible client
mode")
[2] http://mails.dpdk.org/archives/dev/2020-August/177236.html
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
As of DPDK 19.11, in order to use dequeue-zero-copy in DPDK Vhost library,
the application has to disable the linear buffer option. Hence
dequeue-zero-copy is not supported for vhost application that requires
linear buffers.
An alternative DPDK based approach to disable the linear buffers within
the vhost library itself was proposed in [1], however the consensus was
that application should be responsible for disabling linear buffers.
As such this patch disables linear buffers when zero-copy is enabled.
[1] https://patches.dpdk.org/patch/67200/
Fixes: 127b6a6eea02 ("dpdk: Update to use DPDK 19.11.")
Signed-off-by: Sivaprasad Tummala <Sivaprasad.Tummala@intel.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
'dpdkr' ring ports was deprecated in 2.13 release and was not
actually used for a long time. Remove support now.
More details in
commit b4c5f00c339b ("netdev-dpdk: Deprecate ring ports.")
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Ideally SCTP checksum offload needs be advertised by the
NIC when userspace TSO is enabled. However, very few drivers
do that and it's not a widely used protocol. So, this patch
enables SCTP checksum offload if available, otherwise userspace
TSO can still be enabled but SCTP packets will be dropped on
NICs without support.
Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Virtio doesn't expose flags to control which protocols checksum
offload needs to be enabled or disabled. This patch checks if the
NIC supports UDP checksum offload and active it when TSO is enabled.
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Disable ECN and UFO since this is not supported yet. Also, disable
all other features when userspace_tso is not enabled.
Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The check on TSO capability did not ensure ip checksum, tcp checksum and
TSO tx offloads were available which resulted in a port init failure
(example below with a ena device):
*2020-02-04T17:42:52.976Z|00084|dpdk|ERR|Ethdev port_id=0 requested Tx
offloads 0x2a doesn't match Tx offloads capabilities 0xe in
rte_eth_dev_configure()*
Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Reported-by: Ravi Kerur <rkerur@gmail.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Abbreviated as TSO, TCP Segmentation Offload is a feature which enables
the network stack to delegate the TCP segmentation to the NIC reducing
the per packet CPU overhead.
A guest using vhostuser interface with TSO enabled can send TCP packets
much bigger than the MTU, which saves CPU cycles normally used to break
the packets down to MTU size and to calculate checksums.
It also saves CPU cycles used to parse multiple packets/headers during
the packet processing inside virtual switch.
If the destination of the packet is another guest in the same host, then
the same big packet can be sent through a vhostuser interface skipping
the segmentation completely. However, if the destination is not local,
the NIC hardware is instructed to do the TCP segmentation and checksum
calculation.
It is recommended to check if NIC hardware supports TSO before enabling
the feature, which is off by default. For additional information please
check the tso.rst document.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Tested-by: Ciara Loftus <ciara.loftus.intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
There is no support for multi-segmented buffers, so flag
that to vhost library.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Tested-by: Ciara Loftus <ciara.loftus.intel.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Add a getter function for using the dpdk port id outside the scope of
netdev-dpdk.c to be used for HW offload.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Introduce a rte flow query function as a pre-step towards reading HW
statistics of fully offloaded flows.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This patch adds a new policer to the DPDK datapath based on RFC 4115's
Two-Rate, Three-Color marker. It's a two-level hierarchical policer
which first does a color-blind marking of the traffic at the queue
level, followed by a color-aware marking at the port level. At the end
traffic marked as Green or Yellow is forwarded, Red is dropped. For
details on how traffic is marked, see RFC 4115.
This egress policer can be used to limit traffic at different rated
based on the queues the traffic is in. In addition, it can also be used
to prioritize certain traffic over others at a port level.
For example, the following configuration will limit the traffic rate at a
port level to a maximum of 2000 packets a second (64 bytes IPv4 packets).
100pps as CIR (Committed Information Rate) and 1000pps as EIR (Excess
Information Rate). High priority traffic is routed to queue 10, which marks
all traffic as CIR, i.e. Green. All low priority traffic, queue 20, is
marked as EIR, i.e. Yellow.
ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
--id=@myqos create qos type=trtcm-policer \
other-config:cir=52000 other-config:cbs=2048 \
other-config:eir=52000 other-config:ebs=2048 \
queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
--id=@dpdk1Q10 create queue \
other-config:cir=41600000 other-config:cbs=2048 \
other-config:eir=0 other-config:ebs=0 -- \
--id=@dpdk1Q20 create queue \
other-config:cir=0 other-config:cbs=0 \
other-config:eir=41600000 other-config:ebs=2048 \
This configuration accomplishes that the high priority traffic has a
guaranteed bandwidth egressing the ports at CIR (1000pps), but it can also
use the EIR, so a total of 2000pps at max. These additional 1000pps is
shared with the low priority traffic. The low priority traffic can use at
maximum 1000pps.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
This patch adds support for multi-queue QoS to the DPDK datapath. Most of
the code is based on an earlier patch from a patchset sent out by
zhaozhanxu. The patch was titled "[ovs-dev, v2, 1/4] netdev-dpdk.c: Support
the multi-queue QoS configuration for dpdk datapath"
Signed-off-by: zhaozhanxu <zhaozhanxu@163.com>
Co-authored-by: zhaozhanxu <zhaozhanxu@163.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
In "Use of library functions" in the C standard, the following statement
is written to apply to all library functions:
If an argument to a function has an invalid value (such as ... a
null pointer ... the behavior is undefined.
Later, under the "String handling" section, "Comparison functions" no
exception is listed for strcmp, which means NULL is invalid. It may
be possible for the smap_get to return NULL.
Given the above, we must check that new_devargs is not null. The check
against NULL for new_devargs later in the function is still valid.
Fixes: 55e075e65ef9 ("netdev-dpdk: Arbitrary 'dpdk' port naming")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>