2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-22 18:07:40 +00:00

373 Commits

Author SHA1 Message Date
Adrian Moreno
9e56549c2b hmap: use short version of safe loops if possible.
Using SHORT version of the *_SAFE loops makes the code cleaner and less
error prone. So, use the SHORT version and remove the extra variable
when possible for hmap and all its derived types.

In order to be able to use both long and short versions without changing
the name of the macro for all the clients, overload the existing name
and select the appropriate version depending on the number of arguments.

Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-30 16:59:02 +02:00
Adrian Moreno
e9bf5bffb0 list: use short version of safe loops if possible.
Using the SHORT version of the *_SAFE loops makes the code cleaner
and less error-prone. So, use the SHORT version and remove the extra
variable when possible.

In order to be able to use both long and short versions without changing
the name of the macro for all the clients, overload the existing name
and select the appropriate version depending on the number of arguments.

Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-30 16:59:02 +02:00
Wan Junjie
9016592ca0 netdev-dpdk: Add mempool count in cmd get-mempool-info.
The rte_mempool_avail_count() and rte_mempool_in_use_count() DPDK API
can tell us the usage of the mempool.  It could be helpful for debug
on any memleak in the mempool.

Add a line in the cmd's output:
     count: avail (118988), in use (12084)

Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Wan Junjie <wanjunjie@bytedance.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-04 19:58:15 +01:00
Kevin Traynor
ab4d3bfbef netdev-dpdk: Update to use RTE_ETH namespace defines.
This patch updates OVS to use DPDK RTE_ETH namespaces.

DPDK commit 295968d17407 ("ethdev: add namespace") [0] added RTE_ETH
namespaces for ethdev enums and macros in DPDK 21.11.

As compatibility for the older names was kept and they were not officially
deprecated in DPDK 21.11, there was no impact to OVS and OVS did not have
to be updated.

In future DPDK releases the older names will be deprecated and that will
cause build warnings for OVS. They may also be removed from DPDK at some
point.

There is no immediate need to update OVS to use the new namespaces while
DPDK 21.11 is being used but at the same time there is no need to wait
until it becomes an issue either. So might as well align with the
updated names in DPDK 21.11.

[0] http://git.dpdk.org/dpdk/commit/?id=295968d1740760337e16b0d7914875c5cac52850

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2022-02-08 11:59:29 +00:00
Gaetan Rivet
f20abde5a2 netdev-dpdk: Remove rte-flow API access locks.
The rte_flow DPDK API was made thread-safe [1] in release 20.11.
Now that the DPDK offload provider in OVS is thread safe, remove the
locks.

[1]: http://mails.dpdk.org/archives/dev/2020-October/184251.html

Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-19 01:35:19 +01:00
David Marchand
1140c87e2e netdev-dpdk: Expose per rxq/txq basic statistics.
When troubleshooting multiqueue setups, having per queue statistics helps
checking packets repartition in rx and tx queues.

Per queue statistics are exported by most DPDK drivers (with capability
RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS).
OVS only filters DPDK statistics, there is nothing to request in DPDK API.
So the only change is to extend the filter on xstats.

Querying statistics with
$ ovs-vsctl get interface dpdk0 statistics |
  sed -e 's#[{}]##g' -e 's#, #\n#g'

and comparing gives:
@@ -13,7 +13,12 @@
 rx_phy_crc_errors=0
 rx_phy_in_range_len_errors=0
 rx_phy_symbol_errors=0
+rx_q0_bytes=0
 rx_q0_errors=0
+rx_q0_packets=0
+rx_q1_bytes=0
 rx_q1_errors=0
+rx_q1_packets=0
 rx_wqe_errors=0
 tx_broadcast_packets=0
 tx_bytes=0
@@ -27,3 +32,13 @@
 tx_pp_rearm_queue_errors=0
 tx_pp_timestamp_future_errors=0
 tx_pp_timestamp_past_errors=0
+tx_q0_bytes=0
+tx_q0_packets=0
+tx_q1_bytes=0
+tx_q1_packets=0
+tx_q2_bytes=0
+tx_q2_packets=0
+tx_q3_bytes=0
+tx_q3_packets=0
+tx_q4_bytes=0
+tx_q4_packets=0

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-12 11:40:16 +01:00
David Marchand
f260db1efc netdev-dpdk: Fix statistics when changing Rx/Tx queues count.
When changing number of Rx or Tx queues, per queue basic stats can be
renumbered in DPDK ethdev layer [1].

OVS maintains an internal xstats IDs cache that was refreshed when a
cached id was not valid anymore (in netdev_dpdk_get_custom_stats) or if
a new DPDK port was created.
This did not handle changes of Rx/Tx queues count.

For example, with a mlx5 port:
$ ovs-vsctl set interface dpdk0 options:n_rxq=2
$ ovs-vsctl get interface dpdk0 statistics |
  sed -e 's#[{}]##g' -e 's#, #\n#g' |
  grep rx_q._errors
rx_q0_errors=0

Move the cache filling after reconfiguring and starting the port.
There is no need to flush the cache in netdev_dpdk_get_custom_stats.

While at it, the xstats code can be cleaned up:
- remove wrong or Lapalissade comments,
- don't check x*alloc return value,
- expect that consecutive calls to xstats API return the same number of
  elements,
- only write to dev-> when all DPDK calls succeeded,
- add missing lock annotations to netdev_dpdk_clear_xstats and
  netdev_dpdk_get_xstat_name,

1: https://git.dpdk.org/dpdk/tree/lib/librte_ethdev/rte_ethdev.c?h=v20.11#n2696

Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-November/389456.html
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-12 11:40:16 +01:00
Nir Anteby
a32cb78b5a netdev-dpdk: Add flow_api support for netdev gre vports.
Add the acceptance of GRE devices to netdev_dpdk_flow_api_supported() API,
to allow offloading of DPDK GRE devices.

Signed-off-by: Nir Anteby <nanteby@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-12-16 12:03:02 +01:00
Ian Stokes
17346b3899 dpdk: Update to use DPDK v21.11.
This commit adds support for DPDK v21.11, it includes the following
changes.

1. ci: Install python elftools for DPDK 21.02.
2. ci: Update meson requirement for DPDK 21.05.
3. netdev-dpdk: Fix build with 21.05.
4. ci: Compile DPDK in non developer mode.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=242480&state=*

5. netdev-dpdk: Remove access to DPDK internals.
6. netdev-dpdk: Remove unused attribute from rte_flow rule.
7. netdev-dpdk: Fix mbuf macros namespace with 21.11-rc1.
8. netdev-dpdk: Fix vhost namespace with 21.11-rc2.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=271159&state=*

In addition documentation and DPDK unit tests were also updated in this
commit for use with DPDK v21.11.

For credit all authors of the original commits to 'dpdk-latest' with the above
changes have been added as co-authors for this commit.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Co-authored-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Emma Finn <emma.finn"intel.com>
Tested-by: Seamus Ryan <seamus.ryan@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-12-09 18:40:14 +00:00
David Marchand
5e86db383f netdev-dpdk: Fix RSS configuration for virtio.
In the future, virtio may support RSS.
In any case, it is safer to rely on exposed capabilities rather than
matching on driver names.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Michael Santana <msantana@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-09-16 01:03:28 +02:00
Yong Xu
c2567e533f add port-based ingress policing based packet-per-second rate-limiting
OVS has support for using policing to enforce a rate limit in
kilobits per second. This is configured using OVSDB. f.e.

$ ovs-vsctl set interface tap0 ingress_policing_rate=1000
$ ovs-vsctl set interface tap0 ingress_policing_burst=100

This patch adds a related feature, allowing policing to enforce a rate
limit in kilo-packets per second. This is also configured using OVSDB.

$ ovs-vsctl set interface tap0 ingress_policing_kpkts_rate=1000
$ ovs-vsctl set interface tap0 ingress_policing_kpkts_burst=100

The kilo-bit and kilo-packet rate limits may be used separately or in
combination.

Add separate action for BPS and PPS in netlink message.

Revise code and change action result to pipe to allow
traffic pipe into second action.

This patch implements the feature for:
* OVSDB (northbound API)
* TC policer when used both with and without TC offload (kernel API)

Signed-off-by: Yong Xu <yong.xu@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2021-07-01 20:44:07 +02:00
Eli Britstein
c5b56f0eb1 netdev-dpdk: Add flow_api support for netdev vxlan vports.
Add the acceptance of vxlan devices to netdev_dpdk_flow_api_supported()
API, to allow offloading of DPDK vxlan devices.

Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Tested-by: Marko Kovacevic <marko.kovacevic@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-24 22:22:08 +02:00
Eli Britstein
6f50f28b99 netdev-dpdk: Introduce DPDK tunnel APIs.
As a pre-step towards tunnel offloads, introduce DPDK APIs.

Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Tested-by: Marko Kovacevic <marko.kovacevic@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-24 22:22:08 +02:00
Yi Yang
c17f32a11c netdev-dpdk: Fix incorrect shinfo initialization.
shinfo is used to store reference counter and free callback
of an external buffer, but it is stored in mbuf if the mbuf
has tailroom for it.

This is wrong because the mbuf (and its data) can be freed
before the external buffer, for example:

  pkt2 = rte_pktmbuf_alloc(mp);
  rte_pktmbuf_attach(pkt2, pkt);
  rte_pktmbuf_free(pkt);

After this, pkt is freed, but it still contains shinfo, which
is referenced by pkt2.

This sequence of operations is possible inside DPDK e.g., while
performing TSO operations for 'net_tap' PMD.

Fix this by always storing shinfo at the tail of external buffer.

Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Co-authored-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Yi Yang <yangyi01@inspur.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-02-01 19:53:03 +01:00
Ian Stokes
252e1e5764 dpdk: Update to use DPDK v20.11.
This commit adds support for DPDK v20.11, it includes the following
changes.

1. travis: Remove explicit DPDK kmods configuration.
2. sparse: Fix build with 20.05 DPDK tracepoints.
3. netdev-dpdk: Remove experimental API flag.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=173216&state=*

4. sparse: Update to DPDK 20.05 trace point header.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=179604&state=*

5. sparse: Fix build with DPDK 20.08.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=200181&state=*

6. build: Add support for DPDK meson build.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=199138&state=*

7. netdev-dpdk: Remove usage of RTE_ETH_DEV_CLOSE_REMOVE flag.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=207850&state=*

8. netdev-dpdk: Fix build with 20.11-rc1.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=209006&state=*

9. sparse: Fix __ATOMIC_* redefinition errors

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=209452&state=*

10. build: Remove DPDK make build references.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=216682&state=*

For credit all authors of the original commits to 'dpdk-latest' with the
above changes have been added as co-authors for this commit.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Co-authored-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Sunil Pai G <sunil.pai.g@intel.com>
Co-authored-by: Sunil Pai G <sunil.pai.g@intel.com>
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Co-authored-by: Eli Britstein <elibr@nvidia.com>
Tested-by: Harry van Haaren <harry.van.haaren@intel.com>
Tested-by: Govindharajan, Hariprasad <hariprasad.govindharajan@intel.com>
Tested-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2020-12-16 17:44:06 +00:00
Gaetan Rivet
f4336f504b netdev-dpdk: Add option to configure VF MAC address.
In some cloud topologies, using DPDK VF representors in guest requires
configuring a VF before it is assigned to the guest.

A first basic option for such configuration is setting the VF MAC
address. Add a key 'dpdk-vf-mac' to the 'options' column of the Interface
table.

This option can be used as such:

   $ ovs-vsctl add-port br0 dpdk-rep0 -- set Interface dpdk-rep0 type=dpdk \
      options:dpdk-vf-mac=00:11:22:33:44:55

Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Eli Britstein <elibr@nvidia.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Gaetan Rivet <grive@u256.net>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-11-16 17:47:11 +01:00
Ilya Maximets
f9b0107dd0 netdev-dpdk: Add ability to set MAC address.
It is possible to set the MAC address of DPDK ports by calling
rte_eth_dev_default_mac_addr_set().  OvS does not actually call
this function for non-internal ports, but the implementation is
exposed to be used in a later commit.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Gaetan Rivet <grive@u256.net>
2020-11-16 17:47:11 +01:00
Ian Stokes
86f624e486 DPDK: Remove support for vhost-user zero-copy.
Support for vhost-user dequeue zero-copy was deprecated in OVS 2.14 with
the aim of removing it for OVS 2.15.

OVS only supports zero copy for vhost client mode, as such it will cease
to function due to DPDK commit [1]

Also DPDK is set to remove zero-copy functionality in DPDK 20.11 as
referenced by commit [2]

As such remove support from OVS.

[1] 715070ea10e6 ("vhost: prevent zero-copy with incompatible client mode")
[2] d21003c9dafa ("doc: announce removal of vhost zero-copy dequeue")

Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
2020-10-05 16:05:25 +01:00
Jaime Caamaño Ruiz
db7041716b netdev-dpdk: Don't set rx mq mode for net_virtio.
Since DPDK 19.11 [1], it is not allowed to set any RX mq mode for virtio
driver.

[1] 13b3137f3b

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@suse.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-09-15 22:49:57 +02:00
Ian Stokes
e919fd4955 dpdk: Deprecate vhost-user dequeue zero-copy.
Dequeue zero-copy is no longer supported for vhost-user client mode
in DPDK due to commit [1].

In addition to this, zero-copy mode has been proposed to be marked
deprecated in [2] with removal in the next DPDK LTS release.

This commit deprecates support for vhost-user dequeue zero-copy in OVS
with its removal expected in the next OVS release.

[1] 715070ea10e6 ("vhost: prevent zero-copy with incompatible client
    mode")
[2] http://mails.dpdk.org/archives/dev/2020-August/177236.html

Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
2020-08-12 18:20:50 +01:00
Sivaprasad Tummala
71417ef011 netdev-dpdk: linear buffer check with zero-copy
As of DPDK 19.11, in order to use dequeue-zero-copy in DPDK Vhost library,
the application has to disable the linear buffer option. Hence
dequeue-zero-copy is not supported for vhost application that requires
linear buffers.

An alternative DPDK based approach to disable the linear buffers within
the vhost library itself was proposed in [1], however the consensus was
that application should be responsible for disabling linear buffers.

As such this patch disables linear buffers when zero-copy is enabled.

[1]    https://patches.dpdk.org/patch/67200/

Fixes: 127b6a6eea02 ("dpdk: Update to use DPDK 19.11.")
Signed-off-by: Sivaprasad Tummala <Sivaprasad.Tummala@intel.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2020-08-12 18:19:42 +01:00
Ilya Maximets
82c9d9993d netdev-dpdk: Remove deprecated ring port type.
'dpdkr' ring ports was deprecated in 2.13 release and was not
actually used for a long time.  Remove support now.

More details in
commit b4c5f00c339b ("netdev-dpdk: Deprecate ring ports.")

Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-03-06 12:41:04 +01:00
Flavio Leitner
35b5586ba7 userspace TSO: SCTP checksum offload optional.
Ideally SCTP checksum offload needs be advertised by the
NIC when userspace TSO is enabled. However, very few drivers
do that and it's not a widely used protocol. So, this patch
enables SCTP checksum offload if available, otherwise userspace
TSO can still be enabled but SCTP packets will be dropped on
NICs without support.

Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-02-26 15:24:15 +01:00
Flavio Leitner
8c5163fe81 userspace TSO: Include UDP checksum offload.
Virtio doesn't expose flags to control which protocols checksum
offload needs to be enabled or disabled. This patch checks if the
NIC supports UDP checksum offload and active it when TSO is enabled.

Reported-by: Ilya Maximets <i.maximets@ovn.org>
Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-02-26 15:24:15 +01:00
Flavio Leitner
514950d37d netdev-dpdk: vhost: disable unsupported offload features.
Disable ECN and UFO since this is not supported yet. Also, disable
all other features when userspace_tso is not enabled.

Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-02-26 15:24:15 +01:00
Ilya Maximets
1223cf123e netdev-dpdk: Don't enable offloading on HW device if not requested.
DPDK drivers has different implementations of transmit functions.
Enabled offloading may cause driver to choose slower variant
significantly affecting performance if userspace TSO wasn't requested.

Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Reported-by: David Marchand <david.marchand@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-02-07 09:04:06 +01:00
David Marchand
3d6a6f450a netdev-dpdk: Fix port init when lacking Tx offloads for TSO.
The check on TSO capability did not ensure ip checksum, tcp checksum and
TSO tx offloads were available which resulted in a port init failure
(example below with a ena device):

*2020-02-04T17:42:52.976Z|00084|dpdk|ERR|Ethdev port_id=0 requested Tx
offloads 0x2a doesn't match Tx offloads capabilities 0xe in
rte_eth_dev_configure()*

Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")

Reported-by: Ravi Kerur <rkerur@gmail.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-02-05 17:44:36 +01:00
Flavio Leitner
29cf9c1b3b userspace: Add TCP Segmentation Offload support
Abbreviated as TSO, TCP Segmentation Offload is a feature which enables
the network stack to delegate the TCP segmentation to the NIC reducing
the per packet CPU overhead.

A guest using vhostuser interface with TSO enabled can send TCP packets
much bigger than the MTU, which saves CPU cycles normally used to break
the packets down to MTU size and to calculate checksums.

It also saves CPU cycles used to parse multiple packets/headers during
the packet processing inside virtual switch.

If the destination of the packet is another guest in the same host, then
the same big packet can be sent through a vhostuser interface skipping
the segmentation completely. However, if the destination is not local,
the NIC hardware is instructed to do the TCP segmentation and checksum
calculation.

It is recommended to check if NIC hardware supports TSO before enabling
the feature, which is off by default. For additional information please
check the tso.rst document.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Tested-by: Ciara Loftus <ciara.loftus.intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2020-01-17 22:27:25 +00:00
Flavio Leitner
e666e8e0f3 vhost: Disable multi-segmented buffers
There is no support for multi-segmented buffers, so flag
that to vhost library.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Tested-by: Ciara Loftus <ciara.loftus.intel.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2020-01-17 17:14:10 +00:00
Eli Britstein
2f7f9284bd netdev-dpdk: Getter function for dpdk port id API.
Add a getter function for using the dpdk port id outside the scope of
netdev-dpdk.c to be used for HW offload.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-01-16 13:34:10 +01:00
Eli Britstein
63556d8586 netdev-dpdk: Introduce rte flow query function.
Introduce a rte flow query function as a pre-step towards reading HW
statistics of fully offloaded flows.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-01-16 13:34:10 +01:00
Eelco Chaudron
e61bdffc2a netdev-dpdk: Add new DPDK RFC 4115 egress policer
This patch adds a new policer to the DPDK datapath based on RFC 4115's
Two-Rate, Three-Color marker. It's a two-level hierarchical policer
which first does a color-blind marking of the traffic at the queue
level, followed by a color-aware marking at the port level. At the end
traffic marked as Green or Yellow is forwarded, Red is dropped. For
details on how traffic is marked, see RFC 4115.

This egress policer can be used to limit traffic at different rated
based on the queues the traffic is in. In addition, it can also be used
to prioritize certain traffic over others at a port level.

For example, the following configuration will limit the traffic rate at a
port level to a maximum of 2000 packets a second (64 bytes IPv4 packets).
100pps as CIR (Committed Information Rate) and 1000pps as EIR (Excess
Information Rate). High priority traffic is routed to queue 10, which marks
all traffic as CIR, i.e. Green. All low priority traffic, queue 20, is
marked as EIR, i.e. Yellow.

ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
  --id=@myqos create qos type=trtcm-policer \
  other-config:cir=52000 other-config:cbs=2048 \
  other-config:eir=52000 other-config:ebs=2048  \
  queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
  --id=@dpdk1Q10 create queue \
    other-config:cir=41600000 other-config:cbs=2048 \
    other-config:eir=0 other-config:ebs=0 -- \
  --id=@dpdk1Q20 create queue \
    other-config:cir=0 other-config:cbs=0 \
    other-config:eir=41600000 other-config:ebs=2048 \

This configuration accomplishes that the high priority traffic has a
guaranteed bandwidth egressing the ports at CIR (1000pps), but it can also
use the EIR, so a total of 2000pps at max. These additional 1000pps is
shared with the low priority traffic. The low priority traffic can use at
maximum 1000pps.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2020-01-15 19:17:55 +00:00
Eelco Chaudron
23c01b196f netdev-dpdk: Add support for multi-queue QoS to the DPDK datapath
This patch adds support for multi-queue QoS to the DPDK datapath. Most of
the code is based on an earlier patch from a patchset sent out by
zhaozhanxu. The patch was titled "[ovs-dev, v2, 1/4] netdev-dpdk.c: Support
the multi-queue QoS configuration for dpdk datapath"

Signed-off-by: zhaozhanxu <zhaozhanxu@163.com>
Co-authored-by: zhaozhanxu <zhaozhanxu@163.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2020-01-15 19:17:04 +00:00
Aaron Conole
cefdd80a29 netdev-dpdk: Avoid undefined behavior processing devargs
In "Use of library functions" in the C standard, the following statement
is written to apply to all library functions:

  If an argument to a function has an invalid value (such as ... a
  null pointer ... the behavior is undefined.

Later, under the "String handling" section, "Comparison functions" no
exception is listed for strcmp, which means NULL is invalid.  It may
be possible for the smap_get to return NULL.

Given the above, we must check that new_devargs is not null.  The check
against NULL for new_devargs later in the function is still valid.

Fixes: 55e075e65ef9 ("netdev-dpdk: Arbitrary 'dpdk' port naming")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2020-01-13 16:35:21 +00:00
Eelco Chaudron
3d56e4ac44 netdev-dpdk: Add coverage counter to count vhost IRQs.
When the dpdk vhost library executes an eventfd_write() call,
i.e. waking up the guest, a new callback will be called.

This patch adds the callback to count the number of
interrupts sent to the VM to track the number of times
interrupts where generated.

This might be of interest to find out system-calls were
called in the DPDK fast path.

The coverage counter is called "vhost_notification" and
can be read with:

  $ ovs-appctl coverage/read-counter vhost_notification
  13238319

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2019-12-19 01:03:52 +01:00
Kevin Traynor
6d77abf4f7 netdev-dpdk: Fix sw stats perf drop.
Accessing the sw stats in the vhost datapath of a PVP test
can incur a performance drop of ~2%.

Most of the time these stats will just be getting zero added
to them. By checking if there is a non-zero update first, we
can avoid accessing them when they won't be updated and avoid
the performance drop.

Fixes: 2f862c712e52 ("netdev-dpdk: Detailed packet drop statistics.")
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2019-12-18 13:06:27 +01:00
Eelco Chaudron
988fd46391 netdev-dpdk: add support for the RTE_ETH_EVENT_INTR_RESET event.
Currently, OVS does not register and therefore not handle the
interface reset event from the DPDK framework. This would cause a
problem in cases where a VF is used as an interface, and its
configuration changes.

As an example in the following scenario the MAC change is not
detected/acted upon until OVS is restarted without the patch applied:

  $ echo 1 > /sys/bus/pci/devices/0000:05:00.1/sriov_numvfs
  $ ovs-vsctl add-port ovs_pvp_br0 dpdk0 -- \
            set Interface dpdk0 type=dpdk -- \
            set Interface dpdk0 options:dpdk-devargs=0000:05:0a.0

  $ ip link set p5p2 vf 0 mac 52:54:00:92:d3:33

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2019-12-18 01:23:55 +01:00
Ian Stokes
127b6a6eea dpdk: Update to use DPDK 19.11.
This commit adds support for DPDK v19.11, it includes the following
changes.

1. travis: Enable compilation and linkage with dpdk 19.11.

2. sparse: Remove dpdk network headers copies.

   https://patchwork.ozlabs.org/patch/1185256/

3. dpdk: Migrate to new PDUMP API.

   https://patchwork.ozlabs.org/patch/1192971/

4. netdev-dpdk: Prefix network structures with rte_.

   https://patchwork.ozlabs.org/patch/1109733/

5. netdev-dpdk: Update by new color definitions.

   https://patchwork.ozlabs.org/patch/1086089/

6. docs: Update docs to reference 19.11.

7. docs: Add note regarding hotplug and igb_uio requirements.

For credit all authors of the original commits to 'dpdk-latest' with the
above changes been added as co-authors for this commmit.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Co-authored-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Co-authored-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Co-authored-by: Ophir Munk <ophirmu@mellanox.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2019-12-04 20:51:57 +00:00
Ilya Maximets
b4c5f00c33 netdev-dpdk: Deprecate ring ports.
'dpdkr' a.k.a. DPDK ring ports has really poor support in OVS and not
tested on a regular basis.  These ports are intended to work via
shared memory with another DPDK secondary process, but there are lots
of limitations for using this functionality in practice.  Most of them
connected with running secondary DPDK application and memory layout
issues.  More details are available in DPDK guide:
https://doc.dpdk.org/guides-18.11/prog_guide/multi_proc_support.html#multi-process-limitations

Beside the functional limitations it's also hard to use this
functionality correctly.  User must be sure that OVS and secondary DPDK
application are running on different CPU cores, which is hard because
non-PMD threads could float over available CPU cores.  This or any
other misconfiguration will likely lead to crash of OVS.

Another problem is that the user must actually build the secondary
application with the same version of DPDK that was used for OVS build.

Above issues are same as we have while using DPDK pdump.

Beside that, current implementation in OVS is not able to free
allocated rings that could lead to memory exhausting.

Initially these ports was added to use with IVSHMEM for a fast
zero-copy HOST<-->VM communication.  However, IVSHMEM is not used
anymore.  IVSHMEM support was removed from DPDK in 16.11 release
(instructions for IVSHMEM were removed from the OVS docs almost 3 years
ago by commit 90ca71dd317f ("doc: Remove ivshmem instructions.")) and
the patch for QEMU for using regular files as a device backend is no
longer available.  That makes DPDK ring ports barely useful in real
virtualization environment.

This patch adds a deprecation warnings for run-time port creation
and documentation.  Claiming to completely remove this functionality
from OVS in one of the next releases.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
2019-11-28 16:35:30 +01:00
Sriram Vatala
2f862c712e netdev-dpdk: Detailed packet drop statistics.
OVS may be unable to transmit packets for multiple reasons on
the userspace datapath and today there is a single counter to
track packets dropped due to any of those reasons. This patch
adds custom software stats for the different reasons packets
may be dropped during tx/rx on the userspace datapath in OVS.

- MTU drops : drops that occur due to a too large packet size
- Qos drops : drops that occur due to egress/ingress QOS
- Tx failures: drops as returned by the DPDK PMD send function

Note that the reason for tx failures is not specified in OVS.
In practice for vhost ports it is most common that tx failures
are because there are not enough available descriptors,
which is usually caused by misconfiguration of the guest queues
and/or because the guest is not consuming packets fast enough
from the queues.

These counters are displayed along with other stats in
"ovs-vsctl get interface <iface> statistics" command and are
available for dpdk and vhostuser/vhostuserclient ports.

Also the existing "tx_retries" counter for vhost ports has been
renamed to "ovs_tx_retries", so that all the custom statistics
that OVS accumulates itself will have the prefix "ovs_". This
will prevent any custom stats names overlapping with
driver/HW stats.

Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2019-11-11 17:14:55 +01:00
Ilya Maximets
b99ab8aaaf netdev-dpdk: Reuse vhost function for dpdk ETH custom stats.
This is yet another refactoring for upcoming detailed drop stats.
It allows to use single function for all the software calculated
statistics in netdev-dpdk for both vhost and ETH ports.

UINT64_MAX used as a marker for non-supported statistics in a
same way as it's done in bridge.c for common netdev stats.

Co-authored-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
2019-11-11 17:13:01 +01:00
David Marchand
9ff24b9c93 netdev-dpdk: Track vhost tx contention.
Add a coverage counter to help diagnose contention on the vhost txqs.
This is seen as dropped packets on the physical ports for rates that
are usually handled fine by OVS.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2019-11-08 21:26:56 +01:00
Tomasz Konieczny
c2c84474d4 netdev-dpdk: Fix flow control not configuring.
Currently OVS is unable to change flow control configuration in DPDK
because new settings are being overwritten by current settings with
rte_eth_dev_flow_ctrl_get(). The fix restores correct order of
operations and at the same time does not trigger error on devices
without flow control support when flow control not requested.

Fixes: 7e1de65e8dfb ("netdev-dpdk: Fix failure to configure flow control at netdev-init.")
Signed-off-by: Tomasz Konieczny <tomaszx.konieczny@intel.com>
Co-authored-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2019-11-04 17:23:47 +01:00
Ilya Maximets
15ba075d39 netdev-dpdk: Fix Tx queue false sharing.
'tx_q' array is allocated for each DPDK netdev.  'struct dpdk_tx_queue'
is 8 bytes long, so 8 tx queues are sharing the same cache line in
case of 64B cacheline size.  This causes 'false sharing' issue in
mutliqueue case because taking the spinlock implies write to memory
i.e. cache invalidation.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
2019-10-22 19:29:24 +02:00
Paul Chaignon
940ac2ce88 treewide: Use packet batch APIs
This patch replaces direct accesses to dp_packet_batch and dp_packet
internal components by the appropriate API calls.  It extends commit
1270b6e52 (treewide: Wider use of packet batch APIs).

This patch was generated using the following semantic patch (cf.
http://coccinelle.lip6.fr).

// <smpl>
@ dp_packet @
struct dp_packet_batch *b1;
struct dp_packet_batch b2;
struct dp_packet *p;
expression e;
@@

(
- b1->packets[b1->count++] = p;
+ dp_packet_batch_add(b1, p);
|
- b2.packets[b2.count++] = p;
+ dp_packet_batch_add(&b2, p);
|
- p->packet_type == htonl(PT_ETH)
+ dp_packet_is_eth(p)
|
- p->packet_type != htonl(PT_ETH)
+ !dp_packet_is_eth(p)
|
- b1->count == 0
+ dp_packet_batch_is_empty(b1)
|
- !b1->count
+ dp_packet_batch_is_empty(b1)
|
  b1->count = e;
|
  b1->count++
|
  b2.count = e;
|
  b2.count++
|
- b1->count
+ dp_packet_batch_size(b1)
|
- b2.count
+ dp_packet_batch_size(&b2)
)
// </smpl>

Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2019-09-25 14:42:00 -07:00
Ilya Maximets
5c7ba90d81 netdev-dpdk: Refactor vhost custom stats for extensibility.
vHost interfaces currently has only one custom statistic, but there
might be others in the near future. This refactoring makes the code
work in the same way as it done for dpdk and afxdp stats to keep the
common style over the different code places and makes it easily
extensible for the new stats addition.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
2019-08-26 14:03:37 +03:00
Ilya Maximets
18366d1651 netdev-dpdk: Fix not reporting rx_oversize_errors in stats.
There is a big code duplication issue with DPDK xstats that led to
missed "rx_oversize_errors" statistics. It's defined but not used.
Fix that by actually using this stat along with code refactoring that
will allow us to not make same mistakes in the future.
Macro definitions are perfectly suitable to automate code generation
in such cases and already used in a couple of places in OVS for similar
purposes.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
2019-08-26 14:03:37 +03:00
Kevin Traynor
080f080c3b netdev-dpdk: Enable tx-retries-max config.
vhost tx retries can provide some mitigation against
dropped packets due to a temporarily slow guest/limited queue
size for an interface, but on the other hand when a system
is fully loaded those extra cycles retrying could mean
packets are dropped elsewhere.

Up to now max vhost tx retries have been hardcoded, which meant
no tuning and no way to disable for debugging to see if extra
cycles spent retrying resulted in rx drops on some other
interface.

Add an option to change the max retries, with a value of
0 effectively disabling vhost tx retries.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2019-07-08 12:03:28 +01:00
Kevin Traynor
c161357d5d netdev-dpdk: Add custom stat for vhost tx retries.
vhost tx retries may occur, and it can be a sign that
the guest is not optimally configured.

Add a custom stat so a user will know if vhost tx retries are
occurring and hence give a hint that guest config should be
examined.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2019-07-08 12:02:51 +01:00
Kevin Traynor
730b34859f netdev-dpdk: Fix additional vhost tx retry.
Fix minor issue of one possible additional retry.

Fixes: c6ec9d176dbf ("netdev-dpdk: Fix vHost stats.")
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2019-06-28 09:49:32 +01:00