2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-22 18:07:40 +00:00

326 Commits

Author SHA1 Message Date
Róbert Mulik
f8b64a61bc Configurable Link State Change (LSC) detection mode
It is possible to set LSC detection mode to polling or interrupt mode
for DPDK interfaces. The default is polling mode. To set interrupt mode,
option dpdk-lsc-interrupt has to be set to true.

For detailed description and usage see the dpdk install documentation.

Signed-off-by: Robert Mulik <robert.mulik@ericsson.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-05-11 08:08:24 +01:00
Jan Scheurich
8492adc270 netdev: Add optional qfill output parameter to rxq_recv()
If the caller provides a non-NULL qfill pointer and the netdev
implemementation supports reading the rx queue fill level, the rxq_recv()
function returns the remaining number of packets in the rx queue after
reception of the packet burst to the caller. If the implementation does
not support this, it returns -ENOTSUP instead. Reading the remaining queue
fill level should not substantilly slow down the recv() operation.

A first implementation is provided for ethernet and vhostuser DPDK ports
in netdev-dpdk.c.

This output parameter will be used in the upcoming commit for PMD
performance metrics to supervise the rx queue fill level for DPDK
vhostuser ports.

Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Acked-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-05-11 08:08:24 +01:00
Pablo Cascón
65a87968f4 netdev-dpdk: don't enable scatter for jumbo RX support for nfp
Currently to RX jumbo packets fails for NICs not supporting scatter.
Scatter is not strictly needed for jumbo RX support. This change fixes
the issue by not enabling scatter only for the PMD/NIC known not to
need it to support jumbo RX.

Note: this change is temporary and not needed for later releases OVS/DPDK

Reported-by: Louis Peens <louis.peens@netronome.com>
Signed-off-by: Pablo Cascón <pablo.cascon@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-05-11 08:08:24 +01:00
Kevin Traynor
91fccdad72 netdev-dpdk: Free mempool only when no in-use mbufs.
DPDK mempools are freed when they are no longer needed.
This can happen when a port is removed or a port's mtu
is reconfigured so that a new mempool is used.

It is possible that an mbuf is attempted to be returned
to a freed mempool from NIC Tx queues and this can lead
to a segfault.

In order to prevent this, only free mempools when they
are not needed and have no in-use mbufs. As this might
not be possible immediately, create a free list of
mempools and sweep it anytime a port tries to get a
mempool.

Fixes: 8d38823bdf8b ("netdev-dpdk: fix memory leak")
Cc: mark.b.kavanagh81@gmail.com
Cc: Ilya Maximets <i.maximets@samsung.com>
Reported-by: Venkatesan Pradeep <venkatesan.pradeep@ericsson.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-04-21 16:59:45 +01:00
Kevin Traynor
1dfebee971 netdev-dpdk: Remove 'error' from non error log.
Presently, if OVS tries to setup more queues than
are allowed by a specific NIC, OVS will handle
this case by retrying with a lower amount of queues.

Rather than reporting initial failed queue setups
in the logs as ERROR, they are reported as INFO but
contain the word 'error'. Unless a user has detailed
knowledge of OVS-DPDK workings, this is confusing.

Let's remove 'error' from the INFO log.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-03-23 11:35:34 +00:00
Ilya Maximets
fa9f4eebd3 netdev-dpdk: Fix print format for dpdk port ids.
Since 17.11 release DPDK uses uint16 for port_id. Format
strings for printing functions must be updated accordingly.

CC: Mark Kavanagh <mark.b.kavanagh@intel.com>
Fixes: 5e925ccc2a6f ("netdev-dpdk: DPDK v17.11 upgrade")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-03-23 11:28:35 +00:00
Justin Pettit
e883448e3f dp-packet: Add index to DP_PACKET_BATCH_FOR_EACH to prevent shadowing.
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2018-02-28 14:53:27 -08:00
Ciara Loftus
10087cba9d netdev-dpdk: Add support for vHost dequeue zero copy (experimental)
Zero copy is disabled by default. To enable it, set the 'dq-zero-copy'
option to 'true' when configuring the Interface:

ovs-vsctl set Interface dpdkvhostuserclient0
options:vhost-server-path=/tmp/dpdkvhostuserclient0
options:dq-zero-copy=true

When packets from a vHost device with zero copy enabled are destined for
a single 'dpdk' port, the number of tx descriptors on that 'dpdk' port
must be set to a smaller value. 128 is recommended. This can be achieved
like so:

ovs-vsctl set Interface dpdkport options:n_txq_desc=128

Note: The sum of the tx descriptors of all 'dpdk' ports the VM will send
to should not exceed 128. Due to this requirement, the feature is
considered 'experimental'.

Testing of the patch showed a ~8% improvement when switching 512B
packets between vHost devices on different VMs on the same host when
zero copy was enabled on the transmitting device.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-01-31 14:04:35 +00:00
Ilya Maximets
ac1a9bb93f netdev-dpdk: Fix xstats leak on port destruction.
CC: Michal Weglicki <michalx.weglicki@intel.com>
Fixes: 971f4b394c6e ("netdev: Custom statistics.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-01-26 20:49:18 +00:00
Ilya Maximets
34eb086342 netdev-dpdk: Fix memory leak in netdev_dpdk_configure_xstats().
CC: Michal Weglicki <michalx.weglicki@intel.com>
Fixes: 971f4b394c6e ("netdev: Custom statistics.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-01-26 20:49:18 +00:00
Ilya Maximets
526259f22c netdev-dpdk: Fix memory leak in netdev_dpdk_get_custom_stats().
CC: Michal Weglicki <michalx.weglicki@intel.com>
Fixes: 971f4b394c6e ("netdev: Custom statistics.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-01-26 20:49:18 +00:00
Yuanhan Liu
5e75881868 netdev-dpdk: fix port addition for ports sharing same PCI id
Some NICs have only one PCI address associated with multiple ports. This
patch extends the dpdk-devargs option's format to cater for such devices.

To achieve that, this patch uses a new syntax that will be adapted and
implemented in future DPDK release (likely, v18.05):
    http://dpdk.org/ml/archives/dev/2017-December/084234.html

And since it's the DPDK duty to parse the (complete and full) syntax
and this patch is more likely to serve as an intermediate workaround,
here I take a simpler and shorter syntax from it (note it's allowed to
have only one category being provided):
    class=eth,mac=00:11:22:33:44:55:66

Also, old compatibility is kept. Users can still go on with using the
PCI id to add a port (if that's enough for them). Meaning, this patch
will not break anything.

This patch is basically based on the one from Ciara:
    https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/339496.html

Cc: Loftus Ciara <ciara.loftus@intel.com>
Cc: Thomas Monjalon <thomas@monjalon.net>
Cc: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-01-26 20:49:18 +00:00
Ian Stokes
f6f50552a3 netdev-dpdk: Fix requested MTU size validation.
This commit replaces MTU_TO_FRAME_LEN(mtu) with MTU_TO_MAX_FRAME_LEN(mtu)
in netdev_dpdk_set_mtu(), in order to determine if the total length of
the L2 frame with an MTU of ’mtu’ exceeds NETDEV_DPDK_MAX_PKT_LEN.

When setting an MTU we first check if the requested total frame length
(which includes associated L2 overhead) will exceed the maximum
frame length supported in netdev_dpdk_set_mtu(). The frame length is
calculated by MTU_TO_FRAME_LEN  as MTU + ETHER_HEADER + ETHER_CRC. The MTU
for the device will be set at a later stage in dpdk_eth_dev_init() using
rte_eth_dev_set_mtu(mtu).

However when using rte_eth_dev_set_mtu(mtu) the calculation used to check
that the frame does not exceed the max frame length for that device varies
between DPDK device drivers. For example ixgbe driver calculates the
frame length for a given MTU as

mtu + ETHER_HDR_LEN + ETHER_CRC_LEN

i40e driver calculates it as

mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + I40E_VLAN_TAG_SIZE * 2

em driver calculates it as

mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + VLAN_TAG_SIZE

Currently it is possible to set an MTU for a netdev_dpdk device that exceeds
the upper limit MTU for that devices DPDK driver. This leads to a segfault.
This is because the frame length comparison as is, does not take into account
the addition of the vlan tag overhead expected in the drivers. The
netdev_dpdk_set_mtu() call will incorrectly succeed but the subsequent
dpdk_eth_dev_init() will fail before the queues have been created for the
DPDK device. This coupled with assumptions regarding reconfiguration
requirements for the netdev will lead to a segfault when the rxq is polled
for this device.

A simple way to avoid this is by using MTU_TO_MAX_FRAME_LEN(mtu) when
validating a requested MTU in netdev_dpdk_set_mtu().
MTU_TO_MAX_FRAME_LEN(mtu) is equivalent to the following:

mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + (2 * VLAN_HEADER_LEN)

By using MTU_TO_MAX_FRAME_LEN at the netdev_dpdk_set_mtu() stage, OvS
now takes into account the maximum L2 overhead that a DPDK driver could
allow for in its frame size calculation. This allows OVS to flag an error
rather than the DPDK driver if the frame length exceeds the max DPDK frame
length. OVS can fail gracefully at this point and use the default MTU of
1500 to continue to configure the port.

Note: this fix is a work around, a better approach would be if DPDK devices
could report the maximum MTU value that can be requested on a per device
basis. This capability however is not currently available. A downside of
this patch is that the MTU upper limit will be reduced by 8 bytes for
DPDK devices that do not need to account for vlan tags in the frame length
driver calculations e.g. ixgbe devices upper MTU limit is reduced from
the OVS point of view from 9710 to 9702.

CC: Mark Kavanagh <mark.b.kavanagh@intel.com>
Fixes: 0072e931 ("netdev-dpdk: add support for jumbo frames")
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Co-authored-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
2018-01-26 20:49:18 +00:00
Flavio Leitner
b2e8b12f8a netdev-dpdk: add vhost-user get_status.
Expose relevant vhost-user information in status.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Tested-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-01-17 18:12:46 +00:00
zhangliping
4c47ddde34 netdev-dpdk: fix ingress_policer leak on error path
Fix memory leak by freeing the policer if rte_meter_srtcm_config fails.

Fixes: 9509913aa722 ("netdev-dpdk.c: Add ingress-policing functionality.")
Signed-off-by: zhangliping <zhangliping02@baidu.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-01-17 18:11:28 +00:00
Michal Weglicki
971f4b394c netdev: Custom statistics.
- New get_custom_stats interface function is added to netdev. It
  allows particular netdev implementation to expose custom
  counters in dictionary format (counter name/counter value).
- New statistics are retrieved using experimenter code and
  are printed as a result to ofctl dump-ports.
- New counters are available for OpenFlow 1.4+.
- New statistics are printed to output via ofctl only if those
  are present in reply message.
- New statistics definition is added to include/openflow/intel-ext.h.
- Custom statistics are implemented only for dpdk-physical
  port type.
- DPDK-physical implementation uses xstats to collect statistics.
  Only dropped and error counters are exposed.

Co-authored-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-01-10 15:29:13 -08:00
Ilya Maximets
ad8b0b4fe7 netdev: Remove useless cutlen.
Cutlen already applied while processing OVS_ACTION_ATTR_OUTPUT.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com
2017-12-20 21:07:46 +00:00
Ilya Maximets
b30896c969 netdev: Remove unused may_steal.
Not needed anymore because 'may_steal' already handled on
dpif-netdev layer and always true.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com
2017-12-20 21:07:46 +00:00
Ilya Maximets
be48173310 netdev-dpdk: Add debug appctl to get mempool information.
New appctl 'netdev-dpdk/get-mempool-info' implemented to get result
of 'rte_mempool_list_dump()' function if no arguments passed and
'rte_mempool_dump()' if DPDK netdev passed as argument.

Could be used for debugging mbuf leaks and other mempool related
issues. Most useful in pair with `grep -v "cache_count.*=0"`.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Tested-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Antonio Fischetti <antonio.fischetti@intel.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-20 21:07:46 +00:00
Michal Weglicki
3eb8d4fa0d netdev-dpdk: extend netdev_dpdk_get_status to include if_type and if_descr
This commit extends netdev_dpdk_get_status API to include additional
driver-related information: if_type and if_descr.

v2->v3: Code rebase.
v3->v4: Minor comments applied.
v5->v6: Adds DPDK port specific description in documentation.

Co-authored-by: Michal Weglicki <michalx.weglicki@intel.com>
Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
Signed-off-by: Przemyslaw Szczerbik <przemyslawx.szczerbik@intel.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-08 21:42:54 +00:00
Mark Kavanagh
a14d1cc8a7 netdev-dpdk: vHost IOMMU support
DPDK v17.11 introduces support for the vHost IOMMU feature.
This is a security feature, which restricts the vhost memory
that a virtio device may access.

This feature also enables the vhost REPLY_ACK protocol, the
implementation of which is known to work in newer versions of
QEMU (i.e. v2.10.0), but is buggy in older versions (v2.7.0 -
v2.9.0, inclusive). As such, the feature is disabled by default
in (and should remain so), for the aforementioned older QEMU
verions. Starting with QEMU v2.9.1, vhost-iommu-support can
safely be enabled, even without having an IOMMU device, with
no performance penalty.

This patch adds a new global config option, vhost-iommu-support,
that controls enablement of the vhost IOMMU feature:

    ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true

This value defaults to false; to enable IOMMU support, this field
should be set to true when setting other global parameters on init
(such as "dpdk-socket-mem", for example). Changing the value at
runtime is not supported, and requires restarting the vswitch daemon.

Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-08 21:42:54 +00:00
Mark Kavanagh
5e925ccc2a netdev-dpdk: DPDK v17.11 upgrade
This commit adds support for DPDK v17.11:
- minor updates to accomodate DPDK API changes
- update references to DPDK version in Documentation
- update DPDK version in travis' linux-build script
- document DPDK v17.11 virtio driver bug

Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
Tested-by: Jan Scheurich <jan.scheurich@ericsson.com>
Tested-by: Guoshuai Li <ligs@dtdream.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-08 21:42:54 +00:00
Kevin Traynor
255b7bda98 netdev-dpdk: Remove uneeded call to rte_eth_dev_count().
The call to rte_eth_dev_count() was added as workaround
for rte_eth_dev_get_port_by_name() not handling cases
when there was no DPDK ports.

In versions of DPDK >= 17.02 rte_eth_dev_get_port_by_name()
does handle this case (DPDK commit f9ae888b1e19).
rte_eth_dev_count() is no longer needed so remove it.

Acked-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-08 21:42:54 +00:00
Ilya Maximets
b2e72a9c9d netdev-dpdk: Add comment about variables naming convention.
It'll be nice to document current naming convention for variables of
the following types used in netdev-dpdk:

	* netdev
	* netdev_dpdk
	* netdev_rxq
	* netdev_rxq_dpdk

to be sure that we will not return to chaos which was before
commit d46285a2206f ("netdev-dpdk: Consistent variable naming.").

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-08 21:42:54 +00:00
Ilya Maximets
3d0d5ab153 netdev-dpdk: Fix variables naming in set_admin_state function.
Function 'netdev_dpdk_set_admin_state()' was missed while fixing
variables naming according to the following convention:

    'struct netdev':'netdev'
    'struct netdev_dpdk':'dev'
    'struct netdev_rxq':'rxq'
    'struct netdev_rxq_dpdk':'rx'

Fixes: d46285a2206f ("netdev-dpdk: Consistent variable naming.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokess <ian.stokes@intel.com>
2017-12-08 21:42:54 +00:00
Ilya Maximets
af5b0dad30 netdev-dpdk: Fix mempool creation with large MTU.
Currently mempool name size limited to 25 characters by
RTE_MEMPOOL_NAMESIZE. netdev-dpdk tries to create mempool with the
following name pattern: "ovs_%{hash}_%{socket}_%{mtu}_%{n_mbuf}".

We have 3 chars for "ovs" + 4 chars for delimiters + 8 chars for
hash (because it's the 32 bit integer printed in hex) + 1 char for
socket_id (mostly 1, but it could be 2 on some systems; larger?) = 16.

Only 25 - 16 = 9 characters remains for mtu + n_mbufs.
Minimum usual value for mtu is 1500 --> 2030 (4 chars) after
dpdk_buf_size conversion and the minimum value for n_mbufs is 16384
(5 chars). So, all the 9 characters are used.

If we'll try to create port with mtu = 9500, mempool creation will
fail, because FRAME_LEN_TO_MTU(dpdk_buf_size(9500)) = 10222 (5 chars)
and this value will overflow the RTE_MEMPOOL_NAMESIZE limit.

Same issue will happen if we'll try to create port with big enough
number of queues or will try to create big enough number of PMD
threads (number of tx queues will enlarge the mempool requirements).

Fix that by removing the delimiters. To keep the readability (at least
partial) of the mempool names exact field sizes with zero padding
are used.

Following limits should be suitable for now:
 - Hash length: 8 chars (uint32_t in hex)
 - Socket ID  : 2 chars (For systems with up to 10 sockets)
 - MTU        : 5 chars (MTU (10^5 - 1) should be enough for now)
 - n_mbufs    : 7 chars (Up to 10^7 of mbufs)

   Total      : 22 + 3 (for "ovs") = 25

CC: Antonio Fischetti <antonio.fischetti@intel.com>
CC: Robert Wojciechowicz <robertx.wojciechowicz@intel.com>
Fixes: f06546a51dd8 ("Fix mempool names to reflect socket id.")
Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each port.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Antonio Fischetti <antonio.fischetti@intel.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Tested-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-11-17 16:26:33 +00:00
Ilya Maximets
daf22bf7a8 netdev-dpdk: Fix calling vhost API with negative vid.
Currently, rx and tx functions for vhost interfaces always obtain
'vid' twice. First time inside 'is_vhost_running' for checking
the value and the second time in enqueue/dequeue function calls to
send/receive packets. But second time we're not checking the
returned value. If vhost device will be destroyed between
checking and enqueue/dequeue, DPDK API will be called with
'-1' instead of valid 'vid'. DPDK API does not validate the 'vid'.
This leads to getting random memory value as a pointer to internal
device structure inside DPDK. Access by this pointer leads to
segmentation fault. For example:

  |00503|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_GET_VRING_BASE
  [New Thread 0x7fb6754910 (LWP 21246)]

  Program received signal SIGSEGV, Segmentation fault.
  rte_vhost_enqueue_burst at lib/librte_vhost/virtio_net.c:630
  630             if (dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF))
  (gdb) bt full
  #0  rte_vhost_enqueue_burst at lib/librte_vhost/virtio_net.c:630
          dev = 0xffffffff
  #1  __netdev_dpdk_vhost_send at lib/netdev-dpdk.c:1803
          tx_pkts = <optimized out>
          cur_pkts = 0x7f340084f0
          total_pkts = 32
          dropped = 0
          i = <optimized out>
          retries = 0
  ...
  (gdb) p *((struct netdev_dpdk *) netdev)
  $8 = { ... ,
        flags = (NETDEV_UP | NETDEV_PROMISC), ... ,
        vid = {v = -1},
        vhost_reconfigured = false, ... }

Issue can be reproduced by stopping DPDK application (testpmd) inside
guest while heavy traffic flows to this VM.

Fix that by obtaining and checking the 'vid' only once.

CC: Ciara Loftus <ciara.loftus@intel.com>
Fixes: 0a0f39df1d5a ("netdev-dpdk: Add support for DPDK 16.07")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Billy O'Mahony <billy.o.mahony@intel.com>
Acked-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-11-16 16:24:11 +00:00
Ilya Maximets
bc57ed901f netdev-dpdk: Remove unused MAX_NB_MBUF.
MAX_NB_MBUF was used as a default mempool size for almost all ports.
Not used since new per-port mempool allocation introduced.
MIN_NB_MBUF still used as a lower limit.

CC: Robert Wojciechowicz <robertx.wojciechowicz@intel.com>
Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each port.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-11-16 16:24:11 +00:00
Ilya Maximets
24e78f9350 netdev-dpdk: Factor out struct dpdk_mp.
Since commit d555d9bded5f ("netdev-dpdk: Create separate memory pool
for each port."), struct dpdk_mp is redundant because each mempool
can be used by single port only and this port already contains all
the information we store in dpdk_mp.
There is no need to duplicate the information.
Fields of this structure currently used only to generate mempool name.
But it's required only while creation and after that we can use
mp->name directly from the struct rte_mempool.

Let's remove this structure and use struct rte_mempool directly
instead.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-11-16 16:24:11 +00:00
Ilya Maximets
173ef76bbf netdev-dpdk: Fix dpdk_mp leak in case of EEXIST.
In case of EEXIST, 'dpdk_mp_create()' will allocate yet another
'struct dpdk_mp' with same 'mp' pointer inside. We need to free
this structure to avoid the leak.

CC: Robert Wojciechowicz <robertx.wojciechowicz@intel.com>
CC: Antonio Fischetti <antonio.fischetti@intel.com>
Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each port.")
Fixes: b6b26021d2e2 ("netdev-dpdk: fix management of pre-existing mempools.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-11-16 16:24:11 +00:00
Mark Kavanagh
7ee94cbac8 netdev-dpdk: replace uint8_t with dpdk_port_t
netdev_dpdk_detach() declares a 'port_id' variable, of type uint8_t.
This variable should instead be of type dpdk_port_t.

Fixes: bb37956ac ("netdev-dpdk: Use uint8_t for port_id.")
CC: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-11-16 16:24:11 +00:00
Bhanuprakash Bodireddy
23d4d53f14 netdev-dpdk: Refactor netdev_dpdk structure.
This commit introduces below changes to netdev_dpdk structure.

- Mark cachelines and reorder few member variables.
- Maintain the grouping of related member variables.
- Add comment on the information on pad bytes where ever appropriate, so
  new members can be introduced in the future to fill the gaps.

  Below is how this structure looks with this commit.

                  Member                    size

         OVS_CACHE_LINE_MARKER cacheline0;
             dpdk_port_t port_id;            1
             bool attached;                  1
             ...

         OVS_CACHE_LINE_MARKER cacheline1;
             struct ovs_mutex;              48
             struct dpdk_mp *dpdk_mp;        8
             ...

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-11-03 13:36:53 -07:00
Ilya Maximets
ec6edc8cde netdev-dpdk: Fix mp_name leak on snprintf failure.
CC: Robert Wojciechowicz <robertx.wojciechowicz@intel.com>
CC: Antonio Fischetti <antonio.fischetti@intel.com>
Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each port.")
Fixes: 65056fd79694 ("netdev-dpdk: manage failure in mempool name creation.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-10-30 14:58:52 -07:00
antonio.fischetti@intel.com
a08a115d26 netdev-dpdk: Rename dpdk_mp_put as dpdk_mp_free.
For readability purposes dpdk_mp_put is renamed as dpdk_mp_free.

CC: Mark B Kavanagh <mark.b.kavanagh@intel.com>
CC: Darrell Ball <dlu998@gmail.com>
CC: Ciara Loftus <ciara.loftus@intel.com>
CC: Kevin Traynor <ktraynor@redhat.com>
CC: Aaron Conole <aconole@redhat.com>
Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-10-26 20:45:33 +01:00
antonio.fischetti@intel.com
ad9b5b9bc7 netdev-dpdk: Reword mp_size as n_mbufs.
For code readability purposes mp_size is renamed as n_mbufs
in dpdk_mp structure.
This parameter is passed to rte mempool creation functions
and is meant to contain the number of elements inside
the requested mempool.

CC: Ciara Loftus <ciara.loftus@intel.com>
CC: Aaron Conole <aconole@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-10-26 20:23:41 +01:00
antonio.fischetti@intel.com
65056fd796 netdev-dpdk: manage failure in mempool name creation.
In case a mempool name could not be generated log a message
and return a null mempool pointer to the caller.

CC: Mark B Kavanagh <mark.b.kavanagh@intel.com>
CC: Darrell Ball <dlu998@gmail.com>
CC: Ciara Loftus <ciara.loftus@intel.com>
CC: Kevin Traynor <ktraynor@redhat.com>
CC: Aaron Conole <aconole@redhat.com>
Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each port.")
Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-10-26 20:17:27 +01:00
antonio.fischetti@intel.com
837c1761be netdev-dpdk: skip init for existing mempools.
Skip initialization of mempool packet areas if this was already
done in a previous call to dpdk_mp_create.

CC: Darrell Ball <dlu998@gmail.com>
CC: Ciara Loftus <ciara.loftus@intel.com>
CC: Aaron Conole <aconole@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-10-26 20:08:01 +01:00
antonio.fischetti@intel.com
f06546a51d Fix mempool names to reflect socket id.
Create mempool names by considering also the NUMA socket number.
So a name reflects what socket the mempool is allocated on.
This change is needed for the NUMA-awareness feature.

CC: Mark B Kavanagh <mark.b.kavanagh@intel.com>
CC: Aaron Conole <aconole@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Reported-by: Ciara Loftus <ciara.loftus@intel.com>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each port.")
Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-10-26 19:49:29 +01:00
antonio.fischetti@intel.com
b6b26021d2 netdev-dpdk: fix management of pre-existing mempools.
Fix an issue on reconfiguration of pre-existing mempools.
This patch avoids to call dpdk_mp_put() - and erroneously
release the mempool - when it already exists.

CC: Mark B Kavanagh <mark.b.kavanagh@intel.com>
CC: Aaron Conole <aconole@redhat.com>
CC: Darrell Ball <dlu998@gmail.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Reported-by: Ciara Loftus <ciara.loftus@intel.com>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
Reported-by: Róbert Mulik <robert.mulik@ericsson.com>
Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each port.")
Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-10-26 19:46:58 +01:00
Zoltan Balogh
75fb914892 netdev-dpdk: reset packet_type for reused dp_packets.
DPDK uses dp-packet pool for storing received packets. The pool is
reused by rxq_recv funcions of the DPDK netdevs. The datapath is
capable to modify the packet_type property of packets. For instance
when encapsulated L3 packets are received on a ptap gre port.
In this case the packet_type property of struct dp_packet can be
modified and later the same dp_packet with the modified packet_type
can be reused in the rxq_rec function, so it can contain corrupted
data.

The dp_packet_batch_init_cutlen() in the rxq_recv functions iterates
over dp_packets and sets their cutlen. So I modified this function
to set packet_type to Ethernet for the dp_packets as well. I also
renamed this function because of the added functionality.

The dp_packet_batch_init_cutlen() iterates over batch->count dp_packet.
Therefore setting of batch->count = nb_rx needs to be done before the
former function is invoked. This is an additional fix.

Signed-off-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Laszlo Suru <laszlo.suru@ericsson.com>
Co-authored-by: Laszlo Suru <laszlo.suru@ericsson.com>
CC: Jan Scheurich <jan.scheurich@ericsson.com>
CC: Sugesh Chandran <sugesh.chandran@intel.com>
CC: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-22 02:58:13 -07:00
Bhanuprakash Bodireddy
fd57eebacb netdev-dpdk: Minor cleanup of netdev_dpdk_send__.
The variable 'cnt' is initialized and reused in multiple function calls
inside netdev_dpdk_send__() and is confusing sometimes. Instead introduce
'batch_cnt' to hold the original packet count and 'tx_cnt' to store
the final packet count resulting after filtering and qos operations.

Finally 'tx_cnt' packets gets transmitted on the respective 'qid'.

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-22 02:14:59 -07:00
Bhanuprakash Bodireddy
8a14bd7b6b netdev-dpdk: Cleanup dpdk_do_tx_copy.
Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-22 02:12:22 -07:00
Bhanuprakash Bodireddy
8a543eb03a netdev-dpdk: Use DP_PACKET_BATCH_FOR_EACH in netdev_dpdk_ring_send.
Use DP_PACKET_BATCH_FOR_EACH macro in netdev_dpdk_ring_send().

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-22 02:05:49 -07:00
Gao Zhenyu
3e90f7d753 netdev-dpdk: Execute QoS Checking before copying to mbuf.
In dpdk_do_tx_copy function, all packets were copied to mbuf first,
but QoS checking may drop some of them.
Move the QoS checking in front of copying data to mbuf, it helps to
reduce useless copy.

Signed-off-by: Zhenyu Gao <sysugaozhenyu@gmail.com>
Acked-by: ian.stokes@intel.com<mailto:ian.stokes@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-06 22:27:53 -07:00
Robert Wojciechowicz
d555d9bded netdev-dpdk: Create separate memory pool for each port.
Since it's possible to delete memory pool in DPDK
we can try to estimate better required memory size
when port is reconfigured, e.g. with different number
of rx queues.

CC: Kevin Traynor <ktraynor@redhat.com>
CC: Aaron Conole <aconole@redhat.com>
Signed-off-by: Robert Wojciechowicz <robertx.wojciechowicz@intel.com>
Co-authored-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-05 12:02:25 -07:00
wangzhike
894af647a8 netdev-dpdk: update vhost user client port status.
After ovs-vswitchd reboots, vhost user client port status is displayed as
LINK DOWN though the traffic is OK.

The problem is that the port may be udpated while the vhost_reconfigured
is false. Then the vhost_reconfigured is updated to true. As a result,
the vhost port status is kept as LINK-DOWN.

Signed-off-by: wangzhike <wangzhike@jd.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-05 12:02:25 -07:00
wangzhike
50986e7842 netdev-dpdk: vhost get stats fix.
In netdev_dpdk_vhost_get_stats, '+=' was used in a few places
where '=' was expected.

Signed-off-by: wangzhike <wangzhike@jd.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-08-25 14:49:55 -07:00
Lance Richardson
602c86681c netdev-dpdk: use 64-bit arithmetic when converting rates.
Force 64-bit arithmetic to be used when converting uint32_t rate
and burst parameters from kilobits per second to bytes per second,
avoiding incorrect behavior for rates exceeding UINT_MAX bits
per second.

Reported-by: "王志克" <wangzhike@jd.com>
Fixes: 9509913aa722 ("netdev-dpdk.c: Add ingress-policing functionality.")
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Acked-By: Mark Michelson <mmichels@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-08-25 14:35:31 -07:00
Darrell Ball
11d4c7a843 dp-packet: Refactor DPDK packet initialization.
DPDK uses dp-packet pools and manages the mbuf portion of
each packet. When a pool is created, partial initialization is
also done on the OVS portion (i.e. non-mbuf).  Since packet
memory is reused, this is not very useful for transient
fields and is also misleading.  Furthermore, some of these
transient fields are properly initialized for DPDK packets
entering OVS anyways, which is the only reasonable way to do this.
Another field, cutlen, is initialized in this manner in the pool
and intended to be reset when cutlen is applied on sending the
packet out. However, if cutlen context is set but the packet is
not sent out for some reason, then the packet header would be
corrupted in the memory pool.  It is better to just reset the
cutlen in the packets when received.  I did not detect a
degradation in performance, however, I would be willing to
have some degradation, since this is a proper way to handle
this.  In addition to initializing cutlen in received packets,
the other OVS transient fields are removed from the DPDK pool
initialization.

Acked-by: Sugesh Chandran <sugesh.chandran@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-08-24 22:09:58 -07:00
Aaron Conole
fc56f5e0f5 netdev-dpdk: include dpdk PCI header directly
As part of a devargs rework in DPDK, the PCI header file was removed, and
needs to be directly included.  This isn't required to build with 17.05 or
earlier, but will be required should a future update happen.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Tested-By: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: Ciara Loftus <ciara.loftus@intel.com>
2017-08-10 13:44:24 -07:00