mirror of https://github.com/openvswitch/ovs synced 2025-08-26 03:47:27 +00:00

350 Commits

Author SHA1 Message Date
Ilya Maximets
9474073615 netdev-dpdk: Dump flow patterns only if debug enabled.
There is no need to waste time checking fields when DBG is disabled.
Additionally, the sequence of prints is replaced with a single print
to avoid the output being interrupted by other log messages.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-11-02 15:16:14 +00:00
Ilya Maximets
faf71e4922 netdev-dpdk: Print port name in offload API messages.
This is useful for understanding which flows are offloaded to
which ports.

Code refactored a bit to reduce number of casts.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-11-02 15:15:27 +00:00
Ilya Maximets
5752eae485 dpif-netdev: Fix cmap node use after free on flow disassociation.
Data pointed to by a cmap node must not be freed while iterating.
ovsrcu_postpone should be used instead.

CC: Finn Christensen <fc@napatech.com>
Fixes: e8a2b5bf92bb ("netdev-dpdk: implement flow offload with rte flow")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-11-02 15:13:54 +00:00
Ilya Maximets
95ca79d542 netdev-dpdk: Secure flow offload API.
The rte API is not thread safe. We have to take the netdev mutex
before using it and also before using fields of the netdev structure.

This is important because the offload API is used from a separate
thread and could be used at the same time as other netdev
functions called from the main thread.

CC: Finn Christensen <fc@napatech.com>
Fixes: e8a2b5bf92bb ("netdev-dpdk: implement flow offload with rte flow")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-11-02 15:13:40 +00:00
Ilya Maximets
c0af6425d7 netdev-dpdk: Drop offload API for vhost ports.
vhost ports are not DPDK eth ports and have no rte_flow API.
Stop calling this API with DPDK_ETH_PORT_ID_INVALID to
avoid wasting time and logging errors.

Additionally, DPDK_FLOW_OFFLOAD_API definition moved to .c
file, because there is no need to expose it in header.

CC: Finn Christensen <fc@napatech.com>
Fixes: e8a2b5bf92bb ("netdev-dpdk: implement flow offload with rte flow")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-11-02 15:13:19 +00:00
Ben Pfaff
89c09c1cd1 netdev: Clean up class initialization.
The macros are hard to read.  This makes it a little more readable.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-08-27 17:48:23 +01:00
Xu Binbin
74cd69a479 netdev-dpdk: Support the link speed of XL710
On XL710, the link speed stored in the Interface table is not 40G.
Because the link speed query only supports speeds up to 10G, the
parameter 'current' is left with a random value at higher link
speeds. In this case, an incorrect link speed for the XL710 NIC is
stored in the database.

Signed-off-by: Xu Binbin <xu.binbin1@zte.com.cn>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-08-27 17:48:23 +01:00
Kevin Traynor
51c6a5a3c8 netdev-dpdk: Use hex for PCI vendor ID.
Match the prefix and formatting.

Fixes: 8a9562d21a40 ("dpif-netdev: Add DPDK netdev.")
Cc: pshelar@ovn.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-08-08 22:06:21 +01:00
Sugesh Chandran
7e1de65e8d netdev-dpdk: Fix failure to configure flow control at netdev-init.
Configuring flow control at ixgbe netdev-init throws an error at port
start.

For example, without this fix, a user cannot configure flow control on an
ixgbe dpdk port as below,

"
    ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk \
        options:dpdk-devargs=0000:05:00.1 options:rx-flow-ctrl=true
"

Instead, it must be configured as two different commands,

"
    ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk \
               options:dpdk-devargs=0000:05:00.1
    ovs-vsctl set Interface dpdk0 options:rx-flow-ctrl=true
"

The DPDK ixgbe driver now validates all the 'rte_eth_fc_conf' fields before
trying to configure the dpdk ethdev. Hence OVS can no longer set the
'dont care' fields to just '0' as before. This commit makes sure all the
'rte_eth_fc_conf' fields are populated with default values before the dev
init.

Also, to avoid read errors on unsupported ports, the flow control parameters
are now read only when the user is trying to configure/update them.

Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-08-08 22:06:21 +01:00
Ben Pfaff
773c3cb40f netdev-dpdk: Use ETH_ADDR_BYTES_ARGS instead of open-coding it.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-07-24 22:36:38 +01:00
Ben Pfaff
31a033cb71 netdev-dpdk: Fix sparse complaints.
Neither of these is a real problem.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-07-24 22:36:29 +01:00
Ben Pfaff
2b7b5dbb07 netdev-dpdk: Fix incorrect byte order conversion in log message.
uint8_t values shouldn't be passed to ntohs().
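
To illustrate why this matters, here is a minimal sketch (not the code
from this commit). It assumes a POSIX <arpa/inet.h>; the helper names
are hypothetical. A uint8_t passed to ntohs() is widened to 16 bits and
then byte-swapped on little-endian hosts, so a one-byte field such as
protocol number 6 would be logged as 0x0600.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the value to log for a one-byte field.  A single
 * byte has no byte order, so it is used as-is. */
unsigned int
byte_field_for_log(uint8_t field)
{
    return field;
}

/* What the incorrect conversion yields: the uint8_t is widened to
 * uint16_t and then byte-swapped on little-endian hosts. */
unsigned int
byte_field_for_log_wrong(uint8_t field)
{
    return ntohs(field);
}

/* Returns 1 on a little-endian host, 0 otherwise. */
int
host_is_little_endian(void)
{
    return htons(1) != 1;
}
```

On a big-endian host both helpers return the same value, which is one
reason this kind of mistake can go unnoticed without a static checker.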

Found by sparse.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-07-24 22:36:21 +01:00
Ian Stokes
43307ad0e2 dpdk: Support both shared and per port mempools.
This commit re-introduces the concept of shared mempools as the default
memory model for DPDK devices. Per port mempools are still available but
must be enabled explicitly by a user.

OVS previously used a shared mempool model for ports with the same MTU
and socket configuration. This was replaced by a per port mempool model
to address issues flagged by users such as:

https://mail.openvswitch.org/pipermail/ovs-discuss/2016-September/042560.html

However the per port model potentially requires an increase in memory
resource requirements to support the same number of ports and configuration
as the shared port model.

This is considered a blocking factor for current deployments of OVS
when upgrading to future OVS releases as a user may have to redimension
memory for the same deployment configuration. This may not be possible for
users.

This commit resolves the issue by re-introducing shared mempools as
the default memory behaviour in OVS DPDK but also refactors the memory
configuration code to allow for per port mempools.

This patch adds a new global config option, per-port-memory, that
controls the enablement of per port mempools for DPDK devices.

    ovs-vsctl set Open_vSwitch . other_config:per-port-memory=true

This value defaults to false; to enable per port memory support,
this field should be set to true when setting other global parameters
on init (such as "dpdk-socket-mem", for example). Changing the value at
runtime is not supported, and requires restarting the vswitch
daemon.

The mempool sweep functionality is also replaced with the
sweep functionality from OVS 2.9 found in commits

c77f692 (netdev-dpdk: Free mempool only when no in-use mbufs.)
a7fb0a4 (netdev-dpdk: Add mempool reuse/free debug.)

A new document to discuss the specifics of the memory models and example
memory requirement calculations is also added.

Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Tiago Lam <tiago.lam@intel.com>
Tested-by: Tiago Lam <tiago.lam@intel.com>
2018-07-06 12:46:26 +01:00
Yuanhan Liu
daf90186e2 netdev-dpdk: add debug for rte flow patterns
For debug purpose.

Co-authored-by: Finn Christensen <fc@napatech.com>
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
Signed-off-by: Finn Christensen <fc@napatech.com>
Co-authored-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-07-06 10:32:52 +01:00
Finn Christensen
e8a2b5bf92 netdev-dpdk: implement flow offload with rte flow
The basic yet major part of this patch is to translate the "match"
to rte flow patterns. Then we create an rte flow with MARK + RSS
actions. Afterwards, all packets that match the flow will have the mark
id in the mbuf.

The reason RSS is needed is that, for most NICs, a MARK-only action is
not allowed. It has to be used together with some other action, such as
QUEUE, RSS, etc. However, the QUEUE action can specify one queue only,
which may break RSS. The RSS action is likely the best we can do for
now. Thus, the RSS action is chosen.

For any unsupported flows, such as MPLS, -1 is returned, meaning the
flow offload failed and is skipped.

Co-authored-by: Yuanhan Liu <yliu@fridaylinux.org>
Signed-off-by: Finn Christensen <fc@napatech.com>
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
Co-authored-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-07-06 10:32:52 +01:00
John Hurley
88dcf2aa82 netdev-provider: add class op to get block_id
Add a new class op for netdevs to get the block_id if one exists. The
block_id is used in offload ops to group multiple qdiscs together.

Stub calls are made to the new class op (implementation to follow in
further patches). The default block_id of 0 (no block) will be used in
these cases.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2018-06-29 14:51:47 +02:00
Aaron Conole
b9a3183d3a netdev-dpdk: Avoid warning for snprintf() call.
lib/netdev-dpdk.c: In function :
lib/netdev-dpdk.c:2865:49: warning:  output may be truncated before the last format character [-Wformat-truncation=]
        snprintf(vhost_vring, 16, "vring_%d_size", i);
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
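
As a rough illustration of the warning (the actual fix is not shown in
this log), the sketch below sizes the buffer for the worst case and
checks the snprintf() return value. It assumes a 32-bit int; the macro
and function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* "vring_" (6) + up to 11 characters for an int, sign included,
 * + "_size" (5) + terminating NUL (1) = 23 bytes. */
#define VRING_NAME_MAX 23

/* Formats the name into 'buf'.  Returns 0 on success, or -1 if the
 * output would not fit in 'size' bytes. */
int
format_vring_name(char *buf, size_t size, int i)
{
    int n = snprintf(buf, size, "vring_%d_size", i);

    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

With a 16-byte buffer, small indexes fit, but the compiler cannot prove
that for every int, which is what -Wformat-truncation reports.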

Suggested-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
2018-06-15 11:26:14 -07:00
Ian Stokes
4dd16ca0c3 netdev-dpdk: Handle ENOTSUP for rte_eth_dev_set_mtu.
The function rte_eth_dev_set_mtu is not supported for all DPDK drivers.
Currently if it is not supported we return an error in
dpdk_eth_dev_queue_setup. There are two issues with this.

(i) A device can still function even if rte_eth_dev_set_mtu is not
supported albeit with the default max rx packet length.

(ii) When ENOTSUP is returned it will not be caught in port_reconfigure()
at the dpif-netdev layer. Port_reconfigure() checks if a netdev_reconfigure()
function is supported for a given netdev and ignores EOPNOTSUPP errors as it
assumes errors of this value mean there is no reconfiguration function.
In this case the reconfiguration function is supported for netdev dpdk but
a function called as part of the reconfigure (rte_eth_dev_set_mtu) may
not be supported.

As this is a corner case, this commit warns a user when
rte_eth_dev_set_mtu is not supported and informs them of the default
max rx packet length that will be used instead.

Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Co-author: Michal Weglicki <michalx.weglicki@intel.com>
Tested-By: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Tested-by: Cian Ferriter <cian.ferriter@intel.com>
2018-06-08 17:27:56 +01:00
Michal Weglicki
e10ca8b921 netdev-dpdk: Enable HW_CRC_STRIP for virtual functions.
Virtual functions such as igb_vf and i40e_vf require HW_CRC_STRIP to be
explicitly enabled before configuration, otherwise device configuration
will fail.

This commit achieves this by adding NETDEV_RX_HW_CRC_STRIP to
dpdk_hw_ol_features. When a dpdk device is added, the driver for the
device is examined; if the device is a virtual function, HW_CRC_STRIP
is enabled.

Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
Co-Authored: Ian Stokes <ian.stokes@intel.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Tested-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-06-08 17:27:56 +01:00
Timothy Redaelli
7bbc2e1def netdev-dpdk: fix check for "net_nfp" driver
Currently the check of "net_nfp" driver while enabling scatter compares
only the first 6 bytes, but "net_nfp" is 7 bytes long.

This change fixes the check by comparing the first 7 bytes.
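
The difference between the two comparisons can be sketched as follows.
This is an illustration rather than the code from the commit; the
function names and the "net_nfx" test name are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NFP_DRIVER_NAME "net_nfp"

/* Compares only the first 6 bytes ("net_nf"), so unrelated names that
 * share that prefix also match. */
bool
is_nfp_driver_6(const char *driver_name)
{
    return !strncmp(driver_name, NFP_DRIVER_NAME, 6);
}

/* Compares all strlen("net_nfp") == 7 bytes of the expected name. */
bool
is_nfp_driver_7(const char *driver_name)
{
    return !strncmp(driver_name, NFP_DRIVER_NAME, strlen(NFP_DRIVER_NAME));
}
```

Note that the 7-byte comparison is still a prefix match, so a name such
as "net_nfp_vf" is accepted as well.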

CC: Pablo Cascón <pablo.cascon@netronome.com>
CC: Simon Horman <simon.horman@netronome.com>
Fixes: 65a87968f4cf ("netdev-dpdk: don't enable scatter for jumbo RX support for nfp")
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: Pablo Cascón <pablo.cascon@netronome.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-05-25 09:09:50 +01:00
Eelco Chaudron
606f665072 netdev-dpdk: Don't use PMD driver if not configured successfully
When initialization of the DPDK PMD driver fails
(dpdk_eth_dev_init()), the reconfigure_datapath() function will remove
the port from dp_netdev, and the port is not used.

Now when bridge_reconfigure() is called again, no changes to the
previous failing netdev configuration are detected and therefore the
port gets added to dp_netdev and is used uninitialized. This is causing
exceptions...

The fix has two parts to it. First in netdev-dpdk.c we remember if the
DPDK port was started or not, and when calling
netdev_dpdk_reconfigure() we also try re-initialization if the port
was not already active. The second part of the change is in
dpif-netdev.c where it makes sure netdev_reconfigure() is called if
the port needs reconfiguration, as netdev_is_reconf_required() is only
true until netdev_reconfigure() is called (even if it fails).

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-05-25 09:09:50 +01:00
Kevin Traynor
1f84a2d5b5 netdev-dpdk: Remove use of rte_mempool_ops_get_count.
rte_mempool_ops_get_count is not exported by DPDK so it means it
cannot be used by OVS when using DPDK as a shared library.

Remove rte_mempool_ops_get_count but still use rte_mempool_full
and document its behavior.

Fixes: 91fccdad72a2 ("netdev-dpdk: Free mempool only when no in-use mbufs.")
Reported-by: Timothy Redaelli <tredaelli@redhat.com>
Reported-by: Markos Chandras <mchandras@suse.de>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-05-25 09:09:50 +01:00
Darrell Ball
7d7ded7af7 odp-execute: Rename 'may_steal' to 'should_steal'.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-05-23 11:36:47 -07:00
Eelco Chaudron
eaa4358119 netdev-dpdk: Fixed netdev_dpdk structure alignment
Currently, the code tells us we have 4 pad bytes left in cacheline0
while actually we are 8 bytes short:

struct netdev_dpdk {
	union {
		OVS_CACHE_LINE_MARKER cacheline0;        /*           1 */
		struct {
			dpdk_port_t port_id;             /*     0     2 */
			_Bool      attached;             /*     2     1 */
			struct eth_addr hwaddr;          /*     4     6 */
			int        mtu;                  /*    12     4 */
			int        socket_id;            /*    16     4 */
			int        buf_size;             /*    20     4 */
			int        max_packet_len;       /*    24     4 */
			enum dpdk_dev_type type;         /*    28     4 */
			enum netdev_flags flags;         /*    32     4 */
			char *     devargs;              /*    40     8 */
			struct dpdk_tx_queue * tx_q;     /*    48     8 */
			struct rte_eth_link link;        /*    56     8 */
			int        link_reset_cnt;       /*    64     4 */
		};                                       /*          72 */
		uint8_t            pad9[128];            /*         128 */
	};                                               /*     0   128 */
	/* --- cacheline 2 boundary (128 bytes) --- */

Re-located one member, link_reset_cnt, and now it's one cache line:

struct netdev_dpdk {
	union {
		OVS_CACHE_LINE_MARKER cacheline0;        /*           1 */
		struct {
			dpdk_port_t port_id;             /*     0     2 */
			_Bool      attached;             /*     2     1 */
			struct eth_addr hwaddr;          /*     4     6 */
			int        mtu;                  /*    12     4 */
			int        socket_id;            /*    16     4 */
			int        buf_size;             /*    20     4 */
			int        max_packet_len;       /*    24     4 */
			enum dpdk_dev_type type;         /*    28     4 */
			enum netdev_flags flags;         /*    32     4 */
			int        link_reset_cnt;       /*    36     4 */
			char *     devargs;              /*    40     8 */
			struct dpdk_tx_queue * tx_q;     /*    48     8 */
			struct rte_eth_link link;        /*    56     8 */
		};                                       /*          64 */
		uint8_t            pad9[64];             /*          64 */
	};                                               /*     0    64 */
	/* --- cacheline 1 boundary (64 bytes) --- */
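
The effect of moving the member can be reproduced with a small sketch.
It assumes an LP64 platform (8-byte pointers, 4-byte int); the struct
and member types below are simplified stand-ins, not the real
definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct eth_addr_sketch { uint16_t ea16[3]; };  /* 6 bytes, 2-byte aligned. */
struct link_sketch { uint64_t raw; };          /* 8 bytes, 8-byte aligned. */

struct layout_before {
    uint16_t port_id;
    bool attached;
    struct eth_addr_sketch hwaddr;
    int mtu, socket_id, buf_size, max_packet_len;
    int type, flags;            /* Enums in the original. */
    char *devargs;              /* Preceded by a 4-byte hole at offset 36. */
    void *tx_q;
    struct link_sketch link;
    int link_reset_cnt;         /* Offset 64: past the first cache line. */
};

struct layout_after {
    uint16_t port_id;
    bool attached;
    struct eth_addr_sketch hwaddr;
    int mtu, socket_id, buf_size, max_packet_len;
    int type, flags;
    int link_reset_cnt;         /* Fills the former hole at offset 36. */
    char *devargs;
    void *tx_q;
    struct link_sketch link;
};
```

On such a platform sizeof(struct layout_before) is 72 and
sizeof(struct layout_after) is 64, matching the dumps above.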

Fixes: 5e925ccc2a6f ("netdev-dpdk: DPDK v17.11 upgrade")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Tiago Lam <tiago.lam@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-05-11 08:08:24 +01:00
Róbert Mulik
f8b64a61bc Configurable Link State Change (LSC) detection mode
It is possible to set LSC detection mode to polling or interrupt mode
for DPDK interfaces. The default is polling mode. To set interrupt mode,
option dpdk-lsc-interrupt has to be set to true.

For detailed description and usage see the dpdk install documentation.

Signed-off-by: Robert Mulik <robert.mulik@ericsson.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-05-11 08:08:24 +01:00
Jan Scheurich
8492adc270 netdev: Add optional qfill output parameter to rxq_recv()
If the caller provides a non-NULL qfill pointer and the netdev
implementation supports reading the rx queue fill level, the rxq_recv()
function returns the remaining number of packets in the rx queue after
reception of the packet burst to the caller. If the implementation does
not support this, it returns -ENOTSUP instead. Reading the remaining queue
fill level should not substantially slow down the recv() operation.
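
The optional output parameter pattern described above might look
roughly like this. It is a simplified sketch with hypothetical types
and names; unlike the real rxq_recv(), it returns the number of packets
received instead of filling a packet batch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define BURST_MAX 32

/* Hypothetical receive queue. */
struct rxq_sketch {
    int pending;            /* Packets waiting in the queue. */
    bool can_report_fill;   /* Whether the fill level can be read. */
};

/* Receives up to BURST_MAX packets and returns how many were taken.
 * If 'qfill' is non-NULL, stores the number of packets still queued,
 * or -ENOTSUP if the queue cannot report its fill level. */
int
rxq_recv_sketch(struct rxq_sketch *rxq, int *qfill)
{
    int n = rxq->pending < BURST_MAX ? rxq->pending : BURST_MAX;

    rxq->pending -= n;
    if (qfill) {
        *qfill = rxq->can_report_fill ? rxq->pending : -ENOTSUP;
    }
    return n;
}
```

Callers that do not need the fill level pass NULL and pay no extra
cost.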

A first implementation is provided for ethernet and vhostuser DPDK ports
in netdev-dpdk.c.

This output parameter will be used in the upcoming commit for PMD
performance metrics to supervise the rx queue fill level for DPDK
vhostuser ports.

Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Acked-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-05-11 08:08:24 +01:00
Pablo Cascón
65a87968f4 netdev-dpdk: don't enable scatter for jumbo RX support for nfp
Currently, RX of jumbo packets fails for NICs not supporting scatter.
Scatter is not strictly needed for jumbo RX support. This change fixes
the issue by not enabling scatter only for the PMD/NIC known not to
need it to support jumbo RX.

Note: this change is temporary and not needed for later OVS/DPDK releases

Reported-by: Louis Peens <louis.peens@netronome.com>
Signed-off-by: Pablo Cascón <pablo.cascon@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-05-11 08:08:24 +01:00
Kevin Traynor
91fccdad72 netdev-dpdk: Free mempool only when no in-use mbufs.
DPDK mempools are freed when they are no longer needed.
This can happen when a port is removed or a port's mtu
is reconfigured so that a new mempool is used.

It is possible that NIC Tx queues attempt to return an
mbuf to a freed mempool, and this can lead to a
segfault.

In order to prevent this, only free mempools when they
are not needed and have no in-use mbufs. As this might
not be possible immediately, create a free list of
mempools and sweep it anytime a port tries to get a
mempool.
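
The free-list-and-sweep idea can be sketched as below. This is a plain
C illustration with hypothetical names, not the OVS code; the in-use
counter stands in for whatever check the real mempool API provides.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pool with a count of buffers still held elsewhere. */
struct pool_sketch {
    struct pool_sketch *next;
    unsigned int in_use;
};

static struct pool_sketch *free_list;

/* Queues a pool that is no longer needed instead of freeing it now. */
void
pool_release(struct pool_sketch *p)
{
    p->next = free_list;
    free_list = p;
}

/* Frees every queued pool with no in-use buffers.  Returns the number
 * of pools freed; busy pools stay on the list for a later sweep. */
int
pool_sweep(void)
{
    struct pool_sketch **pp = &free_list;
    int freed = 0;

    while (*pp) {
        struct pool_sketch *p = *pp;

        if (p->in_use == 0) {
            *pp = p->next;
            free(p);
            freed++;
        } else {
            pp = &p->next;
        }
    }
    return freed;
}

/* Sweeps the free list whenever a new pool is requested. */
struct pool_sketch *
pool_get(void)
{
    pool_sweep();
    return calloc(1, sizeof(struct pool_sketch));
}
```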

Fixes: 8d38823bdf8b ("netdev-dpdk: fix memory leak")
Cc: mark.b.kavanagh81@gmail.com
Cc: Ilya Maximets <i.maximets@samsung.com>
Reported-by: Venkatesan Pradeep <venkatesan.pradeep@ericsson.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-04-21 16:59:45 +01:00
Kevin Traynor
1dfebee971 netdev-dpdk: Remove 'error' from non error log.
Presently, if OVS tries to setup more queues than
are allowed by a specific NIC, OVS will handle
this case by retrying with a lower number of queues.
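
A minimal sketch of such a retry loop, assuming a hypothetical device
that rejects any request above its queue limit (names and error code
are illustrative only):

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical NIC that supports at most 'max_queues' queues. */
struct nic_sketch {
    int max_queues;
};

/* Stand-in for the driver call: fails if too many queues are asked for. */
int
nic_configure(const struct nic_sketch *nic, int n_queues)
{
    return n_queues <= nic->max_queues ? 0 : -EINVAL;
}

/* Tries 'n_requested' queues, then one fewer each time.  Returns the
 * number of queues configured, or a negative errno value if even a
 * single queue cannot be set up. */
int
nic_setup_queues(const struct nic_sketch *nic, int n_requested)
{
    int err = -EINVAL;

    for (int n = n_requested; n > 0; n--) {
        err = nic_configure(nic, n);
        if (!err) {
            return n;
        }
        /* Not a failure yet: this is the informational, retried case. */
    }
    return err;
}
```

Only the final return value signals a real failure; the intermediate
attempts are expected and recoverable.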

Rather than reporting initial failed queue setups
in the logs as ERROR, they are reported as INFO but
contain the word 'error'. Unless a user has detailed
knowledge of OVS-DPDK workings, this is confusing.

Let's remove 'error' from the INFO log.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-03-23 11:35:34 +00:00
Ilya Maximets
fa9f4eebd3 netdev-dpdk: Fix print format for dpdk port ids.
Since 17.11 release DPDK uses uint16 for port_id. Format
strings for printing functions must be updated accordingly.

CC: Mark Kavanagh <mark.b.kavanagh@intel.com>
Fixes: 5e925ccc2a6f ("netdev-dpdk: DPDK v17.11 upgrade")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-03-23 11:28:35 +00:00
Justin Pettit
e883448e3f dp-packet: Add index to DP_PACKET_BATCH_FOR_EACH to prevent shadowing.
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2018-02-28 14:53:27 -08:00
Ciara Loftus
10087cba9d netdev-dpdk: Add support for vHost dequeue zero copy (experimental)
Zero copy is disabled by default. To enable it, set the 'dq-zero-copy'
option to 'true' when configuring the Interface:

ovs-vsctl set Interface dpdkvhostuserclient0 \
    options:vhost-server-path=/tmp/dpdkvhostuserclient0 \
    options:dq-zero-copy=true

When packets from a vHost device with zero copy enabled are destined for
a single 'dpdk' port, the number of tx descriptors on that 'dpdk' port
must be set to a smaller value. 128 is recommended. This can be achieved
like so:

ovs-vsctl set Interface dpdkport options:n_txq_desc=128

Note: The sum of the tx descriptors of all 'dpdk' ports the VM will send
to should not exceed 128. Due to this requirement, the feature is
considered 'experimental'.

Testing of the patch showed a ~8% improvement when switching 512B
packets between vHost devices on different VMs on the same host when
zero copy was enabled on the transmitting device.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-01-31 14:04:35 +00:00
Ilya Maximets
ac1a9bb93f netdev-dpdk: Fix xstats leak on port destruction.
CC: Michal Weglicki <michalx.weglicki@intel.com>
Fixes: 971f4b394c6e ("netdev: Custom statistics.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-01-26 20:49:18 +00:00
Ilya Maximets
34eb086342 netdev-dpdk: Fix memory leak in netdev_dpdk_configure_xstats().
CC: Michal Weglicki <michalx.weglicki@intel.com>
Fixes: 971f4b394c6e ("netdev: Custom statistics.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-01-26 20:49:18 +00:00
Ilya Maximets
526259f22c netdev-dpdk: Fix memory leak in netdev_dpdk_get_custom_stats().
CC: Michal Weglicki <michalx.weglicki@intel.com>
Fixes: 971f4b394c6e ("netdev: Custom statistics.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-01-26 20:49:18 +00:00
Yuanhan Liu
5e75881868 netdev-dpdk: fix port addition for ports sharing same PCI id
Some NICs have only one PCI address associated with multiple ports. This
patch extends the dpdk-devargs option's format to cater for such devices.

To achieve that, this patch uses a new syntax that will be adopted and
implemented in a future DPDK release (likely, v18.05):
    http://dpdk.org/ml/archives/dev/2017-December/084234.html

And since it's the DPDK duty to parse the (complete and full) syntax
and this patch is more likely to serve as an intermediate workaround,
here I take a simpler and shorter syntax from it (note it's allowed to
have only one category being provided):
    class=eth,mac=00:11:22:33:44:55

Also, old compatibility is kept. Users can still go on with using the
PCI id to add a port (if that's enough for them). Meaning, this patch
will not break anything.

This patch is basically based on the one from Ciara:
    https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/339496.html

Cc: Loftus Ciara <ciara.loftus@intel.com>
Cc: Thomas Monjalon <thomas@monjalon.net>
Cc: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-01-26 20:49:18 +00:00
Ian Stokes
f6f50552a3 netdev-dpdk: Fix requested MTU size validation.
This commit replaces MTU_TO_FRAME_LEN(mtu) with MTU_TO_MAX_FRAME_LEN(mtu)
in netdev_dpdk_set_mtu(), in order to determine if the total length of
the L2 frame with an MTU of ’mtu’ exceeds NETDEV_DPDK_MAX_PKT_LEN.

When setting an MTU we first check if the requested total frame length
(which includes associated L2 overhead) will exceed the maximum
frame length supported in netdev_dpdk_set_mtu(). The frame length is
calculated by MTU_TO_FRAME_LEN as MTU + ETHER_HEADER + ETHER_CRC. The MTU
for the device will be set at a later stage in dpdk_eth_dev_init() using
rte_eth_dev_set_mtu(mtu).

However when using rte_eth_dev_set_mtu(mtu) the calculation used to check
that the frame does not exceed the max frame length for that device varies
between DPDK device drivers. For example ixgbe driver calculates the
frame length for a given MTU as

mtu + ETHER_HDR_LEN + ETHER_CRC_LEN

i40e driver calculates it as

mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + I40E_VLAN_TAG_SIZE * 2

em driver calculates it as

mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + VLAN_TAG_SIZE

Currently it is possible to set an MTU for a netdev_dpdk device that exceeds
the upper limit MTU for that device's DPDK driver. This leads to a segfault.
This is because the frame length comparison as is, does not take into account
the addition of the vlan tag overhead expected in the drivers. The
netdev_dpdk_set_mtu() call will incorrectly succeed but the subsequent
dpdk_eth_dev_init() will fail before the queues have been created for the
DPDK device. This coupled with assumptions regarding reconfiguration
requirements for the netdev will lead to a segfault when the rxq is polled
for this device.

A simple way to avoid this is by using MTU_TO_MAX_FRAME_LEN(mtu) when
validating a requested MTU in netdev_dpdk_set_mtu().
MTU_TO_MAX_FRAME_LEN(mtu) is equivalent to the following:

mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + (2 * VLAN_HEADER_LEN)

By using MTU_TO_MAX_FRAME_LEN at the netdev_dpdk_set_mtu() stage, OvS
now takes into account the maximum L2 overhead that a DPDK driver could
allow for in its frame size calculation. This allows OVS to flag an error
rather than the DPDK driver if the frame length exceeds the max DPDK frame
length. OVS can fail gracefully at this point and use the default MTU of
1500 to continue to configure the port.

Note: this fix is a workaround; a better approach would be if DPDK devices
could report the maximum MTU value that can be requested on a per device
basis. This capability however is not currently available. A downside of
this patch is that the MTU upper limit will be reduced by 8 bytes for
DPDK devices that do not need to account for vlan tags in the frame length
driver calculations e.g. ixgbe devices upper MTU limit is reduced from
the OVS point of view from 9710 to 9702.
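
The arithmetic can be checked with a short sketch. The header, CRC and
VLAN tag sizes are the standard Ethernet values; the maximum packet
length of 9728 is an assumption inferred from the 9710 limit quoted
above (9710 + 14 + 4), and the two validation helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define ETHER_HDR_LEN   14
#define ETHER_CRC_LEN    4
#define VLAN_HEADER_LEN  4

/* Assumed value, consistent with the 9710 -> 9702 figures above. */
#define NETDEV_DPDK_MAX_PKT_LEN 9728

#define MTU_TO_FRAME_LEN(mtu)     ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN)
#define MTU_TO_MAX_FRAME_LEN(mtu) (MTU_TO_FRAME_LEN(mtu) \
                                   + 2 * VLAN_HEADER_LEN)

/* Check without VLAN overhead, as done before this fix. */
bool
mtu_is_valid_old(int mtu)
{
    return MTU_TO_FRAME_LEN(mtu) <= NETDEV_DPDK_MAX_PKT_LEN;
}

/* Check that allows for two VLAN tags, as done after this fix. */
bool
mtu_is_valid_new(int mtu)
{
    return MTU_TO_MAX_FRAME_LEN(mtu) <= NETDEV_DPDK_MAX_PKT_LEN;
}
```

An MTU of 9710 passes the old check but not the new one, which accepts
at most 9702. The sketch covers the upper bound only.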

CC: Mark Kavanagh <mark.b.kavanagh@intel.com>
Fixes: 0072e931 ("netdev-dpdk: add support for jumbo frames")
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Co-authored-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
2018-01-26 20:49:18 +00:00
Flavio Leitner
b2e8b12f8a netdev-dpdk: add vhost-user get_status.
Expose relevant vhost-user information in status.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Tested-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-01-17 18:12:46 +00:00
zhangliping
4c47ddde34 netdev-dpdk: fix ingress_policer leak on error path
Fix memory leak by freeing the policer if rte_meter_srtcm_config fails.

Fixes: 9509913aa722 ("netdev-dpdk.c: Add ingress-policing functionality.")
Signed-off-by: zhangliping <zhangliping02@baidu.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-01-17 18:11:28 +00:00
Michal Weglicki
971f4b394c netdev: Custom statistics.
- New get_custom_stats interface function is added to netdev. It
  allows particular netdev implementation to expose custom
  counters in dictionary format (counter name/counter value).
- New statistics are retrieved using experimenter code and
  are printed as a result to ofctl dump-ports.
- New counters are available for OpenFlow 1.4+.
- New statistics are printed to output via ofctl only if those
  are present in reply message.
- New statistics definition is added to include/openflow/intel-ext.h.
- Custom statistics are implemented only for dpdk-physical
  port type.
- DPDK-physical implementation uses xstats to collect statistics.
  Only dropped and error counters are exposed.

Co-authored-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-01-10 15:29:13 -08:00
Ilya Maximets
ad8b0b4fe7 netdev: Remove useless cutlen.
Cutlen already applied while processing OVS_ACTION_ATTR_OUTPUT.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-20 21:07:46 +00:00
Ilya Maximets
b30896c969 netdev: Remove unused may_steal.
Not needed anymore because 'may_steal' already handled on
dpif-netdev layer and always true.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-20 21:07:46 +00:00
Ilya Maximets
be48173310 netdev-dpdk: Add debug appctl to get mempool information.
New appctl 'netdev-dpdk/get-mempool-info' implemented to get result
of 'rte_mempool_list_dump()' function if no arguments passed and
'rte_mempool_dump()' if DPDK netdev passed as argument.

Could be used for debugging mbuf leaks and other mempool related
issues. Most useful in pair with `grep -v "cache_count.*=0"`.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Tested-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Antonio Fischetti <antonio.fischetti@intel.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-20 21:07:46 +00:00
Michal Weglicki
3eb8d4fa0d netdev-dpdk: extend netdev_dpdk_get_status to include if_type and if_descr
This commit extends netdev_dpdk_get_status API to include additional
driver-related information: if_type and if_descr.

v2->v3: Code rebase.
v3->v4: Minor comments applied.
v5->v6: Adds DPDK port specific description in documentation.

Co-authored-by: Michal Weglicki <michalx.weglicki@intel.com>
Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
Signed-off-by: Przemyslaw Szczerbik <przemyslawx.szczerbik@intel.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-08 21:42:54 +00:00
Mark Kavanagh
a14d1cc8a7 netdev-dpdk: vHost IOMMU support
DPDK v17.11 introduces support for the vHost IOMMU feature.
This is a security feature, which restricts the vhost memory
that a virtio device may access.

This feature also enables the vhost REPLY_ACK protocol, the
implementation of which is known to work in newer versions of
QEMU (i.e. v2.10.0), but is buggy in older versions (v2.7.0 -
v2.9.0, inclusive). As such, the feature is disabled by default
(and should remain so for the aforementioned older QEMU
versions). Starting with QEMU v2.9.1, vhost-iommu-support can
safely be enabled, even without having an IOMMU device, with
no performance penalty.

This patch adds a new global config option, vhost-iommu-support,
that controls enablement of the vhost IOMMU feature:

    ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true

This value defaults to false; to enable IOMMU support, this field
should be set to true when setting other global parameters on init
(such as "dpdk-socket-mem", for example). Changing the value at
runtime is not supported, and requires restarting the vswitch daemon.

Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-08 21:42:54 +00:00
Mark Kavanagh
5e925ccc2a netdev-dpdk: DPDK v17.11 upgrade
This commit adds support for DPDK v17.11:
- minor updates to accommodate DPDK API changes
- update references to DPDK version in Documentation
- update DPDK version in travis' linux-build script
- document DPDK v17.11 virtio driver bug

Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
Tested-by: Jan Scheurich <jan.scheurich@ericsson.com>
Tested-by: Guoshuai Li <ligs@dtdream.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-08 21:42:54 +00:00
Kevin Traynor
255b7bda98 netdev-dpdk: Remove unneeded call to rte_eth_dev_count().
The call to rte_eth_dev_count() was added as a workaround
for rte_eth_dev_get_port_by_name() not handling cases
when there were no DPDK ports.

In versions of DPDK >= 17.02 rte_eth_dev_get_port_by_name()
does handle this case (DPDK commit f9ae888b1e19).
rte_eth_dev_count() is no longer needed so remove it.

Acked-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-08 21:42:54 +00:00
Ilya Maximets
b2e72a9c9d netdev-dpdk: Add comment about variables naming convention.
It'll be nice to document the current naming convention for variables
of the following types used in netdev-dpdk:

	* netdev
	* netdev_dpdk
	* netdev_rxq
	* netdev_rxq_dpdk

to be sure that we will not return to the chaos that existed before
commit d46285a2206f ("netdev-dpdk: Consistent variable naming.").

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-08 21:42:54 +00:00
Ilya Maximets
3d0d5ab153 netdev-dpdk: Fix variables naming in set_admin_state function.
Function 'netdev_dpdk_set_admin_state()' was missed while fixing
variables naming according to the following convention:

    'struct netdev':'netdev'
    'struct netdev_dpdk':'dev'
    'struct netdev_rxq':'rxq'
    'struct netdev_rxq_dpdk':'rx'
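
The convention above can be illustrated with a minimal sketch. The
struct definitions below are reduced stand-ins (the real OVS structs
have many more members), the cast helpers are simplified
re-implementations, and 'example_rxq_port_id()' is a hypothetical
function used only to show the variable names in context:

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in types, assumed for illustration only. */
struct netdev { const char *name; };
struct netdev_dpdk { struct netdev up; int port_id; };
struct netdev_rxq { struct netdev *netdev; int queue_id; };
struct netdev_rxq_dpdk { struct netdev_rxq up; int port_id; };

/* Recover the containing 'struct netdev_dpdk' from its embedded base. */
static struct netdev_dpdk *
netdev_dpdk_cast(struct netdev *netdev)
{
    assert(netdev != NULL);
    return (struct netdev_dpdk *)
        ((char *) netdev - offsetof(struct netdev_dpdk, up));
}

/* Recover the containing 'struct netdev_rxq_dpdk' from its embedded base. */
static struct netdev_rxq_dpdk *
netdev_rxq_dpdk_cast(struct netdev_rxq *rxq)
{
    assert(rxq != NULL);
    return (struct netdev_rxq_dpdk *)
        ((char *) rxq - offsetof(struct netdev_rxq_dpdk, up));
}

/* Hypothetical function: every variable is named after its type
 * according to the convention ('netdev', 'dev', 'rxq', 'rx').
 * Returns the port id, or -1 if the queue and device disagree. */
static int
example_rxq_port_id(struct netdev_rxq *rxq)
{
    struct netdev_rxq_dpdk *rx = netdev_rxq_dpdk_cast(rxq);
    struct netdev *netdev = rxq->netdev;
    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);

    return rx->port_id == dev->port_id ? dev->port_id : -1;
}
```

With the names fixed per type, a reader can tell from 'dev' versus
'netdev' alone whether a pointer refers to the DPDK-specific struct or
to the generic one.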

Fixes: d46285a2206f ("netdev-dpdk: Consistent variable naming.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-08 21:42:54 +00:00
Ilya Maximets
af5b0dad30 netdev-dpdk: Fix mempool creation with large MTU.
Currently the mempool name size is limited to 25 characters by
RTE_MEMPOOL_NAMESIZE. netdev-dpdk tries to create a mempool with the
following name pattern: "ovs_%{hash}_%{socket}_%{mtu}_%{n_mbuf}".

We have 3 chars for "ovs" + 4 chars for delimiters + 8 chars for
hash (because it's the 32 bit integer printed in hex) + 1 char for
socket_id (mostly 1, but it could be 2 on some systems; larger?) = 16.

Only 25 - 16 = 9 characters remain for mtu + n_mbufs.
Minimum usual value for mtu is 1500 --> 2030 (4 chars) after
dpdk_buf_size conversion and the minimum value for n_mbufs is 16384
(5 chars). So, all the 9 characters are used.

If we try to create a port with mtu = 9500, mempool creation will
fail, because FRAME_LEN_TO_MTU(dpdk_buf_size(9500)) = 10222 (5 chars)
and this value will overflow the RTE_MEMPOOL_NAMESIZE limit.
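
The length arithmetic above can be checked with a small sketch. The
conversion specifiers are an assumption based on the name pattern
quoted in this message, the 25-character limit is taken from the text
rather than from DPDK headers, and 'old_mp_name_len()' and
'old_mp_name_fits()' are hypothetical helpers:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed usable name length, as stated in the commit message. */
#define OLD_MP_NAME_CHARS 25

/* Number of characters the delimiter-separated pattern would need.
 * snprintf() with a NULL buffer and zero size only computes the length. */
static int
old_mp_name_len(uint32_t hash, int socket_id, int mtu, unsigned int n_mbufs)
{
    int len = snprintf(NULL, 0, "ovs_%" PRIx32 "_%d_%d_%u",
                       hash, socket_id, mtu, n_mbufs);

    assert(len >= 0);
    return len;
}

/* Nonzero if the name fits into the assumed limit. */
static int
old_mp_name_fits(uint32_t hash, int socket_id, int mtu, unsigned int n_mbufs)
{
    return old_mp_name_len(hash, socket_id, mtu, n_mbufs)
           <= OLD_MP_NAME_CHARS;
}
```

For an 8-digit hash, a one-digit socket id and 16384 mbufs, a 4-digit
mtu such as 2030 uses exactly 25 characters, while the 5-digit 10222
needs 26 and no longer fits.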

The same issue will happen if we try to create a port with a big enough
number of queues or try to create a big enough number of PMD
threads (the number of tx queues will enlarge the mempool requirements).

Fix that by removing the delimiters. To keep the mempool names at
least partially readable, exact field sizes with zero padding are
used.

Following limits should be suitable for now:
 - Hash length: 8 chars (uint32_t in hex)
 - Socket ID  : 2 chars (For systems with up to 10 sockets)
 - MTU        : 5 chars (MTU (10^5 - 1) should be enough for now)
 - n_mbufs    : 7 chars (Up to 10^7 of mbufs)

   Total      : 22 + 3 (for "ovs") = 25
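
A minimal sketch of such a fixed-width name, assuming the field widths
listed above. 'format_mp_name()' is a hypothetical helper and not the
function from netdev-dpdk; the buffer size assumes 25 usable characters
plus a terminating NUL:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed limits: 25 usable characters plus the terminating NUL. */
#define MP_NAME_CHARS 25
#define MP_NAME_BUFSZ (MP_NAME_CHARS + 1)

/* Formats "ovs" + 8-char hex hash + 2-char socket id + 5-char mtu +
 * 7-char n_mbufs, all zero padded and without delimiters.
 * Returns the name length, or -1 if the name does not fit into 'buf'
 * (for example, because a value is wider than its field). */
static int
format_mp_name(char *buf, size_t size, uint32_t hash, int socket_id,
               int mtu, unsigned int n_mbufs)
{
    int len;

    assert(buf != NULL);
    len = snprintf(buf, size, "ovs%08" PRIx32 "%02d%05d%07u",
                   hash, socket_id, mtu, n_mbufs);
    return (len < 0 || (size_t) len >= size) ? -1 : len;
}
```

With hash 0xdeadbeef, socket 1, mtu 10222 and 16384 mbufs this yields
"ovsdeadbeef01102220016384", which is exactly 25 characters. A value
wider than its field (e.g. 10^7 mbufs) makes the name too long, and the
sketch reports that instead of silently truncating.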

CC: Antonio Fischetti <antonio.fischetti@intel.com>
CC: Robert Wojciechowicz <robertx.wojciechowicz@intel.com>
Fixes: f06546a51dd8 ("Fix mempool names to reflect socket id.")
Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each port.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Antonio Fischetti <antonio.fischetti@intel.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Tested-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-11-17 16:26:33 +00:00