2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-22 18:07:40 +00:00

326 Commits

Author SHA1 Message Date
Daniele Di Proietto
819f13bd39 netdev-dpdk: Acquire dev->stats_lock only once.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Tested-by: Aaron Conole <aconole@redhat.com>
2016-10-12 15:30:04 -07:00
Daniele Di Proietto
78bd47cf44 netdev-dpdk: Use RCU for egress QoS.
I think it's clearer to use RCU than to check for a pointer twice in the
fast path (before and after taking the spinlock). Now the spinlock is
integrated into 'qos_conf'.

'qos_conf' objects cannot be modified, so, instead of having
'qos_set()', we now have 'qos_is_equal()', which tells us if an object
must be destroyed and recreated.

With this patch we also avoid passing the netdev parameter to qos ops,
since it was unused most of the times.

Lastly, some duplication is removed.

CC: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Tested-by: Aaron Conole <aconole@redhat.com>
2016-10-12 15:25:45 -07:00
Daniele Di Proietto
2ae3d5421b netdev-dpdk: Refactor dpdk_mp_get().
The error handling path in dpdk_mp_get() is getting complicated, it
even requires a boolean variable.

Simplify it by extracting the function dpdk_mp_create().

CC: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Tested-by: Aaron Conole <aconole@redhat.com>
2016-10-12 15:16:34 -07:00
Ilya Maximets
b614c894ee netdev-dpdk: Configure flow control only when necessary.
It is not necessary to touch the physical device each time, if the
configuration has not been changed. Also, few style issues fixed.

Thread-safety annotation added to 'dpdk_set_rxq_config()'. It was
missed while previous refactoring of the flow control configuration.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Tested-by: Sugesh Chandran <sugesh.chandran@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-09-30 10:59:00 -07:00
Ciara Loftus
b685696b8c netdev-dpdk: Allow configurable queue sizes for 'dpdk' ports
The 'options:n_rxq_desc' and 'n_txq_desc' fields allow the number of rx
and tx descriptors for dpdk ports to be modified. By default the values
are set to 2048, but can be modified to an integer between 1 and 4096
that is a power of two. The values can be modified at runtime, however
require the NIC to restart when changed.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-09-30 10:58:39 -07:00
Mark Kavanagh
58be5c0eec netdev-dpdk: Fix coding style
Coding style violations of the following conventions are present in netdev-dpdk.c:
    - limit lines to 79 characters
    - put a space after (but not before) the "sizeof" keyword
    - put a space between the () used in a cast and the
      expression whose type is cast: (void *) 0.

Resolve occurrences of each, and any other minor style infractions.

Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-09-29 11:41:33 -07:00
Mark Kavanagh
2391135ca7 netdev-dpdk: consistent naming for mbuf variables
Pointers to struct rte_mbuf are typically denoted within functions as
'pkt'; similarly, arrays of, and pointer-to-pointer to, struct rte_mbuf
are denoted by 'pkts'.

Update discrepancies to the above convention for consistency.

Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-09-29 11:41:07 -07:00
Ilya Maximets
c2adb102e2 netdev-dpdk: Introduce dpdk_mp_mutex.
'dpdk_mutex' protects two independent things: list of dpdk devices
and list of memory pools. Let's spit it in two to avoid global blocking
inside 'netdev_dpdk.*_reconfigure()' as possible.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-09-29 11:04:11 -07:00
Ilya Maximets
4196454379 netdev-dpdk: More correct log message on vhost_driver_unregister failure.
Current error message incorrect for the client mode.

Fixes: c1ff66ac80b5 ("netdev-dpdk: vHost client mode and reconnect")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-09-23 13:27:21 -07:00
Ilya Maximets
6881885a3b netdev-dpdk: Add missed lock in set_config for vhost client mode.
'vhost_driver_flags' and 'vhost_id' are mutable and must be protected
by 'dev->mutex'.

Fixes: 2d24d165d6a5 ("netdev-dpdk: Add new 'dpdkvhostuserclient' port type")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-09-23 13:23:14 -07:00
Ilya Maximets
5f88de0d9f netdev-dpdk: Fix memory leak in dpdk_mp_{get, put}().
'dmp' should be freed on failure and on put.

Fixes: 8a9562d21a40 ("dpif-netdev: Add DPDK netdev.")
Fixes: 8d38823bdf8b ("netdev-dpdk: fix memory leak")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-09-19 14:32:52 -07:00
Ciara Loftus
2d24d165d6 netdev-dpdk: Add new 'dpdkvhostuserclient' port type
The 'dpdkvhostuser' port type no longer supports both server and client
mode. Instead, 'dpdkvhostuser' ports are always 'server' mode and
'dpdkvhostuserclient' ports are always 'client' mode.

Suggested-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-09-19 14:02:06 -07:00
Ciara Loftus
5b9bf9e067 netdev-dpdk: Fix occurance of error log
If NUMA information can't be derived from a vHost User device, only
print an error if the VHOST_NUMA option is enabled in DPDK. Otherwise
'fail' silently.

Fixes: 0a0f39df1d5a ("netdev-dpdk: Add support for DPDK 16.07")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Reported-by: Ian Stokes <ian.stokes@intel.com>
Tested-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-08-18 17:04:27 -07:00
Ilya Maximets
6b094bf47b netdev-dpdk: Simplify send function for ETH devices.
'netdev_dpdk_send__()' function can be greatly simplified by using
recently introduced 'netdev_dpdk_filter_packet_len()'.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-08-18 13:15:52 -07:00
Ilya Maximets
c6ec9d176d netdev-dpdk: Fix vHost stats.
This patch introduces function 'netdev_dpdk_filter_packet_len()' which is
intended to find and remove all packets with 'pkt_len > max_packet_len'
from the Tx batch.

It fixes inaccurate counting of 'tx_bytes' in vHost case if there was
dropped packets and allows to simplify send function.

Fixes: 0072e931b207 ("netdev-dpdk: add support for jumbo frames")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-08-18 13:15:52 -07:00
Ciara Loftus
c1ff66ac80 netdev-dpdk: vHost client mode and reconnect
Until now, vHost ports in OVS have only been able to operate in 'server'
mode whereby OVS creates and manages the vHost socket and essentially
acts as the vHost 'server'. With this commit a new mode, 'client' mode,
is available. In this mode, OVS acts as the vHost 'client' and connects
to the socket created and managed by QEMU which now acts as the vHost
'server'. This mode allows for reconnect capability, which allows a
vHost port to resume normal connectivity in event of switch reset.

By default dpdkvhostuser ports still operate in 'server' mode. That is
unless a valid 'vhost-server-path' is specified for a device like so:

ovs-vsctl set Interface dpdkvhostuser0
options:vhost-server-path=/path/to/socket

'vhost-server-path' represents the full path of the vhost user socket
that has been or will be created by QEMU. Once specified, the port stays
in 'client' mode for the remainder of its lifetime.

QEMU v2.7.0+ is required when using OVS in vHost client mode and QEMU in
vHost server mode.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-08-15 17:29:12 -07:00
Ciara Loftus
53f50d2400 netdev-dpdk: Consistent naming for vhost
A mix of vhost_user_ and vhost_ is used when naming vhost functions. The
'user_' has been dropped for consistency. Also remove empty init
functions for netdev dpdk classes.

Suggested-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Daniele Di Proietto <diproiettod at vmware.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
2016-08-15 17:29:12 -07:00
Ciara Loftus
4198764443 netdev-dpdk: Remove dpdkvhostcuse ports
This commit removes the 'dpdkvhostcuse' port type from the userspace
datapath. vhost-cuse ports are quickly becoming obsolete as the
vhost-user port type begins to support a greater feature-set thanks to
the addition of things like vhost-user multiqueue and potential
upcoming features like vhost-user client-mode and vhost-user reconnect.
The feature is also expected to be removed from DPDK soon.

One potential drawback of the removal of this support is that a
userspace vHost port type is not available in OVS for use with older
versions of QEMU (pre v2.2). Considering v2.2 is nearly two years old
this should however be a low impact change.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
2016-08-15 17:29:12 -07:00
Ciara Loftus
c3d062a777 netdev-dpdk: Do not attempt to initialise flow control for 'dpdkr' ports
Only 'dpdk' ports support flow control. This patch stops 'dpdkr' ports
from attempting to initialise this feature as this port type does not
support it.

Fixes: 9fd39370c12c ("netdev-dpdk: Add Flow Control support.")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-08-15 13:10:18 -07:00
Ciara Loftus
7cd1261d13 netdev-dpdk: Use rte_eth_is_valid_port instead of manual check
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Mauricio Vásquez B <mauricio.vasquez@polito.it>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-08-15 13:10:18 -07:00
Mark Kavanagh
0072e931b2 netdev-dpdk: add support for jumbo frames
Add support for Jumbo Frames to DPDK-enabled port types,
using single-segment-mbufs.

Using this approach, the amount of memory allocated to each mbuf
to store frame data is increased to a value greater than 1518B
(typical Ethernet maximum frame length). The increased space
available in the mbuf means that an entire Jumbo Frame of a specific
size can be carried in a single mbuf, as opposed to partitioning
it across multiple mbuf segments.

The amount of space allocated to each mbuf to hold frame data is
defined dynamically by the user with ovs-vsctl, via the 'mtu_request'
parameter.

Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
[diproiettod@vmware.com rebased]
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-08-12 19:33:56 -07:00
Daniele Di Proietto
56abcf497b vswitchd: Introduce 'mtu_request' column in Interface.
The 'mtu_request' column can be used to set the MTU of a specific
interface.

This column is useful because it will allow changing the MTU of DPDK
devices (implemented in a future commit), which are not accessible
outside the ovs-vswitchd process, but it can be used for kernel
interfaces as well.

The current implementation of set_mtu() in netdev-dpdk is removed
because it's broken.  It will be reintroduced by a subsequent commit on
this series.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
2016-08-12 19:32:12 -07:00
Ciara Loftus
4b88d6787d netdev-dpdk: add DPDK pdump capability
This commit provides the ability to 'listen' on DPDK ports and save
packets to a pcap file with a DPDK app that uses the librte_pdump
library. One such app is the 'pdump' app that can be found in the DPDK
'app' directory. Instructions on how to use this can be found in
INSTALL.DPDK-ADVANCED.md

Pdump capability in OVS with DPDK will only be initialised if the
CONFIG_RTE_LIBRTE_PMD_PCAP=y and CONFIG_RTE_LIBRTE_PDUMP=y options are
set in DPDK. libpcap is required if the above configuration is used.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-08-12 17:56:43 -07:00
Ilya Maximets
dd52de45b7 netdev-dpdk: vhost: Fix double free and use after free with QoS.
While using QoS with vHost interfaces 'netdev_dpdk_qos_run__()' will
free mbufs while executing 'netdev_dpdk_policer_run()'. After
that same mbufs will be freed at the end of '__netdev_dpdk_vhost_send()'
if 'may_steal == true'. This behaviour will break mempool.

Also 'netdev_dpdk_qos_run__()' will free packets even if we shouldn't
do this ('may_steal == false'). This will lead to using of already freed
packets by the upper layers.

Fix that by copying all packets that we can't steal like it done
for DPDK_DEV_ETH devices and freeing only packets not freed by QoS.

Fixes: 0bf765f753fd ("netdev_dpdk.c: Add QoS functionality.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-08-10 18:42:57 -07:00
Ilya Maximets
7f5f2bd0ce netdev-dpdk: Avoid reconfiguration on reconnection of same vhost device.
Binding/unbinding of virtio driver inside VM leads to reconfiguration
of PMD threads. This behaviour may be abused by executing bind/unbind
in an infinite loop to break normal networking on all ports attached
to the same instance of Open vSwitch.

Fix that by avoiding reconfiguration if it's not necessary.
Number of queues will not be decreased to 1 on device disconnection but
it's not very important in comparison with possible DOS attack from the
inside of guest OS.

Fixes: 81acebdaaf27 ("netdev-dpdk: Obtain number of queues for vhost
                      ports from attached virtio.")
Reported-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-08-09 17:35:50 -07:00
Ian Stokes
7ea266e953 netdev-dpdk: Fix egress policer error detection bug.
When egress policer is set as a QoS type for a port, an error may occur during
setup if incorrect parameters are used for the rte_meter. If this occurs
the egress policer construct and set functions should free any allocated
memory relevant to the policer and set the QoS configuration pointer to
null. The netdev_dpdk_set_qos function should check the error value returned
for any QoS construct/set calls with an assertion to avoid segfault.
Also this commit modifies egress_policer_qos_set() to correctly lock the QoS
spinlock while the egress policer configuration is updated to avoid
segfault.

Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-08-09 15:01:10 -07:00
Bhanuprakash Bodireddy
27373420c8 netdev-dpdk: Fix dead initialization reported by clang.
Clang reports that value stored to 'tok' during initialization is never
read.

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-08-09 14:54:20 -07:00
Daniele Di Proietto
3f891bbea6 netdev-dpdk: Fix deadlock in destroy_device().
netdev_dpdk_vhost_destruct() calls rte_vhost_driver_unregister(), which
can trigger the destroy_device() callback.  destroy_device() will try to
take two mutexes already held by netdev_dpdk_vhost_destruct(), causing a
deadlock.

This problem can be solved by dropping the mutexes before calling
rte_vhost_driver_unregister().  The netdev_dpdk_vhost_destruct() and
construct() call are already serialized by netdev_mutex.

This commit also makes clear that dev->vhost_id is constant and can be
accessed without taking any mutexes in the lifetime of the devices.

Fixes: 8d38823bdf8b("netdev-dpdk: fix memory leak")
Reported-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
2016-08-09 11:15:29 -07:00
Ben Pfaff
13c1637f5b smap: New function smap_get_ullong().
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
2016-08-08 11:00:37 -07:00
Maxime Coquelin
d03603c485 netdev-dpdk: When no QoS set, set type to empty string
This patch sets *typep to an empty string instead of letting
it uninitialized when no QoS configuration is set.

It fixes the following vswitchd crash when no QoS has been set
on vhost-user interface:

 $> ovs-appctl -t ovs-vswitchd qos/show vhost-user1

 #0  0x00007efcbadf18d7 in raise () from /lib64/libc.so.6
 #1  0x00007efcbadf353a in abort () from /lib64/libc.so.6
 #2  0x000000000068d5be in ovs_abort_valist at lib/util.c:335
 #3  0x0000000000693d90 in vlog_abort_valist at lib/vlog.c:1204
 #4  0x0000000000693e17 in vlog_abort at lib/vlog.c:1218
 #5  0x000000000068d3ae in ovs_assert_failure at lib/util.c:72
 #6  0x000000000060425c in ds_put_format_valist at lib/dynamic-string.c:168
 #7  0x00000000006042e7 in ds_put_format at lib/dynamic-string.c:142
 #8  0x00000000005a9e75 in qos_unixctl_show at vswitchd/bridge.c:3185
 #9  0x000000000068cda1 in process_command at lib/unixctl.c:347
 #11 unixctl_server_run at lib/unixctl.c:400
 #12 0x000000000040a3ff in main at vswitchd/ovs-vswitchd.c:113

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-08-04 17:35:04 -07:00
Mark Kavanagh
8d38823bdf netdev-dpdk: fix memory leak
DPDK v16.07 introduces the ability to free memzones.
Up until this point, DPDK memory pools created in OVS could
not be destroyed, thus incurring a memory leak.

Leverage the DPDK v16.07 rte_mempool API to free DPDK
mempools when their associated reference count reaches 0 (this
indicates that the memory pool is no longer in use).

Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-08-04 15:45:48 -07:00
Ciara Loftus
0a0f39df1d netdev-dpdk: Add support for DPDK 16.07
This commit introduces support for DPDK 16.07 and consequently breaks
compatibility with DPDK 16.04.

DPDK 16.07 introduces some changes to various APIs. These have been
updated in OVS, including:
* xstats API: changes to structure of xstats
* vhost API:  replace virtio-net references with 'vid'

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Robert Wojciechowicz <robertx.wojciechowicz@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-08-03 18:09:37 -07:00
Sugesh Chandran
9fd39370c1 netdev-dpdk: Add Flow Control support.
Add support for flow-control(mac control frame) to DPDK enabled physical
port types. By default, the flow-control is OFF on both rx and tx side.
The flow control can be enabled/disabled either when adding a port to OVS
or at run time.

For eg:
To enable flow control support at tx side while adding a port, add the
'tx-flow-ctrl' option to the 'ovs-vsctl add-port' command-line as below.

 'ovs-vsctl add-port br0 dpdk0 -- \
  set Interface dpdk0 type=dpdk options:tx-flow-ctrl=true'

Similarly to enable rx flow control,
 'ovs-vsctl add-port br0 dpdk0 -- \
  set Interface dpdk0 type=dpdk options:rx-flow-ctrl=true'

And to enable the flow control auto-negotiation,
 'ovs-vsctl add-port br0 dpdk0 -- \
  set Interface dpdk0 type=dpdk options:flow-ctrl-autoneg=true'

To turn ON the tx flow control at run time(After the port is being added
to OVS), the command-line input will be,
 'ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=true'

The flow control parameters can be turned off by setting 'false' to the
respective parameter. To dsiable the flow control at tx side,
 'ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false'

Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
Acked-by: Bhanuprakash Bodireddy <Bhanuprakash.bodireddy@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-07-29 17:56:32 -07:00
Ilya Maximets
324c837485 dpif-netdev: XPS (Transmit Packet Steering) implementation.
If CPU number in pmd-cpu-mask is not divisible by the number of queues and
in a few more complex situations there may be unfair distribution of TX
queue-ids between PMD threads.

For example, if we have 2 ports with 4 queues and 6 CPUs in pmd-cpu-mask
such distribution is possible:
<------------------------------------------------------------------------>
pmd thread numa_id 0 core_id 13:
        port: vhost-user1       queue-id: 1
        port: dpdk0     queue-id: 3
pmd thread numa_id 0 core_id 14:
        port: vhost-user1       queue-id: 2
pmd thread numa_id 0 core_id 16:
        port: dpdk0     queue-id: 0
pmd thread numa_id 0 core_id 17:
        port: dpdk0     queue-id: 1
pmd thread numa_id 0 core_id 12:
        port: vhost-user1       queue-id: 0
        port: dpdk0     queue-id: 2
pmd thread numa_id 0 core_id 15:
        port: vhost-user1       queue-id: 3
<------------------------------------------------------------------------>

As we can see above dpdk0 port polled by threads on cores:
	12, 13, 16 and 17.

By design of dpif-netdev, there is only one TX queue-id assigned to each
pmd thread. This queue-id's are sequential similar to core-id's. And
thread will send packets to queue with exact this queue-id regardless
of port.

In previous example:

	pmd thread on core 12 will send packets to tx queue 0
	pmd thread on core 13 will send packets to tx queue 1
	...
	pmd thread on core 17 will send packets to tx queue 5

So, for dpdk0 port after truncating in netdev-dpdk:

	core 12 --> TX queue-id 0 % 4 == 0
	core 13 --> TX queue-id 1 % 4 == 1
	core 16 --> TX queue-id 4 % 4 == 0
	core 17 --> TX queue-id 5 % 4 == 1

As a result only 2 of 4 queues used.

To fix this issue some kind of XPS implemented in following way:

	* TX queue-ids are allocated dynamically.
	* When PMD thread first time tries to send packets to new port
	  it allocates less used TX queue for this port.
	* PMD threads periodically performes revalidation of
	  allocated TX queue-ids. If queue wasn't used in last
	  XPS_TIMEOUT_MS milliseconds it will be freed while revalidation.
        * XPS is not working if we have enough TX queues.

Reported-by: Zhihong Wang <zhihong.wang@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-07-27 12:56:04 -07:00
xubinbin
b379037079 netdev-dpdk: remove duplicated code in netdev_dpdk_get_status
Put "driver_name" into "args" twice, that's meaninglessness.
So need to remove duplicated code.

Signed-off-by: Binbin Xu <xu.binbin1@zte.com.cn>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-07-25 12:58:27 -07:00
William Tu
7d6d1a40dc netdev-dpdk: Apply batch truncation API.
Instead of looping into each packet and check whether to truncate, the
patch moves it out of the loop and uses batch API.  If truncation is
not set, checking 'trunc' in 'struct dp_packet_batch' at per-batch basis
can skip the per-packet checking overhead.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-07-25 12:47:20 -07:00
Terry Wilson
ee89ea7b47 json: Move from lib to include/openvswitch.
To easily allow both in- and out-of-tree building of the Python
wrapper for the OVS JSON parser (e.g. w/ pip), move json.h to
include/openvswitch. This also requires moving lib/{hmap,shash}.h.

Both hmap.h and shash.h were #include-ing "util.h" even though the
headers themselves did not use anything from there, but rather from
include/openvswitch/util.h. Fixing that required including util.h
in several C files mostly due to OVS_NOT_REACHED and things like
xmalloc.

Signed-off-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-07-22 17:09:17 -07:00
William Tu
64839cf432 netdev-provider: Apply batch object to netdev provider.
Commit 1895cc8dbb64 ("dpif-netdev: create batch object") introduces
batch process functions and 'struct dp_packet_batch' to associate with
batch-level metadata.  This patch applies the packet batch object to
the netdev provider interface (dummy, Linux, BSD, and DPDK) so that
batch APIs can be used in providers.  With batch metadata visible in
providers, optimizations can be introduced at per-batch level instead
of per-packet.

Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/145694197
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-07-21 16:46:32 -07:00
Ilya Maximets
81acebdaaf netdev-dpdk: Obtain number of queues for vhost ports from attached virtio.
Currently, there are few inconsistencies in ways to configure number of
queues for netdev device:

	* dpif-netdev can't know about exact number of queues
	  allocated inside netdev.
	  This leads to constant mapping of queue-ids to 'real' ones.

	* We are able to configure 'n_rxq' for vhost-user devices, but
	  there is only one sane number of rx queues which must be used
	  and configured manually (number of queues that allocated
	  in QEMU).

This patch disables configuration of 'n_rxq' for DPDK vHost devices.
Configuration of rx and tx queues now automatically applied from
connected virtio device. Standard reconfiguration mechanism was used to
apply this changes.

Also, now 'n_txq' and 'n_rxq' are always the real numbers of queues
in the device.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-07-08 15:27:21 -07:00
Ilya Maximets
b59cc14e03 netdev-dpdk: Use instant sending instead of queueing of packets.
Current implementarion of TX packet's queueing is broken in several ways:

	* TX queue flushing implemented on receive assumes that all
	  core_id-s are sequential and starts from zero. This may lead
	  to situation when packets will stuck in queue forever and,
	  also, this influences on latency.

	* For a long time flushing logic depends on uninitialized
	  'txq_needs_locking', because it usually calculated after
	  'netdev_dpdk_alloc_txq' but used inside of this function
	  for initialization of 'flush_tx'.

Testing shows no performance difference with and without queueing.
Lets remove queueing at all because it doesn't work properly now and
also does not increase performance.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-07-06 15:46:35 -07:00
Ilya Maximets
1f5b157ece netdev-dpdk: Fix using uninitialized link_status.
'rte_eth_link_get_nowait()' works only with physical ports.
In case of vhost-user port, 'link' will stay uninitialized and there
will be random messages in log about link status.

Ex.:
|dpdk(dpdk_watchdog2)|DBG|Port -1 Link Up - speed 10000 Mbps - full-duplex

Fix that by calling 'check_link_status()' only for physical ports.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-06-24 14:43:29 -07:00
William Tu
aaca4fe0ce ofp-actions: Add truncate action.
The patch adds a new action to support packet truncation.  The new action
is formatted as 'output(port=n,max_len=m)', as output to port n, with
packet size being MIN(original_size, m).

One use case is to enable port mirroring to send smaller packets to the
destination port so that only useful packet information is mirrored/copied,
saving some performance overhead of copying entire packet payload.  Example
use case is below as well as shown in the testcases:

    - Output to port 1 with max_len 100 bytes.
    - The output packet size on port 1 will be MIN(original_packet_size, 100).
    # ovs-ofctl add-flow br0 'actions=output(port=1,max_len=100)'

    - The scope of max_len is limited to output action itself.  The following
      packet size of output:1 and output:2 will be intact.
    # ovs-ofctl add-flow br0 \
            'actions=output(port=1,max_len=100),output:1,output:2'
    - The Datapath actions shows:
    # Datapath actions: trunc(100),1,1,2

Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/140037134
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
2016-06-24 09:17:00 -07:00
Ciara Loftus
db8f13b020 netdev-dpdk: NUMA Aware vHost User
This commit allows for vHost User memory from QEMU, DPDK and OVS, as
well as the servicing PMD, to all come from the same socket.

The socket id of a vhost-user port used to be set to that of the master
lcore. Now it is possible to update the socket id if it is detected
(during VM boot) that the vhost device memory is not on this node. If
this is the case, a new mempool is created from the new node, and the
PMD thread currently servicing the port will no longer, in favour of a
thread from the new node (if enabled in the pmd-cpu-mask).

To avail of this functionality, one must enable the
CONFIG_RTE_LIBRTE_VHOST_NUMA DPDK configuration option.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-06-17 17:11:55 -07:00
Kevin Traynor
31871ee383 netdev-dpdk: Remove vhost send retries when no packets have been sent.
If the guest is connected but not servicing the virt queue, this leads
to vhost send retries until timeout. This is fine in isolation but if
there are other high rate queues also being serviced by the same PMD
it can lead to a performance hit on those queues. Change to only retry
when at least some packets have been successfully sent on the previous
attempt.

Also, limit retries to avoid a similar delays if packets are being sent
at a very low rate due to few available descriptors.

Reported-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-06-14 18:45:03 -07:00
Daniele Di Proietto
6930c7e01c ovs-numa: Introduce function to set current thread affinity.
This commit moves the code that sets the pmd threads affinity from
netdev-dpdk to ovs-numa.  There's one small part left in netdev-dpdk, to
set the lcore_id.

Now dpif-netdev will call both modules (ovs-numa and netdev-dpdk) when
starting a pmd thread.

This change will allow having a dummy implementation of the set affinity
call, for testing purposes.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
2016-06-07 11:15:01 -07:00
Zoltán Balogh
e543851d21 netdev-dpdk: vhost-user port link state fix
OVS reports that link state of a vhost-user port (type=dpdkvhostuser) is
DOWN, even when traffic is running through the port between a Virtual
Machine and the vSwitch. Changing admin state with the
"ovs-ofctl mod-port <BR> <PORT> up/down" command over OpenFlow does
affect neither the reported link state nor the traffic.

The patch below does the flowing:
 - Triggers link state change by altering netdev's change_seq member.
 - Controls sending/receiving of packets through vhost-user port
   according to the port's current admin state.
 - Sets admin state of newly created vhost-user port to UP.

Signed-off-by: Zoltán Balogh <zoltan.balogh@ericsson.com>
Co-authored-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-06-02 14:26:02 -07:00
Ian Stokes
9509913aa7 netdev-dpdk.c: Add ingress-policing functionality.
This patch provides the modifications required in netdev-dpdk.c and
vswitch.xml to enable ingress policing for DPDK interfaces.

This patch implements the necessary netdev functions to netdev-dpdk.c as
well as various helper functions required for ingress policing.

The vswitch.xml has been modified to explain the expected parameters and
behaviour when using ingress policing.

The INSTALL.DPDK.md guide has been modified to provide an example
configuration of ingress policing.

Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-05-24 13:37:25 -07:00
Ian Stokes
f3926f297b netdev-dpdk.c: Add generic policer functions.
Add generic policer functions to avoid code duplication.

Policing can be implemented on both egress and ingress paths.
Currently the QoS egress-policer implementation uses it's own specific run
and packet handle policer functions. This patch makes the policer functions
generic so that they can be used regardless of whether the policer is egress
or ingress by just requiring a pointer to the rte_meter used for policing
to be passed.

Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-05-24 13:34:23 -07:00
Daniele Di Proietto
050c60bfb5 netdev-dpdk: Use ->reconfigure() call to change rx/tx queues.
This introduces in dpif-netdev and netdev-dpdk the first use for the
newly introduce reconfigure netdev call.

When a request to change the number of queues comes, netdev-dpdk will
remember this and notify the upper layer via
netdev_request_reconfigure().

The datapath, instead of periodically calling netdev_set_multiq(), can
detect this and call reconfigure().

This mechanism can also be used to:
* Automatically match the number of rxq with the one provided by qemu
  via the new_device callback.
* Provide a way to change the MTU of dpdk devices at runtime.
* Move a DPDK vhost device to the proper NUMA socket.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
2016-05-23 10:27:42 -07:00
Daniele Di Proietto
790fb3b745 netdev: Add reconfigure request mechanism.
A netdev provider, especially a PMD provider (like netdev DPDK) might
not be able to change some of its parameters (such as MTU, or number of
queues) without stopping everything and restarting.

This commit introduces a mechanism that allows a netdev provider to
request a restart (netdev_request_reconfigure()).  The upper layer can
be notified via netdev_wait_reconf_required() and
netdev_is_reconf_required().  After closing all the rxqs the upper layer
can finally call netdev_reconfigure(), to make sure that the new
configuration is in place.

This will be used by next commit to reconfigure rx and tx queues in
netdev-dpdk.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
2016-05-23 10:27:42 -07:00