I think it's clearer to use RCU than to check for a pointer twice in the
fast path (before and after taking the spinlock). Now the spinlock is
integrated into 'qos_conf'.
'qos_conf' objects cannot be modified, so, instead of having
'qos_set()', we now have 'qos_is_equal()', which tells us if an object
must be destroyed and recreated.
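As a rough illustration of the new fast path (a standalone model only: the
real code uses OVS' ovsrcu_get() and DPDK's rte_spinlock, and the field
names here are assumptions):

    #include <pthread.h>
    #include <stdatomic.h>

    struct qos_conf {
        pthread_spinlock_t lock;              /* stands in for rte_spinlock_t */
        /* ... immutable policer parameters ... */
    };

    struct dev_model {
        _Atomic(struct qos_conf *) qos_conf;  /* stands in for an OVSRCU pointer */
    };

    static int
    qos_run(struct dev_model *dev, int cnt)
    {
        /* One RCU-style read instead of check/lock/re-check.
         * Assume pthread_spin_init() was called at construct time. */
        struct qos_conf *conf = atomic_load(&dev->qos_conf);

        if (conf) {
            pthread_spin_lock(&conf->lock);
            /* ... run the policer on 'cnt' packets ... */
            pthread_spin_unlock(&conf->lock);
        }
        return cnt;
    }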
With this patch we also avoid passing the netdev parameter to qos ops,
since it was unused most of the time.
Lastly, some duplication is removed.
CC: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Tested-by: Aaron Conole <aconole@redhat.com>
The error handling path in dpdk_mp_get() is getting complicated; it
even requires a boolean variable.
Simplify it by extracting the function dpdk_mp_create().
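A minimal sketch of the intended shape, with simplified fields and
signatures (the real functions also take an MTU/socket and build an
rte_mempool):

    #include <stdlib.h>

    struct dpdk_mp { int socket_id; int mtu; int refcount; };

    /* Owns every allocation step and its error handling; returns NULL on
     * any failure, so the caller needs no extra bookkeeping. */
    static struct dpdk_mp *
    dpdk_mp_create(int socket_id, int mtu)
    {
        struct dpdk_mp *dmp = calloc(1, sizeof *dmp);

        if (!dmp) {
            return NULL;
        }
        dmp->socket_id = socket_id;
        dmp->mtu = mtu;
        dmp->refcount = 1;
        /* ... create the rte_mempool here; on error free 'dmp' and return NULL ... */
        return dmp;
    }

    /* The lookup-or-create wrapper stays trivial. */
    static struct dpdk_mp *
    dpdk_mp_get(int socket_id, int mtu)
    {
        /* ... search the list of existing mempools first (omitted) ... */
        return dpdk_mp_create(socket_id, mtu);
    }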
CC: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Tested-by: Aaron Conole <aconole@redhat.com>
It is not necessary to touch the physical device each time if the
configuration has not changed. Also, a few style issues are fixed.
A thread-safety annotation is added to 'dpdk_set_rxq_config()'; it was
missed during the previous refactoring of the flow control configuration.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Tested-by: Sugesh Chandran <sugesh.chandran@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
The 'options:n_rxq_desc' and 'n_txq_desc' fields allow the number of rx
and tx descriptors for dpdk ports to be modified. By default the values
are set to 2048, but can be modified to an integer between 1 and 4096
that is a power of two. The values can be modified at runtime, although
changing them requires the NIC to restart.
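For example, assuming a DPDK port named dpdk0, both descriptor counts
could be set to 1024 with:
'ovs-vsctl set Interface dpdk0 options:n_rxq_desc=1024 options:n_txq_desc=1024'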
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Coding style violations of the following conventions are present in netdev-dpdk.c:
- limit lines to 79 characters
- put a space after (but not before) the "sizeof" keyword
- put a space between the () used in a cast and the
expression whose type is cast: (void *) 0.
Resolve occurrences of each, and any other minor style infractions.
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Pointers to struct rte_mbuf are typically denoted within functions as
'pkt'; similarly, arrays of, and pointer-to-pointer to, struct rte_mbuf
are denoted by 'pkts'.
Update discrepancies to the above convention for consistency.
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
'dpdk_mutex' protects two independent things: the list of dpdk devices
and the list of memory pools. Split it in two to avoid, as much as
possible, global blocking inside 'netdev_dpdk.*_reconfigure()'.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
The current error message is incorrect for client mode.
Fixes: c1ff66ac80b5 ("netdev-dpdk: vHost client mode and reconnect")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
'vhost_driver_flags' and 'vhost_id' are mutable and must be protected
by 'dev->mutex'.
Fixes: 2d24d165d6a5 ("netdev-dpdk: Add new 'dpdkvhostuserclient' port type")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
'dmp' should be freed on failure and on put.
Fixes: 8a9562d21a40 ("dpif-netdev: Add DPDK netdev.")
Fixes: 8d38823bdf8b ("netdev-dpdk: fix memory leak")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
The 'dpdkvhostuser' port type no longer supports both server and client
mode. Instead, 'dpdkvhostuser' ports are always 'server' mode and
'dpdkvhostuserclient' ports are always 'client' mode.
Suggested-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
If NUMA information can't be derived from a vHost User device, only
print an error if the VHOST_NUMA option is enabled in DPDK. Otherwise
'fail' silently.
Fixes: 0a0f39df1d5a ("netdev-dpdk: Add support for DPDK 16.07")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Reported-by: Ian Stokes <ian.stokes@intel.com>
Tested-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
'netdev_dpdk_send__()' function can be greatly simplified by using
recently introduced 'netdev_dpdk_filter_packet_len()'.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
This patch introduces function 'netdev_dpdk_filter_packet_len()' which is
intended to find and remove all packets with 'pkt_len > max_packet_len'
from the Tx batch.
It fixes inaccurate counting of 'tx_bytes' in the vHost case when
packets are dropped, and allows the send function to be simplified.
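A standalone sketch of the idea (not the actual OVS code, which operates
on struct rte_mbuf pointers): compact the batch in place and report how
many packets were kept, so bytes and drops can be counted accurately:

    static int
    filter_packet_len(int *pkt_len, int cnt, int max_packet_len)
    {
        int kept = 0;

        for (int i = 0; i < cnt; i++) {
            if (pkt_len[i] <= max_packet_len) {
                pkt_len[kept++] = pkt_len[i];   /* keep, preserving order */
            } else {
                /* the real code frees the oversized mbuf here */
            }
        }
        return kept;    /* caller adds (cnt - kept) to the drop counter */
    }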
Fixes: 0072e931b207 ("netdev-dpdk: add support for jumbo frames")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Until now, vHost ports in OVS have only been able to operate in 'server'
mode whereby OVS creates and manages the vHost socket and essentially
acts as the vHost 'server'. With this commit a new mode, 'client' mode,
is available. In this mode, OVS acts as the vHost 'client' and connects
to the socket created and managed by QEMU which now acts as the vHost
'server'. This mode provides reconnect capability, which allows a vHost
port to resume normal connectivity in the event of a switch reset.
By default dpdkvhostuser ports still operate in 'server' mode. That is
unless a valid 'vhost-server-path' is specified for a device like so:
ovs-vsctl set Interface dpdkvhostuser0
options:vhost-server-path=/path/to/socket
'vhost-server-path' represents the full path of the vhost user socket
that has been or will be created by QEMU. Once specified, the port stays
in 'client' mode for the remainder of its lifetime.
QEMU v2.7.0+ is required when using OVS in vHost client mode and QEMU in
vHost server mode.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
A mix of vhost_user_ and vhost_ is used when naming vhost functions. The
'user_' has been dropped for consistency. Also remove empty init
functions for netdev dpdk classes.
Suggested-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
This commit removes the 'dpdkvhostcuse' port type from the userspace
datapath. vhost-cuse ports are quickly becoming obsolete as the
vhost-user port type begins to support a greater feature-set thanks to
the addition of things like vhost-user multiqueue and potential
upcoming features like vhost-user client-mode and vhost-user reconnect.
The feature is also expected to be removed from DPDK soon.
One potential drawback of the removal of this support is that a
userspace vHost port type is not available in OVS for use with older
versions of QEMU (pre v2.2). Considering v2.2 is nearly two years old
this should however be a low impact change.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Only 'dpdk' ports support flow control. This patch stops 'dpdkr' ports
from attempting to initialise this feature as this port type does not
support it.
Fixes: 9fd39370c12c ("netdev-dpdk: Add Flow Control support.")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Add support for Jumbo Frames to DPDK-enabled port types,
using single-segment mbufs.
Using this approach, the amount of memory allocated to each mbuf
to store frame data is increased to a value greater than 1518B
(typical Ethernet maximum frame length). The increased space
available in the mbuf means that an entire Jumbo Frame of a specific
size can be carried in a single mbuf, as opposed to partitioning
it across multiple mbuf segments.
The amount of space allocated to each mbuf to hold frame data is
defined dynamically by the user with ovs-vsctl, via the 'mtu_request'
parameter.
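For example, assuming a DPDK port named dpdk0, a jumbo MTU could be
requested with:
'ovs-vsctl set Interface dpdk0 mtu_request=9000'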
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
[diproiettod@vmware.com rebased]
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
The 'mtu_request' column can be used to set the MTU of a specific
interface.
This column is useful because it will allow changing the MTU of DPDK
devices (implemented in a future commit), which are not accessible
outside the ovs-vswitchd process, but it can be used for kernel
interfaces as well.
The current implementation of set_mtu() in netdev-dpdk is removed
because it's broken. It will be reintroduced by a subsequent commit in
this series.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
This commit provides the ability to 'listen' on DPDK ports and save
packets to a pcap file with a DPDK app that uses the librte_pdump
library. One such app is the 'pdump' app that can be found in the DPDK
'app' directory. Instructions on how to use this can be found in
INSTALL.DPDK-ADVANCED.md
Pdump capability in OVS with DPDK will only be initialised if the
CONFIG_RTE_LIBRTE_PMD_PCAP=y and CONFIG_RTE_LIBRTE_PDUMP=y options are
set in DPDK. libpcap is required if the above configuration is used.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
While using QoS with vHost interfaces, 'netdev_dpdk_qos_run__()' frees
mbufs while executing 'netdev_dpdk_policer_run()'. After that, the same
mbufs are freed again at the end of '__netdev_dpdk_vhost_send()' if
'may_steal == true'. This behaviour breaks the mempool.
Also, 'netdev_dpdk_qos_run__()' frees packets even when it shouldn't
('may_steal == false'). This leads to the upper layers using already
freed packets.
Fix that by copying all packets that we can't steal, as is done for
DPDK_DEV_ETH devices, and by freeing only the packets not freed by QoS.
Fixes: 0bf765f753fd ("netdev_dpdk.c: Add QoS functionality.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Binding/unbinding the virtio driver inside the VM leads to
reconfiguration of PMD threads. This behaviour may be abused by
executing bind/unbind in an infinite loop to break normal networking on
all ports attached to the same instance of Open vSwitch.
Fix that by avoiding reconfiguration when it's not necessary.
The number of queues will no longer be decreased to 1 on device
disconnection, but that is a minor issue compared with a possible DoS
attack from inside the guest OS.
Fixes: 81acebdaaf27 ("netdev-dpdk: Obtain number of queues for vhost ports from attached virtio.")
Reported-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
When an egress policer is set as the QoS type for a port, an error may
occur during setup if incorrect parameters are used for the rte_meter.
If this occurs, the egress policer construct and set functions should
free any allocated memory relevant to the policer and set the QoS
configuration pointer to null. The netdev_dpdk_set_qos function should
check the error value returned by any QoS construct/set calls with an
assertion to avoid a segfault.
This commit also modifies egress_policer_qos_set() to correctly hold the
QoS spinlock while the egress policer configuration is updated, again to
avoid a segfault.
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Clang reports that the value stored to 'tok' during initialization is never
read.
Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
netdev_dpdk_vhost_destruct() calls rte_vhost_driver_unregister(), which
can trigger the destroy_device() callback. destroy_device() will try to
take two mutexes already held by netdev_dpdk_vhost_destruct(), causing a
deadlock.
This problem can be solved by dropping the mutexes before calling
rte_vhost_driver_unregister(). The netdev_dpdk_vhost_destruct() and
construct() calls are already serialized by 'netdev_mutex'.
This commit also makes it clear that dev->vhost_id is constant and can
be accessed without taking any mutexes during the lifetime of the
device.
Fixes: 8d38823bdf8b ("netdev-dpdk: fix memory leak")
Reported-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
This patch sets *typep to an empty string instead of leaving it
uninitialized when no QoS configuration is set.
It fixes the following vswitchd crash when no QoS has been set on a
vhost-user interface:
$> ovs-appctl -t ovs-vswitchd qos/show vhost-user1
#0 0x00007efcbadf18d7 in raise () from /lib64/libc.so.6
#1 0x00007efcbadf353a in abort () from /lib64/libc.so.6
#2 0x000000000068d5be in ovs_abort_valist at lib/util.c:335
#3 0x0000000000693d90 in vlog_abort_valist at lib/vlog.c:1204
#4 0x0000000000693e17 in vlog_abort at lib/vlog.c:1218
#5 0x000000000068d3ae in ovs_assert_failure at lib/util.c:72
#6 0x000000000060425c in ds_put_format_valist at lib/dynamic-string.c:168
#7 0x00000000006042e7 in ds_put_format at lib/dynamic-string.c:142
#8 0x00000000005a9e75 in qos_unixctl_show at vswitchd/bridge.c:3185
#9 0x000000000068cda1 in process_command at lib/unixctl.c:347
#11 unixctl_server_run at lib/unixctl.c:400
#12 0x000000000040a3ff in main at vswitchd/ovs-vswitchd.c:113
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
DPDK v16.07 introduces the ability to free memzones.
Up until this point, DPDK memory pools created in OVS could
not be destroyed, thus incurring a memory leak.
Leverage the DPDK v16.07 rte_mempool API to free DPDK
mempools when their associated reference count reaches 0 (this
indicates that the memory pool is no longer in use).
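A sketch of the release path, assuming a wrapper struct roughly along
these lines (field names are illustrative; rte_mempool_free() is the
DPDK 16.07 API referred to above, so DPDK headers are required):

    #include <stdlib.h>
    #include <rte_mempool.h>

    struct dpdk_mp {
        struct rte_mempool *mp;
        int refcount;
    };

    static void
    dpdk_mp_put(struct dpdk_mp *dmp)
    {
        if (dmp && --dmp->refcount == 0) {
            rte_mempool_free(dmp->mp);   /* returns the memzones to DPDK */
            free(dmp);
        }
    }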
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
This commit introduces support for DPDK 16.07 and consequently breaks
compatibility with DPDK 16.04.
DPDK 16.07 introduces some changes to various APIs. These have been
updated in OVS, including:
* xstats API: changes to structure of xstats
* vhost API: replace virtio-net references with 'vid'
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Robert Wojciechowicz <robertx.wojciechowicz@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Add support for flow control (MAC control frames) to DPDK-enabled
physical port types. By default, flow control is OFF on both the rx and
tx side. Flow control can be enabled/disabled either when adding a port
to OVS or at run time.
For example:
To enable flow control support at tx side while adding a port, add the
'tx-flow-ctrl' option to the 'ovs-vsctl add-port' command-line as below.
'ovs-vsctl add-port br0 dpdk0 -- \
set Interface dpdk0 type=dpdk options:tx-flow-ctrl=true'
Similarly, to enable rx flow control:
'ovs-vsctl add-port br0 dpdk0 -- \
set Interface dpdk0 type=dpdk options:rx-flow-ctrl=true'
And to enable flow control auto-negotiation:
'ovs-vsctl add-port br0 dpdk0 -- \
set Interface dpdk0 type=dpdk options:flow-ctrl-autoneg=true'
To turn ON tx flow control at run time (after the port has been added
to OVS), the command line is:
'ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=true'
The flow control parameters can be turned off by setting the respective
parameter to 'false'. To disable flow control at the tx side:
'ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false'
Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
Acked-by: Bhanuprakash Bodireddy <Bhanuprakash.bodireddy@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
If the number of CPUs in pmd-cpu-mask is not divisible by the number of
queues, and in a few more complex situations, there may be an unfair
distribution of TX queue-ids between PMD threads.
For example, if we have 2 ports with 4 queues and 6 CPUs in pmd-cpu-mask
such distribution is possible:
<------------------------------------------------------------------------>
pmd thread numa_id 0 core_id 13:
port: vhost-user1 queue-id: 1
port: dpdk0 queue-id: 3
pmd thread numa_id 0 core_id 14:
port: vhost-user1 queue-id: 2
pmd thread numa_id 0 core_id 16:
port: dpdk0 queue-id: 0
pmd thread numa_id 0 core_id 17:
port: dpdk0 queue-id: 1
pmd thread numa_id 0 core_id 12:
port: vhost-user1 queue-id: 0
port: dpdk0 queue-id: 2
pmd thread numa_id 0 core_id 15:
port: vhost-user1 queue-id: 3
<------------------------------------------------------------------------>
As we can see above, the dpdk0 port is polled by threads on cores
12, 13, 16 and 17.
By design of dpif-netdev, there is only one TX queue-id assigned to each
pmd thread. These queue-ids are sequential, similar to core-ids, and a
thread will send packets to the queue with exactly this queue-id
regardless of the port.
In the previous example:
pmd thread on core 12 will send packets to tx queue 0
pmd thread on core 13 will send packets to tx queue 1
...
pmd thread on core 17 will send packets to tx queue 5
So, for dpdk0 port after truncating in netdev-dpdk:
core 12 --> TX queue-id 0 % 4 == 0
core 13 --> TX queue-id 1 % 4 == 1
core 16 --> TX queue-id 4 % 4 == 0
core 17 --> TX queue-id 5 % 4 == 1
As a result, only 2 of the 4 queues are used.
To fix this issue, a form of XPS is implemented in the following way
(a simplified standalone sketch follows the list):
 * TX queue-ids are allocated dynamically.
 * When a PMD thread first tries to send packets to a new port,
   it allocates the least used TX queue for this port.
 * PMD threads periodically revalidate the allocated TX queue-ids.
   If a queue wasn't used in the last XPS_TIMEOUT_MS milliseconds,
   it is freed during revalidation.
 * XPS is disabled if there are enough TX queues.
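A standalone sketch of this scheme (only the XPS_TIMEOUT_MS name comes
from the patch; the data structures, timeout value and function names
are illustrative):

    #define MAX_TXQS       64
    #define XPS_TIMEOUT_MS 500LL          /* illustrative value */

    struct port_txqs {
        int n_users[MAX_TXQS];   /* PMD threads currently mapped to each queue */
        int n_txq;               /* real number of TX queues on the port */
    };

    /* First send from a PMD thread to this port: pick the least used queue. */
    static int
    xps_alloc_tx_qid(struct port_txqs *p)
    {
        int best = 0;

        for (int q = 1; q < p->n_txq; q++) {
            if (p->n_users[q] < p->n_users[best]) {
                best = q;
            }
        }
        p->n_users[best]++;
        return best;
    }

    /* Periodic revalidation: a thread releases its queue-id if it hasn't
     * sent anything to the port within XPS_TIMEOUT_MS. */
    static int
    xps_revalidate_tx_qid(struct port_txqs *p, int qid,
                          long long last_used_ms, long long now_ms)
    {
        if (now_ms - last_used_ms > XPS_TIMEOUT_MS) {
            p->n_users[qid]--;
            return -1;                    /* caller must re-allocate next time */
        }
        return qid;
    }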
Reported-by: Zhihong Wang <zhihong.wang@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Put "driver_name" into "args" twice, that's meaninglessness.
So need to remove duplicated code.
Signed-off-by: Binbin Xu <xu.binbin1@zte.com.cn>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Instead of looping over each packet and checking whether to truncate,
this patch moves the check out of the loop and uses the batch API. If
truncation is not set, checking 'trunc' in 'struct dp_packet_batch' on a
per-batch basis skips the per-packet checking overhead.
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
To easily allow both in- and out-of-tree building of the Python
wrapper for the OVS JSON parser (e.g. w/ pip), move json.h to
include/openvswitch. This also requires moving lib/{hmap,shash}.h.
Both hmap.h and shash.h were #include-ing "util.h" even though the
headers themselves did not use anything from there, but rather from
include/openvswitch/util.h. Fixing that required including util.h
in several C files mostly due to OVS_NOT_REACHED and things like
xmalloc.
Signed-off-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Commit 1895cc8dbb64 ("dpif-netdev: create batch object") introduces
batch process functions and 'struct dp_packet_batch' to associate with
batch-level metadata. This patch applies the packet batch object to
the netdev provider interface (dummy, Linux, BSD, and DPDK) so that
batch APIs can be used in providers. With batch metadata visible in
providers, optimizations can be introduced at per-batch level instead
of per-packet.
Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/145694197
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Currently, there are a few inconsistencies in the ways to configure the
number of queues for a netdev device:
 * dpif-netdev can't know the exact number of queues
   allocated inside the netdev.
   This leads to constant mapping of queue-ids to 'real' ones.
 * We are able to configure 'n_rxq' for vhost-user devices, but
   there is only one sane number of rx queues which must be used
   and configured manually (the number of queues allocated in QEMU).
This patch disables configuration of 'n_rxq' for DPDK vHost devices.
The rx and tx queue configuration is now automatically applied from the
connected virtio device, using the standard reconfiguration mechanism.
Also, 'n_txq' and 'n_rxq' now always reflect the real numbers of queues
in the device.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
The current implementation of TX packet queueing is broken in several
ways:
 * TX queue flushing, implemented on receive, assumes that all
   core_ids are sequential and start from zero. This may lead
   to a situation where packets are stuck in a queue forever, and
   it also affects latency.
 * For a long time the flushing logic has depended on an uninitialized
   'txq_needs_locking', because it is usually calculated after
   'netdev_dpdk_alloc_txq' but is used inside that function
   to initialize 'flush_tx'.
Testing shows no performance difference with and without queueing.
Let's remove queueing altogether, because it doesn't work properly now
and also does not increase performance.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
'rte_eth_link_get_nowait()' works only with physical ports.
In the case of a vhost-user port, 'link' stays uninitialized and there
are random messages about the link status in the log.
Ex.:
|dpdk(dpdk_watchdog2)|DBG|Port -1 Link Up - speed 10000 Mbps - full-duplex
Fix that by calling 'check_link_status()' only for physical ports.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
The patch adds a new action to support packet truncation. The new action
is formatted as 'output(port=n,max_len=m)', meaning output to port n
with the packet size being MIN(original_size, m).
One use case is to enable port mirroring to send smaller packets to the
destination port so that only useful packet information is
mirrored/copied, saving the overhead of copying the entire packet
payload. An example use case is below, and more are shown in the
testcases:
- Output to port 1 with max_len 100 bytes.
- The output packet size on port 1 will be MIN(original_packet_size, 100).
# ovs-ofctl add-flow br0 'actions=output(port=1,max_len=100)'
- The scope of max_len is limited to the output action itself. In the
  following, the packet sizes of output:1 and output:2 remain intact.
# ovs-ofctl add-flow br0 \
'actions=output(port=1,max_len=100),output:1,output:2'
- The Datapath actions show:
# Datapath actions: trunc(100),1,1,2
Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/140037134
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
This commit allows for vHost User memory from QEMU, DPDK and OVS, as
well as the servicing PMD, to all come from the same socket.
The socket id of a vhost-user port used to be set to that of the master
lcore. Now it is possible to update the socket id if it is detected
(during VM boot) that the vhost device memory is not on this node. If
this is the case, a new mempool is created on the new node, and the
PMD thread currently servicing the port will no longer do so, in favour
of a thread from the new node (if enabled in the pmd-cpu-mask).
To avail of this functionality, one must enable the
CONFIG_RTE_LIBRTE_VHOST_NUMA DPDK configuration option.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
If the guest is connected but not servicing the virt queue, this leads
to vhost send retries until timeout. This is fine in isolation but if
there are other high rate queues also being serviced by the same PMD
it can lead to a performance hit on those queues. Change to only retry
when at least some packets have been successfully sent on the previous
attempt.
Also, limit retries to avoid similar delays if packets are being sent
at a very low rate due to few available descriptors.
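A standalone sketch of the retry policy described above (the constant
name and cap here are illustrative; the real send call in netdev-dpdk is
rte_vhost_enqueue_burst()):

    #define MAX_VHOST_TX_RETRIES 8                  /* illustrative cap */

    static int
    vhost_send_with_retries(int (*tx_burst)(void *ctx, int cnt), void *ctx,
                            int cnt)
    {
        int retries = 0;

        do {
            int sent = tx_burst(ctx, cnt);          /* e.g. rte_vhost_enqueue_burst() */

            cnt -= sent;
            if (sent == 0 || cnt == 0) {
                break;                              /* no progress, or all sent */
            }
        } while (++retries < MAX_VHOST_TX_RETRIES);

        return cnt;                                 /* leftover packets are dropped */
    }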
Reported-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
This commit moves the code that sets the pmd threads affinity from
netdev-dpdk to ovs-numa. There's one small part left in netdev-dpdk, to
set the lcore_id.
Now dpif-netdev will call both modules (ovs-numa and netdev-dpdk) when
starting a pmd thread.
This change will allow having a dummy implementation of the set affinity
call, for testing purposes.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
OVS reports that the link state of a vhost-user port (type=dpdkvhostuser)
is DOWN, even when traffic is running through the port between a Virtual
Machine and the vSwitch. Changing the admin state with the
"ovs-ofctl mod-port <BR> <PORT> up/down" command over OpenFlow affects
neither the reported link state nor the traffic.
This patch does the following:
- Triggers link state change by altering netdev's change_seq member.
- Controls sending/receiving of packets through vhost-user port
according to the port's current admin state.
- Sets admin state of newly created vhost-user port to UP.
Signed-off-by: Zoltán Balogh <zoltan.balogh@ericsson.com>
Co-authored-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
This patch provides the modifications required in netdev-dpdk.c and
vswitch.xml to enable ingress policing for DPDK interfaces.
This patch implements the necessary netdev functions in netdev-dpdk.c,
as well as various helper functions required for ingress policing.
The vswitch.xml has been modified to explain the expected parameters and
behaviour when using ingress policing.
The INSTALL.DPDK.md guide has been modified to provide an example
configuration of ingress policing.
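For example, assuming a DPDK port named dpdk0, a policer of 10000 kbps
with a 1000 kb burst could be configured with the existing Interface
columns (see vswitch.xml for their exact semantics):
'ovs-vsctl set Interface dpdk0 ingress_policing_rate=10000 ingress_policing_burst=1000'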
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Add generic policer functions to avoid code duplication.
Policing can be implemented on both egress and ingress paths.
Currently the QoS egress-policer implementation uses its own specific
run and packet-handling policer functions. This patch makes the policer
functions generic so that they can be used regardless of whether the
policer is egress or ingress, by just requiring a pointer to the
rte_meter used for policing to be passed.
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
This introduces in dpif-netdev and netdev-dpdk the first use of the
newly introduced reconfigure netdev call.
When a request to change the number of queues comes, netdev-dpdk will
remember this and notify the upper layer via
netdev_request_reconfigure().
The datapath, instead of periodically calling netdev_set_multiq(), can
detect this and call reconfigure().
This mechanism can also be used to:
* Automatically match the number of rxq with the one provided by qemu
via the new_device callback.
* Provide a way to change the MTU of dpdk devices at runtime.
* Move a DPDK vhost device to the proper NUMA socket.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
A netdev provider, especially a PMD provider (like netdev DPDK) might
not be able to change some of its parameters (such as MTU, or number of
queues) without stopping everything and restarting.
This commit introduces a mechanism that allows a netdev provider to
request a restart (netdev_request_reconfigure()). The upper layer can
be notified via netdev_wait_reconf_required() and
netdev_is_reconf_required(). After closing all the rxqs the upper layer
can finally call netdev_reconfigure(), to make sure that the new
configuration is in place.
This will be used by the next commit to reconfigure rx and tx queues in
netdev-dpdk.
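A standalone model of the request/acknowledge pattern (the real API is
the netdev_request_reconfigure()/netdev_is_reconf_required()/
netdev_reconfigure() set named above; the seq-based bookkeeping below is
illustrative):

    #include <stdbool.h>

    struct netdev_model {
        int reconf_seq;       /* bumped by the provider when it needs a restart */
        int applied_seq;      /* what the upper layer has already acted on */
        int requested_n_rxq;  /* example of a parameter needing a restart */
        int n_rxq;
    };

    /* Provider side: remember the new value and ask for a reconfiguration. */
    static void
    model_request_reconfigure(struct netdev_model *dev, int new_n_rxq)
    {
        dev->requested_n_rxq = new_n_rxq;
        dev->reconf_seq++;
    }

    /* Upper layer: poll whether a reconfiguration is pending. */
    static bool
    model_is_reconf_required(const struct netdev_model *dev)
    {
        return dev->reconf_seq != dev->applied_seq;
    }

    /* Upper layer: called only after all rxqs have been closed. */
    static void
    model_reconfigure(struct netdev_model *dev)
    {
        dev->n_rxq = dev->requested_n_rxq;
        dev->applied_seq = dev->reconf_seq;
    }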
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>