netdev_dpdk_vhost_destruct() calls rte_vhost_driver_unregister(), which
can trigger the destroy_device() callback. destroy_device() will try to
take two mutexes already held by netdev_dpdk_vhost_destruct(), causing a
deadlock.
This problem can be solved by dropping the mutexes before calling
rte_vhost_driver_unregister(). The netdev_dpdk_vhost_destruct() and
construct() calls are already serialized by netdev_mutex.
This commit also makes clear that dev->vhost_id is constant and can be
accessed without taking any mutexes in the lifetime of the devices.
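A minimal sketch of the locking order described above, using plain
pthread mutexes. All names here are hypothetical stand-ins, not the
actual netdev-dpdk symbols; the point is only that the destructor
releases both mutexes before calling the function that may invoke the
callback.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two mutexes held by the destructor. */
static pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t dev_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool device_destroyed;

/* Models destroy_device(): it takes both mutexes itself. */
void destroy_device_cb(void)
{
    pthread_mutex_lock(&global_mutex);
    pthread_mutex_lock(&dev_mutex);
    device_destroyed = true;
    pthread_mutex_unlock(&dev_mutex);
    pthread_mutex_unlock(&global_mutex);
}

/* Models rte_vhost_driver_unregister(): it may invoke the callback. */
void driver_unregister(void)
{
    destroy_device_cb();
}

/* Models the destructor.  Returns true once the callback has run. */
bool vhost_destruct(void)
{
    pthread_mutex_lock(&global_mutex);
    pthread_mutex_lock(&dev_mutex);
    /* ... tear down state that needs the locks ... */
    pthread_mutex_unlock(&dev_mutex);
    pthread_mutex_unlock(&global_mutex);

    /* Both mutexes are released here, so the callback can take them. */
    driver_unregister();
    assert(device_destroyed);
    return device_destroyed;
}
```

If driver_unregister() were called before the two unlock calls, the
callback would block on mutexes owned by its own caller.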
Fixes: 8d38823bdf8b("netdev-dpdk: fix memory leak")
Reported-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
This patch sets *typep to an empty string instead of leaving
it uninitialized when no QoS configuration is set.
It fixes the following vswitchd crash when no QoS has been set
on vhost-user interface:
$> ovs-appctl -t ovs-vswitchd qos/show vhost-user1
#0 0x00007efcbadf18d7 in raise () from /lib64/libc.so.6
#1 0x00007efcbadf353a in abort () from /lib64/libc.so.6
#2 0x000000000068d5be in ovs_abort_valist at lib/util.c:335
#3 0x0000000000693d90 in vlog_abort_valist at lib/vlog.c:1204
#4 0x0000000000693e17 in vlog_abort at lib/vlog.c:1218
#5 0x000000000068d3ae in ovs_assert_failure at lib/util.c:72
#6 0x000000000060425c in ds_put_format_valist at lib/dynamic-string.c:168
#7 0x00000000006042e7 in ds_put_format at lib/dynamic-string.c:142
#8 0x00000000005a9e75 in qos_unixctl_show at vswitchd/bridge.c:3185
#9 0x000000000068cda1 in process_command at lib/unixctl.c:347
#11 unixctl_server_run at lib/unixctl.c:400
#12 0x000000000040a3ff in main at vswitchd/ovs-vswitchd.c:113
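The idea of the fix can be sketched as follows. The structure and
function names are hypothetical and only illustrate the pattern: the
out-parameter always receives a valid string, so a caller that formats
it never reads an uninitialized pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical QoS configuration; 'name' is NULL when no QoS is set. */
struct toy_qos_conf {
    const char *name;
};

/* Always stores a valid string in '*typep': the QoS type name, or an
 * empty string when no QoS is configured.  Returns 0. */
int toy_get_qos_type(const struct toy_qos_conf *conf, const char **typep)
{
    assert(typep != NULL);
    *typep = (conf && conf->name) ? conf->name : "";
    return 0;
}
```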
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
DPDK v16.07 introduces the ability to free memzones.
Up until this point, DPDK memory pools created in OVS could
not be destroyed, thus incurring a memory leak.
Leverage the DPDK v16.07 rte_mempool API to free DPDK
mempools when their associated reference count reaches 0 (this
indicates that the memory pool is no longer in use).
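A reference-counting sketch of the approach, with malloc()/free()
standing in for the DPDK mempool calls. The 'toy_pool' type and its
functions are assumptions made for illustration; they are not the OVS
or DPDK API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical wrapper around a memory pool with a reference count. */
struct toy_pool {
    int refcount;
    void *mem;          /* Stand-in for the underlying DPDK mempool. */
};

/* Creates a pool with one reference.  Returns NULL on failure. */
struct toy_pool *toy_pool_get(size_t size)
{
    struct toy_pool *p = malloc(sizeof *p);

    if (!p) {
        return NULL;
    }
    p->mem = malloc(size);
    if (!p->mem) {
        free(p);
        return NULL;
    }
    p->refcount = 1;
    return p;
}

/* Takes an additional reference on an existing pool. */
void toy_pool_ref(struct toy_pool *p)
{
    assert(p && p->refcount > 0);
    p->refcount++;
}

/* Drops one reference.  Returns 1 if the pool was freed, 0 otherwise. */
int toy_pool_put(struct toy_pool *p)
{
    assert(p && p->refcount > 0);
    if (--p->refcount > 0) {
        return 0;
    }
    free(p->mem);       /* This is where the mempool itself is released. */
    free(p);
    return 1;
}
```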
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
This commit introduces support for DPDK 16.07 and consequently breaks
compatibility with DPDK 16.04.
DPDK 16.07 introduces some changes to various APIs. These have been
updated in OVS, including:
* xstats API: changes to structure of xstats
* vhost API: replace virtio-net references with 'vid'
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Robert Wojciechowicz <robertx.wojciechowicz@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Add support for flow control (MAC control frames) to DPDK enabled physical
port types. By default, flow control is OFF on both the rx and tx sides.
Flow control can be enabled/disabled either when adding a port to OVS
or at run time.
For example:
To enable flow control support at tx side while adding a port, add the
'tx-flow-ctrl' option to the 'ovs-vsctl add-port' command-line as below.
'ovs-vsctl add-port br0 dpdk0 -- \
set Interface dpdk0 type=dpdk options:tx-flow-ctrl=true'
Similarly to enable rx flow control,
'ovs-vsctl add-port br0 dpdk0 -- \
set Interface dpdk0 type=dpdk options:rx-flow-ctrl=true'
And to enable the flow control auto-negotiation,
'ovs-vsctl add-port br0 dpdk0 -- \
set Interface dpdk0 type=dpdk options:flow-ctrl-autoneg=true'
To turn ON tx flow control at run time (after the port has been added
to OVS), the command-line input will be,
'ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=true'
The flow control parameters can be turned off by setting the respective
parameter to 'false'. To disable flow control on the tx side,
'ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false'
Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
Acked-by: Bhanuprakash Bodireddy <Bhanuprakash.bodireddy@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
If the number of CPUs in pmd-cpu-mask is not divisible by the number of
queues, and in a few more complex situations, TX queue-ids may be
distributed unfairly between PMD threads.
For example, if we have 2 ports with 4 queues and 6 CPUs in pmd-cpu-mask,
the following distribution is possible:
<------------------------------------------------------------------------>
pmd thread numa_id 0 core_id 13:
port: vhost-user1 queue-id: 1
port: dpdk0 queue-id: 3
pmd thread numa_id 0 core_id 14:
port: vhost-user1 queue-id: 2
pmd thread numa_id 0 core_id 16:
port: dpdk0 queue-id: 0
pmd thread numa_id 0 core_id 17:
port: dpdk0 queue-id: 1
pmd thread numa_id 0 core_id 12:
port: vhost-user1 queue-id: 0
port: dpdk0 queue-id: 2
pmd thread numa_id 0 core_id 15:
port: vhost-user1 queue-id: 3
<------------------------------------------------------------------------>
As we can see above, the dpdk0 port is polled by threads on cores
12, 13, 16 and 17.
By design of dpif-netdev, there is only one TX queue-id assigned to each
pmd thread. These queue-ids are sequential, similar to core-ids, and a
thread will send packets to the queue with exactly this queue-id
regardless of port.
In the previous example:
pmd thread on core 12 will send packets to tx queue 0
pmd thread on core 13 will send packets to tx queue 1
...
pmd thread on core 17 will send packets to tx queue 5
So, for dpdk0 port after truncating in netdev-dpdk:
core 12 --> TX queue-id 0 % 4 == 0
core 13 --> TX queue-id 1 % 4 == 1
core 16 --> TX queue-id 4 % 4 == 0
core 17 --> TX queue-id 5 % 4 == 1
As a result, only 2 of 4 queues are used.
To fix this issue, some kind of XPS is implemented in the following way:
* TX queue-ids are allocated dynamically.
* When a PMD thread first tries to send packets to a new port,
it allocates the least used TX queue for this port.
* PMD threads periodically perform revalidation of
allocated TX queue-ids. If a queue wasn't used in the last
XPS_TIMEOUT_MS milliseconds, it will be freed during revalidation.
* XPS is not used if we have enough TX queues.
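The difference between the static modulo mapping and a least-used
allocation can be sketched as below. This is an illustration only: the
function names are hypothetical, and the timeout-based revalidation
from the list above is left out.

```c
#include <assert.h>

#define N_TXQ 4

/* Static mapping: the per-thread queue-id is truncated by modulo. */
int static_txq(int thread_qid)
{
    assert(thread_qid >= 0);
    return thread_qid % N_TXQ;
}

/* Dynamic mapping: picks the TX queue with the fewest users and
 * records the new user.  'users[i]' counts threads assigned to queue i. */
int alloc_least_used_txq(int users[N_TXQ])
{
    int best = 0;

    for (int i = 1; i < N_TXQ; i++) {
        if (users[i] < users[best]) {
            best = i;
        }
    }
    users[best]++;
    return best;
}

/* Counts how many queues have at least one user. */
int count_used_txqs(const int users[N_TXQ])
{
    int n = 0;

    for (int i = 0; i < N_TXQ; i++) {
        if (users[i] > 0) {
            n++;
        }
    }
    return n;
}
```

With the four threads from the example (queue-ids 0, 1, 4 and 5), the
static mapping touches only queues 0 and 1, while the least-used
allocation spreads the threads over all four queues.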
Reported-by: Zhihong Wang <zhihong.wang@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
"driver_name" is put into "args" twice, which is meaningless.
Remove the duplicated code.
Signed-off-by: Binbin Xu <xu.binbin1@zte.com.cn>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Instead of looping over each packet and checking whether to truncate it,
the patch moves the check out of the loop and uses the batch API. If
truncation is not set, checking 'trunc' in 'struct dp_packet_batch' once
per batch skips the per-packet checking overhead.
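A sketch of the per-batch check, using simplified hypothetical types
rather than the real 'struct dp_packet_batch'. The batch-level flag is
tested once; packets are only visited when at least one of them may
need truncation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_MAX 32

/* Hypothetical, simplified packet and batch types. */
struct toy_packet {
    uint32_t size;      /* Current packet length in bytes. */
    uint32_t cutlen;    /* Bytes to cut from the end, 0 for none. */
};

struct toy_batch {
    size_t count;
    bool trunc;         /* True if any packet in the batch has a cutlen. */
    struct toy_packet *packets[BATCH_MAX];
};

/* Applies pending truncation.  Returns the number of packets cut. */
size_t toy_batch_apply_cutlen(struct toy_batch *b)
{
    size_t n = 0;

    assert(b->count <= BATCH_MAX);
    if (!b->trunc) {
        return 0;       /* One check per batch, no per-packet work. */
    }
    for (size_t i = 0; i < b->count; i++) {
        struct toy_packet *p = b->packets[i];

        if (p->cutlen) {
            uint32_t cut = p->cutlen < p->size ? p->cutlen : p->size;

            p->size -= cut;
            p->cutlen = 0;
            n++;
        }
    }
    b->trunc = false;
    return n;
}
```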
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
To easily allow both in- and out-of-tree building of the Python
wrapper for the OVS JSON parser (e.g. w/ pip), move json.h to
include/openvswitch. This also requires moving lib/{hmap,shash}.h.
Both hmap.h and shash.h were #include-ing "util.h" even though the
headers themselves did not use anything from there, but rather from
include/openvswitch/util.h. Fixing that required including util.h
in several C files mostly due to OVS_NOT_REACHED and things like
xmalloc.
Signed-off-by: Terry Wilson <twilson@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Commit 1895cc8dbb64 ("dpif-netdev: create batch object") introduces
batch process functions and 'struct dp_packet_batch' to associate with
batch-level metadata. This patch applies the packet batch object to
the netdev provider interface (dummy, Linux, BSD, and DPDK) so that
batch APIs can be used in providers. With batch metadata visible in
providers, optimizations can be introduced at per-batch level instead
of per-packet.
Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/145694197
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Currently, there are a few inconsistencies in the ways to configure the
number of queues for a netdev device:
* dpif-netdev can't know the exact number of queues
allocated inside netdev.
This leads to constant mapping of queue-ids to 'real' ones.
* We are able to configure 'n_rxq' for vhost-user devices, but
there is only one sane number of rx queues, which must be used
and configured manually (the number of queues allocated
in QEMU).
This patch disables configuration of 'n_rxq' for DPDK vHost devices.
Configuration of rx and tx queues is now applied automatically from the
connected virtio device. The standard reconfiguration mechanism is used
to apply these changes.
Also, 'n_txq' and 'n_rxq' are now always the real numbers of queues
in the device.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
The current implementation of TX packet queueing is broken in several ways:
* TX queue flushing implemented on receive assumes that all
core_ids are sequential and start from zero. This may lead
to a situation where packets get stuck in a queue forever, and
it also affects latency.
* For a long time the flushing logic has depended on an uninitialized
'txq_needs_locking', because it is usually calculated after
'netdev_dpdk_alloc_txq' but used inside that function
for initialization of 'flush_tx'.
Testing shows no performance difference with and without queueing.
Let's remove queueing altogether because it doesn't work properly now and
also does not increase performance.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
'rte_eth_link_get_nowait()' works only with physical ports.
In case of vhost-user port, 'link' will stay uninitialized and there
will be random messages in log about link status.
Ex.:
|dpdk(dpdk_watchdog2)|DBG|Port -1 Link Up - speed 10000 Mbps - full-duplex
Fix that by calling 'check_link_status()' only for physical ports.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
The patch adds a new action to support packet truncation. The new action
is formatted as 'output(port=n,max_len=m)', as output to port n, with
packet size being MIN(original_size, m).
One use case is to enable port mirroring to send smaller packets to the
destination port so that only useful packet information is mirrored/copied,
saving some performance overhead of copying entire packet payload. Example
use case is below as well as shown in the testcases:
- Output to port 1 with max_len 100 bytes.
- The output packet size on port 1 will be MIN(original_packet_size, 100).
# ovs-ofctl add-flow br0 'actions=output(port=1,max_len=100)'
- The scope of max_len is limited to the output action itself. The
packets sent by the subsequent output:1 and output:2 keep their size.
# ovs-ofctl add-flow br0 \
'actions=output(port=1,max_len=100),output:1,output:2'
- The datapath actions show:
# Datapath actions: trunc(100),1,1,2
Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/140037134
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
This commit allows for vHost User memory from QEMU, DPDK and OVS, as
well as the servicing PMD, to all come from the same socket.
The socket id of a vhost-user port used to be set to that of the master
lcore. Now it is possible to update the socket id if it is detected
(during VM boot) that the vhost device memory is not on this node. If
this is the case, a new mempool is created on the new node, and the
PMD thread currently servicing the port stops doing so, in favour of a
thread from the new node (if enabled in the pmd-cpu-mask).
To avail of this functionality, one must enable the
CONFIG_RTE_LIBRTE_VHOST_NUMA DPDK configuration option.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
If the guest is connected but not servicing the virt queue, this leads
to vhost send retries until timeout. This is fine in isolation but if
there are other high rate queues also being serviced by the same PMD
it can lead to a performance hit on those queues. Change to only retry
when at least some packets have been successfully sent on the previous
attempt.
Also, limit retries to avoid similar delays if packets are being sent
at a very low rate due to few available descriptors.
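The retry policy can be sketched as follows. The callback type, the
MAX_RETRIES value and the two example guests are assumptions made for
illustration; the actual retry limit is not stated here.

```c
#include <assert.h>

#define MAX_RETRIES 8   /* Hypothetical cap on retries. */

/* Hypothetical enqueue callback: tries to send up to 'n' packets and
 * returns how many the guest accepted. */
typedef int (*enqueue_fn)(void *aux, int n);

/* Sends up to 'total' packets.  Retries only while the previous attempt
 * made progress, and at most MAX_RETRIES times.  Returns the number of
 * packets sent; the caller drops the rest. */
int send_with_retries(enqueue_fn enqueue, void *aux, int total)
{
    int sent = 0;
    int retries = 0;

    assert(total >= 0);
    if (total == 0) {
        return 0;
    }
    do {
        int n = enqueue(aux, total - sent);

        if (n <= 0) {
            break;      /* No progress: the guest is not servicing. */
        }
        sent += n;
    } while (sent < total && retries++ < MAX_RETRIES);
    return sent;
}

/* Example guest that never accepts packets; counts calls in '*aux'. */
int stalled_guest(void *aux, int n)
{
    (void) n;
    ++*(int *) aux;
    return 0;
}

/* Example guest that accepts one packet per call; counts calls. */
int slow_guest(void *aux, int n)
{
    ++*(int *) aux;
    return n > 0 ? 1 : 0;
}
```

With stalled_guest() the loop gives up after a single attempt. With
slow_guest() it stops after the first attempt plus MAX_RETRIES retries,
so one slow queue cannot hold the PMD thread indefinitely.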
Reported-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
This commit moves the code that sets the pmd threads affinity from
netdev-dpdk to ovs-numa. There's one small part left in netdev-dpdk, to
set the lcore_id.
Now dpif-netdev will call both modules (ovs-numa and netdev-dpdk) when
starting a pmd thread.
This change will allow having a dummy implementation of the set affinity
call, for testing purposes.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
OVS reports that link state of a vhost-user port (type=dpdkvhostuser) is
DOWN, even when traffic is running through the port between a Virtual
Machine and the vSwitch. Changing admin state with the
"ovs-ofctl mod-port <BR> <PORT> up/down" command over OpenFlow affects
neither the reported link state nor the traffic.
The patch below does the following:
- Triggers link state change by altering netdev's change_seq member.
- Controls sending/receiving of packets through vhost-user port
according to the port's current admin state.
- Sets admin state of newly created vhost-user port to UP.
Signed-off-by: Zoltán Balogh <zoltan.balogh@ericsson.com>
Co-authored-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
This patch provides the modifications required in netdev-dpdk.c and
vswitch.xml to enable ingress policing for DPDK interfaces.
This patch implements the necessary netdev functions to netdev-dpdk.c as
well as various helper functions required for ingress policing.
The vswitch.xml has been modified to explain the expected parameters and
behaviour when using ingress policing.
The INSTALL.DPDK.md guide has been modified to provide an example
configuration of ingress policing.
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Add generic policer functions to avoid code duplication.
Policing can be implemented on both egress and ingress paths.
Currently the QoS egress-policer implementation uses its own specific run
and packet handle policer functions. This patch makes the policer functions
generic so that they can be used regardless of whether the policer is egress
or ingress by just requiring a pointer to the rte_meter used for policing
to be passed.
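A sketch of a policer that only depends on a pointer to its meter. A
simple token bucket stands in for the rte_meter used by OVS, so the
'toy_meter' type and both functions are assumptions for illustration.
The same run function serves an egress or an ingress path, each with
its own meter instance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical token bucket standing in for the real meter. */
struct toy_meter {
    uint64_t tokens;    /* Bytes currently available. */
    uint64_t burst;     /* Bucket capacity in bytes. */
    uint64_t rate;      /* Bytes added per time unit. */
    uint64_t last;      /* Time of the last refill. */
};

/* Refills the bucket and decides whether a packet of 'len' bytes may
 * pass at time 'now'. */
bool toy_meter_pkt_ok(struct toy_meter *m, uint32_t len, uint64_t now)
{
    assert(now >= m->last);
    m->tokens += (now - m->last) * m->rate;
    if (m->tokens > m->burst) {
        m->tokens = m->burst;
    }
    m->last = now;
    if (m->tokens < len) {
        return false;
    }
    m->tokens -= len;
    return true;
}

/* Generic policer: keeps the packets allowed by 'm', compacting the
 * 'lens' array in place.  Returns the number of packets kept. */
int toy_policer_run(struct toy_meter *m, uint32_t *lens, int n,
                    uint64_t now)
{
    int kept = 0;

    for (int i = 0; i < n; i++) {
        if (toy_meter_pkt_ok(m, lens[i], now)) {
            lens[kept++] = lens[i];
        }
    }
    return kept;
}
```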
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
This introduces in dpif-netdev and netdev-dpdk the first use of the
newly introduced reconfigure netdev call.
When a request to change the number of queues comes, netdev-dpdk will
remember this and notify the upper layer via
netdev_request_reconfigure().
The datapath, instead of periodically calling netdev_set_multiq(), can
detect this and call reconfigure().
This mechanism can also be used to:
* Automatically match the number of rxq with the one provided by qemu
via the new_device callback.
* Provide a way to change the MTU of dpdk devices at runtime.
* Move a DPDK vhost device to the proper NUMA socket.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
A netdev provider, especially a PMD provider (like netdev DPDK) might
not be able to change some of its parameters (such as MTU, or number of
queues) without stopping everything and restarting.
This commit introduces a mechanism that allows a netdev provider to
request a restart (netdev_request_reconfigure()). The upper layer can
be notified via netdev_wait_reconf_required() and
netdev_is_reconf_required(). After closing all the rxqs the upper layer
can finally call netdev_reconfigure(), to make sure that the new
configuration is in place.
This will be used by the next commit to reconfigure rx and tx queues in
netdev-dpdk.
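The request/apply handshake can be modelled with a flag on the device.
This is a simplified sketch with hypothetical 'toy_' names and
signatures, not the netdev API itself: the provider records the wanted
value and raises the flag, and the upper layer applies the change once
it has stopped using the device.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified device state. */
struct toy_netdev {
    int n_rxq;              /* Configuration currently in effect. */
    int requested_n_rxq;    /* Configuration requested by the user. */
    bool reconf_required;   /* Raised by the provider. */
};

/* Provider side: remember the new value and request a reconfigure. */
void toy_set_n_rxq(struct toy_netdev *dev, int n_rxq)
{
    assert(n_rxq > 0);
    if (n_rxq != dev->n_rxq) {
        dev->requested_n_rxq = n_rxq;
        dev->reconf_required = true;
    }
}

/* Upper layer: check whether the device asked for a reconfigure. */
bool toy_is_reconf_required(const struct toy_netdev *dev)
{
    return dev->reconf_required;
}

/* Upper layer: called after all rxqs are closed; applies the request. */
void toy_reconfigure(struct toy_netdev *dev)
{
    dev->n_rxq = dev->requested_n_rxq;
    dev->reconf_required = false;
}
```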
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Prevent pthread_setaffinity_np() from being called with a potentially
invalid cpu_set_t and add a default (core 0x1).
Also, only call pthread_getaffinity_np() if no dpdk-lcore-mask is
specified.
Signed-off-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Only set the thread affinity back to the pre rte_eal_init() value
when the user has not specified a coremask.
Fixes: 88964e6428dc("netdev-dpdk: Autofill lcore coremask if absent")
Signed-off-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Clang complains:
lib/netdev-dpdk.c:1860:1: error: mutex 'dev->mutex' is not locked on every path
through here [-Werror,-Wthread-safety-analysis]
}
^
lib/netdev-dpdk.c:1815:5: note: mutex acquired here
ovs_mutex_lock(&dev->mutex);
^
./include/openvswitch/thread.h:60:9: note: expanded from macro 'ovs_mutex_lock'
ovs_mutex_lock_at(mutex, OVS_SOURCE_LOCATOR)
^
Fixes: d6e3feb57c44 ("Add support for extended netdev statistics based on RFC 2819.")
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
When no vhost-sock-dir value is provided, print the default location.
Update the documentation to reflect the fact that vhost-sock-dir values
are now subdirectory locations rather than full paths.
Fixes: d8a8f353c23e ("netdev-dpdk: Restrict vhost_sock_dir")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Implementation of new statistics extension for DPDK ports:
- Add new counter definitions to the netdev struct and OpenFlow,
based on RFC 2819.
- Initialize netdev statistics as "filtered out"
before passing it to particular netdev implementation
(because of that change, statistics which are not
collected are reported as filtered out, and some
unit tests were modified in this respect).
- New statistics are retrieved using experimenter code and
are printed as a result to ofctl dump-ports.
- New counters are available for OpenFlow 1.4+.
- Add new vendor id: INTEL_VENDOR_ID.
- New statistics are printed to output via ofctl only if those
are present in reply message.
- Add new file header: include/openflow/intel-ext.h which
contains new statistics definition.
- Extended statistics are implemented only for dpdk-physical
and dpdk-vhost port types.
- Dpdk-physical implementation uses xstats to collect statistics.
- Dpdk-vhost implements only part of the statistics (RX packet size
based counters).
Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
[blp@ovn.org made software devices more consistent]
Signed-off-by: Ben Pfaff <blp@ovn.org>
A previous patch introduced the ability to pass arbitrary EAL command
line options via the dpdk_extras database entry. This commit enhances
that by warning the user when such a configuration is detected and
preferring the value in the database.
Suggested-by: Sean K Mooney <sean.k.mooney@intel.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Tested-by: Sean K Mooney <sean.k.mooney@intel.com>
Tested-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
A previous change moved some commonly used arguments from commandline to
the database, and with it the ability to pass arbitrary arguments to
EAL. This change allows arbitrary EAL arguments to be provided
via a new db entry 'other_config:dpdk-extra' which will tokenize the
string and add it to the argument list. The only argument which will not
be supported with this change is '--no-huge', which appears to break the
system in other ways.
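Tokenizing the extra string and appending it to the argument list might
look like the sketch below. The function name is hypothetical, and it
splits on spaces and tabs only; any quoting rules are not modelled.

```c
#include <assert.h>
#include <string.h>

/* Splits 'extra' in place on spaces and tabs and appends each token to
 * 'argv', starting at index 'argc' and never exceeding 'max_args'
 * entries.  Returns the new argument count.  The tokens point into
 * 'extra', which must outlive 'argv'. */
int append_extra_args(char *extra, char **argv, int argc, int max_args)
{
    assert(extra != NULL && argv != NULL);
    for (char *tok = strtok(extra, " \t");
         tok != NULL && argc < max_args;
         tok = strtok(NULL, " \t")) {
        argv[argc++] = tok;
    }
    return argc;
}
```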
Signed-off-by: Aaron Conole <aconole@redhat.com>
Tested-by: Sean K Mooney <sean.k.mooney@intel.com>
Tested-by: RobertX Wojciechowicz <robertx.wojciechowicz@intel.com>
Tested-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
The user has control over the DPDK internal lcore coremask, but this
parameter can be autofilled with a bit more intelligence. If the user
does not fill this parameter in, we use the lowest set bit in the
current task CPU affinity. Otherwise, we will reassign the current
thread to the specified lcore mask, in addition to the dpdk lcore
threads.
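The autofill rule can be sketched with plain 64-bit masks. The function
names are hypothetical, and a uint64_t stands in for the cpu_set_t that
the affinity calls operate on.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the index of the lowest set bit in 'mask', or -1 if none. */
int lowest_set_bit(uint64_t mask)
{
    for (int i = 0; i < 64; i++) {
        if (mask & (UINT64_C(1) << i)) {
            return i;
        }
    }
    return -1;
}

/* Returns the lcore mask to use: the user's mask if one was given,
 * otherwise a mask with only the lowest CPU of 'affinity' set. */
uint64_t autofill_lcore_mask(uint64_t user_mask, uint64_t affinity)
{
    if (user_mask) {
        return user_mask;
    }
    assert(affinity != 0);  /* A running task has at least one CPU. */
    return UINT64_C(1) << lowest_set_bit(affinity);
}
```

For an affinity of 0x2c (CPUs 2, 3 and 5) and no user mask, the result
is 0x4.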
Signed-off-by: Aaron Conole <aconole@redhat.com>
Tested-by: Sean K Mooney <sean.k.mooney@intel.com>
Tested-by: RobertX Wojciechowicz <robertx.wojciechowicz@intel.com>
Tested-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Since the vhost-user sockets directory now comes from the database, it is
possible for any user with database access to program an arbitrary filesystem
location for the sockets directory. This could result in unprivileged users
creating or deleting arbitrary filesystem files by using specially crafted
names. To prevent this, 'vhost-sock-dir' is now relative to ovs_rundir()
and must not contain "..".
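A sketch of the restriction, assuming hypothetical helper names. The
run directory is passed in as a parameter here instead of coming from
ovs_rundir(), so that the example stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Rejects any sub-directory that contains "..". */
bool sock_dir_is_safe(const char *subdir)
{
    return subdir != NULL && strstr(subdir, "..") == NULL;
}

/* Writes "<rundir>/<subdir>" to 'out'.  Returns 0 on success, or -1 if
 * 'subdir' is rejected or the result does not fit in 'out_size'. */
int build_sock_dir(const char *rundir, const char *subdir,
                   char *out, size_t out_size)
{
    int n;

    assert(rundir != NULL && out != NULL);
    if (!sock_dir_is_safe(subdir)) {
        return -1;
    }
    n = snprintf(out, out_size, "%s/%s", rundir, subdir);
    return (n < 0 || (size_t) n >= out_size) ? -1 : 0;
}
```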
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Existing DPDK integration is provided by use of command line options which
must be split out and passed to librte in a special manner. However, this
forces any configuration to be passed by way of a special DPDK flag, and
interferes with ovs+dpdk packaging solutions.
This commit delays dpdk initialization until after the OVS database
connection is established, at which point ovs initializes librte. It
pulls all of the config data from the OVS database, and assembles a
new argv/argc pair to be passed along.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
When the DPDK init function is called, it changes the executing thread's
CPU affinity to a single core specified in -c. This will result in the
userspace bridge configuration thread being rebound, even if that is not
the intent.
This change fixes that behavior by rebinding to the original thread
affinity after calling dpdk_init().
Co-authored-by: Kevin Traynor <kevin.traynor@intel.com>
Signed-off-by: Kevin Traynor <kevin.traynor@intel.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Tested-by: RobertX Wojciechowicz <robertx.wojciechowicz@intel.com>
Tested-by: Sean K Mooney <sean.k.mooney@intel.com>
Acked-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
The following changes are applied:
- INSTALL.DPDK.md: CONFIG_RTE_BUILD_COMBINE_LIBS step has been
removed because it is no longer present in DPDK configuration
(combined library is created by default),
- INSTALL.DPDK.md: VHost Cuse configuration is updated,
- netdev-dpdk.c: Link speed definition is changed in DPDK and
netdev_dpdk_get_features is updated accordingly,
- netdev-dpdk.c: TSO and checksum offload have been disabled for
vhostuser devices,
- .travis/linux-build.sh: DPDK version is updated and legacy
flags have been removed in configuration.
Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
In different functions we use different variable names ('netdev_', 'netdev',
'dev', 'vhost_dev', ...) for the same objects.
This commit changes the code to comply with the following convention:
'struct netdev':'netdev'
'struct netdev_dpdk':'dev'
'struct virtio_net':'virtio_dev'
'struct netdev_rxq':'rxq'
'struct netdev_rxq_dpdk':'rx'
Also, 'dev->up.' is replaced by 'netdev->', where 'netdev' was already
defined.
Suggested-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
This attempts to prevent namespace collisions with other list libraries.
Signed-off-by: Ben Warren <ben@skyportsystems.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
All code is now in include/openvswitch/list.h.
Signed-off-by: Ben Warren <ben@skyportsystems.com>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
According to QEMU documentation (docs/specs/vhost-user.txt) one queue
should be enabled initially. More queues are enabled dynamically, by
sending message VHOST_USER_SET_VRING_ENABLE.
Currently all queues in OVS are disabled by default. This breaks the above
specification. So, queue #0 should be enabled by default to support
QEMU versions older than 2.5 and to fix probable issues if QEMU does not
send VHOST_USER_SET_VRING_ENABLE for queue #0 according to documentation.
This will also fix the currently broken vhost-cuse support in OVS.
Fixes: 585a5beaa2a4 ("netdev-dpdk: vhost-user: Fix sending packets to
queues not enabled by guest.")
Reported-by: Mauricio Vasquez B <mauricio.vasquezbernal@studenti.polito.it>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Since a netdev can have multiple IP addresses, use the
generic API netdev_get_addr_list(). This also makes it
easier to handle IPv4 and IPv6 addresses across vswitchd
layers.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
A device can have multiple IP addresses, but netdev_get_in4/6()
returns only one configured IPv6 address. The following
patch fixes it.
The OVS router is also updated to return the source IP address for a
given destination. This is required when an interface has multiple
IP addresses configured.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
According to netdev-provider API:
'The "destruct" function is not allowed to fail.'
netdev-dpdk breaks this restriction for vhost-user ports.
This leads to SIGABRT or SIGSEGV in the dpdk_watchdog thread
because 'dealloc' will be called anyway, regardless of
the result of 'destruct'.
For example, if we call
# ovs-vsctl set interface vhost1 ofport_request=5
while QEMU is still attached, we'll get:
------------------[cut]------------------
|dpdk|ERR|Can not remove port, vhost device still attached
VHOST_CONFIG: socket created, fd:98
VHOST_CONFIG: fail to bind fd:98, remove file:/home/vhost1 and try again.
|dpdk|ERR|vhost-user socket device setup failure for socket /home/vhost1
|bridge|WARN|could not open network device vhost1 (Unknown error -1)
ovs-vswitchd(dpdk_watchdog1): lib/netdev-dpdk.c:532: ovs_mutex_lock_at()
passed uninitialized ovs_mutex
Program received signal SIGABRT, Aborted.
------------------[cut]------------------
Fix that by removing the port even when the guest is still
attached. The guest becomes an orphan in that case, but OVS
will not crash and will continue forwarding for other ports.
A VM restart is required to restore connectivity.
Fixes: 58397e6c1e6c ("netdev-dpdk: add dpdk vhost-cuse ports")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Made to simplify creation of derived classes.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
mbufs can be chained (by the "next" field of the rte_mbuf struct) when
one mbuf is not big enough to hold a big packet, say when TSO is enabled.
rte_pktmbuf_free_seg() frees the head mbuf only, leading to mbuf leaks.
This patch fixes it by invoking the right API, rte_pktmbuf_free(), to
free all mbufs in the chain.
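The difference between freeing one segment and freeing the whole chain
can be shown with a plain linked list. The 'toy_seg' type and functions
are stand-ins for illustration and do not use the DPDK API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical segment, linked through 'next' like a chained mbuf. */
struct toy_seg {
    struct toy_seg *next;
};

static int live_segs;   /* Number of segments currently allocated. */

/* Allocates a segment and links it in front of 'next'. */
struct toy_seg *toy_seg_alloc(struct toy_seg *next)
{
    struct toy_seg *s = malloc(sizeof *s);

    assert(s != NULL);
    s->next = next;
    live_segs++;
    return s;
}

/* Frees a single segment only; the rest of the chain is untouched. */
void toy_seg_free_one(struct toy_seg *s)
{
    free(s);
    live_segs--;
}

/* Frees every segment in the chain starting at 'head'. */
void toy_chain_free(struct toy_seg *head)
{
    while (head) {
        struct toy_seg *next = head->next;

        toy_seg_free_one(head);
        head = next;
    }
}

/* Returns the number of segments that have not been freed yet. */
int toy_seg_live_count(void)
{
    return live_segs;
}
```

Calling toy_seg_free_one() on the head of a three-segment chain would
leave two segments allocated, which is the leak described above.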
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
This patch provides the modifications required in netdev-dpdk.c and
vswitch.xml to allow for a DPDK user space QoS algorithm.
This patch adds a QoS configuration structure for netdev-dpdk and
expected QoS operations 'dpdk_qos_ops'. Various helper functions
are also supplied.
Also included are the modifications required for vswitch.xml to allow a
new QoS implementation for netdev-dpdk devices. This includes a new QoS type
`egress-policer` as well as its expected QoS table entries.
The QoS functionality implemented for DPDK devices is `egress-policer`.
This can be used to drop egress packets at a configurable rate.
The INSTALL.DPDK.md guide has also been modified to provide an example
configuration of `egress-policer` QoS.
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Current mbuf initialization relies on magic numbers and does not
accommodate mbufs of different sizes.
Resolve this issue by ensuring that mbufs are always aligned to a 1k
boundary (a typical DPDK NIC Rx buffer alignment).
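Rounding a size up to a 1k boundary can be sketched as below. The helper
names are hypothetical, and which length OVS actually rounds (frame
length, headroom and so on) is not specified here.

```c
#include <assert.h>
#include <stdint.h>

#define MBUF_ALIGN 1024u

/* Rounds 'x' up to the next multiple of 'align'. */
uint32_t round_up(uint32_t x, uint32_t align)
{
    assert(align > 0);
    return (x + align - 1) / align * align;
}

/* Returns a buffer size for 'frame_len' bytes, aligned to 1k. */
uint32_t mbuf_buf_size(uint32_t frame_len)
{
    return round_up(frame_len, MBUF_ALIGN);
}
```

For example, a 1518 byte frame gets a 2048 byte buffer and a 9018 byte
frame gets a 9216 byte buffer.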
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Currently the virtio driver in the guest operating system has to be
configured to use exactly the same number of queues. If the number of
queues is smaller, some packets will get stuck in queues unused by the
guest and will not be received.
Fix that by using the new 'vring_state_changed' callback, which is
available for vhost-user since DPDK 2.2.
The implementation uses an additional mapping from configured tx queues
to those enabled by the virtio driver. This requires mandatory locking of
TX queues in __netdev_dpdk_vhost_send(), but this locking was almost
always needed anyway because of calling set_multiq with
n_txq = 'ovs_numa_get_n_cores() + 1'.
OVS_VHOST_MAX_QUEUE_NUM = 1024 chosen based on the fact that this is
the maximum number of queues supported by QEMU.
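One plausible way to build such a mapping is sketched below. The
round-robin policy and all names are assumptions for illustration: an
enabled queue maps to itself, a disabled queue is redirected to one of
the enabled queues, and -1 means that no queue is enabled.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TXQ 8

/* Fills 'map[i]' with the enabled queue to use for configured queue i,
 * or -1 if the guest has enabled no queue at all. */
void remap_txqs(const bool enabled[], int map[], int n)
{
    int en[MAX_TXQ];
    int n_en = 0;
    int k = 0;

    assert(n >= 0 && n <= MAX_TXQ);
    for (int i = 0; i < n; i++) {
        if (enabled[i]) {
            en[n_en++] = i;
        }
    }
    for (int i = 0; i < n; i++) {
        if (n_en == 0) {
            map[i] = -1;
        } else if (enabled[i]) {
            map[i] = i;
        } else {
            map[i] = en[k];         /* Spread over enabled queues. */
            k = (k + 1) % n_en;
        }
    }
}
```

Because several configured queues can end up on the same enabled queue,
senders must lock the target queue, which matches the locking
requirement mentioned above.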
Fixes: 4573fbd38fa1 ("netdev-dpdk: Add vhost-user multiqueue support")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
This check prevents an obvious way for a vhost-user socket to escape the
intended directory.
There might be other ways to escape the directory (none comes to mind at
the moment), but this is a problem that should be properly solved by
mandatory access control.
A similar check is done for a bridge name, since that name is used as
part of a socket as well.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>