According to QEMU documentation (docs/specs/vhost-user.txt) one queue
should be enabled initially. More queues are enabled dynamically, by
sending message VHOST_USER_SET_VRING_ENABLE.
Currently all queues in OVS disabled by default. This breaks above
specification. So, queue #0 should be enabled by default to support
QEMU versions less than 2.5 and fix probable issues if QEMU will not
send VHOST_USER_SET_VRING_ENABLE for queue #0 according to documentation.
Also this will fix currently broken vhost-cuse support in OVS.
Fixes: 585a5beaa2a4 ("netdev-dpdk: vhost-user: Fix sending packets to
queues not enabled by guest.")
Reported-by: Mauricio Vasquez B <mauricio.vasquezbernal@studenti.polito.it>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Since netdev can have multiple IP address use
generic api netdev_get_addr_list(). This also make it
easier to handle IPv4 and IPv6 address across vswitchd
layers.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Device can have multiple IP address but netdev_get_in4/6()
returns only one configured IPv6 address. Following
patch fixes it.
OVS router is also updated to return source ip address for
given destination, This is required when interface has multiple
IP address configured.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
According to netdev-provider API:
'The "destruct" function is not allowed to fail.'
netdev-dpdk breaks this restriction for vhost-user ports.
This leads to SIGABRT or SIGSEGV in dpdk_watchdog thread
because 'dealloc' will be called anyway indifferently
to result of 'destruct'.
For example, if we call
# ovs-vsctl set interface vhost1 ofport_request=5
while QEMU still attached, we'll get:
------------------[cut]------------------
|dpdk|ERR|Can not remove port, vhost device still attached
VHOST_CONFIG: socket created, fd:98
VHOST_CONFIG: fail to bind fd:98, remove file:/home/vhost1 and try again.
|dpdk|ERR|vhost-user socket device setup failure for socket /home/vhost1
|bridge|WARN|could not open network device vhost1 (Unknown error -1)
ovs-vswitchd(dpdk_watchdog1): lib/netdev-dpdk.c:532: ovs_mutex_lock_at()
passed uninitialized ovs_mutex
Program received signal SIGABRT, Aborted.
------------------[cut]------------------
Fix that by removing port anyway even when guest is still
attached. Guest becomes an orphan in that case but OVS
will not crash and will continue forwarding for other ports.
VM restart required to restore connectivity.
Fixes: 58397e6c1e6c ("netdev-dpdk: add dpdk vhost-cuse ports")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Made to simplify creation of derived classes.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
mbufs could be chained (by the "next" field of rte_mbuf struct), when
an mbuf is not big enough to hold a big packet, say when TSO is enabled.
rte_pktmbuf_free_seg() frees the head mbuf only, leading mbuf leaks.
This patch fix it by invoking the right API rte_pktmbuf_free(), to
free all mbufs in the chain.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
This patch provides the modifications required in netdev-dpdk.c and
vswitch.xml to allow for a DPDK user space QoS algorithm.
This patch adds a QoS configuration structure for netdev-dpdk and
expected QoS operations 'dpdk_qos_ops'. Various helper functions
are also supplied.
Also included are the modifications required for vswitch.xml to allow a
new QoS implementation for netdev-dpdk devices. This includes a new QoS type
`egress-policer` as well as its expected QoS table entries.
The QoS functionality implemented for DPDK devices is `egress-policer`.
This can be used to drop egress packets at a configurable rate.
The INSTALL.DPDK.md guide has also been modified to provide an example
configuration of `egress-policer` QoS.
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Current mbuf initialization relies on magic numbers and does not
accomodate mbufs of different sizes.
Resolve this issue by ensuring that mbufs are always aligned to a 1k
boundary (a typical DPDK NIC Rx buffer alignment).
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Currently virtio driver in guest operating system have to be configured
to use exactly same number of queues. If number of queues will be less,
some packets will get stuck in queues unused by guest and will not be
received.
Fix that by using new 'vring_state_changed' callback, which is
available for vhost-user since DPDK 2.2.
Implementation uses additional mapping from configured tx queues to
enabled by virtio driver. This requires mandatory locking of TX queues
in __netdev_dpdk_vhost_send(), but this locking was almost always anyway
because of calling set_multiq with n_txq = 'ovs_numa_get_n_cores() + 1'.
OVS_VHOST_MAX_QUEUE_NUM = 1024 chosen based on the fact that this is
the maximum number of queues supported by QEMU.
Fixes: 4573fbd38fa1 ("netdev-dpdk: Add vhost-user multiqueue support")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
This check prevents an obvious way for a vhost-user socket to escape the
intended directory.
There might be other ways to escape the directory (none comes to mind at
the moment), but this is a problem that should be properly solved by
mandatory access control.
A similar check is done for a bridge name, since that name is used as
part of a socket as well.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Current implementation of dpdk_dev_parse_name does not perform a robust
error handling, port names as "dpdkr" and "dpdkr1x" are considered valid.
With this patch only positive port numbers in decimal notation are considered
valid.
Signed-off-by: Mauricio Vasquez B <mauricio.vasquezbernal@studenti.polito.it>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
A ring name length of 10 characters is not enough for dpdkr ports
starting from dpdkr10, then it is increased to RTE_RING_NAMESIZE
characters.
Signed-off-by: Mauricio Vasquez B <mauricio.vasquezbernal@studenti.polito.it>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Fix issue whereby vhost_thread is waiting for dpdk_watchdog
thread to quiesce and at the same time dpdk_watchdog thread
is waiting for vhost_thread to give up dpdk_mutex.
Reported-by: Patrik Andersson R <patrik.r.andersson@ericsson.com>
Signed-off-by: Patrik Andersson R <patrik.r.andersson@ericsson.com>
Signed-off-by: Kevin Traynor <kevin.traynor@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Currently, all of the PMD netdevs can only have the same number of
rx queues, which is specified in other_config:n-dpdk-rxqs.
Fix that by introducing of new option for PMD interfaces: 'n_rxq', which
specifies the maximum number of rx queues to be created for this
interface.
Example:
ovs-vsctl set Interface dpdk0 options:n_rxq=8
Old 'other_config:n-dpdk-rxqs' deleted.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Memory pool for vhost-user ports always created even if construction
fails. And message about successfull socket creation also printed.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
While killing OVS may not call rte_vhost_driver_unregister()
for vhost-user ports. As a result corresponding socket will
remain in a system and opening of that port after restart
will fail.
(Even after this patch this remains a problem for signals
that OVS does not or cannot catch, such as SIGSEGV and
SIGKILL.)
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Most of the network cards today supports multiple receive
and transmit queues (MQ). The core idea is that on packet
reception, a NIC can send different packets to different
queues to distribute processing among CPUs running in parallel.
The packet distribution is based on a result of a filter applied
on each packet headers. The filter should keep all packets from
the same flow on the same queue to avoid re-ordering while
distributing different flows among all available queues.
This is how the packet moves in a typical vhost-user use-case:
NIC OVS
DPDK port ==== bridge --- vhost-user ==== qemu ==== virtio eth0
The DPDK ports, OVS bridges, virtio network driver and
recently QEMU (vhost-user) supports MQ. This patch adds MQ
support to OVS that leverages DPDK vhost library to implement
vhost-user interfaces.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Kevin Traynor <kevin.traynor@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Following changes have been applied:
- INSTALL.DPDK.md: change DPDK version number,
- build.sh: change DPDK version number.
Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
DPDK build was broken after commit 2f8932e8403a ("poll: Suppress logging
for pmd threads.") due to the following error:
lib/netdev-dpdk.c:245:13: error: static declaration of ‘thread_is_pmd’
follows non-static declaration
lib/ovs-thread.h:526:6: note: previous declaration of ‘thread_is_pmd’
was here
The version used in this file operates in the fastpath, so it cannot
switch to using the newly introduced version; the new version lives
outside of the dpdk portions of OVS so its implementation cannot be
shared with this function. Rename it to resolve the conflict.
Fixes: 2f8932e8403a ("poll: Suppress logging for pmd threads.")
Suggested-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Flavio Leitner <fbl@sysclose.org>
This avoids a segmentation fault in case of memory allocation failure.
Found by inspection.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
This change prevents netdev_dpdk from accessing pointer
which is not valid.
Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Although netdev does explicit locking, it is only valid from the ovs
perspective, then only the ring ends used by ovs should be declared as
single producer/consumer.
The other ends that are used by the application should be declared as
multiple producer/consumer that is the most general case.
Signed-off-by: Mauricio Vasquez B <mauricio.vasquezbernal@studenti.polito.it>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Unregister and delete the socket associated with a vhost-user
port when the port is deleted and/or the switch is brought down.
Do not delete the socket if the vhost-user device is still attached
to the guest.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
dpdk datapath needs to run as root. Block the --user
option for now. It is likely we will revisit this issue for possibly
supporting --user option for dpdk datapath process as well.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
DPDK mbufs contain a valid RSS hash only if PKT_RX_RSS_HASH is
set in 'ol_flags'. Otherwise the hash is garbage and doesn't
relate to the packet.
This fixes an issue with vhost, which, being a virtual NIC, doesn't
compute the hash.
Reported-by: Dongjun <dongj@dtdream.com>
Suggested-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
New stats for vhost ports are rx_bytes, tx_bytes, multicast, rx_errors and
rx_length_errors. New stats for PMD ports are rx_dropped, rx_length_errors,
rx_crc_errors and rx_missed_errors. DPDK imissed packets are now classified
as dropped instead of errors.
Signed-off-by: Timo Puha <timox.puha@intel.com>
Tested-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Update relevant artifacts to add support for DPDK v2.1.0
- INSTALL.DPDK.md
- acinclude.m4: Change DPDK library name
- netdev-dpdk: Limit minimum mbuf size to to adapt to DPDK bug fix that
changes the treatment of the requested mbuf size
- build.sh: Change DPDK version number
Note that this breaks compatibility with DPDK v2.0.0 although only
for the library name change.
Note that throughput for vhost ports with mergeable buffers is reduced
about 10% due to a necessary bug fix in DPDK vhost code.
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
Signed-off-by: Timo Puha <timox.puha@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
The netdev-dpdk uses the struct ether_addr rather than struct eth_addr
internal ovs datatype.
To facilitate using either the .ea OR the struct ether_addr.addr_bytes
argument for printing/logging, add a new ETH_ADDR_BYTES_ARG() define.
Signed-off-by: Aaron Conole <aconole@redhat.com>
[blp@nicira.com made stylistic changes]
Signed-off-by: Ben Pfaff <blp@nicira.com>
Define struct eth_addr and use it instead of a uint8_t array for all
ethernet addresses in OVS userspace. The struct is always the right
size, and it can be assigned without an explicit memcpy, which makes
code more readable.
"struct eth_addr" is a good type name for this as many utility
functions are already named accordingly.
struct eth_addr can be accessed as bytes as well as ovs_be16's, which
makes the struct 16-bit aligned. All use seems to be 16-bit aligned,
so some algorithms on the ethernet addresses can be made a bit more
efficient making use of this fact.
As the struct fits into a register (in 64-bit systems) we pass it by
value when possible.
This patch also changes the few uses of Linux specific ETH_ALEN to
OVS's own ETH_ADDR_LEN, and removes the OFP_ETH_ALEN, as it is no
longer needed.
This work stemmed from a desire to make all struct flow members
assignable for unrelated exploration purposes. However, I think this
might be a nice code readability improvement by itself.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
It has been observed that some DPDK device (e.g intel xl710) report an
high number of queues but make some of them available only for special
functions (SRIOV). Therefore the queues will be counted in
rte_eth_dev_info_get(), but rte_eth_tx_queue_setup() will fail.
This commit works around the issue by retrying the device initialization
with a smaller number of queues, if a queue fails to setup.
Reported-by: Ian Stokes <ian.stokes@intel.com>
Tested-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Kevin Traynor <kevin.traynor@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
netdev_dpdk_set_multiq() should not set the number of configured rxq
and txq if the driver initialization fails (meaning that the driver
failed to setup the queues). Otherwise, on a subsequent call to
netdev_dpdk_set_multiq(), the code may believe that the queues have
already been setup and there's no work to be done.
This commit fixes the problem by restoring the old values if
dpdk_eth_dev_init() fails.
Reported-by: Ian Stokes <ian.stokes@intel.com>
Tested-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
When tx queue is shared among CPUS,the pkts always be flush
in 'netdev_dpdk_eth_send'. So it is unnecessarily for flushing
in netdev_dpdk_rxq_recv Otherwise tx will be accessed without
locking.
Signed-off-by: Wei li <liw@dtdream.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
This patch adds support for a new port type to the userspace
datapath called dpdkvhostuser.
A new dpdkvhostuser port will create a unix domain socket which
when provided to QEMU is used to facilitate communication between
the virtio-net device on the VM and the OVS port on the host.
vhost-cuse ('dpdkvhost') ports are still available as 'dpdkvhostcuse'
ports and will be enabled if vhost-cuse support is detected in the
DPDK build specified during compilation of the switch. Otherwise,
vhost-user ports are enabled.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
This commit changes the semantics of 'netdev_set_multiq()' to allow OVS
DPDK to run on device with limited multi queue support.
* If a netdev doesn't have the requested number of rxqs it can simply
inform the datapath without failing.
* If a netdev doesn't have the requested number of txqs it should try
to create as many as possible and use locking.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
Right now ethernet and ring devices use a mutex, while vhost devices use
a mutex or a spinlock to protect statistics. This commit introduces a
single spinlock that's always used for stats updates.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
We used to reserve DPDK lcore 0 for non pmd operations, making it
difficult to use core 0 for packet processing.
DPDK 2.0 properly support non EAL threads with lcore LCORE_ID_ANY.
Using non EAL threads for non pmd threads, we do not need to reserve
any core for non pmd operations
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
DPDK lcore_id is unsigned. We need to support big values like
LCORE_ID_ANY (=UINT32_MAX). Therefore I am changing the type everywhere
in OVS.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
This patch simplifies Rx/Tx NIC configuration by removing
custom values and using the defaults provided by the DPDK
PMDs. This also enables Rx vectorisation which improves
performance.
Signed-off-by: Kevin Traynor <kevin.traynor@intel.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
The MAX_PKT_BURST and NETDEV_MAX_RX_BATCH macros had a confusing
relationship. They basically purport to do the same thing, making it
unclear which is the source of truth.
Furthermore, while NETDEV_MAX_RX_BATCH was 256, MAX_PKT_BURST was 32,
meaning we never process a batch larger than 32 packets further adding
to the confusion.
This patch resolves the issue by removing MAX_PKT_BURST completely,
and shrinking the new NETDEV_MAX_BURST macro to only 32. This should
have no change in the execution path except shrinking a couple of
structs and memory allocations (can't hurt).
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
The max allowed burst size for a single vhost enqueue is 32.
This code facilitates trying to send greater than the burst
size of packets to the vhost interface by adding a retry loop
and calling vhost enqueue multiple times. As this could
potentially block, a timeout is added.
Signed-off-by: Kevin Traynor <kevin.traynor@intel.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Change phy rx burst size from 192 to 32. This aligns the
burst size with the other dpdk interfaces and significantly
improves performance when forwarding to dpdk vhost ports.
Signed-off-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Update relevant artifacts to add support for DPDK v2.0.0
- INSTALL.DPDK.md
- travis build script
- acinclude.m4: add 'mssse3' flag to OVS_CFLAGS
- netdev-dpdk: fix build with unified offload types in DPDK v2.0.0
Note that this breaks compatibility with DPDK v1.8.0
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
When using DPDK rings (dpdkr port type), packet buffers get shared
to consumers of the rings (e.g. Virtual Machines). The packet buffers
also include the RSS hash. This is a hash of a number of fields
in the packet and is used in order to do a fast lookup in the EMC.
However, if a consumer of the packet modifies the packet without
regenerating the RSS hash, the EMC will use the same hash for lookup
even though the packet may belong to a different flow. This would
cause unnecessary collisions in the EMC reducing performance in the
presence of multiple flows.
To avoid receiving an incorrect RSS hash on reception from a DPDK
ring, the RSS hash needs to be reset on transmission. This will reduce
performance of the forwarding path as the RSS hash will need to
calculated for every packet received from an dpdkr but will behave
correctly in the presence of a large number of flows that get
modified by the consumer of a DPDK ring
Signed-off-by: Mark D. Gray <mark.d.gray@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
ovsrcu_synchronize() is used when setting virtio_dev to NULL.
This results in an ovsrcu_quiesce_end() call which means the
cuse thread may not go into quiescent state again for an
indefinite time. Add an ovsrcu_quiesce_start() call to prevent
this.
Signed-off-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
If rte_mempool_create() fails with ENOMEM, try asking for a smaller
mempools. This patch enables OVS DPDK to run on systems without 1GB
hugepages
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ethan Jackson <ethan@nicira.com>