Reset the DPDK hwol flags in dp_packet_init_. The new hwol bad checksum
flag is uninitialized for non-dpdk ports and this is noticed as test
failures using netdev-dummy ports, when built with the --with-dpdk
flag set. Hence, in this case, packets may be falsely marked as having a
bad checksum. The existing APIs are simplified at the same time by
making them specific to either DPDK or otherwise; they also now
manage a single field.
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2017-August/045081.html
Fixes: 7451af618e0d ("dp-packet : Update DPDK rx checksum validation functions.")
CC: Sugesh Chandran <sugesh.chandran@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Commit 5dcde09c80a8 was introduced to make detaching more
automatic without using an additional command beyond
ovs-vsctl del-port <br> <port>.
Sometimes, since commit 5dcde09c80a8, dpdk devices are
not detached when del-port is issued; command example:
sudo ovs-vsctl del-port br0 dpdk1
This can happen when vswitchd is (re)started with an existing
database and devices are already bound to dpdk.
A minimal recipe to reproduce the issue is:
1/ Starting with
darrell@prmh-nsx-perf-server125:~$ sudo ovs-vsctl show
1c50d8ee-b17f-4fac-a595-03b0da8c8275
Bridge "br0"
Port "br0"
Interface "br0"
type: internal
Port "dpdk1"
Interface "dpdk1"
type: dpdk
options: {dpdk-devargs="0000:04:00.1"}
Port "dpdk0"
Interface "dpdk0"
type: dpdk
options: {dpdk-devargs="0000:04:00.0"}
darrell@prmh-nsx-perf-server125:~$ /usr/src/dpdk-16.11/tools/dpdk-devbind.py --status
Network devices using DPDK-compatible driver
============================================
0000:04:00.0 'Ethernet Controller 10-Gigabit X540-AT2' drv=uio_pci_generic unused=ixgbe,vfio-pci
0000:04:00.1 'Ethernet Controller 10-Gigabit X540-AT2' drv=uio_pci_generic unused=ixgbe,vfio-pci
2/ restart vswitchd
3/ run
sudo ovs-vsctl del-port br0 dpdk1
and find the interface is NOT detached; there is
no info log ‘Device '0000:04:00.1' detached’.
A more verbose discussion is here:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-June/333462.html
along with another possible solution.
Since we are nearing the end of a release, a safe approach is needed,
at this time.
One approach is to revert 5dcde09c80a8. This patch does not do that
but reinstates the command ovs-appctl netdev-dpdk/detach to handle
cases when del-port will not work.
To detach the device, run the reinstated command
ovs-appctl netdev-dpdk/detach 0000:04:00.1
Observe console output
‘Device '0000:04:00.1' has been detached’
Fixes: 5dcde09c80a8 ("netdev-dpdk: Fix device leak on port deletion.")
CC: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Fischetti, Antonio <antonio.fischetti@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Upgrading to DPDK 17.05.1 stable release adds new
significant features relevant to OVS, including,
but not limited to:
- tun/tap PMD,
- VFIO hotplug support,
- Generic flow API.
Following changes are applied:
- netdev-dpdk: Changes required by DPDK API modifications.
- doc: Because of DPDK API changes, backward compatibility
with previous DPDK releases will be broken, thus all
relevant documentation entries are updated.
- .travis: DPDK version change from 16.11.1 to 17.05.1.
- rhel/openvswitch-fedora.spec.in: DPDK version change
from 16.11 to 17.05.1
Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Tested-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
DPDK provides an API to set the MTU of compatible physical devices -
rte_eth_dev_set_mtu(). Prior to DPDK v16.07 however, this API was not
implemented in some DPDK PMDs (i40e, specifically). To allow the use
of jumbo frames with affected NICs in OvS-DPDK, MTU configuration was
achieved by setting the jumbo frame flag, and corresponding maximum
permitted Rx frame size, in an rte_eth_conf structure for the NIC
port, and subsequently invoking rte_eth_dev_configure() with that
configuration.
However, that method does not set the MTU field of the underlying DPDK
structure (rte_eth_dev) for the corresponding physical device;
consequently, rte_eth_dev_get_mtu() reports the incorrect MTU for an
OvS-DPDK phy device with non-standard MTU.
Resolve this issue by invoking rte_eth_dev_set_mtu() when setting up
or modifying the MTU of a DPDK phy port.
Fixes: 0072e93 ("netdev-dpdk: add support for jumbo frames")
Reported-by: Aaron Conole <aconole@redhat.com>
Reported-by: Vipin Varghese <vipin.varghese@intel.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Acked-by: Sugesh Chandran <sugesh.chandran@intel.com>
Tested-by: Sugesh Chandran <sugesh.chandran@intel.com>
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Rx checksum offload is enabled by default on DPDK NICs where
supported. Previously Rx checksum offload not supported was
logged only once. It meant that if multiple NICs did not
support Rx checksum offload, it was only reported for the
first NIC configured.
Fixes: 1a2bb11817a4 ("netdev-dpdk: Enable Rx checksum offloading feature on DPDK physical ports.")
Reported-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Rx checksum offload is enabled by default on DPDK physical NICs
where available, with reconfiguration through
options:rx-checksum-offload. However, changing rx-checksum-offload
did not result in a reconfiguration of the NIC and wrong status is
reported for it.
As there seems to be diminishing reasons why a user would want
to disable Rx checksum offload, just remove the broken reconfiguration
option.
Fixes: 1a2bb11817a4 ("netdev-dpdk: Enable Rx checksum offloading feature on DPDK physical ports.")
Reported-by: Kevin Traynor <ktraynor@redhat.com>
Suggested-by: Sugesh Chandran <sugesh.chandran@intel.com>
Acked-by: Darrell Ball <dlu998@gmail.com>
Tested-by: Sugesh Chandran <sugesh.chandran@intel.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
In netdev_gre_build_header(), GRE protocol and VXLAN next_potocol is set based
on packet_type of flow. If it's about an Ethernet packet, it is set to
ETP_TYPE_TEB. Otherwise, if the name space is OFPHTN_ETHERNET, it is set
according to the name space type.
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Some pmd driver(e.g: vNIC thunderx PMD) want mbuf_size to be multiple of
cache_line_size. With out this fix, Netdev-dpdk initialization would
fail for those PMD.
Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Tested-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Add a new API interface for offloading dpif flows to netdev.
The API consist on the following:
flow_put - offload a new flow
flow_get - query an offloaded flow
flow_del - delete an offloaded flow
flow_flush - flush all offloaded flows
flow_dump_* - dump all offloaded flows
In upcoming commits we will introduce an implementation of this
API for netdev-linux.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Since vhost-user server mode ports are the preferred mechanism for
interconnecting Open vSwitch with VMs when using DPDK, and since there
are currently no known use cases for vhost-user server mode ports apart
from version incompatibilities with QEMU, announce that server mode ports
are considered deprecated and will be removed in a future release.
Cc: Ciara Loftus <ciara.loftus@intel.com>
Cc: Kevin Traynor <ktraynor@redhat.com>
Suggested-by: Darrell Ball <dball@vmware.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Currently ovs-appctl dpctl/show only shows the Rx checksum offload
status when true. Change to also show the status when false.
CC: Sugesh Chandran <sugesh.chandran@intel.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Currently, signed integer is used for 'port_id' variable and
'-1' as identifier of bad or uninitialized 'port_id'.
This inconsistent with dpdk library and, also, in few cases,
leads to passing '-1' to dpdk functions where uint8_t expected.
Such behaviour doesn't produce any issues, but it's better to
use same type as in dpdk library for consistency.
Introduced 'dpdk_port_t' typedef for better maintainability.
Also, magic number '-1' replaced with DPDK_ETH_PORT_ID_INVALID
macro.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Currently, once created device in dpdk will exist forever
even after del-port operation untill we manually call
'ovs-appctl netdev-dpdk/detach <name>', where <name> is not
the port's name but the name of dpdk eth device or pci address.
Few issues with current implementation:
1. Different API for usual (system) and DPDK devices.
(We have to call 'ovs-appctl netdev-dpdk/detach' each
time after 'del-port' to actually free the device)
This is a big issue mostly for virtual DPDK devices.
2. Follows from 1:
For DPDK devices 'del-port' leads just to
'rte_eth_dev_stop' and subsequent 'add-port' will
just start the already existing device. Such behaviour
will not reset the device to initial state as it could
be expected. For example: virtual pcap pmd will continue
reading input file instead of reading it from the beginning.
3. Follows from 2:
After execution of the following commands 'port1' will be
configured with the 'old-options' while 'ovs-vsctl show'
will show us 'new-options' in dpdk-devargs field:
ovs-vsctl add-port port1 -- set interface port1 type=dpdk \
options:dpdk-devargs=<eth_pmd_name1>,<old-options>
ovs-vsctl del-port port1
ovs-vsctl add-port port1 -- set interface port1 type=dpdk \
options:dpdk-devargs=<eth_pmd_name1>,<new-options>
4. Follows from 1:
Not detached device consumes 'port_id'. Since we have very
limited number of 'port_id's (32 in common case) this may
lead to quick exhausting of id pool and inability to add any
other port.
To avoid above issues we need to detach all the attached devices on
port destruction.
appctl 'netdev-dpdk/detach' removed because not needed anymore.
We need to use internal 'attached' variable to track ports on
which rte_eth_dev_attach() was called and returned successfully
to avoid closing and detaching devices that do not support hotplug or
by any other reason attached using the 'dpdk-extra' cmdline options.
CC: Ciara Loftus <ciara.loftus@intel.com>
Fixes: 55e075e65ef9 ("netdev-dpdk: Arbitrary 'dpdk' port naming")
Fixes: 69876ed78611 ("netdev-dpdk: Add support for virtual DPDK PMDs (vdevs)")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Billy O'Mahony <billy.o.mahony@intel.com>
'devargs' for virtual devices contains not only name but
also a list of arguments like this:
'net_pcap0,rx_pcap=file_rx.pcap,tx_pcap=file_tx.pcap'
or
'eth_af_packet0,iface=eth0'
We must cut off the arguments from this string before calling
'rte_eth_dev_get_port_by_name()' to avoid double attaching of
the same device.
CC: Ciara Loftus <ciara.loftus@intel.com>
Fixes: 69876ed78611 ("netdev-dpdk: Add support for virtual DPDK PMDs (vdevs)")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Tested-by: Billy O'Mahony <billy.o.mahony@intel.com>
Acked-by: Billy O'Mahony <billy.o.mahony@intel.com>
This patch enables already implemented ifInMulticastPkts counter in sFlow for
DPDK interfaces. Metric is retrieved from DPDK by using extended statistic API
and stored in 'multicast' member of netdev_stats structure, which represents
number of incoming packets that were addressed to a multicast address.
Signed-off-by: Przemyslaw Szczerbik <przemyslawx.szczerbik@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Fix line lengths to be <= 79 as per coding style and so that checkpatch
will not show up existing warnings on these files.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
In current implementation port_id is used as an ifindex for all netdev-dpdk
interfaces.
For physical DPDK interfaces using port_id as ifindex causes that '0' is set as
ifindex for 'dpdk0' interface, '1' for 'dpdk1' and so on. For the DPDK vHost
interfaces ifindexes are not even assigned (0 is used by default) due to the
fact that vHost ports don't use port_id field from the DPDK library.
This causes multiple negative side-effects. First of all 0 is an invalid
ifindex value. The other issue is possible overlapping of 'dpdkX' interfaces
ifindex values with the ifindexes of kernel space interfaces which may cause
problems in any external tools that use those values. Neither 'dpdk0', nor any
DPDK vHost interfaces are visible in sFlow collector tools, as all interfaces
with ifindexes smaller than 1 are ignored.
Proposed solution to these issues is to calculate a hash of interface's name
and use calculated value as an ifindex. This way interfaces keep their
ifindexes during OVS-DPDK restarts, ports re-initialization events, etc., show
up in sFlow collectors and meet RFC 2863 specification regarding re-using
ifindex values by the same virtual interfaces and maximum ifindex value.
Signed-off-by: Przemyslaw Lal <przemyslawx.lal@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked by: Darrell Ball <dlu998@gmail.com>
DPDK 16.07 introduced the support for mempool offload support.
rte_pktmbuf_pool_create is the recommended method for creating pktmbuf
pools. Buffer pools created with rte_mempool_create may not get offloaded
to the underlying offloaded mempools.
This patch, changes the rte_mempool_create to use helper wrapper
"rte_pktmbuf_pool_create" provided by dpdk, so that it can leverage
offloaded mempools.
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Acked-by: Jianbo Liu <jianbo.liu@linaro.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This gives much better performance for linux apps in the guest without
affecting dpdk applications in the guest.
I'm creating this patch on the basis of performance results outlined below.
In summary it appears that enabling INDIRECT_DESC on DPDK vHostUser ports
leads to very large increase in performance when using linux stack
applications in the guest with no noticable performance drop for DPDK based
applications in the guest.
Test#1 (VM-VM iperf3 performance)
VMs use DPDK vhostuser ports
OVS bridge is configured for normal action.
OVS version 603381a (on 2.7.0 branch but before release,
also seen on v2.6.0 and v2.6.1)
DPDK v16.11
QEMU v2.5.0 (also seen with v2.7.1)
Results:
INDIRECT_DESC enabled 5.30 Gbit/s
INDIRECT_DESC disabled 0.05 Gbit/s
Test#2 (Phy-VM-Phy RFC2544 Throughput)
DPDK PMDs are polling NIC, DPDK loopback app running in guest.
OVS bridge is configured with port forwarding to VM (via dpdkvhostuser ports).
OVS version 603381a (on 2.7.0 branch but before release),
other versions not tested.
DPDK v16.11
QEMU v2.5.0 (also seen with v2.7.1)
Results:
INDIRECT_DESC enabled 2.75 Mpps @64B pkts (0.176 Gbit/s)
INDIRECT_DESC disabled 2.75 Mpps @64B pkts (0.176 Gbit/s)
Signed-off-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Darrell Ball <dlu998@gmail.com>
netdev_dpdk_mempool_configure obtains a handle to a
DPDK memory pool via a call to dpdk_mp_get. If dpdk_mp_get
fails, the former informs the user that insufficient memory
is available, and returns ENOMEM. However, this is
potentially misleading, as there are a number of reasons why
creation of a mempool can fail (as per rte_mempool_create),
including:
- insufficient memory available
- mempool already exists
- other memory allocation error
Update the error log to reflect this fact, and return rte_errno
in the event of error, instead of ENOMEM.
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Fixes: 0072e931 ("netdev-dpdk: add support for jumbo frames")
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Darrell Ball <dlu998@gmail.com>
The dpdk_mp_get() function can return a NULL pointer which leads to a
segfault when a mempool cannot be created. The lack of a return value
check for the function netdev_dpdk_mempool_configure() when called in
netdev_dpdk_reconfigure() can result in a segfault also as
a NULL pointer for the mempool will be passed to rte_eth_rx_queue_setup().
Fix this by adding appropriate NULL pointer and return value checks to
dpdk_mp_get(), netdev_dpdk_reconfigure() and dpdk_vhost_reconfigure_helper().
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Fixes: 2ae3d542 ("netdev-dpdk: Refactor dpdk_mp_get().")
Fixes: 0072e931 ("netdev-dpdk: add support for jumbo frames")
CC: Daniele Di Proietto <diproiettod@vmware.com>
CC: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
"rx_error" stat for a DPDK interface was calculated with the assumption that
dropped packets due to hardware buffer overload were counted as errors
in DPDK and the rte ierror stat included rte imissed packets i.e.
rx_errors = rte_stats.ierrors - rte_stats.imissed
This results in negative statistic values as imissed packets are no longer
counted as part of ierror since DPDK v.16.04.
Fix this by setting rx_errors equal to ierrors only.
Fixes: 9e3ddd45 (netdev-dpdk: Add some missing statistics.)
CC: Timo Puha <timox.puha@intel.com>)
Reported-by: Stepan Andrushko <stepanx.andrushko@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Some refactoring for _construct() and _destruct() methods:
* Rename netdev_dpdk_init() to common_construct(). init() has a
different meaning in the netdev context.
* Remove DPDK_DEV_ETH and DPDK_DEV_VHOST checks in common_construct()
and move them to different functions
* Introduce common_destruct().
* Avoid taking 'dev->mutex' in construct and destruct: we're guaranteed
to be the only thread with access to the object.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Since commit 55e075e65ef9("netdev-dpdk: Arbitrary 'dpdk' port naming"),
we don't call rte_eth_start() from netdev_open() anymore, we only call
it from netdev_reconfigure(). This commit does that also for 'dpdkr'
devices, and remove some useless code.
Calling rte_eth_start() also from netdev_open() was unnecessary and
wasteful. Not doing it reduces code duplication and makes adding a port
faster (~900ms before the patch, ~400ms after).
Another reason why this is useful is that some DPDK driver might have
problems with reconfiguration. For example, until DPDK commit
8618d19b52b1("net/vmxnet3: reallocate shared memzone on re-config"),
vmxnet3 didn't support being restarted with a different number of
queues.
Technically, the netdev interface changed because before opening rxqs or
calling netdev_send() the user must check if reconfiguration is
required. This patch also documents that, even though no change to the
userspace datapath (the only user) is required.
Lastly, this patch makes sure the errors returned by ofproto_port_add
(which includes the first port reconfiguration) are reported back to the
database.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Calling rte_eth_dev_stop() while the device is running causes a crash.
We could use rte_eth_dev_set_link_down(), but not every PMD implements
that, and I found one NIC where that has no effect.
Instead, this commit checks if the device has the NETDEV_UP flag when
transmitting or receiving (similarly to what we do for vhostuser). I
didn't notice any performance difference with this check in case the
device is up.
An alternative would be to remove the device queues from the pmd threads
tx and receive cache, but that requires reconfiguration and I'd prefer
to avoid it, because the change can come from OpenFlow.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Since 55e075e65ef9("netdev-dpdk: Arbitrary 'dpdk' port naming"),
set_config() is used to identify a DPDK device, so it's better to report
its detailed error message to the user. Tunnel devices and patch ports
rely a lot on set_config() as well.
This commit adds a param to set_config() that can be used to return
an error message and makes use of that in netdev-dpdk and netdev-vport.
Before this patch:
$ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
ovs-vsctl: Error detected while setting up 'dpdk0': dpdk0: could not set
configuration (Invalid argument). See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".
$ ovs-vsctl add-port br0 p+ -- set Interface p+ type=patch
ovs-vsctl: Error detected while setting up 'p+': p+: could not set
configuration (Invalid argument). See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".
$ ovs-vsctl add-port br0 gnv0 -- set Interface gnv0 type=geneve
ovs-vsctl: Error detected while setting up 'gnv0': gnv0: could not set
configuration (Invalid argument). See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".
After this patch:
$ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
ovs-vsctl: Error detected while setting up 'dpdk0': 'dpdk0' is missing
'options:dpdk-devargs'. The old 'dpdk<port_id>' names are not
supported. See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".
$ ovs-vsctl add-port br0 p+ -- set Interface p+ type=patch
ovs-vsctl: Error detected while setting up 'p+': p+: patch type requires
valid 'peer' argument. See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".
$ ovs-vsctl add-port br0 gnv0 -- set Interface gnv0 type=geneve
ovs-vsctl: Error detected while setting up 'gnv0': gnv0: geneve type
requires valid 'remote_ip' argument. See ovs-vswitchd log for
details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".
CC: Ciara Loftus <ciara.loftus@intel.com>
CC: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
We can hotplug attach DPDK ports specified via the 'dpdk-devargs'
option now.
But the socket id of DPDK ports can't be assigned correctly,
it is always 0. The socket id of DPDK ports should be assigned
according to the numa id of the device.
Fixes: 55e075e65ef9e ("netdev-dpdk: Arbitrary 'dpdk' port naming")
Signed-off-by: Binbin Xu <xu.binbin1@zte.com.cn>
Acked-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Prior to this commit, the 'dpdk' port type could only be used for
physical DPDK devices. Now, virtual devices (or 'vdevs') are supported.
'vdev' devices are those which use virtual DPDK Poll Mode Drivers eg.
null, pcap. To add a DPDK vdev, a valid 'dpdk-devargs' must be set for
the given dpdk port. The format expected is 'eth_<driver_name><x>' where
'x' is a number between 0 and RTE_MAX_ETHPORTS -1.
For example to add a port that uses the 'null' DPDK PMD driver:
ovs-vsctl set Interface null0 options:dpdk-devargs=eth_null0
Not all DPDK vdevs have been verified to work at this point in time.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Stephen Finucane <stephen@that.guru> # docs only
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
'dpdk' ports no longer have naming restrictions. Now, instead of
specifying the dpdk port ID as part of the name, the PCI address of the
device must be specified via the 'dpdk-devargs' option. eg.
ovs-vsctl add-port br0 my-port
ovs-vsctl set Interface my-port type=dpdk
options:dpdk-devargs=0000:06:00.3
The user must no longer hotplug attach DPDK ports by issuing the
specific ovs-appctl netdev-dpdk/attach command. The hotplug is now
automatically invoked when a valid PCI address is set in the
dpdk-devargs. The format for ovs-appctl netdev-dpdk/detach command
has changed in that the user now must specify the relevant PCI address
as input instead of the port name.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Co-authored-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Stephen Finucane <stephen@that.guru> # docs only
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
In order to use dpdk ports in ovs they have to be bound to a DPDK
compatible driver before ovs is started.
This patch adds the possibility to hotplug (or hot-unplug) a device
after ovs has been started. The implementation adds two appctl commands:
netdev-dpdk/attach and netdev-dpdk/detach
After the user attaches a new device, it has to be added to a bridge
using the add-port command, similarly, before detaching a device,
it has to be removed using the del-port command.
Signed-off-by: Mauricio Vasquez B <mauricio.vasquezbernal@studenti.polito.it>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Co-authored-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Stephen Finucane <stephen@that.guru> # docs only
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Rename some structures that call themselves ivshmem,
as they are just a collection of dpdk rings and other
information.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Add Rx checksum offloading feature support on DPDK physical ports. By default,
the Rx checksum offloading is enabled if NIC supports. However,
the checksum offloading can be turned OFF either while adding a new DPDK
physical port to OVS or at runtime.
The rx checksum offloading can be turned off by setting the parameter to
'false'. For eg: To disable the rx checksum offloading when adding a port,
'ovs-vsctl add-port br0 dpdk0 -- \
set Interface dpdk0 type=dpdk options:rx-checksum-offload=false'
OR (to disable at run time after port is being added to OVS)
'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=false'
Similarly to turn ON rx checksum offloading at run time,
'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=true'
The Tx checksum offloading support is not implemented due to the following
reasons.
1) Checksum offloading and vectorization are mutually exclusive in DPDK poll
mode driver. Vector packet processing is turned OFF when checksum offloading
is enabled which causes significant performance drop at Tx side.
2) Normally, OVS generates checksum for tunnel packets in software at the
'tunnel push' operation, where the tunnel headers are created. However
enabling Tx checksum offloading involves,
*) Mark every packets for tx checksum offloading at 'tunnel_push' and
recirculate.
*) At the time of xmit, validate the same flag and instruct the NIC to do the
checksum calculation. In case NIC doesnt support Tx checksum offloading,
the checksum calculation has to be done in software before sending out the
packets.
No significant performance improvement noticed with Tx checksum offloading
due to the e overhead of additional validations + non vector packet processing.
In some test scenarios, it introduces performance drop too.
Rx checksum offloading still offers 8-9% of improvement on VxLAN tunneling
decapsulation even though the SSE vector Rx function is disabled in DPDK poll
mode driver.
Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
Acked-by: Jesse Gross <jesse@kernel.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Expected behavior for attribute removal from the database is
resetting it to default value. Currently this doesn't work for
n_rxq/n_txq options of pmd netdevs (last requested value used):
# ovs-vsctl set interface dpdk0 options:n_rxq=4
# ovs-vsctl remove interface dpdk0 options n_rxq
# ovs-appctl dpif/show | grep dpdk0
<...>
dpdk0 1/1: (dpdk: configured_rx_queues=4, <...> \
requested_rx_queues=4, <...>)
Fix that by using NR_QUEUE or 1 as a default value for 'smap_get_int'.
Fixes: a14b8947fd13 ("dpif-netdev: Allow different numbers of
rx queues for different ports.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Tested-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
The copy should be used here.
Additionally, 'strlen' changed to the faster check.
Fixes: 821b86649a90 ("netdev-dpdk: Don't try to unregister empty vhost_id.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
If 'vhost-server-path' not provided for vhostuserclient port,
'netdev_dpdk_vhost_destruct()' will try to unregister an empty string.
This leads to error message in log:
netdev_dpdk|ERR|vhost2: Unable to unregister vhost driver for socket ''.
CC: Ciara Loftus <ciara.loftus@intel.com>
Fixes: 2d24d165d6a5 ("netdev-dpdk: Add new 'dpdkvhostuserclient' port type")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
When OVS&DPDK is used, DPDK doesn't support features 'advertised',
'supported' and 'peer'. If a physical port added to bridge, features
descirbed above can't be assigned, and the values are random.
Signed-off-by: Binbin Xu <xu.binbin1@zte.com.cn>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This commit announces support for DPDK 16.11. Compatibility with DPDK
v16.07 is not broken yet thanks to only minor code changes being needed
for the upgrade.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
'dev->requested_{rxq,txq}_size' and 'dev->{rxq,txq}_size' are
relevant only for DPDK_DEV_ETH devices and should be skipped
in 'netdev_dpdk_get_config()' for other ports.
CC: Ciara Loftus <ciara.loftus@intel.com>
Fixes: b685696b8c81 ("netdev-dpdk: Allow configurable queue sizes for 'dpdk' ports")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
When we set physical port's admin state via ovs-appctl, the application
seems to work and returns "OK". But the application doesn't work perfectly,
the state stored in database doesn't change.
Signed-off-by: Binbin Xu <xu.binbin1@zte.com.cn>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
When we set a vhost port's admin state via ovs-appctl, the
application doesn't work and returns the information
"Not a DPDK Interface".
Signed-off-by: Binbin Xu <xu.binbin1@zte.com.cn>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
qos_conf can be NULL. This can be easily reproduced by setting egress
QoS on a port:
```
ovs-vsctl set port dpdk2 qos=@newqos -- --id=@newqos create qos
type=egress-policer other-config:cir=46000000 other-config:cbs=2048
```
Reported-by: Ian Stokes <ian.stokes@intel.com>
Fixes: 78bd47cf44a5 ("netdev-dpdk: Use RCU for egress QoS.")
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
The rte_pdump header file was not included in the file that requires it.
Fix this.
Fixes: 01961bbdd34a ("dpdk: New module with some code from netdev-dpdk.")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
There's a lot of code in netdev-dpdk which is not at all related to the
netdev interface, mostly the library initialization code.
This commit moves it to a new 'dpdk' module, to simplify 'netdev-dpdk'.
Also a new module 'dpdk-stub' is introduced to implement some functions
when DPDK is not available. This replaces the old 'netdev-nodpdk'
module.
Some redundant includes are removed or reorganized as a consequence.
No functional change.
CC: Aaron Conole <aconole@redhat.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Tested-by: Aaron Conole <aconole@redhat.com>
It is customary to have the vlog module name similar to the filename.
Plus a following commit will introduce a 'dpdk' module.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Tested-by: Aaron Conole <aconole@redhat.com>
It's better to use the classes init() functions to perform
initialization required for classes.
This will make it easier to move dpdk_init__() to a separate module in a
future commit.
No functional change.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Tested-by: Aaron Conole <aconole@redhat.com>
If rte_eal_init() fails, we do not register the DPDK netdev classes,
therefore it's impossible to reach the classes construct functions.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Tested-by: Aaron Conole <aconole@redhat.com>
Since DPDK commit 30e639989227("mempool: support non-EAL thread"),
non-EAL threads can use the mempool API safely. Plus, nonpmd threads
access to netdev is already serialized with 'non_pmd_mutex' in
dpif-netdev.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Tested-by: Aaron Conole <aconole@redhat.com>
We're in the slowpath. I find it easier to allocate and free memory,
than to handle snprintf() error conditions.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Tested-by: Aaron Conole <aconole@redhat.com>
We can run out of hugepage memory coming from rte_*alloc() more easily
than heap coming from malloc().
Therefore:
* We should not use hugepage memory if we're going to access it only in
the slowpath.
* We shouldn't abort if we're out of hugepage memory.
* We should gracefully handle out of memory conditions.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Tested-by: Aaron Conole <aconole@redhat.com>