No need to waste time for fields checking in case DBG disabled.
Additionally sequence of prints replaced with single print
to avoid output interrupting by other log messages.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
This is useful for understanding which flows offloaded to
which ports.
Code refactored a bit to reduce number of casts.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Data pointed by cmap node must not be freed while iterating.
ovsrcu_postpone should be used instead.
CC: Finn Christensen <fc@napatech.com>
Fixes: e8a2b5bf92bb ("netdev-dpdk: implement flow offload with rte flow")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
rte API is not thread safe. We have to get netdev mutex
before uing it and also before using fields of netdev structure.
This is important because offload API used from the separate
thread and could be used at the same time with other netdev
functions called from the main thread.
CC: Finn Christensen <fc@napatech.com>
Fixes: e8a2b5bf92bb ("netdev-dpdk: implement flow offload with rte flow")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
vhost ports are not DPDK eth ports and has no rte_flow API.
Stop calling this API with DPDK_ETH_PORT_ID_INVALID to
avoid time wasting and errors in log.
Additionally, DPDK_FLOW_OFFLOAD_API definition moved to .c
file, because there is no need to expose it in header.
CC: Finn Christensen <fc@napatech.com>
Fixes: e8a2b5bf92bb ("netdev-dpdk: implement flow offload with rte flow")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
The macros are hard to read. This makes it a little more readable.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
In the scenario of XL710, the link speed which stored in the table
of Interface is not 40G. Because the implementation of query of link
speed only support to 10G, the parameter 'current' will be a random
value in the scenario of higher link speed. In this case, incorrect
link speed of XL710 nic will be stored in the database.
Signed-off-by: Xu Binbin <xu.binbin1@zte.com.cn>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Match the prefix and formatting.
Fixes: 8a9562d21a40 ("dpif-netdev: Add DPDK netdev.")
Cc: pshelar@ovn.org
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Configuring flow control at ixgbe netdev-init is throwing error in port
start.
For eg: without this fix, user cannot configure flow control on ixgbe dpdk
port as below,
"
ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk \
options:dpdk-devargs=0000:05:00.1 options:rx-flow-ctrl=true
"
Instead, it must be configured as two different commands,
"
ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk \
options:dpdk-devargs=0000:05:00.1
ovs-vsctl set Interface dpdk0 options:rx-flow-ctrl=true
"
The DPDK ixgbe driver is now validating all the 'rte_eth_fc_conf' fields before
trying to configuring the dpdk ethdev. Hence OVS can no longer set the
'dont care' fields to just '0' as before. This commit make sure all the
'rte_eth_fc_conf' fields are populated with default values before the dev
init.
Also to avoid read error on unsupported ports, the flow control parameters
are now read only when user is trying to configure/update it.
Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Neither of these is a real problem.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
uint8_t values shouldn't be passed to ntohs().
Found by soarse.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
This commit re-introduces the concept of shared mempools as the default
memory model for DPDK devices. Per port mempools are still available but
must be enabled explicitly by a user.
OVS previously used a shared mempool model for ports with the same MTU
and socket configuration. This was replaced by a per port mempool model
to address issues flagged by users such as:
https://mail.openvswitch.org/pipermail/ovs-discuss/2016-September/042560.html
However the per port model potentially requires an increase in memory
resource requirements to support the same number of ports and configuration
as the shared port model.
This is considered a blocking factor for current deployments of OVS
when upgrading to future OVS releases as a user may have to redimension
memory for the same deployment configuration. This may not be possible for
users.
This commit resolves the issue by re-introducing shared mempools as
the default memory behaviour in OVS DPDK but also refactors the memory
configuration code to allow for per port mempools.
This patch adds a new global config option, per-port-memory, that
controls the enablement of per port mempools for DPDK devices.
ovs-vsctl set Open_vSwitch . other_config:per-port-memory=true
This value defaults to false; to enable per port memory support,
this field should be set to true when setting other global parameters
on init (such as "dpdk-socket-mem", for example). Changing the value at
runtime is not supported, and requires restarting the vswitch
daemon.
The mempool sweep functionality is also replaced with the
sweep functionality from OVS 2.9 found in commits
c77f692 (netdev-dpdk: Free mempool only when no in-use mbufs.)
a7fb0a4 (netdev-dpdk: Add mempool reuse/free debug.)
A new document to discuss the specifics of the memory models and example
memory requirement calculations is also added.
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Tiago Lam <tiago.lam@intel.com>
Tested-by: Tiago Lam <tiago.lam@intel.com>
The basic yet the major part of this patch is to translate the "match"
to rte flow patterns. And then, we create a rte flow with MARK + RSS
actions. Afterwards, all packets match the flow will have the mark id in
the mbuf.
The reason RSS is needed is, for most NICs, a MARK only action is not
allowed. It has to be used together with some other actions, such as
QUEUE, RSS, etc. However, QUEUE action can specify one queue only, which
may break the rss. Likely, RSS action is currently the best we could
now. Thus, RSS action is choosen.
For any unsupported flows, such as MPLS, -1 is returned, meaning the
flow offload is failed and then skipped.
Co-authored-by: Yuanhan Liu <yliu@fridaylinux.org>
Signed-off-by: Finn Christensen <fc@napatech.com>
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
Co-authored-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Add a new class op for netdevs to get the block_id if one exists. The
block_id is used in offload ops to group multiple qdiscs together.
Stub calls are made to the new class op (implementation to follow in
further patches). The default block_id of 0 (no block) will be used in
these cases.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
lib/netdev-dpdk.c: In function :
lib/netdev-dpdk.c:2865:49: warning: output may be truncated before the last format character [-Wformat-truncation=]
snprintf(vhost_vring, 16, "vring_%d_size", i);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Suggested-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
The function rte_eth_dev_set_mtu is not supported for all DPDK drivers.
Currently if it is not supported we return an error in
dpdk_eth_dev_queue_setup. There are two issues with this.
(i) A device can still function even if rte_eth_dev_set_mtu is not
supported albeit with the default max rx packet length.
(ii) When ENOTSUP is returned it will not be caught in port_reconfigure()
at the dpif-netdev layer. Port_reconfigure() checks if a netdev_reconfigure()
function is supported for a given netdev and ignores EOPNOTSUPP errors as it
assumes errors of this value mean there is no reconfiguration function.
In this case the reconfiguration function is supported for netdev dpdk but
a function called as part of the reconfigure (rte_eth_dev_set_mtu) may
not be supported.
As this is a corner case, this commit warns a user when
rte_eth_dev_set_mtu is not supported and informs them of the default
max rx packet length that will be used instead.
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Co-author: Michal Weglicki <michalx.weglicki@intel.com>
Tested-By: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Tested-by: Cian Ferriter <cian.ferriter@intel.com>
Virtual functions such as igb_vf and i40e_vf require HW_CRC_STRIP to be
explicitly enabled before configuration, otherwise device configuration
will fail.
This commit achieves this by adding NETDEV_RX_HW_CRC_STRIP to
dpdk_hw_ol_features. When a dpdk device is added, the driver for the
device is examined, if the device is a virtual function enable
HW_CRC_STRIP.
Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
Co-Authored: Ian Stokes <ian.stokes@intel.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
Tested-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Currently the check of "net_nfp" driver while enabling scatter compares
only the first 6 bytes, but "net_nfp" is 7 bytes long.
This change fixes the check by comparing the first 7 bytes.
CC: Pablo Cascón <pablo.cascon@netronome.com>
CC: Simon Horman <simon.horman@netronome.com>
Fixes: 65a87968f4cf ("netdev-dpdk: don't enable scatter for jumbo RX support for nfp")
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: Pablo Cascón <pablo.cascon@netronome.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
When initialization of the DPDK PMD driver fails
(dpdk_eth_dev_init()), the reconfigure_datapath() function will remove
the port from dp_netdev, and the port is not used.
Now when bridge_reconfigure() is called again, no changes to the
previous failing netdev configuration are detected and therefore the
ports gets added to dp_netdev and used uninitialized. This is causing
exceptions...
The fix has two parts to it. First in netdev-dpdk.c we remember if the
DPDK port was started or not, and when calling
netdev_dpdk_reconfigure() we also try re-initialization if the port
was not already active. The second part of the change is in
dpif-netdev.c where it makes sure netdev_reconfigure() is called if
the port needs reconfiguration, as netdev_is_reconf_required() is only
true until netdev_reconfigure() is called (even if it fails).
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
rte_mempool_ops_get_count is not exported by DPDK so it means it
cannot be used by OVS when using DPDK as a shared library.
Remove rte_mempool_ops_get_count but still use rte_mempool_full
and document it's behavior.
Fixes: 91fccdad72a2 ("netdev-dpdk: Free mempool only when no in-use mbufs.")
Reported-by: Timothy Redaelli <tredaelli@redhat.com>
Reported-by: Markos Chandras <mchandras@suse.de>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
It is possible to set LSC detection mode to polling or interrupt mode
for DPDK interfaces. The default is polling mode. To set interrupt mode,
option dpdk-lsc-interrupt has to be set to true.
For detailed description and usage see the dpdk install documentation.
Signed-off-by: Robert Mulik <robert.mulik@ericsson.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
If the caller provides a non-NULL qfill pointer and the netdev
implemementation supports reading the rx queue fill level, the rxq_recv()
function returns the remaining number of packets in the rx queue after
reception of the packet burst to the caller. If the implementation does
not support this, it returns -ENOTSUP instead. Reading the remaining queue
fill level should not substantilly slow down the recv() operation.
A first implementation is provided for ethernet and vhostuser DPDK ports
in netdev-dpdk.c.
This output parameter will be used in the upcoming commit for PMD
performance metrics to supervise the rx queue fill level for DPDK
vhostuser ports.
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Acked-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Currently to RX jumbo packets fails for NICs not supporting scatter.
Scatter is not strictly needed for jumbo RX support. This change fixes
the issue by not enabling scatter only for the PMD/NIC known not to
need it to support jumbo RX.
Note: this change is temporary and not needed for later releases OVS/DPDK
Reported-by: Louis Peens <louis.peens@netronome.com>
Signed-off-by: Pablo Cascón <pablo.cascon@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
DPDK mempools are freed when they are no longer needed.
This can happen when a port is removed or a port's mtu
is reconfigured so that a new mempool is used.
It is possible that an mbuf is attempted to be returned
to a freed mempool from NIC Tx queues and this can lead
to a segfault.
In order to prevent this, only free mempools when they
are not needed and have no in-use mbufs. As this might
not be possible immediately, create a free list of
mempools and sweep it anytime a port tries to get a
mempool.
Fixes: 8d38823bdf8b ("netdev-dpdk: fix memory leak")
Cc: mark.b.kavanagh81@gmail.com
Cc: Ilya Maximets <i.maximets@samsung.com>
Reported-by: Venkatesan Pradeep <venkatesan.pradeep@ericsson.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Presently, if OVS tries to setup more queues than
are allowed by a specific NIC, OVS will handle
this case by retrying with a lower amount of queues.
Rather than reporting initial failed queue setups
in the logs as ERROR, they are reported as INFO but
contain the word 'error'. Unless a user has detailed
knowledge of OVS-DPDK workings, this is confusing.
Let's remove 'error' from the INFO log.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Since 17.11 release DPDK uses uint16 for port_id. Format
strings for printing functions must be updated accordingly.
CC: Mark Kavanagh <mark.b.kavanagh@intel.com>
Fixes: 5e925ccc2a6f ("netdev-dpdk: DPDK v17.11 upgrade")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Zero copy is disabled by default. To enable it, set the 'dq-zero-copy'
option to 'true' when configuring the Interface:
ovs-vsctl set Interface dpdkvhostuserclient0
options:vhost-server-path=/tmp/dpdkvhostuserclient0
options:dq-zero-copy=true
When packets from a vHost device with zero copy enabled are destined for
a single 'dpdk' port, the number of tx descriptors on that 'dpdk' port
must be set to a smaller value. 128 is recommended. This can be achieved
like so:
ovs-vsctl set Interface dpdkport options:n_txq_desc=128
Note: The sum of the tx descriptors of all 'dpdk' ports the VM will send
to should not exceed 128. Due to this requirement, the feature is
considered 'experimental'.
Testing of the patch showed a ~8% improvement when switching 512B
packets between vHost devices on different VMs on the same host when
zero copy was enabled on the transmitting device.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Some NICs have only one PCI address associated with multiple ports. This
patch extends the dpdk-devargs option's format to cater for such devices.
To achieve that, this patch uses a new syntax that will be adapted and
implemented in future DPDK release (likely, v18.05):
http://dpdk.org/ml/archives/dev/2017-December/084234.html
And since it's the DPDK duty to parse the (complete and full) syntax
and this patch is more likely to serve as an intermediate workaround,
here I take a simpler and shorter syntax from it (note it's allowed to
have only one category being provided):
class=eth,mac=00:11:22:33:44:55:66
Also, old compatibility is kept. Users can still go on with using the
PCI id to add a port (if that's enough for them). Meaning, this patch
will not break anything.
This patch is basically based on the one from Ciara:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/339496.html
Cc: Loftus Ciara <ciara.loftus@intel.com>
Cc: Thomas Monjalon <thomas@monjalon.net>
Cc: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
This commit replaces MTU_TO_FRAME_LEN(mtu) with MTU_TO_MAX_FRAME_LEN(mtu)
in netdev_dpdk_set_mtu(), in order to determine if the total length of
the L2 frame with an MTU of ’mtu’ exceeds NETDEV_DPDK_MAX_PKT_LEN.
When setting an MTU we first check if the requested total frame length
(which includes associated L2 overhead) will exceed the maximum
frame length supported in netdev_dpdk_set_mtu(). The frame length is
calculated by MTU_TO_FRAME_LEN as MTU + ETHER_HEADER + ETHER_CRC. The MTU
for the device will be set at a later stage in dpdk_eth_dev_init() using
rte_eth_dev_set_mtu(mtu).
However when using rte_eth_dev_set_mtu(mtu) the calculation used to check
that the frame does not exceed the max frame length for that device varies
between DPDK device drivers. For example ixgbe driver calculates the
frame length for a given MTU as
mtu + ETHER_HDR_LEN + ETHER_CRC_LEN
i40e driver calculates it as
mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + I40E_VLAN_TAG_SIZE * 2
em driver calculates it as
mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + VLAN_TAG_SIZE
Currently it is possible to set an MTU for a netdev_dpdk device that exceeds
the upper limit MTU for that devices DPDK driver. This leads to a segfault.
This is because the frame length comparison as is, does not take into account
the addition of the vlan tag overhead expected in the drivers. The
netdev_dpdk_set_mtu() call will incorrectly succeed but the subsequent
dpdk_eth_dev_init() will fail before the queues have been created for the
DPDK device. This coupled with assumptions regarding reconfiguration
requirements for the netdev will lead to a segfault when the rxq is polled
for this device.
A simple way to avoid this is by using MTU_TO_MAX_FRAME_LEN(mtu) when
validating a requested MTU in netdev_dpdk_set_mtu().
MTU_TO_MAX_FRAME_LEN(mtu) is equivalent to the following:
mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + (2 * VLAN_HEADER_LEN)
By using MTU_TO_MAX_FRAME_LEN at the netdev_dpdk_set_mtu() stage, OvS
now takes into account the maximum L2 overhead that a DPDK driver could
allow for in its frame size calculation. This allows OVS to flag an error
rather than the DPDK driver if the frame length exceeds the max DPDK frame
length. OVS can fail gracefully at this point and use the default MTU of
1500 to continue to configure the port.
Note: this fix is a work around, a better approach would be if DPDK devices
could report the maximum MTU value that can be requested on a per device
basis. This capability however is not currently available. A downside of
this patch is that the MTU upper limit will be reduced by 8 bytes for
DPDK devices that do not need to account for vlan tags in the frame length
driver calculations e.g. ixgbe devices upper MTU limit is reduced from
the OVS point of view from 9710 to 9702.
CC: Mark Kavanagh <mark.b.kavanagh@intel.com>
Fixes: 0072e931 ("netdev-dpdk: add support for jumbo frames")
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Co-authored-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Expose relevant vhost-user information in status.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Tested-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
- New get_custom_stats interface function is added to netdev. It
allows particular netdev implementation to expose custom
counters in dictionary format (counter name/counter value).
- New statistics are retrieved using experimenter code and
are printed as a result to ofctl dump-ports.
- New counters are available for OpenFlow 1.4+.
- New statistics are printed to output via ofctl only if those
are present in reply message.
- New statistics definition is added to include/openflow/intel-ext.h.
- Custom statistics are implemented only for dpdk-physical
port type.
- DPDK-physical implementation uses xstats to collect statistics.
Only dropped and error counters are exposed.
Co-authored-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Not needed anymore because 'may_steal' already handled on
dpif-netdev layer and always true.
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com
New appctl 'netdev-dpdk/get-mempool-info' implemented to get result
of 'rte_mempool_list_dump()' function if no arguments passed and
'rte_mempool_dump()' if DPDK netdev passed as argument.
Could be used for debugging mbuf leaks and other mempool related
issues. Most useful in pair with `grep -v "cache_count.*=0"`.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Tested-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Antonio Fischetti <antonio.fischetti@intel.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
This commit extends netdev_dpdk_get_status API to include additional
driver-related information: if_type and if_descr.
v2->v3: Code rebase.
v3->v4: Minor comments applied.
v5->v6: Adds DPDK port specific description in documentation.
Co-authored-by: Michal Weglicki <michalx.weglicki@intel.com>
Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
Signed-off-by: Przemyslaw Szczerbik <przemyslawx.szczerbik@intel.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
DPDK v17.11 introduces support for the vHost IOMMU feature.
This is a security feature, which restricts the vhost memory
that a virtio device may access.
This feature also enables the vhost REPLY_ACK protocol, the
implementation of which is known to work in newer versions of
QEMU (i.e. v2.10.0), but is buggy in older versions (v2.7.0 -
v2.9.0, inclusive). As such, the feature is disabled by default
in (and should remain so), for the aforementioned older QEMU
verions. Starting with QEMU v2.9.1, vhost-iommu-support can
safely be enabled, even without having an IOMMU device, with
no performance penalty.
This patch adds a new global config option, vhost-iommu-support,
that controls enablement of the vhost IOMMU feature:
ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true
This value defaults to false; to enable IOMMU support, this field
should be set to true when setting other global parameters on init
(such as "dpdk-socket-mem", for example). Changing the value at
runtime is not supported, and requires restarting the vswitch daemon.
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
This commit adds support for DPDK v17.11:
- minor updates to accomodate DPDK API changes
- update references to DPDK version in Documentation
- update DPDK version in travis' linux-build script
- document DPDK v17.11 virtio driver bug
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
Tested-by: Jan Scheurich <jan.scheurich@ericsson.com>
Tested-by: Guoshuai Li <ligs@dtdream.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
The call to rte_eth_dev_count() was added as workaround
for rte_eth_dev_get_port_by_name() not handling cases
when there was no DPDK ports.
In versions of DPDK >= 17.02 rte_eth_dev_get_port_by_name()
does handle this case (DPDK commit f9ae888b1e19).
rte_eth_dev_count() is no longer needed so remove it.
Acked-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
It'll be nice to document current naming convention for variables of
the following types used in netdev-dpdk:
* netdev
* netdev_dpdk
* netdev_rxq
* netdev_rxq_dpdk
to be sure that we will not return to chaos which was before
commit d46285a2206f ("netdev-dpdk: Consistent variable naming.").
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Function 'netdev_dpdk_set_admin_state()' was missed while fixing
variables naming according to the following convention:
'struct netdev':'netdev'
'struct netdev_dpdk':'dev'
'struct netdev_rxq':'rxq'
'struct netdev_rxq_dpdk':'rx'
Fixes: d46285a2206f ("netdev-dpdk: Consistent variable naming.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokess <ian.stokes@intel.com>
Currently mempool name size limited to 25 characters by
RTE_MEMPOOL_NAMESIZE. netdev-dpdk tries to create mempool with the
following name pattern: "ovs_%{hash}_%{socket}_%{mtu}_%{n_mbuf}".
We have 3 chars for "ovs" + 4 chars for delimiters + 8 chars for
hash (because it's the 32 bit integer printed in hex) + 1 char for
socket_id (mostly 1, but it could be 2 on some systems; larger?) = 16.
Only 25 - 16 = 9 characters remains for mtu + n_mbufs.
Minimum usual value for mtu is 1500 --> 2030 (4 chars) after
dpdk_buf_size conversion and the minimum value for n_mbufs is 16384
(5 chars). So, all the 9 characters are used.
If we'll try to create port with mtu = 9500, mempool creation will
fail, because FRAME_LEN_TO_MTU(dpdk_buf_size(9500)) = 10222 (5 chars)
and this value will overflow the RTE_MEMPOOL_NAMESIZE limit.
Same issue will happen if we'll try to create port with big enough
number of queues or will try to create big enough number of PMD
threads (number of tx queues will enlarge the mempool requirements).
Fix that by removing the delimiters. To keep the readability (at least
partial) of the mempool names exact field sizes with zero padding
are used.
Following limits should be suitable for now:
- Hash length: 8 chars (uint32_t in hex)
- Socket ID : 2 chars (For systems with up to 10 sockets)
- MTU : 5 chars (MTU (10^5 - 1) should be enough for now)
- n_mbufs : 7 chars (Up to 10^7 of mbufs)
Total : 22 + 3 (for "ovs") = 25
CC: Antonio Fischetti <antonio.fischetti@intel.com>
CC: Robert Wojciechowicz <robertx.wojciechowicz@intel.com>
Fixes: f06546a51dd8 ("Fix mempool names to reflect socket id.")
Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each port.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Antonio Fischetti <antonio.fischetti@intel.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Tested-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>