The kernel needs to be at least 4.19-rc7 to include the commit
9d2f67e43b73 ("net/packet: fix packet drop as of virtio gso")
otherwise the TSO packets are dropped when using raw sockets.
Fixes: 29cf9c1b3b ("userspace: Add TCP Segmentation Offload support")
Reported-by: Yi Yang <yangyi01@inspur.com>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Dequeue zero-copy is no longer supported for vhost-user client mode
in DPDK due to commit [1].
In addition to this, zero-copy mode has been proposed to be marked
deprecated in [2] with removal in the next DPDK LTS release.
This commit deprecates support for vhost-user dequeue zero-copy in OVS
with its removal expected in the next OVS release.
[1] 715070ea10e6 ("vhost: prevent zero-copy with incompatible client
mode")
[2] http://mails.dpdk.org/archives/dev/2020-August/177236.html
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Modify travis linux build script to use DPDK 19.11.2 stable release and
update docs to reference 19.11.2 stable release. Update release faq to
reflect latest validated DPDK versions for all branches.
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
This commit adds a section to the dpdk/bridge.rst netdev documentation,
detailing the added DPCLS functionality. The newly added commands are
documented, and sample output is provided.
Running the DPCLS autovalidator with unit tests by default is possible
through re-compiling the autovalidator to have the highest priority at
startup time. This avoids making changes to all tests, and enables
debug and CI builds to validate every lookup implementation with all
unit tests.
Add NEWS updates for CPU ISA, dynamic subtables, and AVX512 lookup.
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
On many systems the check-kmod and check-kernel test suites have
many failures due to the lack of feature support in the older
iproute2 utility packages shipped with those systems. Add a
note indicating that it might be necessary to update the iproute2
utility package in order to fix those errors.
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
'dpdkr' ring ports was deprecated in 2.13 release and was not
actually used for a long time. Remove support now.
More details in
commit b4c5f00c33 ("netdev-dpdk: Deprecate ring ports.")
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Ideally SCTP checksum offload needs be advertised by the
NIC when userspace TSO is enabled. However, very few drivers
do that and it's not a widely used protocol. So, this patch
enables SCTP checksum offload if available, otherwise userspace
TSO can still be enabled but SCTP packets will be dropped on
NICs without support.
Fixes: 29cf9c1b3b ("userspace: Add TCP Segmentation Offload support")
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
When using TSO in OVS-DPDK with an i40e device, the following
patch is required for DPDK, which fixes an issue on the TSO path:
https://patches.dpdk.org/patch/64136/
Document this as a limitation until a DPDK release with the fix
included is supported by OVS.
Also, document best known methods for performance tuning when
testing TSO with the tool iperf.
Fixes: 29cf9c1b3b ("userspace: Add TCP Segmentation Offload support")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Abbreviated as TSO, TCP Segmentation Offload is a feature which enables
the network stack to delegate the TCP segmentation to the NIC reducing
the per packet CPU overhead.
A guest using vhostuser interface with TSO enabled can send TCP packets
much bigger than the MTU, which saves CPU cycles normally used to break
the packets down to MTU size and to calculate checksums.
It also saves CPU cycles used to parse multiple packets/headers during
the packet processing inside virtual switch.
If the destination of the packet is another guest in the same host, then
the same big packet can be sent through a vhostuser interface skipping
the segmentation completely. However, if the destination is not local,
the NIC hardware is instructed to do the TCP segmentation and checksum
calculation.
It is recommended to check if NIC hardware supports TSO before enabling
the feature, which is off by default. For additional information please
check the tso.rst document.
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Tested-by: Ciara Loftus <ciara.loftus.intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
This patch adds a new policer to the DPDK datapath based on RFC 4115's
Two-Rate, Three-Color marker. It's a two-level hierarchical policer
which first does a color-blind marking of the traffic at the queue
level, followed by a color-aware marking at the port level. At the end
traffic marked as Green or Yellow is forwarded, Red is dropped. For
details on how traffic is marked, see RFC 4115.
This egress policer can be used to limit traffic at different rated
based on the queues the traffic is in. In addition, it can also be used
to prioritize certain traffic over others at a port level.
For example, the following configuration will limit the traffic rate at a
port level to a maximum of 2000 packets a second (64 bytes IPv4 packets).
100pps as CIR (Committed Information Rate) and 1000pps as EIR (Excess
Information Rate). High priority traffic is routed to queue 10, which marks
all traffic as CIR, i.e. Green. All low priority traffic, queue 20, is
marked as EIR, i.e. Yellow.
ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
--id=@myqos create qos type=trtcm-policer \
other-config:cir=52000 other-config:cbs=2048 \
other-config:eir=52000 other-config:ebs=2048 \
queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
--id=@dpdk1Q10 create queue \
other-config:cir=41600000 other-config:cbs=2048 \
other-config:eir=0 other-config:ebs=0 -- \
--id=@dpdk1Q20 create queue \
other-config:cir=0 other-config:cbs=0 \
other-config:eir=41600000 other-config:ebs=2048 \
This configuration accomplishes that the high priority traffic has a
guaranteed bandwidth egressing the ports at CIR (1000pps), but it can also
use the EIR, so a total of 2000pps at max. These additional 1000pps is
shared with the low priority traffic. The low priority traffic can use at
maximum 1000pps.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
'dpdkr' a.k.a. DPDK ring ports has really poor support in OVS and not
tested on a regular basis. These ports are intended to work via
shared memory with another DPDK secondary process, but there are lots
of limitations for using this functionality in practice. Most of them
connected with running secondary DPDK application and memory layout
issues. More details are available in DPDK guide:
https://doc.dpdk.org/guides-18.11/prog_guide/multi_proc_support.html#multi-process-limitations
Beside the functional limitations it's also hard to use this
functionality correctly. User must be sure that OVS and secondary DPDK
application are running on different CPU cores, which is hard because
non-PMD threads could float over available CPU cores. This or any
other misconfiguration will likely lead to crash of OVS.
Another problem is that the user must actually build the secondary
application with the same version of DPDK that was used for OVS build.
Above issues are same as we have while using DPDK pdump.
Beside that, current implementation in OVS is not able to free
allocated rings that could lead to memory exhausting.
Initially these ports was added to use with IVSHMEM for a fast
zero-copy HOST<-->VM communication. However, IVSHMEM is not used
anymore. IVSHMEM support was removed from DPDK in 16.11 release
(instructions for IVSHMEM were removed from the OVS docs almost 3 years
ago by commit 90ca71dd31 ("doc: Remove ivshmem instructions.")) and
the patch for QEMU for using regular files as a device backend is no
longer available. That makes DPDK ring ports barely useful in real
virtualization environment.
This patch adds a deprecation warnings for run-time port creation
and documentation. Claiming to completely remove this functionality
from OVS in one of the next releases.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Modify travis linux build script to use the latest DPDK stable release
18.11.5. Update docs for latest DPDK stable releases.
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
The conventional way for packet dumping in OVS is to use ovs-tcpdump
that works via traffic mirroring. DPDK pdump could probably be used
for some lower level debugging, but it is not commonly used for
various reasons.
There are lots of limitations for using this functionality in practice.
Most of them connected with running secondary pdump process and
memory layout issues like requirement to disable ASLR in kernel.
More details are available in DPDK guide:
https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html#multi-process-limitations
Beside the functional limitations it's also hard to use this
functionality correctly. User must be sure that OVS and pdump utility
are running on different CPU cores, which is hard because non-PMD
threads could float over available CPU cores. This or any other
misconfiguration will likely lead to crash of the pdump utility
or/and OVS.
Another problem is that the user must actually have this special pdump
utility in a system and it might be not available in distributions.
This change disables pdump support by default introducing special
configuration option '--enable-dpdk-pdump'. Deprecation warnings will
be shown to users on configuration and in runtime.
Claiming to completely remove this functionality from OVS in one
of the next releases.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
OVS may be unable to transmit packets for multiple reasons on
the userspace datapath and today there is a single counter to
track packets dropped due to any of those reasons. This patch
adds custom software stats for the different reasons packets
may be dropped during tx/rx on the userspace datapath in OVS.
- MTU drops : drops that occur due to a too large packet size
- Qos drops : drops that occur due to egress/ingress QOS
- Tx failures: drops as returned by the DPDK PMD send function
Note that the reason for tx failures is not specified in OVS.
In practice for vhost ports it is most common that tx failures
are because there are not enough available descriptors,
which is usually caused by misconfiguration of the guest queues
and/or because the guest is not consuming packets fast enough
from the queues.
These counters are displayed along with other stats in
"ovs-vsctl get interface <iface> statistics" command and are
available for dpdk and vhostuser/vhostuserclient ports.
Also the existing "tx_retries" counter for vhost ports has been
renamed to "ovs_tx_retries", so that all the custom statistics
that OVS accumulates itself will have the prefix "ovs_". This
will prevent any custom stats names overlapping with
driver/HW stats.
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The DPDK library allows OVS fast access to packet I/O in userspace. It
is not a datapath. This commit avoids using that term.
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
OVS supports OVS Extensions as various vendor messages or as vendor
types in stats or multipart messages. Added a document to describe the
extensions as currently supported by OVS.
Signed-off-by: Ashish Varma <ashishvarma.ovs@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Move back the dpdk-testpmd reference to the right section of this
document so that the link in howto/dpdk does not point to
"vhost-user-client tx retries config".
Fixes: 080f080c3b ("netdev-dpdk: Enable tx-retries-max config.")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
A preceding commit removed the last remaining dependencies on OVN code,
so remove the OVN code.
Acked-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
OVN is separated into its own repo. This commit removes the OVN source,
OVN tests, and OVN documentation. It also removes mentions of OVN from
most documentation. The only place where OVN has been left is in
changelogs/NEWS, since we shouldn't mess with the history of the
project.
There is an exception here. The ovsdb-cluster tests rely on ovn-nbctl
and ovn-sbctl to run. Therefore those ovn utilities, as well as their
dependencies remain in the repo with this commit.
Acked-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
SMC cache was introduced in 2.10 with experimental tag.
SMC cache is a layer of software cache located after EMC
cache. The purpose is to improve the performance of use
cases that many flows missing the EMC cache.
One can enable SMC cache using smc-enable=true option.
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Origins for this patch are captured at
https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048923.html.
Summarizing here, when a test fails, it would be good to pause test execution
and let the developer poke around the system to see current status of system.
As part of this patch, made a small tweaks to ovs-macros.at, so that when test
suite fails, ovs_on_exit() function will be called. And in this function, a check
is made to see if an environment variable to OVS_PAUSE_TEST is set. If it is
set, then test suite is paused and will continue to wait for user input
Ctrl-D. Meanwhile user can poke around the system to see why test case has
failed. Once done with investigation, user can press ctrl-d to cleanup the
test suite.
For example, to re-run test case 139:
export OVS_PAUSE_TEST=1
cd tests/system-userspace-testsuite.dir/139
sudo -E ./run
When error occurs, above command would display something like this:
=====================================================
Set environment variable to use various ovs utilities
export OVS_RUNDIR=/opt/vdasari/Developer/ovs/_build-gcc/tests/system-userspace-testsuite.dir/139
Press ENTER to continue:
=====================================================
And from another window, one can execute ovs-xxx commands like:
export OVS_RUNDIR=/opt/vdasari/Developer/ovs/_build-gcc/tests/system-userspace-testsuite.dir/139
$ ovs-ofctl dump-ports br0
.
.
To be able to pause while performing `make check`, one can do:
$ OVS_PAUSE_TEST=1 make check TESTSUITEFLAGS='-v'
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Vasu Dasari <vdasari@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
vhost tx retries can provide some mitigation against
dropped packets due to a temporarily slow guest/limited queue
size for an interface, but on the other hand when a system
is fully loaded those extra cycles retrying could mean
packets are dropped elsewhere.
Up to now max vhost tx retries have been hardcoded, which meant
no tuning and no way to disable for debugging to see if extra
cycles spent retrying resulted in rx drops on some other
interface.
Add an option to change the max retries, with a value of
0 effectively disabling vhost tx retries.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
vhost tx retries may occur, and it can be a sign that
the guest is not optimally configured.
Add a custom stat so a user will know if vhost tx retries are
occurring and hence give a hint that guest config should be
examined.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
vhost tx retry is applicable to vhost-user and vhost-user-client,
but was in the section that compares them. Also, moved further
down the doc as prefer to have more fundamental info about vhost
nearer the top.
Fixes: 6d6513bfc6 ("doc: Add info on vhost tx retries.")
Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Documents OvS fuzzing effort and performs a rudimentary security
analysis of existing OvS fuzzing harnesses.
Feedback on the documentation and analysis appreciated.
Signed-off-by: Bhargava Shastry <bshas3@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Modify travis linux build script to use the latest DPDK stable release
18.11.2. Update docs for latest DPDK stable releases.
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
OpenFlow 1.5 changed "meter" from an instruction to an action. This commit
supports it properly.
Acked-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Post-copy Live Migration for vHost supported since DPDK 18.11 and
QEMU 2.12. New global config option 'vhost-postcopy-support' added
to control this feature. Ex.:
ovs-vsctl set Open_vSwitch . other_config:vhost-postcopy-support=true
Changing this value requires restarting the daemon. It's safe to
enable this knob even if QEMU doesn't support post-copy LM.
Feature marked as experimental and disabled by default because it may
cause PMD thread hang on destination host on page fault for the time
of page downloading from the source.
Feature is not compatible with 'mlockall' and 'dequeue zero-copy'.
Support added only for vhost-user-client.
Signed-off-by: Liliia Butorina <l.butorina@partner.samsung.com>
Co-authored-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Modify travis linux build script to use the latest
DPDK stable release 18.11.1. Update docs for latest
DPDK stable releases.
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
The commits that implemented these features forgot to update the
documentation.
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Conditional EMC insert helps a lot in scenarios with high numbers
of parallel flows, but in current implementation this option affects
all the threads and ports at once. There are scenarios where we have
different number of flows on different ports. For example, if one
of the VMs encapsulates traffic using additional headers, it will
receive large number of flows but only few flows will come out of
this VM. In this scenario it's much faster to use EMC instead of
classifier for traffic from the VM, but it's better to disable EMC
for the traffic which flows to VM.
To handle above issue introduced 'emc-enable' configurable to
enable/disable EMC on a per-port basis. Ex.:
ovs-vsctl set interface dpdk0 other_config:emc-enable=false
EMC probability kept as is and it works for all the ports with
'emc-enable=true'.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Port rx queues that have not been statically assigned to PMDs are currently
assigned based on periodically sampled load measurements.
The assignment is performed at specific instances – port addition, port
deletion, upon reassignment request via CLI etc.
Due to change in traffic pattern over time it can cause uneven load among
the PMDs and thus resulting in lower overall throughout.
This patch enables the support of auto load balancing of PMDs based on
measured load of RX queues. Each PMD measures the processing load for each
of its associated queues every 10 seconds. If the aggregated PMD load reaches
95% for 6 consecutive intervals then PMD considers itself to be overloaded.
If any PMD is overloaded, a dry-run of the PMD assignment algorithm is
performed by OVS main thread. The dry-run does NOT change the existing
queue to PMD assignments.
If the resultant mapping of dry-run indicates an improved distribution
of the load then the actual reassignment will be performed.
The automatic rebalancing will be disabled by default and has to be
enabled via configuration option. The interval (in minutes) between
two consecutive rebalancing can also be configured via CLI, default
is 1 min.
Following example commands can be used to set the auto-lb params:
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-rebalance-intvl="5"
Co-authored-by: Rohith Basavaraja <rohith.basavaraja@gmail.com>
Co-authored-by: Venkatesan Pradeep <venkatesan.pradeep@ericsson.com>
Signed-off-by: Rohith Basavaraja <rohith.basavaraja@gmail.com>
Signed-off-by: Venkatesan Pradeep <venkatesan.pradeep@ericsson.com>
Signed-off-by: Nitin Katiyar <nitin.katiyar@ericsson.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Tested-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
This was rendering in italics instead of cross-referencing as intended.
Signed-off-by: Stephen Finucane <stephen@that.guru>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This commit adds support for DPDK v18.11, it includes the following
changes.
1. Enable compilation and linkage with dpdk 18.11.0
The following dpdk commits which were introduced after dpdk 17.11.x
require OVS updates to accommodate to the dpdk changes.
- ce17edde ("ethdev: introduce Rx queue offloads API")
- ab3ce1e0 ("ethdev: remove old offload API")
- c06ddf96 ("meter: add configuration profile")
- e58638c3 ("ethdev: fix TPID handling in flow API")
- cd8c7c7c ("ethdev: replace bus specific struct with generic dev")
- ac8d22de ("ethdev: flatten RSS configuration in flow API")
2. Limit configured rss hash functions to only those supported
by the eth device.
3. Set default RSS key in struct action_rss_data, required by OVS
commit- e8a2b5bf ("netdev-dpdk: implement flow offload with rte flow")
when configured with "other_config:hw-offload=true".
4. DEV_RX_OFFLOAD_CRC_STRIP has been removed from DPDK 18.11.
DEV_RX_OFFLOAD_KEEP_CRC can now be used to keep the CRC.
Use the correct flag and check it is supported.
5. rte_eth_dev_attach/detach have been removed from DPDK 18.11.
Replace them with rte_dev_probe/remove.
6. Update docs and travis to use DPDK18.11.
This commit squashes the following commits present on the dpdk-latest
branch:
7f021f902bb3 ("netdev-dpdk: Upgrade to dpdk v18.08")
270d9216f1ed ("netdev-dpdk: Set scatter based on capabilities")
bef2cdc8f412 ("netdev-dpdk: Fix returning the field of malloced struct.")
73c1a65167fc ("redhat: change variable used for non-root user support")
eb485f60ce44 ("dpdk: Update to use DPDK 18.11.")
For credit all authors of the original commits above have been added as
co-authors for this commmit.
From: Ophir Munk <ophirmu@mellanox.com>
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Co-authored-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Co-authored-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Co-authored-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Commit dfaf00e started using the result of dpdk_buf_size() to calculate
the available size on each mbuf, as opposed to using the previous
MBUF_SIZE macro. However, this was calculating the mbuf size by adding
up the MTU with RTE_PKTMBUF_HEADROOM and only then aligning to
NETDEV_DPDK_MBUF_ALIGN. Instead, the accounting for the
RTE_PKTMBUF_HEADROOM should only happen after alignment, as per below.
Before alignment:
ROUNDUP(MTU(1500) + RTE_PKTMBUF_HEADROOM(128), 1024) = 2048
After aligment:
ROUNDUP(MTU(1500), 1024) + 128 = 2176
This might seem insignificant, however, it might have performance
implications in DPDK, where each mbuf is expected to have 2k +
RTE_PKTMBUF_HEADROOM of available space. This is because not only some
NICs have course grained alignments of 1k, they will also take
RTE_PKTMBUF_HEADROOM bytes from the overall available space in an mbuf
when setting up their Rx requirements. Thus, only the "After alignment"
case above would guarantee a 2k of available room, as the "Before
alignment" would report only 1920B.
Some extra information can be found at:
https://mails.dpdk.org/archives/dev/2018-November/119219.html
Note: This has been found by Ian Stokes while going through some
af_packet checks.
Reported-by: Ian Stokes <ian.stokes@intel.com>
Fixes: dfaf00e ("netdev-dpdk: fix mbuf sizing")
Signed-off-by: Tiago Lam <tiago.lam@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
There are numerous factors that must be considered when calculating
the size of an mbuf:
- the data portion of the mbuf must be sized in accordance With Rx
buffer alignment (typically 1024B). So, for example, in order to
successfully receive and capture a 1500B packet, mbufs with a
data portion of size 2048B must be used.
- in OvS, the elements that comprise an mbuf are:
* the dp packet, which includes a struct rte mbuf (704B)
* RTE_PKTMBUF_HEADROOM (128B)
* packet data (aligned to 1k, as previously described)
* RTE_PKTMBUF_TAILROOM (typically 0)
Some PMDs require that the total mbuf size (i.e. the total sum of all
of the above-listed components' lengths) is cache-aligned. To satisfy
this requirement, it may be necessary to round up the total mbuf size
with respect to cacheline size. In doing so, it's possible that the
dp_packet's data portion is inadvertently increased in size, such that
it no longer adheres to Rx buffer alignment. Consequently, the
following property of the mbuf no longer holds true:
mbuf.data_len == mbuf.buf_len - mbuf.data_off
This creates a problem in the case of multi-segment mbufs, where that
assumption is assumed to be true for all but the final segment in an
mbuf chain. Resolve this issue by adjusting the size of the mbuf's
private data portion, as opposed to the packet data portion when
aligning mbuf size to cachelines.
Co-authored-by: Tiago Lam <tiago.lam@intel.com>
Fixes: 4be4d22 ("netdev-dpdk: clean up mbuf initialization")
Fixes: 31b88c9 ("netdev-dpdk: round up mbuf_size to cache_line_size")
CC: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Tiago Lam <tiago.lam@intel.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Remove note regarding zero-copy compatibility with QEMU >= 2.7.
When zero-copy was introduced to OVS it was incompatible with QEMU >=
2.7. This issue has since been fixed in DPDK with commit
803aeecef123 ("vhost: fix dequeue zero copy with virtio1") and
backported to DPDK LTS branches. Remove the reference to this
issue in the zero-copy documentation.
Cc: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Modify travis linux build script to use the latest
DPDK stable release 17.11.4. Update docs for latest
DPDK stable releases.
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
A failure is quite harsh in this scenario. It's better to
simply skip all the tests and let the user look at the logs
to understand the missing hugepages.
Signed-off-by: Bala Sankaran <bsankara@redhat.com>
Co-authored-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
This allows a system that doesn't have a dedicated DPDK nic to
execute some DPDK tests. In this fashion, tests that operate on
virtual ports (such as dpdkvhostuserclient) can be executed in
a wider set of environments.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Co-authored-by: Bala Sankaran <bsankara@redhat.com>
Signed-off-by: Bala Sankaran <bsankara@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Prior to OVS 2.9 automatic assignment of Rxqs to PMDs
(i.e. CPUs) was done by round-robin.
That was changed in OVS 2.9 to ordering the Rxqs based on
their measured processing cycles. This was to assign the
busiest Rxqs to different PMDs, improving aggregate
throughput.
For the most part the new scheme should be better, but
there could be situations where a user prefers a simple
round-robin scheme because Rxqs from a single port are
more likely to be spread across multiple PMDs, and/or
traffic is very bursty/unpredictable.
Add 'pmd-rxq-assign' config to allow a user to select
round-robin based assignment.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>