Support offload of output action, also configuring count action for
allowing query statistics of HW offloaded flows.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
This patch adds a new policer to the DPDK datapath based on RFC 4115's
Two-Rate, Three-Color marker. It's a two-level hierarchical policer
which first does a color-blind marking of the traffic at the queue
level, followed by a color-aware marking at the port level. At the end
traffic marked as Green or Yellow is forwarded, Red is dropped. For
details on how traffic is marked, see RFC 4115.
This egress policer can be used to limit traffic at different rated
based on the queues the traffic is in. In addition, it can also be used
to prioritize certain traffic over others at a port level.
For example, the following configuration will limit the traffic rate at a
port level to a maximum of 2000 packets a second (64 bytes IPv4 packets).
100pps as CIR (Committed Information Rate) and 1000pps as EIR (Excess
Information Rate). High priority traffic is routed to queue 10, which marks
all traffic as CIR, i.e. Green. All low priority traffic, queue 20, is
marked as EIR, i.e. Yellow.
ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
--id=@myqos create qos type=trtcm-policer \
other-config:cir=52000 other-config:cbs=2048 \
other-config:eir=52000 other-config:ebs=2048 \
queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
--id=@dpdk1Q10 create queue \
other-config:cir=41600000 other-config:cbs=2048 \
other-config:eir=0 other-config:ebs=0 -- \
--id=@dpdk1Q20 create queue \
other-config:cir=0 other-config:cbs=0 \
other-config:eir=41600000 other-config:ebs=2048 \
This configuration accomplishes that the high priority traffic has a
guaranteed bandwidth egressing the ports at CIR (1000pps), but it can also
use the EIR, so a total of 2000pps at max. These additional 1000pps is
shared with the low priority traffic. The low priority traffic can use at
maximum 1000pps.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
This reverts commit 9e334d91b3.
This commit is not needed since OVS doesn't use six anymore.
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Since Python 2 support was removed in 1ca0323e7c ("Require Python 3 and
remove support for Python 2."), python3-six is not needed anymore.
Moreover python3-six is not available on RHEL/CentOS7 without using EPEL
and so this patch is needed in order to release OVS 2.13 on RHEL7.
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
'dpdkr' a.k.a. DPDK ring ports has really poor support in OVS and not
tested on a regular basis. These ports are intended to work via
shared memory with another DPDK secondary process, but there are lots
of limitations for using this functionality in practice. Most of them
connected with running secondary DPDK application and memory layout
issues. More details are available in DPDK guide:
https://doc.dpdk.org/guides-18.11/prog_guide/multi_proc_support.html#multi-process-limitations
Beside the functional limitations it's also hard to use this
functionality correctly. User must be sure that OVS and secondary DPDK
application are running on different CPU cores, which is hard because
non-PMD threads could float over available CPU cores. This or any
other misconfiguration will likely lead to crash of OVS.
Another problem is that the user must actually build the secondary
application with the same version of DPDK that was used for OVS build.
Above issues are same as we have while using DPDK pdump.
Beside that, current implementation in OVS is not able to free
allocated rings that could lead to memory exhausting.
Initially these ports was added to use with IVSHMEM for a fast
zero-copy HOST<-->VM communication. However, IVSHMEM is not used
anymore. IVSHMEM support was removed from DPDK in 16.11 release
(instructions for IVSHMEM were removed from the OVS docs almost 3 years
ago by commit 90ca71dd31 ("doc: Remove ivshmem instructions.")) and
the patch for QEMU for using regular files as a device backend is no
longer available. That makes DPDK ring ports barely useful in real
virtualization environment.
This patch adds a deprecation warnings for run-time port creation
and documentation. Claiming to completely remove this functionality
from OVS in one of the next releases.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Modify travis linux build script to use the latest DPDK stable release
18.11.5. Update docs for latest DPDK stable releases.
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Until now there was only two options for XDP mode in OVS: SKB or DRV.
i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.
Devices like 'veth' interfaces in Linux supports native XDP, but
doesn't support zero-copy mode. This case can not be covered by
existing API and we have to use slower generic XDP for such devices.
There are few more issues, e.g. TCP is not supported in generic XDP
mode for veth interfaces due to kernel limitations, however it is
supported in native mode.
This change introduces ability to use native XDP without zero-copy
along with best-effort configuration option that enabled by default.
In best-effort case OVS will sequentially try different modes starting
from the fastest one and will choose the first acceptable for current
interface. This will guarantee the best possible performance.
If user will want to choose specific mode, it's still possible by
setting the 'options:xdp-mode'.
This change additionally changes the API by renaming the configuration
knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
themselves to be more user-friendly.
The full list of currently supported modes:
* native-with-zerocopy - former DRV
* native - new one, DRV without zero-copy
* generic - former SKB
* best-effort - new one, chooses the best available from
3 above modes
Since 'best-effort' is a default mode, users will not need to
explicitely set 'xdp-mode' in most cases.
TCP related tests enabled back in system afxdp testsuite, because
'best-effort' will choose 'native' mode for veth interfaces
and this mode has no issues with TCP.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
The conventional way for packet dumping in OVS is to use ovs-tcpdump
that works via traffic mirroring. DPDK pdump could probably be used
for some lower level debugging, but it is not commonly used for
various reasons.
There are lots of limitations for using this functionality in practice.
Most of them connected with running secondary pdump process and
memory layout issues like requirement to disable ASLR in kernel.
More details are available in DPDK guide:
https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html#multi-process-limitations
Beside the functional limitations it's also hard to use this
functionality correctly. User must be sure that OVS and pdump utility
are running on different CPU cores, which is hard because non-PMD
threads could float over available CPU cores. This or any other
misconfiguration will likely lead to crash of the pdump utility
or/and OVS.
Another problem is that the user must actually have this special pdump
utility in a system and it might be not available in distributions.
This change disables pdump support by default introducing special
configuration option '--enable-dpdk-pdump'. Deprecation warnings will
be shown to users on configuration and in runtime.
Claiming to completely remove this functionality from OVS in one
of the next releases.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
OVS may be unable to transmit packets for multiple reasons on
the userspace datapath and today there is a single counter to
track packets dropped due to any of those reasons. This patch
adds custom software stats for the different reasons packets
may be dropped during tx/rx on the userspace datapath in OVS.
- MTU drops : drops that occur due to a too large packet size
- Qos drops : drops that occur due to egress/ingress QOS
- Tx failures: drops as returned by the DPDK PMD send function
Note that the reason for tx failures is not specified in OVS.
In practice for vhost ports it is most common that tx failures
are because there are not enough available descriptors,
which is usually caused by misconfiguration of the guest queues
and/or because the guest is not consuming packets fast enough
from the queues.
These counters are displayed along with other stats in
"ovs-vsctl get interface <iface> statistics" command and are
available for dpdk and vhostuser/vhostuserclient ports.
Also the existing "tx_retries" counter for vhost ports has been
renamed to "ovs_tx_retries", so that all the custom statistics
that OVS accumulates itself will have the prefix "ovs_". This
will prevent any custom stats names overlapping with
driver/HW stats.
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
At the same time disambiguate some feature descriptions.
'Meters' is changed to 'Meter action' to clarify that the entry
describes the Openflow meter action rather than port based meters.
'NAT' is changed to 'Conntrack NAT' to indicate that this entry
represents NAT done in 'conntrack', rather than basic Openflow
IP address and L4 port modifications.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
The patch adds support for using need_wakeup flag in AF_XDP rings.
A new option, use-need-wakeup, is added. When this option is used,
it means that OVS has to explicitly wake up the kernel RX, using poll()
syscall and wake up TX, using sendto() syscall. This feature improves
the performance by avoiding unnecessary sendto syscalls for TX.
For RX, instead of kernel always busy-spinning on fille queue, OVS wakes
up the kernel RX processing when fill queue is replenished.
The need_wakeup feature is merged into Linux kernel bpf-next tee with commit
77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings") and
OVS enables it by default, if libbpf supports it. If users enable it but
runs in an older version of libbpf, then the need_wakeup feature has no effect,
and a warning message is logged.
For virtual interface, it's better set use-need-wakeup=false, since
the virtual device's AF_XDP xmit is synchronous: the sendto syscall
enters kernel and process the TX packet on tx queue directly.
On Intel Xeon E5-2620 v3 2.4GHz system, performance of physical port
to physical port improves from 6.1Mpps to 7.3Mpps.
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Since Python 3 is now mandatory, Extra Packages for Enterprise Linux
(EPEL) repository is needed in order to build OVS on RHEL7.
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
The DPDK library allows OVS fast access to packet I/O in userspace. It
is not a datapath. This commit avoids using that term.
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Some users would find it useful to know the particular OVS version that
introduced a feature to the OVS tree kernel module or to the OVS
userspace (DPDK) datapath implementation. This patch updates the FAQ
to include that information.
This information is primarily gleaned from the top-level NEWS file.
For most of these, I did not verify them by looking carefully through
the history, so some of them may be inaccurate, although a few people
made corrections in review.
Requested-by: Jianjun Shen <shenj@vmware.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This workaround only applied to kernels earlier than 2.6.37, but OVS
only supports 3.10 and later.
As the original author of this code, I won't miss it.
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
OVS supports OVS Extensions as various vendor messages or as vendor
types in stats or multipart messages. Added a document to describe the
extensions as currently supported by OVS.
Signed-off-by: Ashish Varma <ashishvarma.ovs@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Move back the dpdk-testpmd reference to the right section of this
document so that the link in howto/dpdk does not point to
"vhost-user-client tx retries config".
Fixes: 080f080c3b ("netdev-dpdk: Enable tx-retries-max config.")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Someone else wrote this script originally, I think, but I've extended
it quite a bit.
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This commit fixes potential unintended mistake in userspace-tunneling guide: for the example in userspace-tunneling guide, there is no bridge named "br-eth1", but only a bridge name "br-phy" which has a port named "eth1"
Signed-off-by: Liu Chang <liuchang@cmss.chinamobile.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
A preceding commit removed the last remaining dependencies on OVN code,
so remove the OVN code.
Acked-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Python 2 reaches end-of-life on January 1, 2020, which is only
a few months away. This means that OVS needs to stop depending
on in the next release that should occur roughly that same time.
Therefore, this commit removes all support for Python 2. It
also makes Python 3 a mandatory build dependency.
Some of the interesting consequences:
- HAVE_PYTHON, HAVE_PYTHON2, and HAVE_PYTHON3 conditionals have
been removed, since we now know that Python3 is available.
- $PYTHON and $PYTHON2 are removed, and $PYTHON3 is always
available.
- Many tests for Python 2 support have been removed, and the ones
that depended on Python 3 now run unconditionally. This allowed
several macros in the testsuite to be removed, making the code
clearer. This does make some of the changes to the testsuite
files large due to indentation level changes.
- #! lines for Python now use /usr/bin/python3 instead of
/usr/bin/python.
- Packaging depends on Python 3 packages.
Acked-by: Numan Siddique <nusiddiq@redhat.com>
Tested-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This patch first defines the dpif interface for a datapath to support
adding, deleting, getting and dumping conntrack timeout policy.
The timeout policy is identified by a 4 bytes unsigned integer in
datapath, and it currently support timeout for TCP, UDP, and ICMP
protocols.
Moreover, this patch provides the implementation for Linux kernel
datapath in dpif-netlink.
In Linux kernel, the timeout policy is maintained per L3/L4 protocol,
and it is identified by 32 bytes null terminated string. On the other
hand, in vswitchd, the timeout policy is a generic one that consists of
all the supported L4 protocols. Therefore, one of the main task in
dpif-netlink is to break down the generic timeout policy into 6
sub policies (ipv4 tcp, udp, icmp, and ipv6 tcp, udp, icmp),
and push down the configuration using the netlink API in
netlink-conntrack.c.
This patch also adds missing symbols in the windows datapath so
that the build on windows can pass.
Appveyor CI:
* https://ci.appveyor.com/project/YiHungWei/ovs/builds/26387754
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
We don't own ovs.org, and I doubt Ojai Valley School would enjoy
receiving our email.
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Add support in ovsdb-tool for migrating clustered dbs to standalone dbs.
E.g. usage to migrate nb/sb db to standalone db from raft:
ovsdb-tool cluster-to-standalone ovnnb_db.db ovnnb_db_cluster.db
Acked-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Aliasgar Ginwala <aginwala@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
There's nothing in OVS specific to Sphinx for Python 2, but the
compile-time check only looked for a binary named "sphinx-build", which is
typically provided only for Python 2. With Python 3, the binary is
typically called "sphinx-build-3". With this commit, either name is
accepted.
Acked-by: Numan Siddique <nusididq@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
OVN is separated into its own repo. This commit removes the OVN source,
OVN tests, and OVN documentation. It also removes mentions of OVN from
most documentation. The only place where OVN has been left is in
changelogs/NEWS, since we shouldn't mess with the history of the
project.
There is an exception here. The ovsdb-cluster tests rely on ovn-nbctl
and ovn-sbctl to run. Therefore those ovn utilities, as well as their
dependencies remain in the repo with this commit.
Acked-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
1. Correct typo where it should be ovsdb-client backup vs ovsdb-tool backup.
2. Update for which case will ovsdb-client not work.
Acked-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Aliasgar Ginwala <aginwala@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This patch store the latest election timer in snapshot during log
compression, and when server restarts it reads the value from the log.
Without this, any previous changes to election timer will be lost
in the log, and if server restarts, it will use the default value
instead of the changed value.
Fixes: commit 8e35461 ("ovsdb raft: Support leader election time change online.")
Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
1. Start OVS components in containers so that building and shipping
of OVS components is easy.
2. Load OVS kernel modules on host from container to avoid installing ovs
on host.
3. Update documentation about how to build/run ovs in docker.
Acked-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: aginwala <aginwala@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
A new unixctl command cluster/change-election-timer is implemented to
change leader election timeout base value according to the scale needs.
The change takes effect upon consensus of the cluster, implemented through
the append-request RPC. A new field "election-timer" is added to raft log
entry for this purpose.
Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
The patch introduces experimental AF_XDP support for OVS netdev.
AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket
type built upon the eBPF and XDP technology. It is aims to have comparable
performance to DPDK but cooperate better with existing kernel's networking
stack. An AF_XDP socket receives and sends packets from an eBPF/XDP program
attached to the netdev, by-passing a couple of Linux kernel's subsystems
As a result, AF_XDP socket shows much better performance than AF_PACKET
For more details about AF_XDP, please see linux kernel's
Documentation/networking/af_xdp.rst. Note that by default, this feature is
not compiled in.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
SMC cache was introduced in 2.10 with experimental tag.
SMC cache is a layer of software cache located after EMC
cache. The purpose is to improve the performance of use
cases that many flows missing the EMC cache.
One can enable SMC cache using smc-enable=true option.
Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Origins for this patch are captured at
https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048923.html.
Summarizing here, when a test fails, it would be good to pause test execution
and let the developer poke around the system to see current status of system.
As part of this patch, made a small tweaks to ovs-macros.at, so that when test
suite fails, ovs_on_exit() function will be called. And in this function, a check
is made to see if an environment variable to OVS_PAUSE_TEST is set. If it is
set, then test suite is paused and will continue to wait for user input
Ctrl-D. Meanwhile user can poke around the system to see why test case has
failed. Once done with investigation, user can press ctrl-d to cleanup the
test suite.
For example, to re-run test case 139:
export OVS_PAUSE_TEST=1
cd tests/system-userspace-testsuite.dir/139
sudo -E ./run
When error occurs, above command would display something like this:
=====================================================
Set environment variable to use various ovs utilities
export OVS_RUNDIR=/opt/vdasari/Developer/ovs/_build-gcc/tests/system-userspace-testsuite.dir/139
Press ENTER to continue:
=====================================================
And from another window, one can execute ovs-xxx commands like:
export OVS_RUNDIR=/opt/vdasari/Developer/ovs/_build-gcc/tests/system-userspace-testsuite.dir/139
$ ovs-ofctl dump-ports br0
.
.
To be able to pause while performing `make check`, one can do:
$ OVS_PAUSE_TEST=1 make check TESTSUITEFLAGS='-v'
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Vasu Dasari <vdasari@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
vhost tx retries can provide some mitigation against
dropped packets due to a temporarily slow guest/limited queue
size for an interface, but on the other hand when a system
is fully loaded those extra cycles retrying could mean
packets are dropped elsewhere.
Up to now max vhost tx retries have been hardcoded, which meant
no tuning and no way to disable for debugging to see if extra
cycles spent retrying resulted in rx drops on some other
interface.
Add an option to change the max retries, with a value of
0 effectively disabling vhost tx retries.
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>