2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-28 21:07:47 +00:00

581 Commits

Author SHA1 Message Date
Michael Phelan
9b1a897e25 dpdk: Use DPDK 21.11.2 release.
Update OVS CLI and relevant documentation to use DPDK 21.11.2.

DPDK 21.11.2 contains fixes for the CVEs listed below:
CVE-2022-28199 [1]
CVE-2022-2132 [2]

A bug was introduced in DPDK 21.11.1 by the commit

01e3dee29c02 ("vhost: fix unsafe vring addresses modifications").

This bug can cause a deadlock when vIOMMU is enabled and NUMA
reallocation of the virtqueues happen.
A fix [3] has been posted and pushed to the DPDK 21.11 branch.

If a user wishes to avoid the issue then it is recommended to use
DPDK 21.11.0 until the release of DPDK 21.11.3.
It should be noted that DPDK 21.11.0 does not benefit from the
numerous bug and CVE fixes addressed since its release.
If a user wishes to benefit from these fixes it is recommended to
use DPDK 21.11.2.

[1] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-28199
[2] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-2132
[3] https://patches.dpdk.org/project/dpdk/patch/20220725203206.427083-2-david.marchand@redhat.com/

Signed-off-by: Michael Phelan <michael.phelan@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2022-10-04 10:03:57 +01:00
ldejing
7af5c33c16 datapath-windows: Add IPv6 conntrack ip fragment support on windows
Implementation on Windows:
IPv6 conntrack ip fragment feature use a link list to store ip
fragment. When ipv6 fragment module receives a fragment packet,
it will store length of the fragment, until to the received length
equal to the packet length before fragmented, it will reassemble
fragment packet to a complete packet and send the complete packet
to conntrack module. After conntrack processed the packet, fragment
module will divide the complete packet into small fragment and send
it to destination. Currently, ipv6 was implemented in a indenpent
module, for the reason it can reduce the risk of introduce bug to
ipv4 fragmenb module.

Testing Topology:
On the Windows VM runs on the ESXi host, two hyper-v ports attached
to the ovs bridge; one hyper-v port worked as client and the
other port worked as server.

Testing Case:
1.UdpV6
  a) UdpV6 fragment with multiple ipv6 extension fields.
  b) UdpV6 fragment in normal scenario.
  c) UdpV6 fragment in nat scenario.

2.IcmpV6
  a) IcmpV6 fragment in normal scenario.
  b) IcmpV6 fragment in nat scenario.

Signed-off-by: ldejing <ldejing@vmware.com>
Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
2022-09-20 02:40:03 +03:00
ldejing
54a618f0bd datapath-windows: Alg support for ftp and tftp in conntrack
This patch mainly support alg field in ct action when process
ftp/tftp traffic. Tftp with alg mainly parse the tftp packet
 (IPv4/IPv6), extract connect info from the tftp packet and
 create the related connection. For ftp, previous version has
 supported process of ftp traffic. However, previous version
 regard traffic from or to port 21 as ftp traffic, this is
 incorrect in some scenario. This version adds alg field in ct for
 ftp traffic, we could use ct(alg=ftp) to process any ftp traffic
 from/to any port.

IPv4/IPv6.

Test cases:
    1) ftp ipv4/ipv6 use alg field in the normal and nat scenario.
    2) tftp ipv4/ipv6 use alg field in the normal and nat scenario.

Signed-off-by: ldejing <ldejing@vmware.com>
Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
2022-09-20 02:27:20 +03:00
Ilya Maximets
8f62b45e40 releases: Mark 2.17 as a new LTS release.
With release of OVS v3.0.0, according to our release process,
2.17.x becomes a new LTS series.

Acked-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-08-15 21:25:58 +02:00
Ilya Maximets
516f181a21 docs: Remove remaining references to OVS kmod and XenServer.
README file still mentions a kernel module and some parts of
the documentation still have XenServer references, e.g. 'xs-*'
database configuration options.  Removing them.

Fixes: 422e90437854 ("make: Remove the Linux datapath.")
Fixes: 83c9518e7c67 ("xenserver: Remove xenserver.")
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-08-15 19:46:00 +02:00
Greg Rose
83c9518e7c xenserver: Remove xenserver.
Remove the current xenserver implementation - it is obsolete and
since 3.0 we do not support kernel module builds [1].

1. https://mail.openvswitch.org/pipermail/ovs-dev/2022-July/395789.html

[i.maximets]
Can be added back if people willing to maintain it will be found.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-08-15 13:07:13 +02:00
Ilya Maximets
e2e8d7cd31 Prepare for 3.0.0.
Acked-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-07-15 23:18:54 +02:00
Adrian Moreno
445dceb884 python: Introduce unit tests.
Use pytest to run unit tests as part of the standard testsuite.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-07-15 20:14:24 +02:00
Adrian Moreno
7e588e82f0 python: Add flow filtering syntax.
Based on pyparsing, create a very simple filtering syntax.

It supports basic logic statements (and, &, or, ||, not, !), numerical
operations (<, >), equality (=, !=), and masking (~=). The latter is only
supported in certain fields (IntMask, EthMask, IPMask).

Masking operation is semantically equivalent to "includes",
therefore:

    ip_src ~= 192.168.1.1

means that ip_src field is either a host IP address equal to 192.168.1.1
or an IPMask that includes it (e.g: 192.168.1.1/24).

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-07-15 20:14:24 +02:00
Adrian Moreno
dcd17a896c python: Add mask, ip and eth decoders.
Add more decoders that can be used by KVParser.

For IPv4 and IPv6 addresses, create a new class that wraps
netaddr.IPAddress.
For Ethernet addresses, create a new class that wraps netaddr.EUI.
For Integers, create a new class that performs basic bitwise mask
comparisons

netaddr is added as a new shoft dependency:
- extras_require in setup.py
- Suggests in deb and rpm packages

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-07-15 17:40:46 +02:00
Greg Rose
3476bd3932 Documentation: Remove kernel module documentation.
As of Open vSwitch release 2.18 the OVS kernel module is no longer
supported.  Pull the documentation references.

Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-07-15 13:45:55 +02:00
Greg Rose
86642de3ad tests: Remove support for check-kmod test.
The OVS kernel module is no longer supported as of OVS 2.18

Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-07-15 13:45:55 +02:00
Greg Rose
c94ae8a754 rhel: Stop packaging OVS kernel module.
Remove the kernel driver specification for RHEL 6.x, 7.x, 8.x and Fedora.

Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-07-15 13:45:55 +02:00
Ilya Maximets
16bec677aa debian: Add option to build without DPDK.
Co-authored-by: Frode Nordahl <frode.nordahl@canonical.com>
Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-07-15 13:45:55 +02:00
Frode Nordahl
c78e7efa7b docs: Update package references in debian/ubuntu related docs.
On the back of changing the debian/ubuntu packaging, update the
docs to refer to existing packages.

Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-07-15 13:45:55 +02:00
Emma Finn
529af67146 odp-execute: Add ISA implementation of actions.
This commit adds the AVX512 implementation of the action functionality.

Usage:
  $ ovs-appctl odp-execute/action-impl-set avx512

Signed-off-by: Emma Finn <emma.finn@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2022-07-15 11:40:20 +01:00
Kevin Traynor
3757e9f8e9 netdev-dpdk: Add shared mempool config.
Mempools may currently be shared between DPDK ports based
on port MTU and NUMA. With some hint from the user we can
increase the sharing on MTU and hence reduce memory
consumption in many cases.

For example, a port with MTU 9000, uses a mempool with an
mbuf size based on 9000 MTU. A port with MTU 1500, uses a
different mempool with an mbuf size based on 1500 MTU.

In this case, assuming same NUMA, both these ports could
share the 9000 MTU mempool.

The user must give a hint as order of creation of ports and
setting of MTUs may vary and we need to ensure that upgrades
from older OVS versions do not require more memory.

This scheme can also prevent multiple mempools being created
for cases where a port is added picking up a default MTU and
an appropriate mempool, but later has it's MTU changed to a
different value requiring a different mempool.

Example usage:

 $ ovs-vsctl --no-wait set Open_vSwitch . \
   other_config:shared-mempool-config=9000,1500:1,6000:1

Port added on NUMA 0:
* MTU 1500, use mempool based on 9000 MTU
* MTU 5000, use mempool based on 9000 MTU
* MTU 9000, use mempool based on 9000 MTU
* MTU 9300, use mempool based on 9300 MTU (existing behaviour)

Port added on NUMA 1:
* MTU 1500, use mempool based on 1500 MTU
* MTU 5000, use mempool based on 6000 MTU
* MTU 9000, use mempool based on 9000 MTU
* MTU 9300, use mempool based on 9300 MTU (existing behaviour)

Default behaviour is unchanged and mempools are still only created
when needed.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2022-07-14 13:17:59 +01:00
Jianbo Liu
eb8ebf8c43 doc: Add meter offload topic document
For now, add introduction and the limitation of meter offload.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
2022-07-11 11:23:00 +02:00
Michael Phelan
87ef13b002 dpdk: Use DPDK 21.11.1 release.
Modify ci linux build script to use the latest DPDK stable release 21.11.1.
Modify Documentation to use the latest DPDK stable release 21.11.1.
Update NEWS file to reflect the latest DPDK stable release 21.11.1.
FAQ is updated to reflect the latest DPDK for each OVS branch.

Signed-off-by: Michael Phelan <michael.phelan@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-05-30 23:21:46 +02:00
Kumar Amber
738c76a503 dpcls: Change info-get function to fetch dpcls usage stats.
Modified the dplcs info-get command output to include
the count for different dpcls implementations.

$ovs-appctl dpif-netdev/subtable-lookup-info-get

Available dpcls implementations:
  autovalidator (Use count: 1, Priority: 5)
  generic (Use count: 0, Priority: 1)
  avx512_gather (Use count: 0, Priority: 3)

Test case to verify changes:
        1061: PMD - dpcls configuration     ok

Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com>
Co-authored-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2022-05-24 09:53:18 +01:00
Kevin Traynor
9dd3031d2e Documentation: Fix use of rst verbatim code chunk syntax.
In some places it is using Markdown syntax and in others
it is not needed as there is already a code block.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-05-04 21:47:18 +02:00
Cian Ferriter
b91025187b Documentation: Clarify QEMU version requirement.
The QEMU version requirement of >= 2.7 is for vhost-user-client ports
specifically.

Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-05-04 21:31:13 +02:00
Eli Britstein
6882c45d43 docs: Note ALLOW_EXPERIMENTAL_API for tunnel offloads.
Tunnel offload APIs have '__rte_experimental' attribute, therefore
available only if ALLOW_EXPERIMENTAL_API is defined. Documente it.

Signed-off-by: Eli Britstein <elibr@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-05-04 21:20:06 +02:00
Andreas Karis
e8515c8cc0 ovs-monitor-ipsec: Allow custom options per tunnel.
Tunnels in LibreSwan and OpenSwan allow for many options to be set on a
per tunnel basis. Pass through any options starting with ipsec_ to the
connection in the configuration file. Administrators are responsible for
picking valid key/value pairs.

Signed-off-by: Andreas Karis <ak.karis@gmail.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-05-04 16:30:21 +02:00
Alin-Gabriel Serdean
218dad97da windows: Fix NEWS and add OVS version in FAQ.
This patch removes the newly added NEWS entry and adds it as a leaf
under post 2.17.

Add OVS version instead of specifying that the feature is supported
for IPv6 connection tracking and Genenve IPv6 tunnels.

Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-04-29 21:09:53 +02:00
ldejing
53b75e91de datapath-windows: Add IPv6 conntrack support on Windows.
Implementation on Windows:
Currently, IPv4 conntrack was supported on the windows platform.
In this patch we have implemented ipv6 conntrack functions according
to the current logic of the IPv4 conntrack. This implementation has
included TcpV6(nat and normal scenario), UdpV6(nat and normal scenario),
IcmpV6 conntrack of echo request/reply packet and
FtpV6(nat and normal scenario).

Testing Topology:
On the Windows VM runs on the ESXi host, two hyper-v ports attached
to the ovs bridge; one hyper-v port worked as client and the
other port worked as server.

Testing Case:
1. TcpV6
  a) Tcp request/reply conntrack for normal scenario.
     In this scenario, 20::1 as client, 20::2 as server, it will generate
     following conntrack entry:
     (Origin(src=20::1, src_port=1555, dst=20::2, dst_port=1556),
      reply(src=20::2,src_port=1556,dst=20::1,dst_port=1555),protocol=tcp)

  b) Tcp request/reply conntrack for nat scenario.
     In this scenario, 20::1 as client, 20::10 as floating ip, 21::3 as server,
     it will generate following conntrack entry:
     (Origin(src=20::1, src_port=1555, dst=20::10, dst_port=1556),
      reply(src=21::3, src_port=1556, dst=20::1, dst_port= 1555),protocol=tcp)

2. UdpV6
  a) Udp request/reply conntrack for normal scenario.
     (Origin(src=20::1, src_port=1555, dst=20::2, dst_port=1556),
      reply(src=20::2,src_port=1556,dst=20::1,dst_port=1555),protocol=udp)
  b) Udp request/reply conntrack for nat scenario.
     (Origin(src=20::1, src_port=1555, dst=20::10, dst_port=1556),
      reply(src=21::3, src_port=1556, dst=20::1, dst_port= 1555),protocol=udp)

3. IcmpV6:
  a) Icmpv6 request/reply conntrack for normal scenario.
     Currently Icmpv6 only support to construct conntrack for
     echo request/reply packet, take (20::1 -> 20::2)  for example,
     it will generate following conntrack entry:
     (origin(src = 20::1, dst=20::2), reply(src=20::2, dst=20::1), protocol=icmp)
  b) Icmp request/reply conntrack for dnat scenario,
     for example (20::1->20::10->21::3), 20::1 is
     client, 20::10 is floating ip, 21::3 is server ip.
     It will generate flow like below:
     (origin(src=20::1, dst=20::10), reply(src=21::3, dst=20::1), protocol=icmp)

4. FtpV6
  a) Ftp request/reply conntrack for normal scenario.
     In this scenario, take 20::1 as client, 20::2 as server, it will generate
     two conntrack entries:
     Ftp active mode
     (Origin(src=20::1, src_port=1555, dst=20::2, dst_port=21),
      reply(src=20::2, src_port=21, dst=20::1, dst_port=1555), protocol=tcp)
     (Origin(src=20::2, src_port=20, dst=20::1, dst_port=1556),
      reply(src=20::1, src_port=1556, dst=20::2, dst_port=20), protocol=tcp)

     Ftp passive mode
     (Origin(src=20::1, src_port=1555, dst=20::2, dst_port=21),
      reply(src=20::2,src_port=21,dst=20::1,dst_port=1555),protocol=tcp)
     (Origin(src=20::1, src_port=1556, dst=20::2, dst_port=1557),
      reply(src=20::2,src_port=1557, dst=20::1, dst_port=1556) protocol=tcp)

  b) Ftp request/reply conntrack for nat scenario.
     Ftp passive mode,
     In this secnario, 20::1 as client, 20::10 as floating ip, 21::3 as server
     ip. It will generate following flow:
     (Origin(src=20::1, src_port=1555, dst=20::10, dst_port=21),
     reply(src=21::3, src_port=21, dst=20::1, dst_port= 1555),protocol=tcp)
     (Origin(src=20::1, src_port=1556, dst=20::10, dst_port=1557),
     reply(src=21::3, src_port=1557, dst=20::1, dst_port= 1556),protocol=tcp)

5. Regression test for IpV4 in Antrea project (about 60 test case)

Future work:
  1) IcmpV6 redirect packet conntrack.
  2) IpV6 fragment support on Udp.
  3) Support napt for IPv6.
  4) FtpV6 active mode for nat.

Signed-off-by: ldejing <ldejing@vmware.com>
Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
2022-04-22 12:08:38 +03:00
Wilson Peng
edb2335861 datapath-windows: Add IPv6 Geneve tunnel support in Windows
In the first step OVS Windows will support IPv6 tunnel(Geneve IPv6 tunnel).

Implementation on Windows
-------------------------
1. For the IPv6 tunnel support,  OvsIPTunnelKey will replace original
OvsIPv4TunnelKey in the related flow context handing.
2. The related src and dst address will be  changed to SOCKADDR_INET type from UINT32.
3. For the IPv6 tunnel,  one node running OVS-Windows could encapsulate IPv4/IPv6
Packets via IPV6 Geneve Tunnel, and the node could also encapsulate IPv4/IPv6 packet
Via IPv4 Geneve tunnel.
4. Related IPHelper data structure will be adapted to support IPv6 Tunnel. In the IPHelper
part the related Windows API(such as GetUnicastIpAddressTable/GetBestRoute2/GetIpNetEntry2/
ResolveIpNetEntry2) and Windows data structure(MIB_IPFORWARD_ROW2/MIB_IPNET_ROW2/IP_ADDRESS_PREFIX)
Have already supported both IPv4 and IPV6. Now OVS Windows has been adjusted some functions
And data structured to support IPV6 tunnel also.
5. OVS_TUNNEL_KEY_ATTR_IPV6_SRC and OVS_TUNNEL_KEY_ATTR_IPV6_DST filed will be supported in
OVS-Windows kernel for IPV6 tunnel.

Testing done.
-------------------------
    Related topo, 1 Windows VM(Win2019) and 2 Ubuntu 16.04 server. Both VMs
Are running on one ESX host.
1. Setup one IPV6 Geneve Tunnel between 1 Windows VM and 1 Ubuntu server.
    Windows VM,  vif0( 6000::2/40.1.1.10) vif1(5000::2)—— Ubuntu VM Eth2(5000::9), name space ns1
    with interface ns1_link_peer(6000::9/40.1.1.2)
    Related tunnnel,
ovs-vsctl.exe add-port br-int bms-tun0 -- set interface bms-tun0 type=Geneve options:csum=true
    options:key=flow options:local_ip="5000::2" options:remote_ip=flow

     In this topo, traffic from Vif0(Win) to ns1_link_peer(Ubuntu) will be gone through the Geneve tunnel
(5000::2—>5000::9) for both IPv4 traffic(40.1.1.10-->40.1.1.2) and IPv6 traffic(6000::2—>6000::9)

2. Setup one IPV4 Geneve Tunnel between Windows VM and 1 Ubuntu server.
    Windows VM,  vif0( 6000::2/40.1.1.10) vif1(50.1.1.11)—— Ubuntu,   Eth2(50.1.1.9), name space ns1
    with interface ns1_link_peer(6000::19/40.1.1.9)
    Related tunnnel,
ovs-vsctl.exe -- set Interface bms-tun0 type=geneve options:csum=true options:key=flow
    options:local_ip="50.1.1.11" options:remote_ip=flow
    In this topo, traffic from Vif0(Win) to ns1_link_peer(Ubuntu) will be gone through the Geneve Tunnel
(50.1.1.11—>50.1.1.9) for both IPv4 traffic(40.1.1.10-->40.1.1.9) and IPv6 traffic(6000::2—>6000::19).

3.Regression test for IpV4 in Antrea project (about 60 test case) is PASS

Future Work
-----------
Add other type IPv6 tunnel support for Gre/Vxlan/Stt.

Signed-off-by: Wilson Peng <pweisong@vmware.com>
Signed-off-by: Alin-Gabriel Serdean <aserdean@ovn.org>
2022-04-10 05:18:59 +03:00
Adrian Moreno
7aff8a5117 sparse: bump recommended version and include headers.
It seems versions older than 0.6.2 generate false positives. Bump the
recommended version and make sure we use the right headers from the ovs
tree.

Suggested-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-30 16:59:03 +02:00
Suneetha Kalahasthi
be93ce40e1 faq: Update OVS/DPDK version table for OVS 2.15/2.16
FAQ is updated to reflect the latest DPDK for OVS branch 2.15 and 2.16

Signed-off-by: Suneetha Kalahasthi <suneetha.kalahasthi@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2022-03-14 09:34:34 +00:00
Eelco Chaudron
4f933301f0 Documentation: Update USDT documentation to include systemtap dependency.
Update the documentation to include details on SystemTap dependency
when enabling USDT probes.

Suggested-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Adrián Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-02-16 20:58:51 +01:00
Maxime Coquelin
0bca7fa1a3 Documentation: Fix userspace Tx steering section.
This patch fixes the thread mode part, as the static
thread-to-txq mapping selection depends on whether the
number of queues is strictly greater than the number of
PMD threads, and not greater or equal.

The section is also reworded as per Ilya's suggestion.

Fixes: c18e707b2f25 ("dpif-netdev: Introduce hash-based Tx packet steering mode.")
Reported-by: Kevin Traynor <ktraynor@redhat.com>
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-31 21:38:19 +01:00
Ilya Maximets
280d8de05f Prepare for 2.17.0.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
2022-01-19 02:33:05 +01:00
Kevin Traynor
1b9fd884f5 Documentation: Remove experimental tag for PMD ALB.
PMD Auto Load Balance was introduced as an experimental feature in OVS
2.11. It is used to detect that the Rx queue to PMD assignments are no
longer balanced and it would be better to reassign.

It is disabled by default, and can be enabled with:
$ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 13:00:09 +01:00
Kevin Traynor
09192a815e Documentation: Update PMD Auto Load Balance section.
Updates to the PMD Auto Load Balance section to make it more readable.

No change to the core content.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 13:00:09 +01:00
Kevin Traynor
5cc0524351 Documentation: Update PMD thread statistics.
'pmd-perf-show' gives some extra information and has nicer
formatting than 'pmd-stats-show'.

Let the user know they can use that as well to get PMD stats.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 12:26:36 +01:00
Kevin Traynor
f0adea3fce Documentation: Minor spelling and grammar fixes.
Some minor spelling and grammar fixes in pmd.rst.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 12:26:36 +01:00
Kevin Traynor
4da71121da Documentation: Fix Rx/Tx queue configuration section.
ovs-vsctl is used to configure physical Rx queues, not ovs-appctl.

Number of Tx queues are configured differently depending on whether
physical or virtual. Present documentation does not distinguish.

Fixes: 31d0dae22a0e ("doc: Add "PMD" topic document")
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 12:26:36 +01:00
Eelco Chaudron
85d3785e6a utilities: Add netlink flow operation USDT probes and upcall_cost script.
This patch adds a series of NetLink flow operation USDT probes.
These probes are in turn used in the upcall_cost Python script,
which in addition of some kernel tracepoints, give an insight into
the time spent on processing upcall.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 00:46:30 +01:00
Eelco Chaudron
51ec98635e utilities: Add upcall USDT probe and associated script.
Added the dpif_recv:recv_upcall USDT probe, which is used by the
included upcall_monitor.py script. This script receives all upcall
packets sent by the kernel to ovs-vswitchd. By default, it will
show all  upcall events, which looks something like this:

 TIME               CPU  COMM      PID      DPIF_NAME          TYPE PKT_LEN FLOW_KEY_LEN
 5952147.003848809  2    handler4  1381158  system@ovs-system  0    98      132
 5952147.003879643  2    handler4  1381158  system@ovs-system  0    70      160
 5952147.003914924  2    handler4  1381158  system@ovs-system  0    98      152

It can also dump the packet and NetLink content, and if required,
the packets can also be written to a pcap file.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 00:46:30 +01:00
Eelco Chaudron
ff4c712d45 Documentation: Add USDT documentation and bpftrace example.
Add the USDT documentation and a bpftrace example using the
bridge run USDT probes.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 00:46:30 +01:00
Maxime Coquelin
c18e707b2f dpif-netdev: Introduce hash-based Tx packet steering mode.
This patch adds a new hash Tx steering mode that
distributes the traffic on all the Tx queues, whatever the
number of PMD threads. It would be useful for guests
expecting traffic to be distributed on all the vCPUs.

The idea here is to re-use the 5-tuple hash of the packets,
already computed to build the flows batches (and so it
does not provide flexibility on which fields are part of
the hash).

There are also no user-configurable indirection table,
given the feature is transparent to the guest. The queue
selection is just a modulo operation between the packet
hash and the number of Tx queues.

There are no (at least intentionnally) functionnal changes
for the existing XPS and static modes. There should not be
noticeable performance changes for these modes (only one
more branch in the hot path).

For the hash mode, performance could be impacted due to
locking when multiple PMD threads are in use (same as
XPS mode) and also because of the second level of batching.

Regarding the batching, the existing Tx port output_pkts
is not modified. It means that at maximum, NETDEV_MAX_BURST
can be batched for all the Tx queues. A second level of
batching is done in dp_netdev_pmd_flush_output_on_port(),
only for this hash mode.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-17 18:07:00 +01:00
Martin Varghese
1917ace893 Encap & Decap actions for MPLS packet type.
The encap & decap actions are extended to support MPLS packet type.
Encap & decap actions adds and removes MPLS header at start of the
packet.

The existing PUSH MPLS & POP MPLS actions inserts & removes MPLS
header between ethernet header and the IP header. Though this behaviour
is fine for L3 VPN where an IP packet is encapsulated inside a MPLS
tunnel, it does not suffice the L2 VPN requirements. In L2 VPN the
ethernet packets must be encapsulated inside MPLS tunnel.

In this change the encap & decap actions are extended to support MPLS
packet type. The encap & decap adds and removes MPLS header at the
start of packet as depicted below.

Encapsulation:

Actions - encap(mpls),encap(ethernet)

Incoming packet -> | ETH | IP | Payload |

1 Actions -  encap(mpls) [Datapath action - ADD_MPLS:0x8847]

        Outgoing packet -> | MPLS | ETH | Payload|

2 Actions - encap(ethernet) [ Datapath action - push_eth ]

        Outgoing packet -> | ETH | MPLS | ETH | Payload|

Decapsulation:

Incoming packet -> | ETH | MPLS | ETH | IP | Payload |

Actions - decap(),decap(packet_type(ns=0,type=0))

1 Actions -  decap() [Datapath action - pop_eth)

        Outgoing packet -> | MPLS | ETH | IP | Payload|

2 Actions - decap(packet_type(ns=0,type=0)) [Datapath action - POP_MPLS:0x6558]

        Outgoing packet -> | ETH  | IP | Payload|

Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-17 02:04:20 +01:00
David Marchand
b84386fa9a dpdk: Support running PMD threads on any core.
Previously in OVS, a PMD thread running on cpu X used lcore X.
This assumption limited OVS to run PMD threads on physical cpu <
RTE_MAX_LCORE.

DPDK 20.08 introduced a new API that associates a non-EAL thread to a free
lcore. This new API does not change the thread characteristics (like CPU
affinity) and let OVS run its PMD threads on any cpu regardless of
RTE_MAX_LCORE.

The DPDK multiprocess feature is not compatible with this new API and is
disabled.

DPDK still limits the number of lcores to RTE_MAX_LCORE (128 on x86_64)
which should be enough for OVS pmd threads (hopefully).

DPDK lcore/OVS pmd threads mapping are logged at threads when trying to
attach a OVS PMD thread, and when detaching.
A new command is added to help get DPDK point of view of the DPDK lcores
at any time:

$ ovs-appctl dpdk/lcore-list
lcore 0, socket 0, role RTE, cpuset 0
lcore 1, socket 0, role NON_EAL, cpuset 1
lcore 2, socket 0, role NON_EAL, cpuset 15

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-11 21:34:56 +01:00
Ilya Maximets
e7e9973b80 dpif-netdev: Forwarding optimization for flows with a simple match.
There are cases where users might want simple forwarding or drop rules
for all packets received from a specific port, e.g ::

  "in_port=1,actions=2"
  "in_port=2,actions=IN_PORT"
  "in_port=3,vlan_tci=0x1234/0x1fff,actions=drop"
  "in_port=4,actions=push_vlan:0x8100,set_field:4196->vlan_vid,output:3"

There are also cases where complex OpenFlow rules can be simplified
down to datapath flows with very simple match criteria.

In theory, for very simple forwarding, OVS doesn't need to parse
packets at all in order to follow these rules.  "Simple match" lookup
optimization is intended to speed up packet forwarding in these cases.

Design:

Due to various implementation constraints userspace datapath has
following flow fields always in exact match (i.e. it's required to
match at least these fields of a packet even if the OF rule doesn't
need that):

  - recirc_id
  - in_port
  - packet_type
  - dl_type
  - vlan_tci (CFI + VID) - in most cases
  - nw_frag - for ip packets

Not all of these fields are related to packet itself.  We already
know the current 'recirc_id' and the 'in_port' before starting the
packet processing.  It also seems safe to assume that we're working
with Ethernet packets.  So, for the simple OF rule we need to match
only on 'dl_type', 'vlan_tci' and 'nw_frag'.

'in_port', 'dl_type', 'nw_frag' and 13 bits of 'vlan_tci' can be
combined in a single 64bit integer (mark) that can be used as a
hash in hash map.  We are using only VID and CFI form the 'vlan_tci',
flows that need to match on PCP will not qualify for the optimization.
Workaround for matching on non-existence of vlan updated to match on
CFI and VID only in order to qualify for the optimization.  CFI is
always set by OVS if vlan is present in a packet, so there is no need
to match on PCP in this case.  'nw_frag' takes 2 bits of PCP inside
the simple match mark.

New per-PMD flow table 'simple_match_table' introduced to store
simple match flows only.  'dp_netdev_flow_add' adds flow to the
usual 'flow_table' and to the 'simple_match_table' if the flow
meets following constraints:

  - 'recirc_id' in flow match is 0.
  - 'packet_type' in flow match is Ethernet.
  - Flow wildcards contains only minimal set of non-wildcarded fields
    (listed above).

If the number of flows for current 'in_port' in a regular 'flow_table'
equals number of flows for current 'in_port' in a 'simple_match_table',
we may use simple match optimization, because all the flows we have
are simple match flows.  This means that we only need to parse
'dl_type', 'vlan_tci' and 'nw_frag' to perform packet matching.
Now we make the unique flow mark from the 'in_port', 'dl_type',
'nw_frag' and 'vlan_tci' and looking for it in the 'simple_match_table'.
On successful lookup we don't need to run full 'miniflow_extract()'.

Unsuccessful lookup technically means that we have no suitable flow
in the datapath and upcall will be required.  So, in this case EMC and
SMC lookups are disabled.  We may optimize this path in the future by
bypassing the dpcls lookup too.

Performance improvement of this solution on a 'simple match' flows
should be comparable with partial HW offloading, because it parses same
packet fields and uses similar flow lookup scheme.
However, unlike partial HW offloading, it works for all port types
including virtual ones.

Performance results when compared to EMC:

Test setup:

             virtio-user   OVS    virtio-user
  Testpmd1  ------------>  pmd1  ------------>  Testpmd2
  (txonly)       x<------  pmd2  <------------ (mac swap)

Single stream of 64byte packets.  Actions:
  in_port=vhost0,actions=vhost1
  in_port=vhost1,actions=vhost0

Stats collected from pmd1 and pmd2, so there are 2 scenarios:
Virt-to-Virt   :     Testpmd1 ------> pmd1 ------> Testpmd2.
Virt-to-NoCopy :     Testpmd2 ------> pmd2 --->x   Testpmd1.
Here the packet sent from pmd2 to Testpmd1 is always dropped, because
the virtqueue is full since Testpmd1 is in txonly mode and doesn't
receive any packets.  This should be closer to the performance of a
VM-to-Phy scenario.

Test performed on machine with Intel Xeon CPU E5-2690 v4 @ 2.60GHz.
Table below represents improvement in throughput when compared to EMC.

 +----------------+------------------------+------------------------+
 |                |    Default (-g -O2)    | "-Ofast -march=native" |
 |   Scenario     +------------+-----------+------------+-----------+
 |                |     GCC    |   Clang   |     GCC    |   Clang   |
 +----------------+------------+-----------+------------+-----------+
 | Virt-to-Virt   |    +18.9%  |   +25.5%  |    +10.8%  |   +16.7%  |
 | Virt-to-NoCopy |    +24.3%  |   +33.7%  |    +14.9%  |   +22.0%  |
 +----------------+------------+-----------+------------+-----------+

For Phy-to-Phy case performance improvement should be even higher, but
it's not the main use-case for this functionality.  Performance
difference for the non-simple flows is within a margin of error.

Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-07 20:32:20 +01:00
Nobuhiro MIKI
9a834205a4 docs: afxdp: Remove duplicated lines.
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-04 17:16:03 +01:00
Ilya Maximets
9827312fa4 docs: Re-work the documentation around CPU ISA optimizations.
Few problems with a current documentation:

1. bridge.rst is the high-level documentation for the end user.
   Unit testing and complex implementation details are for developers,
   hence should not be there.  Testing instructions for developers
   should be in testing.rst.  Words in the doc should be understandable
   for the user who doesn't know OVS internals.

2. Some paragraphs in the current documentation are repeating each
   other almost to the word.

3. Some paragraphs are incorrectly formatted.  That affects the
   rendering.

4. There is no point describing every separate test of a system-dpdk
   testsuite.

What is done:

1. All the testing related paragraphs are consolidated and moved
   to the testing.rst.

2. Most of abbreviations replaced with more readable and understandable
   for the end user words.

3. Meaning or the purpose of several sentences I failed to understand,
   therefore just deleted.

4. Fixed formatting and a few typos along the way.

IMO, some parts of the doc still needs some re-wording, but this change
provides at least a starting point for improvement setting a better
structure for the document.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2021-12-15 10:41:37 +01:00
Ian Stokes
17346b3899 dpdk: Update to use DPDK v21.11.
This commit adds support for DPDK v21.11, it includes the following
changes.

1. ci: Install python elftools for DPDK 21.02.
2. ci: Update meson requirement for DPDK 21.05.
3. netdev-dpdk: Fix build with 21.05.
4. ci: Compile DPDK in non developer mode.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=242480&state=*

5. netdev-dpdk: Remove access to DPDK internals.
6. netdev-dpdk: Remove unused attribute from rte_flow rule.
7. netdev-dpdk: Fix mbuf macros namespace with 21.11-rc1.
8. netdev-dpdk: Fix vhost namespace with 21.11-rc2.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=271159&state=*

In addition documentation and DPDK unit tests were also updated in this
commit for use with DPDK v21.11.

For credit all authors of the original commits to 'dpdk-latest' with the above
changes have been added as co-authors for this commit.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Co-authored-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Emma Finn <emma.finn"intel.com>
Tested-by: Seamus Ryan <seamus.ryan@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-12-09 18:40:14 +00:00
lic121
2fe20d0bed docs/dpdk: Fix install doc.
Remove bad quotes.

Signed-off-by: lic121 <lic121@chinatelecom.cn>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-11-30 01:24:00 +01:00
Suneetha Kalahasthi
3b2982c423 faq: Update OVS/DPDK version table for OVS 2.13/2.14.
FAQ is updated to reflect the latest DPDK for OVS branch 2.13 and 2.14

Signed-off-by: Suneetha Kalahasthi <suneetha.kalahasthi@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-11-29 21:17:46 +01:00
Cian Ferriter
d41cac475c docs/userspace-tunneling: Fix IP addresses for host2.
The IP addresses being recommended for the VM interface and the
"remote_ip" on the tunnel port are wrong. The host1 values were being
used before. Update to use the host2 values.

Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-10-12 17:49:11 +02:00