2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-29 05:18:13 +00:00

179 Commits

Author SHA1 Message Date
Greg Rose
83c9518e7c xenserver: Remove xenserver.
Remove the current xenserver implementation - it is obsolete and
since 3.0 we do not support kernel module builds [1].

1. https://mail.openvswitch.org/pipermail/ovs-dev/2022-July/395789.html

[i.maximets]
Can be added back if people willing to maintain it will be found.

Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-08-15 13:07:13 +02:00
Adrian Moreno
7e588e82f0 python: Add flow filtering syntax.
Based on pyparsing, create a very simple filtering syntax.

It supports basic logic statements (and, &, or, ||, not, !), numerical
operations (<, >), equality (=, !=), and masking (~=). The latter is only
supported in certain fields (IntMask, EthMask, IPMask).

Masking operation is semantically equivalent to "includes",
therefore:

    ip_src ~= 192.168.1.1

means that ip_src field is either a host IP address equal to 192.168.1.1
or an IPMask that includes it (e.g: 192.168.1.1/24).

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-07-15 20:14:24 +02:00
Adrian Moreno
dcd17a896c python: Add mask, ip and eth decoders.
Add more decoders that can be used by KVParser.

For IPv4 and IPv6 addresses, create a new class that wraps
netaddr.IPAddress.
For Ethernet addresses, create a new class that wraps netaddr.EUI.
For Integers, create a new class that performs basic bitwise mask
comparisons

netaddr is added as a new shoft dependency:
- extras_require in setup.py
- Suggests in deb and rpm packages

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-07-15 17:40:46 +02:00
Greg Rose
86642de3ad tests: Remove support for check-kmod test.
The OVS kernel module is no longer supported as of OVS 2.18

Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-07-15 13:45:55 +02:00
Emma Finn
529af67146 odp-execute: Add ISA implementation of actions.
This commit adds the AVX512 implementation of the action functionality.

Usage:
  $ ovs-appctl odp-execute/action-impl-set avx512

Signed-off-by: Emma Finn <emma.finn@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2022-07-15 11:40:20 +01:00
Kevin Traynor
3757e9f8e9 netdev-dpdk: Add shared mempool config.
Mempools may currently be shared between DPDK ports based
on port MTU and NUMA. With some hint from the user we can
increase the sharing on MTU and hence reduce memory
consumption in many cases.

For example, a port with MTU 9000, uses a mempool with an
mbuf size based on 9000 MTU. A port with MTU 1500, uses a
different mempool with an mbuf size based on 1500 MTU.

In this case, assuming same NUMA, both these ports could
share the 9000 MTU mempool.

The user must give a hint as order of creation of ports and
setting of MTUs may vary and we need to ensure that upgrades
from older OVS versions do not require more memory.

This scheme can also prevent multiple mempools being created
for cases where a port is added picking up a default MTU and
an appropriate mempool, but later has it's MTU changed to a
different value requiring a different mempool.

Example usage:

 $ ovs-vsctl --no-wait set Open_vSwitch . \
   other_config:shared-mempool-config=9000,1500:1,6000:1

Port added on NUMA 0:
* MTU 1500, use mempool based on 9000 MTU
* MTU 5000, use mempool based on 9000 MTU
* MTU 9000, use mempool based on 9000 MTU
* MTU 9300, use mempool based on 9300 MTU (existing behaviour)

Port added on NUMA 1:
* MTU 1500, use mempool based on 1500 MTU
* MTU 5000, use mempool based on 6000 MTU
* MTU 9000, use mempool based on 9000 MTU
* MTU 9300, use mempool based on 9300 MTU (existing behaviour)

Default behaviour is unchanged and mempools are still only created
when needed.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2022-07-14 13:17:59 +01:00
Kumar Amber
738c76a503 dpcls: Change info-get function to fetch dpcls usage stats.
Modified the dplcs info-get command output to include
the count for different dpcls implementations.

$ovs-appctl dpif-netdev/subtable-lookup-info-get

Available dpcls implementations:
  autovalidator (Use count: 1, Priority: 5)
  generic (Use count: 0, Priority: 1)
  avx512_gather (Use count: 0, Priority: 3)

Test case to verify changes:
        1061: PMD - dpcls configuration     ok

Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com>
Co-authored-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2022-05-24 09:53:18 +01:00
Kevin Traynor
9dd3031d2e Documentation: Fix use of rst verbatim code chunk syntax.
In some places it is using Markdown syntax and in others
it is not needed as there is already a code block.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-05-04 21:47:18 +02:00
Cian Ferriter
b91025187b Documentation: Clarify QEMU version requirement.
The QEMU version requirement of >= 2.7 is for vhost-user-client ports
specifically.

Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-05-04 21:31:13 +02:00
Eelco Chaudron
4f933301f0 Documentation: Update USDT documentation to include systemtap dependency.
Update the documentation to include details on SystemTap dependency
when enabling USDT probes.

Suggested-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Adrián Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-02-16 20:58:51 +01:00
Maxime Coquelin
0bca7fa1a3 Documentation: Fix userspace Tx steering section.
This patch fixes the thread mode part, as the static
thread-to-txq mapping selection depends on whether the
number of queues is strictly greater than the number of
PMD threads, and not greater or equal.

The section is also reworded as per Ilya's suggestion.

Fixes: c18e707b2f25 ("dpif-netdev: Introduce hash-based Tx packet steering mode.")
Reported-by: Kevin Traynor <ktraynor@redhat.com>
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-31 21:38:19 +01:00
Kevin Traynor
1b9fd884f5 Documentation: Remove experimental tag for PMD ALB.
PMD Auto Load Balance was introduced as an experimental feature in OVS
2.11. It is used to detect that the Rx queue to PMD assignments are no
longer balanced and it would be better to reassign.

It is disabled by default, and can be enabled with:
$ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 13:00:09 +01:00
Kevin Traynor
09192a815e Documentation: Update PMD Auto Load Balance section.
Updates to the PMD Auto Load Balance section to make it more readable.

No change to the core content.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 13:00:09 +01:00
Kevin Traynor
5cc0524351 Documentation: Update PMD thread statistics.
'pmd-perf-show' gives some extra information and has nicer
formatting than 'pmd-stats-show'.

Let the user know they can use that as well to get PMD stats.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 12:26:36 +01:00
Kevin Traynor
f0adea3fce Documentation: Minor spelling and grammar fixes.
Some minor spelling and grammar fixes in pmd.rst.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 12:26:36 +01:00
Kevin Traynor
4da71121da Documentation: Fix Rx/Tx queue configuration section.
ovs-vsctl is used to configure physical Rx queues, not ovs-appctl.

Number of Tx queues are configured differently depending on whether
physical or virtual. Present documentation does not distinguish.

Fixes: 31d0dae22a0e ("doc: Add "PMD" topic document")
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 12:26:36 +01:00
Eelco Chaudron
85d3785e6a utilities: Add netlink flow operation USDT probes and upcall_cost script.
This patch adds a series of NetLink flow operation USDT probes.
These probes are in turn used in the upcall_cost Python script,
which in addition of some kernel tracepoints, give an insight into
the time spent on processing upcall.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 00:46:30 +01:00
Eelco Chaudron
51ec98635e utilities: Add upcall USDT probe and associated script.
Added the dpif_recv:recv_upcall USDT probe, which is used by the
included upcall_monitor.py script. This script receives all upcall
packets sent by the kernel to ovs-vswitchd. By default, it will
show all  upcall events, which looks something like this:

 TIME               CPU  COMM      PID      DPIF_NAME          TYPE PKT_LEN FLOW_KEY_LEN
 5952147.003848809  2    handler4  1381158  system@ovs-system  0    98      132
 5952147.003879643  2    handler4  1381158  system@ovs-system  0    70      160
 5952147.003914924  2    handler4  1381158  system@ovs-system  0    98      152

It can also dump the packet and NetLink content, and if required,
the packets can also be written to a pcap file.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 00:46:30 +01:00
Eelco Chaudron
ff4c712d45 Documentation: Add USDT documentation and bpftrace example.
Add the USDT documentation and a bpftrace example using the
bridge run USDT probes.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 00:46:30 +01:00
Maxime Coquelin
c18e707b2f dpif-netdev: Introduce hash-based Tx packet steering mode.
This patch adds a new hash Tx steering mode that
distributes the traffic on all the Tx queues, whatever the
number of PMD threads. It would be useful for guests
expecting traffic to be distributed on all the vCPUs.

The idea here is to re-use the 5-tuple hash of the packets,
already computed to build the flows batches (and so it
does not provide flexibility on which fields are part of
the hash).

There are also no user-configurable indirection table,
given the feature is transparent to the guest. The queue
selection is just a modulo operation between the packet
hash and the number of Tx queues.

There are no (at least intentionnally) functionnal changes
for the existing XPS and static modes. There should not be
noticeable performance changes for these modes (only one
more branch in the hot path).

For the hash mode, performance could be impacted due to
locking when multiple PMD threads are in use (same as
XPS mode) and also because of the second level of batching.

Regarding the batching, the existing Tx port output_pkts
is not modified. It means that at maximum, NETDEV_MAX_BURST
can be batched for all the Tx queues. A second level of
batching is done in dp_netdev_pmd_flush_output_on_port(),
only for this hash mode.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-17 18:07:00 +01:00
Ilya Maximets
e7e9973b80 dpif-netdev: Forwarding optimization for flows with a simple match.
There are cases where users might want simple forwarding or drop rules
for all packets received from a specific port, e.g ::

  "in_port=1,actions=2"
  "in_port=2,actions=IN_PORT"
  "in_port=3,vlan_tci=0x1234/0x1fff,actions=drop"
  "in_port=4,actions=push_vlan:0x8100,set_field:4196->vlan_vid,output:3"

There are also cases where complex OpenFlow rules can be simplified
down to datapath flows with very simple match criteria.

In theory, for very simple forwarding, OVS doesn't need to parse
packets at all in order to follow these rules.  "Simple match" lookup
optimization is intended to speed up packet forwarding in these cases.

Design:

Due to various implementation constraints userspace datapath has
following flow fields always in exact match (i.e. it's required to
match at least these fields of a packet even if the OF rule doesn't
need that):

  - recirc_id
  - in_port
  - packet_type
  - dl_type
  - vlan_tci (CFI + VID) - in most cases
  - nw_frag - for ip packets

Not all of these fields are related to packet itself.  We already
know the current 'recirc_id' and the 'in_port' before starting the
packet processing.  It also seems safe to assume that we're working
with Ethernet packets.  So, for the simple OF rule we need to match
only on 'dl_type', 'vlan_tci' and 'nw_frag'.

'in_port', 'dl_type', 'nw_frag' and 13 bits of 'vlan_tci' can be
combined in a single 64bit integer (mark) that can be used as a
hash in hash map.  We are using only VID and CFI form the 'vlan_tci',
flows that need to match on PCP will not qualify for the optimization.
Workaround for matching on non-existence of vlan updated to match on
CFI and VID only in order to qualify for the optimization.  CFI is
always set by OVS if vlan is present in a packet, so there is no need
to match on PCP in this case.  'nw_frag' takes 2 bits of PCP inside
the simple match mark.

New per-PMD flow table 'simple_match_table' introduced to store
simple match flows only.  'dp_netdev_flow_add' adds flow to the
usual 'flow_table' and to the 'simple_match_table' if the flow
meets following constraints:

  - 'recirc_id' in flow match is 0.
  - 'packet_type' in flow match is Ethernet.
  - Flow wildcards contains only minimal set of non-wildcarded fields
    (listed above).

If the number of flows for current 'in_port' in a regular 'flow_table'
equals number of flows for current 'in_port' in a 'simple_match_table',
we may use simple match optimization, because all the flows we have
are simple match flows.  This means that we only need to parse
'dl_type', 'vlan_tci' and 'nw_frag' to perform packet matching.
Now we make the unique flow mark from the 'in_port', 'dl_type',
'nw_frag' and 'vlan_tci' and looking for it in the 'simple_match_table'.
On successful lookup we don't need to run full 'miniflow_extract()'.

Unsuccessful lookup technically means that we have no suitable flow
in the datapath and upcall will be required.  So, in this case EMC and
SMC lookups are disabled.  We may optimize this path in the future by
bypassing the dpcls lookup too.

Performance improvement of this solution on a 'simple match' flows
should be comparable with partial HW offloading, because it parses same
packet fields and uses similar flow lookup scheme.
However, unlike partial HW offloading, it works for all port types
including virtual ones.

Performance results when compared to EMC:

Test setup:

             virtio-user   OVS    virtio-user
  Testpmd1  ------------>  pmd1  ------------>  Testpmd2
  (txonly)       x<------  pmd2  <------------ (mac swap)

Single stream of 64byte packets.  Actions:
  in_port=vhost0,actions=vhost1
  in_port=vhost1,actions=vhost0

Stats collected from pmd1 and pmd2, so there are 2 scenarios:
Virt-to-Virt   :     Testpmd1 ------> pmd1 ------> Testpmd2.
Virt-to-NoCopy :     Testpmd2 ------> pmd2 --->x   Testpmd1.
Here the packet sent from pmd2 to Testpmd1 is always dropped, because
the virtqueue is full since Testpmd1 is in txonly mode and doesn't
receive any packets.  This should be closer to the performance of a
VM-to-Phy scenario.

Test performed on machine with Intel Xeon CPU E5-2690 v4 @ 2.60GHz.
Table below represents improvement in throughput when compared to EMC.

 +----------------+------------------------+------------------------+
 |                |    Default (-g -O2)    | "-Ofast -march=native" |
 |   Scenario     +------------+-----------+------------+-----------+
 |                |     GCC    |   Clang   |     GCC    |   Clang   |
 +----------------+------------+-----------+------------+-----------+
 | Virt-to-Virt   |    +18.9%  |   +25.5%  |    +10.8%  |   +16.7%  |
 | Virt-to-NoCopy |    +24.3%  |   +33.7%  |    +14.9%  |   +22.0%  |
 +----------------+------------+-----------+------------+-----------+

For Phy-to-Phy case performance improvement should be even higher, but
it's not the main use-case for this functionality.  Performance
difference for the non-simple flows is within a margin of error.

Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-07 20:32:20 +01:00
Ilya Maximets
9827312fa4 docs: Re-work the documentation around CPU ISA optimizations.
Few problems with a current documentation:

1. bridge.rst is the high-level documentation for the end user.
   Unit testing and complex implementation details are for developers,
   hence should not be there.  Testing instructions for developers
   should be in testing.rst.  Words in the doc should be understandable
   for the user who doesn't know OVS internals.

2. Some paragraphs in the current documentation are repeating each
   other almost to the word.

3. Some paragraphs are incorrectly formatted.  That affects the
   rendering.

4. There is no point describing every separate test of a system-dpdk
   testsuite.

What is done:

1. All the testing related paragraphs are consolidated and moved
   to the testing.rst.

2. Most of abbreviations replaced with more readable and understandable
   for the end user words.

3. Meaning or the purpose of several sentences I failed to understand,
   therefore just deleted.

4. Fixed formatting and a few typos along the way.

IMO, some parts of the doc still needs some re-wording, but this change
provides at least a starting point for improvement setting a better
structure for the document.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2021-12-15 10:41:37 +01:00
Ian Stokes
17346b3899 dpdk: Update to use DPDK v21.11.
This commit adds support for DPDK v21.11, it includes the following
changes.

1. ci: Install python elftools for DPDK 21.02.
2. ci: Update meson requirement for DPDK 21.05.
3. netdev-dpdk: Fix build with 21.05.
4. ci: Compile DPDK in non developer mode.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=242480&state=*

5. netdev-dpdk: Remove access to DPDK internals.
6. netdev-dpdk: Remove unused attribute from rte_flow rule.
7. netdev-dpdk: Fix mbuf macros namespace with 21.11-rc1.
8. netdev-dpdk: Fix vhost namespace with 21.11-rc2.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=271159&state=*

In addition documentation and DPDK unit tests were also updated in this
commit for use with DPDK v21.11.

For credit all authors of the original commits to 'dpdk-latest' with the above
changes have been added as co-authors for this commit.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Co-authored-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Emma Finn <emma.finn"intel.com>
Tested-by: Seamus Ryan <seamus.ryan@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-12-09 18:40:14 +00:00
Kevin Traynor
0eeca50f06 Documentation: Cleanup PMD information.
The 'Port/Rx Queue Assigment to PMD Threads' section has
expanded over time and now includes info about stats/commands,
manual pinning and different options for OVS assigning Rxqs to
PMDs.

Split them into different sections with sub-headings and move
the two similar paragraphs about stats together.

Rename 'Automatic assignment of Port/Rx Queue to PMD Threads'
section to 'PMD Automatic Load Balance'.

A few other minor cleanups as I was reading.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-09-16 00:47:47 +02:00
Cian Ferriter
8a5f055a0b docs/dpdk/bridge: Fix dpif-netdev/miniflow-parser-set formatting
The "name" parameter isn't optional so don't use brackets around it.

Fixes: 5c5c98cec21b ("docs/dpdk/bridge: Add miniflow extract section.")
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-08-16 15:00:15 +01:00
David Marchand
9547987526 Documentation: Remove duplicate words.
This is a simple cleanup with a script of mine.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2021-07-19 09:33:01 -07:00
David Marchand
3222a89d9a dpif-netdev: Report overhead busy cycles per pmd.
Users complained that per rxq pmd usage was confusing: summing those
values per pmd would never reach 100% even if increasing traffic load
beyond pmd capacity.

This is because the dpif-netdev/pmd-rxq-show command only reports "pure"
rxq cycles while some cycles are used in the pmd mainloop and adds up to
the total pmd load.

dpif-netdev/pmd-stats-show does report per pmd load usage.
This load is measured since the last dpif-netdev/pmd-stats-clear call.
On the other hand, the per rxq pmd usage reflects the pmd load on a 10s
sliding window which makes it non trivial to correlate.

Gather per pmd busy cycles with the same periodicity and report the
difference as overhead in dpif-netdev/pmd-rxq-show so that we have all
info in a single command.

Example:
$ ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 1 core_id 3:
  isolated : true
  port: dpdk0             queue-id:  0 (enabled)   pmd usage: 90 %
  overhead:  4 %
pmd thread numa_id 1 core_id 5:
  isolated : false
  port: vhost0            queue-id:  0 (enabled)   pmd usage:  0 %
  port: vhost1            queue-id:  0 (enabled)   pmd usage: 93 %
  port: vhost2            queue-id:  0 (enabled)   pmd usage:  0 %
  port: vhost6            queue-id:  0 (enabled)   pmd usage:  0 %
  overhead:  6 %
pmd thread numa_id 1 core_id 31:
  isolated : true
  port: dpdk1             queue-id:  0 (enabled)   pmd usage: 86 %
  overhead:  4 %
pmd thread numa_id 1 core_id 33:
  isolated : false
  port: vhost3            queue-id:  0 (enabled)   pmd usage:  0 %
  port: vhost4            queue-id:  0 (enabled)   pmd usage:  0 %
  port: vhost5            queue-id:  0 (enabled)   pmd usage: 92 %
  port: vhost7            queue-id:  0 (enabled)   pmd usage:  0 %
  overhead:  7 %

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-07-16 17:43:42 +01:00
Kevin Traynor
6193e03267 dpif-netdev: Allow pin rxq and non-isolate PMD.
Pinning an rxq to a PMD with pmd-rxq-affinity may be done for
various reasons such as reserving a full PMD for an rxq, or to
ensure that multiple rxqs from a port are handled on different PMDs.

Previously pmd-rxq-affinity always isolated the PMD so no other rxqs
could be assigned to it by OVS. There may be cases where there is
unused cycles on those pmds and the user would like other rxqs to
also be able to be assigned to it by OVS.

Add an option to pin the rxq and non-isolate the PMD. The default
behaviour is unchanged, which is pin and isolate the PMD.

In order to pin and non-isolate:
ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-isolate=false

Note this is available only with group assignment type, as pinning
conflicts with the operation of the other rxq assignment algorithms.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-07-16 16:51:57 +01:00
Kevin Traynor
3dd050909a dpif-netdev: Add group rxq scheduling assignment type.
Add an rxq scheduling option that allows rxqs to be grouped
on a pmd based purely on their load.

The current default 'cycles' assignment sorts rxqs by measured
processing load and then assigns them to a list of round robin PMDs.
This helps to keep the rxqs that require most processing on different
cores but as it selects the PMDs in round robin order, it equally
distributes rxqs to PMDs.

'cycles' assignment has the advantage in that it separates the most
loaded rxqs from being on the same core but maintains the rxqs being
spread across a broad range of PMDs to mitigate against changes to
traffic pattern.

'cycles' assignment has the disadvantage that in order to make the
trade off between optimising for current traffic load and mitigating
against future changes, it tries to assign and equal amount of rxqs
per PMD in a round robin manner and this can lead to a less than optimal
balance of the processing load.

Now that PMD auto load balance can help mitigate with future changes in
traffic patterns, a 'group' assignment can be used to assign rxqs based
on their measured cycles and the estimated running total of the PMDs.

In this case, there is no restriction about keeping equal number of
rxqs per PMD as it is purely load based.

This means that one PMD may have a group of low load rxqs assigned to it
while another PMD has one high load rxq assigned to it, as that is the
best balance of their measured loads across the PMDs.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-07-16 16:51:47 +01:00
Kevin Traynor
4fb54652e0 dpif-netdev: Assign PMD for failed pinned rxqs.
Previously, if pmd-rxq-affinity was used to pin an rxq to
a core that was not in pmd-cpu-mask the rxq was not polled
for and the user received a warning. This meant that no traffic
would be received from that rxq.

Now that pinned and non-pinned rxqs are assigned to PMDs in
a common call to rxq scheduling, if an invalid core is
selected in pmd-rxq-affinity the rxq can be assigned an
available PMD (if any).

A warning will still be logged as the requested core could
not be used.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-07-16 16:51:35 +01:00
Harry van Haaren
a72c1dfbd5 dpif/dpcls: limit count subtable search info logs
This commit avoids many instances of "using subtable X for miniflow (x,y)"
in the ovs-vswitchd log when using the DPCLS Autovalidator. This occurs
when no specialized subtable is found, and the generic "_any" version of
the avx512 subtable search implementation was used. This change logs the
subtable usage once, avoiding duplicates.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: kumar Amber <kumar.amber@intel.com>
Co-authored-by: kumar Amber <kumar.amber@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-07-16 12:02:43 +01:00
Kumar Amber
50be6715c0 test/sytem-dpdk: Add unit test for mfex autovalidator
Tests:
  6: OVS-DPDK - MFEX Autovalidator
  7: OVS-DPDK - MFEX Autovalidator Fuzzy
  8: OVS-DPDK - MFEX Configuration

Added a new directory to store the PCAP file used
in the tests and a script to generate the fuzzy traffic
type pcap to be used in fuzzy unit test.

Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-07-16 11:30:31 +01:00
Kumar Amber
a395b132b7 dpif-netdev: Add packet count and core id paramters for study
This commit introduces additional command line paramter
for mfex study function. If user provides additional packet out
it is used in study to compare minimum packets which must be processed
else a default value is choosen.
Also introduces a third paramter for choosing a particular pmd core.

$ ovs-appctl dpif-netdev/miniflow-parser-set study 500 3

Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-07-16 11:11:13 +01:00
Kumar Amber
5324b54e60 dpif-netdev: Add configure to enable autovalidator at build time.
This commit adds a new command to allow the user to enable
autovalidatior by default at build time thus allowing for
runnig unit test by default.

 $ ./configure --enable-mfex-default-autovalidator

Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-07-16 10:34:49 +01:00
Kumar Amber
5c5c98cec2 docs/dpdk/bridge: Add miniflow extract section.
This commit adds a section to the dpdk/bridge.rst netdev documentation,
detailing the added miniflow functionality. The newly added commands are
documented, and sample output is provided.

The use of auto-validator and special study function is also described
in detail as well as running fuzzy tests.

Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Co-authored-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-07-16 10:32:41 +01:00
Ilya Maximets
3e82604b7c docs: Add documentation for ovsdb relay mode.
Main documentation for the service model and tutorial with the use case
and configuration examples.

Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-07-15 22:38:52 +02:00
Harry van Haaren
3f86fdf5ce dpif-netdev: Add command to get dpif implementations.
This commit adds a new command to retrieve the list of available
DPIF implementations. This can be used by to check what implementations
of the DPIF are available in any given OVS binary. It also returns which
implementations are in use by the OVS PMD threads.

Usage:
 $ ovs-appctl dpif-netdev/dpif-impl-get

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Co-authored-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-07-09 17:13:38 +01:00
Harry van Haaren
abb807e27d dpif-netdev: Add command to switch dpif implementation.
This commit adds a new command to allow the user to switch
the active DPIF implementation at runtime. A probe function
is executed before switching the DPIF implementation, to ensure
the CPU is capable of running the ISA required. For example, the
below code will switch to the AVX512 enabled DPIF assuming
that the runtime CPU is capable of running AVX512 instructions:

 $ ovs-appctl dpif-netdev/dpif-impl-set dpif_avx512

A new configuration flag is added to allow selection of the
default DPIF. This is useful for running the unit-tests against
the available DPIF implementations, without modifying each unit test.

The design of the testing & validation for ISA optimized DPIF
implementations is based around the work already upstream for DPCLS.
Note however that a DPCLS lookup has no state or side-effects, allowing
the auto-validator implementation to perform multiple lookups and
provide consistent statistic counters.

The DPIF component does have state, so running two implementations in
parallel and comparing output is not a valid testing method, as there
are changes in DPIF statistic counters (side effects). As a result, the
DPIF is tested directly against the unit-tests.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Co-authored-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-07-09 17:13:24 +01:00
Yunjian Wang
00c1bce13a docs: fix wrong quote
Remove the coma character by using ' or " character.

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2021-07-06 15:40:03 -07:00
Ilya Maximets
210c4cba9b docs: Add a topic about record/replay with ovsdb-server.
Also added a NEWS entry.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
2021-06-07 21:03:16 +02:00
Kevin Traynor
ec68a877db dpif-netdev: Allow PMD auto load balance with cross-numa.
Previously auto load balance did not trigger a reassignment when
there was any cross-numa polling as an rxq could be polled from a
different numa after reassign and it could impact estimates.

In the case where there is only one numa with pmds available, the
same numa will always poll before and after reassignment, so estimates
are valid. Allow PMD auto load balance to trigger a reassignment in
this case.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Tested-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-03-22 12:28:49 +01:00
William Tu
f013e6f2d6 Documentation: Fix DPDK qos example.
Fix the example use case based on the decription.
EIR and CIR are measured in bytes/sec and considered 64-byte
IP packets size withtout 14-byte Ethernet header.
So fix the 1000pps example by: (64 - 14) * 1000 = 50,000
If the frame includes 4-byte FCS header, then it's
(64 - 14 - 4) * 1000 = 46,000

Fixes: e61bdffc2a98 ("netdev-dpdk: Add new DPDK RFC 4115 egress policer")
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-03-01 19:37:57 +01:00
Kevin Traynor
1560b46673 docs: Update for auto load balance threshold parameters.
Update the docs to remove the previously hardcoded values
and mention the load and improvement thresholds when
describing the operation of auto load balance.

Fixes: 62ab5594c20c ("dpif-netdev: Add parameters to configure PMD auto load balance.")
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-02-12 10:20:35 +00:00
Ian Stokes
252e1e5764 dpdk: Update to use DPDK v20.11.
This commit adds support for DPDK v20.11, it includes the following
changes.

1. travis: Remove explicit DPDK kmods configuration.
2. sparse: Fix build with 20.05 DPDK tracepoints.
3. netdev-dpdk: Remove experimental API flag.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=173216&state=*

4. sparse: Update to DPDK 20.05 trace point header.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=179604&state=*

5. sparse: Fix build with DPDK 20.08.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=200181&state=*

6. build: Add support for DPDK meson build.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=199138&state=*

7. netdev-dpdk: Remove usage of RTE_ETH_DEV_CLOSE_REMOVE flag.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=207850&state=*

8. netdev-dpdk: Fix build with 20.11-rc1.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=209006&state=*

9. sparse: Fix __ATOMIC_* redefinition errors

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=209452&state=*

10. build: Remove DPDK make build references.

   http://patchwork.ozlabs.org/project/openvswitch/list/?series=216682&state=*

For credit all authors of the original commits to 'dpdk-latest' with the
above changes have been added as co-authors for this commit.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Co-authored-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Sunil Pai G <sunil.pai.g@intel.com>
Co-authored-by: Sunil Pai G <sunil.pai.g@intel.com>
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Co-authored-by: Eli Britstein <elibr@nvidia.com>
Tested-by: Harry van Haaren <harry.van.haaren@intel.com>
Tested-by: Govindharajan, Hariprasad <hariprasad.govindharajan@intel.com>
Tested-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2020-12-16 17:44:06 +00:00
Gaetan Rivet
f4336f504b netdev-dpdk: Add option to configure VF MAC address.
In some cloud topologies, using DPDK VF representors in guest requires
configuring a VF before it is assigned to the guest.

A first basic option for such configuration is setting the VF MAC
address. Add a key 'dpdk-vf-mac' to the 'options' column of the Interface
table.

This option can be used as such:

   $ ovs-vsctl add-port br0 dpdk-rep0 -- set Interface dpdk-rep0 type=dpdk \
      options:dpdk-vf-mac=00:11:22:33:44:55

Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Eli Britstein <elibr@nvidia.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Gaetan Rivet <grive@u256.net>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-11-16 17:47:11 +01:00
Ben Pfaff
91fc374a9c Eliminate use of term "slave" in bond, LACP, and bundle contexts.
The new term is "member".

Most of these changes should not change user-visible behavior.  One
place where they do is in "ovs-ofctl dump-flows", which will now output
"members:..." inside "bundle" actions instead of "slaves:...".  I don't
expect this to cause real problems in most systems.  The old syntax
is still supported on input for backward compatibility.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
2020-10-21 11:28:24 -07:00
Ben Pfaff
8205fbc8f5 Eliminate "whitelist" and "blacklist" terms.
There is one remaining use under datapath.  That change should happen
upstream in Linux first according to our usual policy.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
2020-10-16 19:22:24 -07:00
Ben Pfaff
807152a4dd Use primary/secondary, not master/slave, as names for OpenFlow roles.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
2020-10-16 19:10:02 -07:00
Tomasz Konieczny
39fbd2c3f0 docs: Add flow control on i40e issue
There is an issue with flow control configuration on i40e devices
and it has a work around. We add this to documentation as known issue
until a permanent solution is developed.

Signed-off-by: Tomasz Konieczny <tomaszx.konieczny@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2020-10-06 14:52:55 +01:00
Ian Stokes
86f624e486 DPDK: Remove support for vhost-user zero-copy.
Support for vhost-user dequeue zero-copy was deprecated in OVS 2.14 with
the aim of removing it for OVS 2.15.

OVS only supports zero copy for vhost client mode, as such it will cease
to function due to DPDK commit [1]

Also DPDK is set to remove zero-copy functionality in DPDK 20.11 as
referenced by commit [2]

As such remove support from OVS.

[1] 715070ea10e6 ("vhost: prevent zero-copy with incompatible client mode")
[2] d21003c9dafa ("doc: announce removal of vhost zero-copy dequeue")

Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
2020-10-05 16:05:25 +01:00