2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-29 13:27:59 +00:00

829 Commits

Author SHA1 Message Date
Kevin Traynor
0efefc4f98 dpif-netdev: Sort PMD list by core id for rxq scheduling.
The list of PMDs is round robined through for the selection
when assigning an rxq to a PMD. The list is based on a
hash map, so there is no defined order.

It means the same set of PMDs may get assigned different rxqs
on different runs for no reason other than how the PMDs are stored
in the hash map.

This can be easily changed by sorting the PMDs by core id after
they are extracted, so the PMDs will be used in a consistent order.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-07-16 16:51:22 +01:00
Kevin Traynor
58fed7e8d8 dpif-netdev: Make PMD auto load balance use common rxq scheduling.
PMD auto load balance had its own separate implementation of the
rxq scheduling that it used for dry runs. This was done because
previously the rxq scheduling was not made reusable for a dry run.

Apart from the code duplication (which is a good enough reason
to replace it alone) this meant that if any further rxq scheduling
changes or assignment types were added they would also have to be
duplicated in the auto load balance code too.

This patch replaces the current PMD auto load balance rxq scheduling
code to reuse the common rxq scheduling code.

The behaviour does not change from a user perspective, except the logs
are updated to be more consistent.

As the dry run will compare the pmd load variances for current and
estimated assignments, new functions are added to populate the current
assignments and use the rxq scheduling data structs for variance
calculations.

Now that the new rxq scheduling data structures are being used in
PMD auto load balance, the older rr_* data structs and associated
functions can be removed.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-07-16 16:51:12 +01:00
Kevin Traynor
f577c2d046 dpif-netdev: Rework rxq scheduling code.
This reworks the current rxq scheduling code to break it into more
generic and reusable pieces.

The behaviour does not change from a user perspective, except the logs
are updated to be more consistent.

From an implementation view, there are some changes with mind to
extending functionality.

The high level reusable functions added in this patch are:
- Generate a list of current numas and pmds
- Perform rxq scheduling assignments into that list
- Effect the rxq scheduling assignments so they are used

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-07-16 16:51:01 +01:00
Eli Britstein
7ab851e1b8 dpif-netdev: Do not execute packet recovery without experimental support.
rte_flow_get_restore_info() API is under experimental attribute. Using it
has a performance impact that can be avoided for non-experimental compilation.

Do not call it without experimental support.

Reported-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-07-16 13:50:28 +02:00
Harry van Haaren
dc39608d2a dpif/stats: Add miniflow extract opt hits counter
This commit adds a new counter to be displayed to the user when
requesting datapath packet statistics. It counts the number of
packets that are parsed and a miniflow built up from it by the
optimized miniflow extract parsers.

The ovs-appctl command "dpif-netdev/pmd-perf-show" now has an
extra entry indicating if the optimized MFEX was hit:

  - MFEX Opt hits:        6786432  (100.0 %)

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-07-16 11:31:14 +01:00
Kumar Amber
a395b132b7 dpif-netdev: Add packet count and core id paramters for study
This commit introduces additional command line paramter
for mfex study function. If user provides additional packet out
it is used in study to compare minimum packets which must be processed
else a default value is choosen.
Also introduces a third paramter for choosing a particular pmd core.

$ ovs-appctl dpif-netdev/miniflow-parser-set study 500 3

Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-07-16 11:11:13 +01:00
Kumar Amber
dd3f5d86d9 dpif-netdev: Add auto validation function for miniflow extract
This patch introduced the auto-validation function which
allows users to compare the batch of packets obtained from
different miniflow implementations against the linear
miniflow extract and return a hitmask.

The autovaidator function can be triggered at runtime using the
following command:

$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator

Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-07-16 09:57:19 +01:00
Kumar Amber
3d8f47bc04 dpif-netdev: Add command line and function pointer for miniflow extract
This patch introduces the MFEX function pointers which allows
the user to switch between different miniflow extract implementations
which are provided by the OVS based on optimized ISA CPU.

The user can query for the available minflow extract variants available
for that CPU by following commands:

$ovs-appctl dpif-netdev/miniflow-parser-get

Similarly an user can set the miniflow implementation by the following
command :

$ ovs-appctl dpif-netdev/miniflow-parser-set name

This allows for more performance and flexibility to the user to choose
the miniflow implementation according to the needs.

Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Co-authored-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-07-16 09:56:58 +01:00
Cian Ferriter
d76a719a7a dpif-netdev: Add a partial HWOL PMD statistic.
It is possible for packets traversing the userspace datapath to match a
flow before hitting on EMC by using a mark ID provided by a NIC. Add a
PMD statistic for this hit.

Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-07-09 17:13:55 +01:00
Harry van Haaren
3f86fdf5ce dpif-netdev: Add command to get dpif implementations.
This commit adds a new command to retrieve the list of available
DPIF implementations. This can be used by to check what implementations
of the DPIF are available in any given OVS binary. It also returns which
implementations are in use by the OVS PMD threads.

Usage:
 $ ovs-appctl dpif-netdev/dpif-impl-get

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Co-authored-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-07-09 17:13:38 +01:00
Harry van Haaren
abb807e27d dpif-netdev: Add command to switch dpif implementation.
This commit adds a new command to allow the user to switch
the active DPIF implementation at runtime. A probe function
is executed before switching the DPIF implementation, to ensure
the CPU is capable of running the ISA required. For example, the
below code will switch to the AVX512 enabled DPIF assuming
that the runtime CPU is capable of running AVX512 instructions:

 $ ovs-appctl dpif-netdev/dpif-impl-set dpif_avx512

A new configuration flag is added to allow selection of the
default DPIF. This is useful for running the unit-tests against
the available DPIF implementations, without modifying each unit test.

The design of the testing & validation for ISA optimized DPIF
implementations is based around the work already upstream for DPCLS.
Note however that a DPCLS lookup has no state or side-effects, allowing
the auto-validator implementation to perform multiple lookups and
provide consistent statistic counters.

The DPIF component does have state, so running two implementations in
parallel and comparing output is not a valid testing method, as there
are changes in DPIF statistic counters (side effects). As a result, the
DPIF is tested directly against the unit-tests.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Co-authored-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-07-09 17:13:24 +01:00
Harry van Haaren
9ac84a1a36 dpif-avx512: Add ISA implementation of dpif.
This commit adds the AVX512 implementation of DPIF functionality,
specifically the dp_netdev_input_outer_avx512 function. This function
only handles outer (no re-circulations), and is optimized to use the
AVX512 ISA for packet batching and other DPIF work.

Sparse is not able to handle the AVX512 intrinsics, causing compile
time failures, so it is disabled for this file.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Co-authored-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Co-authored-by: Kumar Amber <kumar.amber@intel.com>
Signed-off-by: Kumar Amber <kumar.amber@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-07-09 17:13:12 +01:00
Harry van Haaren
e540499e4f dpif-netdev: Add function pointer for netdev input.
This commit adds a function pointer to the pmd thread data structure,
giving the pmd thread flexibility in its dpif-input function choice.
This allows choosing of the implementation based on ISA capabilities
of the runtime CPU, leading to optimizations and higher performance.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Co-authored-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-07-09 17:13:00 +01:00
Harry van Haaren
5930dfeeb2 dpif-netdev: Refactor to multiple header files.
Split the very large file dpif-netdev.c and the datastructures
it contains into multiple header files. Each header file is
responsible for the datastructures of that component.

This logical split allows better reuse and modularity of the code,
and reduces the very large file dpif-netdev.c to be more managable.

Due to dependencies between components, it is not possible to
move component in smaller granularities than this patch.

To explain the dependencies better, eg:

DPCLS has no deps (from dpif-netdev.c file)
FLOW depends on DPCLS (struct dpcls_rule)
DFC depends on DPCLS (netdev_flow_key) and FLOW (netdev_flow_key)
THREAD depends on DFC (struct dfc_cache)

DFC_PROC depends on THREAD (struct pmd_thread)

DPCLS lookup.h/c require only DPCLS
DPCLS implementations require only dpif-netdev-lookup.h.
- This change was made in 2.12 release with function pointers
- This commit only refactors the name to "private-dpcls.h"

netdev_flow_key_equal_mf() is renamed to emc_flow_key_equal_mf().

Rename functions specific to dpcls from netdev_* namespace to the
dpcls_* namespace, as they are only used by dpcls code.

'inline' is added to the dp_netdev_flow_hash() when it is moved
definition to fix a compiler error.

One valid checkpatch issue with the use of the
EMC_FOR_EACH_POS_WITH_HASH() macro was fixed.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Co-authored-by: Cian Ferriter <cian.ferriter@intel.com>
Signed-off-by: Cian Ferriter <cian.ferriter@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-07-09 17:12:45 +01:00
Paolo Valerio
ba16a36f35 dpif-netdev: Add all-zero SNAT to the advertised features of ct.
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-07-08 23:49:34 +02:00
Balazs Nemeth
aa4359cb9f dpif-netdev: Read recirc depth and flow api enabled once per batch.
The call to recirc_depth_get involves accessing a TLS value. So read
that once, and store it on the stack for re-use while processing the
batch. The same goes for reading netdev_is_flow_api_enabled(), a
non-inlined function.

Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Acked-by: Gaetan Rivet <grive@u256.net>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-07-08 22:35:19 +02:00
Eelco Chaudron
e6ad4d8d9c conntrack: Document all-zero IP SNAT behavior and add a test case.
Currently, conntrack in the kernel has an undocumented feature referred
to as all-zero IP address SNAT. Basically, when a source port
collision is detected during the commit, the source port will be
translated to an ephemeral port. If there is no collision, no SNAT is
performed.

This patchset documents this behavior and adds a self-test to verify
it's not changing. In addition, a datapath feature flag is added for
the all-zero IP SNAT case. This will help applications on top of OVS,
like OVN, to determine this feature can be used.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Acked-by: Paolo Valerio <pvalerio@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-07-08 21:19:14 +02:00
Yunjian Wang
828d9cb8d4 ovs: fix wrong quote
Remove the coma character by using ' and " character.

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2021-07-06 15:40:04 -07:00
Timothy Redaelli
772a842fb5 dpif-netdev: Apply subtable-lookup-prio-set on any datapath.
Currently, if you try to set subtable-lookup-prio-set when you don't have
any datapath (for example if an user wants to set AVX512 before creating
any bridge) it sets it globally (dpcls_subtable_set_prio),
but it returns an error:

  please specify an existing datapath
  ovs-appctl: ovs-vswitchd: server returned an error

and, in this case, the exit code of ovs-appctl is 2.

This commit changes the behaviour by removing the [datapath] optional
parameter of subtable-lookup-prio-set and by changing the priority
level on any datapath and globally. This means if you don't have any
datapath or if you have only one datapath, the behaviour is the same as
now, but without the confusing error when you don't have any datapath.

Fixes: 3d018c3ea79d ("dpif-netdev: add subtable lookup prio set command.")
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-07-01 16:31:55 +02:00
Sriharsha Basavapatna
b5e6f6f6bf dpif-netdev: Provide orig_in_port in metadata for tunneled packets.
When an encapsulated packet is recirculated through a TUNNEL_POP
action, the metadata gets reinitialized and the originating physical
port information is lost. When this flow gets processed by the vport
and it needs to be offloaded, we can't figure out the physical port
through which the tunneled packet was received.

Add a new member to the metadata: 'orig_in_port'. This is passed to
the next stage during recirculation and the offload layer can use it
to offload the flow to this physical port.

Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Tested-by: Marko Kovacevic <marko.kovacevic@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-24 22:22:08 +02:00
Ilya Maximets
a1ec428037 netdev-offload: Disallow offloading to unrelated tunneling vports.
'linux_tc' flow API suitable only for tunneling vports with backing
linux interfaces. DPDK flow API is not suitable for such ports.

With this change we could drop vport restriction from dpif-netdev.

This is a prerequisite for enabling vport offloading in DPDK.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Tested-by: Marko Kovacevic <marko.kovacevic@intel.com>
2021-06-24 22:22:08 +02:00
Eli Britstein
bc341440d3 dpif-netdev: Add HW miss packet state recover logic.
Recover the packet if it was partially processed by the HW. Fallback to
lookup flow by mark association.

Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Tested-by: Marko Kovacevic <marko.kovacevic@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-24 22:22:08 +02:00
Tonghao Zhang
f3ad560d5f dpif-netdev: Expand the meter capacity.
For now, ovs-vswitchd use the array of the dp_meter struct
to store meter's data, and at most, there are only 65536
(defined by MAX_METERS) meters that can be used. But in some
case, for example, in the edge gateway, we should use 200,000+,
at least, meters for IP address bandwidth limitation.
Every one IP address will use two meters for its rx and tx
path[1]. In other way, ovs-vswitchd should support meter-offload
(rte_mtr_xxx api introduced by dpdk.), but there are more than
65536 meters in the hardware, such as Mellanox ConnectX-6.

This patch use cmap to manage the meter, instead of the array.

* Insertion performance, ovs-ofctl add-meter 1000+ meters,
  the cmap takes about 4000ms, as same as previous implementation.
* Lookup performance in datapath, we add 1000+ meters which rate limit
  are 10Gbps (the NIC cards are 10Gbps, so netdev-datapath will not
  drop the packets.), and a flow which only forward packets from p0
  to p1, with meter action[2]. On other machine, pktgen-dpdk will
  generate 64B packets to p0.

  The forwarding performance always is 1324 Kpps on my server
  which CPU is Intel E5-2650, 2.00GHz.

[1].
$ in_port=p0,ip,ip_dst=1.1.1.x action=meter:n,output:p1
$ in_port=p1,ip,ip_src=1.1.1.x action=meter:m,output:p0

[2].
$ in_port=p0 action=meter:100,output:p1

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-24 21:34:20 +02:00
Aaron Conole
640d4db788 ipf: Fix a use-after-free error, and remove the 'do_not_steal' flag.
As reported by Wang Liang, the way packets are passed to the ipf module
doesn't allow for use later on in reassembly.  Such packets may be get
released anyway, such as during cleanup of tx processing.  Because the
ipf module lacks a way of forcing the dp_packet to be retained, it
will later reuse the packet.  Instead, just clone the packet and let the
ipf queue own the copy until the queue is destroyed.

After this change, there are no more in-tree users of the batch
'do_not_steal' flag.  Thus, we remove it as well.

Fixes: 4ea96698f667 ("Userspace datapath: Add fragmentation handling.")
Fixes: 0b3ff31d35f5 ("dp-packet: Add 'do_not_steal' packet batch flag.")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/382098.html
Reported-by: Wang Liang <wangliangrt@didiglobal.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Co-authored-by: Wang Liang <wangliangrt@didiglobal.com>
Signed-off-by: Wang Liang <wangliangrt@didiglobal.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-15 10:46:33 +02:00
Ilya Maximets
7731d26144 dpif-netdev: Remove meter rate from the bucket size calculation.
Implementation of meters supposed to be a classic token bucket with 2
typical parameters: rate and burst size.

Burst size in this schema is the maximum number of bytes/packets that
could pass without being rate limited.

Recent changes to userspace datapath made meter implementation to be
in line with the kernel one, and this uncovered several issues.

The main problem is that maximum bucket size for unknown reason
accounts not only burst size, but also the numerical value of rate.
This creates a lot of confusion around behavior of meters.

For example, if rate is configured as 1000 pps and burst size set to 1,
this should mean that meter will tolerate bursts of 1 packet at most,
i.e. not a single packet above the rate should pass the meter.
However, current implementation calculates maximum bucket size as
(rate + burst size), so the effective bucket size will be 1001.  This
means that first 1000 packets will not be rate limited and average
rate might be twice as high as the configured rate.  This also makes
it practically impossible to configure meter that will have burst size
lower than the rate, which might be a desirable configuration if the
rate is high.

Inability to configure low values of a burst size and overall inability
for a user to predict what will be a maximum and average rate from the
configured parameters of a meter without looking at the OVS and kernel
code might be also classified as a security issue, because drop meters
are frequently used as a way of protection from DoS attacks.

This change removes rate from the calculation of a bucket size, making
it in line with the classic token bucket algorithm and essentially
making the rate and burst tolerance being predictable from a users'
perspective.

Same change will be proposed for the kernel implementation.
Unit tests changed back to their correct version and enhanced.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
2021-05-18 22:11:14 +02:00
Tonghao Zhang
c3690ccbce dpif-netdev: Refactor and fix the buckets calculation.
The way that "burst_size" was used as total buckets is
very strange. If user set the "burst_size" too smaller while
"rate" larger, that may affect the meter normal work.
This patch refactor the buckets calculation, and start
with a full buckets. It also makes the calculation equal
to what kernel datapath does.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-03-30 21:16:27 +02:00
Tonghao Zhang
759aaa8513 dpif-netdev: Fix the meter buckets overflow.
When setting the meter rate to 4.3+Gbps, there is an overflow, the
meters don't work as expected.

$ ovs-ofctl -O OpenFlow13 add-meter br-int "meter=1 kbps stats bands=type=drop rate=4294968"

It was overflow when we set the rate to 4294968, because "burst_size" in
the ofputil_meter_band is uint32_t type. This patch remove the "up"
in the dp_meter_band struction, and introduce "rate", "burst_size" and
"bucket" (uint64_t) to userspace datapath's meter band. This patch don't
change the public API and structure.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-03-30 20:17:54 +02:00
Kevin Traynor
ec68a877db dpif-netdev: Allow PMD auto load balance with cross-numa.
Previously auto load balance did not trigger a reassignment when
there was any cross-numa polling as an rxq could be polled from a
different numa after reassign and it could impact estimates.

In the case where there is only one numa with pmds available, the
same numa will always poll before and after reassignment, so estimates
are valid. Allow PMD auto load balance to trigger a reassignment in
this case.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Tested-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-03-22 12:28:49 +01:00
Mao YingMing
cdaa7e0fd6 dpif-netdev: Fix crash when add dp flow without in_port field.
Userspace datapath relies on fact that every datapath flow has exact
match on the in_port, but flows without in_port match could be
added directly via dpctl commands.  Even though dpctl is a debug
interface, datapath should just reject such flows instead of
crashing on assertion.

Fix the following crash and add a unit test for this issue
to tests/dpif-netdev.at:

$ ovs-appctl dpctl/add-flow "eth(),eth_type(0x0800),ipv4()" "3"
  unixctl|WARN|error communicating with unix:ovs-vswitchd.ctl: End of file
  ovs-appctl: ovs-vswitchd: transaction error (End of file)

ovs-vswitchd.log record:
  util(ovs-vswitchd)|EMER|lib/dpif-netdev.c:3638:
    assertion match->wc.masks.in_port.odp_port == ODPP_NONE failed
    in dp_netdev_flow_add()
  daemon_unix(monitor)|ERR|2 crashes: pid 1995 died, killed (Aborted),
                                      core dumped, restarting

Fix result:

$ ovs-appctl dpctl/add-flow "eth(),eth_type(0x0800),ipv4()" "3"
  ovs-vswitchd: updating flow table (Invalid argument)
  ovs-appctl: ovs-vswitchd: server returned an error

ovs-vswitchd.log record:
  dpif_netdev|ERR|failed to put[create] flow: in_port is not an exact match
  dpif|WARN|netdev@ovs-netdev: failed to put[create] (Invalid argument)
    ufid:7e...d1 eth(src=00..00,dst=00..00),eth_type(0x0800),
    ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0), actions:3

Signed-off-by: Mao YingMing <maoyingming@baidu.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-03-01 22:20:52 +01:00
Kevin Traynor
4b674829f9 dpif-netdev: auto load balance log state on user request.
At present the log displays the auto load balance state
everytime it is changed.

There are some cases where the user will try to enable
auto load balance, but it cannot be enabled because not
enough PMDs or RxQs. As the state does not change, there
is no new log of the state.

While the the last log report of state is still correct,
it is better to log the state again at this point so the
user can explicitly confirm the outcome of their request.

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2021-02-15 15:22:29 +00:00
Eli Britstein
62d1c28e9c dpif-netdev: Flush offload rules upon port deletion.
When a port is deleted, flow deletion requests are posted, and the netdev
is removed from offload netdevs map. Following flow deletion handling may
be done after the netdev has already been removed from the offload
netdevs map, so the HW rule is not removed and the data object is not
freed (memory leak). Flush offload rules upon port deletion, and disable
pending handling of offloads to fix it.

Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Acked-by: Emma Finn <emma.finn@intel.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-01-15 19:01:00 +01:00
Kevin Traynor
30de755202 dpif-netdev: Add PMD auto load balance status log.
When any PMD auto load balance parameters change, it is useful
to also log if the feature is enabled or disabled.

|dpif_netdev|INFO|PMD auto load balance load threshold changed to 70%
|dpif_netdev|INFO|PMD auto load balance is disabled

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-01-15 18:52:16 +01:00
Christophe Fontaine
62ab5594c2 dpif-netdev: Add parameters to configure PMD auto load balance.
Two important parts of how PMD auto load balance operates are how
loaded a core needs to be and how much improvement is estimated
before a PMD auto load balance can trigger.

Previously they were hardcoded to 95% loaded and 25% variance
improvement.

These default values may not be suitable for all use cases and
we may want to use a more (or less) aggressive rebalance, either
on the pmd load threshold or on the minimum variance improvement
threshold.

The defaults are not changed, but "pmd-auto-lb-load-threshold" and
"pmd-auto-lb-improvement-threshold" parameters are added to override
the defaults.

$ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-load-threshold="70"
$ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-improvement-threshold="20"

Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
Co-Authored-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-01-15 18:51:27 +01:00
Kevin Traynor
e4db0b69e2 dpif-netdev: Add log for PMD auto load balance interval parameter.
Previously if the parameter for the PMD auto load balance minimum
interval was changed at runtime, it was not logged unless the
PMD auto load balance feature was also changed to enabled.

Log the parameter anytime it changes, and use minutes when it is
logged as that is the user input format.

Fixes: 5bf84282482a ("Adding support for PMD auto load balancing")
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-01-15 18:38:47 +01:00
Ben Pfaff
91fc374a9c Eliminate use of term "slave" in bond, LACP, and bundle contexts.
The new term is "member".

Most of these changes should not change user-visible behavior.  One
place where they do is in "ovs-ofctl dump-flows", which will now output
"members:..." inside "bundle" actions instead of "slaves:...".  I don't
expect this to cause real problems in most systems.  The old syntax
is still supported on input for backward compatibility.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
2020-10-21 11:28:24 -07:00
Tonghao Zhang
a4953c5fe3 Revert "dpif-netdev: includes microsecond delta in meter bucket calculation".
This reverts commit 5c41c31ebd64fda821fb733a5784a7a440a794f8.

Use the pktgen-dpdk to test the commit 5c41c31ebd64
("dpif-netdev: includes microsecond delta in meter bucket calculation"),
it does't work as expected. And it broken the meter function (e.g. set
rate 200Mbps, the rate watched was 400Mbps). To reproduce it:

 $ ovs-vsctl add-br br-int -- set bridge br-int datapath_type=netdev
 $ ovs-ofctl -O OpenFlow13 add-meter br-int \
         "meter=100 kbps burst stats bands=type=drop rate=200000 burst_size=200000"
 $ ovs-ofctl -O OpenFlow13 add-flow br-int \
         "in_port=dpdk0 action=meter:100,output:dpdk1"
 $ pktgen -l 1,3,5,7,9,11,13,15,17,19 -n 8 --socket-mem 4096 \
         --file-prefix pg1 -w 0000:82:00.0 -w 0000:82:00.1 -- \
         -T -P -m "[3/5/7/9/11/13/15].[0-1]" -f meter-test.pkt

 meter-test.pkt:
 | set 0 count 0
 | set 0 size 1500
 | set 0 rate 100
 | set 0 burst 64
 | set 0 sport 1234
 | set 0 dport 5678
 | set 0 prime 1
 | set 0 type ipv4
 | set 0 proto udp
 | set 0 dst ip 1.1.1.2
 | set 0 src ip 1.1.1.1/24
 | set 0 dst mac ec:0d:9a🆎54:0a
 | set 0 src mac ec:0d:9a:bf:df:bb
 | set 0 vlanid 0
 | start 0

Note that the issue that patch 5c41c31ebd64 was intended to fix was
already fixed by commit:
  42697ca7757b ("dpif-netdev: fix meter at high packet rate.")

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-07-27 13:30:11 +02:00
Ilya Maximets
12d0edd75e dpif-netdev: Avoid deadlock with offloading during PMD thread deletion.
Main thread will try to pause/stop all revalidators during datapath
reconfiguration via datapath purge callback (dp_purge_cb) while
holding 'dp->port_mutex'.  And deadlock happens in case any of
revalidator threads is already waiting on 'dp->port_mutex' while
dumping offloaded flows:

           main thread                           revalidator
 ---------------------------------  ----------------------------------

 ovs_mutex_lock(&dp->port_mutex)

                                    dpif_netdev_flow_dump_next()
                                    -> dp_netdev_flow_to_dpif_flow
                                    -> get_dpif_flow_status
                                    -> dpif_netdev_get_flow_offload_status()
                                    -> ovs_mutex_lock(&dp->port_mutex)
                                       <waiting for mutex here>

 reconfigure_datapath()
 -> reconfigure_pmd_threads()
 -> dp_netdev_del_pmd()
 -> dp_purge_cb()
 -> udpif_pause_revalidators()
 -> ovs_barrier_block(&udpif->pause_barrier)
    <waiting for revalidators to reach barrier>

                          <DEADLOCK>

We're not allowed to call offloading API without holding global
port mutex from the userspace datapath due to thread safety
restrictions on netdev-offload-dpdk module.  And it's also not easy
to rework datapath reconfiguration process in order to move actual
PMD removal and datapath purge out of the port mutex.

So, for now, not sleeping on a mutex if it's not immediately available
seem like an easiest workaround.  This will have impact on flow
statistics update rate and on ability to get the latest statistics
before removing the flow (latest stats will be lost in case we were
not able to take the mutex).  However, this will allow us to operate
normally avoiding the deadlock.

The last bit is that to avoid flapping of flow attributes and
statistics we're not failing the operation, but returning last
statistics and attributes returned by offload provider.  Since those
might be updated in different threads, stores and reads are atomic.

Reported-by: Frank Wang (王培辉) <wangpeihui@inspur.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-June/371753.html
Fixes: a309e4f52660 ("dpif-netdev: Update offloaded flows statistics.")
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Tested-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-07-16 23:59:52 +02:00
Harry van Haaren
9ff7cabfd7 dpif-netdev: add subtable-lookup-prio-get command.
This commit adds a new command, "dpif-netdev/subtable-lookup-prio-get"
which prints the available subtable lookup functions in this OVS binary.
Example output from the command:

Available lookup functions (priority : name)
        0 : autovalidator
        1 : generic

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2020-07-13 14:54:49 +01:00
Harry van Haaren
3d018c3ea7 dpif-netdev: add subtable lookup prio set command.
This commit adds a command for the dpif-netdev to set a specific
lookup function to a particular priority level. The command enables
runtime switching of the dpcls subtable lookup implementation.

Selection is performed based on a priority. Higher priorities take
precedence, e.g. priority 5 will be selected instead of a priority 3.
If lookup functions have the same priority, the first one in the list
is selected.

The two options available are 'autovalidator' and 'generic'.
The below command will set a new priority for the given function:
$ ovs-appctl dpif-netdev/subtable-lookup-prio-set generic 2

The autovalidator implementation can be selected at runtime now:
$ ovs-appctl dpif-netdev/subtable-lookup-prio-set autovalidator 5

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2020-07-13 14:54:36 +01:00
Harry van Haaren
e90e115a01 dpif-netdev: implement subtable lookup validation.
This commit refactors the existing dpif subtable function pointer
infrastructure, and implements an autovalidator component.

The refactoring of the existing dpcls subtable lookup function
handling, making it more generic, and cleaning up how to enable
more implementations in future.

In order to ensure all implementations provide identical results,
the autovalidator is added. The autovalidator itself implements
the subtable lookup function prototype, but internally iterates
over all other available implementations. The end result is that
testing of each implementation becomes automatic, when the auto-
validator implementation is selected.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2020-07-13 14:54:08 +01:00
Tonghao Zhang
fa31efd211 dpif-netdev: Return error code when no mark available.
The max number of mark is (UINT32_MAX - 1), that is
enough to be used. But theoretically, if there are no
mark available, the later different flows will shared
the mark INVALID_FLOW_MARK, that may break the function.
If there are no available mark to be used, return error
code.

Fixes: 02bb2824e51d ("dpif-netdev: do hw flow offload in a thread")
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-07-09 11:31:52 +02:00
Tonghao Zhang
058b80d3de dpif-netdev: Add check mark to avoid ovs-vswitchd crash.
When changing the pmd interfaces attribute, ovs-vswitchd will
reload pmd and flush offload flows. reload_affected_pmds may
be invoked twice or more. In that case, the flows may been
queued to "dp_netdev_flow_offload" thread again.

For example:
$ ovs-vsctl -- set interface <Interface> options:dpdk-lsc-interrupt=true

ovs-vswitchd main       flow-offload thread
append F to queue       ...
...
append F to queue
...                     del F
...                     del F (crash [1])

[1]:
ovs_assert_failure          lib/cmap.c:922
cmap_replace                lib/cmap.c:921
cmap_remove                 lib/cmap.h:295
mark_to_flow_disassociate   lib/dpif-netdev.c:2269
dp_netdev_flow_offload_del  lib/dpif-netdev.c:2369
dp_netdev_flow_offload_main lib/dpif-netdev.c:2492

Fixes: 02bb2824e51d ("dpif-netdev: do hw flow offload in a thread")
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-07-09 11:31:52 +02:00
Ilya Maximets
8842fdf1b3 netdev-offload: Use dpif type instead of class.
There is no real difference between the 'class' and 'type' in the
context of common lookup operations inside netdev-offload module
because it only checks the value of pointers without using the
value itself.  However, 'type' has some meaning and can be used by
offload provides on the initialization phase to check if this type
of Flow API in pair with the netdev type could be used in particular
datapath type.  For example, this is needed to check if Linux flow
API could be used for current tunneling vport because it could be
used only if tunneling vport belongs to system datapath, i.e. has
backing linux interface.

This is needed to unblock tunneling offloads in userspace datapath
with DPDK flow API.

Acked-by: Eli Britstein <elibr@mellanox.com>
Acked-by: Roni Bar Yanai <roniba@mellanox.com>
Acked-by: Ophir Munk <ophirmu@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-07-08 19:07:21 +02:00
Eli Britstein
77057965cb dpif-netdev: Don't use zero flow mark.
Zero flow mark is used to indicate the HW to remove the mark. A packet
marked with zero mark is received in SW without a mark at all, so it
cannot be used as a valid mark. Change the pool range to fix it.

Fixes: 241bad15d99a ("dpif-netdev: associate flow with a mark id")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roni Bar Yanai <roniba@mellanox.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-07-08 17:50:36 +02:00
Eli Britstein
9ac365a8ed dpif-netdev: Add mega ufid in flow add/del log.
As offload is done using the mega ufid of a flow, for better
debugability, add it in the log message.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roni Bar Yanai <roniba@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-07-08 17:50:36 +02:00
Nitin Katiyar
81ac8b3b19 dpif-netdev: Do RCU synchronization at fixed interval in PMD main loop.
Each PMD updates the global sequence number for RCU synchronization
purpose with other OVS threads. This is done at every 1025th iteration
in PMD main loop.

If the PMD thread is responsible for polling large number of queues
that are carrying traffic, it spends a lot of time processing packets
and this results in significant delay in performing the housekeeping
activities.

If the OVS main thread is waiting to synchronize with the PMD threads
and if those threads delay performing housekeeping activities for
more than 3 sec then LACP processing will be impacted and it will lead
to LACP flaps. Similarly, other controls protocols run by OVS main
thread are impacted.

For e.g. a PMD thread polling 200 ports/queues with average of 1600
processing cycles per packet with batch size of 32 may take 10240000
(200 * 1600 * 32) cycles per iteration. In system with 2.0 GHz CPU
it means more than 5 ms per iteration. So, for 1024 iterations to
complete it would be more than 5 seconds.

This gets worse when there are PMD threads which are less loaded.
It reduces possibility of getting mutex lock in ovsrcu_try_quiesce()
by heavily loaded PMD and next attempt to quiesce would be after 1024
iterations.

With this patch, PMD RCU synchronization will be performed after fixed
interval instead after a fixed number of iterations. This will ensure
that even if the packet processing load is high the RCU synchronization
will not be delayed long.

Co-authored-by: Anju Thomas <anju.thomas@ericsson.com>
Signed-off-by: Anju Thomas <anju.thomas@ericsson.com>
Signed-off-by: Nitin Katiyar <nitin.katiyar@ericsson.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-07-07 17:23:54 +02:00
Tonghao Zhang
df5c293642 dpif-netdev: Delete the artificial flow limit.
The MAX_FLOWS constant was there from the introduction of dpif-netdev,
however, later new flow-limit mechanism was implemented that
controls number of datapath flows in a dynamic way on ofproto level.

So, we can just remove the limit and fully rely on ofproto to decide
what flow limit we need.  There are no limitations for flow table size
in dpif-netdev beside the artificial one.
'other_config:flow-limit' seems suitable to control this.

Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-07-07 01:40:32 +02:00
Vishal Deep Ajmera
9df65060cf userspace: Avoid dp_hash recirculation for balance-tcp bond mode.
Problem:

In OVS, flows with output over a bond interface of type “balance-tcp”
gets translated by the ofproto layer into "HASH" and "RECIRC" datapath
actions. After recirculation, the packet is forwarded to the bond
member port based on 8-bits of the datapath hash value computed through
dp_hash. This causes performance degradation in the following ways:

1. The recirculation of the packet implies another lookup of the
packet’s flow key in the exact match cache (EMC) and potentially
Megaflow classifier (DPCLS). This is the biggest cost factor.

2. The recirculated packets have a new “RSS” hash and compete with the
original packets for the scarce number of EMC slots. This implies more
EMC misses and potentially EMC thrashing causing costly DPCLS lookups.

3. The 256 extra megaflow entries per bond for dp_hash bond selection
put additional load on the revalidation threads.

Owing to this performance degradation, deployments stick to “balance-slb”
bond mode even though it does not do active-active load balancing for
VXLAN- and GRE-tunnelled traffic because all tunnel packet have the
same source MAC address.

Proposed optimization:

This proposal introduces a new load-balancing output action instead of
recirculation.

Maintain one table per-bond (could just be an array of uint16's) and
program it the same way internal flows are created today for each
possible hash value (256 entries) from ofproto layer. Use this table to
load-balance flows as part of output action processing.

Currently xlate_normal() -> output_normal() ->
bond_update_post_recirc_rules() -> bond_may_recirc() and
compose_output_action__() generate 'dp_hash(hash_l4(0))' and
'recirc(<RecircID>)' actions. In this case the RecircID identifies the
bond. For the recirculated packets the ofproto layer installs megaflow
entries that match on RecircID and masked dp_hash and send them to the
corresponding output port.

Instead, we will now generate action as
    'lb_output(<bond id>)'

This combines hash computation (only if needed, else re-use RSS hash)
and inline load-balancing over the bond. This action is used *only* for
balance-tcp bonds in userspace datapath (the OVS kernel datapath
remains unchanged).

Example:
Current scheme:

With 8 UDP flows (with random UDP src port):

  flow-dump from pmd on cpu core: 2
  recirc_id(0),in_port(7),<...> actions:hash(hash_l4(0)),recirc(0x1)

  recirc_id(0x1),dp_hash(0xf8e02b7e/0xff),<...> actions:2
  recirc_id(0x1),dp_hash(0xb236c260/0xff),<...> actions:1
  recirc_id(0x1),dp_hash(0x7d89eb18/0xff),<...> actions:1
  recirc_id(0x1),dp_hash(0xa78d75df/0xff),<...> actions:2
  recirc_id(0x1),dp_hash(0xb58d846f/0xff),<...> actions:2
  recirc_id(0x1),dp_hash(0x24534406/0xff),<...> actions:1
  recirc_id(0x1),dp_hash(0x3cf32550/0xff),<...> actions:1

New scheme:
We can do with a single flow entry (for any number of new flows):

  in_port(7),<...> actions:lb_output(1)

A new CLI has been added to dump datapath bond cache as given below.

 # ovs-appctl dpif-netdev/bond-show [dp]

   Bond cache:
     bond-id 1 :
       bucket 0 - slave 2
       bucket 1 - slave 1
       bucket 2 - slave 2
       bucket 3 - slave 1

Co-authored-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com>
Signed-off-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com>
Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
Tested-by: Matteo Croce <mcroce@redhat.com>
Tested-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-06-22 13:11:51 +02:00
William Tu
2078901a4c userspace: Add conntrack timeout policy support.
Commit 1f1613183733 ("ct-dpif, dpif-netlink: Add conntrack timeout
policy support") adds conntrack timeout policy for kernel datapath.
This patch enables support for the userspace datapath.  I tested
using the 'make check-system-userspace' which checks the timeout
policies for ICMP and UDP cases.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
2020-05-01 08:22:45 -07:00
Jiang Lidong
5c41c31ebd dpif-netdev: includes microsecond delta in meter bucket calculation
When dp-netdev meter rate is higher than 200Mbps, observe
more than 10% bias from configured rate value with UDP traffic.

In dp-netdev meter, millisecond delta between now and last used
is taken into bucket size calcualtion, while sub-millisecond part
is truncated.

If traffic rate is pretty high, time delta can be few milliseconds,
its ratio to truncated part is less than 10:1, the loss of bucket
size caused by truncated can be observed obviously by commited
traffic rate.

In this patch, microsend delta part is included in calculation
of meter bucket to make it more precise.

Signed-off-by: Jiang Lidong <jianglidong3@jd.com>
Signed-off-by: William Tu <u9012063@gmail.com>
2020-04-09 14:50:08 -07:00