This is the third patch in the patch-set to support dynamic rebalancing
of offloaded flows.
The dynamic rebalancing functionality is implemented in this patch. The
ukeys that are not scheduled for deletion are obtained and passed as input
to the rebalancing routine. The rebalancing is done in the context of
revalidation leader thread, after all other revalidator threads are
done with gathering rebalancing data for flows.
For each netdev that is in OOR state, a list of flows - both offloaded
and non-offloaded (pending) - is obtained using the ukeys. For each netdev
that is in OOR state, the flows are grouped and sorted into offloaded and
pending flows. The offloaded flows are sorted in descending order of
pps-rate, while pending flows are sorted in ascending order of pps-rate.
The rebalancing is done in two phases. In the first phase, we try to
offload all pending flows and if that succeeds, the OOR state on the device
is cleared. If some (or none) of the pending flows could not be offloaded,
then we start replacing an offloaded flow that has a lower pps-rate than
a pending flow, until there are no more pending flows with a higher rate
than an offloaded flow. The flows that are replaced from the device are
added into kernel datapath.
A new OVS configuration parameter "offload-rebalance", is added to ovsdb.
The default value of this is "false". To enable this feature, set the
value of this parameter to "true", which provides packets-per-second
rate based policy to dynamically offload and un-offload flows.
Note: This option can be enabled only when 'hw-offload' policy is enabled.
It also requires 'tc-policy' to be set to 'skip_sw'; otherwise, flow
offload errors (specifically ENOSPC error this feature depends on) reported
by an offloaded device are supressed by TC-Flower kernel module.
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Co-authored-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Reviewed-by: Sathya Perla <sathya.perla@broadcom.com>
Reviewed-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
This is the first patch in the patch-set to support dynamic rebalancing
of offloaded flows.
The patch detects OOR condition on a netdev port when ENOSPC error is
returned by TC-Flower while adding a flow rule. A new structure is added
to the netdev called "netdev_hw_info", to store OOR related information
required to perform dynamic offload-rebalancing.
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Co-authored-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Reviewed-by: Sathya Perla <sathya.perla@broadcom.com>
Reviewed-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Tunnels (gre, geneve, vxlan) support 'csum' option (true/false), default is false.
Generated encap TC rule will now be configured as the tunnel configuration.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
In dpif_netlink_port_add__(), socksp could be NULL, because
vport_socksp_to_pids() would allocate a new array and return a single
zero element.
Following vport_socksp_to_pids() removal, a NULL pointer can happen when
dpif_netlink_port_add__() is called and dpif->handlers is 0.
Restore the old behaviour of using a zero pid when dpif->handlers is 0.
Fixes: 69c51582f ("dpif-netlink: don't allocate per thread netlink sockets")
Reported-by: Flavio Leitner <fbl@redhat.com>
Reported-by: Guru Shetty <guru@ovn.org>
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Commit 69c51582ff78 ("dpif-netlink: don't allocate per thread netlink
sockets") removed dpif-netlink support for multiple queues per port.
No remaining dpif provider supports multiple queues per port, so
remove infrastructure for the feature.
CC: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Tested-by: Yifeng Sun <pkusunyifeng@gmail.com>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
When using the kernel datapath, OVS allocates a pool of sockets to handle
netlink events. The number of sockets is: ports * n-handler-threads, where
n-handler-threads is user configurable and defaults to 3/4*number of cores.
This because vswitchd starts n-handler-threads threads, each one with a
netlink socket for every port of the switch. Every thread then, starts
listening on events on its set of sockets with epoll().
On setup with lot of CPUs and ports, the number of sockets easily hits
the process file descriptor limit, and ovs-vswitchd will exit with -EMFILE.
Change the number of allocated sockets to just one per port by moving
the socket array from a per handler structure to a per datapath one,
and let all the handlers share the same sockets by using EPOLLEXCLUSIVE
epoll flag which avoids duplicate events, on systems that support it.
The patch was tested on a 56 core machine running Linux 4.18 and latest
Open vSwitch. A bridge was created with 2000+ ports, some of them being
veth interfaces with the peer outside the bridge. The latency of the upcall
is measured by setting a single 'action=controller,local' OpenFlow rule to
force all the packets going to the slow path and then to the local port.
A tool[1] injects some packets to the veth outside the bridge, and measures
the delay until the packet is captured on the local port. The rx timestamp
is get from the socket ancillary data in the attribute SO_TIMESTAMPNS, to
avoid having the scheduler delay in the measured time.
The first test measures the average latency for an upcall generated from
a single port. To measure it 100k packets, one every msec, are sent to a
single port and the latencies are measured.
The second test is meant to check latency fairness among ports, namely if
latency is equal between ports or if some ports have lower priority.
The previous test is repeated for every port, the average of the average
latencies and the standard deviation between averages is measured.
The third test serves to measure responsiveness under load. Heavy traffic
is sent through all ports, latency and packet loss is measured
on a single idle port.
The fourth test is all about fairness. Heavy traffic is injected in all
ports but one, latency and packet loss is measured on the single idle port.
This is the test setup:
# nproc
56
# ovs-vsctl show |grep -c Port
2223
# ovs-ofctl dump-flows ovs_upc_br
cookie=0x0, duration=4.827s, table=0, n_packets=0, n_bytes=0, actions=CONTROLLER:65535,LOCAL
# uname -a
Linux fc28 4.18.7-200.fc28.x86_64 #1 SMP Mon Sep 10 15:44:45 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
And these are the results of the tests:
Stock OVS Patched
netlink sockets
in use by vswitchd
lsof -p $(pidof ovs-vswitchd) \
|grep -c GENERIC 91187 2227
Test 1
one port latency
min/avg/max/mdev (us) 2.7/6.6/238.7/1.8 1.6/6.8/160.6/1.7
Test 2
all port
avg latency/mdev (us) 6.51/0.97 6.86/0.17
Test 3
single port latency
under load
avg/mdev (us) 7.5/5.9 3.8/4.8
packet loss 95 % 62 %
Test 4
idle port latency
under load
min/avg/max/mdev (us) 0.8/1.5/210.5/0.9 1.0/2.1/344.5/1.2
packet loss 94 % 4 %
CPU and RAM usage seems not to be affected, the resource usage of vswitchd
idle with 2000+ ports is unchanged:
# ps u $(pidof ovs-vswitchd)
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
openvsw+ 5430 54.3 0.3 4263964 510968 pts/1 RLl+ 16:20 0:50 ovs-vswitchd
Additionally, to check if vswitchd is thread safe with this patch, the
following test was run for circa 48 hours: on a 56 core machine, a
bridge with kernel datapath is filled with 2200 dummy interfaces and 22
veth, then 22 traffic generators are run in parallel piping traffic into
the veths peers outside the bridge.
To generate as many upcalls as possible, all packets were forced to the
slowpath with an openflow rule like 'action=controller,local' and packet
size was set to 64 byte. Also, to avoid overflowing the FDB early and
slowing down the upcall processing, generated mac addresses were restricted
to a small interval. vswitchd ran without problems for 48+ hours,
obviously with all the handler threads with almost 99% CPU usage.
[1] https://github.com/teknoraver/network-tools/blob/master/weed.c
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Added new types to the flow dump filter, and allowed multiple filter
types to be passed at once, as a comma separated list. The new types
added are:
* tc - specifies flows handled by the tc dp
* non-offloaded - specifies flows not offloaded to the HW
* all - specifies flows of all types
The type list is now fully parsed by the dpctl, and a new struct was
added to dpif which enables dpctl to define which types of dumps to
provide, rather than passing the type string and having dpif parse it.
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Commit 92d0d515d ("dpif-netlink: Probe for broken Linux meter
implementations.") introduced a deadlock on the 'once' structure
declared in probe_broken_meters() with the following callstack:
probe_broken_meters()
probe_broken_meters__()
dpif_netlink_meter_set()
probe_broken_meters()
This commit introduce a modified version of dpif_netlink_meter_set()
that sets a meter without calling the probe.
Reported-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
This patch provides the implementation of conntrack zone limit
in dpif-netlink. It basically utilizes the netlink API to
communicate with OVS kernel module to set, delete, and get conntrack
zone limit.
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
This patch defines the dpif interface to support conntrack
per zone limit. Basically, OVS users can use this interface
to set, delete, and get the conntrack per zone limit for various
dpif interfaces. The following patch will make use of the proposed
interface to implement the feature.
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Meter support was introduced in Linux 4.15. In some versions of Linux
4.15, 4.16, and 4.17, there was a bug that never set the id when the
meter was created, so all meters essentially had an id of zero. This
commit adds a probe to check for that condition and disable meters on
those kernels.
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
The original intent of the API appears to be that the underlying DPIF
implementaion would choose a local meter id. However, neither of the
existing datapath meter implementations (userspace or Linux) implemented
that; they expected a valid meter id to be passed in, otherwise they
returned an error. This commit follows the existing implementations and
makes the API somewhat cleaner.
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
To work with kernel datapath that supports meter.
Signed-off-by: Andy Zhou <azhou@ovn.org>
Co-authored-by: Justin Pettit <jpettit@ovn.org>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Commit ab15e70eb587 ("dpctl: Expand the flow dump type filter") had a
number of issues with style, build breakage, and failing unit tests.
The patch is being reverted so that they can addressed.
This reverts commit ab15e70eb5878b46f8f84da940ffc915b6d74cad.
CC: Gavi Teitz <gavi@mellanox.com>
CC: Simon Horman <simon.horman@netronome.com>
CC: Roi Dayan <roid@mellanox.com>
CC: Aaron Conole <aconole@redhat.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Added new types to the flow dump filter, and allowed multiple filter
types to be passed at once, as a comma separated list. The new types
added are:
* tc - specifies flows handled by the tc dp
* non-offloaded - specifies flows not offloaded to the HW
* all - specifies flows of all types
The type list is now fully parsed by the dpctl, and a new struct was
added to dpif which enables dpctl to define which types of dumps to
provide, rather than passing the type string and having dpif parse it.
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Currently the inner VLAN header is ignored when using the TC data-path.
As TC flower supports QinQ, now we can offload the rules to match on both
outer and inner VLAN headers.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Previously, any rule that is offloaded via a netdev, not necessarily
to the HW, would be reported as "offloaded". This patch fixes this
misalignment, and introduces the 'dp' state, as follows:
rule is in HW via TC offload -> offloaded=yes dp:tc
rule is in not HW over TC DP -> offloaded=no dp:tc
rule is in not HW over OVS DP -> offloaded=no dp:ovs
To achieve this, the flows's 'offloaded' flag was encapsulated in a new
attrs struct, which contains the offloaded state of the flow and the
DP layer the flow is handled in, and instead of setting the flow's
'offloaded' state based solely on the type of dump it was acquired
via, for netdev flows it now sends the new attrs struct to be
collected along with the rest of the flow via the netdev, allowing
it to be set per flow.
For TC offloads, the offloaded state is set based on the 'in_hw' and
'not_in_hw' flags received from the TC as part of the flower. If no
such flag was received, due to lack of kernel support, it defaults
to true.
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
[simon: resolved conflict in lib/dpctl.man]
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Several OVS structs contain embedded named unions, like this:
struct {
...
union {
...
} u;
};
C11 standardized a feature that many compilers already implemented
anyway, where an embedded union may be unnamed, like this:
struct {
...
union {
...
};
};
This is more convenient because it allows the programmer to omit "u."
in many places. OVS already used this feature in several places. This
commit embraces it in several others.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
Tested-by: Alin Gabriel Serdean <aserdean@ovn.org>
Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
In netdev_to_ovs_vport_type() it checks for netdev types matching
"gre" with a strstr(). This makes it match ip6gre as well and return
OVS_VPORT_TYPE_GRE, which is clearly wrong.
Move the usage of strstr() *after* all the exact matches with strcmp()
to avoid the problem permanently because when I added the ip6gre
type I ran into a very difficult to detect bug.
Cc: Ben Pfaff <blp@ovn.org>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Add handlers for OVS_VPORT_TYPE_IP6GRE
Cc: Ben Pfaff <blp@ovn.org>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
ERSPAN is a tunneling protocol based on GRE tunnel. The patch
add erspan tunnel support for ovs-vswitchd with userspace datapath.
Configuring erspan tunnel is similar to gre tunnel, but with
additional erspan's parameters. Matching a flow on erspan's
metadata is also supported, see ovs-fields for more details.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This patch backports upstream ipv6 GRE and tunneling into the OVS
OOT (Out of Tree) datapath drivers. The primary reason for this
is to support the ERSPAN feature.
Because there is no previous history of ipv6 GRE and tunneling it is
not possible to exactly reproduce the history of all the files in
the patch. The two newly added files - ip6_gre.c and ip6_tunnel.c -
are cut from whole cloth out of the upstream Linux 4.15 kernel and
then modified as necessary with compatibility layer fixups.
These two files already included parts of several other upstream
commits that also touched other upstream files. As such, this
patch may incorporate parts or all of the following commits:
d350a82 net: erspan: create erspan metadata uapi header
c69de58 net: erspan: use bitfield instead of mask and offset
b423d13 net: erspan: fix use-after-free
214bb1c net: erspan: remove md NULL check
afb4c97 ip6_gre: fix potential memory leak in ip6erspan_rcv
50670b6 ip_gre: fix potential memory leak in erspan_rcv
a734321 ip6_gre: fix error path when ip6erspan_rcv failed
dd8d5b8 ip_gre: fix error path when erspan_rcv failed
293a199 ip6_gre: fix a pontential issue in ip6erspan_rcv
d91e8db5 net: erspan: reload pointer after pskb_may_pull
ae3e133 net: erspan: fix wrong return value
c05fad5 ip_gre: fix wrong return value of erspan_rcv
94d7d8f ip6_gre: add erspan v2 support
f551c91 net: erspan: introduce erspan v2 for ip_gre
1d7e2ed net: erspan: refactor existing erspan code
ef7baf5 ip6_gre: add ip6 erspan collect_md mode
5a963eb ip6_gre: Add ERSPAN native tunnel support
ceaa001 openvswitch: Add erspan tunnel support.
f192970 ip_gre: check packet length and mtu correctly in erspan tx
c84bed4 ip_gre: erspan device should keep dst
c122fda ip_gre: set tunnel hlen properly in erspan_tunnel_init
5513d08 ip_gre: check packet length and mtu correctly in erspan_xmit
935a974 ip_gre: get key from session_id correctly in erspan_rcv
1a66a83 gre: add collect_md mode to ERSPAN tunnel
84e54fe gre: introduce native tunnel support for ERSPAN
In cases where the listed commits also touched other source code
files then the patches are also listed separately within this
patch series.
Signed-off-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Currently, we support offloading of one output port. Remove that
limitation by use of mirred mirror action for all output ports,
except that the last one is mirred redirect action.
Signed-off-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
When the netdev is in another namespace and the operation doesn't
support network namespaces, return the correct error.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Recent kernels provide the network namespace ID of a port,
so use that to discover where the port currently is.
A network device in another network namespace could have the
same name, so once the socket starts listening to other network
namespaces, it is necessary to confirm the netnsid.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
The netlink notification's ancillary data contains the network
namespace id (netnsid) needed to identify the device correctly.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
ofp-util had been far too large and monolithic for a long time. This
commit breaks it up into units that make some logical sense. It also
moves the pieces of ofp-parse that were specific to each unit into the
relevant unit.
Most of this commit is just moving code around.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
A get command is added for number of conntrack connections.
This command is only supported in the userspace datapath
at this time.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Co-authored-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Get and set dpctl commands are added for conntrack maxconns.
These commands are only supported in the userspace
datapath at this time.
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Co-authored-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This patch adds support of flushing a conntrack entry specified by the
conntrack 5-tuple, and provides the implementation in dpif-netlink.
The implementation of dpif-netlink in the linux datapath utilizes the
NFNL_SUBSYS_CTNETLINK netlink subsystem to delete a conntrack entry in
nf_conntrack. Future patches will add support for the userspace and
Windows datapaths.
VMWare-BZ: #1983178
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Poll-loop is the core to implement main loop. It should be available in
libopenvswitch.
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
With the command:
ovs-appctl dpctl/ct-bkts
shows the number of connections per bucket.
By using a threshold:
ovs-appctl dpctl/ct-bkts gt=N
for each bucket shows the number of connections when they
are greater than N.
Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Co-authored-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Since it's an error but also will always occur in older kernels
log the message with level warning instead of info.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Acked-by: Eric Garver <e@erig.me>
Signed-off-by: Joe Stringer <joe@ovn.org>
It's basically what is being passed today and passing a specific
type adds a compiler type check.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
For non-Ethernet flows, when fixing up the netlink message we need make
sure to pass down a valid Ethertype. The kernel does not understand
packet_type so it's implicitly encoded by the absence of _ETHERNET and
exact match of _ETHERTYPE. Without this change match_validate in the
kernel complains when trying to match packets from L3 tunnels.
e.g.
openvswitch: netlink: Unexpected mask (mask=110088, allowed=3d9804c)
The mask use to always be set in xlate_wc_init() and xlate_wc_finish(),
but that changed for non-Ethernet frames with the commit listed below.
Fixes: 3d4b2e6eb74e ("userspace: Add OXM field MFF_PACKET_TYPE")
Signed-off-by: Joe Stringer <joe@ovn.org>
Co-authored-by: Eric Garver <e@erig.me>
Acked-by: Eric Garver <e@erig.me>
Rather than open-coding access to netlink attribute pointers in
put_exclude_packet_type(), make use of the netlink attribute helpers.
This simplifies the following bugfix.
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Eric Garver <e@erig.me>
When verbosity is requested on dump-flows (-m) indicate which flows
are offloaded.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Usage:
# to dump all datapath flows (default):
ovs-dpctl dump-flows
# to dump only flows that in kernel datapath:
ovs-dpctl dump-flows type=ovs
# to dump only flows that are offloaded:
ovs-dpctl dump-flows type=offloaded
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Search all datapath added netdevs for a given flow
using netdev flow api and parse it back to dpif flow.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
If a flow was offloaded to a netdev we delete it using netdev
flow api.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Using the new netdev flow api operate will now try and
offload flows to the relevant netdev of the input port.
Other operate methods flows will come in later patches.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
While dumping flows, dump flows that were offloaded to
netdev and parse them back to dpif flow.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
If netdev flow offloading is enabled, flush all
added ports using netdev flow api.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Use the helpers in appropriate places. In most cases, this fixes a
misaligned reference, since ovs_be128 and ovs_u128 require 8-byte alignment
but Netlink only guarantees 4-byte.
Found by GCC -fsanitize=undefined.
Reported-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Lance Richardson <lrichard@redhat.com>
Ports have a new layer3 attribute if they send/receive L3 packets.
The packet_type included in structs dp_packet and flow is considered in
ofproto-dpif. The classical L2 match fields (dl_src, dl_dst, dl_type, and
vlan_tci, vlan_vid, vlan_pcp) now have Ethernet as pre-requisite.
A dummy ethernet header is pushed to L3 packets received from L3 ports
before the the pipeline processing starts. The ethernet header is popped
before sending a packet to a L3 port.
For datapath ports that can receive L2 or L3 packets, the packet_type
becomes part of the flow key for datapath flows and is handled
appropriately in dpif-netdev.
In the 'else' branch in flow_put_on_pmd() function, the additional check
flow_equal(&match.flow, &netdev_flow->flow) was removed, as a) the dpcls
lookup is sufficient to uniquely identify a flow and b) it caused false
negatives because the flow in netdev->flow may not properly masked.
In dpif_netdev_flow_put() we now use the same method for constructing the
netdev_flow_key as the one used when adding the flow to the dplcs to make sure
these always match. The function netdev_flow_key_from_flow() used so far was
not only inefficient but sometimes caused mismatches and subsequent flow
update failures.
The kernel datapath does not support the packet_type match field.
Instead it encodes the packet type implictly by the presence or absence of
the Ethernet attribute in the flow key and mask.
This patch filters the PACKET_TYPE attribute out of netlink flow key and
mask to be sent to the kernel datapath.
Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Co-authored-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
This function attempts to open a bunch of new handlers. If it fails, it
attempts to close all the handlers that have already been opened.
Unfortunately, the loop to close the opened handlers used the wrong array
index: 'i' instead of 'j'. This fixes the problem.
Found by Coverity.
Reported-at: https://scan3.coverity.com/reports.htm#v16889/p10449/fileInstanceId=14762827&defectInstanceId=4305351&mergedDefectId=180429
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
On dpif init, probe for whether tunnels are created using in-tree
(upstream linux) or out-of-tree (OVS). This is done by probing for the
existence of "ovs_geneve" via rtnetlink. This is used to determine how
to create the tunnel devices.
For out-of-tree tunnels, only try genetlink/compat.
For in-tree kernel tunnels, try rtnetlink then fallback to genetlink.
Signed-off-by: Eric Garver <e@erig.me>
Signed-off-by: Joe Stringer <joe@ovn.org>
In order to be able to add those tunnels, we need to add code to create
the tunnels and add them as NETDEV vports. And when there is no support
to create them, we need to fallback to compatibility code and add them
as tunnel vports.
When removing those tunnels, we need to remove the interfaces as well,
and detecting the right type might be important, at least to distinguish
the tunnel vports that we should remove and the interfaces that we
shouldn't.
Co-authored-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Eric Garver <e@erig.me>
Signed-off-by: Joe Stringer <joe@ovn.org>