Upstream kernel now rejects unsupported ct_state flags.
Earlier kernels, ignored it but still echoed back the requested ct_state,
if ct_state was supported. ct_state initial support had trk, new, est,
and rel flags.
If kernel echos back ct_state, assume support for trk, new, est, and
rel. If kernel rejects a specific unsupported flag, continue and
use reject mechanisim to probe for flags rep and inv.
Disallow inserting rules with unnsupported ct_state flags.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Previously, only chain 0 is deleted before attach the ingress block.
If there are rules on the chain other than 0, these rules are not flushed.
In this case, the recreation of qdisc also fails. To fix this issue, flush
rules from all chains.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
TC flower doesn't support some ct state flags such as
INVALID/SNAT/DNAT/REPLY. So it is better to reject this rule.
Fixes: 576126a931cd ("netdev-offload-tc: Add conntrack support")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The n_offloaded_flows counter is saved in dpif, and this is the first
one when ofproto is created. When flow operation is done by ovs-appctl
commands, such as, dpctl/add-flow, a new dpif is opened, and the
n_offloaded_flows in it can't be used. So, instead of using counter,
the number of offloaded flows is queried from each netdev, then sum
them up. To achieve this, a new API is added in netdev_flow_api to get
how many flows assigned to a netdev.
In order to get better performance, this number is calculated directly
from tc_to_ufid hmap for netdev-offload-tc, because flow dumping by tc
takes much time if there are many flows offloaded.
Fixes: af0618470507 ("dpif-netlink: Count the number of offloaded rules")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
There is no need for a 'once' variable per probe.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
tc_replace_flower may fail, so the return value must be checked.
If not zero, ufid can't be deleted. Otherwise the operations on this
filter may fail because its ufid is not found.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
There is no need to pass tc rules to hw when just probing
for tc features. this will avoid redundant errors from hw drivers
that may happen.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Acked-By: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
There is no real difference between the 'class' and 'type' in the
context of common lookup operations inside netdev-offload module
because it only checks the value of pointers without using the
value itself. However, 'type' has some meaning and can be used by
offload provides on the initialization phase to check if this type
of Flow API in pair with the netdev type could be used in particular
datapath type. For example, this is needed to check if Linux flow
API could be used for current tunneling vport because it could be
used only if tunneling vport belongs to system datapath, i.e. has
backing linux interface.
This is needed to unblock tunneling offloads in userspace datapath
with DPDK flow API.
Acked-by: Eli Britstein <elibr@mellanox.com>
Acked-by: Roni Bar Yanai <roniba@mellanox.com>
Acked-by: Ophir Munk <ophirmu@mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Currently drop action is not offloaded when using userspace datapath
with tc offload. The patch programs tc gact (generic action) chain
ID 0 to drop the packet by setting it to TC_ACT_SHOT.
Example:
$ ovs-appctl dpctl/add-flow netdev@ovs-netdev \
'recirc_id(0),in_port(2),eth(),eth_type(0x0806),\
arp(op=2,tha=00:50:56:e1:4b:ab,tip=10.255.1.116)' drop
Or no action also infers drop
$ ovs-appctl dpctl/add-flow netdev@ovs-netdev \
'recirc_id(0),in_port(2),eth(),eth_type(0x0806),\
arp(op=2,tha=00:50:56:e1:4b:ab,tip=10.255.1.116)' ''
$ tc filter show dev ovs-p0 ingress
filter protocol arp pref 2 flower chain 0
filter protocol arp pref 2 flower chain 0 handle 0x1
eth_type arp
arp_tip 10.255.1.116
arp_op reply
arp_tha 00:50:56:e1:4b:ab
skip_hw
not_in_hw
action order 1: gact action drop
...
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
The cited commit intended to add tc support for masking tunnel src/dst
ips and ports. It's not possible to do tunnel ports masking with
openflow rules and the default mask for tunnel ports set to 0 in
tnl_wc_init(), unlike tunnel ports default mask which is full mask.
So instead of never passing tunnel ports to tc, revert the changes
to tunnel ports to always pass the tunnel port.
In sw classification is done by the kernel, but for hw we must match
the tunnel dst port.
Fixes: 5f568d049130 ("netdev-offload-tc: Allow to match the IP and port mask of tunnel")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
This patch allows to install arp rules to tc dp.
In the future, arp will be offloaded to hardware to
be processed. So OvS enable this now.
$ ovs-appctl dpctl/add-flow 'recirc_id(0),in_port(3),eth(),\
eth_type(0x0806),arp(op=2,tha=00:50:56:e1:4b:ab,tip=10.255.1.116)' 2
$ ovs-appctl dpctl/dump-flows
... arp(tip=10.255.1.116,op=2,tha=00:50:56:e1:4b:ab) ...
$ tc filter show dev <ethx> ingress
...
eth_type arp
arp_tip 10.255.1.116
arp_op reply
arp_tha 00:50:56:e1:4b:ab
not_in_hw
action order 1: mirred (Egress Redirect to device <ethy>) stolen
...
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
When dumping flows in terse mode set TCA_DUMP_FLAGS attribute to
TCA_DUMP_FLAGS_TERSE flag to prevent unnecessary copying of data between
kernel and user spaces. Only expect kernel to provide cookie, stats and
flags when dumping filters in terse mode.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
In order to improve revalidator performance by minimizing unnecessary
copying of data, extend netdev-offloads to support terse dump mode. Extend
netdev_flow_api->flow_dump_create() with 'terse' bool argument. Implement
support for terse dump in functions that convert netlink to flower and
flower to match. Set flow stats "used" value based on difference in number
of flow packets because lastuse timestamp is not included in TC terse dump.
Kernel API support is implemented in following patch.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
To support more use case, for example, DDOS, which
packets should be dropped in hardware, this patch
allows users to match only the tunnel source IPs with
masked value.
$ ovs-appctl dpctl/add-flow "tunnel(src=2.2.2.0/255.255.255.0,tp_dst=4789,ttl=64),\
recirc_id(2),in_port(3),eth(),eth_type(0x0800),ipv4()" ""
$ ovs-appctl dpctl/dump-flows
tunnel(src=2.2.2.0/255.255.255.0,ttl=64,tp_dst=4789) ... actions:drop
$ tc filter show dev vxlan_sys_4789 ingress
...
eth_type ipv4
enc_src_ip 2.2.2.0/24
enc_dst_port 4789
enc_ttl 64
in_hw in_hw_count 2
action order 1: gact action drop
...
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
This patch allows users to offload the TC flower rules with
tunnel mask. This patch allows masked match of the following,
where previously supported an exact match was supported:
* Remote (dst) tunnel endpoint address
* Local (src) tunnel endpoint address
* Remote (dst) tunnel endpoint UDP port
And also allows masked match of the following, where previously
no match was supported:
* Local (src) tunnel endpoint UDP port
In some case, mask is useful as wildcards. For example, DDOS,
in that case, we don’t want to allow specified hosts IPs or
only source Ports to access the targeted host. For example:
$ ovs-appctl dpctl/add-flow "tunnel(dst=2.2.2.100,src=2.2.2.0/255.255.255.0,tp_dst=4789),\
recirc_id(0),in_port(3),eth(),eth_type(0x0800),ipv4()" ""
$ tc filter show dev vxlan_sys_4789 ingress
...
eth_type ipv4
enc_dst_ip 2.2.2.100
enc_src_ip 2.2.2.0/24
enc_dst_port 4789
enc_ttl 64
in_hw in_hw_count 2
action order 1: gact action drop
...
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Not bugfix, make the codes more readable.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
It's possible that block_id could changes after the probe for block
support. Therefore, fetch the block_id again after the probe.
Fixes: edc2055a2bf7 ("netdev-offload-tc: Flush rules on ingress block when init tc flow api")
Cc: Dmytro Linkin <dmitrolin@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Co-authored-by: Marcelo Leitner <mleitner@redhat.com>
Signed-off-by: Marcelo Leitner <mleitner@redhat.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
OVS can fail to attach ingress block on iface when init tc flow api,
if block already exist with rules on it and is shared with other iface.
Fix by flush all existing rules on the ingress block prior to deleting
it.
Fixes: 093c9458fb02 ("tc: allow offloading of block ids")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Acked-by: Raed Salem <raeds@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
The tc modify flow put always delete the original flow first and
then add the new flow. If the modfiy flow put operation failed,
the flow put operation will change from modify to create if success
to delete the original flow in tc (which will be always failed with
ENOENT, the flow is already be deleted before add the new flow in tc).
Finally, the modify flow put will failed to add in kernel datapath.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Openstack may set an skb mark of 0 in tunnel rules. This is considered to
be an unused/unset value. However, it prevents the rule from being
offloaded.
Check if the key value of the skb mark is 0 when it is in use (mask is
set to all ones). If it is then ignore the field and continue with TC offload.
Only the exact-match case is covered by this patch as it addresses the
Openstack use-case and seems most robust against feature evolution: f.e. in
future there may exist hardware offload scenarios where an operation, such
as a BPF offload, sets the SKB mark before proceeding tho the in-HW OVS.
datapath.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Co-Authored-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Aaron Conole <aconole@redhat.com>
'dpif_probe_feature'/'revalidate' doesn't free the 'dp_extra_info'
string. Also, all the implementations of dpif_flow_get() should
initialize the value to avoid printing/freeing of random memory.
30 bytes in 1 blocks are definitely lost in loss record 323 of 889
at 0x483AD19: realloc (vg_replace_malloc.c:836)
by 0xDDAD89: xrealloc (util.c:149)
by 0xCE1609: ds_reserve (dynamic-string.c:63)
by 0xCE1A90: ds_put_format_valist (dynamic-string.c:161)
by 0xCE19B9: ds_put_format (dynamic-string.c:142)
by 0xCCCEA9: dp_netdev_flow_to_dpif_flow (dpif-netdev.c:3170)
by 0xCCD2DD: dpif_netdev_flow_get (dpif-netdev.c:3278)
by 0xCCEA0A: dpif_netdev_operate (dpif-netdev.c:3868)
by 0xCDF81B: dpif_operate (dpif.c:1361)
by 0xCDEE93: dpif_flow_get (dpif.c:1002)
by 0xCDECF9: dpif_probe_feature (dpif.c:962)
by 0xC635D2: check_recirc (ofproto-dpif.c:896)
by 0xC65C02: check_support (ofproto-dpif.c:1567)
by 0xC63274: open_dpif_backer (ofproto-dpif.c:818)
by 0xC65E3E: construct (ofproto-dpif.c:1605)
by 0xC4D436: ofproto_create (ofproto.c:549)
by 0xC3931A: bridge_reconfigure (bridge.c:877)
by 0xC3FEAC: bridge_run (bridge.c:3324)
by 0xC4551D: main (ovs-vswitchd.c:127)
CC: Emma Finn <emma.finn@intel.com>
Fixes: 0e8f5c6a38d0 ("dpif-netdev: Modified ovs-appctl dpctl/dump-flows command")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Roi Dayan <roid@mellanox.com>
If output device is not yet added to netdev-offload, netdev_ports_get()
will not find it leading to NULL pointer dereference inside
netdev_get_ifindex().
Fixes: 8f283af89298 ("netdev-tc-offloads: Implement netdev flow put using tc interface")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Zone and ct_state first.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Each recirculation id will create a tc chain, and we translate
the recirculation action to a tc goto chain action.
We check for kernel support for this by probing OvS Datapath for the
tc recirc id sharing feature. If supported, we can offload rules
that match on recirc_id, and recirculation action safely.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
To be consistent with our tc-ufid mapping after flush, and to support tc
chains flushing in the next commit, implement flush operation via
deleting all the filters we actually added and delete their mappings.
This will also not delete the configured qos policing via matchall filters,
while old code did.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Move all that is needed to identify a tc filter to a
new structure, tcf_id. This removes a lot of duplication
in accessing/creating tc filters.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Tunnel id 0 is not printed unless tunnel flag FLOW_TNL_F_KEY is set.
Fix that by always setting FLOW_TNL_F_KEY when tunnel id is valid.
Fixes: 0227bf092ee6 ("lib/tc: Support optional tunnel id")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
A lot of the offload implementations didn't bother to initialize the
statistics they were supposed to return. I don't know whether any of
the callers actually use them, but it looked wrong.
Found by inspection.
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Currently, ovs supports to offload max TCA_ACT_MAX_PRIO(32) actions.
But net sched api has a limit of 4K message size which is not enough
for 32 actions when echo flag is set.
After a lot of testing, we find that 16 actions is a reasonable number.
So in this commit, we introduced a new define to limit the max actions.
Fixes: 0c70132cd288("tc: Make the actions order consistent")
Signed-off-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Recent modifications to TC allows the modifying of fields within the
outermost MPLS header of a packet. OvS datapath rules impliment an MPLS
set action by supplying a new MPLS header that should overwrite the
current one.
Convert the OvS datapath MPLS set action to a TC modify action and allow
such rules to be offloaded to a TC datapath.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
TC can now be used to push an MPLS header onto a packet. The MPLS label is
the only information that needs to be passed here with the rest reverting
to default values if none are supplied. OvS, however, gives the entire
MPLS header to be pushed along with the MPLS protocol to use. TC can
optionally accept these values so can be made replicate the OvS datapath
rule.
Convert OvS MPLS push datapath rules to TC format and offload to a TC
datapath.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
TC now supports an action to pop the outer MPLS header from a packet. The
next protocol after the header is required alongside this. Currently, OvS
datapath rules also supply this information.
Offload OvS MPLS pop actions to TC along with the next protocol.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
'mask' must be checked first before configuring key in flower.
CC: Eli Britstein <elibr@mellanox.com>
Fixes: 0b0a84783cd6 ("netdev-tc-offloads: Support match on priority tags")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
A preliminary netdev qdisc cleanup is done during init tc flow.
The cited commit allows for creating of egress hook qdiscs on internal
ports. This breaks the netdev qdisc cleanup as currently only ingress
hook qdiscs type is deleted. As a consequence the check for tc ingress
shared block support fails when the check is done on internal port.
Issue can be reproduced by the following steps:
- start openvswitch service
- create ovs bridge
- restart openvswitch service
Fix by using the correct hook qdisc type at netdev hook qdisc cleanup.
Fixes 608ff46aaf0d ("ovs-tc: offload datapath rules matching on internal ports")
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Flow API providers renamed to be consistent with parent module
'netdev-offload' and look more like each other.
'_rte_' replaced with more convenient '_dpdk_'.
We'll have following structure:
Common code:
lib/netdev-offload-provider.h
lib/netdev-offload.c
lib/netdev-offload.h
Providers:
lib/netdev-offload-tc.c
lib/netdev-offload-dpdk.c
'netdev-offload-dummy' still resides inside netdev-dummy, but it
makes no much sence to move it out of there.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Roi Dayan <roid@mellanox.com>