This patch adds "Don't Fragment" (DF) flag support to the encap action,
if supported by the kernel. If the kernel does not support this,
it falls back to the previous behavior of ignoring the DF request.
Acked-by: Roi Dayan <roid@nvidia.com>
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
This patch checks to see if the TCA_FLOWER_KEY_ENC_FLAGS key is supported.
Acked-by: Roi Dayan <roid@nvidia.com>
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
The cited commit reserved lower tc priorities for IP ethertypes in order
to give IP traffic higher priority than other management traffic.
In case of of vlan encap traffic, IP traffic will still get lower
priority.
Fix it by also reserving low priority tc prio for vlan.
Fixes: c230c7579c14 ("netdev-offload-tc: Reserve lower tc prios for ip ethertypes")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Simon Horman <horms@ovn.org>
There is no TCA_TUNNEL_KEY_ENC_SRC_PORT in the kernel, so the offload
should not be attempted if OVS_TUNNEL_KEY_ATTR_TP_SRC is requested
by OVS. Current code just ignores the attribute in the tunnel(set())
action leading to a flow mismatch and potential incorrect datapath
behavior:
|tc(handler21)|DBG|tc flower compare failed action compare
...
Action 0 mismatch:
- Expected Action:
00000010 01 00 00 00 00 00 00 00-00 00 00 00 00 ff 00 11
00000020 c0 5b 17 c1 00 40 00 00-0a 01 00 6d 0a 01 01 12
00000050 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000060 01 02 80 01 00 18 00 0b-00 00 00 00 00 00 00 00
...
- Received Action:
00000010 01 00 00 00 00 00 00 00-00 00 00 00 00 ff 00 11
00000020 00 00 17 c1 00 40 00 00-0a 01 00 6d 0a 01 01 12
00000050 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000060 01 02 80 01 00 18 00 0b-00 00 00 00 00 00 00 00
...
In the tc_action dump above we can see the difference on the second
line. The action dumped from a kernel is missing 'c0 5b' - source
port for a tunnel(set()) action on the second line.
Removing the field from the tc_action_encap structure entirely to
avoid any potential confusion.
Note: In general, the source port number in the tunnel(set()) action
is not particularly useful for most tunnels, because they may just
ignore the value. Specs for Geneve and VXLAN suggest using a value
based on the headers of the inner packet as a source port.
In vast majority of scenarios the source port doesn't actually end
up in the action itself.
Having a mismatch between the userspace and TC leads to constant
revalidation of the flow and warnings in the log.
Adding a test case that demonstrates a scenario where the issue
occurs - bridging of two tunnels.
Fixes: 8f283af89298 ("netdev-tc-offloads: Implement netdev flow put using tc interface")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2023-October/052744.html
Reported-by: Vladislav Odintsov <odivlad@gmail.com>
Tested-by: Vladislav Odintsov <odivlad@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Add TC offload support for vxlan encap with gbp option.
Reviewed-by: Gavi Teitz <gavi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Most of the data members of struct tc_action{ } are defined as anonymous
struct in place. Instead of passing all members of an anonymous struct,
which is not flexible to new members being added, expose encap as named
struct and pass it entirely.
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Tc flower tunnel key options were encoded in nl_msg_put_flower_tunnel_opts
and decoded in nl_parse_flower_tunnel_opts. Only geneve was supported.
To avoid adding more arguments to the function to support more vxlan
options in the future, change the function arguments to pass tunnel
entirely to it instead of keep adding new arguments.
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
When a flow gets modified, i.e. the actions are changes, the tc layer will
remove, and re-add the flow. This is causing all the counters to be reset.
This patch will remember the previous tc counters and adjust any requests
for statistics. This is done in a similar way as the rte_flow implementation.
It also updates the check_pkt_len tc test to purge the flows, so we do
not use existing updated tc flow counters, but start with fresh installed
set of datapath flows.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
A long long time ago, an effort was made to make tc flower
rtnl_lock() free. However, on the OVS part we forgot to add
the TCA_KIND "flower" attribute, which tell the kernel to skip
the lock. This patch corrects this by adding the attribute for
the delete and get operations.
The kernel code calls tcf_proto_is_unlocked() to determine the
rtnl_lock() is needed for the specific tc protocol. It does this
in the tc_new_tfilter(), tc_del_tfilter(), and in tc_get_tfilter().
If the name is not set, tcf_proto_is_unlocked() will always return
false. If set, the specific protocol is queried for unlocked support.
Fixes: f98e418fbdb6 ("tc: Add tc flower functions")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Currently ethertype to prio hmap is static and the first ethertype
being used gets a lower priority. Usually there is an arp request
before the ip traffic and the arp ethertype gets a lower tc priority
while the ip traffic proto gets a higher priority.
In this case ip traffic will go through more hops in tc and HW.
Instead, reserve lower priorities for ip ethertypes.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Add tc action flags when adding police action to offload meter table.
There is a restriction that the flag of skip_sw/skip_hw should be same for
filter rule and the independent created tc actions the rule uses. In this
case, if we configure the tc-policy as skip_hw, filter rule will be created
with skip_hw flag and the police action according to meter table will have
no action flag, then flower rule will fail to add to tc kernel system.
To fix this issue, we will add tc action flag when adding police action to
offload a meter table, so it will allow meter table to work in tc software
datapath.
Fixes: 5c039ddc64ff ("netdev-linux: Add functions to manipulate tc police action")
Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
This change implements support for the check_pkt_len
action using the TC police action, which supports MTU
checking.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
As the police actions with indexes of 0x10000000-0x1fffffff are
reserved for meter offload, to provide a clean environment for OVS,
these reserved police actions should be deleted on startup. So dump
all the police actions, delete those actions with indexes in this
range.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Add helpers to add, delete and get stats of police action with
the specified index.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Add function to parse police action from netlink message.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Currently, tc merges all header rewrite actions into one tc pedit
action. So the header rewrite actions order is lost. Save each header
rewrite action into one tc pedit action to keep the order. And only
append one tc csum action to the last pedit action of a series.
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Fragmented packets with offset=0 are defragmented by tc act_ct, and
only when assembled pass to next action, in ovs offload case,
a goto action. Since stats are overwritten on each action dump,
only the stats for last action in the tc filter action priority
list is taken, the stats on the goto action, which count
only the assembled packets. See below for example.
Hardware updates just part of the actions (gact, ct, mirred) - those
that support stats_update() operation. Since datapath rules end
with either an output (mirred) or recirc/drop (both gact), tc rule
will at least have one action that supports it. For software packets,
the first action will have the max software packets count.
Tc dumps total packets (hw + sw) and hardware packets, then
software packets needs to be calculated from this (total - hw).
To fix the above, get hardware packets and calculate software packets
for each action, take the max of each set, then combine back
to get the total packets that went through software and hardware.
Example by running ping above MTU (ping <IP> -s 2000):
ct_state(-trk),recirc_id(0),...,ipv4(proto=1,frag=first),
packets:14, bytes:19544,..., actions:ct(zone=1),recirc(0x1)
ct_state(-trk),recirc_id(0),...,ipv4(proto=1,frag=later),
packets:14, bytes:28392,..., actions:ct(zone=1),recirc(0x1)
Second rule should have had bytes=14*<size of 'later' frag>, but instead
it's bytes=14*<size of assembled packets - size of 'first' + 'later'
frags>.
Fixes: 576126a931cd ("netdev-offload-tc: Add conntrack support")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Previously, only chain 0 is deleted before attach the ingress block.
If there are rules on the chain other than 0, these rules are not flushed.
In this case, the recreation of qdisc also fails. To fix this issue, flush
rules from all chains.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
There is no need to pass tc rules to hw when just probing
for tc features. this will avoid redundant errors from hw drivers
that may happen.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Acked-By: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
This patch allows to install arp rules to tc dp.
In the future, arp will be offloaded to hardware to
be processed. So OvS enable this now.
$ ovs-appctl dpctl/add-flow 'recirc_id(0),in_port(3),eth(),\
eth_type(0x0806),arp(op=2,tha=00:50:56:e1:4b:ab,tip=10.255.1.116)' 2
$ ovs-appctl dpctl/dump-flows
... arp(tip=10.255.1.116,op=2,tha=00:50:56:e1:4b:ab) ...
$ tc filter show dev <ethx> ingress
...
eth_type arp
arp_tip 10.255.1.116
arp_op reply
arp_tha 00:50:56:e1:4b:ab
not_in_hw
action order 1: mirred (Egress Redirect to device <ethy>) stolen
...
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
When dumping flows in terse mode set TCA_DUMP_FLAGS attribute to
TCA_DUMP_FLAGS_TERSE flag to prevent unnecessary copying of data between
kernel and user spaces. Only expect kernel to provide cookie, stats and
flags when dumping filters in terse mode.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Port range struct is currently union so the last min/max port assignment
wins, and kernel doesn't receive the range.
Change it to struct type.
Fixes: 2bf6ffb76ac6 ("netdev-offload-tc: Add conntrack nat support")
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
tc_make_tcf_id() doesn't initialize the 'chain' field leaving it with a
random value from the stack. This makes request_from_tcf_id() create
request with random TCA_CHAIN included. These requests are obviously
doesn't work as needed leading to broken flow dump and various other
issues. Fix that by using designated initializer instead.
Fixes: acdd544c4c9a ("tc: Introduce tcf_id to specify a tc filter")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Zone and ct_state first.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Each recirculation id will create a tc chain, and we translate
the recirculation action to a tc goto chain action.
We check for kernel support for this by probing OvS Datapath for the
tc recirc id sharing feature. If supported, we can offload rules
that match on recirc_id, and recirculation action safely.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Move all that is needed to identify a tc filter to a
new structure, tcf_id. This removes a lot of duplication
in accessing/creating tc filters.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Currently, ovs supports to offload max TCA_ACT_MAX_PRIO(32) actions.
But net sched api has a limit of 4K message size which is not enough
for 32 actions when echo flag is set.
After a lot of testing, we find that 16 actions is a reasonable number.
So in this commit, we introduced a new define to limit the max actions.
Fixes: 0c70132cd288("tc: Make the actions order consistent")
Signed-off-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Recent modifications to TC allows the modifying of fields within the
outermost MPLS header of a packet. OvS datapath rules impliment an MPLS
set action by supplying a new MPLS header that should overwrite the
current one.
Convert the OvS datapath MPLS set action to a TC modify action and allow
such rules to be offloaded to a TC datapath.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
TC can now be used to push an MPLS header onto a packet. The MPLS label is
the only information that needs to be passed here with the rest reverting
to default values if none are supplied. OvS, however, gives the entire
MPLS header to be pushed along with the MPLS protocol to use. TC can
optionally accept these values so can be made replicate the OvS datapath
rule.
Convert OvS MPLS push datapath rules to TC format and offload to a TC
datapath.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
TC now supports an action to pop the outer MPLS header from a packet. The
next protocol after the header is required alongside this. Currently, OvS
datapath rules also supply this information.
Offload OvS MPLS pop actions to TC along with the next protocol.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Offloading rules to a TC datapath only allows the creating of ingress hook
qdiscs and the application of filters to these. However, there may be
certain situations where an egress qdisc is more applicable (e.g. when
offloading to TC rules applied to OvS internal ports).
Extend the TC API in OvS to allow the creation of egress qdiscs and to add
or interact with flower filters applied to these.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
The TC datapath only permits the offload of mirred actions if they are
egress. To offload TC actions that output to OvS internal ports, ingress
mirred actions are required. At the TC layer, an ingress mirred action
passes the packet back into the network stack as if it came in the action
port rather than attempting to egress the port.
Update OvS-TC offloads to support ingress mirred actions. To ensure
packets that match these rules are properly passed into the network stack,
add a TC skbedit action along with ingress mirred that sets the pkt_type
to PACKET_HOST. This mirrors the functionality of the OvS internal port
kernel module.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Firstly this patch introduces the notion of reserved priority, as the
filter implementing ingress policing would require the highest priority.
Secondly it allows setting rate limiters while tc-offloads has been
enabled. Lastly it installs a matchall filter that matches all traffic
and then applies a police action, when configuring an ingress rate
limiter.
An example of what to expect:
OvS CLI:
ovs-vsctl set interface <netdev_name> ingress_policing_rate=5000
ovs-vsctl set interface <netdev_name> ingress_policing_burst=100
Resulting TC filter:
filter protocol ip pref 1 matchall chain 0
filter protocol ip pref 1 matchall chain 0 handle 0x1
not_in_hw
action order 1: police 0x1 rate 5Mbit burst 125Kb mtu 64Kb
action drop/continue overhead 0b
ref 1 bind 1 installed 3 sec used 3 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
10.0.0.200 () port 0 AF_INET : demo
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
131072 16384 16384 60.13 4.49
ovs-vsctl list interface <netdev_name>
_uuid : 2ca774e8-8b95-430f-a2c2-f8f742613ab1
admin_state : up
...
ingress_policing_burst: 100
ingress_policing_rate: 5000
...
type : ""
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Currently the TC tunnel_key action is always
initialized with the given tunnel id value. However,
some tunneling protocols define the tunnel id as an optional field.
This patch initializes the id field of tunnel_key:set and tunnel_key:unset
only if a value is provided.
In the case that a tunnel key value is not provided by the user
the key flag will not be set.
Signed-off-by: Adi Nissim <adin@mellanox.com>
Acked-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Extend ovs-tc translation by allowing non-byte-aligned fields
for set actions. Use new boundary shifts and add set ipv6 traffic
class action offload via pedit.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Add setting of ipv4 dscp and ecn fields in tc offload using pedit.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Add support for IPv6 hlimit field.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Tunnels (gre, geneve, vxlan) support 'csum' option (true/false), default is false.
Generated encap TC rule will now be configured as the tunnel configuration.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Add TC offload support for classifying geneve tunnels with options.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Add TC offload support for encapsulating geneve tunnels with options.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Add TC offload support for classifying single MPLS tagged traffic.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Move the tunnel match fields to be part of the tc/flower key structure.
This is pre-step for being able to apply masked match where needed.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Support matching on tos and ttl of ip tunnels
for the TC data-path.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Allow to set the tos and ttl for TC tunnels.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Add the missing code to match on ip tos when dealing
with the TC data-path.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Currently the inner VLAN header is ignored when using the TC data-path.
As TC flower supports QinQ, now we can offload the rules to match on both
outer and inner VLAN headers.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>