Recent modifications to TC allow the modification of fields within the
outermost MPLS header of a packet. OvS datapath rules implement an MPLS
set action by supplying a new MPLS header that should overwrite the
current one.
Convert the OvS datapath MPLS set action to a TC modify action and allow
such rules to be offloaded to a TC datapath.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
TC can now be used to push an MPLS header onto a packet. The MPLS label is
the only information that needs to be passed here with the rest reverting
to default values if none are supplied. OvS, however, gives the entire
MPLS header to be pushed along with the MPLS protocol to use. TC can
optionally accept these values, so it can be made to replicate the OvS
datapath rule.
Convert OvS MPLS push datapath rules to TC format and offload to a TC
datapath.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
TC now supports an action to pop the outer MPLS header from a packet. The
next protocol after the header is required alongside this. Currently, OvS
datapath rules also supply this information.
Offload OvS MPLS pop actions to TC along with the next protocol.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Offloading rules to a TC datapath only allows the creation of ingress hook
qdiscs and the application of filters to these. However, there may be
certain situations where an egress qdisc is more applicable (e.g. when
offloading to TC rules applied to OvS internal ports).
Extend the TC API in OvS to allow the creation of egress qdiscs and to add
or interact with flower filters applied to these.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
The TC datapath only permits the offload of mirred actions if they are
egress. To offload TC actions that output to OvS internal ports, ingress
mirred actions are required. At the TC layer, an ingress mirred action
passes the packet back into the network stack as if it had arrived on the
action port rather than attempting to egress the port.
Update OvS-TC offloads to support ingress mirred actions. To ensure
packets that match these rules are properly passed into the network stack,
add a TC skbedit action along with ingress mirred that sets the pkt_type
to PACKET_HOST. This mirrors the functionality of the OvS internal port
kernel module.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Firstly this patch introduces the notion of reserved priority, as the
filter implementing ingress policing would require the highest priority.
Secondly it allows setting rate limiters while tc-offloads has been
enabled. Lastly it installs a matchall filter that matches all traffic
and then applies a police action, when configuring an ingress rate
limiter.
An example of what to expect:
OvS CLI:
ovs-vsctl set interface <netdev_name> ingress_policing_rate=5000
ovs-vsctl set interface <netdev_name> ingress_policing_burst=100
Resulting TC filter:
filter protocol ip pref 1 matchall chain 0
filter protocol ip pref 1 matchall chain 0 handle 0x1
not_in_hw
action order 1: police 0x1 rate 5Mbit burst 125Kb mtu 64Kb
action drop/continue overhead 0b
ref 1 bind 1 installed 3 sec used 3 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
10.0.0.200 () port 0 AF_INET : demo
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
131072 16384 16384 60.13 4.49
ovs-vsctl list interface <netdev_name>
_uuid : 2ca774e8-8b95-430f-a2c2-f8f742613ab1
admin_state : up
...
ingress_policing_burst: 100
ingress_policing_rate: 5000
...
type : ""
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Currently the TC tunnel_key action is always
initialized with the given tunnel id value. However,
some tunneling protocols define the tunnel id as an optional field.
This patch initializes the id field of tunnel_key:set and tunnel_key:unset
only if a value is provided.
In the case that a tunnel key value is not provided by the user
the key flag will not be set.
Signed-off-by: Adi Nissim <adin@mellanox.com>
Acked-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Extend ovs-tc translation by allowing non-byte-aligned fields
for set actions. Use new boundary shifts and add set ipv6 traffic
class action offload via pedit.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Add setting of ipv4 dscp and ecn fields in tc offload using pedit.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
pedit allows setting entire words with an optional mask and OVS
makes use of such masks to allow setting fields that do not span
entire words. One mask for leading bytes that should not be
updated and another mask for trailing bytes that should not be
updated. The masks are created using bit shifts.
In the case of the mask to omit trailing bytes a right bit shift
is used. Currently the code can produce shifts of 1, 2, 3 or 4
bytes (8, 16, 24 or 32 bits) based on the alignment of the end
of field being set.
However, a shift of 32 bits on a 32-bit value is undefined.
As it stands the code relies on the result of UINT32_MAX >> 32
being UINT32_MAX. Or in other words a mask that results in the
pedit action setting all bytes of the word under operation.
This patch adjusts the code to use a shift of 0 for this case,
which gives the same result as the undefined behaviour that was
relied on, and appears logically correct as the desire is for no
trailing bytes (or bits!) to be omitted from the set action.
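
The adjustment described above can be sketched as follows. This is an
illustration only: trailing_mask() and its parameter are hypothetical
names, not the functions used in the OVS tree, and the mask is computed
in host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build the mask used to omit trailing bytes from a
 * pedit set on a 32-bit word.  'shift_bytes' is the shift the old code
 * would have produced (1, 2, 3 or 4 bytes). */
static uint32_t
trailing_mask(unsigned int shift_bytes)
{
    assert(shift_bytes >= 1 && shift_bytes <= 4);

    /* Shifting a 32-bit value by 32 bits is undefined in C, so the
     * 4-byte case is mapped to a shift of 0: no trailing bytes are
     * omitted and the whole word is covered. */
    unsigned int shift_bits = (shift_bytes == 4) ? 0 : shift_bytes * 8;

    return UINT32_MAX >> shift_bits;
}
```

With this mapping the 4-byte case yields UINT32_MAX by definition rather
than by relying on the behaviour of a particular compiler or CPU.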
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
pedit allows setting entire words with an optional mask and OVS
makes use of such masks to allow setting fields that do not span
entire words.
The struct tc_pedit_key structure, which is part of the kernel
ABI, uses host byte order fields to store the mask and value for
a pedit action, however, these fields contain values in network
byte order.
In order to allow static analysis tools to check for endianness
problems this patch adds a local version of struct tc_pedit_key
which uses big endian types and refactors the relevant code as
appropriate.
In the course of making this change it became apparent that the
calculation of masks was occurring using host byte order although
the values are in network byte order. This patch also fixes that
problem by shifting values in host byte order and then converting
them to network byte order. It is believed this fixes a bug on big
endian systems although we are not in a position to test that.
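
A minimal sketch of the "shift in host byte order, then convert" approach
is shown here. The function name and its parameters are assumptions made
for illustration and do not mirror the actual OVS code.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return a network-byte-order mask for a 32-bit
 * word in which the first 'skip_leading' bytes and the last
 * 'skip_trailing' bytes on the wire must not be updated. */
static uint32_t
word_mask_be(unsigned int skip_leading, unsigned int skip_trailing)
{
    assert(skip_leading + skip_trailing < 4);

    /* Viewed as a host-order number, the first byte on the wire is the
     * most significant one, so leading bytes are cleared with a right
     * shift and trailing bytes with a left shift. */
    uint32_t mask = UINT32_MAX >> (8 * skip_leading);
    mask &= UINT32_MAX << (8 * skip_trailing);

    /* Convert only after all shifting is done. */
    return htonl(mask);
}
```

Because the shifts operate on a host-order value, the resulting byte
layout in memory is the same on little and big endian hosts.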
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Add support for IPv6 hlimit field.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
For non-UDP tunnels such as GRE there is no UDP port, i.e. it is initialized to 0.
Do not set the port attribute in such a case.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Tunnels (gre, geneve, vxlan) support 'csum' option (true/false), default is false.
The generated encap TC rule is now configured according to the tunnel configuration.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Add TC offload support for classifying geneve tunnels with options.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Add TC offload support for encapsulating geneve tunnels with options.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Previously the key was used to check the presence of vlan id and
prio fields instead of using the mask. Additionally the vlan id
field was considered to be present if only the prio field was set,
and vice versa. For example, setting the following:
ovs-ofctl -OOpenFlow13,OpenFlow15 add-flow br0 \
priority=10,cookie=1,table=0,ip,dl_vlan_pcp=2,actions=output:2
Resulted in (instead of wildcarding vlan_id, filter matches 0):
filter protocol 802.1Q pref 1 flower chain 0
filter protocol 802.1Q pref 1 flower chain 0 handle 0x1
vlan_id 0
vlan_prio 2
vlan_ethtype ip
eth_type ipv4
ip_flags nofrag
in_hw
action order 1: mirred (Egress Redirect to device eth1) stolen
index 2 ref 1 bind 1 installed 5 sec used 5 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie 47040ae7a94fff6afd7ed8aa04b11ba4
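
The mask-based presence check can be sketched roughly as below. The
struct, the macro names and the requirement that a sub-field be fully
masked are assumptions for illustration; only the 802.1Q TCI bit layout
is standard. Values are in host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 802.1Q TCI layout: 3 bits PCP, 1 bit DEI/CFI, 12 bits VLAN id. */
#define TCI_VID_MASK  0x0fff
#define TCI_PCP_MASK  0xe000
#define TCI_PCP_SHIFT 13

/* Hypothetical result type: which VLAN fields the filter should match. */
struct vlan_match {
    bool has_id;
    uint16_t id;
    bool has_prio;
    uint8_t prio;
};

/* Decide presence from the mask, independently for each sub-field, and
 * only then read the value from the key. */
static struct vlan_match
vlan_match_from_tci(uint16_t tci_key, uint16_t tci_mask)
{
    struct vlan_match m = { false, 0, false, 0 };

    if ((tci_mask & TCI_VID_MASK) == TCI_VID_MASK) {
        m.has_id = true;
        m.id = tci_key & TCI_VID_MASK;
    }
    if ((tci_mask & TCI_PCP_MASK) == TCI_PCP_MASK) {
        m.has_prio = true;
        m.prio = (tci_key & TCI_PCP_MASK) >> TCI_PCP_SHIFT;
    }
    return m;
}
```

For the dl_vlan_pcp=2 flow above, only the priority bits are masked, so
the sketch reports a priority of 2 and leaves the VLAN id wildcarded.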
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Add TC offload support for classifying single MPLS tagged traffic.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
The tunnel ttl key is not masked when provided to the tc lib, hence we
wrongly attempted to match on it when given a non-zero ttl key with a zero
mask. Fix it by applying the mask. Use the same practice for the tunnel tos.
Fixes: dd83253e117c ('lib/tc: Support matching on ip tunnel tos and ttl')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Move the tunnel match fields to be part of the tc/flower key structure.
This is a preparatory step for applying masked matches where needed.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Support matching on tos and ttl of ip tunnels
for the TC data-path.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Allow setting the tos and ttl for TC tunnels.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Add the missing code to match on ip tos when dealing
with the TC data-path.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
TTL can and should be used to match on IPv6's hop-limit, fix that.
Fixes: ab7ecf266b0a ('netdev-tc-offloads: Add nw_ttl matching using flower')
Fixes: 0b4b5203d12e ('tc: Add ip layer ttl matching')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Currently the inner VLAN header is ignored when using the TC data-path.
As TC flower supports QinQ, now we can offload the rules to match on both
outer and inner VLAN headers.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Currently, we assume VLAN ethertype is 0x8100, but it could
be 0x88a8 if QinQ is supported.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Currently we only support 802.1q, so we can offload push action without
specifying any vlan type. Kernel will push 802.1q ethertype by default.
But to support QinQ, we need to specify the ethertype in the push action,
as it could be 802.1ad.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Blocks, in tc classifiers, allow the grouping of multiple qdiscs with an
associated block id. Whenever a filter is added to/removed from this
block, the filter is added to/removed from all associated qdiscs.
Extend TC offload functions to take a block id as a parameter. If the id
is zero then the qdisc is not considered part of a block.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Previously, any rule that is offloaded via a netdev, not necessarily
to the HW, would be reported as "offloaded". This patch fixes this
misalignment, and introduces the 'dp' state, as follows:
rule is in HW via TC offload -> offloaded=yes dp:tc
rule is not in HW over TC DP -> offloaded=no dp:tc
rule is not in HW over OVS DP -> offloaded=no dp:ovs
To achieve this, the flow's 'offloaded' flag was encapsulated in a new
attrs struct, which contains the offloaded state of the flow and the
DP layer the flow is handled in, and instead of setting the flow's
'offloaded' state based solely on the type of dump it was acquired
via, for netdev flows it now sends the new attrs struct to be
collected along with the rest of the flow via the netdev, allowing
it to be set per flow.
For TC offloads, the offloaded state is set based on the 'in_hw' and
'not_in_hw' flags received from the TC as part of the flower. If no
such flag was received, due to lack of kernel support, it defaults
to true.
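
The decision described for TC offloads can be sketched as below. The flag
names and the helper are local to this example and hypothetical; the bit
values are assumed to correspond to the kernel's in_hw/not_in_hw
classifier flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits (assumed values, local to this sketch). */
#define EX_FLAG_IN_HW     (1u << 2)
#define EX_FLAG_NOT_IN_HW (1u << 3)

/* Hypothetical helper: derive the 'offloaded' state of a flower filter
 * from the flags reported by TC. */
static bool
flower_is_offloaded(uint32_t flags)
{
    if (flags & EX_FLAG_IN_HW) {
        return true;
    }
    if (flags & EX_FLAG_NOT_IN_HW) {
        return false;
    }
    /* Neither flag was reported, e.g. the kernel lacks support for
     * them: keep the previous behaviour and report the rule as
     * offloaded. */
    return true;
}
```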
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Acked-by: Roi Dayan <roid@mellanox.com>
[simon: resolved conflict in lib/dpctl.man]
Signed-off-by: Simon Horman <simon.horman@netronome.com>
ICMP checksum is calculated from ICMP headers and data, so hardware doesn't
need to calculate it again because we only rewrite IP headers.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Currently, we support offloading of one output port. Remove that
limitation by using a mirred mirror action for every output port except
the last, which uses a mirred redirect action.
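
As a rough illustration of the rule above, the choice of mirred action
per output port could look like this. The enum and function names are
hypothetical and not taken from the OVS sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the two mirred flavours used here. */
enum mirred_kind {
    MIRRED_MIRROR,    /* Copy the packet and continue with the next action. */
    MIRRED_REDIRECT   /* Hand the packet over; nothing runs after this. */
};

/* Pick the mirred action for output number 'index' (0-based) out of
 * 'n_outputs' output ports in a rule. */
static enum mirred_kind
mirred_kind_for_output(size_t index, size_t n_outputs)
{
    assert(index < n_outputs);
    return index + 1 == n_outputs ? MIRRED_REDIRECT : MIRRED_MIRROR;
}
```

A rule with a single output port therefore still uses a redirect, as
before.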
Signed-off-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
When OVS DP passes the actions to TC library, we save all the
actions in data structure tc_flower and each action type has its
own field in tc_flower. So when TC library passes the actions to
kernel, actually the actions order is lost.
We add an actions array in tc_flower to keep the actions order
in this patch.
Signed-off-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Also update the message to be more correct.
Before this commit if there were tc rules that are not of type
flower the log was getting filled quickly with errors about it
and always appeared to the user when dumping flows from user space.
This commit moves the error to debug and logs it only once.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
"sparse" complains with the warning 'incorrect type in argument 1
(different base types)' in function nl_parse_flower_ip when parsing a key
flag and in function nl_msg_put_flower_options when writing the key
flag. Fix this by using network byte order when reading and writing key
flags to netlink messages.
Fixes: 83e86606 ("netdev-tc-offloads: Add support for IP fragmentation")
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Roi Dayan <roid@mellanox.com>
Add support for frag no, first and later.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Raise the error up instead of ignoring it.
Before this commit, an incorrect rule was printed in addition to the error.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
The tc_flower conversion struct does not consider the order of actions.
If an OvS rule matches on a tunnel (decap) and outputs to a new tunnel,
the netlink conversion to TC will add the set tunnel key action before the
unset, leading to an incorrect TC rule. This patch reorders the netlink
generation to ensure a decap is done before an encap if both exist.
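
The intended ordering can be sketched as follows. The action enum and the
helper are hypothetical and only model the order in which actions would
be emitted, not the netlink encoding itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical action identifiers for this sketch. */
enum ex_tc_act {
    EX_ACT_TUNNEL_KEY_UNSET,  /* decap */
    EX_ACT_TUNNEL_KEY_SET,    /* encap */
    EX_ACT_MIRRED             /* output */
};

/* Fill 'out' with the actions in emission order and return how many
 * were written.  The decap is always emitted before the encap. */
static size_t
order_tunnel_actions(bool decap, bool encap, enum ex_tc_act out[3])
{
    size_t n = 0;

    if (decap) {
        out[n++] = EX_ACT_TUNNEL_KEY_UNSET;
    }
    if (encap) {
        out[n++] = EX_ACT_TUNNEL_KEY_SET;
    }
    out[n++] = EX_ACT_MIRRED;
    return n;
}
```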
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Open vSwitch enables the GCC 7+ option that warns about fall-through
switch statements. This commit fixes newly introduced warnings.
Fixes: d6118e628988 ("netdev-tc-offloads: Verify csum flags on dump from tc")
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Paul Blakey <paulb@mellanox.com>
Currently we send the tc csum action even if it's not needed.
Fix that by sending it only if the csum update flags are non-zero.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Fix the struct variable order to correspond with
its usage.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
On dump, parse and verify the tc csum action update flags
in the same way as we put them.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
We only support extended pedit keys for now, so it's the type we
expect. Skip the legacy ones instead.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
To be later used to implement ovs action set offloading.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
To be used later for offloading rules matching on tcp_flags.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
Add matching on ip layer ttl, to be used later.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
Use sysconf(_SC_CLK_TCK) to read run time "number of clock ticks per
second" and use that to convert ticks to msecs.
This is how iproute does the conversion when parsing tc filters.
The system call is done only once.
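
A minimal sketch of the conversion, assuming hypothetical helper names
and a fallback of 100 ticks per second if sysconf() fails (the fallback
is an assumption of this example). The caching shown here is not thread
safe; real code would need a once-style guard.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Return the number of clock ticks per second, querying the system only
 * on the first call. */
static long
ticks_per_second(void)
{
    static long hz;

    if (!hz) {
        hz = sysconf(_SC_CLK_TCK);
        if (hz <= 0) {
            hz = 100;  /* Assumed fallback, see the note above. */
        }
    }
    return hz;
}

/* Convert a tick count, as reported by TC, to milliseconds. */
static uint64_t
ticks_to_ms(uint64_t ticks)
{
    return ticks * 1000 / (uint64_t) ticks_per_second();
}
```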
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Split dst/src_port and ipv4/ipv6 union so we can
distinguish them easily for later features.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Refactor nl_msg_put_flower_options to be more readable.
This commit doesn't change functionality.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>