2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-28 12:58:00 +00:00

81 Commits

Author SHA1 Message Date
Maor Dickman
6280f5d04a netdev-offload-tc: Reserve lower tc prio for vlan ethertype.
The cited commit reserved lower tc priorities for IP ethertypes in order
to give IP traffic higher priority than other management traffic.
In case of of vlan encap traffic, IP traffic will still get lower
priority.

Fix it by also reserving low priority tc prio for vlan.

Fixes: c230c7579c14 ("netdev-offload-tc: Reserve lower tc prios for ip ethertypes")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Simon Horman <horms@ovn.org>
2024-06-06 13:45:48 +01:00
Timothy Redaelli
be695f26fd netdev-offload-tc: Check geneve metadata length.
Currently ovs-vswitchd crashes, with hw offloading enabled, if a geneve
packet with corrupted metadata is received, because the metadata header
is not verified correctly.

This commit adds a check for geneve metadata length and, if the header
is wrong, the packet is not sent to flower.

It also includes a system-traffic test for geneve packets with corrupted
metadata.

Fixes: a468645c6d33 ("lib/tc: add geneve with option match offload")
Reported-by: Haresh Khandelwal <hakhande@redhat.com>
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2024-02-08 16:25:42 +01:00
Ilya Maximets
472dd66423 netdev-offload-tc: Fix offload of tunnel key tp_src.
There is no TCA_TUNNEL_KEY_ENC_SRC_PORT in the kernel, so the offload
should not be attempted if OVS_TUNNEL_KEY_ATTR_TP_SRC is requested
by OVS.  Current code just ignores the attribute in the tunnel(set())
action leading to a flow mismatch and potential incorrect datapath
behavior:

  |tc(handler21)|DBG|tc flower compare failed action compare
  ...
  Action 0 mismatch:
  - Expected Action:
  00000010 01 00 00 00 00 00 00 00-00 00 00 00 00 ff 00 11
  00000020 c0 5b 17 c1 00 40 00 00-0a 01 00 6d 0a 01 01 12
  00000050 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00000060 01 02 80 01 00 18 00 0b-00 00 00 00 00 00 00 00
  ...
 - Received Action:
  00000010 01 00 00 00 00 00 00 00-00 00 00 00 00 ff 00 11
  00000020 00 00 17 c1 00 40 00 00-0a 01 00 6d 0a 01 01 12
  00000050 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00000060 01 02 80 01 00 18 00 0b-00 00 00 00 00 00 00 00
  ...

In the tc_action dump above we can see the difference on the second
line.  The action dumped from a kernel is missing 'c0 5b' - source
port for a tunnel(set()) action on the second line.

Removing the field from the tc_action_encap structure entirely to
avoid any potential confusion.

Note: In general, the source port number in the tunnel(set()) action
is not particularly useful for most tunnels, because they may just
ignore the value.  Specs for Geneve and VXLAN suggest using a value
based on the headers of the inner packet as a source port.
In vast majority of scenarios the source port doesn't actually end
up in the action itself.
Having a mismatch between the userspace and TC leads to constant
revalidation of the flow and warnings in the log.

Adding a test case that demonstrates a scenario where the issue
occurs - bridging of two tunnels.

Fixes: 8f283af89298 ("netdev-tc-offloads: Implement netdev flow put using tc interface")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2023-October/052744.html
Reported-by: Vladislav Odintsov <odivlad@gmail.com>
Tested-by: Vladislav Odintsov <odivlad@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-12-04 17:57:24 +01:00
Gavin Li
7f04588d78 netdev-tc-offloads: Probe for allowing vxlan gbp support.
Kernels that do not support vxlan gbp would treat the rule that has vxlan
gbp encap action or vxlan gbp id match differently, either reject it or
just skip the action/match and continue processing the knowing ones.

To solve the issue, probe and disallow inserting rules with vxlan gbp
action/match if kernel does not support it.

Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2023-07-03 11:56:39 +02:00
Gavin Li
a2a3f1983f tc: Add vxlan encap action with gbp option offload.
Add TC offload support for vxlan encap with gbp option.

Reviewed-by: Gavi Teitz <gavi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2023-07-03 11:56:39 +02:00
Gavin Li
a4332b5e68 tc: Add vxlan gbp option flower match offload.
Add TC offload support for filtering vxlan tunnels with gbp option.

Reviewed-by: Gavi Teitz <gavi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
2023-07-03 11:56:39 +02:00
Roi Dayan
77d8228985 tc: Fix cleaning chains.
Sometimes there is a need to clean empty chains as done in
delete_chains_from_netdev().  The cited commit doesn't remove
the chain completely which cause adding ingress_block later to fail.
This can be reproduced with adding bond as ovs port which makes ovs
use ingress_block for it.
While at it add the netdev name that fails to the log.

Fixes: e1e5eac5b016 ("tc: Add TCA_KIND flower to delete and get operation to avoid rtnl_lock().")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-04-28 19:01:14 +02:00
Faicker Mo
f9507c1ea4 netdev-offload-tc: Del ufid mapping if device not exist.
The device may be deleted and added with ifindex changed.
The tc rules on the device will be deleted if the device is deleted.
The func tc_del_filter will fail when flow del. The mapping of
ufid to tc will not be deleted.
The traffic will trigger the same flow(with same ufid) to put to tc
on the new device. Duplicated ufid mapping will be added.
If the hashmap is expanded, the old mapping entry will be the first entry,
and now the dp flow can't be deleted.

Signed-off-by: Faicker Mo <faicker.mo@ucloud.cn>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-04-03 20:39:33 +02:00
Mike Pattrick
306583b568 netdev-tc-offloads: Fix misaligned 8 byte read.
UB Sanitizer report:

lib/netdev-offload-tc.c:1276:19: runtime error: load of misaligned
address 0x7f74e801976c for type 'union ovs_u128', which requires 8 byte
alignment

    0 in netdev_tc_flow_dump_next lib/netdev-offload-tc.c:1276
    1 in netdev_flow_dump_next lib/netdev-offload.c:303
    2 in dpif_netlink_flow_dump_next lib/dpif-netlink.c:1921
    [...]

Fixes: 8f7620e6a406 ("netdev-tc-offloads: Implement netdev flow dump api using tc interface")
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-03-29 23:20:06 +02:00
Eelco Chaudron
d53ee36aa6 netdev-offload-tc: Fix parse_tc_flower_to_actions() reporting errors.
parse_tc_flower_to_actions() was not reporting errors, which would
cause parse_tc_flower_to_match() to ignore them.

Fixes: dd03672f7bbb ("netdev-offload-tc: Move flower_to_match action handling to isolated function.")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-03-22 20:06:34 +01:00
Eelco Chaudron
b292cce2ff netdev-offload-tc: Conntrack ALGs are not supported with tc.
tc does not support conntrack ALGs. Even worse, with tc enabled, they
should not be used/configured at all. This is because even though TC
will ignore the rules with ALG configured, i.e., they will flow through
the kernel module, return traffic might flow through a tc conntrack
rule, and it will not invoke the ALG helper.

Fixes: 576126a931cd ("netdev-offload-tc: Add conntrack support")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-09 00:32:55 +01:00
Eelco Chaudron
564d09ef53 netdev-offload-tc: Fix tc conntrack force commit support.
tc was not setting the OVS_CT_ATTR_FORCE_COMMIT flag when a forced
commit was requested. This patch will fix this.

Fixes: 576126a931cd ("netdev-offload-tc: Add conntrack support")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-09 00:25:36 +01:00
Eelco Chaudron
f68e757ef1 tests: Include working system-traffic tests into the system-offloads-testsuite.
Include and run the system-traffic.at tests as part of the system offload
testsuite. Exclude all the tests that will not run without any special
modifications.

Lowered log level for "recirc_id sharing not supported" message, so tests
will not fail with older kernels. This is not an error level message, but
should be debug, like all other, EOPNOTSUPP, related log messages.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-08 22:14:22 +01:00
Eelco Chaudron
b1f58f5072 netdev-offload-tc: Preserve tc statistics when flow gets modified.
When a flow gets modified, i.e. the actions are changes, the tc layer will
remove, and re-add the flow. This is causing all the counters to be reset.

This patch will remember the previous tc counters and adjust any requests
for statistics. This is done in a similar way as the rte_flow implementation.

It also updates the check_pkt_len tc test to purge the flows, so we do
not use existing updated tc flow counters, but start with fresh installed
set of datapath flows.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-02-03 16:39:59 +01:00
Eelco Chaudron
e1e5eac5b0 tc: Add TCA_KIND flower to delete and get operation to avoid rtnl_lock().
A long long time ago, an effort was made to make tc flower
rtnl_lock() free. However, on the OVS part we forgot to add
the TCA_KIND "flower" attribute, which tell the kernel to skip
the lock. This patch corrects this by adding the attribute for
the delete and get operations.

The kernel code calls tcf_proto_is_unlocked() to determine the
rtnl_lock() is needed for the specific tc protocol. It does this
in the tc_new_tfilter(), tc_del_tfilter(), and in tc_get_tfilter().

If the name is not set, tcf_proto_is_unlocked() will always return
false. If set, the specific protocol is queried for unlocked support.

Fixes: f98e418fbdb6 ("tc: Add tc flower functions")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-30 21:12:31 +01:00
Ilya Maximets
9117f4d54f netdev-offload-tc: Fix misaligned access to ct label.
UndefinedBehaviorSanitizer:

  lib/netdev-offload-tc.c:1356:50: runtime error:
   member access within misaligned address 0x60700001a89c for type
   'const struct (unnamed struct at lib/netdev-offload-tc.c:1350:27)',
   which requires 8 byte alignment 0x60700001a89c: note: pointer points here
   24 00 04 00 01 00 00 05  00 00 0d 00 0a 00 00 00  00 00 00 00 ...
               ^
   0 0xd5d183 in parse_put_flow_ct_action lib/netdev-offload-tc.c:1356:50
   1 0xd5783f in netdev_tc_parse_nl_actions lib/netdev-offload-tc.c:2015:19
   2 0xd4027c in netdev_tc_flow_put lib/netdev-offload-tc.c:2355:11
   3 0x9666d7 in netdev_flow_put lib/netdev-offload.c:318:14
   4 0xcd4c0a in parse_flow_put lib/dpif-netlink.c:2297:11
   5 0xcd4c0a in try_send_to_netdev lib/dpif-netlink.c:2384:15
   6 0xcd4c0a in dpif_netlink_operate lib/dpif-netlink.c:2455:23
   7 0x87d40e in dpif_operate lib/dpif.c:1372:13
   8 0x6d43e9 in handle_upcalls ofproto/ofproto-dpif-upcall.c:1674:5
   9 0x6d43e9 in recv_upcalls ofproto/ofproto-dpif-upcall.c:905:9
   10 0x6cf6ea in udpif_upcall_handler ofproto/ofproto-dpif-upcall.c:801:13
   11 0xb6d7ea in ovsthread_wrapper lib/ovs-thread.c:423:12
   12 0x7f5ccf017801 in start_thread
   13 0x7f5ccefb744f in __GI___clone3

Fixes: 9221c721bec0 ("netdev-offload-tc: Add conntrack label and mark support")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2023-01-27 16:57:27 +01:00
Paul Blakey
c230c7579c netdev-offload-tc: Reserve lower tc prios for ip ethertypes
Currently ethertype to prio hmap is static and the first ethertype
being used gets a lower priority. Usually there is an arp request
before the ip traffic and the arp ethertype gets a lower tc priority
while the ip traffic proto gets a higher priority.
In this case ip traffic will go through more hops in tc and HW.
Instead, reserve lower priorities for ip ethertypes.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
2022-11-08 06:59:20 -05:00
Ilya Maximets
1336eeb570 netdev-offload-tc: Parse tunnel options only for geneve ports.
Cited commit correctly fixed the issue of incorrect reporting
of zero-length geneve option matches as wildcarded.  But as a
side effect, exact match on the metadata length was added to
every tunnel flow match, even if the tunnel is not geneve.
That doesn't generate any functional issues, but it maybe
confusing to see 'tunnel(...,geneve(),...)' while looking at
datapath flow dumps for, e.g., vxlan tunnel flows.

Fix that by checking the port type before parsing geneve options.
tunnel() attribute itself doesn't have enough information to
figure out the tunnel type.

Fixes: 7a6c8074c5d2 ("netdev-offload-tc: Fix the mask for tunnel metadata length.")
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-08-30 22:37:35 +02:00
Ilya Maximets
ff55e8f385 netdev-offload-tc: Add missing handling of the tunnel source port.
netdev_tc_flow_put() "consumes" the tunnel.tp_src value, but
it's never passed down to TC, and not parsed back.  Fix that.

Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-08-19 19:40:00 +02:00
Ilya Maximets
262eded5fb netdev-offload-tc: Fix ignoring unknown tunnel keys.
Current offloading code supports only limited number of tunnel keys
and silently ignores everything it doesn't understand.  This is
causing, for example, offloaded ERSPAN tunnels to not work, because
flow is offloaded, but ERSPAN options are not provided to TC.

There is a number of tunnel keys, which are supported by the userspace,
but silently ignored during offloading:

  OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT
  OVS_TUNNEL_KEY_ATTR_OAM
  OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS
  OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS

OVS_TUNNEL_KEY_ATTR_CSUM is kind of supported, but only for actions
and for some reason is set from the tunnel port instead of the
provided action, and not currently supported for the tunnel key in
the match.

Addig a default case to fail offloading of unknown attributes.  For
now explicitly allowing incorrect behavior for the DONT_FRAGMENT flag,
otherwise we'll break all tunnel offloading by default.  VXLAN and
ERSPAN options has to fail offloading, because the tunnel will not
work otherwise.  OAM is not a default configurations, so failing it
as well. The missing DONT_FRAGMENT flag though should, probably,
cause frequent flow revalidation, but that is not new with this patch.

Same for the 'match' key, only clearing masks that was actually
consumed, except for the DONT_FRAGMENT and CSUM flags, which are
explicitly allowed and highlighted as broken.

Also, destination port as well as CSUM configuration for unknown
reason was not taken from the actions list and were passed via HW
offload info instead of being consumed from the set() action.

Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2022-July/395522.html
Reported-by: Eelco Chaudron <echaudro@redhat.com>
Fixes: 8f283af89298 ("netdev-tc-offloads: Implement netdev flow put using tc interface")
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-08-19 19:39:51 +02:00
Ilya Maximets
7a26634678 netdev-offload-tc: Use masks instead of keys while parsing tunnel attributes.
If the key is zero, it doesn't mean we don't need to match on it.
Masks should be checked instead.

Fixes: 49a7961fca65 ("lib/tc: Avoid matching on tunnel ttl or tos if not needed")
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-08-19 19:39:51 +02:00
Ilya Maximets
5d91bdf12e netdev-offload-tc: Explicitly handle mask for the tunnel destination port.
netdev_tc_flow_put() ignores the tunnel.tp_dst mask.  That results
in the exact match on the value.  TC supports the masked match on
this field and it does return the mask back during the flow dump
even if it wasn't provided initially.  OVS should correctly handle
that.  There is a problem though.  Some drivers (mlx5) doesn't
support offloading if the destination port is not an exact match [1].

Keeping the logic as-is for now, but making it explicit and somewhat
documented in the comment, so it is clear what is happening and we can
revisit this in the future.

[1] https://patchwork.ozlabs.org/project/openvswitch/patch/20220704224505.1117988-3-i.maximets@ovn.org/#2927396

Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-08-19 19:39:51 +02:00
Ilya Maximets
7a6c8074c5 netdev-offload-tc: Fix the mask for tunnel metadata length.
'wc.masks.tunnel.metadata.present.len' must be a mask for the same
field in the flow key, but in the tc_flower structure it's the real
length of metadata masks.

That is correctly handled for the individual opt->length, setting all
the masks to 0x1f like it's done in the tun_metadata_to_geneve_mask__(),
but not handled for the main 'len' field.

Fix that by setting the mask to 0xff like it's done during the flow
translation in xlate_actions() and during the flow dump in the
tun_metadata_from_geneve_nlattr().

Also, flower always has an exact match on the present.len field
regardless of its value and regardless of this field being masked
by OVS flow translation layer while installing the flow.  Hence,
all tunnel flows dumped from TC should have an exact match on
present.len and also UDPIF flag, because present.len doesn't make
sense without that flag.  Without the change, zero-length options
match is incorrectly reported as a wildcard match.  The side effect
though is that zero-length match on geneve options is reported even
for other tunnel types, e.g. vxlan.  But that should be fairly
harmless.  To avoid reporting a match on empty geneve options for
vxlan/etc. tunnels we'll need to check the tunnel port type, there
is no enough information in the TUNNEL attribute itself.

Extra checks and comments added around the code to better explain
what is going on.

Fixes: a468645c6d33 ("lib/tc: add geneve with option match offload")
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-08-19 19:39:45 +02:00
Ilya Maximets
1fd336ccee netdev-offload-tc: Disable offload of IPv6 fragments.
OVS kernel datapath and TC are parsing IPv6 fragments differently.
For IPv6 later fragments, according to the original design [1], OVS
always sets nw_proto=44 (IPPROTO_FRAGMENT), regardless of the type
of the L4 protocol.

This leads to situation where flow for nw_proto=44 gets installed
to TC, but packets can not match on it, causing all IPv6 later
fragments to go to userspace significantly degrading performance.

Disabling offload for such packets, so the flow can be installed
to the OVS kernel datapath instead.  Disabling for all IPv6 fragments
including the first one, because it doesn't make a lot of sense to
handle them separately.  It may also cause potential problems with
conntrack trying to re-assemble a packet from fragments handled by
different datapaths (first in HW, later in OVS kernel).

Checking both 'nw_proto' and 'nw_frag' as classifier might decide
to match only on one of them and also nw_proto will not be 44 for
the first fragment.

The issue was hidden for some time due to incorrect behavior of the
OVS kernel datapath that was recently fixed in kernel commit:

 12378a5a75e3 ("net: openvswitch: fix parsing of nw_proto for IPv6 fragments")

To allow offloading in the future either flow dissector in TC
should be changed to parse packets in the same way as OVS does,
or parsing in OVS kernel and userspace should be made configurable,
so users can opt-in to the behavior change.  Silent change of the
behavior (change by default) is not an option, because existing
OpenFlow pipelines may depend on a certain behavior.

[1] https://docs.openvswitch.org/en/latest/topics/design/#fragments

Fixes: 83e866067ea6 ("netdev-tc-offloads: Add support for IP fragmentation")
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-08-08 19:15:09 +02:00
Ilya Maximets
a7680c3caf netdev-offload-tc: Print unused mask bits on failure.
This change extends the debug logging with the sparse
dump of the flow mask structure to make debug process
easier.

Sample output:

  |netdev_offload_tc|DBG|offloading isn't supported, unknown attribute
  Unused mask bits:
  00000270  00 00 00 00 00 00 00 00-00 00 00 ff 00 00 00 00

In this example, 0x270 + 11 = 635, which is an offset of
the nsh.mdtype in the struct flow.

Suggested-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-08-04 14:18:09 +02:00
Eelco Chaudron
e4daf88a43 netdev-offload-tc: Handle check_pkt_len datapath action.
This change implements support for the check_pkt_len
action using the TC police action, which supports MTU
checking.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-07-13 17:39:22 +02:00
Eelco Chaudron
dd03672f7b netdev-offload-tc: Move flower_to_match action handling to isolated function.
Move handling of the individual actions in the parse_tc_flower_to_match()
function to a separate function that will make recursive action handling
easier.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-07-13 17:39:22 +02:00
Eelco Chaudron
64365293a5 netdev-offload-tc: Move flow_put action handling to isolated function.
Move handling of the individual actions in the netdev_tc_flow_put()
function to a separate function that will make recursive action handling
easier.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-07-13 17:39:22 +02:00
Jianbo Liu
218304df18 netdev-offload-tc: Offloading rules with police actions
When offloading rule, tc should be filled with police index, instead
of meter id. As meter is mapped to police action, and the mapping is
inserted into meter_id_to_police_idx hmap, this hmap is used to find
the police index. Besides, the reverse mapping between meter id and
police index is also added, so meter id can be retrieved from this
hashmap and pass to dpif while dumping rules.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
2022-07-11 11:20:50 +02:00
Jianbo Liu
f6ed09ef55 netdev-offload-tc: Cleanup police actions with reserved indexes on startup
As the police actions with indexes of 0x10000000-0x1fffffff are
reserved for meter offload, to provide a clean environment for OVS,
these reserved police actions should be deleted on startup. So dump
all the police actions, delete those actions with indexes in this
range.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
2022-07-11 11:20:37 +02:00
Jianbo Liu
4c226944f7 netdev-offload-tc: Implement meter offload API for tc
For dpif-netlink, meters are mapped by tc to police actions with
one-to-one relationship. Implement meter offload API to set/get/del
the police action, and a hmap is used to save the mappings.
An id-pool is used to manage all the available police indexes, which
are 0x10000000-0x1fffffff, reserved only for OVS.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
2022-07-11 11:20:04 +02:00
Jianbo Liu
a9b8cdde69 tc: Add support parsing tc police action
Add function to parse police action from netlink message.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
2022-07-11 11:16:56 +02:00
Eelco Chaudron
d632ad0aa1 odp_util: Fix parse_key_and_mask_to_match() vlan parsing.
The parse_key_and_mask_to_match() is a function to translate
a netlink formatted key/mask to match structure. And should
not consider any configuration setting when translating.

In addition we also enforce the encap_eth_type[0] mask as
it's required for the VLAN match.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-06-24 20:24:55 +02:00
Eelco Chaudron
21b76c774d netdev-offload-tc: Check for ct_state flag combinations that are not offloadable.
This patch checks for none offloadable ct_state match flag combinations.
If they exist force the +trk flag down to TC Flower

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-04-06 13:34:45 +02:00
Adrian Moreno
9e56549c2b hmap: use short version of safe loops if possible.
Using SHORT version of the *_SAFE loops makes the code cleaner and less
error prone. So, use the SHORT version and remove the extra variable
when possible for hmap and all its derived types.

In order to be able to use both long and short versions without changing
the name of the macro for all the clients, overload the existing name
and select the appropriate version depending on the number of arguments.

Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-30 16:59:02 +02:00
Eelco Chaudron
db40eb79ec netdev-offload-tc: Check for valid netdev ifindex in flow_put.
Verify that the returned ifindex by netdev_get_ifindex() is valid.
This might not be the case in the ERSPAN port scenario, which can
not be offloaded.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-21 00:31:24 +01:00
Chris Mi
920ec5761e tc: Keep header rewrite actions order.
Currently, tc merges all header rewrite actions into one tc pedit
action. So the header rewrite actions order is lost. Save each header
rewrite action into one tc pedit action to keep the order. And only
append one tc csum action to the last pedit action of a series.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-18 17:41:14 +01:00
Paul Blakey
f34a7626cc tc: Fix stats byte count on fragmented packets.
Fragmented packets with offset=0 are defragmented by tc act_ct, and
only when assembled pass to next action, in ovs offload case,
a goto action. Since stats are overwritten on each action dump,
only the stats for last action in the tc filter action priority
list is taken, the stats on the goto action, which count
only the assembled packets. See below for example.

Hardware updates just part of the actions (gact, ct, mirred) - those
that support stats_update() operation. Since datapath rules end
with either an output (mirred) or recirc/drop (both gact), tc rule
will at least have one action that supports it. For software packets,
the first action will have the max software packets count.
Tc dumps total packets (hw + sw) and hardware packets, then
software packets needs to be calculated from this (total - hw).

To fix the above, get hardware packets and calculate software packets
for each action, take the max of each set, then combine back
to get the total packets that went through software and hardware.

Example by running ping above MTU (ping <IP> -s 2000):
ct_state(-trk),recirc_id(0),...,ipv4(proto=1,frag=first),
  packets:14, bytes:19544,..., actions:ct(zone=1),recirc(0x1)
ct_state(-trk),recirc_id(0),...,ipv4(proto=1,frag=later),
  packets:14, bytes:28392,..., actions:ct(zone=1),recirc(0x1)

Second rule should have had bytes=14*<size of 'later' frag>, but instead
it's bytes=14*<size of assembled packets - size of 'first' + 'later'
frags>.

Fixes: 576126a931cd ("netdev-offload-tc: Add conntrack support")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-04 16:02:45 +01:00
Paolo Valerio
63c01b8956 netdev-offload-tc: Use nl_msg_put_flag for OVS_TUNNEL_KEY_ATTR_CSUM.
When a tunnel port gets added to the bridge setting the checksum option
to true:

  ovs-vsctl add-port br0 geneve0         \
    -- set interface geneve0 type=geneve \
    options:remote_ip=<remote_ip> options:key=<key> options:csum=true

the flow dump for the outgoing traffic will include a
"bad key length 1 ..." message:

  ovs-appctl dpctl/dump-flows --names -m
  ufid:<>, ..., dp:tc,
  actions:set(tunnel(tun_id=<>,dst=<>,ttl=64,tp_dst=6081,
                     key6(bad key length 1, expected 0)(01)flags(key)))
          ,genev_sys_6081

This is due to a mismatch present between the expected length (zero
for OVS_TUNNEL_KEY_ATTR_CSUM in ovs_tun_key_attr_lens) and the
current one.

With this patch the same flow dump becomes:

  ovs-appctl dpctl/dump-flows --names -m
  ufid:<>, ..., dp:tc,
  actions:set(tunnel(tun_id=<>,dst=<>,ttl=64,tp_dst=6081,
                     flags(csum|key))),genev_sys_6081

Fixes: d9677a1f0eaf ("netdev-tc-offloads: TC csum option is not matched with tunnel configuration")
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-06-30 23:56:03 +02:00
Ilya Maximets
a1ec428037 netdev-offload: Disallow offloading to unrelated tunneling vports.
'linux_tc' flow API suitable only for tunneling vports with backing
linux interfaces. DPDK flow API is not suitable for such ports.

With this change we could drop vport restriction from dpif-netdev.

This is a prerequisite for enabling vport offloading in DPDK.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gaetan Rivet <gaetanr@nvidia.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Tested-by: Emma Finn <emma.finn@intel.com>
Tested-by: Marko Kovacevic <marko.kovacevic@intel.com>
2021-06-24 22:22:08 +02:00
Ariel Levkovich
ea71a9d443 netdev-offload-tc: Add support for ct_state flag rel.
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-04-19 23:57:05 +02:00
Paul Blakey
edcfd7176f netdev-offload-tc: Add support for ct_state flags inv and rpl
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2021-03-15 19:38:24 +01:00
Paul Blakey
1e4aa061ac netdev-offload-tc: Probe for support for any of the ct_state flags
Upstream kernel now rejects unsupported ct_state flags.
Earlier kernels, ignored it but still echoed back the requested ct_state,
if ct_state was supported. ct_state initial support had trk, new, est,
and rel flags.

If kernel echos back ct_state, assume support for trk, new, est, and
rel. If kernel rejects a specific unsupported flag, continue and
use reject mechanisim to probe for flags rep and inv.

Disallow inserting rules with unnsupported ct_state flags.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2021-03-15 19:38:24 +01:00
Jianbo Liu
47e73f7f00 netdev-offload-tc: Flush rules on all chains before attach ingress block
Previously, only chain 0 is deleted before attach the ingress block.
If there are rules on the chain other than 0, these rules are not flushed.
In this case, the recreation of qdisc also fails. To fix this issue, flush
rules from all chains.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2021-02-24 11:07:20 +01:00
wenxu
498cf3eaed netdev-offload-tc: Reject rules with unsupported ct_state flags.
TC flower doesn't support some ct state flags such as
INVALID/SNAT/DNAT/REPLY. So it is better to reject this rule.

Fixes: 576126a931cd ("netdev-offload-tc: Add conntrack support")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-02-04 20:11:12 +01:00
Maor Dickman
75e1e6fd2d lib/tc: add ICMP type and code match offload
Add TC offload support for classifying ICMPv4/6 type and code.

$ ovs-appctl dpctl/add-flow 'recirc_id(0),in_port(3),eth(),\
  eth_type(0x0800),ipv4(proto=1),icmp(type=9,code=0)' 2

$ ovs-appctl dpctl/dump-flows
  ... icmp(type=9,code=0) ...

$ tc filter show dev <ethx> ingress
  ...
  eth_type ipv4
  ip_proto icmp
  icmp_type 9
  icmp_code 0
  not_in_hw
  action order 1: mirred (Egress Redirect to device <ethy>) stolen
  ...

Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2021-02-01 16:54:15 +01:00
Jianbo Liu
c5b4b0ce95 dpif-netlink: Fix issues of the offloaded flows counter.
The n_offloaded_flows counter is saved in dpif, and this is the first
one when ofproto is created. When flow operation is done by ovs-appctl
commands, such as, dpctl/add-flow, a new dpif is opened, and the
n_offloaded_flows in it can't be used. So, instead of using counter,
the number of offloaded flows is queried from each netdev, then sum
them up. To achieve this, a new API is added in netdev_flow_api to get
how many flows assigned to a netdev.

In order to get better performance, this number is calculated directly
from tc_to_ufid hmap for netdev-offload-tc, because flow dumping by tc
takes much time if there are many flows offloaded.

Fixes: af0618470507 ("dpif-netlink: Count the number of offloaded rules")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-12-21 20:25:59 +01:00
Roi Dayan
ecadc3a30b netdev-offload-tc: Use single 'once' variable for probing tc features
There is no need for a 'once' variable per probe.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2020-11-11 11:36:31 +01:00
Jianbo Liu
dd8ca104ac netdev-tc-offloads: Don't delete ufid mapping if fail to delete filter
tc_replace_flower may fail, so the return value must be checked.
If not zero, ufid can't be deleted. Otherwise the operations on this
filter may fail because its ufid is not found.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2020-10-23 16:20:46 +02:00
Roi Dayan
d5659751f6 tc: Use skip_hw flag when probing tc features
There is no need to pass tc rules to hw when just probing
for tc features. this will avoid redundant errors from hw drivers
that may happen.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Acked-By: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2020-08-06 17:26:46 +02:00