2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-29 05:18:13 +00:00

103 Commits

Author SHA1 Message Date
Ilya Maximets
fec5424aed tc: Fix misaligned writes while parsing pedit.
Offsets within 'rewrite' action are not 4-byte aligned, so has to
be accessed carefully.

 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior lib/tc.c:1132:17 in

 lib/tc.c:1132:17: runtime error: store to misaligned address 0x7fba215b2025
   for type 'ovs_be32' (aka 'unsigned int'), which requires 4 byte alignment

    0 0xd78857 in nl_parse_act_pedit lib/tc.c:1132:24
    1 0xd68103 in nl_parse_single_action lib/tc.c:1936:15
    2 0xd624ee in nl_parse_flower_actions lib/tc.c:2024:19
    3 0xd624ee in nl_parse_flower_options lib/tc.c:2139:12
    4 0xd5f082 in parse_netlink_to_tc_flower lib/tc.c:2187:12
    5 0xd6a2a1 in tc_replace_flower lib/tc.c:3776:19
    6 0xd2ae8f in netdev_tc_flow_put lib/netdev-offload-tc.c:2350:11
    7 0x951d07 in netdev_flow_put lib/netdev-offload.c:318:14
    8 0xcbb81a in parse_flow_put lib/dpif-netlink.c:2297:11
    9 0xcbb81a in try_send_to_netdev lib/dpif-netlink.c:2384:15
    10 0xcbb81a in dpif_netlink_operate lib/dpif-netlink.c:2455:23
    11 0x8678ae in dpif_operate lib/dpif.c:1372:13
    12 0x6bcc89 in handle_upcalls ofproto/ofproto-dpif-upcall.c:1674:5
    13 0x6bcc89 in recv_upcalls ofproto/ofproto-dpif-upcall.c:905:9
    14 0x6b7f9a in udpif_upcall_handler ofproto/ofproto-dpif-upcall.c:801:13
    15 0xb54c5a in ovsthread_wrapper lib/ovs-thread.c:422:12
    16 0x7fba2f2081ce in start_thread (/lib64/libpthread.so.0+0x81ce)
    17 0x7fba2de39dd2 in clone (/lib64/libc.so.6+0x39dd2)

Fixes: 8ada482bbe19 ("tc: Add header rewrite using tc pedit action")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-11-02 19:47:21 +01:00
Baowen Zheng
ffcb6f115f netdev-linux: Allow meter to work in tc software datapath when tc-policy is specified
Add tc action flags when adding police action to offload meter table.

There is a restriction that the flag of skip_sw/skip_hw should be same for
filter rule and the independent created tc actions the rule uses. In this
case, if we configure the tc-policy as skip_hw, filter rule will be created
with skip_hw flag and the police action according to meter table will have
no action flag, then flower rule will fail to add to tc kernel system.

To fix this issue, we will add tc action flag when adding police action to
offload a meter table, so it will allow meter table to work in tc software
datapath.

Fixes: 5c039ddc64ff ("netdev-linux: Add functions to manipulate tc police action")
Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
2022-11-01 10:18:16 +01:00
Tianyu Yuan
743499607b Revert "tc: Fix stats dump when using same meter table"
This reverts commit dd9881ed55e6 ('tc: Fix stats dump when
using same meter table')

This patch doesn't solve the tc flow stats update issue and
will lead to failure of system-offloads-traffic testsuite, it
only counts packets surviving after the tc filter, rather than
hitting the filter

A following patch will come up to solve this flow stats update
issue

Signed-off-by: Tianyu Yuan <tianyu.yuan@corigine.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
2022-10-31 13:55:06 +01:00
Roi Dayan
7a5ee32518 tc: On last action use drop action attribute instead of pipe
OVN is setting ct drop rule with a ct clear action.
OVS datapath behavior is if there is no forward action
the default is drop.
TC behavior is to continue with next match.
Fix to match tc to ovs behavior by setting last action
attribute as drop instead of pipe.
Also update lastused when parsing ct action.

example rule
recirc_id(0x1),in_port(2),ct_state(+trk),eth(),eth_type(0x0800),ipv4(frag=no),
packets:82, bytes:8036, used:2.108s, actions:ct_clear

Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
2022-10-31 10:34:37 +01:00
Tianyu Yuan
dd9881ed55 tc: Fix stats dump when using same meter table
When we apply meter police on both directions of TCP traffic, the
dumped stats is shown same (as shown below). This issue is introduced
by modifying the stats update strategy.

...,in_port(6),eth(),eth_type(0x0800),ipv4(frag=no), packets:1488557,
bytes:2089059644, used:0.040s, actions:meter(0),9
...,in_port(9),eth(),eth_type(0x0800),ipv4(frag=no), packets:1488557,
bytes:2089059644, used:0.040s, actions:meter(0),6

In previous patch, after parsing police action, the flower stats will
be updated by dumped meter table stats, which will result in the issue
above.

Thus, the stats of meter table should not be used when dumping flow
stats. Ignore the stats update when police.index belongs to meter.

Fixes: a9b8cdde69de ("tc: Add support parsing tc police action")
Signed-off-by: Tianyu Yuan <tianyu.yuan@corigine.com>
Reviewed-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
2022-09-12 18:54:33 +02:00
Ilya Maximets
ff55e8f385 netdev-offload-tc: Add missing handling of the tunnel source port.
netdev_tc_flow_put() "consumes" the tunnel.tp_src value, but
it's never passed down to TC, and not parsed back.  Fix that.

Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-08-19 19:40:00 +02:00
Ilya Maximets
5d91bdf12e netdev-offload-tc: Explicitly handle mask for the tunnel destination port.
netdev_tc_flow_put() ignores the tunnel.tp_dst mask.  That results
in the exact match on the value.  TC supports the masked match on
this field and it does return the mask back during the flow dump
even if it wasn't provided initially.  OVS should correctly handle
that.  There is a problem though.  Some drivers (mlx5) doesn't
support offloading if the destination port is not an exact match [1].

Keeping the logic as-is for now, but making it explicit and somewhat
documented in the comment, so it is clear what is happening and we can
revisit this in the future.

[1] https://patchwork.ozlabs.org/project/openvswitch/patch/20220704224505.1117988-3-i.maximets@ovn.org/#2927396

Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-08-19 19:39:51 +02:00
Ilya Maximets
7a6c8074c5 netdev-offload-tc: Fix the mask for tunnel metadata length.
'wc.masks.tunnel.metadata.present.len' must be a mask for the same
field in the flow key, but in the tc_flower structure it's the real
length of metadata masks.

That is correctly handled for the individual opt->length, setting all
the masks to 0x1f like it's done in the tun_metadata_to_geneve_mask__(),
but not handled for the main 'len' field.

Fix that by setting the mask to 0xff like it's done during the flow
translation in xlate_actions() and during the flow dump in the
tun_metadata_from_geneve_nlattr().

Also, flower always has an exact match on the present.len field
regardless of its value and regardless of this field being masked
by OVS flow translation layer while installing the flow.  Hence,
all tunnel flows dumped from TC should have an exact match on
present.len and also UDPIF flag, because present.len doesn't make
sense without that flag.  Without the change, zero-length options
match is incorrectly reported as a wildcard match.  The side effect
though is that zero-length match on geneve options is reported even
for other tunnel types, e.g. vxlan.  But that should be fairly
harmless.  To avoid reporting a match on empty geneve options for
vxlan/etc. tunnels we'll need to check the tunnel port type, there
is no enough information in the TUNNEL attribute itself.

Extra checks and comments added around the code to better explain
what is going on.

Fixes: a468645c6d33 ("lib/tc: add geneve with option match offload")
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-08-19 19:39:45 +02:00
Ilya Maximets
398623a63e tc: Use sparse hex dump while printing inconsistencies.
Instead of a very long hex string something like this will be printed:

 |DBG|tc flower compare failed mask compare:
 Expected Mask:
 00000000  ff ff 00 00 ff ff ff ff-ff ff ff ff ff ff ff ff
 00000020  00 00 00 00 00 00 00 00-00 00 00 00 00 00 03 00
 00000090  00 00 00 00 00 00 00 00-ff ff ff ff ff ff ff ff
 000000c0  ff 00 00 00 ff ff 00 00-ff ff ff ff ff ff ff ff

 Received Mask:
 00000000  ff ff 00 00 ff ff ff ff-ff ff ff ff ff ff ff ff
 00000020  00 00 00 00 00 00 00 00-00 00 00 00 00 00 03 00
 00000090  00 00 00 00 00 00 00 00-ff ff ff ff ff ff ff ff
 000000c0  ff 00 00 00 00 00 00 00-ff ff ff ff ff ff ff ff

It's easier to spot the difference this way and count which bytes are
to blame, since offsets are printed as well.

Using a sparse dump to avoid printing huge number of all-zero lines.

Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-08-04 14:18:17 +02:00
Ilya Maximets
eacc544c4d tc: Fix misaligned access while creating pedit actions.
calc_offsets() function returns 'data' and 'mask' pointers, which
are pointers somewhere inside struct tc_flower_key, and they are not
aligned, causing misaligned memory access.  For example:

  ipv6.rewrite_hlimit is at 148 byte offset inside the struct
  tc_flower_key.  While the actual field is in the 7th byte of
  the IPv6 header in the actual packet.  So, pedit will need
  to write the last byte of the [4-7] range to the actual packet.
  So, data pointer is positioned to 145th byte inside the tc_flower_key
  with the 000000FF mask.  Obviously, 145th byte inside the structure
  is not 4-byte aligned.

 lib/tc.c:2879:34: runtime error:
   load of misaligned address 0x7f2802eaa321 for type 'ovs_be32' (aka
   'unsigned int'), which requires 4 byte alignment
   0x7f2802eaa321: note: pointer points here
     00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 ...
                  ^
     0 0xd7f2fb in nl_msg_put_flower_rewrite_pedits lib/tc.c:2879:34
     1 0xd7f2fb in nl_msg_put_flower_acts lib/tc.c:3141:25
     2 0xd6ae5a in nl_msg_put_flower_options lib/tc.c:3445:12
     3 0xd6a2be in tc_replace_flower lib/tc.c:3712:17
     4 0xd2bf25 in netdev_tc_flow_put lib/netdev-offload-tc.c:2224:11
     5 0x94f6b7 in netdev_flow_put lib/netdev-offload.c:316:14
     6 0xcbd19e in parse_flow_put lib/dpif-netlink.c:2289:11
     7 0xcbd19e in try_send_to_netdev lib/dpif-netlink.c:2376:15
     8 0xcbd19e in dpif_netlink_operate lib/dpif-netlink.c:2447:23
     9 0x86536e in dpif_operate lib/dpif.c:1372:13
    10 0x6bc289 in handle_upcalls ofproto/ofproto-dpif-upcall.c:1654:5
    11 0x6bc289 in recv_upcalls ofproto/ofproto-dpif-upcall.c:892:9
    12 0x6b766a in udpif_upcall_handler ofproto/ofproto-dpif-upcall.c:792:13
    13 0xb5015a in ovsthread_wrapper lib/ovs-thread.c:422:12
    14 0x7f280b2081ce in start_thread (/lib64/libpthread.so.0+0x81ce)
    15 0x7f2809e39dd2 in clone (/lib64/libc.so.6+0x39dd2)

Fix misaligned read by using appropriate functions.

Fixes: 8ada482bbe19 ("tc: Add header rewrite using tc pedit action")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
2022-07-14 14:58:10 +02:00
Ilya Maximets
e5162ac42a tc: Fix misaligned access to struct tcf_t for police action.
lib/tc.c:1334:44: runtime error: member access within misaligned address
  0x6210001f5a2c for type 'const struct tcf_t', which requires 8 byte alignment
  0x6210001f5a2c: note: pointer points here
      24 00 06 00 c3 00 00 00  00 00 00 00 c3 00 00 00 ...
                  ^
     0 0xd7c7ea in nl_parse_tcf lib/tc.c:1334:44
     1 0xd7bd3a in nl_parse_act_police lib/tc.c:1433:9
     2 0xd68b1a in nl_parse_single_action lib/tc.c:1922:9
     3 0xd62c7e in nl_parse_flower_actions lib/tc.c:1992:19
     4 0xd62c7e in nl_parse_flower_options lib/tc.c:2107:12
     5 0xd5fa32 in parse_netlink_to_tc_flower lib/tc.c:2155:12
     6 0xd21760 in netdev_tc_flow_dump_next lib/netdev-offload-tc.c:1158:13
     7 0x94f442 in netdev_flow_dump_next lib/netdev-offload.c:301:14
     8 0xcba2f6 in dpif_netlink_flow_dump_next lib/dpif-netlink.c:1901:20
     9 0x8665b6 in dpif_flow_dump_next lib/dpif.c:1135:9
    10 0xee5f0f in dpctl_dump_flows lib/dpctl.c:1106:12
    11 0xee27a3 in dpctl_unixctl_handler lib/dpctl.c:3035:17
    12 0xc7f78b in process_command lib/unixctl.c:310:13
    13 0xc7f78b in run_connection lib/unixctl.c:344:17
    14 0xc7f78b in unixctl_server_run lib/unixctl.c:395:21
    15 0x59acb4 in main vswitchd/ovs-vswitchd.c:130:9
    16 0x7f1be043acf2 in __libc_start_main (/lib64/libc.so.6+0x3acf2)
    17 0x47e91d in _start (vswitchd/ovs-vswitchd+0x47e91d)

Fixes: a9b8cdde69de ("tc: Add support parsing tc police action")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
2022-07-14 14:58:10 +02:00
Eelco Chaudron
e4daf88a43 netdev-offload-tc: Handle check_pkt_len datapath action.
This change implements support for the check_pkt_len
action using the TC police action, which supports MTU
checking.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-07-13 17:39:22 +02:00
Jianbo Liu
218304df18 netdev-offload-tc: Offloading rules with police actions
When offloading rule, tc should be filled with police index, instead
of meter id. As meter is mapped to police action, and the mapping is
inserted into meter_id_to_police_idx hmap, this hmap is used to find
the police index. Besides, the reverse mapping between meter id and
police index is also added, so meter id can be retrieved from this
hashmap and pass to dpif while dumping rules.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
2022-07-11 11:20:50 +02:00
Jianbo Liu
f6ed09ef55 netdev-offload-tc: Cleanup police actions with reserved indexes on startup
As the police actions with indexes of 0x10000000-0x1fffffff are
reserved for meter offload, to provide a clean environment for OVS,
these reserved police actions should be deleted on startup. So dump
all the police actions, delete those actions with indexes in this
range.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
2022-07-11 11:20:37 +02:00
Jianbo Liu
5c039ddc64 netdev-linux: Add functions to manipulate tc police action
Add helpers to add, delete and get stats of police action with
the specified index.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
2022-07-11 11:18:16 +02:00
Jianbo Liu
a9b8cdde69 tc: Add support parsing tc police action
Add function to parse police action from netlink message.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
2022-07-11 11:16:56 +02:00
Ilya Maximets
a2d202bdee tc: Fix misaligned access to stats and time values.
Pointers to gnet_stats_basic and tcf_t are not correctly aligned,
so we need to copy the data before accessing.  Found by running
check-offloads testsuite with UBsan:

 lib/tc.c:1791:50: runtime error:
   member access within misaligned address 0x61900005ce1c for type
   'const struct gnet_stats_basic', which requires 8 byte alignment
   0x61900005ce1c: note: pointer points here
      14 00 07 00 00 00 00 00  00 00 00 00 00 00 00 00  ...
                  ^
   0 0xd69044 in nl_parse_single_action lib/tc.c:1791:50
   1 0xd69044 in nl_parse_flower_actions lib/tc.c:1841:19
   2 0xd57612 in nl_parse_flower_options lib/tc.c:1893:12
   3 0xd5468d in parse_netlink_to_tc_flower lib/tc.c:1941:12
   4 0xd5a7ec in tc_replace_flower lib/tc.c:3199:19
   5 0xd2baea in probe_tc_block_support lib/netdev-offload-tc.c:2226:13
   6 0xd2baea in netdev_tc_init_flow_api lib/netdev-offload-tc.c:2279:9
   7 0x94d726 in netdev_assign_flow_api lib/netdev-offload.c:183:14
   8 0x94d726 in netdev_init_flow_api lib/netdev-offload.c:323:9
   9 0x9515c7 in netdev_ports_flow_init lib/netdev-offload.c:775:8
   10 0x9515c7 in netdev_set_flow_api_enabled lib/netdev-offload.c:815:13
   11 0x562ec8 in bridge_run vswitchd/bridge.c:3292:9
   12 0x59a98c in main vswitchd/ovs-vswitchd.c:129:9
   13 0x7fb5c583acf2 in __libc_start_main (/lib64/libc.so.6+0x3acf2)
   14 0x47e60d in _start (vswitchd/ovs-vswitchd+0x47e60d)

 lib/tc.c:1306:43: runtime error:
   member access within misaligned address 0x619000140324 for type
   'const struct tcf_t', which requires 8 byte alignment
   0x619000140324: note: pointer points here
      24 00 01 00 00 00 00 00  00 00 00 00 00 00 00 00  ...
                  ^
   0 0xd6ee6f in nl_parse_tcf lib/tc.c:1306:43
   1 0xd6423f in nl_parse_act_mirred lib/tc.c:1389:5
   2 0xd6423f in nl_parse_single_action lib/tc.c:1747:15
   3 0xd6423f in nl_parse_flower_actions lib/tc.c:1843:19
   4 0xd57612 in nl_parse_flower_options lib/tc.c:1895:12
   5 0xd5468d in parse_netlink_to_tc_flower lib/tc.c:1943:12
   6 0xd5a7ec in tc_replace_flower lib/tc.c:3201:19
   7 0xd28ae8 in netdev_tc_flow_put lib/netdev-offload-tc.c:1969:11
   8 0x94cc97 in netdev_flow_put lib/netdev-offload.c:257:14
   9 0xcba2be in parse_flow_put lib/dpif-netlink.c:2289:11
   10 0xcba2be in try_send_to_netdev lib/dpif-netlink.c:2376:15
   11 0xcba2be in dpif_netlink_operate lib/dpif-netlink.c:2447:23
   12 0x8647ce in dpif_operate lib/dpif.c:1372:13
   13 0x6b50a6 in push_dp_ops ofproto/ofproto-dpif-upcall.c:2406:5
   14 0x6c9bcd in revalidate ofproto/ofproto-dpif-upcall.c:2792:13
   15 0x6b79b5 in udpif_revalidator ofproto/ofproto-dpif-upcall.c:980:9
   16 0xb4d5ea in ovsthread_wrapper lib/ovs-thread.c:422:12
   17 0x7ff1090081ce in start_thread (/lib64/libpthread.so.0+0x81ce)
   18 0x7ff107c39dd2 in clone (/lib64/libc.so.6+0x39dd2)

This patch fixes only problems reported by UBsan in current tests,
there could be more issues like this not currently covered by the
testsuite.

Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-06-24 23:50:31 +02:00
Eelco Chaudron
6d76cfc444 netdev-offload-tc: Fix IP and port ranges in flower returns.
When programming NAT rules OVS only sets the minimum value for a
single IP/port value. However, responses from flower will always
return min == max for single IP/port values. This is causing the
verification to fail as the request is different than the response.
To avoid this, we will update the response to match the request.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-21 00:31:24 +01:00
Eelco Chaudron
38298a877b netdev-offload-tc: Fix use of ICMP values instead of masks defines.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-21 00:31:24 +01:00
Eelco Chaudron
a039636950 netdev-offload-tc: Always include conntrack information to tc.
Regardless of the traffic type, if requested, the conntrack information
should be included to keep the datapath and tc rules in sync.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-21 00:31:24 +01:00
Eelco Chaudron
b4868ee163 netdev-offload-tc: Set the correct VLAN_VID and VLAN_PCP masks.
This change will set the correct VID and PCP masks, as well as the
ethernet type mask.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-21 00:31:24 +01:00
Eelco Chaudron
2bdf5b288c netdev-offload-tc: Add debug logs on tc rule verify failures.
This patch adds more detailed debug logs on tc verify failures to
ease debugging the actual cause after the fact.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-21 00:31:24 +01:00
Chris Mi
920ec5761e tc: Keep header rewrite actions order.
Currently, tc merges all header rewrite actions into one tc pedit
action. So the header rewrite actions order is lost. Save each header
rewrite action into one tc pedit action to keep the order. And only
append one tc csum action to the last pedit action of a series.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-18 17:41:14 +01:00
Paul Blakey
f34a7626cc tc: Fix stats byte count on fragmented packets.
Fragmented packets with offset=0 are defragmented by tc act_ct, and
only when assembled pass to next action, in ovs offload case,
a goto action. Since stats are overwritten on each action dump,
only the stats for last action in the tc filter action priority
list is taken, the stats on the goto action, which count
only the assembled packets. See below for example.

Hardware updates just part of the actions (gact, ct, mirred) - those
that support stats_update() operation. Since datapath rules end
with either an output (mirred) or recirc/drop (both gact), tc rule
will at least have one action that supports it. For software packets,
the first action will have the max software packets count.
Tc dumps total packets (hw + sw) and hardware packets, then
software packets needs to be calculated from this (total - hw).

To fix the above, get hardware packets and calculate software packets
for each action, take the max of each set, then combine back
to get the total packets that went through software and hardware.

Example by running ping above MTU (ping <IP> -s 2000):
ct_state(-trk),recirc_id(0),...,ipv4(proto=1,frag=first),
  packets:14, bytes:19544,..., actions:ct(zone=1),recirc(0x1)
ct_state(-trk),recirc_id(0),...,ipv4(proto=1,frag=later),
  packets:14, bytes:28392,..., actions:ct(zone=1),recirc(0x1)

Second rule should have had bytes=14*<size of 'later' frag>, but instead
it's bytes=14*<size of assembled packets - size of 'first' + 'later'
frags>.

Fixes: 576126a931cd ("netdev-offload-tc: Add conntrack support")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-04 16:02:45 +01:00
Roi Dayan
96ad83bc7b tc: Fix incorrect TC rule for decap+encap datapath flow.
A datapath flow generated for traffic from vxlan port to another vxlan port
looks like this:

 tunnel(tun_id=0x65,src=10.10.11.3,dst=10.10.11.2,ttl=0/0,tp_dst=4789,flags(+key)),
 ...,in_port(vxlan_sys_4789),...,
 actions:set(tunnel(tun_id=0x66,src=10.10.12.2,dst=10.10.12.3,tp_dst=4789,flags(key))),
          vxlan_sys_4789

The generated TC rule with explicit tunnel key unset action added after
tunnel key set action, which is wrong.

filter protocol ip pref 7 flower chain 0 handle 0x1
  dst_mac fa:16:3e:2a:4e:23
  eth_type ipv4
  ip_tos 0x0/3
  enc_dst_ip 10.10.11.2
  enc_src_ip 10.10.11.3
  enc_key_id 101
  enc_dst_port 4789
  ip_flags nofrag
  not_in_hw
        action order 1: tunnel_key  set
        src_ip 10.10.12.2
        dst_ip 10.10.12.3
        key_id 102
        dst_port 4789
        nocsum pipe
         index 1 ref 1 bind 1 installed 568 sec used 0 sec
        Action statistics:
        Sent 46620 bytes 555 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

        action order 2: tunnel_key  unset pipe
         index 2 ref 1 bind 1 installed 568 sec used 0 sec
        Action statistics:
        Sent 46620 bytes 555 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

        action order 3: mirred (Egress Redirect to device vxlan_sys_4789) stolen
        index 1 ref 1 bind 1 installed 568 sec used 0 sec
        Action statistics:
        Sent 46620 bytes 555 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
        cookie e0c82bfd504b701428b00db6b08db3b2

Fix it by also adding the the tunnel key unset action before the tunnel
key set action and not only before output port.

Fixes: 7c53bd7839d8 ("tc: Move tunnel_key unset action before output ports")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-02-09 15:40:19 +01:00
Vlad Buslov
6bb3f363d6 tc: Set action flags for tunnel_key release.
The commit that enabled 'no_percpu' flag for compatible actions missed the
tunnel_key release action code. Add missing call to nl_msg_put_act_flags().

Fixes: 292d5bd9bb34 ("tc: Set 'no_percpu' flag for compatible actions")
Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Tested-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-08-16 19:40:29 +02:00
Eelco Chaudron
464b5b13e6 netdev-offload-tc: Verify the flower rule installed.
When OVS installs the flower rule, it only checks for the OK from the
kernel. It does not check if the rule requested matches the one
actually programmed. This change will add this check and warns the
user if this is not the case.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-07-09 19:54:56 +02:00
Jianbo Liu
47e73f7f00 netdev-offload-tc: Flush rules on all chains before attach ingress block
Previously, only chain 0 is deleted before attach the ingress block.
If there are rules on the chain other than 0, these rules are not flushed.
In this case, the recreation of qdisc also fails. To fix this issue, flush
rules from all chains.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2021-02-24 11:07:20 +01:00
Eelco Chaudron
0f252daa94 tc: Fix mpls bottom of stack bit mask reporting.
Fix the reported back value of the bos mask used by the revalidator
threads.

Fixes: 34b1695506f8 ("lib/tc: add single mpls match offload support")
Reported-by: Marcelo Leitner <mleitner@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-02-02 19:30:58 +01:00
Maor Dickman
75e1e6fd2d lib/tc: add ICMP type and code match offload
Add TC offload support for classifying ICMPv4/6 type and code.

$ ovs-appctl dpctl/add-flow 'recirc_id(0),in_port(3),eth(),\
  eth_type(0x0800),ipv4(proto=1),icmp(type=9,code=0)' 2

$ ovs-appctl dpctl/dump-flows
  ... icmp(type=9,code=0) ...

$ tc filter show dev <ethx> ingress
  ...
  eth_type ipv4
  ip_proto icmp
  icmp_type 9
  icmp_code 0
  not_in_hw
  action order 1: mirred (Egress Redirect to device <ethy>) stolen
  ...

Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2021-02-01 16:54:15 +01:00
wenxu
71d0c0d8b4 lib/tc: fix parse act pedit for tos rewrite
Check overlap between current pedit key, which is always 4 bytes
(range [off, off + 3]), and a map entry in flower_pedit_map
sf = ROUND_DOWN(mf, 4) (range [sf|mf, (mf + sz - 1)|ef]).

So for the tos the rewite the off + 3(3) is greater than mf,
and should less than ef(4) but not mf+sz(2).

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2020-12-04 07:33:14 +01:00
Roi Dayan
d5659751f6 tc: Use skip_hw flag when probing tc features
There is no need to pass tc rules to hw when just probing
for tc features. this will avoid redundant errors from hw drivers
that may happen.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Acked-By: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2020-08-06 17:26:46 +02:00
wenxu
0fc38297ba lib/tc: only update the stats for non-empty counter
A packet with first frag and execute act_ct action.
The packet will stole by defrag. So the stats counter
for "gact action goto chain" will always 0. The openvswitch
update each action in order. So the flower stats finally
alway be zero. The rule will be delete adter max-idle time
even there are packet executing the action.

ovs-appctl dpctl/dump-flows
recirc_id(0),in_port(1),eth_type(0x0800),ipv4(dst=11.0.0.7,frag=first), packets:0, bytes:0, used:5.390s, actions:ct(zone=1,nat),recirc(0x4)

filter protocol ip pref 2 flower chain 0 handle 0x2
  eth_type ipv4
  dst_ip 1.1.1.1
  ip_flags frag/firstfrag
  skip_hw
  not_in_hw
 action order 1: ct zone 1 nat pipe
  index 2 ref 1 bind 1 installed 11 sec used 1 sec
 Action statistics:
 Sent 15000 bytes 11 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 cookie e04106c2ac41769b278edaa9b5309960

 action order 2: gact action goto chain 1
  random type none pass val 0
  index 2 ref 1 bind 1 installed 11 sec used 11 sec
 Action statistics:
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 cookie e04106c2ac41769b278edaa9b5309960

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2020-07-01 10:00:15 +02:00
Roi Dayan
1fe4297563 netdev-offload-tc: Revert tunnel src/dst port masks handling
The cited commit intended to add tc support for masking tunnel src/dst
ips and ports. It's not possible to do tunnel ports masking with
openflow rules and the default mask for tunnel ports set to 0 in
tnl_wc_init(), unlike tunnel ports default mask which is full mask.
So instead of never passing tunnel ports to tc, revert the changes
to tunnel ports to always pass the tunnel port.
In sw classification is done by the kernel, but for hw we must match
the tunnel dst port.

Fixes: 5f568d049130 ("netdev-offload-tc: Allow to match the IP and port mask of tunnel")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2020-06-19 08:51:11 +02:00
Tonghao Zhang
a3db6e473d netdev-offload-tc: Allow installing arp rules to TC dp.
This patch allows to install arp rules to tc dp.
In the future, arp will be offloaded to hardware to
be processed. So OvS enable this now.

$ ovs-appctl dpctl/add-flow 'recirc_id(0),in_port(3),eth(),\
  eth_type(0x0806),arp(op=2,tha=00:50:56:e1:4b:ab,tip=10.255.1.116)' 2

$ ovs-appctl dpctl/dump-flows
  ... arp(tip=10.255.1.116,op=2,tha=00:50:56:e1:4b:ab) ...

$ tc filter show dev <ethx> ingress
  ...
  eth_type arp
  arp_tip 10.255.1.116
  arp_op reply
  arp_tha 00:50:56:e1:4b:ab
  not_in_hw
    action order 1: mirred (Egress Redirect to device <ethy>) stolen
    ...

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2020-06-08 11:08:05 +02:00
Vlad Buslov
5db012c4ac tc: Support new terse dump kernel API
When dumping flows in terse mode set TCA_DUMP_FLAGS attribute to
TCA_DUMP_FLAGS_TERSE flag to prevent unnecessary copying of data between
kernel and user spaces. Only expect kernel to provide cookie, stats and
flags when dumping filters in terse mode.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2020-06-05 10:14:27 +02:00
Tonghao Zhang
5f568d0491 netdev-offload-tc: Allow to match the IP and port mask of tunnel
This patch allows users to offload the TC flower rules with
tunnel mask. This patch allows masked match of the following,
where previously supported an exact match was supported:
* Remote (dst) tunnel endpoint address
* Local (src) tunnel endpoint address
* Remote (dst) tunnel endpoint UDP port

And also allows masked match of the following, where previously
no match was supported:
* Local (src) tunnel endpoint UDP port

In some case, mask is useful as wildcards. For example, DDOS,
in that case, we don’t want to allow specified hosts IPs or
only source Ports to access the targeted host. For example:

$ ovs-appctl dpctl/add-flow "tunnel(dst=2.2.2.100,src=2.2.2.0/255.255.255.0,tp_dst=4789),\
  recirc_id(0),in_port(3),eth(),eth_type(0x0800),ipv4()" ""

$ tc filter show dev vxlan_sys_4789 ingress
  ...
  eth_type ipv4
  enc_dst_ip 2.2.2.100
  enc_src_ip 2.2.2.0/24
  enc_dst_port 4789
  enc_ttl 64
  in_hw in_hw_count 2
	action order 1: gact action drop
    ...

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2020-06-03 09:56:07 +02:00
Tonghao Zhang
4f4be08e47 netdev-offload-tc: Use ipv6_addr_is_set instead of is_all_zeros
Not bugfix, make the codes more readable.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2020-06-03 09:53:30 +02:00
Paul Blakey
2bf6ffb76a netdev-offload-tc: Add conntrack nat support
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2019-12-22 11:54:41 +01:00
Paul Blakey
9221c721be netdev-offload-tc: Add conntrack label and mark support
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2019-12-22 11:54:40 +01:00
Paul Blakey
576126a931 netdev-offload-tc: Add conntrack support
Zone and ct_state first.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2019-12-22 11:54:40 +01:00
Paul Blakey
b2ae40690e netdev-offload-tc: Add recirculation support via tc chains
Each recirculation id will create a tc chain, and we translate
the recirculation action to a tc goto chain action.

We check for kernel support for this by probing OvS Datapath for the
tc recirc id sharing feature. If supported, we can offload rules
that match on recirc_id, and recirculation action safely.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2019-12-22 11:54:40 +01:00
Paul Blakey
7c53bd7839 tc: Move tunnel_key unset action before output ports
Since OvS datapath gets packets already decapsulated from tunnel devices,
it doesn't explicitly decapsulate them. So in a recirculation setup,
the tunnel matching continues in the recirculation as the tunnel
metadata still exists on the SKB.

Tunnel key unset action unsets this metadata. Some drivers might rely
on this explicit tunnel key unset to know when to decapsulate the packet
instead of the device type. So instead of removing it completly,
we move it near the output actions.

This way, we also keep SKB metadata through recirculation, and for
non-recirculation rules, the resulting tc rules should remain the same.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2019-12-22 11:54:40 +01:00
Paul Blakey
acdd544c4c tc: Introduce tcf_id to specify a tc filter
Move all that is needed to identify a tc filter to a
new structure, tcf_id. This removes a lot of duplication
in accessing/creating tc filters.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2019-12-22 11:54:40 +01:00
Vlad Buslov
292d5bd9bb tc: Set 'no_percpu' flag for compatible actions
Recent changes in Linux kernel TC action subsystem introduced new
TCA_ACT_FLAGS_NO_PERCPU_STATS flag. The purpose of the flag is to request
action implementation to skip allocating action stats with expensive percpu
allocator and use regular built-in action stats instead. Such approach
significantly improves rule insertion rate and reduce memory usage for
hardware-offloaded rules that don't need benefits provided by percpu
allocated stats (improved software TC fast-path performance). Set the flag
for all compatible actions.

Modify acinclude.m4 to use OVS-internal pkt_cls.h implementation when
TCA_ACT_FLAGS is not defined by kernel headers and to manually define
struct nla_bitfield32 in netlink.h (new file) when it is not defined by
kernel headers.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2019-11-11 14:42:52 +01:00
Chris Mi
d0fbb09f90 tc: Limit the max action number to 16
Currently, ovs supports to offload max TCA_ACT_MAX_PRIO(32) actions.
But net sched api has a limit of 4K message size which is not enough
for 32 actions when echo flag is set.

After a lot of testing, we find that 16 actions is a reasonable number.
So in this commit, we introduced a new define to limit the max actions.

Fixes: 0c70132cd288("tc: Make the actions order consistent")
Signed-off-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2019-10-18 10:56:44 +02:00
John Hurley
a8f005cf26 ovs-tc: offload MPLS set actions to TC datapath
Recent modifications to TC allows the modifying of fields within the
outermost MPLS header of a packet. OvS datapath rules impliment an MPLS
set action by supplying a new MPLS header that should overwrite the
current one.

Convert the OvS datapath MPLS set action to a TC modify action and allow
such rules to be offloaded to a TC datapath.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2019-08-01 18:09:42 +02:00
John Hurley
283dcf850d ovs-tc: offload MPLS push actions to TC datapath
TC can now be used to push an MPLS header onto a packet. The MPLS label is
the only information that needs to be passed here with the rest reverting
to default values if none are supplied. OvS, however, gives the entire
MPLS header to be pushed along with the MPLS protocol to use. TC can
optionally accept these values so can be made replicate the OvS datapath
rule.

Convert OvS MPLS push datapath rules to TC format and offload to a TC
datapath.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2019-08-01 18:09:42 +02:00
John Hurley
55412eac69 ovs-tc: offload MPLS pop actions to TC datapath
TC now supports an action to pop the outer MPLS header from a packet. The
next protocol after the header is required alongside this. Currently, OvS
datapath rules also supply this information.

Offload OvS MPLS pop actions to TC along with the next protocol.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2019-08-01 18:09:42 +02:00
John Hurley
95255018a8 ovs-tc: allow offloading TC rules to egress qdiscs
Offloading rules to a TC datapath only allows the creating of ingress hook
qdiscs and the application of filters to these. However, there may be
certain situations where an egress qdisc is more applicable (e.g. when
offloading to TC rules applied to OvS internal ports).

Extend the TC API in OvS to allow the creation of egress qdiscs and to add
or interact with flower filters applied to these.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2019-04-09 17:34:07 +02:00