2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-28 12:58:00 +00:00

371 Commits

Author SHA1 Message Date
Dumitru Ceara
c8c49a9db9 netdev-linux: Properly access 32-bit aligned rtnl_link_stats64 structs.
Detected by UB Sanitizer when running system tests:

  lib/netdev-linux.c:6250:26:
  runtime error: member access within misaligned address 0x00000229a204
  for type 'const struct rpl_rtnl_link_stats64', which requires 8 byte
  alignment
  0x00000229a204: note: pointer points here
    c4 00 17 00 01 00 00 00  00 00 00 00 01 00 00 00
                ^
    00 00 00 00 6e 00 00 00  00 00 00 00 6e 00 00 00

  0  0x89f10e in netdev_stats_from_rtnl_link_stats64 lib/netdev-linux.c:6250
  1  0x89f10e in get_stats_via_netlink lib/netdev-linux.c:6298
  2  0x8a039a in netdev_linux_get_stats lib/netdev-linux.c:2227
  3  0x68e149 in netdev_get_stats lib/netdev.c:1599
  4  0x407b21 in iface_refresh_stats vswitchd/bridge.c:2687
  5  0x419eb6 in iface_create vswitchd/bridge.c:2134
  6  0x419eb6 in bridge_add_ports__ vswitchd/bridge.c:1170
  7  0x41f71c in bridge_add_ports vswitchd/bridge.c:1181
  8  0x41f71c in bridge_reconfigure vswitchd/bridge.c:898
  9  0x429f59 in bridge_run vswitchd/bridge.c:3331
  10 0x430af3 in main vswitchd/ovs-vswitchd.c:129
  11 0x7fbdfd43eb74 in __libc_start_main (/lib64/libc.so.6+0x27b74)
  12 0x4072fd in _start (/root/ovs/vswitchd/ovs-vswitchd+0x4072fd)

Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-05-17 23:10:20 +02:00
Adrian Moreno
9e56549c2b hmap: use short version of safe loops if possible.
Using SHORT version of the *_SAFE loops makes the code cleaner and less
error prone. So, use the SHORT version and remove the extra variable
when possible for hmap and all its derived types.

In order to be able to use both long and short versions without changing
the name of the macro for all the clients, overload the existing name
and select the appropriate version depending on the number of arguments.

Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-30 16:59:02 +02:00
Mike Pattrick
eb1ab5357b netdev-linux: Use matchall classifier for ingress policing.
Currently ingress policing uses the basic classifier to apply traffic
control filters if hardware offload is not enabled, in which case it
uses matchall. This change changes the behavior to always use matchall,
and fall back onto basic if the kernel is built without matchall
support.

The system tests are modified to allow either basic or matchall
classification on the ingestion filter, and to allow either 10000 or
10240 packets for the packet burst filter. 10000 is accurate for kernel
5.14 and the most recent iproute2, however, 10240 is left for
compatibility with older kernels.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Mike Pattrick <mkp@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-12 15:23:29 +01:00
Yunjian Wang
849a40ccfb netdev-linux: Fix a null pointer dereference in netdev_linux_notify_sock().
If nl_sock_join_mcgroup() returns an error, the 'sock' is freed and
set to NULL. This issues will lead to null pointer deference in
nl_sock_listen_all_nsid(). To fix it, we call nl_sock_listen_all_nsid()
before joining the mcgroups.

Fixes: cf114a7fce80 ("netlink linux: enable listening to all nsids")
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-09-16 01:19:39 +02:00
Yong Xu
d2e97030ed netdev-linux: fix compile error in nl_msg_put_act_police
Use 'memset' to init memory to 0.

This resolves a build problem with clang on Ubuntu 16.04 on ARM (in Travis):

libtool: compile:  clang -DHAVE_CONFIG_H -I. -I ./include -I ./include
-I ./lib -I ./lib -Wstrict-prototypes -Wall -Wextra -Wno-sign-compare
-Wpointer-arith -Wformat -Wformat-security -Wswitch-enum
-Wunused-parameter -Wbad-function-cast -Wcast-align
-Wstrict-prototypes -Wold-style-definition -Wmissing-prototypes
-Wmissing-field-initializers -Wthread-safety -fno-strict-aliasing
-Wswitch-bool -Wlogical-not-parentheses -Wsizeof-array-argument
-Wshift-negative-value -Qunused-arguments -Wshadow
-Warray-bounds-pointer-arithmetic -Werror -Werror -g -O2
-Wno-error=unused-command-line-argument -DHAVE_AVX512F -MT
lib/netlink-conntrack.lo -MD -MP -MF lib/.deps/netlink-conntrack.Tpo
-c lib/netlink-conntrack.c -o lib/netlink-conntrack.o
lib/netdev-linux.c:2638:38: error: missing field 'action' initializer
[-Werror,-Wmissing-field-initializers]
    struct tc_police null_police = {0};
                                     ^
1 error generated.
make[2]: *** [lib/netdev-linux.lo] Error 1
make[2]: *** Waiting for unfinished jobs....

Fixes: c2567e533 ("add port-based ingress policing based packet-per-second rate-limiting")
Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Yong Xu <yong.xu@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2021-07-13 13:32:17 +02:00
Eelco Chaudron
b0d289bb5a netdev-linux: Ignore TSO packets when TSO is not enabled for userspace.
When TSO is disabled from a userspace forwarding datapath perspective,
but TSO has been wrongly enabled on the kernel side, log a warning
message, and drop the packet. With the current implementation,
OVS will crash.

[i.maximets]:

The call stack looks like this:

  0  dp_packet_set_size (b=0x0, b=0x0, v=13028) at lib/dp-packet.h:578
  1  netdev_linux_batch_rxq_recv_sock at lib/netdev-linux.c:1310
  2  netdev_linux_rxq_recv at lib/netdev-linux.c
  3  netdev_rxq_recv at lib/netdev.c
  4  dp_netdev_process_rxq_port at lib/dpif-netdev.c

The problem is that the code assumes that (mmsgs[i].msg_len > std_len)
can only be true if userpace-tso is enabled and additional buffers are
provided to the kernel.  However, since recvmmsg() is called with
MSG_TRUNC, the resulting msg_len reflects the original packet size
before truncation, and it can be larger than the buffer if TSO / GRO
is enabled on the network interface.  If TSO support for user space is
not enabled in OVS, the aux_bufs are not allocated and are left NULL,
resulting in a crash.

Fixes: 73858f9dbe83 ("netdev-linux: Prepend the std packet in the TSO packet")
Fixes: 2109841b7984 ("Use batch process recv for tap and raw socket in netdev datapath")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-07-09 21:33:41 +02:00
Yong Xu
c2567e533f add port-based ingress policing based packet-per-second rate-limiting
OVS has support for using policing to enforce a rate limit in
kilobits per second. This is configured using OVSDB. f.e.

$ ovs-vsctl set interface tap0 ingress_policing_rate=1000
$ ovs-vsctl set interface tap0 ingress_policing_burst=100

This patch adds a related feature, allowing policing to enforce a rate
limit in kilo-packets per second. This is also configured using OVSDB.

$ ovs-vsctl set interface tap0 ingress_policing_kpkts_rate=1000
$ ovs-vsctl set interface tap0 ingress_policing_kpkts_burst=100

The kilo-bit and kilo-packet rate limits may be used separately or in
combination.

Add separate action for BPS and PPS in netlink message.

Revise code and change action result to pipe to allow
traffic pipe into second action.

This patch implements the feature for:
* OVSDB (northbound API)
* TC policer when used both with and without TC offload (kernel API)

Signed-off-by: Yong Xu <yong.xu@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2021-07-01 20:44:07 +02:00
Ilya Maximets
577b9a8169 netdev-linux: Fix use of uninitialized LAG master name.
'if_indextoname' may fail leaving the 'master_name' uninitialized:

 Conditional jump or move depends on uninitialised value(s)
    at 0x4C34329: strlen (vg_replace_strmem.c:459)
    by 0x51C638: hash_string (hash.h:342)
    by 0x51C638: hash_name (shash.c:28)
    by 0x51CC51: shash_find (shash.c:231)
    by 0x51CD38: shash_find_data (shash.c:245)
    by 0x4A797F: netdev_from_name (netdev.c:2013)
    by 0x544148: netdev_linux_update_lag (netdev-linux.c:676)
    by 0x544148: netdev_linux_run (netdev-linux.c:769)
    by 0x4A5997: netdev_run (netdev.c:186)
    by 0x40752B: main (ovs-vswitchd.c:129)
  Uninitialised value was created by a stack allocation
    at 0x543AFA: netdev_linux_run (netdev-linux.c:722)

Fixes: d22f8927c3c9 ("netdev-linux: monitor and offload LAG slaves to TC")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
2021-05-24 20:21:56 +02:00
Michal Kazior
d90b4f2928 rtnetlink: ignore IFLA_WIRELESS events.
Some older wireless drivers - ones relying on the
old and long deprecated wireless extension ioctl
system - can generate quite a bit of IFLA_WIRELESS
events depending on their configuration and
runtime conditions. These are delivered as
RTNLGRP_LINK via RTM_NEWLINK messages.

These tend to be relatively easily identifiable
because they report the change mask being 0. This
isn't guaranteed but in practice it shouldn't be a
problem. None of the wireless events that I ever
observed actually carry any unique information
about netdev states that ovs-vswitchd is
interested in. Hence ignoring these shouldn't
cause any problems.

These events can be responsible for a significant
CPU churn as ovs-vswitchd attempts to do plenty of
work for each and every netlink message regardless
of what that message carries. On low-end devices
such as consumer-grade routers these can lead to a
lot of CPU cycles being wasted, adding up to heat
output and reducing performance.

It could be argued that wireless drivers in
question should be fixed, but that isn't exactly a
trivial thing to do. Patching ovs seems far more
viable while still making sense.

Signed-off-by: Michal Kazior <michal@plume.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-04-20 00:00:22 +02:00
Yong.Xu
67e0e0bc15 netdev-linux: correct unit of burst parameter
Correct calculation of burst parameter used when configuring TC policer
action for ingress port-based policing in the case where TC offload is in
use. This now matches the value calculated for the case where TC offload is
not in use.

The division by 8 is to convert from bits to bytes.
Its unclear why 64 was previously used.

Fixes: e7f6ba220 ("lib/tc: add ingress ratelimiting support for tc-offload")
Signed-off-by: Yong Xu <yong.xu@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
2021-04-13 15:36:35 +02:00
William Tu
e7df370cff netdev-linux: Fix indentation.
Remove one extra space. No actual code logic changed.

Fixes: 2109841b79845 ("Use batch process recv for tap and raw socket in netdev datapath")
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2021-02-19 18:40:25 +01:00
Ben Pfaff
91fc374a9c Eliminate use of term "slave" in bond, LACP, and bundle contexts.
The new term is "member".

Most of these changes should not change user-visible behavior.  One
place where they do is in "ovs-ofctl dump-flows", which will now output
"members:..." inside "bundle" actions instead of "slaves:...".  I don't
expect this to cause real problems in most systems.  The old syntax
is still supported on input for backward compatibility.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
2020-10-21 11:28:24 -07:00
Yi-Hung Wei
fd4d477760 netdev-linux: Fix broken build on Ubuntu 14.04
Patch 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support") uses
__virtio16 which is defined in kernel 3.19.  Ubuntu 14.04 is using 3.13
kernel that lacks the virtio_types definition.  This patch fixes that.

Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Acked-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
2020-07-08 11:51:15 -07:00
Aaron Conole
7a076a5371 netdev-linux: Update LAG in all cases.
In some cases, when processing a netlink change event, it's possible for
an alternate part of OvS (like the IPv6 endpoint processing) to hold an
active netdev interface.  This creates a race-condition, where sometimes
the OvS change processing will take the normal path.  This doesn't work
because the netdev device object won't actually be enslaved to the
ovs-system (for instance, a linux bond) and ingress qdisc entries will
be missing.

To address this, we update the LAG information in ALL cases where
LAG information could come in.

Fixes: d22f8927c3c9 ("netdev-linux: monitor and offload LAG slaves to TC")
Cc: Marcelo Leitner <mleitner@redhat.com>
Cc: John Hurley <john.hurley@netronome.com>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-05-16 13:15:50 +02:00
William Tu
5119cfe32d netdev-afxdp: Fix missing init.
When introducing the interrupt mode for netdev-afxdp, the netdev
init function is accidentally removed.  Fix it by adding it back.

Fixes: 5bfc519fee499 ("netdev-afxdp: Add interrupt mode netdev class.")
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
2020-05-04 16:13:29 -07:00
Jiang Lidong
b9f825a544 netdev-linux: remove sum of vport stats and kernel netdev stats.
When using kernel veth as OVS interface, doubled drop counter
value is shown when veth drops packets due to traffic overrun.

In netdev_linux_get_stats, it reads both vport stats and kernel
netdev stats, in case vport stats retrieve failure. If both of
them success, error counters are added to include errors from
different layers. But implementation of ovs_vport_get_stats in
kernel data path has included kernel netdev stats by calling
dev_get_stats. When drop or other error counters is not zero,
its value is doubled by netdev_linux_get_stats.

In this change, adding kernel netdev stats into vport stats
is removed, since vport stats includes all information of
kernel netdev stats.

Signed-off-by: Jiang Lidong <jianglidong3@jd.com>
Signed-off-by: William Tu <u9012063@gmail.com>
2020-04-30 07:35:54 -07:00
William Tu
5bfc519fee netdev-afxdp: Add interrupt mode netdev class.
The patch adds a new netdev class 'afxdp-nonpmd' to enable afxdp
interrupt mode. This is similar to 'type=afxdp', except that the
is_pmd field is set to false. As a result, the packet processing
is handled by main thread, not pmd thread. This avoids burning
the CPU to always 100% when there is no traffic.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-04-28 17:58:31 +02:00
Usman Ansari
fed4282c53 netdev-linux.c: Fix coverity unreachable code warning
Coverity reports unreachable code in "?" statement
Fixed by removing code segment and unused variables & defines

Signed-off-by: Usman Ansari <ua1422@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
2020-04-06 11:58:35 -07:00
Flavio Leitner
6211ad5708 netdev-linux: Enable TSO in the TAP device.
Use ioctl TUNSETOFFLOAD if kernel supports to enable TSO
offloading in the tap device.

Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Reported-by: "Yi Yang (杨�D)-云服务集团" <yangyi01@inspur.com>
Tested-by: William Tu <u9012063@gmail.com>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: William Tu <u9012063@gmail.com>
2020-03-02 09:50:17 -08:00
Flavio Leitner
35b5586ba7 userspace TSO: SCTP checksum offload optional.
Ideally SCTP checksum offload needs be advertised by the
NIC when userspace TSO is enabled. However, very few drivers
do that and it's not a widely used protocol. So, this patch
enables SCTP checksum offload if available, otherwise userspace
TSO can still be enabled but SCTP packets will be dropped on
NICs without support.

Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-02-26 15:24:15 +01:00
Flavio Leitner
8c5163fe81 userspace TSO: Include UDP checksum offload.
Virtio doesn't expose flags to control which protocols checksum
offload needs to be enabled or disabled. This patch checks if the
NIC supports UDP checksum offload and active it when TSO is enabled.

Reported-by: Ilya Maximets <i.maximets@ovn.org>
Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-02-26 15:24:15 +01:00
Flavio Leitner
73858f9dbe netdev-linux: Prepend the std packet in the TSO packet
Usually TSO packets are close to 50k, 60k bytes long, so to
to copy less bytes when receiving a packet from the kernel
change the approach. Instead of extending the MTU sized
packet received and append with remaining TSO data from
the TSO buffer, allocate a TSO packet with enough headroom
to prepend the std packet data.

Fixes: 29cf9c1b3b9c ("userspace: Add TCP Segmentation Offload support")
Suggested-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2020-02-06 11:37:23 -08:00
William Tu
105cf8df82 netdev-linux: Detect numa node id.
The patch detects the numa node id from the name of the netdev,
by reading the '/sys/class/net/<devname>/device/numa_node'.
If not available, ex: virtual device, or any error happens,
return numa id 0.  Currently only the afxdp netdev type uses it,
other linux netdev types are disabled due to no use case.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-01-18 01:42:22 +01:00
Flavio Leitner
29cf9c1b3b userspace: Add TCP Segmentation Offload support
Abbreviated as TSO, TCP Segmentation Offload is a feature which enables
the network stack to delegate the TCP segmentation to the NIC reducing
the per packet CPU overhead.

A guest using vhostuser interface with TSO enabled can send TCP packets
much bigger than the MTU, which saves CPU cycles normally used to break
the packets down to MTU size and to calculate checksums.

It also saves CPU cycles used to parse multiple packets/headers during
the packet processing inside virtual switch.

If the destination of the packet is another guest in the same host, then
the same big packet can be sent through a vhostuser interface skipping
the segmentation completely. However, if the destination is not local,
the NIC hardware is instructed to do the TCP segmentation and checksum
calculation.

It is recommended to check if NIC hardware supports TSO before enabling
the feature, which is off by default. For additional information please
check the tso.rst document.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Tested-by: Ciara Loftus <ciara.loftus.intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2020-01-17 22:27:25 +00:00
Yi Yang
2109841b79 Use batch process recv for tap and raw socket in netdev datapath
Current netdev_linux_rxq_recv_tap and netdev_linux_rxq_recv_sock
just receive single packet, that is very inefficient, per my test
case which adds two tap ports or veth ports into OVS bridge
(datapath_type=netdev) and use iperf3 to do performance test
between two ports (they are set into different network name space).

The result is as below:

  tap:  295 Mbits/sec
  veth: 207 Mbits/sec

After I change netdev_linux_rxq_recv_tap and
netdev_linux_rxq_recv_sock to use batch process, the performance
is boosted by about 7 times, here is the result:

  tap:  1.96 Gbits/sec
  veth: 1.47 Gbits/sec

Undoubtedly this is a huge improvement although it can't match
OVS kernel datapath yet.

FYI: here is thr result for OVS kernel datapath:

  tap:  37.2 Gbits/sec
  veth: 36.3 Gbits/sec

Note: performance result is highly related with your test machine,
you shouldn't expect the same results on your test machine.

Signed-off-by: Yi Yang <yangyi01@inspur.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2020-01-09 09:48:49 -08:00
Paul Blakey
acdd544c4c tc: Introduce tcf_id to specify a tc filter
Move all that is needed to identify a tc filter to a
new structure, tcf_id. This removes a lot of duplication
in accessing/creating tc filters.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2019-12-22 11:54:40 +01:00
William Tu
7bf075d95a netdev-afxdp: Enable libbpf logging to OVS.
libbpf has pr_warn, pr_info, and pr_debug. The patch registers
these print functions, integrating the libbpf logs to OVS log.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
2019-11-21 09:20:10 -08:00
Eelco Chaudron
52b5a5c0a3 netdev-afxdp: add afxdp specific maximum MTU check
Drivers natively supporting AF_XDP will check that a configured MTU size
will not exceed the allowed size for AF_XDP. However, when the skb
compatibility mode is used there is no check and any value is accepted.
This, for example, is the case when using the TAP interface.

This fix adds a check to make sure only AF_XDP valid values are excepted.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: William Tu <u9012063@gmail.com>
2019-11-19 11:20:49 -08:00
Ilya Maximets
d560bc1baa netdev-afxdp: Convert AFXDP_DEBUG to custom stats.
These are valid statistics of a network interface and should be
exposed via custom stats.

The same MACRO trick as in vswitchd/bridge.c is used to reduce code
duplication and easily add new stats if necessary in the future.

Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
2019-07-24 19:22:05 +03:00
Ilya Maximets
f627cf1dd9 netdev-afxdp: Fix use of unconfigured device.
In case of failure of 'xsk_configure_all()', 'n_rxq' and 'xdpmode'
will remain in a new state. This will result in successful
reconfiguration (immediate return, because configuration is already
applied) if 'netdev_reconfigure()' will be called again.

Same issue was fixed previously for netdev-dpdk using 'dev->started'
flag in commit:
606f66507250 ("netdev-dpdk: Don't use PMD driver if not configured successfully")

Let's use similar approach with checking the 'dev->xsks' which only
exists if configuration was successful.

Additionally implemented 'netdev_afxdp_construct()' function to
explicitly initialize all the specific fields and request the
reconfiguration.

CC: William Tu <u9012063@gmail.com>
Fixes: 0de1b425962d ("netdev-afxdp: add new netdev type for AF_XDP.")
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
2019-07-23 10:35:29 +03:00
William Tu
0de1b42596 netdev-afxdp: add new netdev type for AF_XDP.
The patch introduces experimental AF_XDP support for OVS netdev.
AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket
type built upon the eBPF and XDP technology.  It is aims to have comparable
performance to DPDK but cooperate better with existing kernel's networking
stack.  An AF_XDP socket receives and sends packets from an eBPF/XDP program
attached to the netdev, by-passing a couple of Linux kernel's subsystems
As a result, AF_XDP socket shows much better performance than AF_PACKET
For more details about AF_XDP, please see linux kernel's
Documentation/networking/af_xdp.rst. Note that by default, this feature is
not compiled in.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
2019-07-19 17:42:06 +03:00
Ilya Maximets
5fc5c50f3d netdev: Dynamic per-port Flow API.
Current issues with Flow API:

* OVS calls offloading functions regardless of successful
  flow API initialization. (ex. on init_flow_api failure)
* Static initilaization of Flow API for a netdev_class forbids
  having different offloading types for different instances
  of netdev with the same netdev_class. (ex. different vports in
  'system' and 'netdev' datapaths at the same time)

Solution:

* Move Flow API from the netdev_class to netdev instance.
* Make Flow API dynamic, i.e. probe the APIs and choose the
  suitable one.

Side effects:

* Flow API providers localized as possible in their modules.
* Now we have an ability to make runtime checks. For example,
  we could check if particular device supports features we
  need, like if dpdk device supports RSS+MARK action.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Roi Dayan <roid@mellanox.com>
2019-06-11 09:39:36 +03:00
Tonghao Zhang
718be50dae netdev-linux: Add coverage counters for netdev_set_policing when ingress tc-offload
When enable tc-offload, we should add coverage counters for netdev_set_policing.

Fixes: e7f6ba220e10 ("lib/tc: add ingress ratelimiting support for tc-offload")
Cc: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2019-04-22 10:06:41 -07:00
Flavio Leitner
23fa50f64b netlink linux: fix to append the netnsid netlink attr.
The attribute was being prepended to the netlink buffer, but
the function  nl_sock_transact_multiple__() expects to find the
netlink header as first to update the length, seq and pid fields.

This patch fixes to append the attribute instead of prepending it.

Fixes: 756819ddd788 ("netdev-linux: use netlink to update netdev.")
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2019-04-16 15:48:59 -07:00
Flavio Leitner
b43762a5ad netlink linux: account for the netnsid netlink attr.
The buffer needs to be reallocated and data copied when
the netnsid netlink attribute is included, so avoid that
by accounting the attribute when the buffer is initially
allocated.

Fixes: 756819ddd788 ("netdev-linux: use netlink to update netdev.")
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2019-04-16 15:48:50 -07:00
John Hurley
608ff46aaf ovs-tc: offload datapath rules matching on internal ports
Rules applied to OvS internal ports are not represented in TC datapaths.
However, it is possible to support rules matching on internal ports in TC.
The start_xmit ndo of OvS internal ports directs packets back into the OvS
kernel datapath where they are rematched with the ingress port now being
that of the internal port. Due to this, rules matching on an internal port
can be added as TC filters to an egress qdisc for these ports.

Allow rules applied to internal ports to be offloaded to TC as egress
filters. Rules redirecting to an internal port are also offloaded. These
are supported by the redirect ingress functionality applied in an earlier
patch.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2019-04-10 13:55:59 +02:00
John Hurley
95255018a8 ovs-tc: allow offloading TC rules to egress qdiscs
Offloading rules to a TC datapath only allows the creating of ingress hook
qdiscs and the application of filters to these. However, there may be
certain situations where an egress qdisc is more applicable (e.g. when
offloading to TC rules applied to OvS internal ports).

Extend the TC API in OvS to allow the creation of egress qdiscs and to add
or interact with flower filters applied to these.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2019-04-09 17:34:07 +02:00
Sharon K
2f564bb153 netdev-linux: netem QoS support
Signed-off-by: Sharon Krendel <thekafkaf@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2019-03-14 19:20:19 -07:00
Roi Dayan
cae643534e netdev-linux: Remove ingress qdisc before trying to add shared block
Adding shared ingress block with ingress qdisc already exists results
in a failure. So remove the ingress qdisc first.
Also while at it log the slave name.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Acked-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2019-03-12 10:18:29 +01:00
Pieter Jansen van Vuuren
e7f6ba220e lib/tc: add ingress ratelimiting support for tc-offload
Firstly this patch introduces the notion of reserved priority, as the
filter implementing ingress policing would require the highest priority.
Secondly it allows setting rate limiters while tc-offloads has been
enabled. Lastly it installs a matchall filter that matches all traffic
and then applies a police action, when configuring an ingress rate
limiter.

An example of what to expect:

OvS CLI:
ovs-vsctl set interface <netdev_name> ingress_policing_rate=5000
ovs-vsctl set interface <netdev_name> ingress_policing_burst=100

Resulting TC filter:
filter protocol ip pref 1 matchall chain 0
filter protocol ip pref 1 matchall chain 0 handle 0x1
  not_in_hw
	action order 1:  police 0x1 rate 5Mbit burst 125Kb mtu 64Kb
action drop/continue overhead 0b
        ref 1 bind 1 installed 3 sec used 3 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
10.0.0.200 () port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

131072  16384  16384    60.13       4.49

ovs-vsctl list interface <netdev_name>
_uuid               : 2ca774e8-8b95-430f-a2c2-f8f742613ab1
admin_state         : up
...
ingress_policing_burst: 100
ingress_policing_rate: 5000
...
type                : ""

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2019-03-04 17:22:34 +01:00
Ben Pfaff
61265c03f0 netdev-linux: Fix function argument order in sfq_tc_load().
sfq_install__() takes quantum before perturb.

Acked-by: Justin Pettit <jpettti@ovn.org>
Reported-by: shaoke xi <xishaoke.xsk@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2019-01-17 16:31:44 -08:00
Ben Pfaff
64ed99ffbc netdev-linux: Don't include <net/if_packet.h>.
This header only defines sockaddr_pkt, which this source file doesn't use.

This was the only user of net/if_packet.h, so also remove the
configure-time test for it (which netdev-linux wasn't using anyway).

Reported-by: Andre McCurdy <armccurdy@gmail.com>
Reported-at: https://github.com/openvswitch/ovs/pull/253
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-10-03 16:55:55 -07:00
Andre McCurdy
b24751fff8 netdev-linux: use unsigned int for ifi_flags temporary variables
ifi_flags in struct netdev_linux is an unsigned int, therefore use
unsigned int for variables which will hold ifi_flags values.

Signed-off-by: Andre McCurdy <armccurdy@gmail.com>
2018-10-02 15:39:35 -07:00
Ben Pfaff
89c09c1cd1 netdev: Clean up class initialization.
The macros are hard to read.  This makes it a little more readable.

Signed-off-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-08-27 17:48:23 +01:00
Ben Pfaff
1bab4901c4 netdev-linux: Avoid division by 0 if kernel reports bad scheduler data.
If the kernel reported a value of 0 for the second value in
/proc/net/psched, it would cause a division-by-zero fault in
read_psched().  I don't know of a kernel that would actually do that, but
it's still better to be safe.

Found by clang static analyzer.

Reported-by: Bhargava Shastry <bshastry@sect.tu-berlin.de>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
2018-08-20 09:30:11 -07:00
Tiago Lam
e3b5d7c536 netdev-linux: Fix segfault in update_lag().
A bissect shows that commit d22f892 ("netdev-linux: monitor and offload
LAG slaves to TC") introduced netdev_linux_update_lag(), which is now
triggering a crash in the "datapath - ping over bond" test in
system-userspace-testsuite:

  (gdb) bt
  #0  0x00000000009762e7 in netdev_linux_update_lag (change=0x7ffdff013750) at lib/netdev-linux.c:728
  728                 if (is_netdev_linux_class(master_netdev->netdev_class)) {

This fixes the crash by simply returning in case netdev_from_name()
returns NULL, as this should indicate the master is not attached to the
bridge.

Additionally, netdev_linux_update_lag() isn't "clearing" the netdev
reference it gets from netdev_from_name(), meaning its ref_cnt is
incremented but never decremented. Thus, also call netdev_close() before
returning.

CC: John Hurley <john.hurley@netronome.com>
Fixes: d22f8927 ("netdev-linux: monitor and offload LAG slaves to TC")
Signed-off-by: Tiago Lam <tiago.lam@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-07-05 14:05:21 -07:00
John Hurley
d22f8927c3 netdev-linux: monitor and offload LAG slaves to TC
A LAG slave cannot be added directly to an OvS bridge, nor can a OvS
bridge port be added to a LAG dev. However, LAG masters can be added to
OvS.

Use TC blocks to indirectly offload slaves when their master is attached
as a linux-netdev to an OvS bridge. In the kernel TC datapath, blocks link
together netdevs in a similar way to LAG devices. For example, if a filter
is added to a block then it is added to all block devices, or if stats are
incremented on 1 device then the stats on the entire block are incremented.
This mimics LAG devices in that if a rule is applied to the LAG master
then it should be applied to all slaves etc.

Monitor LAG slaves via the netlink socket in netdev-linux and, if their
master is attached to the OvS bridge and has a block id, add the slave's
qdisc to the same block. Similarly, if a slave is freed from a master,
remove the qdisc from the masters block.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2018-06-29 14:57:47 +02:00
John Hurley
25db83be5a netdev-linux: assign LAG devs to tc blocks
Assign block ids to LAG masters that are added to OvS as linux-netdevs and
offloaded via offload API calls. Only LAG masters are assigned to blocks.

To ensure uniqueness, the block ids are determined by the netdev ifindex.
Implement a get_block_id op for linux netdevs to achieve this.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2018-06-29 14:57:44 +02:00
John Hurley
3d9c99ab8a netdev-linux: indicate if netdev is a LAG master
If a linux netdev is added to OvS that is a LAG master (for example, a
bond or team netdev) then record this in bool form in the dev struct. Use
the link info extracted from rtnetlink calls to determine this.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2018-06-29 14:57:40 +02:00
John Hurley
88dcf2aa82 netdev-provider: add class op to get block_id
Add a new class op for netdevs to get the block_id if one exists. The
block_id is used in offload ops to group multiple qdiscs together.

Stub calls are made to the new class op (implementation to follow in
further patches). The default block_id of 0 (no block) will be used in
these cases.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
2018-06-29 14:51:47 +02:00