mirror of https://github.com/openvswitch/ovs synced 2025-08-30 05:47:55 +00:00

18 Commits

Author SHA1 Message Date
Adrian Moreno
e9bf5bffb0 list: use short version of safe loops if possible.
Using the SHORT version of the *_SAFE loops makes the code cleaner
and less error-prone. So, use the SHORT version and remove the extra
variable when possible.

In order to be able to use both long and short versions without changing
the name of the macro for all the clients, overload the existing name
and select the appropriate version depending on the number of arguments.
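
As an illustration of selecting a macro by argument count, here is a minimal sketch. It is not the OVS implementation: `struct node` and every `NODE_*` name are hypothetical stand-ins for the real list types and macros.

```c
#include <stddef.h>

/* Minimal singly linked list, a hypothetical stand-in for the OVS lists. */
struct node {
    int value;
    struct node *next;
};

/* Long form: the caller supplies the extra NEXT variable. */
#define NODE_FOR_EACH_SAFE_LONG(ITER, NEXT, HEAD)                     \
    for ((ITER) = (HEAD);                                             \
         (ITER) ? ((NEXT) = (ITER)->next, 1) : 0;                     \
         (ITER) = (NEXT))

/* Short form: the saved next pointer lives in a loop-scoped variable. */
#define NODE_FOR_EACH_SAFE_SHORT(ITER, HEAD)                          \
    for (struct node *next__ = ((ITER) = (HEAD), NULL);               \
         (ITER) ? (next__ = (ITER)->next, 1) : 0;                     \
         (ITER) = next__)

/* Picks the fourth argument: LONG for three user arguments, SHORT for
 * two.  The trailing 'unused' keeps the variadic part non-empty. */
#define NODE_SELECT(_1, _2, _3, NAME, ...) NAME
#define NODE_FOR_EACH_SAFE(...)                                       \
    NODE_SELECT(__VA_ARGS__, NODE_FOR_EACH_SAFE_LONG,                 \
                NODE_FOR_EACH_SAFE_SHORT, unused)(__VA_ARGS__)

/* Sums the values while unlinking each visited node.  This is safe only
 * because the next pointer is saved before the loop body runs. */
int drain_short(struct node *head)
{
    struct node *it;
    int sum = 0;

    NODE_FOR_EACH_SAFE (it, head) {
        sum += it->value;
        it->next = NULL;
    }
    return sum;
}

/* Same loop written with the long form and an explicit 'next'. */
int drain_long(struct node *head)
{
    struct node *it, *next;
    int sum = 0;

    NODE_FOR_EACH_SAFE (it, next, head) {
        sum += it->value;
        it->next = NULL;
    }
    return sum;
}
```

Both functions use the same macro name; only the argument count differs, so existing callers of the long form need no changes.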

Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-03-30 16:59:02 +02:00
Yi-Hung Wei
e8568993e0 netdev-afxdp: NUMA-aware memory allocation for XSK related memory.
Currently, the AF_XDP socket (XSK) related memory is allocated by the main
thread in the main thread's NUMA domain.  With the patch that detects
netdev-linux's NUMA node id, the PMD thread of AF_XDP port will be run on
the AF_XDP netdev's NUMA domain.  If the net device's NUMA domain
is different from the main thread's NUMA domain, we will have two
cross-NUMA memory accesses (netdev <-> memory, memory <-> CPU).

This patch addresses the aforementioned issue by allocating
the memory in the net device's NUMA domain.

Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-01-18 02:11:39 +01:00
William Tu
105cf8df82 netdev-linux: Detect numa node id.
The patch detects the numa node id from the name of the netdev,
by reading the '/sys/class/net/<devname>/device/numa_node'.
If it is not available (e.g. for a virtual device) or any error occurs,
return NUMA id 0.  Currently only the afxdp netdev type uses it;
other Linux netdev types are disabled since they have no use case.
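
The lookup described above could be sketched roughly as follows. This is an illustrative sketch, not the code from the patch: the function name and the 'sysfs_root' parameter are hypothetical (the latter exists only so the lookup can be pointed at a test directory), and mapping a negative node id to 0 is an assumption.

```c
#include <stddef.h>
#include <stdio.h>

/* Returns the NUMA node id of network device 'devname', or 0 if the
 * sysfs file is missing, unreadable, malformed, or reports a negative
 * id (assumption: negative means "no NUMA information").
 *
 * 'sysfs_root' is a hypothetical parameter; on a real system it would
 * be "/sys/class/net". */
int netdev_numa_node(const char *sysfs_root, const char *devname)
{
    char path[512];
    int node = 0;
    FILE *f;
    int n;

    n = snprintf(path, sizeof path, "%s/%s/device/numa_node",
                 sysfs_root, devname);
    if (n < 0 || (size_t) n >= sizeof path) {
        return 0;               /* Path did not fit: treat as an error. */
    }

    f = fopen(path, "r");
    if (!f) {
        return 0;               /* E.g. a virtual device without 'device/'. */
    }
    if (fscanf(f, "%d", &node) != 1 || node < 0) {
        node = 0;
    }
    fclose(f);
    return node;
}
```

A caller would use something like `netdev_numa_node("/sys/class/net", "eth0")`, where the device name is only an example.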

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-01-18 01:42:22 +01:00
Ilya Maximets
161773c72a netdev-afxdp: Fix transmission freeze in native mode without zerocopy.
Kernel uses 'xsk_generic_xmit()' for all modes where zerocopy is
not enabled:

   net/xdp/xsk.c
   static int __xsk_sendmsg(struct sock *sk)
   {
            ...
       return xs->zc ? xsk_zc_xmit(xs) : xsk_generic_xmit(sk);
   }

'xsk_generic_xmit()' sends packets synchronously and no more than 16
packets at a time.  This means that we have to kick Tx with sendmsg()
for every 16 packets in simple native mode too, otherwise the packets
may never be sent.
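
The kick cadence can be illustrated with a small model. This is a sketch under stated assumptions, not the netdev-afxdp code: 'kick_fn' is a hypothetical callback standing in for the sendmsg() call, and no real socket or ring is involved.

```c
#include <stddef.h>

/* Packets the generic xmit path sends per kick, per the message above. */
#define TX_KICK_BATCH 16

/* Hypothetical stand-in for the syscall that kicks Tx. */
typedef void (*kick_fn)(void *aux);

/* Example callback that only counts how often it was invoked. */
void count_kick(void *aux)
{
    ++*(size_t *) aux;
}

/* Models queueing 'n_packets' for transmission: kicks once per full
 * batch of TX_KICK_BATCH packets, plus once for any remainder, so that
 * no queued packet is left waiting.  Returns the number of kicks. */
size_t tx_with_kicks(size_t n_packets, kick_fn kick, void *aux)
{
    size_t kicks = 0;
    size_t pending = 0;

    for (size_t i = 0; i < n_packets; i++) {
        pending++;              /* Real code would fill a Tx descriptor. */
        if (pending == TX_KICK_BATCH) {
            kick(aux);
            kicks++;
            pending = 0;
        }
    }
    if (pending) {
        kick(aux);
        kicks++;
    }
    return kicks;
}
```

With a single kick per burst instead, any packets beyond the first 16 would stay queued until some later kick, which matches the freeze described above.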

Reported-by: William Tu <u9012063@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-November/365076.html
Fixes: e8f5634484e8 ("netdev-afxdp: Best-effort configuration of XDP mode.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
2020-01-06 11:47:28 -08:00
Ilya Maximets
28d0501623 ovs-thread: Avoid huge alignment on a base spinlock structure.
Marking the structure as 64-byte aligned forces the compiler to produce
big holes in the containing structures in order to fulfill this
requirement.  Also, any structure that contains this one as a member
automatically inherits this huge alignment, making the resulting memory
layout inefficient.  For example, 'struct umem_pool' currently
uses 3 full cache lines (192 bytes) with only 32 bytes of actual data:

  struct umem_pool {
    int                        index;                /*  0   4 */
    unsigned int               size;                 /*  4   4 */

    /* XXX 56 bytes hole, try to pack */

    /* --- cacheline 1 boundary (64 bytes) --- */
    struct ovs_spin lock __attribute__((__aligned__(64))); /* 64  64 */

    /* XXX last struct has 48 bytes of padding */

    /* --- cacheline 2 boundary (128 bytes) --- */
    void * *                   array;                /* 128  8 */

    /* size: 192, cachelines: 3, members: 4 */
    /* sum members: 80, holes: 1, sum holes: 56 */
    /* padding: 56 */
    /* paddings: 1, sum paddings: 48 */
    /* forced alignments: 1, forced holes: 1, sum forced holes: 56 */
  } __attribute__((__aligned__(64)));

Actual alignment of a spin lock is required only for Tx queue locks
inside netdev-afxdp to avoid false sharing, in all other cases
alignment only produces inefficient memory usage.

Also, CACHE_LINE_SIZE macro should be used instead of 64 as different
platforms may have different cache line sizes.

Using PADDED_MEMBERS to avoid alignment inheritance.
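
The difference between an aligned type and a padded member can be sketched as below. This is an illustration, not the OVS definitions: all struct names are hypothetical, CACHE_LINE is assumed to be 64, and the anonymous union only imitates the idea behind PADDED_MEMBERS.

```c
#include <stddef.h>

#define CACHE_LINE 64   /* Assumed here; OVS uses CACHE_LINE_SIZE. */

/* Hypothetical base lock with natural alignment. */
struct spin {
    int word;
};

/* The same lock with a forced alignment on the type itself. */
struct spin_aligned {
    int word;
} __attribute__((__aligned__(CACHE_LINE)));

/* Embedding the aligned lock: the container inherits the alignment and
 * grows to three cache lines (192 bytes on a typical 64-bit build). */
struct pool_aligned {
    int index;
    unsigned int size;
    struct spin_aligned lock;
    void **array;
};

/* Embedding the plain lock: no holes beyond natural padding. */
struct pool_plain {
    int index;
    unsigned int size;
    struct spin lock;
    void **array;
};

/* Padding applied only where false sharing matters: the lock occupies a
 * full cache line, but the type keeps its natural alignment, so nothing
 * is inherited by structures that merely contain a 'struct spin'. */
struct tx_lock {
    union {
        struct spin lock;
        char pad[CACHE_LINE];
    };
};
```

Comparing `sizeof(struct pool_aligned)` with `sizeof(struct pool_plain)` shows the cost of the inherited alignment. Note that an array of 'struct tx_lock' still needs a cache-line-aligned allocation for each element to sit on its own line.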

Fixes: ae36d63d7e3c ("ovs-thread: Make struct spin lock cache aligned.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
2019-12-19 01:06:54 +01:00
Ilya Maximets
37a2465523 netdev-afxdp: Avoid removing of XDP program if not loaded.
'bpf_set_link_xdp_fd' generates netlink event regardless of actual
changes it does, so if-notifier will receive link update even if
there was no XDP program previously loaded on the interface.

OVS tries to remove the XDP program if device configuration was not
successful, which triggers the if-notifier, which in turn triggers
bridge reconfiguration and another attempt to add the failed port,
and so on in an infinite loop.

This patch avoids the issue by not removing XDP program if it wasn't
loaded.  Since loading of the XDP program is one of the last steps
of port configuration, this should help to avoid infinite re-addition
for most types of misconfiguration.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
2019-12-18 01:39:56 +01:00
William Tu
7bf075d95a netdev-afxdp: Enable libbpf logging to OVS.
libbpf has pr_warn, pr_info, and pr_debug.  The patch registers
these print functions, integrating the libbpf logs into the OVS log.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
2019-11-21 09:20:10 -08:00
Ilya Maximets
e8f5634484 netdev-afxdp: Best-effort configuration of XDP mode.
Until now there were only two options for XDP mode in OVS: SKB or DRV,
i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.

Devices like 'veth' interfaces in Linux support native XDP, but
don't support zero-copy mode.  This case cannot be covered by the
existing API and we have to use slower generic XDP for such devices.
There are a few more issues, e.g. TCP is not supported in generic XDP
mode for veth interfaces due to kernel limitations, however it is
supported in native mode.

This change introduces the ability to use native XDP without zero-copy
along with a best-effort configuration option that is enabled by default.
In the best-effort case OVS will sequentially try different modes starting
from the fastest one and will choose the first one acceptable for the
current interface.  This will guarantee the best possible performance.

If a user wants to choose a specific mode, it's still possible by
setting 'options:xdp-mode'.

This change additionally changes the API by renaming the configuration
knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
themselves to be more user-friendly.

The full list of currently supported modes:
  * native-with-zerocopy - former DRV
  * native               - new one, DRV without zero-copy
  * generic              - former SKB
  * best-effort          - new one, chooses the best available from
                           3 above modes

Since 'best-effort' is the default mode, users will not need to
explicitly set 'xdp-mode' in most cases.
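
The best-effort selection can be sketched as a fastest-first loop. This is a hedged illustration, not the OVS code: the enum, the 'try_mode_fn' callback, and 'probe_mask' are hypothetical names introduced here.

```c
#include <stddef.h>

/* Modes ordered from fastest to slowest, following the list above. */
enum xdp_mode {
    XDP_NATIVE_ZC,      /* native-with-zerocopy */
    XDP_NATIVE,         /* native */
    XDP_GENERIC,        /* generic */
    XDP_N_MODES
};

/* Hypothetical callback: tries to configure the device in 'mode' and
 * returns 0 on success, nonzero on failure. */
typedef int (*try_mode_fn)(enum xdp_mode mode, void *aux);

/* Tries each mode in order and returns the first one that could be
 * configured, or -1 if none is acceptable for the device. */
int xdp_best_effort(try_mode_fn try_mode, void *aux)
{
    for (int m = XDP_NATIVE_ZC; m < XDP_N_MODES; m++) {
        if (try_mode((enum xdp_mode) m, aux) == 0) {
            return m;
        }
    }
    return -1;
}

/* Example probe: 'aux' points to a bitmask of supported modes. */
int probe_mask(enum xdp_mode mode, void *aux)
{
    unsigned int mask = *(unsigned int *) aux;

    return (mask >> mode) & 1u ? 0 : -1;
}
```

For a device modelled like 'veth', with native and generic support but no zero-copy, the loop settles on XDP_NATIVE.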

TCP related tests are re-enabled in the system afxdp testsuite, because
'best-effort' will choose 'native' mode for veth interfaces
and this mode has no issues with TCP.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
2019-11-20 16:48:26 +01:00
Eelco Chaudron
52b5a5c0a3 netdev-afxdp: add afxdp specific maximum MTU check
Drivers natively supporting AF_XDP will check that a configured MTU size
will not exceed the allowed size for AF_XDP. However, when the skb
compatibility mode is used there is no check and any value is accepted.
This, for example, is the case when using the TAP interface.

This fix adds a check to make sure only AF_XDP valid values are accepted.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: William Tu <u9012063@gmail.com>
2019-11-19 11:20:49 -08:00
William Tu
e50547b51a netdev-afxdp: Add need_wakeup support.
The patch adds support for using the need_wakeup flag in AF_XDP rings.
A new option, use-need-wakeup, is added.  When this option is used,
it means that OVS has to explicitly wake up the kernel RX, using the
poll() syscall, and wake up TX, using the sendto() syscall.  This feature
improves the performance by avoiding unnecessary sendto syscalls for TX.
For RX, instead of the kernel always busy-spinning on the fill queue, OVS
wakes up the kernel RX processing when the fill queue is replenished.

The need_wakeup feature is merged into the Linux kernel bpf-next tree with commit
77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings") and
OVS enables it by default, if libbpf supports it.  If users enable it but
run with an older version of libbpf, then the need_wakeup feature has no
effect, and a warning message is logged.

For virtual interfaces, it's better to set use-need-wakeup=false, since
the virtual device's AF_XDP xmit is synchronous: the sendto syscall
enters the kernel and processes the TX packet on the tx queue directly.

On Intel Xeon E5-2620 v3 2.4GHz system, performance of physical port
to physical port improves from 6.1Mpps to 7.3Mpps.

Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2019-10-29 19:26:59 +01:00
Ilya Maximets
53c0bd5de4 netdev-afxdp: Update memory locking limits unconditionally.
Any type of AF_XDP socket in all modes implies creation of BPF map of
type BPF_MAP_TYPE_XSKMAP.  This leads to BPF_MAP_CREATE syscall and
subsequently 'xsk_map_alloc()' function that will charge required
memory from the memlock limit and fail with EPERM if we're trying
to allocate more.

On my system with 64K bytes of max locked memory by default, OVS
frequently starts to fail after addition of 3rd afxdp port in SKB
mode:

  netdev_afxdp|ERR|xsk_socket__create failed (Operation not permitted)
                   mode: SKB qid: 0
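
Raising the limit could look roughly like the sketch below. It uses the standard setrlimit() call with RLIMIT_MEMLOCK; the function name is hypothetical, and requesting RLIM_INFINITY is an assumption about the desired limit rather than a quote from the patch.

```c
#include <errno.h>
#include <sys/resource.h>

/* Tries to lift the locked-memory limit of the current process so that
 * BPF map creation is not charged against a small default limit.
 * Returns 0 on success, or the errno value from setrlimit() on failure
 * (typically EPERM when the process lacks the needed privilege). */
int raise_memlock_limit(void)
{
    struct rlimit r = { RLIM_INFINITY, RLIM_INFINITY };

    if (setrlimit(RLIMIT_MEMLOCK, &r)) {
        return errno;
    }
    return 0;
}
```

Doing this unconditionally, before any socket is created, avoids depending on which mode the socket ends up using.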

Fixes: 0de1b425962d ("netdev-afxdp: add new netdev type for AF_XDP.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
2019-10-10 11:22:05 -07:00
Ilya Maximets
ec92f8d2ff netdev-afxdp: Fix umem creation failure due to uninitialized config.
Later versions of 'struct xsk_umem_config' contain an additional field
'flags'.  OVS doesn't use that field, passing uninitialized stack
memory to the 'xsk_umem__create()' call, which could fail with
'Invalid argument' if 'flags' is non-zero or, even worse, create
a umem with unexpected properties.

We need to clear the whole structure explicitly to avoid this kind
of issue.
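
The pattern can be shown with a stand-in structure. This sketch does not use libbpf: 'struct umem_config' and 'umem_config_init' are hypothetical and only imitate a library config struct that gained a trailing 'flags' field.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a library-owned config struct.  Imagine
 * that 'flags' was added in a newer library version and the caller
 * does not know about it. */
struct umem_config {
    uint32_t fill_size;
    uint32_t comp_size;
    uint32_t frame_size;
    uint32_t frame_headroom;
    uint32_t flags;
};

/* Clears the whole structure first, then sets only the known fields.
 * Any field added later by the library therefore starts out as zero
 * instead of inheriting stack garbage. */
void umem_config_init(struct umem_config *cfg, uint32_t ring_size,
                      uint32_t frame_size, uint32_t headroom)
{
    memset(cfg, 0, sizeof *cfg);
    cfg->fill_size = ring_size;
    cfg->comp_size = ring_size;
    cfg->frame_size = frame_size;
    cfg->frame_headroom = headroom;
}
```

Setting the four known fields without the memset() would leave 'flags' holding whatever was on the stack, which is the failure mode described above.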

Fixes: 0de1b425962d ("netdev-afxdp: add new netdev type for AF_XDP.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: William Tu <u9012063@gmail.com>
2019-10-10 11:18:07 -07:00
Paul Chaignon
940ac2ce88 treewide: Use packet batch APIs
This patch replaces direct accesses to dp_packet_batch and dp_packet
internal components by the appropriate API calls.  It extends commit
1270b6e52 (treewide: Wider use of packet batch APIs).

This patch was generated using the following semantic patch (cf.
http://coccinelle.lip6.fr).

// <smpl>
@ dp_packet @
struct dp_packet_batch *b1;
struct dp_packet_batch b2;
struct dp_packet *p;
expression e;
@@

(
- b1->packets[b1->count++] = p;
+ dp_packet_batch_add(b1, p);
|
- b2.packets[b2.count++] = p;
+ dp_packet_batch_add(&b2, p);
|
- p->packet_type == htonl(PT_ETH)
+ dp_packet_is_eth(p)
|
- p->packet_type != htonl(PT_ETH)
+ !dp_packet_is_eth(p)
|
- b1->count == 0
+ dp_packet_batch_is_empty(b1)
|
- !b1->count
+ dp_packet_batch_is_empty(b1)
|
  b1->count = e;
|
  b1->count++
|
  b2.count = e;
|
  b2.count++
|
- b1->count
+ dp_packet_batch_size(b1)
|
- b2.count
+ dp_packet_batch_size(&b2)
)
// </smpl>

Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2019-09-25 14:42:00 -07:00
Eelco Chaudron
05629ed271 netdev-afxdp: fix corner case where umem entries were not released
If for some reason the last element in the batch was already pushed on
the stack, none of the elements were pushed.  This was leading to
buffer starvation.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
2019-08-08 18:11:49 +03:00
William Tu
22b78906e6 netdev-afxdp: Error when no XDP program loaded.
netdev-afxdp requires XDP program to be loaded.  When prog_id == 0,
it indicates no XDP program, so return error and free resources.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
2019-07-29 14:49:45 +03:00
Ilya Maximets
d560bc1baa netdev-afxdp: Convert AFXDP_DEBUG to custom stats.
These are valid statistics of a network interface and should be
exposed via custom stats.

The same MACRO trick as in vswitchd/bridge.c is used to reduce code
duplication and easily add new stats if necessary in the future.
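
The macro trick referred to here is commonly known as an X-macro. A minimal sketch follows; the counter names and all identifiers are hypothetical, not the ones used in netdev-afxdp or vswitchd/bridge.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Single list of counters.  Adding a new statistic means adding one
 * line here; the struct, the name table, and the lookup all follow. */
#define AFXDP_STATS         \
    STAT(rx_dropped)        \
    STAT(tx_kicks)          \
    STAT(tx_full)

/* Expansion 1: one uint64_t member per counter. */
struct afxdp_stats {
#define STAT(NAME) uint64_t NAME;
    AFXDP_STATS
#undef STAT
};

/* Expansion 2: the counter names as strings, in the same order. */
const char *const afxdp_stat_names[] = {
#define STAT(NAME) #NAME,
    AFXDP_STATS
#undef STAT
};

/* Expansion 3: looks a counter up by name.  Stores it in '*value' and
 * returns 0 if found, otherwise returns -1. */
int afxdp_stat_get(const struct afxdp_stats *s, const char *name,
                   uint64_t *value)
{
    const uint64_t values[] = {
#define STAT(NAME) s->NAME,
        AFXDP_STATS
#undef STAT
    };

    for (size_t i = 0; i < sizeof values / sizeof values[0]; i++) {
        if (!strcmp(afxdp_stat_names[i], name)) {
            *value = values[i];
            return 0;
        }
    }
    return -1;
}
```

Because every expansion walks the same list, the names and values cannot drift out of order when a counter is added.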

Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
2019-07-24 19:22:05 +03:00
Ilya Maximets
f627cf1dd9 netdev-afxdp: Fix use of unconfigured device.
In case of failure of 'xsk_configure_all()', 'n_rxq' and 'xdpmode'
will remain in the new state.  This will result in a successful
reconfiguration (immediate return, because the configuration is already
applied) if 'netdev_reconfigure()' is called again.

Same issue was fixed previously for netdev-dpdk using 'dev->started'
flag in commit:
606f66507250 ("netdev-dpdk: Don't use PMD driver if not configured successfully")

Let's use a similar approach by checking 'dev->xsks', which only
exists if configuration was successful.

Additionally implemented 'netdev_afxdp_construct()' function to
explicitly initialize all the specific fields and request the
reconfiguration.

CC: William Tu <u9012063@gmail.com>
Fixes: 0de1b425962d ("netdev-afxdp: add new netdev type for AF_XDP.")
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
2019-07-23 10:35:29 +03:00
William Tu
0de1b42596 netdev-afxdp: add new netdev type for AF_XDP.
The patch introduces experimental AF_XDP support for OVS netdev.
AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket
type built upon the eBPF and XDP technology.  It aims to have comparable
performance to DPDK but cooperate better with the existing kernel
networking stack.  An AF_XDP socket receives and sends packets from an
eBPF/XDP program attached to the netdev, bypassing a couple of Linux
kernel subsystems.  As a result, an AF_XDP socket shows much better
performance than AF_PACKET.  For more details about AF_XDP, please see the
Linux kernel's Documentation/networking/af_xdp.rst.  Note that by default,
this feature is not compiled in.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
2019-07-19 17:42:06 +03:00