2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-22 09:58:01 +00:00

netdev-afxdp: Best-effort configuration of XDP mode.

Until now there was only two options for XDP mode in OVS: SKB or DRV.
i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.

Devices like 'veth' interfaces in Linux supports native XDP, but
doesn't support zero-copy mode.  This case can not be covered by
existing API and we have to use slower generic XDP for such devices.
There are few more issues, e.g. TCP is not supported in generic XDP
mode for veth interfaces due to kernel limitations, however it is
supported in native mode.

This change introduces ability to use native XDP without zero-copy
along with best-effort configuration option that enabled by default.
In best-effort case OVS will sequentially try different modes starting
from the fastest one and will choose the first acceptable for current
interface.  This will guarantee the best possible performance.

If user will want to choose specific mode, it's still possible by
setting the 'options:xdp-mode'.

This change additionally changes the API by renaming the configuration
knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
themselves to be more user-friendly.

The full list of currently supported modes:
  * native-with-zerocopy - former DRV
  * native               - new one, DRV without zero-copy
  * generic              - former SKB
  * best-effort          - new one, chooses the best available from
                           3 above modes

Since 'best-effort' is a default mode, users will not need to
explicitely set 'xdp-mode' in most cases.

TCP related tests enabled back in system afxdp testsuite, because
'best-effort' will choose 'native' mode for veth interfaces
and this mode has no issues with TCP.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
This commit is contained in:
Ilya Maximets 2019-11-06 21:38:33 +00:00
parent 4ae8c4617f
commit e8f5634484
7 changed files with 233 additions and 124 deletions

View File

@ -153,9 +153,8 @@ To kick start end-to-end autotesting::
make check-afxdp TESTSUITEFLAGS='1' make check-afxdp TESTSUITEFLAGS='1'
.. note:: .. note::
Not all test cases pass at this time. Currenly all TCP related Not all test cases pass at this time. Currenly all cvlan tests are skipped
tests, ex: using wget or http, are skipped due to XDP limitations due to kernel issues.
on veth. cvlan test is also skipped.
If a test case fails, check the log at:: If a test case fails, check the log at::
@ -177,21 +176,23 @@ in :doc:`general`::
ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
Make sure your device driver support AF_XDP, netdev-afxdp supports Make sure your device driver support AF_XDP, netdev-afxdp supports
the following additional options (see man ovs-vswitchd.conf.db for the following additional options (see ``man ovs-vswitchd.conf.db`` for
more details): more details):
* **xdpmode**: use "drv" for driver mode, or "skb" for skb mode. * ``xdp-mode``: ``best-effort``, ``native-with-zerocopy``,
``native`` or ``generic``. Defaults to ``best-effort``, i.e. best of
supported modes, so in most cases you don't need to change it.
* **use-need-wakeup**: default "true" if libbpf supports it, otherwise false. * ``use-need-wakeup``: default ``true`` if libbpf supports it,
otherwise ``false``.
For example, to use 1 PMD (on core 4) on 1 queue (queue 0) device, For example, to use 1 PMD (on core 4) on 1 queue (queue 0) device,
configure these options: **pmd-cpu-mask, pmd-rxq-affinity, and n_rxq**. configure these options: ``pmd-cpu-mask``, ``pmd-rxq-affinity``, and
The **xdpmode** can be "drv" or "skb":: ``n_rxq``::
ethtool -L enp2s0 combined 1 ethtool -L enp2s0 combined 1
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10 ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \ ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
options:n_rxq=1 options:xdpmode=drv \
other_config:pmd-rxq-affinity="0:4" other_config:pmd-rxq-affinity="0:4"
Or, use 4 pmds/cores and 4 queues by doing:: Or, use 4 pmds/cores and 4 queues by doing::
@ -199,11 +200,11 @@ Or, use 4 pmds/cores and 4 queues by doing::
ethtool -L enp2s0 combined 4 ethtool -L enp2s0 combined 4
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x36 ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x36
ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \ ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
options:n_rxq=4 options:xdpmode=drv \ options:n_rxq=4 other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
.. note:: .. note::
pmd-rxq-affinity is optional. If not specified, system will auto-assign. ``pmd-rxq-affinity`` is optional. If not specified, system will auto-assign.
``n_rxq`` equals ``1`` by default.
To validate that the bridge has successfully instantiated, you can use the:: To validate that the bridge has successfully instantiated, you can use the::
@ -214,12 +215,21 @@ Should show something like::
Port "ens802f0" Port "ens802f0"
Interface "ens802f0" Interface "ens802f0"
type: afxdp type: afxdp
options: {n_rxq="1", xdpmode=drv} options: {n_rxq="1"}
Otherwise, enable debugging by:: Otherwise, enable debugging by::
ovs-appctl vlog/set netdev_afxdp::dbg ovs-appctl vlog/set netdev_afxdp::dbg
To check which XDP mode was chosen by ``best-effort``, you can look for
``xdp-mode-in-use`` in the output of ``ovs-appctl dpctl/show``::
# ovs-appctl dpctl/show
netdev@ovs-netdev:
<...>
port 2: ens802f0 (afxdp: n_rxq=1, use-need-wakeup=true,
xdp-mode=best-effort,
xdp-mode-in-use=native-with-zerocopy)
References References
---------- ----------
@ -323,8 +333,13 @@ Limitations/Known Issues
#. Most of the tests are done using i40e single port. Multiple ports and #. Most of the tests are done using i40e single port. Multiple ports and
also ixgbe driver also needs to be tested. also ixgbe driver also needs to be tested.
#. No latency test result (TODO items) #. No latency test result (TODO items)
#. Due to limitations of current upstream kernel, TCP and various offloading #. Due to limitations of current upstream kernel, various offloading
(vlan, cvlan) is not working over virtual interfaces (i.e. veth pair). (vlan, cvlan) is not working over virtual interfaces (i.e. veth pair).
Also, TCP is not working over virtual interfaces (veth) in generic XDP mode.
Some more information and possible workaround available `here
<https://github.com/cilium/cilium/issues/3077#issuecomment-430801467>`__ .
For TAP interfaces generic mode seems to work fine (TCP works) and even
could provide better performance than native mode in some cases.
PVP using tap device PVP using tap device
@ -335,8 +350,7 @@ First, start OVS, then add physical port::
ethtool -L enp2s0 combined 1 ethtool -L enp2s0 combined 1
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10 ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \ ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
options:n_rxq=1 options:xdpmode=drv \ options:n_rxq=1 other_config:pmd-rxq-affinity="0:4"
other_config:pmd-rxq-affinity="0:4"
Start a VM with virtio and tap device:: Start a VM with virtio and tap device::
@ -414,13 +428,11 @@ Create namespace and veth peer devices::
Attach the veth port to br0 (linux kernel mode):: Attach the veth port to br0 (linux kernel mode)::
ovs-vsctl add-port br0 afxdp-p0 -- \ ovs-vsctl add-port br0 afxdp-p0 -- set interface afxdp-p0
set interface afxdp-p0 options:n_rxq=1
Or, use AF_XDP with skb mode:: Or, use AF_XDP::
ovs-vsctl add-port br0 afxdp-p0 -- \ ovs-vsctl add-port br0 afxdp-p0 -- set interface afxdp-p0 type="afxdp"
set interface afxdp-p0 type="afxdp" options:n_rxq=1 options:xdpmode=skb
Setup the OpenFlow rules:: Setup the OpenFlow rules::

12
NEWS
View File

@ -5,11 +5,19 @@ Post-v2.12.0
separate project. You can find it at separate project. You can find it at
https://github.com/ovn-org/ovn.git https://github.com/ovn-org/ovn.git
- Userspace datapath: - Userspace datapath:
* Add option to enable, disable and query TCP sequence checking in
conntrack.
- AF_XDP:
* New option 'use-need-wakeup' for netdev-afxdp to control enabling * New option 'use-need-wakeup' for netdev-afxdp to control enabling
of corresponding 'need_wakeup' flag in AF_XDP rings. Enabled by default of corresponding 'need_wakeup' flag in AF_XDP rings. Enabled by default
if supported by libbpf. if supported by libbpf.
* Add option to enable, disable and query TCP sequence checking in * 'xdpmode' option for netdev-afxdp renamed to 'xdp-mode'.
conntrack. Modes also updated. New values:
native-with-zerocopy - former DRV
native - new one, DRV without zero-copy
generic - former SKB
best-effort [default] - new one, chooses the best available from
3 above modes
- DPDK: - DPDK:
* DPDK pdump packet capture support disabled by default. New configure * DPDK pdump packet capture support disabled by default. New configure
option '--enable-dpdk-pdump' to enable it. option '--enable-dpdk-pdump' to enable it.

View File

@ -89,12 +89,46 @@ BUILD_ASSERT_DECL(PROD_NUM_DESCS == CONS_NUM_DESCS);
#define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char *)base)) #define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char *)base))
static struct xsk_socket_info *xsk_configure(int ifindex, int xdp_queue_id, static struct xsk_socket_info *xsk_configure(int ifindex, int xdp_queue_id,
int mode, bool use_need_wakeup); enum afxdp_mode mode,
static void xsk_remove_xdp_program(uint32_t ifindex, int xdpmode); bool use_need_wakeup,
bool report_socket_failures);
static void xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode);
static void xsk_destroy(struct xsk_socket_info *xsk); static void xsk_destroy(struct xsk_socket_info *xsk);
static int xsk_configure_all(struct netdev *netdev); static int xsk_configure_all(struct netdev *netdev);
static void xsk_destroy_all(struct netdev *netdev); static void xsk_destroy_all(struct netdev *netdev);
static struct {
const char *name;
uint32_t bind_flags;
uint32_t xdp_flags;
} xdp_modes[] = {
[OVS_AF_XDP_MODE_UNSPEC] = {
.name = "unspecified",
.bind_flags = 0,
.xdp_flags = 0,
},
[OVS_AF_XDP_MODE_BEST_EFFORT] = {
.name = "best-effort",
.bind_flags = 0,
.xdp_flags = 0,
},
[OVS_AF_XDP_MODE_NATIVE_ZC] = {
.name = "native-with-zerocopy",
.bind_flags = XDP_ZEROCOPY,
.xdp_flags = XDP_FLAGS_DRV_MODE,
},
[OVS_AF_XDP_MODE_NATIVE] = {
.name = "native",
.bind_flags = XDP_COPY,
.xdp_flags = XDP_FLAGS_DRV_MODE,
},
[OVS_AF_XDP_MODE_GENERIC] = {
.name = "generic",
.bind_flags = XDP_COPY,
.xdp_flags = XDP_FLAGS_SKB_MODE,
},
};
struct unused_pool { struct unused_pool {
struct xsk_umem_info *umem_info; struct xsk_umem_info *umem_info;
int lost_in_rings; /* Number of packets left in tx, rx, cq and fq. */ int lost_in_rings; /* Number of packets left in tx, rx, cq and fq. */
@ -214,7 +248,7 @@ netdev_afxdp_sweep_unused_pools(void *aux OVS_UNUSED)
} }
static struct xsk_umem_info * static struct xsk_umem_info *
xsk_configure_umem(void *buffer, uint64_t size, int xdpmode) xsk_configure_umem(void *buffer, uint64_t size)
{ {
struct xsk_umem_config uconfig; struct xsk_umem_config uconfig;
struct xsk_umem_info *umem; struct xsk_umem_info *umem;
@ -232,9 +266,7 @@ xsk_configure_umem(void *buffer, uint64_t size, int xdpmode)
ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq, ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
&uconfig); &uconfig);
if (ret) { if (ret) {
VLOG_ERR("xsk_umem__create failed (%s) mode: %s", VLOG_ERR("xsk_umem__create failed: %s.", ovs_strerror(errno));
ovs_strerror(errno),
xdpmode == XDP_COPY ? "SKB": "DRV");
free(umem); free(umem);
return NULL; return NULL;
} }
@ -290,7 +322,8 @@ xsk_configure_umem(void *buffer, uint64_t size, int xdpmode)
static struct xsk_socket_info * static struct xsk_socket_info *
xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex, xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
uint32_t queue_id, int xdpmode, bool use_need_wakeup) uint32_t queue_id, enum afxdp_mode mode,
bool use_need_wakeup, bool report_socket_failures)
{ {
struct xsk_socket_config cfg; struct xsk_socket_config cfg;
struct xsk_socket_info *xsk; struct xsk_socket_info *xsk;
@ -304,14 +337,8 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
cfg.rx_size = CONS_NUM_DESCS; cfg.rx_size = CONS_NUM_DESCS;
cfg.tx_size = PROD_NUM_DESCS; cfg.tx_size = PROD_NUM_DESCS;
cfg.libbpf_flags = 0; cfg.libbpf_flags = 0;
cfg.bind_flags = xdp_modes[mode].bind_flags;
if (xdpmode == XDP_ZEROCOPY) { cfg.xdp_flags = xdp_modes[mode].xdp_flags | XDP_FLAGS_UPDATE_IF_NOEXIST;
cfg.bind_flags = XDP_ZEROCOPY;
cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;
} else {
cfg.bind_flags = XDP_COPY;
cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
}
#ifdef HAVE_XDP_NEED_WAKEUP #ifdef HAVE_XDP_NEED_WAKEUP
if (use_need_wakeup) { if (use_need_wakeup) {
@ -329,12 +356,11 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
ret = xsk_socket__create(&xsk->xsk, devname, queue_id, umem->umem, ret = xsk_socket__create(&xsk->xsk, devname, queue_id, umem->umem,
&xsk->rx, &xsk->tx, &cfg); &xsk->rx, &xsk->tx, &cfg);
if (ret) { if (ret) {
VLOG_ERR("xsk_socket__create failed (%s) mode: %s " VLOG(report_socket_failures ? VLL_ERR : VLL_DBG,
"use-need-wakeup: %s qid: %d", "xsk_socket__create failed (%s) mode: %s, "
ovs_strerror(errno), "use-need-wakeup: %s, qid: %d",
xdpmode == XDP_COPY ? "SKB": "DRV", ovs_strerror(errno), xdp_modes[mode].name,
use_need_wakeup ? "true" : "false", use_need_wakeup ? "true" : "false", queue_id);
queue_id);
free(xsk); free(xsk);
return NULL; return NULL;
} }
@ -375,8 +401,8 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
} }
static struct xsk_socket_info * static struct xsk_socket_info *
xsk_configure(int ifindex, int xdp_queue_id, int xdpmode, xsk_configure(int ifindex, int xdp_queue_id, enum afxdp_mode mode,
bool use_need_wakeup) bool use_need_wakeup, bool report_socket_failures)
{ {
struct xsk_socket_info *xsk; struct xsk_socket_info *xsk;
struct xsk_umem_info *umem; struct xsk_umem_info *umem;
@ -389,9 +415,7 @@ xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
memset(bufs, 0, NUM_FRAMES * FRAME_SIZE); memset(bufs, 0, NUM_FRAMES * FRAME_SIZE);
/* Create AF_XDP socket. */ /* Create AF_XDP socket. */
umem = xsk_configure_umem(bufs, umem = xsk_configure_umem(bufs, NUM_FRAMES * FRAME_SIZE);
NUM_FRAMES * FRAME_SIZE,
xdpmode);
if (!umem) { if (!umem) {
free_pagealign(bufs); free_pagealign(bufs);
return NULL; return NULL;
@ -399,8 +423,8 @@ xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
VLOG_DBG("Allocated umem pool at 0x%"PRIxPTR, (uintptr_t) umem); VLOG_DBG("Allocated umem pool at 0x%"PRIxPTR, (uintptr_t) umem);
xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, xdpmode, xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, mode,
use_need_wakeup); use_need_wakeup, report_socket_failures);
if (!xsk) { if (!xsk) {
/* Clean up umem and xpacket pool. */ /* Clean up umem and xpacket pool. */
if (xsk_umem__delete(umem->umem)) { if (xsk_umem__delete(umem->umem)) {
@ -414,12 +438,38 @@ xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
return xsk; return xsk;
} }
static int
xsk_configure_queue(struct netdev_linux *dev, int ifindex, int queue_id,
enum afxdp_mode mode, bool report_socket_failures)
{
struct xsk_socket_info *xsk_info;
VLOG_DBG("%s: configuring queue: %d, mode: %s, use-need-wakeup: %s.",
netdev_get_name(&dev->up), queue_id, xdp_modes[mode].name,
dev->use_need_wakeup ? "true" : "false");
xsk_info = xsk_configure(ifindex, queue_id, mode, dev->use_need_wakeup,
report_socket_failures);
if (!xsk_info) {
VLOG(report_socket_failures ? VLL_ERR : VLL_DBG,
"%s: Failed to create AF_XDP socket on queue %d in %s mode.",
netdev_get_name(&dev->up), queue_id, xdp_modes[mode].name);
dev->xsks[queue_id] = NULL;
return -1;
}
dev->xsks[queue_id] = xsk_info;
atomic_init(&xsk_info->tx_dropped, 0);
xsk_info->outstanding_tx = 0;
xsk_info->available_rx = PROD_NUM_DESCS;
return 0;
}
static int static int
xsk_configure_all(struct netdev *netdev) xsk_configure_all(struct netdev *netdev)
{ {
struct netdev_linux *dev = netdev_linux_cast(netdev); struct netdev_linux *dev = netdev_linux_cast(netdev);
struct xsk_socket_info *xsk_info;
int i, ifindex, n_rxq, n_txq; int i, ifindex, n_rxq, n_txq;
int qid = 0;
ifindex = linux_get_ifindex(netdev_get_name(netdev)); ifindex = linux_get_ifindex(netdev_get_name(netdev));
@ -429,23 +479,36 @@ xsk_configure_all(struct netdev *netdev)
n_rxq = netdev_n_rxq(netdev); n_rxq = netdev_n_rxq(netdev);
dev->xsks = xcalloc(n_rxq, sizeof *dev->xsks); dev->xsks = xcalloc(n_rxq, sizeof *dev->xsks);
/* Configure each queue. */ if (dev->xdp_mode == OVS_AF_XDP_MODE_BEST_EFFORT) {
for (i = 0; i < n_rxq; i++) { /* Trying to configure first queue with different modes to
VLOG_DBG("%s: configure queue %d mode %s use-need-wakeup %s.", * find the most suitable. */
netdev_get_name(netdev), i, for (i = OVS_AF_XDP_MODE_NATIVE_ZC; i < OVS_AF_XDP_MODE_MAX; i++) {
dev->xdpmode == XDP_COPY ? "SKB" : "DRV", if (!xsk_configure_queue(dev, ifindex, qid, i,
dev->use_need_wakeup ? "true" : "false"); i == OVS_AF_XDP_MODE_MAX - 1)) {
xsk_info = xsk_configure(ifindex, i, dev->xdpmode, dev->xdp_mode_in_use = i;
dev->use_need_wakeup); VLOG_INFO("%s: %s XDP mode will be in use.",
if (!xsk_info) { netdev_get_name(netdev), xdp_modes[i].name);
VLOG_ERR("Failed to create AF_XDP socket on queue %d.", i); break;
dev->xsks[i] = NULL; }
}
if (i == OVS_AF_XDP_MODE_MAX) {
VLOG_ERR("%s: Failed to detect suitable XDP mode.",
netdev_get_name(netdev));
goto err;
}
qid++;
} else {
dev->xdp_mode_in_use = dev->xdp_mode;
}
/* Configure remaining queues. */
for (; qid < n_rxq; qid++) {
if (xsk_configure_queue(dev, ifindex, qid,
dev->xdp_mode_in_use, true)) {
VLOG_ERR("%s: Failed to create AF_XDP socket on queue %d.",
netdev_get_name(netdev), qid);
goto err; goto err;
} }
dev->xsks[i] = xsk_info;
atomic_init(&xsk_info->tx_dropped, 0);
xsk_info->outstanding_tx = 0;
xsk_info->available_rx = PROD_NUM_DESCS;
} }
n_txq = netdev_n_txq(netdev); n_txq = netdev_n_txq(netdev);
@ -500,7 +563,7 @@ xsk_destroy_all(struct netdev *netdev)
if (dev->xsks[i]) { if (dev->xsks[i]) {
xsk_destroy(dev->xsks[i]); xsk_destroy(dev->xsks[i]);
dev->xsks[i] = NULL; dev->xsks[i] = NULL;
VLOG_INFO("Destroyed xsk[%d].", i); VLOG_DBG("%s: Destroyed xsk[%d].", netdev_get_name(netdev), i);
} }
} }
@ -510,7 +573,7 @@ xsk_destroy_all(struct netdev *netdev)
VLOG_INFO("%s: Removing xdp program.", netdev_get_name(netdev)); VLOG_INFO("%s: Removing xdp program.", netdev_get_name(netdev));
ifindex = linux_get_ifindex(netdev_get_name(netdev)); ifindex = linux_get_ifindex(netdev_get_name(netdev));
xsk_remove_xdp_program(ifindex, dev->xdpmode); xsk_remove_xdp_program(ifindex, dev->xdp_mode_in_use);
if (dev->tx_locks) { if (dev->tx_locks) {
for (i = 0; i < netdev_n_txq(netdev); i++) { for (i = 0; i < netdev_n_txq(netdev); i++) {
@ -526,9 +589,10 @@ netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
char **errp OVS_UNUSED) char **errp OVS_UNUSED)
{ {
struct netdev_linux *dev = netdev_linux_cast(netdev); struct netdev_linux *dev = netdev_linux_cast(netdev);
const char *str_xdpmode; const char *str_xdp_mode;
int xdpmode, new_n_rxq; enum afxdp_mode xdp_mode;
bool need_wakeup; bool need_wakeup;
int new_n_rxq;
ovs_mutex_lock(&dev->mutex); ovs_mutex_lock(&dev->mutex);
new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1); new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
@ -539,14 +603,17 @@ netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
return EINVAL; return EINVAL;
} }
str_xdpmode = smap_get_def(args, "xdpmode", "skb"); str_xdp_mode = smap_get_def(args, "xdp-mode", "best-effort");
if (!strcasecmp(str_xdpmode, "drv")) { for (xdp_mode = OVS_AF_XDP_MODE_BEST_EFFORT;
xdpmode = XDP_ZEROCOPY; xdp_mode < OVS_AF_XDP_MODE_MAX;
} else if (!strcasecmp(str_xdpmode, "skb")) { xdp_mode++) {
xdpmode = XDP_COPY; if (!strcasecmp(str_xdp_mode, xdp_modes[xdp_mode].name)) {
} else { break;
VLOG_ERR("%s: Incorrect xdpmode (%s).", }
netdev_get_name(netdev), str_xdpmode); }
if (xdp_mode == OVS_AF_XDP_MODE_MAX) {
VLOG_ERR("%s: Incorrect xdp-mode (%s).",
netdev_get_name(netdev), str_xdp_mode);
ovs_mutex_unlock(&dev->mutex); ovs_mutex_unlock(&dev->mutex);
return EINVAL; return EINVAL;
} }
@ -560,10 +627,10 @@ netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
#endif #endif
if (dev->requested_n_rxq != new_n_rxq if (dev->requested_n_rxq != new_n_rxq
|| dev->requested_xdpmode != xdpmode || dev->requested_xdp_mode != xdp_mode
|| dev->requested_need_wakeup != need_wakeup) { || dev->requested_need_wakeup != need_wakeup) {
dev->requested_n_rxq = new_n_rxq; dev->requested_n_rxq = new_n_rxq;
dev->requested_xdpmode = xdpmode; dev->requested_xdp_mode = xdp_mode;
dev->requested_need_wakeup = need_wakeup; dev->requested_need_wakeup = need_wakeup;
netdev_request_reconfigure(netdev); netdev_request_reconfigure(netdev);
} }
@ -578,8 +645,9 @@ netdev_afxdp_get_config(const struct netdev *netdev, struct smap *args)
ovs_mutex_lock(&dev->mutex); ovs_mutex_lock(&dev->mutex);
smap_add_format(args, "n_rxq", "%d", netdev->n_rxq); smap_add_format(args, "n_rxq", "%d", netdev->n_rxq);
smap_add_format(args, "xdpmode", "%s", smap_add_format(args, "xdp-mode", "%s", xdp_modes[dev->xdp_mode].name);
dev->xdpmode == XDP_ZEROCOPY ? "drv" : "skb"); smap_add_format(args, "xdp-mode-in-use", "%s",
xdp_modes[dev->xdp_mode_in_use].name);
smap_add_format(args, "use-need-wakeup", "%s", smap_add_format(args, "use-need-wakeup", "%s",
dev->use_need_wakeup ? "true" : "false"); dev->use_need_wakeup ? "true" : "false");
ovs_mutex_unlock(&dev->mutex); ovs_mutex_unlock(&dev->mutex);
@ -596,7 +664,7 @@ netdev_afxdp_reconfigure(struct netdev *netdev)
ovs_mutex_lock(&dev->mutex); ovs_mutex_lock(&dev->mutex);
if (netdev->n_rxq == dev->requested_n_rxq if (netdev->n_rxq == dev->requested_n_rxq
&& dev->xdpmode == dev->requested_xdpmode && dev->xdp_mode == dev->requested_xdp_mode
&& dev->use_need_wakeup == dev->requested_need_wakeup && dev->use_need_wakeup == dev->requested_need_wakeup
&& dev->xsks) { && dev->xsks) {
goto out; goto out;
@ -607,9 +675,9 @@ netdev_afxdp_reconfigure(struct netdev *netdev)
netdev->n_rxq = dev->requested_n_rxq; netdev->n_rxq = dev->requested_n_rxq;
netdev->n_txq = netdev->n_rxq; netdev->n_txq = netdev->n_rxq;
dev->xdpmode = dev->requested_xdpmode; dev->xdp_mode = dev->requested_xdp_mode;
VLOG_INFO("%s: Setting XDP mode to %s.", netdev_get_name(netdev), VLOG_INFO("%s: Setting XDP mode to %s.", netdev_get_name(netdev),
dev->xdpmode == XDP_ZEROCOPY ? "DRV" : "SKB"); xdp_modes[dev->xdp_mode].name);
if (setrlimit(RLIMIT_MEMLOCK, &r)) { if (setrlimit(RLIMIT_MEMLOCK, &r)) {
VLOG_ERR("setrlimit(RLIMIT_MEMLOCK) failed: %s", ovs_strerror(errno)); VLOG_ERR("setrlimit(RLIMIT_MEMLOCK) failed: %s", ovs_strerror(errno));
@ -618,7 +686,8 @@ netdev_afxdp_reconfigure(struct netdev *netdev)
err = xsk_configure_all(netdev); err = xsk_configure_all(netdev);
if (err) { if (err) {
VLOG_ERR("AF_XDP device %s reconfig failed.", netdev_get_name(netdev)); VLOG_ERR("%s: AF_XDP device reconfiguration failed.",
netdev_get_name(netdev));
} }
netdev_change_seq_changed(netdev); netdev_change_seq_changed(netdev);
out: out:
@ -638,17 +707,9 @@ netdev_afxdp_get_numa_id(const struct netdev *netdev)
} }
static void static void
xsk_remove_xdp_program(uint32_t ifindex, int xdpmode) xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode mode)
{ {
uint32_t flags; uint32_t flags = xdp_modes[mode].xdp_flags | XDP_FLAGS_UPDATE_IF_NOEXIST;
flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
if (xdpmode == XDP_COPY) {
flags |= XDP_FLAGS_SKB_MODE;
} else if (xdpmode == XDP_ZEROCOPY) {
flags |= XDP_FLAGS_DRV_MODE;
}
bpf_set_link_xdp_fd(ifindex, -1, flags); bpf_set_link_xdp_fd(ifindex, -1, flags);
} }
@ -662,7 +723,7 @@ signal_remove_xdp(struct netdev *netdev)
ifindex = linux_get_ifindex(netdev_get_name(netdev)); ifindex = linux_get_ifindex(netdev_get_name(netdev));
VLOG_WARN("Force removing xdp program."); VLOG_WARN("Force removing xdp program.");
xsk_remove_xdp_program(ifindex, dev->xdpmode); xsk_remove_xdp_program(ifindex, dev->xdp_mode_in_use);
} }
static struct dp_packet_afxdp * static struct dp_packet_afxdp *
@ -782,7 +843,8 @@ netdev_afxdp_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch,
} }
static inline int static inline int
kick_tx(struct xsk_socket_info *xsk_info, int xdpmode, bool use_need_wakeup) kick_tx(struct xsk_socket_info *xsk_info, enum afxdp_mode mode,
bool use_need_wakeup)
{ {
int ret, retries; int ret, retries;
static const int KERNEL_TX_BATCH_SIZE = 16; static const int KERNEL_TX_BATCH_SIZE = 16;
@ -791,11 +853,11 @@ kick_tx(struct xsk_socket_info *xsk_info, int xdpmode, bool use_need_wakeup)
return 0; return 0;
} }
/* In SKB_MODE packet transmission is synchronous, and the kernel xmits /* In generic mode packet transmission is synchronous, and the kernel xmits
* only TX_BATCH_SIZE(16) packets for a single sendmsg syscall. * only TX_BATCH_SIZE(16) packets for a single sendmsg syscall.
* So, we have to kick the kernel (n_packets / 16) times to be sure that * So, we have to kick the kernel (n_packets / 16) times to be sure that
* all packets are transmitted. */ * all packets are transmitted. */
retries = (xdpmode == XDP_COPY) retries = (mode == OVS_AF_XDP_MODE_GENERIC)
? xsk_info->outstanding_tx / KERNEL_TX_BATCH_SIZE ? xsk_info->outstanding_tx / KERNEL_TX_BATCH_SIZE
: 0; : 0;
kick_retry: kick_retry:
@ -962,7 +1024,7 @@ __netdev_afxdp_batch_send(struct netdev *netdev, int qid,
&orig); &orig);
COVERAGE_INC(afxdp_tx_full); COVERAGE_INC(afxdp_tx_full);
afxdp_complete_tx(xsk_info); afxdp_complete_tx(xsk_info);
kick_tx(xsk_info, dev->xdpmode, dev->use_need_wakeup); kick_tx(xsk_info, dev->xdp_mode_in_use, dev->use_need_wakeup);
error = ENOMEM; error = ENOMEM;
goto out; goto out;
} }
@ -986,7 +1048,7 @@ __netdev_afxdp_batch_send(struct netdev *netdev, int qid,
xsk_ring_prod__submit(&xsk_info->tx, dp_packet_batch_size(batch)); xsk_ring_prod__submit(&xsk_info->tx, dp_packet_batch_size(batch));
xsk_info->outstanding_tx += dp_packet_batch_size(batch); xsk_info->outstanding_tx += dp_packet_batch_size(batch);
ret = kick_tx(xsk_info, dev->xdpmode, dev->use_need_wakeup); ret = kick_tx(xsk_info, dev->xdp_mode_in_use, dev->use_need_wakeup);
if (OVS_UNLIKELY(ret)) { if (OVS_UNLIKELY(ret)) {
VLOG_WARN_RL(&rl, "%s: error sending AF_XDP packet: %s.", VLOG_WARN_RL(&rl, "%s: error sending AF_XDP packet: %s.",
netdev_get_name(netdev), ovs_strerror(ret)); netdev_get_name(netdev), ovs_strerror(ret));
@ -1052,10 +1114,11 @@ netdev_afxdp_construct(struct netdev *netdev)
/* Queues should not be used before the first reconfiguration. Clearing. */ /* Queues should not be used before the first reconfiguration. Clearing. */
netdev->n_rxq = 0; netdev->n_rxq = 0;
netdev->n_txq = 0; netdev->n_txq = 0;
dev->xdpmode = 0; dev->xdp_mode = OVS_AF_XDP_MODE_UNSPEC;
dev->xdp_mode_in_use = OVS_AF_XDP_MODE_UNSPEC;
dev->requested_n_rxq = NR_QUEUE; dev->requested_n_rxq = NR_QUEUE;
dev->requested_xdpmode = XDP_COPY; dev->requested_xdp_mode = OVS_AF_XDP_MODE_BEST_EFFORT;
dev->requested_need_wakeup = NEED_WAKEUP_DEFAULT; dev->requested_need_wakeup = NEED_WAKEUP_DEFAULT;
dev->xsks = NULL; dev->xsks = NULL;

View File

@ -25,6 +25,15 @@
/* These functions are Linux AF_XDP specific, so they should be used directly /* These functions are Linux AF_XDP specific, so they should be used directly
* only by Linux-specific code. */ * only by Linux-specific code. */
enum afxdp_mode {
OVS_AF_XDP_MODE_UNSPEC,
OVS_AF_XDP_MODE_BEST_EFFORT,
OVS_AF_XDP_MODE_NATIVE_ZC,
OVS_AF_XDP_MODE_NATIVE,
OVS_AF_XDP_MODE_GENERIC,
OVS_AF_XDP_MODE_MAX,
};
struct netdev; struct netdev;
struct xsk_socket_info; struct xsk_socket_info;
struct xdp_umem; struct xdp_umem;

View File

@ -100,10 +100,14 @@ struct netdev_linux {
/* AF_XDP information. */ /* AF_XDP information. */
struct xsk_socket_info **xsks; struct xsk_socket_info **xsks;
int requested_n_rxq; int requested_n_rxq;
int xdpmode; /* AF_XDP running mode: driver or skb. */
int requested_xdpmode; enum afxdp_mode xdp_mode; /* Configured AF_XDP mode. */
enum afxdp_mode requested_xdp_mode; /* Requested AF_XDP mode. */
enum afxdp_mode xdp_mode_in_use; /* Effective AF_XDP mode. */
bool use_need_wakeup; bool use_need_wakeup;
bool requested_need_wakeup; bool requested_need_wakeup;
struct ovs_spin *tx_locks; /* spin lock array for TX queues. */ struct ovs_spin *tx_locks; /* spin lock array for TX queues. */
#endif #endif
}; };

View File

@ -30,10 +30,3 @@ m4_define([CONFIGURE_VETH_OFFLOADS],
AT_CHECK([ethtool -K $1 txvlan off], [0], [ignore], [ignore]) AT_CHECK([ethtool -K $1 txvlan off], [0], [ignore], [ignore])
] ]
) )
# OVS_START_L7([namespace], [protocol])
#
# AF_XDP doesn't work with TCP over virtual interfaces for now.
#
m4_define([OVS_START_L7],
[AT_SKIP_IF([:])])

View File

@ -3107,18 +3107,38 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
</p> </p>
</column> </column>
<column name="options" key="xdpmode" <column name="options" key="xdp-mode"
type='{"type": "string", type='{"type": "string",
"enum": ["set", ["skb", "drv"]]}'> "enum": ["set", ["best-effort", "native-with-zerocopy",
"native", "generic"]]}'>
<p> <p>
Specifies the operational mode of the XDP program. Specifies the operational mode of the XDP program.
If "drv", the XDP program is loaded into the device driver with <p>
zero-copy RX and TX enabled. This mode requires device driver with In <code>native-with-zerocopy</code> mode the XDP program is loaded
AF_XDP support and has the best performance. into the device driver with zero-copy RX and TX enabled. This mode
If "skb", the XDP program is using generic XDP mode in kernel with requires device driver support and has the best performance because
extra data copying between userspace and kernel. No device driver there should be no copying of packets.
support is needed. Note that this is afxdp netdev type only. </p>
Defaults to "skb" mode. <p>
<code>native</code> is the same as
<code>native-with-zerocopy</code>, but without zero-copy
capability. This requires at least one copy between kernel and the
userspace. This mode also requires support from device driver.
</p>
<p>
In <code>generic</code> case the XDP program in kernel works after
skb allocation on early stages of packet processing inside the
network stack. This mode doesn't require driver support, but has
much lower performance.
</p>
<p>
<code>best-effort</code> tries to detect and choose the best
(fastest) from the available modes for current interface.
</p>
<p>
Note that this option is specific to netdev-afxdp.
Defaults to <code>best-effort</code> mode.
</p>
</p> </p>
</column> </column>