mirror of https://github.com/openvswitch/ovs synced 2025-09-05 08:45:23 +00:00

netdev-afxdp: add new netdev type for AF_XDP.

The patch introduces experimental AF_XDP support for OVS netdev.
AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket
type built upon eBPF and XDP technology.  It aims to offer performance
comparable to DPDK while cooperating better with the existing kernel
networking stack.  An AF_XDP socket receives and sends packets through an
eBPF/XDP program attached to the netdev, bypassing several Linux kernel
subsystems.  As a result, an AF_XDP socket shows much better performance
than AF_PACKET.  For more details about AF_XDP, please see the Linux kernel's
Documentation/networking/af_xdp.rst.  Note that this feature is not compiled
in by default.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
William Tu
2019-07-18 13:11:14 -07:00
committed by Ilya Maximets
parent 884ca8aceb
commit 0de1b42596
27 changed files with 2240 additions and 109 deletions


@@ -10,6 +10,7 @@ DOC_SOURCE = \
Documentation/intro/why-ovs.rst \
Documentation/intro/install/index.rst \
Documentation/intro/install/bash-completion.rst \
Documentation/intro/install/afxdp.rst \
Documentation/intro/install/debian.rst \
Documentation/intro/install/documentation.rst \
Documentation/intro/install/distributions.rst \


@@ -59,6 +59,7 @@ vSwitch? Start here.
:doc:`intro/install/windows` |
:doc:`intro/install/xenserver` |
:doc:`intro/install/dpdk` |
:doc:`intro/install/afxdp` |
:doc:`Installation FAQs <faq/releases>`
- **Tutorials:** :doc:`tutorials/faucet` |


@@ -0,0 +1,432 @@
..
Licensed under the Apache License, Version 2.0 (the "License"); you may
not use this file except in compliance with the License. You may obtain
a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations
under the License.
Convention for heading levels in Open vSwitch documentation:
======= Heading 0 (reserved for the title in a document)
------- Heading 1
~~~~~~~ Heading 2
+++++++ Heading 3
''''''' Heading 4
Avoid deeper levels because they do not render well.
========================
Open vSwitch with AF_XDP
========================
This document describes how to build and install Open vSwitch using
AF_XDP netdev.
.. warning::
The AF_XDP support of Open vSwitch is considered 'experimental',
and it is not compiled in by default.
Introduction
------------
AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket type
built upon eBPF and XDP technology. It aims to offer performance comparable
to DPDK while cooperating better with the existing kernel networking stack.
An AF_XDP socket receives and sends packets through an eBPF/XDP program
attached to the netdev, bypassing several Linux kernel subsystems.
As a result, an AF_XDP socket shows much better performance than AF_PACKET.
For more details about AF_XDP, please see the Linux kernel's
Documentation/networking/af_xdp.rst.
AF_XDP Netdev
-------------
OVS has several netdev types, e.g., system, tap, and
dpdk. The AF_XDP feature adds a new netdev type called
"afxdp" and implements its configuration, packet reception,
and transmit functions. Since the AF_XDP socket, called xsk,
operates in userspace, once ovs-vswitchd receives packets
from the xsk, the afxdp netdev re-uses the existing userspace
dpif-netdev datapath. As a result, most of the packet processing
happens in userspace instead of in the Linux kernel.
::
| +-------------------+
| | ovs-vswitchd |<-->ovsdb-server
| +-------------------+
| | ofproto |<-->OpenFlow controllers
| +--------+-+--------+
| | netdev | |ofproto-|
userspace | +--------+ | dpif |
| | afxdp | +--------+
| | netdev | | dpif |
| +---||---+ +--------+
| || | dpif- |
| || | netdev |
|_ || +--------+
||
_ +---||-----+--------+
| | AF_XDP prog + |
kernel | | xsk_map |
|_ +--------||---------+
||
physical
NIC
Build requirements
------------------
In addition to the requirements described in :doc:`general`, building Open
vSwitch with AF_XDP will require the following:
- libbpf from kernel source tree (kernel 5.0.0 or later)
- Linux kernel XDP support, with the following options (required)
* CONFIG_BPF=y
* CONFIG_BPF_SYSCALL=y
* CONFIG_XDP_SOCKETS=y
- The following optional Kconfig options are also recommended, but not
required:
* CONFIG_BPF_JIT=y (Performance)
* CONFIG_HAVE_BPF_JIT=y (Performance)
* CONFIG_XDP_SOCKETS_DIAG=y (Debugging)
- Once your AF_XDP-enabled kernel is ready, if possible, run
**./xdpsock -r -N -z -i <your device>** under linux/samples/bpf.
This is an OVS-independent benchmark tool for AF_XDP.
It verifies that your kernel meets the basic requirements for AF_XDP.
Installing
----------
For OVS to use the AF_XDP netdev, it has to be configured with libbpf support.
First, clone a recent version of the Linux bpf-next tree::
git clone git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
Second, go into the Linux source directory and build libbpf in the tools
directory::
cd bpf-next/
cd tools/lib/bpf/
make && make install
make install_headers
.. note::
Make sure xsk.h and bpf.h are installed in the system's include path,
e.g. /usr/local/include/bpf/ or /usr/include/bpf/
Make sure libbpf.so is installed correctly::
ldconfig
ldconfig -p | grep libbpf
Third, ensure the standard OVS requirements are installed and
bootstrap/configure the package::
./boot.sh && ./configure --enable-afxdp
Finally, build and install OVS::
make && make install
To start the end-to-end automated tests::
uname -a # make sure the kernel is 5.0 or later
make check-afxdp TESTSUITEFLAGS='1'
.. note::
Not all test cases pass at this time. Currently all TCP-related
tests, e.g. those using wget or http, are skipped due to XDP limitations
on veth. The cvlan test is also skipped.
If a test case fails, check the log at::
cat \
tests/system-afxdp-testsuite.dir/<test num>/system-afxdp-testsuite.log
Setup AF_XDP netdev
-------------------
Before running OVS with AF_XDP, make sure that libbpf and libelf are
linked correctly::
ldd vswitchd/ovs-vswitchd
Open vSwitch should be started using userspace datapath as described
in :doc:`general`::
ovs-vswitchd ...
ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
Make sure your device driver supports AF_XDP. To use 1 PMD (on core 4)
with a 1-queue device (queue 0), configure these options: **pmd-cpu-mask,
pmd-rxq-affinity, and n_rxq**. The **xdpmode** can be "drv" or "skb"::
ethtool -L enp2s0 combined 1
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
options:n_rxq=1 options:xdpmode=drv \
other_config:pmd-rxq-affinity="0:4"
Or, use 4 pmds/cores and 4 queues by doing::
ethtool -L enp2s0 combined 4
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x36
ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
options:n_rxq=4 options:xdpmode=drv \
other_config:pmd-rxq-affinity="0:1,1:2,2:4,3:5"
.. note::
pmd-rxq-affinity is optional. If not specified, system will auto-assign.
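The pmd-cpu-mask values used above are hexadecimal bitmaps in which bit N selects core N, so 0x10 selects core 4 and 0x36 selects cores 1, 2, 4, and 5. The following is a minimal C sketch of that arithmetic. The helper name ``cpu_mask_from_cores`` is hypothetical and is not part of OVS, and the 64-core limit is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build a CPU bitmask with one bit set per core ID, in the style of the
 * pmd-cpu-mask values shown above.  Core IDs must be below 64 because the
 * result is a single 64-bit word (an assumption of this sketch). */
uint64_t
cpu_mask_from_cores(const unsigned int *cores, size_t n_cores)
{
    uint64_t mask = 0;

    for (size_t i = 0; i < n_cores; i++) {
        assert(cores[i] < 64);
        mask |= UINT64_C(1) << cores[i];
    }
    return mask;
}
```

Passing the core list {4} yields 0x10, and {1, 2, 4, 5} yields 0x36. Cores named in pmd-rxq-affinity should be covered by the mask, since a queue pinned to a core without a PMD thread has nothing to poll it.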
To validate that the bridge has been instantiated successfully, run::
ovs-vsctl show
The output should include something like::
Port "ens802f0"
Interface "ens802f0"
type: afxdp
options: {n_rxq="1", xdpmode=drv}
Otherwise, enable debug logging with::
ovs-appctl vlog/set netdev_afxdp::dbg
References
----------
Most of the design details are described in the paper presented at
Linux Plumbers Conference 2018, "Bringing the Power of eBPF to Open vSwitch"[1],
section 4, and slides[2][4].
"The Path to DPDK Speeds for AF_XDP"[3] gives a very good introduction
to current and future AF_XDP work.
[1] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-afxdp.pdf
[2] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-lpc18-presentation.pdf
[3] http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf
[4] https://ovsfall2018.sched.com/event/IO7p/fast-userspace-ovs-with-afxdp
Performance Tuning
------------------
The name of the game is to keep your CPU running in userspace, allowing the PMD
to keep polling the AF_XDP queues without any interference from the kernel.
#. Make sure everything is on the same NUMA node (memory used by AF_XDP, cores
running PMD threads, device plug-in slot).
#. Isolate your CPUs with the isolcpus kernel parameter in the GRUB
configuration.
#. Do not route IRQs to a core that runs a PMD thread.
#. The Spectre and Meltdown fixes increase the overhead of system calls.
Debugging performance issues
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
While running the traffic, use the Linux perf tool to see where your CPU
spends its cycles::
cd bpf-next/tools/perf
make
./perf record -p `pidof ovs-vswitchd` sleep 10
./perf report
Measure your system call rate by doing::
pstree -p `pidof ovs-vswitchd`
strace -c -p <your pmd's PID>
Or, use OVS pmd tool::
ovs-appctl dpif-netdev/pmd-stats-show
Example Script
--------------
Below is a script using namespaces and veth peers::
#!/bin/bash
ovs-vswitchd --no-chdir --pidfile -vvconn -vofproto_dpif -vunixctl \
--disable-system --detach
ovs-vsctl -- add-br br0 -- set Bridge br0 \
protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14 \
fail-mode=secure datapath_type=netdev
ip netns add at_ns0
ovs-appctl vlog/set netdev_afxdp::dbg
ip link add p0 type veth peer name afxdp-p0
ip link set p0 netns at_ns0
ip link set dev afxdp-p0 up
ovs-vsctl add-port br0 afxdp-p0 -- \
set interface afxdp-p0 external-ids:iface-id="p0" type="afxdp"
ip netns exec at_ns0 sh << NS_EXEC_HEREDOC
ip addr add "10.1.1.1/24" dev p0
ip link set dev p0 up
NS_EXEC_HEREDOC
ip netns add at_ns1
ip link add p1 type veth peer name afxdp-p1
ip link set p1 netns at_ns1
ip link set dev afxdp-p1 up
ovs-vsctl add-port br0 afxdp-p1 -- \
set interface afxdp-p1 external-ids:iface-id="p1" type="afxdp"
ip netns exec at_ns1 sh << NS_EXEC_HEREDOC
ip addr add "10.1.1.2/24" dev p1
ip link set dev p1 up
NS_EXEC_HEREDOC
ip netns exec at_ns0 ping -i .2 10.1.1.2
Limitations/Known Issues
------------------------
#. The device's NUMA ID is always reported as 0; a way to find the NUMA ID
from a netdev is needed.
#. No QoS support, because the AF_XDP netdev bypasses the Linux TC layer. A
possible workaround is to use the OpenFlow meter action.
#. Most of the tests were done using a single i40e port. Multiple ports and
the ixgbe driver also need to be tested.
#. No latency test results yet (TODO item).
#. Due to limitations of the current upstream kernel, TCP and various offloads
(vlan, cvlan) do not work over virtual interfaces (i.e. veth pairs).
PVP using tap device
--------------------
Assume you have enp2s0 as the physical NIC and a tap device connected to a VM.
First, start OVS, then add physical port::
ethtool -L enp2s0 combined 1
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
options:n_rxq=1 options:xdpmode=drv \
other_config:pmd-rxq-affinity="0:4"
Start a VM with virtio and tap device::
qemu-system-x86_64 -hda ubuntu1810.qcow \
-m 4096 \
-cpu host,+x2apic -enable-kvm \
-device virtio-net-pci,mac=00:02:00:00:00:01,netdev=net0,mq=on,\
vectors=10,mrg_rxbuf=on,rx_queue_size=1024 \
-netdev type=tap,id=net0,vhost=on,queues=8 \
-object memory-backend-file,id=mem,size=4096M,\
mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem -mem-prealloc -smp 2
Create OpenFlow rules::
ovs-vsctl add-port br0 tap0
ovs-ofctl del-flows br0
ovs-ofctl add-flow br0 "in_port=enp2s0, actions=output:tap0"
ovs-ofctl add-flow br0 "in_port=tap0, actions=output:enp2s0"
Inside the VM, use xdp_rxq_info to bounce back the traffic::
./xdp_rxq_info --dev ens3 --action XDP_TX
PVP using vhostuser device
--------------------------
First, build OVS with DPDK and AF_XDP::
./configure --enable-afxdp --with-dpdk=<dpdk path>
make -j4 && make install
Create a vhost-user port from OVS::
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev \
other_config:pmd-cpu-mask=0xfff
ovs-vsctl add-port br0 vhost-user-1 \
-- set Interface vhost-user-1 type=dpdkvhostuser
Start VM using vhost-user mode::
qemu-system-x86_64 -hda ubuntu1810.qcow \
-m 4096 \
-cpu host,+x2apic -enable-kvm \
-chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1 \
-netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce,queues=4 \
-device virtio-net-pci,mac=00:00:00:00:00:01,\
netdev=mynet1,mq=on,vectors=10 \
-object memory-backend-file,id=mem,size=4096M,\
mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem -mem-prealloc -smp 2
Set up the OpenFlow rules::
ovs-ofctl del-flows br0
ovs-ofctl add-flow br0 "in_port=enp2s0, actions=output:vhost-user-1"
ovs-ofctl add-flow br0 "in_port=vhost-user-1, actions=output:enp2s0"
Inside the VM, use xdp_rxq_info to drop or bounce back the traffic::
./xdp_rxq_info --dev ens3 --action XDP_DROP
./xdp_rxq_info --dev ens3 --action XDP_TX
PCP container using veth
------------------------
Create namespace and veth peer devices::
ip netns add at_ns0
ip link add p0 type veth peer name afxdp-p0
ip link set p0 netns at_ns0
ip link set dev afxdp-p0 up
ip netns exec at_ns0 ip link set dev p0 up
Attach the veth port to br0 (linux kernel mode)::
ovs-vsctl add-port br0 afxdp-p0 -- \
set interface afxdp-p0 options:n_rxq=1
Or, use AF_XDP with skb mode::
ovs-vsctl add-port br0 afxdp-p0 -- \
set interface afxdp-p0 type="afxdp" options:n_rxq=1 options:xdpmode=skb
Set up the OpenFlow rules::
ovs-ofctl del-flows br0
ovs-ofctl add-flow br0 "in_port=enp2s0, actions=output:afxdp-p0"
ovs-ofctl add-flow br0 "in_port=afxdp-p0, actions=output:enp2s0"
In the namespace, drop or bounce back the packets::
ip netns exec at_ns0 ./xdp_rxq_info --dev p0 --action XDP_DROP
ip netns exec at_ns0 ./xdp_rxq_info --dev p0 --action XDP_TX
Bug Reporting
-------------
Please report problems to dev@openvswitch.org.


@@ -45,6 +45,7 @@ Installation from Source
xenserver
userspace
dpdk
afxdp
Installation from Packages
--------------------------

1
NEWS

@@ -39,6 +39,7 @@ Post-v2.11.0
the lookup implementation at runtime. This enables specialization of
specific subtables based on the miniflow attributes, enhancing the
performance of the subtable search.
* Add Linux AF_XDP support through a new experimental netdev type "afxdp".
- OVSDB:
* OVSDB clients can now resynchronize with clustered servers much more
quickly after a brief disconnection, saving bandwidth and CPU time.


@@ -238,6 +238,41 @@ AC_DEFUN([OVS_FIND_DEPENDENCY], [
])
])
dnl OVS_CHECK_LINUX_AF_XDP
dnl
dnl Check both Linux kernel AF_XDP and libbpf support
AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
AC_ARG_ENABLE([afxdp],
[AC_HELP_STRING([--enable-afxdp], [Enable AF-XDP support])],
[], [enable_afxdp=no])
AC_MSG_CHECKING([whether AF_XDP is enabled])
if test "$enable_afxdp" != yes; then
AC_MSG_RESULT([no])
AF_XDP_ENABLE=false
else
AC_MSG_RESULT([yes])
AF_XDP_ENABLE=true
AC_CHECK_HEADER([bpf/libbpf.h], [],
[AC_MSG_ERROR([unable to find bpf/libbpf.h for AF_XDP support])])
AC_CHECK_HEADER([linux/if_xdp.h], [],
[AC_MSG_ERROR([unable to find linux/if_xdp.h for AF_XDP support])])
AC_CHECK_HEADER([bpf/xsk.h], [],
[AC_MSG_ERROR([unable to find bpf/xsk.h for AF_XDP support])])
AC_CHECK_FUNCS([pthread_spin_lock], [],
[AC_MSG_ERROR([unable to find pthread_spin_lock for AF_XDP support])])
AC_DEFINE([HAVE_AF_XDP], [1],
[Define to 1 if AF_XDP support is available and enabled.])
LIBBPF_LDADD=" -lbpf -lelf"
AC_SUBST([LIBBPF_LDADD])
fi
AM_CONDITIONAL([HAVE_AF_XDP], test "$AF_XDP_ENABLE" = true)
])
dnl OVS_CHECK_DPDK
dnl
dnl Configure DPDK source tree


@@ -100,6 +100,7 @@ OVS_CHECK_SPHINX
OVS_CHECK_DOT
OVS_CHECK_IF_DL
OVS_CHECK_STRTOK_R
OVS_CHECK_LINUX_AF_XDP
AC_CHECK_DECLS([sys_siglist], [], [], [[#include <signal.h>]])
AC_CHECK_MEMBERS([struct stat.st_mtim.tv_nsec, struct stat.st_mtimensec],
[], [], [[#include <sys/stat.h>]])


@@ -9,6 +9,7 @@ lib_LTLIBRARIES += lib/libopenvswitch.la
lib_libopenvswitch_la_LIBADD = $(SSL_LIBS)
lib_libopenvswitch_la_LIBADD += $(CAPNG_LDADD)
lib_libopenvswitch_la_LIBADD += $(LIBBPF_LDADD)
if WIN32
lib_libopenvswitch_la_LIBADD += ${PTHREAD_LIBS}
@@ -396,6 +397,7 @@ lib_libopenvswitch_la_SOURCES += \
lib/if-notifier.h \
lib/netdev-linux.c \
lib/netdev-linux.h \
lib/netdev-linux-private.h \
lib/netdev-offload-tc.c \
lib/netlink-conntrack.c \
lib/netlink-conntrack.h \
@@ -412,6 +414,14 @@ lib_libopenvswitch_la_SOURCES += \
lib/tc.h
endif
if HAVE_AF_XDP
lib_libopenvswitch_la_SOURCES += \
lib/netdev-afxdp-pool.c \
lib/netdev-afxdp-pool.h \
lib/netdev-afxdp.c \
lib/netdev-afxdp.h
endif
if DPDK_NETDEV
lib_libopenvswitch_la_SOURCES += \
lib/dpdk.c \


@@ -19,6 +19,7 @@
#include <string.h>
#include "dp-packet.h"
#include "netdev-afxdp.h"
#include "netdev-dpdk.h"
#include "openvswitch/dynamic-string.h"
#include "util.h"
@@ -59,6 +60,22 @@ dp_packet_use(struct dp_packet *b, void *base, size_t allocated)
dp_packet_use__(b, base, allocated, DPBUF_MALLOC);
}
#if HAVE_AF_XDP
/* Initialize 'b' as an empty dp_packet that contains
* memory starting at AF_XDP umem base.
*/
void
dp_packet_use_afxdp(struct dp_packet *b, void *data, size_t allocated,
size_t headroom)
{
dp_packet_set_base(b, (char *)data - headroom);
dp_packet_set_data(b, data);
dp_packet_set_size(b, 0);
dp_packet_init__(b, allocated, DPBUF_AFXDP);
}
#endif
/* Initializes 'b' as an empty dp_packet that contains the 'allocated' bytes of
* memory starting at 'base'. 'base' should point to a buffer on the stack.
* (Nothing actually relies on 'base' being allocated on the stack. It could
@@ -122,6 +139,8 @@ dp_packet_uninit(struct dp_packet *b)
* created as a dp_packet */
free_dpdk_buf((struct dp_packet*) b);
#endif
} else if (b->source == DPBUF_AFXDP) {
free_afxdp_buf(b);
}
}
}
@@ -248,6 +267,9 @@ dp_packet_resize__(struct dp_packet *b, size_t new_headroom, size_t new_tailroom
case DPBUF_STACK:
OVS_NOT_REACHED();
case DPBUF_AFXDP:
OVS_NOT_REACHED();
case DPBUF_STUB:
b->source = DPBUF_MALLOC;
new_base = xmalloc(new_allocated);
@@ -433,6 +455,7 @@ dp_packet_steal_data(struct dp_packet *b)
{
void *p;
ovs_assert(b->source != DPBUF_DPDK);
ovs_assert(b->source != DPBUF_AFXDP);
if (b->source == DPBUF_MALLOC && dp_packet_data(b) == dp_packet_base(b)) {
p = dp_packet_data(b);


@@ -25,6 +25,7 @@
#include <rte_mbuf.h>
#endif
#include "netdev-afxdp.h"
#include "netdev-dpdk.h"
#include "openvswitch/list.h"
#include "packets.h"
@@ -42,6 +43,7 @@ enum OVS_PACKED_ENUM dp_packet_source {
DPBUF_DPDK, /* buffer data is from DPDK allocated memory.
* ref to dp_packet_init_dpdk() in dp-packet.c.
*/
DPBUF_AFXDP, /* Buffer data from XDP frame. */
};
#define DP_PACKET_CONTEXT_SIZE 64
@@ -89,6 +91,13 @@ struct dp_packet {
};
};
#if HAVE_AF_XDP
struct dp_packet_afxdp {
struct umem_pool *mpool;
struct dp_packet packet;
};
#endif
static inline void *dp_packet_data(const struct dp_packet *);
static inline void dp_packet_set_data(struct dp_packet *, void *);
static inline void *dp_packet_base(const struct dp_packet *);
@@ -122,7 +131,9 @@ static inline const void *dp_packet_get_nd_payload(const struct dp_packet *);
void dp_packet_use(struct dp_packet *, void *, size_t);
void dp_packet_use_stub(struct dp_packet *, void *, size_t);
void dp_packet_use_const(struct dp_packet *, const void *, size_t);
#if HAVE_AF_XDP
void dp_packet_use_afxdp(struct dp_packet *, void *, size_t, size_t);
#endif
void dp_packet_init_dpdk(struct dp_packet *);
void dp_packet_init(struct dp_packet *, size_t);
@@ -184,6 +195,11 @@ dp_packet_delete(struct dp_packet *b)
return;
}
if (b->source == DPBUF_AFXDP) {
free_afxdp_buf(b);
return;
}
dp_packet_uninit(b);
free(b);
}
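struct dp_packet_afxdp above embeds a struct dp_packet next to a pointer to its owning pool. A common way to get from the embedded member back to the wrapper is offset subtraction, which OVS wraps in CONTAINER_OF. The sketch below shows the idea with hypothetical, simplified types. How the patch performs this cast is decided in lib/netdev-afxdp.c, whose diff is not shown here, so treat the connection as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct dp_packet and
 * struct dp_packet_afxdp. */
struct packet {
    void *base;
};

struct packet_wrapper {
    void *pool;              /* Owner pool, like 'mpool' in the patch. */
    struct packet packet;    /* Embedded packet handed to generic code. */
};

/* Recovers the wrapper from a pointer to its embedded 'packet' member by
 * subtracting the member's offset within the wrapper. */
struct packet_wrapper *
packet_wrapper_cast(struct packet *p)
{
    assert(p != NULL);
    return (struct packet_wrapper *)
        ((char *) p - offsetof(struct packet_wrapper, packet));
}
```

This only works for packets that really are embedded in a wrapper, which is presumably why the source tag DPBUF_AFXDP is checked before such buffers are released.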


@@ -21,6 +21,7 @@
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#include <math.h>
#ifdef DPDK_NETDEV
@@ -186,6 +187,22 @@ struct pmd_perf_stats {
char *log_reason;
};
#ifdef __linux__
static inline uint64_t
rdtsc_syscall(struct pmd_perf_stats *s)
{
struct timespec val;
uint64_t v;
if (clock_gettime(CLOCK_MONOTONIC_RAW, &val) != 0) {
return s->last_tsc;
}
v = val.tv_sec * UINT64_C(1000000000) + val.tv_nsec;
return s->last_tsc = v;
}
#endif
/* Support for accurate timing of PMD execution on TSC clock cycle level.
* These functions are intended to be invoked in the context of pmd threads. */
@@ -198,6 +215,13 @@ cycles_counter_update(struct pmd_perf_stats *s)
{
#ifdef DPDK_NETDEV
return s->last_tsc = rte_get_tsc_cycles();
#elif !defined(_MSC_VER) && defined(__x86_64__)
uint32_t h, l;
asm volatile("rdtsc" : "=a" (l), "=d" (h));
return s->last_tsc = ((uint64_t) h << 32) | l;
#elif defined(__linux__)
return rdtsc_syscall(s);
#else
return s->last_tsc = 0;
#endif
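rdtsc_syscall() above falls back to clock_gettime() when no cheaper cycle counter is available, so the returned value is in nanoseconds rather than TSC cycles, and the last cached value is reused if the clock cannot be read. Below is a standalone sketch of the same pattern. struct cycle_counter is a hypothetical stand-in for struct pmd_perf_stats, and CLOCK_MONOTONIC is substituted for CLOCK_MONOTONIC_RAW purely for portability.

```c
#define _POSIX_C_SOURCE 200809L   /* For clock_gettime() under strict -std. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the 'last_tsc' field of struct pmd_perf_stats. */
struct cycle_counter {
    uint64_t last;
};

/* Returns a monotonic timestamp in nanoseconds and caches it in 'c'.  If the
 * clock cannot be read, the previously cached value is returned unchanged. */
uint64_t
cycle_counter_update_ns(struct cycle_counter *c)
{
    struct timespec ts;

    assert(c != NULL);
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
        return c->last;
    }
    c->last = (uint64_t) ts.tv_sec * UINT64_C(1000000000)
              + (uint64_t) ts.tv_nsec;
    return c->last;
}
```

Because this path costs a clock_gettime() call per update, it is a fallback rather than a replacement for reading the TSC directly.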

167
lib/netdev-afxdp-pool.c Normal file

@@ -0,0 +1,167 @@
/*
* Copyright (c) 2018, 2019 Nicira, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at:
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <config.h>
#include "dp-packet.h"
#include "netdev-afxdp-pool.h"
#include "openvswitch/util.h"
/* Note:
* umem_elem_push* shouldn't overflow because we always pop
* elem first, then push back to the stack.
*/
static inline void
umem_elem_push_n__(struct umem_pool *umemp, int n, void **addrs)
{
void *ptr;
ovs_assert(umemp->index + n <= umemp->size);
ptr = &umemp->array[umemp->index];
memcpy(ptr, addrs, n * sizeof(void *));
umemp->index += n;
}
void umem_elem_push_n(struct umem_pool *umemp, int n, void **addrs)
{
ovs_spin_lock(&umemp->lock);
umem_elem_push_n__(umemp, n, addrs);
ovs_spin_unlock(&umemp->lock);
}
static inline void
umem_elem_push__(struct umem_pool *umemp, void *addr)
{
ovs_assert(umemp->index + 1 <= umemp->size);
umemp->array[umemp->index++] = addr;
}
void
umem_elem_push(struct umem_pool *umemp, void *addr)
{
ovs_spin_lock(&umemp->lock);
umem_elem_push__(umemp, addr);
ovs_spin_unlock(&umemp->lock);
}
static inline int
umem_elem_pop_n__(struct umem_pool *umemp, int n, void **addrs)
{
void *ptr;
if (OVS_UNLIKELY(umemp->index - n < 0)) {
return -ENOMEM;
}
umemp->index -= n;
ptr = &umemp->array[umemp->index];
memcpy(addrs, ptr, n * sizeof(void *));
return 0;
}
int
umem_elem_pop_n(struct umem_pool *umemp, int n, void **addrs)
{
int ret;
ovs_spin_lock(&umemp->lock);
ret = umem_elem_pop_n__(umemp, n, addrs);
ovs_spin_unlock(&umemp->lock);
return ret;
}
static inline void *
umem_elem_pop__(struct umem_pool *umemp)
{
if (OVS_UNLIKELY(umemp->index - 1 < 0)) {
return NULL;
}
return umemp->array[--umemp->index];
}
void *
umem_elem_pop(struct umem_pool *umemp)
{
void *ptr;
ovs_spin_lock(&umemp->lock);
ptr = umem_elem_pop__(umemp);
ovs_spin_unlock(&umemp->lock);
return ptr;
}
static void **
umem_pool_alloc__(unsigned int size)
{
void **bufs;
bufs = xmalloc_pagealign(size * sizeof *bufs);
memset(bufs, 0, size * sizeof *bufs);
return bufs;
}
int
umem_pool_init(struct umem_pool *umemp, unsigned int size)
{
umemp->array = umem_pool_alloc__(size);
if (!umemp->array) {
return -ENOMEM;
}
umemp->size = size;
umemp->index = 0;
ovs_spin_init(&umemp->lock);
return 0;
}
void
umem_pool_cleanup(struct umem_pool *umemp)
{
free_pagealign(umemp->array);
umemp->array = NULL;
ovs_spin_destroy(&umemp->lock);
}
unsigned int
umem_pool_count(struct umem_pool *umemp)
{
return umemp->index;
}
/* AF_XDP metadata init/destroy. */
int
xpacket_pool_init(struct xpacket_pool *xp, unsigned int size)
{
xp->array = xmalloc_pagealign(size * sizeof *xp->array);
xp->size = size;
memset(xp->array, 0, size * sizeof *xp->array);
return 0;
}
void
xpacket_pool_cleanup(struct xpacket_pool *xp)
{
free_pagealign(xp->array);
xp->array = NULL;
}
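The umem_pool above is a LIFO stack of buffer addresses guarded by a spin lock: pop hands out the most recently returned frame, and push gives it back. The following self-contained sketch shows the same structure under stated assumptions. struct ptr_stack and its functions are hypothetical names, a C11 atomic_flag stands in for ovs_spin, and a full stack is reported as an error instead of triggering an assertion.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct umem_pool. */
struct ptr_stack {
    atomic_flag lock;
    size_t top;        /* Number of stored pointers. */
    size_t capacity;
    void **slots;
};

static void
ptr_stack_lock(struct ptr_stack *s)
{
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire)) {
        /* Busy-wait until the holder clears the flag. */
    }
}

static void
ptr_stack_unlock(struct ptr_stack *s)
{
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Returns 0 on success, -1 if the slot array cannot be allocated. */
int
ptr_stack_init(struct ptr_stack *s, size_t capacity)
{
    assert(capacity > 0);
    s->slots = calloc(capacity, sizeof *s->slots);
    if (!s->slots) {
        return -1;
    }
    atomic_flag_clear(&s->lock);
    s->top = 0;
    s->capacity = capacity;
    return 0;
}

/* Returns 0 on success, -1 if the stack is already full. */
int
ptr_stack_push(struct ptr_stack *s, void *addr)
{
    int ret = -1;

    ptr_stack_lock(s);
    if (s->top < s->capacity) {
        s->slots[s->top++] = addr;
        ret = 0;
    }
    ptr_stack_unlock(s);
    return ret;
}

/* Returns the most recently pushed pointer, or NULL if the stack is empty. */
void *
ptr_stack_pop(struct ptr_stack *s)
{
    void *addr = NULL;

    ptr_stack_lock(s);
    if (s->top > 0) {
        addr = s->slots[--s->top];
    }
    ptr_stack_unlock(s);
    return addr;
}

void
ptr_stack_destroy(struct ptr_stack *s)
{
    free(s->slots);
    s->slots = NULL;
}
```

A LIFO order tends to hand back recently used frames, which are more likely to still be in cache; that is a general property of the design rather than a claim measured in this patch.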

56
lib/netdev-afxdp-pool.h Normal file

@@ -0,0 +1,56 @@
/*
* Copyright (c) 2018, 2019 Nicira, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at:
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef XDPSOCK_H
#define XDPSOCK_H 1
#ifdef HAVE_AF_XDP
#include <bpf/xsk.h>
#include <errno.h>
#include <stdbool.h>
#include "openvswitch/thread.h"
#include "ovs-atomic.h"
/* LIFO ptr_array. */
struct umem_pool {
int index; /* Point to top. */
unsigned int size;
struct ovs_spin lock;
void **array; /* A pointer array pointing to umem buf. */
};
/* Array-based dp_packet_afxdp. */
struct xpacket_pool {
unsigned int size;
struct dp_packet_afxdp *array;
};
void umem_elem_push(struct umem_pool *umemp, void *addr);
void umem_elem_push_n(struct umem_pool *umemp, int n, void **addrs);
void *umem_elem_pop(struct umem_pool *umemp);
int umem_elem_pop_n(struct umem_pool *umemp, int n, void **addrs);
int umem_pool_init(struct umem_pool *umemp, unsigned int size);
void umem_pool_cleanup(struct umem_pool *umemp);
unsigned int umem_pool_count(struct umem_pool *umemp);
int xpacket_pool_init(struct xpacket_pool *xp, unsigned int size);
void xpacket_pool_cleanup(struct xpacket_pool *xp);
#endif
#endif

1041
lib/netdev-afxdp.c Normal file

File diff suppressed because it is too large

71
lib/netdev-afxdp.h Normal file

@@ -0,0 +1,71 @@
/*
* Copyright (c) 2018, 2019 Nicira, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at:
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef NETDEV_AFXDP_H
#define NETDEV_AFXDP_H 1
#ifdef HAVE_AF_XDP
#include <stdint.h>
#include <stdbool.h>
/* These functions are Linux AF_XDP specific, so they should be used directly
* only by Linux-specific code. */
struct netdev;
struct xsk_socket_info;
struct xdp_umem;
struct dp_packet_batch;
struct smap;
struct dp_packet;
struct netdev_rxq;
struct netdev_stats;
int netdev_afxdp_rxq_construct(struct netdev_rxq *rxq_);
void netdev_afxdp_rxq_destruct(struct netdev_rxq *rxq_);
void netdev_afxdp_destruct(struct netdev *netdev_);
int netdev_afxdp_rxq_recv(struct netdev_rxq *rxq_,
struct dp_packet_batch *batch,
int *qfill);
int netdev_afxdp_batch_send(struct netdev *netdev_, int qid,
struct dp_packet_batch *batch,
bool concurrent_txq);
int netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
char **errp);
int netdev_afxdp_get_config(const struct netdev *netdev, struct smap *args);
int netdev_afxdp_get_numa_id(const struct netdev *netdev);
int netdev_afxdp_get_stats(const struct netdev *netdev_,
struct netdev_stats *stats);
void free_afxdp_buf(struct dp_packet *p);
int netdev_afxdp_reconfigure(struct netdev *netdev);
void signal_remove_xdp(struct netdev *netdev);
#else /* !HAVE_AF_XDP */
#include "openvswitch/compiler.h"
struct dp_packet;
static inline void
free_afxdp_buf(struct dp_packet *p OVS_UNUSED)
{
/* Nothing. */
}
#endif /* HAVE_AF_XDP */
#endif /* netdev-afxdp.h */
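The header above supplies an empty inline free_afxdp_buf() when HAVE_AF_XDP is not defined, which lets generic code such as dp_packet_uninit() call it without its own preprocessor guard. The sketch below illustrates that stub pattern in isolation. HAVE_FAST_PATH, fast_path_release, and buffer_delete are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* HAVE_FAST_PATH is a hypothetical feature macro, standing in for the role
 * HAVE_AF_XDP plays in the header above. */
#ifdef HAVE_FAST_PATH
bool fast_path_release(void *buf);   /* Real version is defined elsewhere. */
#else
/* Stub: callers compile unchanged, but nothing is released. */
static inline bool
fast_path_release(void *buf)
{
    (void) buf;
    return false;
}
#endif

/* Generic caller with no #ifdef of its own. */
bool
buffer_delete(void *buf, bool from_fast_path)
{
    assert(buf != NULL);
    return from_fast_path ? fast_path_release(buf) : false;
}
```

Keeping the conditional in one header means the call sites stay identical in both build configurations.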

130
lib/netdev-linux-private.h Normal file

@@ -0,0 +1,130 @@
/*
* Copyright (c) 2019 Nicira, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at:
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef NETDEV_LINUX_PRIVATE_H
#define NETDEV_LINUX_PRIVATE_H 1
#include <linux/filter.h>
#include <linux/gen_stats.h>
#include <linux/if_ether.h>
#include <linux/if_tun.h>
#include <linux/types.h>
#include <linux/ethtool.h>
#include <linux/mii.h>
#include <stdint.h>
#include <stdbool.h>
#include "netdev-afxdp.h"
#include "netdev-afxdp-pool.h"
#include "netdev-provider.h"
#include "netdev-vport.h"
#include "openvswitch/thread.h"
#include "ovs-atomic.h"
#include "timer.h"
struct netdev;
struct netdev_rxq_linux {
struct netdev_rxq up;
bool is_tap;
int fd;
};
void netdev_linux_run(const struct netdev_class *);
int get_stats_via_netlink(const struct netdev *netdev_,
struct netdev_stats *stats);
struct netdev_linux {
struct netdev up;
/* Protects all members below. */
struct ovs_mutex mutex;
unsigned int cache_valid;
bool miimon; /* Link status of last poll. */
long long int miimon_interval; /* Miimon Poll rate. Disabled if <= 0. */
struct timer miimon_timer;
int netnsid; /* Network namespace ID. */
/* The following are figured out "on demand" only. They are only valid
* when the corresponding VALID_* bit in 'cache_valid' is set. */
int ifindex;
struct eth_addr etheraddr;
int mtu;
unsigned int ifi_flags;
long long int carrier_resets;
uint32_t kbits_rate; /* Policing data. */
uint32_t kbits_burst;
int vport_stats_error; /* Cached error code from vport_get_stats().
0 or an errno value. */
int netdev_mtu_error; /* Cached error code from SIOCGIFMTU
* or SIOCSIFMTU.
*/
int ether_addr_error; /* Cached error code from set/get etheraddr. */
int netdev_policing_error; /* Cached error code from set policing. */
int get_features_error; /* Cached error code from ETHTOOL_GSET. */
int get_ifindex_error; /* Cached error code from SIOCGIFINDEX. */
enum netdev_features current; /* Cached from ETHTOOL_GSET. */
enum netdev_features advertised; /* Cached from ETHTOOL_GSET. */
enum netdev_features supported; /* Cached from ETHTOOL_GSET. */
struct ethtool_drvinfo drvinfo; /* Cached from ETHTOOL_GDRVINFO. */
struct tc *tc;
/* For devices of class netdev_tap_class only. */
int tap_fd;
bool present; /* If the device is present in the namespace */
uint64_t tx_dropped; /* tap device can drop if the iface is down */
/* LAG information. */
bool is_lag_master; /* True if the netdev is a LAG master. */
#ifdef HAVE_AF_XDP
/* AF_XDP information. */
struct xsk_socket_info **xsks;
int requested_n_rxq;
int xdpmode; /* AF_XDP running mode: driver or skb. */
int requested_xdpmode;
struct ovs_spin *tx_locks; /* spin lock array for TX queues. */
#endif
};
static bool
is_netdev_linux_class(const struct netdev_class *netdev_class)
{
return netdev_class->run == netdev_linux_run;
}
static struct netdev_linux *
netdev_linux_cast(const struct netdev *netdev)
{
ovs_assert(is_netdev_linux_class(netdev_get_class(netdev)));
return CONTAINER_OF(netdev, struct netdev_linux, up);
}
static struct netdev_rxq_linux *
netdev_rxq_linux_cast(const struct netdev_rxq *rx)
{
ovs_assert(is_netdev_linux_class(netdev_get_class(rx->netdev)));
return CONTAINER_OF(rx, struct netdev_rxq_linux, up);
}
#endif /* netdev-linux-private.h */
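is_netdev_linux_class() above identifies a netdev class by comparing its run callback with netdev_linux_run, which suggests that any class reusing that callback is accepted by netdev_linux_cast(). The sketch below shows this identity-by-callback check with hypothetical, minimal types. Whether the afxdp class shares the callback is defined in lib/netdev-linux.c, which is only partly shown here, so that part is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal device class: only the 'run' callback matters. */
struct dev_class {
    const char *type;
    void (*run)(const struct dev_class *);
};

static int other_run_calls;   /* Keeps the two callbacks distinct. */

void
linux_family_run(const struct dev_class *cls)
{
    (void) cls;   /* Nothing to do in this sketch. */
}

void
other_run(const struct dev_class *cls)
{
    (void) cls;
    other_run_calls++;
}

/* Every class that shares linux_family_run() is treated as one family. */
bool
is_linux_family(const struct dev_class *cls)
{
    assert(cls != NULL);
    return cls->run == linux_family_run;
}
```

With this check, two classes named "system" and "afxdp" that both point at linux_family_run() pass, while a class using other_run() does not.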


@@ -17,6 +17,7 @@
#include <config.h>
#include "netdev-linux.h"
#include "netdev-linux-private.h"
#include <errno.h>
#include <fcntl.h>
@@ -54,6 +55,7 @@
#include "fatal-signal.h"
#include "hash.h"
#include "openvswitch/hmap.h"
#include "netdev-afxdp.h"
#include "netdev-provider.h"
#include "netdev-vport.h"
#include "netlink-notifier.h"
@@ -486,57 +488,6 @@ static int tc_calc_cell_log(unsigned int mtu);
static void tc_fill_rate(struct tc_ratespec *rate, uint64_t bps, int mtu);
static int tc_calc_buffer(unsigned int Bps, int mtu, uint64_t burst_bytes);
struct netdev_linux {
struct netdev up;
/* Protects all members below. */
struct ovs_mutex mutex;
unsigned int cache_valid;
bool miimon; /* Link status of last poll. */
long long int miimon_interval; /* Miimon Poll rate. Disabled if <= 0. */
struct timer miimon_timer;
int netnsid; /* Network namespace ID. */
/* The following are figured out "on demand" only. They are only valid
* when the corresponding VALID_* bit in 'cache_valid' is set. */
int ifindex;
struct eth_addr etheraddr;
int mtu;
unsigned int ifi_flags;
long long int carrier_resets;
uint32_t kbits_rate; /* Policing data. */
uint32_t kbits_burst;
int vport_stats_error; /* Cached error code from vport_get_stats().
0 or an errno value. */
int netdev_mtu_error; /* Cached error code from SIOCGIFMTU or SIOCSIFMTU. */
int ether_addr_error; /* Cached error code from set/get etheraddr. */
int netdev_policing_error; /* Cached error code from set policing. */
int get_features_error; /* Cached error code from ETHTOOL_GSET. */
int get_ifindex_error; /* Cached error code from SIOCGIFINDEX. */
enum netdev_features current; /* Cached from ETHTOOL_GSET. */
enum netdev_features advertised; /* Cached from ETHTOOL_GSET. */
enum netdev_features supported; /* Cached from ETHTOOL_GSET. */
struct ethtool_drvinfo drvinfo; /* Cached from ETHTOOL_GDRVINFO. */
struct tc *tc;
/* For devices of class netdev_tap_class only. */
int tap_fd;
bool present; /* If the device is present in the namespace */
uint64_t tx_dropped; /* tap device can drop if the iface is down */
/* LAG information. */
bool is_lag_master; /* True if the netdev is a LAG master. */
};
struct netdev_rxq_linux {
struct netdev_rxq up;
bool is_tap;
int fd;
};
/* This is set pretty low because we probably won't learn anything from the
* additional log messages. */
@@ -550,8 +501,6 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
* changes in the device miimon status, so we can use atomic_count. */
static atomic_count miimon_cnt = ATOMIC_COUNT_INIT(0);
static void netdev_linux_run(const struct netdev_class *);
static int netdev_linux_do_ethtool(const char *name, struct ethtool_cmd *,
int cmd, const char *cmd_name);
static int get_flags(const struct netdev *, unsigned int *flags);
@@ -565,39 +514,17 @@ static int do_set_addr(struct netdev *netdev,
struct in_addr addr);
static int get_etheraddr(const char *netdev_name, struct eth_addr *ea);
static int set_etheraddr(const char *netdev_name, const struct eth_addr);
static int get_stats_via_netlink(const struct netdev *, struct netdev_stats *);
static int af_packet_sock(void);
static bool netdev_linux_miimon_enabled(void);
static void netdev_linux_miimon_run(void);
static void netdev_linux_miimon_wait(void);
static int netdev_linux_get_mtu__(struct netdev_linux *netdev, int *mtup);
static bool
is_netdev_linux_class(const struct netdev_class *netdev_class)
{
return netdev_class->run == netdev_linux_run;
}
static bool
is_tap_netdev(const struct netdev *netdev)
{
return netdev_get_class(netdev) == &netdev_tap_class;
}
static struct netdev_linux *
netdev_linux_cast(const struct netdev *netdev)
{
ovs_assert(is_netdev_linux_class(netdev_get_class(netdev)));
return CONTAINER_OF(netdev, struct netdev_linux, up);
}
static struct netdev_rxq_linux *
netdev_rxq_linux_cast(const struct netdev_rxq *rx)
{
ovs_assert(is_netdev_linux_class(netdev_get_class(rx->netdev)));
return CONTAINER_OF(rx, struct netdev_rxq_linux, up);
}
static int
netdev_linux_netnsid_update__(struct netdev_linux *netdev)
@@ -773,7 +700,7 @@ netdev_linux_update_lag(struct rtnetlink_change *change)
}
}
static void
void
netdev_linux_run(const struct netdev_class *netdev_class OVS_UNUSED)
{
struct nl_sock *sock;
@@ -3278,9 +3205,7 @@ exit:
.run = netdev_linux_run, \
.wait = netdev_linux_wait, \
.alloc = netdev_linux_alloc, \
.destruct = netdev_linux_destruct, \
.dealloc = netdev_linux_dealloc, \
.send = netdev_linux_send, \
.send_wait = netdev_linux_send_wait, \
.set_etheraddr = netdev_linux_set_etheraddr, \
.get_etheraddr = netdev_linux_get_etheraddr, \
@@ -3311,39 +3236,74 @@ exit:
.arp_lookup = netdev_linux_arp_lookup, \
.update_flags = netdev_linux_update_flags, \
.rxq_alloc = netdev_linux_rxq_alloc, \
.rxq_construct = netdev_linux_rxq_construct, \
.rxq_destruct = netdev_linux_rxq_destruct, \
.rxq_dealloc = netdev_linux_rxq_dealloc, \
.rxq_recv = netdev_linux_rxq_recv, \
.rxq_wait = netdev_linux_rxq_wait, \
.rxq_drain = netdev_linux_rxq_drain
const struct netdev_class netdev_linux_class = {
NETDEV_LINUX_CLASS_COMMON,
.type = "system",
.is_pmd = false,
.construct = netdev_linux_construct,
.destruct = netdev_linux_destruct,
.get_stats = netdev_linux_get_stats,
.get_features = netdev_linux_get_features,
.get_status = netdev_linux_get_status,
.get_block_id = netdev_linux_get_block_id
.get_block_id = netdev_linux_get_block_id,
.send = netdev_linux_send,
.rxq_construct = netdev_linux_rxq_construct,
.rxq_destruct = netdev_linux_rxq_destruct,
.rxq_recv = netdev_linux_rxq_recv,
};
const struct netdev_class netdev_tap_class = {
NETDEV_LINUX_CLASS_COMMON,
.type = "tap",
.is_pmd = false,
.construct = netdev_linux_construct_tap,
.destruct = netdev_linux_destruct,
.get_stats = netdev_tap_get_stats,
.get_features = netdev_linux_get_features,
.get_status = netdev_linux_get_status,
.send = netdev_linux_send,
.rxq_construct = netdev_linux_rxq_construct,
.rxq_destruct = netdev_linux_rxq_destruct,
.rxq_recv = netdev_linux_rxq_recv,
};
const struct netdev_class netdev_internal_class = {
NETDEV_LINUX_CLASS_COMMON,
.type = "internal",
.is_pmd = false,
.construct = netdev_linux_construct,
.destruct = netdev_linux_destruct,
.get_stats = netdev_internal_get_stats,
.get_status = netdev_internal_get_status,
.send = netdev_linux_send,
.rxq_construct = netdev_linux_rxq_construct,
.rxq_destruct = netdev_linux_rxq_destruct,
.rxq_recv = netdev_linux_rxq_recv,
};
#ifdef HAVE_AF_XDP
const struct netdev_class netdev_afxdp_class = {
NETDEV_LINUX_CLASS_COMMON,
.type = "afxdp",
.is_pmd = true,
.construct = netdev_linux_construct,
.destruct = netdev_afxdp_destruct,
.get_stats = netdev_afxdp_get_stats,
.get_status = netdev_linux_get_status,
.set_config = netdev_afxdp_set_config,
.get_config = netdev_afxdp_get_config,
.reconfigure = netdev_afxdp_reconfigure,
.get_numa_id = netdev_afxdp_get_numa_id,
.send = netdev_afxdp_batch_send,
.rxq_construct = netdev_afxdp_rxq_construct,
.rxq_destruct = netdev_afxdp_rxq_destruct,
.rxq_recv = netdev_afxdp_rxq_recv,
};
#endif
#define CODEL_N_QUEUES 0x0000
@@ -5915,7 +5875,7 @@ netdev_stats_from_rtnl_link_stats64(struct netdev_stats *dst,
dst->tx_window_errors = src->tx_window_errors;
}
static int
int
get_stats_via_netlink(const struct netdev *netdev_, struct netdev_stats *stats)
{
struct ofpbuf request;
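
Note on the class tables above: the hunk moves `.send`, `.destruct` and the `.rxq_*` hooks out of the shared macro so that each class names its own, which lets the AF_XDP class substitute different implementations. Below is a small sketch of that pattern using designated initializers. The struct, the macro `PORT_CLASS_COMMON`, and the functions are hypothetical and only illustrate the shape, not the real `struct netdev_class`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much-reduced provider table. */
struct port_class {
    const char *type;
    bool is_pmd;
    int (*wait)(void);
    int (*send)(int n_pkts);
};

static int common_wait(void) { return 0; }
static int kernel_send(int n_pkts) { return n_pkts; }
static int batch_send(int n_pkts) { return 2 * n_pkts; }  /* placeholder */

/* Hooks shared by every class; per-class hooks are deliberately left out. */
#define PORT_CLASS_COMMON \
    .wait = common_wait

static const struct port_class system_like_class = {
    PORT_CLASS_COMMON,
    .type = "system",
    .is_pmd = false,
    .send = kernel_send,
};

static const struct port_class afxdp_like_class = {
    PORT_CLASS_COMMON,
    .type = "afxdp",
    .is_pmd = true,
    .send = batch_send,
};

/* Dispatches through the table, as a generic netdev layer would. */
static int port_send(const struct port_class *class_, int n_pkts)
{
    assert(class_->send != NULL);
    return class_->send(n_pkts);
}
```

Keeping per-class hooks out of the macro avoids initializing the same member twice in one initializer, which compilers may warn about.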


@@ -832,6 +832,9 @@ extern const struct netdev_class netdev_linux_class;
extern const struct netdev_class netdev_internal_class;
extern const struct netdev_class netdev_tap_class;
#ifdef HAVE_AF_XDP
extern const struct netdev_class netdev_afxdp_class;
#endif
#ifdef __cplusplus
}
#endif


@@ -103,6 +103,9 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
static void restore_all_flags(void *aux OVS_UNUSED);
void update_device_args(struct netdev *, const struct shash *args);
#ifdef HAVE_AF_XDP
void signal_remove_xdp(struct netdev *netdev);
#endif
int
netdev_n_txq(const struct netdev *netdev)
@@ -147,6 +150,9 @@ netdev_initialize(void)
netdev_vport_tunnel_register();
netdev_register_flow_api_provider(&netdev_offload_tc);
#ifdef HAVE_AF_XDP
netdev_register_provider(&netdev_afxdp_class);
#endif
#endif
#if defined(__FreeBSD__) || defined(__NetBSD__)
netdev_register_provider(&netdev_tap_class);
@@ -2021,6 +2027,11 @@ restore_all_flags(void *aux OVS_UNUSED)
saved_flags & ~saved_values,
&old_flags);
}
#ifdef HAVE_AF_XDP
if (netdev->netdev_class == &netdev_afxdp_class) {
signal_remove_xdp(netdev);
}
#endif
}
}


@@ -214,20 +214,19 @@ x2nrealloc(void *p, size_t *n, size_t s)
return xrealloc(p, *n * s);
}
/* Allocates and returns 'size' bytes of memory aligned to a cache line and in
* dedicated cache lines. That is, the memory block returned will not share a
* cache line with other data, avoiding "false sharing".
/* Allocates and returns 'size' bytes of memory aligned to 'alignment' bytes.
* 'alignment' must be a power of two and a multiple of sizeof(void *).
*
* Use free_cacheline() to free the returned memory block. */
* Use free_size_align() to free the returned memory block. */
void *
xmalloc_cacheline(size_t size)
xmalloc_size_align(size_t size, size_t alignment)
{
#ifdef HAVE_POSIX_MEMALIGN
void *p;
int error;
COVERAGE_INC(util_xalloc);
error = posix_memalign(&p, CACHE_LINE_SIZE, size ? size : 1);
error = posix_memalign(&p, alignment, size ? size : 1);
if (error != 0) {
out_of_memory();
}
@@ -235,16 +234,16 @@ xmalloc_cacheline(size_t size)
#else
/* Allocate room for:
*
* - Header padding: Up to CACHE_LINE_SIZE - 1 bytes, to allow the
* pointer to be aligned exactly sizeof(void *) bytes before the
* beginning of a cache line.
* - Header padding: Up to alignment - 1 bytes, to allow the
* pointer 'q' to be aligned exactly sizeof(void *) bytes before the
* beginning of the alignment.
*
* - Pointer: A pointer to the start of the header padding, to allow us
* to free() the block later.
*
* - User data: 'size' bytes.
*
* - Trailer padding: Enough to bring the user data up to a cache line
* - Trailer padding: Enough to bring the user data up to an alignment
* multiple.
*
* +---------------+---------+------------------------+---------+
@@ -255,18 +254,55 @@ xmalloc_cacheline(size_t size)
* p q r
*
*/
void *p = xmalloc((CACHE_LINE_SIZE - 1)
+ sizeof(void *)
+ ROUND_UP(size, CACHE_LINE_SIZE));
bool runt = PAD_SIZE((uintptr_t) p, CACHE_LINE_SIZE) < sizeof(void *);
void *r = (void *) ROUND_UP((uintptr_t) p + (runt ? CACHE_LINE_SIZE : 0),
CACHE_LINE_SIZE);
void **q = (void **) r - 1;
void *p, *r, **q;
bool runt;
if (!IS_POW2(alignment) || (alignment % sizeof(void *) != 0)) {
ovs_abort(0, "Invalid alignment");
}
p = xmalloc((alignment - 1)
+ sizeof(void *)
+ ROUND_UP(size, alignment));
runt = PAD_SIZE((uintptr_t) p, alignment) < sizeof(void *);
/* When the padding size < sizeof(void *), we don't have enough room for
* pointer 'q'. As a result, we need to move 'r' to the next alignment
* boundary. So ROUND_UP when calling xmalloc above, and ROUND_UP again
* when calculating 'r' below.
*/
r = (void *) ROUND_UP((uintptr_t) p + (runt ? alignment : 0), alignment);
q = (void **) r - 1;
*q = p;
return r;
#endif
}
void
free_size_align(void *p)
{
#ifdef HAVE_POSIX_MEMALIGN
free(p);
#else
if (p) {
void **q = (void **) p - 1;
free(*q);
}
#endif
}
/* Allocates and returns 'size' bytes of memory aligned to a cache line and in
* dedicated cache lines. That is, the memory block returned will not share a
* cache line with other data, avoiding "false sharing".
*
* Use free_cacheline() to free the returned memory block. */
void *
xmalloc_cacheline(size_t size)
{
return xmalloc_size_align(size, CACHE_LINE_SIZE);
}
/* Like xmalloc_cacheline() but clears the allocated memory to all zero
* bytes. */
void *
@@ -282,14 +318,19 @@ xzalloc_cacheline(size_t size)
void
free_cacheline(void *p)
{
#ifdef HAVE_POSIX_MEMALIGN
free(p);
#else
if (p) {
void **q = (void **) p - 1;
free(*q);
}
#endif
free_size_align(p);
}
void *
xmalloc_pagealign(size_t size)
{
return xmalloc_size_align(size, get_page_size());
}
void
free_pagealign(void *p)
{
free_size_align(p);
}
char *


@@ -169,6 +169,11 @@ void ovs_strzcpy(char *dst, const char *src, size_t size);
int string_ends_with(const char *str, const char *suffix);
void *xmalloc_pagealign(size_t) MALLOC_LIKE;
void free_pagealign(void *);
void *xmalloc_size_align(size_t, size_t) MALLOC_LIKE;
void free_size_align(void *);
/* The C standards say that neither the 'dst' nor 'src' argument to
* memcpy() may be null, even if 'n' is zero. This wrapper tolerates
* the null case. */
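
Note on `xmalloc_size_align()` and `free_size_align()` declared above: the fallback path for platforms without `posix_memalign()` over-allocates, rounds up to the requested alignment, and stashes the original pointer in the slot just before the returned block. The sketch below restates that algorithm in standalone C so the pointer arithmetic can be checked in isolation. It uses plain `malloc()` and returns NULL on failure, which differs from the OVS helper's abort-on-failure behavior. The names `sketch_malloc_align`, `sketch_free_align` and `round_up` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Rounds 'x' up to the next multiple of 'align' (must be a power of two). */
static uintptr_t round_up(uintptr_t x, uintptr_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/* Returns 'size' bytes aligned to 'alignment', or NULL if malloc fails. */
static void *sketch_malloc_align(size_t size, size_t alignment)
{
    /* Same preconditions as the diff: power of two, multiple of a pointer. */
    assert(alignment && !(alignment & (alignment - 1)));
    assert(alignment % sizeof(void *) == 0);

    /* Header padding + room for the saved pointer + padded user data. */
    void *p = malloc((alignment - 1) + sizeof(void *)
                     + round_up(size ? size : 1, alignment));
    if (!p) {
        return NULL;
    }

    /* If fewer than sizeof(void *) bytes precede the first boundary, there
     * is no room for the saved pointer, so skip to the next boundary. */
    uintptr_t pad = round_up((uintptr_t) p, alignment) - (uintptr_t) p;
    bool runt = pad < sizeof(void *);
    void *r = (void *) round_up((uintptr_t) p + (runt ? alignment : 0),
                                alignment);

    void **q = (void **) r - 1;   /* Slot just before the user block. */
    *q = p;                       /* Remember what to pass to free(). */
    return r;
}

/* Frees a block returned by sketch_malloc_align(); NULL is a no-op. */
static void sketch_free_align(void *r)
{
    if (r) {
        void **q = (void **) r - 1;
        free(*q);
    }
}
```

The extra `sizeof(void *)` in the allocation size is what keeps the "runt" case within bounds: at most `sizeof(void *) - 1` bytes of padding plus one full alignment step are consumed before `r`.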

tests/.gitignore

@@ -13,6 +13,9 @@
/ovsdb-cluster-testsuite.dir/
/ovsdb-cluster-testsuite.log
/pki/
/system-afxdp-testsuite
/system-afxdp-testsuite.dir/
/system-afxdp-testsuite.log
/system-dpdk-testsuite
/system-dpdk-testsuite.dir/
/system-dpdk-testsuite.log


@@ -4,12 +4,14 @@ EXTRA_DIST += \
$(SYSTEM_TESTSUITE_AT) \
$(SYSTEM_KMOD_TESTSUITE_AT) \
$(SYSTEM_USERSPACE_TESTSUITE_AT) \
$(SYSTEM_AFXDP_TESTSUITE_AT) \
$(SYSTEM_OFFLOADS_TESTSUITE_AT) \
$(SYSTEM_DPDK_TESTSUITE_AT) \
$(OVSDB_CLUSTER_TESTSUITE_AT) \
$(TESTSUITE) \
$(SYSTEM_KMOD_TESTSUITE) \
$(SYSTEM_USERSPACE_TESTSUITE) \
$(SYSTEM_AFXDP_TESTSUITE) \
$(SYSTEM_OFFLOADS_TESTSUITE) \
$(SYSTEM_DPDK_TESTSUITE) \
$(OVSDB_CLUSTER_TESTSUITE) \
@@ -160,6 +162,11 @@ SYSTEM_USERSPACE_TESTSUITE_AT = \
tests/system-userspace-macros.at \
tests/system-userspace-packet-type-aware.at
SYSTEM_AFXDP_TESTSUITE_AT = \
tests/system-userspace-macros.at \
tests/system-afxdp-testsuite.at \
tests/system-afxdp-macros.at
SYSTEM_TESTSUITE_AT = \
tests/system-common-macros.at \
tests/system-ovn.at \
@@ -184,6 +191,7 @@ TESTSUITE = $(srcdir)/tests/testsuite
TESTSUITE_PATCH = $(srcdir)/tests/testsuite.patch
SYSTEM_KMOD_TESTSUITE = $(srcdir)/tests/system-kmod-testsuite
SYSTEM_USERSPACE_TESTSUITE = $(srcdir)/tests/system-userspace-testsuite
SYSTEM_AFXDP_TESTSUITE = $(srcdir)/tests/system-afxdp-testsuite
SYSTEM_OFFLOADS_TESTSUITE = $(srcdir)/tests/system-offloads-testsuite
SYSTEM_DPDK_TESTSUITE = $(srcdir)/tests/system-dpdk-testsuite
OVSDB_CLUSTER_TESTSUITE = $(srcdir)/tests/ovsdb-cluster-testsuite
@@ -317,6 +325,10 @@ check-system-userspace: all
set $(SHELL) '$(SYSTEM_USERSPACE_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)'; \
"$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
check-afxdp: all
set $(SHELL) '$(SYSTEM_AFXDP_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)' $(TESTSUITEFLAGS) -j1; \
"$$@" || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
check-offloads: all
set $(SHELL) '$(SYSTEM_OFFLOADS_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)'; \
"$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
@@ -354,6 +366,10 @@ $(SYSTEM_USERSPACE_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_USERSP
$(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
$(AM_V_at)mv $@.tmp $@
$(SYSTEM_AFXDP_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_AFXDP_TESTSUITE_AT) $(COMMON_MACROS_AT)
$(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
$(AM_V_at)mv $@.tmp $@
$(SYSTEM_OFFLOADS_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_OFFLOADS_TESTSUITE_AT) $(COMMON_MACROS_AT)
$(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
$(AM_V_at)mv $@.tmp $@


@@ -0,0 +1,39 @@
# Add port to ovs bridge by using afxdp mode.
# This will use generic XDP support in the veth driver.
m4_define([ADD_VETH],
[ AT_CHECK([ip link add $1 type veth peer name ovs-$1 || return 77])
CONFIGURE_VETH_OFFLOADS([$1])
AT_CHECK([ip link set $1 netns $2])
AT_CHECK([ip link set dev ovs-$1 up])
AT_CHECK([ovs-vsctl add-port $3 ovs-$1 -- \
set interface ovs-$1 external-ids:iface-id="$1" type="afxdp"])
NS_CHECK_EXEC([$2], [ip addr add $4 dev $1 $7])
NS_CHECK_EXEC([$2], [ip link set dev $1 up])
if test -n "$5"; then
NS_CHECK_EXEC([$2], [ip link set dev $1 address $5])
fi
if test -n "$6"; then
NS_CHECK_EXEC([$2], [ip route add default via $6])
fi
on_exit 'ip link del ovs-$1'
]
)
m4_define([OVS_CHECK_8021AD],
[AT_SKIP_IF([:])])
# CONFIGURE_VETH_OFFLOADS([VETH])
#
# Disable TX offloads and VLAN offloads for veths used in AF_XDP.
m4_define([CONFIGURE_VETH_OFFLOADS],
[AT_CHECK([ethtool -K $1 tx off], [0], [ignore], [ignore])
AT_CHECK([ethtool -K $1 txvlan off], [0], [ignore], [ignore])
]
)
# OVS_START_L7([namespace], [protocol])
#
# AF_XDP doesn't work with TCP over virtual interfaces for now.
#
m4_define([OVS_START_L7],
[AT_SKIP_IF([:])])


@@ -0,0 +1,26 @@
AT_INIT
AT_COPYRIGHT([Copyright (c) 2018, 2019 Nicira, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at:
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.])
m4_ifdef([AT_COLOR_TESTS], [AT_COLOR_TESTS])
m4_include([tests/ovs-macros.at])
m4_include([tests/ovsdb-macros.at])
m4_include([tests/ofproto-macros.at])
m4_include([tests/system-common-macros.at])
m4_include([tests/system-userspace-macros.at])
m4_include([tests/system-afxdp-macros.at])
m4_include([tests/system-traffic.at])


@@ -71,6 +71,7 @@ AT_CLEANUP
AT_SETUP([datapath - ping between two ports on cvlan])
OVS_TRAFFIC_VSWITCHD_START()
OVS_CHECK_8021AD()
AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
@@ -161,6 +162,7 @@ AT_CLEANUP
AT_SETUP([datapath - ping6 between two ports on cvlan])
OVS_TRAFFIC_VSWITCHD_START()
OVS_CHECK_8021AD()
AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])


@@ -3107,6 +3107,21 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
</p>
</column>
<column name="other_config" key="xdpmode"
type='{"type": "string",
"enum": ["set", ["skb", "drv"]]}'>
<p>
Specifies the operational mode of the XDP program.
If "drv", the XDP program is loaded into the device driver with
zero-copy RX and TX enabled. This mode requires a device driver with
AF_XDP support and has the best performance.
If "skb", the XDP program uses the generic XDP mode in the kernel, with
extra data copying between userspace and kernel. No device driver
support is needed. Note that this option applies to the afxdp netdev
type only. Defaults to "skb" mode.
</p>
</column>
<column name="options" key="vhost-server-path"
type='{"type": "string"}'>
<p>