
netdev-dpdk: add dpdk vhost-cuse ports

This patch adds support for a new port type, called dpdkvhost, in the
userspace datapath. This allows KVM (QEMU) to offload the servicing
of virtio-net devices to its associated dpdkvhost port. Instructions
for use are in INSTALL.DPDK.

This has been tested on Intel multi-core platforms and with clients
that have virtio-net interfaces.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Kevin Traynor <kevin.traynor@intel.com>
Signed-off-by: Maryam Tahhan <maryam.tahhan@intel.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
This commit is contained in:
Kevin Traynor 2015-03-05 13:42:04 -08:00 committed by Pravin B Shelar
parent 58be9c9fd7
commit 58397e6c1e
9 changed files with 1123 additions and 56 deletions


@ -17,6 +17,7 @@ Building and Installing:
------------------------
Required DPDK 1.8.0
Optional `fuse`, `fuse-devel`
1. Configure build & install DPDK:
1. Set `$DPDK_DIR`
@ -290,12 +291,262 @@ A general rule of thumb for better performance is that the client
application should not be assigned the same dpdk core mask "-c" as
the vswitchd.
DPDK vhost:
-----------
At present only vhost-cuse is supported, i.e. the standard QEMU
vhost-user interface is not used. It is intended that vhost-user support
will be added in future releases once supported in DPDK, and that
vhost-cuse will eventually be deprecated. See [DPDK Docs] for more info
on vhost.
Prerequisites:
1. Build DPDK 1.8 with vhost support enabled and recompile OVS as above.
Update `config/common_linuxapp` so that DPDK is built with vhost
libraries:
`CONFIG_RTE_LIBRTE_VHOST=y`
2. Insert the `cuse` kernel module:
`modprobe cuse`
3. Build and insert the `eventfd_link` module:
`cd $DPDK_DIR/lib/librte_vhost/eventfd_link/`
`make`
`insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko`
After following the steps above to create a bridge, you can now add a
DPDK vhost port to the vswitch.
`ovs-vsctl add-port br0 dpdkvhost0 -- set Interface dpdkvhost0 type=dpdkvhost`
Unlike DPDK ring ports, DPDK vhost ports can have arbitrary names:
`ovs-vsctl add-port br0 port123ABC -- set Interface port123ABC type=dpdkvhost`
However, please note that when attaching userspace devices to QEMU, the
name provided during the add-port operation must match the ifname parameter
on the QEMU command line.
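This name-matching requirement can be sketched with a small helper
(hypothetical; the bridge `br0`, netdev id `net1` and MAC address are
placeholders) that derives both command lines from a single port name:

```python
def vhost_port_cmds(port_name, mac="00:00:00:00:00:01"):
    """Build the matching ovs-vsctl command and QEMU arguments for one
    dpdkvhost port.  The name given to add-port must equal the ifname=
    value passed to QEMU, or the guest will not attach to the port."""
    ovs_cmd = ("ovs-vsctl add-port br0 {0} -- "
               "set Interface {0} type=dpdkvhost").format(port_name)
    qemu_args = ("-netdev tap,id=net1,script=no,downscript=no,"
                 "ifname={0},vhost=on "
                 "-device virtio-net-pci,netdev=net1,mac={1}").format(
                     port_name, mac)
    return ovs_cmd, qemu_args
```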
DPDK vhost VM configuration:
----------------------------
vhost ports use a Linux* character device to communicate with QEMU.
By default it is set to `/dev/vhost-net`. It is possible to reuse this
standard device for DPDK vhost, which makes setup a little simpler, but
it is better practice to specify an alternative character device in order
to avoid any conflicts if kernel vhost is to be used in parallel.
1. This step is only needed if using an alternative character device.
The new character device filename must be specified on the vswitchd
commandline:
`./vswitchd/ovs-vswitchd --dpdk --cuse_dev_name my-vhost-net -c 0x1 ...`
Note that the `--cuse_dev_name` argument and associated string must be the first
arguments after `--dpdk` and come before the EAL arguments. In the example
above, the character device to be used will be `/dev/my-vhost-net`.
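As an illustration of this ordering rule, here is a hypothetical sketch of
how the device name would be resolved (mirroring the behaviour described
above, not the actual vswitchd code):

```python
def cuse_dev_from_args(argv):
    """Return the character device name ovs-vswitchd would use.

    --cuse_dev_name and its value must immediately follow --dpdk;
    in any other position the default /dev/vhost-net is used."""
    try:
        i = argv.index("--dpdk")
    except ValueError:
        return None  # DPDK support not enabled at all
    if argv[i + 1:i + 2] == ["--cuse_dev_name"] and len(argv) > i + 2:
        return argv[i + 2]
    return "vhost-net"
```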
2. This step is only needed if reusing the standard character device. It
will conflict with the kernel vhost character device, so the user must
first remove it.
`rm -rf /dev/vhost-net`
3a. Configure virtio-net adaptors:
The following parameters must be passed to the QEMU binary:
```
-netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on
-device virtio-net-pci,netdev=net1,mac=<mac>
```
Repeat the above parameters for multiple devices.
The DPDK vhost library will negotiate its own features, so they
need not be passed in as command line parameters. Note that as offloads
are disabled this is the equivalent of setting:
`csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off`
3b. If using an alternative character device, it must also be explicitly
passed to QEMU using the `vhostfd` argument:
```
-netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on,
vhostfd=<open_fd>
-device virtio-net-pci,netdev=net1,mac=<mac>
```
The open file descriptor must be passed to QEMU running as a child
process. This could be done with a simple python script.
```
#!/usr/bin/python
import os
import subprocess

fd = os.open("/dev/usvhost", os.O_RDWR)
subprocess.call("qemu-system-x86_64 .... -netdev tap,id=vhostnet0,\
vhost=on,vhostfd=" + str(fd) + " ...", shell=True)
```
Alternatively, the `qemu-wrap.py` script can be used to automate the
requirements specified above and can be used in conjunction with libvirt
if desired. See the "DPDK vhost VM configuration with QEMU wrapper"
section below.
4. Configure huge pages:
QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access the
virtio-net device's virtual rings and packet buffers by mapping the VM's
physical memory, which resides on hugetlbfs. To enable vhost ports to
map the VM's memory into their process address space, pass the following
parameters to QEMU:
`-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
share=on -numa node,memdev=mem -mem-prealloc`
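As a sketch, the memory-backend arguments above can be assembled from the
VM memory size and hugetlbfs mount point (hypothetical helper; the default
mount point is a placeholder for your system's actual hugetlbfs mount):

```python
def hugepage_mem_args(size_mb, mem_path="/dev/hugepages"):
    """Build the QEMU arguments that back the VM's memory with hugetlbfs,
    shared (share=on) so that vhost ports can map it into their own
    process address space."""
    return ("-object memory-backend-file,id=mem,size={0}M,"
            "mem-path={1},share=on "
            "-numa node,memdev=mem -mem-prealloc").format(size_mb, mem_path)
```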
DPDK vhost VM configuration with QEMU wrapper:
----------------------------------------------
The QEMU wrapper script automatically detects and calls QEMU with the
necessary parameters. It performs the following actions:
* Automatically detects the location of the hugetlbfs and inserts this
into the command line parameters.
* Automatically opens file descriptors for each virtio-net device and
inserts these into the command line parameters.
* Calls QEMU passing both the command line parameters passed to the
script itself and those it has auto-detected.
Before use, you **must** edit the configuration parameters section of the
script to point to the correct emulator location and set additional
settings. Of these settings, `emul_path` and `us_vhost_path` **must** be
set. All other settings are optional.
To use directly from the command line simply pass the wrapper some of the
QEMU parameters: it will configure the rest. For example:
```
qemu-wrap.py -cpu host -boot c -hda <disk image> -m 4096 -smp 4
--enable-kvm -nographic -vnc none -net none -netdev tap,id=net1,
script=no,downscript=no,ifname=if1,vhost=on -device virtio-net-pci,
netdev=net1,mac=00:00:00:00:00:01
```
DPDK vhost VM configuration with libvirt:
-----------------------------------------
If you are using libvirt, you must enable libvirt to access the character
device by adding it to libvirtd's device cgroup ACL using the following
steps.
1. In `/etc/libvirt/qemu.conf` add/edit the following lines:
```
1) clear_emulator_capabilities = 0
2) user = "root"
3) group = "root"
4) cgroup_device_acl = [
"/dev/null", "/dev/full", "/dev/zero",
"/dev/random", "/dev/urandom",
"/dev/ptmx", "/dev/kvm", "/dev/kqemu",
"/dev/rtc", "/dev/hpet", "/dev/net/tun",
"/dev/<my-vhost-device>",
"/dev/hugepages"]
```
<my-vhost-device> refers to "vhost-net" if using the `/dev/vhost-net`
device. If you have specified a different name on the ovs-vswitchd
command line using the `--cuse_dev_name` parameter, please specify that
filename instead.
2. Disable SELinux or set it to permissive mode
3. Restart the libvirtd process
For example, on Fedora:
`systemctl restart libvirtd.service`
After successfully editing the configuration, you may launch your
vhost-enabled VM. The XML describing the VM can be configured like so
within the <qemu:commandline> section:
1. Set up shared hugepages:
```
<qemu:arg value='-object'/>
<qemu:arg value='memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on'/>
<qemu:arg value='-numa'/>
<qemu:arg value='node,memdev=mem'/>
<qemu:arg value='-mem-prealloc'/>
```
2. Set up your tap devices:
```
<qemu:arg value='-netdev'/>
<qemu:arg value='type=tap,id=net1,script=no,downscript=no,ifname=vhost0,vhost=on'/>
<qemu:arg value='-device'/>
<qemu:arg value='virtio-net-pci,netdev=net1,mac=00:00:00:00:00:01'/>
```
Repeat for as many devices as are desired, modifying the id, ifname
and mac as necessary.
Again, if you are using an alternative character device (other than
`/dev/vhost-net`), please specify the file descriptor like so:
`<qemu:arg value='type=tap,id=net3,script=no,downscript=no,ifname=vhost0,vhost=on,vhostfd=<open_fd>'/>`
Where <open_fd> refers to the open file descriptor of the character device.
Instructions on how to retrieve the file descriptor can be found in the
"DPDK vhost VM configuration" section.
Alternatively, the process is automated with the qemu-wrap.py script,
detailed in the next section.
Now you may launch your VM using virt-manager, or like so:
`virsh create my_vhost_vm.xml`
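The per-token `<qemu:arg>` elements shown above are mechanical to produce;
a hypothetical sketch that renders a flat QEMU argument list into that
form:

```python
def qemu_commandline_xml(args):
    """Render a list of QEMU arguments as the <qemu:arg> elements used
    inside libvirt's <qemu:commandline> section, one element per token."""
    return "\n".join("<qemu:arg value='{0}'/>".format(a) for a in args)
```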
DPDK vhost VM configuration with libvirt and QEMU wrapper:
----------------------------------------------------------
To use the qemu-wrap.py script in conjunction with libvirt, follow the
steps in the previous section before proceeding with the following steps:
1. Place `qemu-wrap.py` in libvirtd's binary search PATH ($PATH),
ideally in the same directory as the QEMU binary.
2. Ensure that the script has the same owner/group and file permissions
as the QEMU binary.
3. Update the VM xml file using "virsh edit VM.xml"
1. Set the VM to use the launch script.
Set the emulator path contained in the `<emulator></emulator>` tags.
For example, replace:
`<emulator>/usr/bin/qemu-kvm</emulator>`
with:
`<emulator>/usr/bin/qemu-wrap.py</emulator>`
4. Edit the Configuration Parameters section of the script to point to
the correct emulator location and set any additional options. If you are
using an alternative character device name, please set "us_vhost_path" to
the location of that device. The script will automatically detect and
insert the correct "vhostfd" value in the QEMU command line arguments.
5. Use virt-manager to launch the VM
Restrictions:
-------------
- Support is limited to physical NICs; testing has been carried out with
  Intel NICs only.
- Only a 1500-byte MTU works; a few changes in the DPDK library are needed
  to fix this issue.
- Currently the DPDK port does not make use of any offload functionality.
- DPDK vhost support works with 1G huge pages.
ivshmem:
- The shared memory is currently restricted to the use of a 1GB
@ -311,3 +562,4 @@ Please report problems to bugs@openvswitch.org.
[INSTALL.userspace.md]:INSTALL.userspace.md
[INSTALL.md]:INSTALL.md
[DPDK Linux GSG]: http://www.dpdk.org/doc/guides/linux_gsg/build_dpdk.html#binding-and-unbinding-network-ports-to-from-the-igb-uioor-vfio-modules
[DPDK Docs]: http://dpdk.org/doc


@ -32,6 +32,10 @@ AM_CFLAGS = -Wstrict-prototypes
AM_CFLAGS += $(WARNING_FLAGS)
AM_CFLAGS += $(OVS_CFLAGS)
if DPDK_NETDEV
AM_CFLAGS += -D_FILE_OFFSET_BITS=64
endif
if NDEBUG
AM_CPPFLAGS += -DNDEBUG
AM_CFLAGS += -fomit-frame-pointer

NEWS

@ -71,6 +71,7 @@ Post-v2.3.0
Auto-Attach.
- The default OpenFlow and OVSDB ports are now the IANA-assigned
numbers. OpenFlow is 6653 and OVSDB is 6640.
- Support for DPDK vHost.
v2.3.0 - 14 Aug 2014


@ -170,7 +170,7 @@ AC_DEFUN([OVS_CHECK_DPDK], [
DPDK_INCLUDE=$RTE_SDK/include
DPDK_LIB_DIR=$RTE_SDK/lib
DPDK_LIB=-lintel_dpdk
DPDK_LIB="-lintel_dpdk -lfuse "
ovs_save_CFLAGS="$CFLAGS"
ovs_save_LDFLAGS="$LDFLAGS"
@ -214,7 +214,7 @@ AC_DEFUN([OVS_CHECK_DPDK], [
#
# These options are specified inside a single -Wl directive to prevent
# autotools from reordering them.
DPDK_vswitchd_LDFLAGS=-Wl,--whole-archive,$DPDK_LIB,--no-whole-archive
DPDK_vswitchd_LDFLAGS=-Wl,$DPDK_LIB
AC_SUBST([DPDK_vswitchd_LDFLAGS])
AC_DEFINE([DPDK_NETDEV], [1], [System uses the DPDK module.])
else


@ -49,6 +49,7 @@
#include "rte_config.h"
#include "rte_mbuf.h"
#include "rte_virtio_net.h"
VLOG_DEFINE_THIS_MODULE(dpdk);
static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
@ -84,6 +85,11 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
#define TX_HTHRESH 0 /* Default values of TX host threshold reg. */
#define TX_WTHRESH 0 /* Default values of TX write-back threshold reg. */
#define MAX_PKT_BURST 32 /* Max burst size for RX/TX */
/* Character device cuse_dev_name. */
char *cuse_dev_name = NULL;
static const struct rte_eth_conf port_conf = {
.rxmode = {
.mq_mode = ETH_MQ_RX_RSS,
@ -131,6 +137,11 @@ enum { DPDK_RING_SIZE = 256 };
BUILD_ASSERT_DECL(IS_POW2(DPDK_RING_SIZE));
enum { DRAIN_TSC = 200000ULL };
enum dpdk_dev_type {
DPDK_DEV_ETH = 0,
DPDK_DEV_VHOST = 1
};
static int rte_eal_init_ret = ENODEV;
static struct ovs_mutex dpdk_mutex = OVS_MUTEX_INITIALIZER;
@ -185,6 +196,7 @@ struct netdev_dpdk {
struct netdev up;
int port_id;
int max_packet_len;
enum dpdk_dev_type type;
struct dpdk_tx_queue *tx_q;
@ -202,9 +214,12 @@ struct netdev_dpdk {
struct rte_eth_link link;
int link_reset_cnt;
/* virtio-net structure for vhost device */
OVSRCU_TYPE(struct virtio_net *) virtio_dev;
/* In dpdk_list. */
struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex);
rte_spinlock_t dpdkr_tx_lock;
rte_spinlock_t txq_lock;
};
struct netdev_rxq_dpdk {
@ -216,14 +231,16 @@ static bool thread_is_pmd(void);
static int netdev_dpdk_construct(struct netdev *);
struct virtio_net * netdev_dpdk_get_virtio(const struct netdev_dpdk *dev);
static bool
is_dpdk_class(const struct netdev_class *class)
{
return class->construct == netdev_dpdk_construct;
}
/* XXX: use dpdk malloc for entire OVS. infact huge page shld be used
* for all other sengments data, bss and text. */
/* XXX: use dpdk malloc for entire OVS. in fact huge page should be used
* for all other segments data, bss and text. */
static void *
dpdk_rte_mzalloc(size_t sz)
@ -483,7 +500,8 @@ netdev_dpdk_alloc_txq(struct netdev_dpdk *netdev, unsigned int n_txqs)
}
static int
netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no)
netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
enum dpdk_dev_type type)
OVS_REQUIRES(dpdk_mutex)
{
struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
@ -491,20 +509,24 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no)
int err = 0;
ovs_mutex_init(&netdev->mutex);
ovs_mutex_lock(&netdev->mutex);
/* If the 'sid' is negative, it means that the kernel fails
* to obtain the pci numa info. In that situation, always
* use 'SOCKET0'. */
sid = rte_eth_dev_socket_id(port_no);
if (type == DPDK_DEV_ETH) {
sid = rte_eth_dev_socket_id(port_no);
} else {
sid = rte_lcore_to_socket_id(rte_get_master_lcore());
}
netdev->socket_id = sid < 0 ? SOCKET0 : sid;
netdev_dpdk_alloc_txq(netdev, NR_QUEUE);
netdev->port_id = port_no;
netdev->type = type;
netdev->flags = 0;
netdev->mtu = ETHER_MTU;
netdev->max_packet_len = MTU_TO_MAX_LEN(netdev->mtu);
rte_spinlock_init(&netdev->dpdkr_tx_lock);
rte_spinlock_init(&netdev->txq_lock);
netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, netdev->mtu);
if (!netdev->dpdk_mp) {
@ -514,9 +536,13 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no)
netdev_->n_txq = NR_QUEUE;
netdev_->n_rxq = NR_QUEUE;
err = dpdk_eth_dev_init(netdev);
if (err) {
goto unlock;
if (type == DPDK_DEV_ETH) {
netdev_dpdk_alloc_txq(netdev, NR_QUEUE);
err = dpdk_eth_dev_init(netdev);
if (err) {
goto unlock;
}
}
list_push_back(&dpdk_list, &netdev->list_node);
@ -544,6 +570,22 @@ dpdk_dev_parse_name(const char dev_name[], const char prefix[],
return 0;
}
static int
netdev_dpdk_vhost_construct(struct netdev *netdev_)
{
int err;
if (rte_eal_init_ret) {
return rte_eal_init_ret;
}
ovs_mutex_lock(&dpdk_mutex);
err = netdev_dpdk_init(netdev_, -1, DPDK_DEV_VHOST);
ovs_mutex_unlock(&dpdk_mutex);
return err;
}
static int
netdev_dpdk_construct(struct netdev *netdev)
{
@ -561,7 +603,7 @@ netdev_dpdk_construct(struct netdev *netdev)
}
ovs_mutex_lock(&dpdk_mutex);
err = netdev_dpdk_init(netdev, port_no);
err = netdev_dpdk_init(netdev, port_no, DPDK_DEV_ETH);
ovs_mutex_unlock(&dpdk_mutex);
return err;
}
@ -580,8 +622,23 @@ netdev_dpdk_destruct(struct netdev *netdev_)
list_remove(&dev->list_node);
dpdk_mp_put(dev->dpdk_mp);
ovs_mutex_unlock(&dpdk_mutex);
}
ovs_mutex_destroy(&dev->mutex);
static void
netdev_dpdk_vhost_destruct(struct netdev *netdev_)
{
struct netdev_dpdk *dev = netdev_dpdk_cast(netdev_);
/* Can't remove a port while a guest is attached to it. */
if (netdev_dpdk_get_virtio(dev) != NULL) {
VLOG_ERR("Can not remove port, vhost device still attached");
return;
}
ovs_mutex_lock(&dpdk_mutex);
list_remove(&dev->list_node);
dpdk_mp_put(dev->dpdk_mp);
ovs_mutex_unlock(&dpdk_mutex);
}
static void
@ -635,6 +692,7 @@ netdev_dpdk_set_multiq(struct netdev *netdev_, unsigned int n_txq,
netdev->up.n_txq = n_txq;
netdev->up.n_rxq = n_rxq;
rte_free(netdev->tx_q);
netdev_dpdk_alloc_txq(netdev, n_txq);
err = dpdk_eth_dev_init(netdev);
@ -645,6 +703,29 @@ netdev_dpdk_set_multiq(struct netdev *netdev_, unsigned int n_txq,
return err;
}
static int
netdev_dpdk_vhost_set_multiq(struct netdev *netdev_, unsigned int n_txq,
unsigned int n_rxq)
{
struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
int err = 0;
if (netdev->up.n_txq == n_txq && netdev->up.n_rxq == n_rxq) {
return err;
}
ovs_mutex_lock(&dpdk_mutex);
ovs_mutex_lock(&netdev->mutex);
netdev->up.n_txq = n_txq;
netdev->up.n_rxq = n_rxq;
ovs_mutex_unlock(&netdev->mutex);
ovs_mutex_unlock(&dpdk_mutex);
return err;
}
static struct netdev_rxq *
netdev_dpdk_rxq_alloc(void)
{
@ -731,6 +812,43 @@ dpdk_queue_flush(struct netdev_dpdk *dev, int qid)
dpdk_queue_flush__(dev, qid);
}
static bool
is_vhost_running(struct virtio_net *dev)
{
return (dev != NULL && (dev->flags & VIRTIO_DEV_RUNNING));
}
/*
* The receive path for the vhost port is the TX path out from guest.
*/
static int
netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq_,
struct dp_packet **packets, int *c)
{
struct netdev_rxq_dpdk *rx = netdev_rxq_dpdk_cast(rxq_);
struct netdev *netdev = rx->up.netdev;
struct netdev_dpdk *vhost_dev = netdev_dpdk_cast(netdev);
struct virtio_net *virtio_dev = netdev_dpdk_get_virtio(vhost_dev);
int qid = 1;
uint16_t nb_rx = 0;
if (OVS_UNLIKELY(!is_vhost_running(virtio_dev))) {
return EAGAIN;
}
nb_rx = rte_vhost_dequeue_burst(virtio_dev, qid,
vhost_dev->dpdk_mp->mp,
(struct rte_mbuf **)packets,
MAX_PKT_BURST);
if (!nb_rx) {
return EAGAIN;
}
vhost_dev->stats.rx_packets += (uint64_t)nb_rx;
*c = (int) nb_rx;
return 0;
}
static int
netdev_dpdk_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet **packets,
int *c)
@ -759,6 +877,38 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet **packets,
return 0;
}
static void
__netdev_dpdk_vhost_send(struct netdev *netdev, struct dp_packet **pkts,
int cnt, bool may_steal)
{
struct netdev_dpdk *vhost_dev = netdev_dpdk_cast(netdev);
struct virtio_net *virtio_dev = netdev_dpdk_get_virtio(vhost_dev);
int tx_pkts, i;
if (OVS_UNLIKELY(!is_vhost_running(virtio_dev))) {
ovs_mutex_lock(&vhost_dev->mutex);
vhost_dev->stats.tx_dropped+= cnt;
ovs_mutex_unlock(&vhost_dev->mutex);
goto out;
}
/* There is a single vhost TX queue, so we need to lock it for TX. */
rte_spinlock_lock(&vhost_dev->txq_lock);
tx_pkts = rte_vhost_enqueue_burst(virtio_dev, VIRTIO_RXQ,
(struct rte_mbuf **)pkts, cnt);
vhost_dev->stats.tx_packets += tx_pkts;
vhost_dev->stats.tx_dropped += (cnt - tx_pkts);
rte_spinlock_unlock(&vhost_dev->txq_lock);
out:
if (may_steal) {
for (i = 0; i < cnt; i++) {
dp_packet_delete(pkts[i]);
}
}
}
inline static void
dpdk_queue_pkts(struct netdev_dpdk *dev, int qid,
struct rte_mbuf **pkts, int cnt)
@ -790,7 +940,7 @@ dpdk_queue_pkts(struct netdev_dpdk *dev, int qid,
/* Tx function. Transmit packets indefinitely */
static void
dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet ** pkts,
dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet **pkts,
int cnt)
OVS_NO_THREAD_SAFETY_ANALYSIS
{
@ -840,14 +990,37 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet ** pkts,
ovs_mutex_unlock(&dev->mutex);
}
dpdk_queue_pkts(dev, qid, mbufs, newcnt);
dpdk_queue_flush(dev, qid);
if (dev->type == DPDK_DEV_VHOST) {
__netdev_dpdk_vhost_send(netdev, (struct dp_packet **) mbufs, newcnt, true);
} else {
dpdk_queue_pkts(dev, qid, mbufs, newcnt);
dpdk_queue_flush(dev, qid);
}
if (!thread_is_pmd()) {
ovs_mutex_unlock(&nonpmd_mempool_mutex);
}
}
static int
netdev_dpdk_vhost_send(struct netdev *netdev, int qid OVS_UNUSED, struct dp_packet **pkts,
int cnt, bool may_steal)
{
if (OVS_UNLIKELY(pkts[0]->source != DPBUF_DPDK)) {
int i;
dpdk_do_tx_copy(netdev, qid, pkts, cnt);
if (may_steal) {
for (i = 0; i < cnt; i++) {
dp_packet_delete(pkts[i]);
}
}
} else {
__netdev_dpdk_vhost_send(netdev, pkts, cnt, may_steal);
}
return 0;
}
static inline void
netdev_dpdk_send__(struct netdev_dpdk *dev, int qid,
struct dp_packet **pkts, int cnt, bool may_steal)
@ -1001,6 +1174,44 @@ out:
static int
netdev_dpdk_get_carrier(const struct netdev *netdev_, bool *carrier);
static int
netdev_dpdk_vhost_get_stats(const struct netdev *netdev,
struct netdev_stats *stats)
{
struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
ovs_mutex_lock(&dev->mutex);
memset(stats, 0, sizeof(*stats));
/* Unsupported Stats */
stats->rx_errors = UINT64_MAX;
stats->tx_errors = UINT64_MAX;
stats->multicast = UINT64_MAX;
stats->collisions = UINT64_MAX;
stats->rx_crc_errors = UINT64_MAX;
stats->rx_fifo_errors = UINT64_MAX;
stats->rx_frame_errors = UINT64_MAX;
stats->rx_length_errors = UINT64_MAX;
stats->rx_missed_errors = UINT64_MAX;
stats->rx_over_errors = UINT64_MAX;
stats->tx_aborted_errors = UINT64_MAX;
stats->tx_carrier_errors = UINT64_MAX;
stats->tx_errors = UINT64_MAX;
stats->tx_fifo_errors = UINT64_MAX;
stats->tx_heartbeat_errors = UINT64_MAX;
stats->tx_window_errors = UINT64_MAX;
stats->rx_bytes += UINT64_MAX;
stats->rx_dropped += UINT64_MAX;
stats->tx_bytes += UINT64_MAX;
/* Supported Stats */
stats->rx_packets += dev->stats.rx_packets;
stats->tx_packets += dev->stats.tx_packets;
stats->tx_dropped += dev->stats.tx_dropped;
ovs_mutex_unlock(&dev->mutex);
return 0;
}
static int
netdev_dpdk_get_stats(const struct netdev *netdev, struct netdev_stats *stats)
{
@ -1095,6 +1306,26 @@ netdev_dpdk_get_carrier(const struct netdev *netdev_, bool *carrier)
ovs_mutex_lock(&dev->mutex);
check_link_status(dev);
*carrier = dev->link.link_status;
ovs_mutex_unlock(&dev->mutex);
return 0;
}
static int
netdev_dpdk_vhost_get_carrier(const struct netdev *netdev_, bool *carrier)
{
struct netdev_dpdk *dev = netdev_dpdk_cast(netdev_);
struct virtio_net *virtio_dev = netdev_dpdk_get_virtio(dev);
ovs_mutex_lock(&dev->mutex);
if (is_vhost_running(virtio_dev)) {
*carrier = 1;
} else {
*carrier = 0;
}
ovs_mutex_unlock(&dev->mutex);
return 0;
@ -1139,18 +1370,20 @@ netdev_dpdk_update_flags__(struct netdev_dpdk *dev,
return 0;
}
if (dev->flags & NETDEV_UP) {
err = rte_eth_dev_start(dev->port_id);
if (err)
return -err;
}
if (dev->type == DPDK_DEV_ETH) {
if (dev->flags & NETDEV_UP) {
err = rte_eth_dev_start(dev->port_id);
if (err)
return -err;
}
if (dev->flags & NETDEV_PROMISC) {
rte_eth_promiscuous_enable(dev->port_id);
}
if (dev->flags & NETDEV_PROMISC) {
rte_eth_promiscuous_enable(dev->port_id);
}
if (!(dev->flags & NETDEV_UP)) {
rte_eth_dev_stop(dev->port_id);
if (!(dev->flags & NETDEV_UP)) {
rte_eth_dev_stop(dev->port_id);
}
}
return 0;
@ -1261,6 +1494,134 @@ netdev_dpdk_set_admin_state(struct unixctl_conn *conn, int argc,
unixctl_command_reply(conn, "OK");
}
/*
* Set virtqueue flags so that we do not receive interrupts.
*/
static void
set_irq_status(struct virtio_net *dev)
{
dev->virtqueue[VIRTIO_RXQ]->used->flags = VRING_USED_F_NO_NOTIFY;
dev->virtqueue[VIRTIO_TXQ]->used->flags = VRING_USED_F_NO_NOTIFY;
}
/*
* A new virtio-net device is added to a vhost port.
*/
static int
new_device(struct virtio_net *dev)
{
struct netdev_dpdk *netdev;
bool exists = false;
ovs_mutex_lock(&dpdk_mutex);
/* Add device to the vhost port with the same name as that passed down. */
LIST_FOR_EACH(netdev, list_node, &dpdk_list) {
if (strncmp(dev->ifname, netdev->up.name, IFNAMSIZ) == 0) {
ovs_mutex_lock(&netdev->mutex);
ovsrcu_set(&netdev->virtio_dev, dev);
ovs_mutex_unlock(&netdev->mutex);
exists = true;
dev->flags |= VIRTIO_DEV_RUNNING;
/* Disable notifications. */
set_irq_status(dev);
break;
}
}
ovs_mutex_unlock(&dpdk_mutex);
if (!exists) {
VLOG_INFO("vHost Device '%s' (%ld) can't be added - name not found",
dev->ifname, dev->device_fh);
return -1;
}
VLOG_INFO("vHost Device '%s' (%ld) has been added",
dev->ifname, dev->device_fh);
return 0;
}
/*
* Remove a virtio-net device from the specific vhost port. Use dev->remove
* flag to stop any more packets from being sent or received to/from a VM and
* ensure all currently queued packets have been sent/received before removing
* the device.
*/
static void
destroy_device(volatile struct virtio_net *dev)
{
struct netdev_dpdk *vhost_dev;
ovs_mutex_lock(&dpdk_mutex);
LIST_FOR_EACH (vhost_dev, list_node, &dpdk_list) {
if (netdev_dpdk_get_virtio(vhost_dev) == dev) {
ovs_mutex_lock(&vhost_dev->mutex);
dev->flags &= ~VIRTIO_DEV_RUNNING;
ovsrcu_set(&vhost_dev->virtio_dev, NULL);
ovs_mutex_unlock(&vhost_dev->mutex);
/*
* Wait for other threads to quiesce before
* setting the virtio_dev to NULL.
*/
ovsrcu_synchronize();
}
}
ovs_mutex_unlock(&dpdk_mutex);
VLOG_INFO("vHost Device '%s' (%ld) has been removed",
dev->ifname, dev->device_fh);
}
struct virtio_net *
netdev_dpdk_get_virtio(const struct netdev_dpdk *dev)
{
return ovsrcu_get(struct virtio_net *, &dev->virtio_dev);
}
/*
* These callbacks allow virtio-net devices to be added to vhost ports when
* configuration has been fully completed.
*/
const struct virtio_net_device_ops virtio_net_device_ops =
{
.new_device = new_device,
.destroy_device = destroy_device,
};
static void *
start_cuse_session_loop(void *dummy OVS_UNUSED)
{
pthread_detach(pthread_self());
rte_vhost_driver_session_start();
return NULL;
}
static int
dpdk_vhost_class_init(void)
{
pthread_t thread;
int err = -1;
rte_vhost_driver_callback_register(&virtio_net_device_ops);
/* Register CUSE device to handle IOCTLs.
* Unless otherwise specified on the vswitchd command line, cuse_dev_name
* is set to vhost-net.
*/
err = rte_vhost_driver_register(cuse_dev_name);
if (err != 0) {
VLOG_ERR("CUSE device setup failure.");
return -1;
}
/* start_cuse_session_loop blocks OVS RCU quiescent state, so directly use
* pthread API. */
return pthread_create(&thread, NULL, start_cuse_session_loop, NULL);
}
static void
dpdk_common_init(void)
{
@ -1345,7 +1706,7 @@ dpdk_ring_open(const char dev_name[], unsigned int *eth_port_id) OVS_REQUIRES(dp
/* look through our list to find the device */
LIST_FOR_EACH (ivshmem, list_node, &dpdk_ring_list) {
if (ivshmem->user_port_id == port_no) {
VLOG_INFO("Found dpdk ring device %s:\n", dev_name);
VLOG_INFO("Found dpdk ring device %s:", dev_name);
*eth_port_id = ivshmem->eth_port_id; /* really all that is needed */
return 0;
}
@ -1361,9 +1722,9 @@ netdev_dpdk_ring_send(struct netdev *netdev, int qid OVS_UNUSED,
struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
/* DPDK Rings have a single TX queue, Therefore needs locking. */
rte_spinlock_lock(&dev->dpdkr_tx_lock);
rte_spinlock_lock(&dev->txq_lock);
netdev_dpdk_send__(dev, 0, pkts, cnt, may_steal);
rte_spinlock_unlock(&dev->dpdkr_tx_lock);
rte_spinlock_unlock(&dev->txq_lock);
return 0;
}
@ -1384,14 +1745,15 @@ netdev_dpdk_ring_construct(struct netdev *netdev)
goto unlock_dpdk;
}
err = netdev_dpdk_init(netdev, port_no);
err = netdev_dpdk_init(netdev, port_no, DPDK_DEV_ETH);
unlock_dpdk:
ovs_mutex_unlock(&dpdk_mutex);
return err;
}
#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, MULTIQ, SEND) \
#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, MULTIQ, SEND, \
GET_CARRIER, GET_STATS, GET_FEATURES, GET_STATUS, RXQ_RECV) \
{ \
NAME, \
INIT, /* init */ \
@ -1400,14 +1762,14 @@ unlock_dpdk:
\
netdev_dpdk_alloc, \
CONSTRUCT, \
netdev_dpdk_destruct, \
DESTRUCT, \
netdev_dpdk_dealloc, \
netdev_dpdk_get_config, \
NULL, /* netdev_dpdk_set_config */ \
NULL, /* get_tunnel_config */ \
NULL, /* build header */ \
NULL, /* push header */ \
NULL, /* pop header */ \
NULL, /* build header */ \
NULL, /* push header */ \
NULL, /* pop header */ \
netdev_dpdk_get_numa_id, /* get_numa_id */ \
MULTIQ, /* set_multiq */ \
\
@ -1419,11 +1781,11 @@ unlock_dpdk:
netdev_dpdk_get_mtu, \
netdev_dpdk_set_mtu, \
netdev_dpdk_get_ifindex, \
netdev_dpdk_get_carrier, \
GET_CARRIER, \
netdev_dpdk_get_carrier_resets, \
netdev_dpdk_set_miimon, \
netdev_dpdk_get_stats, \
netdev_dpdk_get_features, \
GET_STATS, \
GET_FEATURES, \
NULL, /* set_advertisements */ \
\
NULL, /* set_policing */ \
@ -1445,7 +1807,7 @@ unlock_dpdk:
NULL, /* get_in6 */ \
NULL, /* add_router */ \
NULL, /* get_next_hop */ \
netdev_dpdk_get_status, \
GET_STATUS, \
NULL, /* arp_lookup */ \
\
netdev_dpdk_update_flags, \
@ -1454,7 +1816,7 @@ unlock_dpdk:
netdev_dpdk_rxq_construct, \
netdev_dpdk_rxq_destruct, \
netdev_dpdk_rxq_dealloc, \
netdev_dpdk_rxq_recv, \
RXQ_RECV, \
NULL, /* rx_wait */ \
NULL, /* rxq_drain */ \
}
@ -1463,20 +1825,48 @@ int
dpdk_init(int argc, char **argv)
{
int result;
int base = 0;
char *pragram_name = argv[0];
if (argc < 2 || strcmp(argv[1], "--dpdk"))
return 0;
/* Make sure program name passed to rte_eal_init() is vswitchd. */
argv[1] = argv[0];
/* Remove the --dpdk argument from arg list.*/
argc--;
argv++;
/* If the cuse_dev_name parameter has been provided, set 'cuse_dev_name' to
* this string if it meets the correct criteria. Otherwise, set it to the
* default (vhost-net).
*/
if (!strcmp(argv[1], "--cuse_dev_name") &&
(strlen(argv[2]) <= NAME_MAX)) {
cuse_dev_name = strdup(argv[2]);
/* Remove the cuse_dev_name configuration parameters from the argument
* list, so that the correct elements are passed to the DPDK
* initialization function
*/
argc -= 2;
argv += 2; /* Increment by two to bypass the cuse_dev_name arguments */
base = 2;
VLOG_ERR("User-provided cuse_dev_name in use: /dev/%s", cuse_dev_name);
} else {
cuse_dev_name = "vhost-net";
VLOG_INFO("No cuse_dev_name provided - defaulting to /dev/vhost-net");
}
/* Keep the program name argument as this is needed for call to
* rte_eal_init()
*/
argv[0] = pragram_name;
/* Make sure things are initialized ... */
result = rte_eal_init(argc, argv);
if (result < 0) {
ovs_abort(result, "Cannot init EAL\n");
ovs_abort(result, "Cannot init EAL");
}
rte_memzone_dump(stdout);
@ -1489,7 +1879,7 @@ dpdk_init(int argc, char **argv)
/* We are called from the main thread here */
thread_set_nonpmd();
return result + 1;
return result + 1 + base;
}
const struct netdev_class dpdk_class =
@ -1497,16 +1887,42 @@ const struct netdev_class dpdk_class =
"dpdk",
NULL,
netdev_dpdk_construct,
netdev_dpdk_destruct,
netdev_dpdk_set_multiq,
netdev_dpdk_eth_send);
netdev_dpdk_eth_send,
netdev_dpdk_get_carrier,
netdev_dpdk_get_stats,
netdev_dpdk_get_features,
netdev_dpdk_get_status,
netdev_dpdk_rxq_recv);
const struct netdev_class dpdk_ring_class =
NETDEV_DPDK_CLASS(
"dpdkr",
NULL,
netdev_dpdk_ring_construct,
netdev_dpdk_destruct,
NULL,
netdev_dpdk_ring_send);
netdev_dpdk_ring_send,
netdev_dpdk_get_carrier,
netdev_dpdk_get_stats,
netdev_dpdk_get_features,
netdev_dpdk_get_status,
netdev_dpdk_rxq_recv);
const struct netdev_class dpdk_vhost_class =
NETDEV_DPDK_CLASS(
"dpdkvhost",
dpdk_vhost_class_init,
netdev_dpdk_vhost_construct,
netdev_dpdk_vhost_destruct,
netdev_dpdk_vhost_set_multiq,
netdev_dpdk_vhost_send,
netdev_dpdk_vhost_get_carrier,
netdev_dpdk_vhost_get_stats,
NULL,
NULL,
netdev_dpdk_vhost_rxq_recv);
void
netdev_dpdk_register(void)
@ -1521,6 +1937,7 @@ netdev_dpdk_register(void)
dpdk_common_init();
netdev_register_provider(&dpdk_class);
netdev_register_provider(&dpdk_ring_class);
netdev_register_provider(&dpdk_vhost_class);
ovsthread_once_done(&once);
}
}


@ -108,7 +108,8 @@ bool
netdev_is_pmd(const struct netdev *netdev)
{
return (!strcmp(netdev->netdev_class->type, "dpdk") ||
!strcmp(netdev->netdev_class->type, "dpdkr"));
!strcmp(netdev->netdev_class->type, "dpdkr") ||
!strcmp(netdev->netdev_class->type, "dpdkvhost"));
}
static void


@ -46,7 +46,8 @@ EXTRA_DIST += \
utilities/ovs-save \
utilities/ovs-tcpundump.in \
utilities/ovs-test.in \
utilities/ovs-vlan-test.in
utilities/ovs-vlan-test.in \
utilities/qemu-wrap.py
MAN_ROOTS += \
utilities/ovs-appctl.8.in \
utilities/ovs-benchmark.1.in \

utilities/qemu-wrap.py Executable file

@@ -0,0 +1,389 @@
#!/usr/bin/python
#
# BSD LICENSE
#
# Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
#####################################################################
# This script is designed to modify the call to the QEMU emulator
# to support userspace vhost when starting a guest machine through
# libvirt with vhost enabled. The steps to enable this are as follows
# and should be run as root:
#
# 1. Place this script in libvirtd's binary search PATH ($PATH)
#    A good location would be in the same directory that the QEMU
#    binary is located
#
# 2. Ensure that the script has the same owner/group and file
# permissions as the QEMU binary
#
# 3. Update the VM xml file using "virsh edit VM.xml"
#
# 3.a) Set the VM to use the launch script
#
#      Set the emulator path contained in the
#      <emulator></emulator> tags
#
#      e.g. replace <emulator>/usr/bin/qemu-kvm</emulator>
#      with <emulator>/usr/bin/qemu-wrap.py</emulator>
#
# 3.b) Set the VM's devices to use vhost-net offload
#
#      <interface type="network">
#        <model type="virtio"/>
#        <driver name="vhost"/>
#      </interface>
#
# 4. Enable libvirt to access our userspace device file by adding it to
#    the device controller cgroup for libvirtd using the following steps
#
# 4.a) In /etc/libvirt/qemu.conf add/edit the following lines:
# 1) cgroup_controllers = [ ... "devices", ... ]
# 2) clear_emulator_capabilities = 0
# 3) user = "root"
# 4) group = "root"
# 5) cgroup_device_acl = [
# "/dev/null", "/dev/full", "/dev/zero",
# "/dev/random", "/dev/urandom",
# "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
# "/dev/rtc", "/dev/hpet", "/dev/net/tun",
# "/dev/<devbase-name>-<index>",
# "/dev/hugepages"
# ]
#
# 4.b) Disable SELinux or set to permissive mode
#
# 4.c) Mount cgroup device controller
# "mkdir /dev/cgroup"
# "mount -t cgroup none /dev/cgroup -o devices"
#
# 4.d) Set the hugetlbfs_dir variable (optional)
# VMs using userspace vhost must use hugepage backed
# memory. This can be enabled in the libvirt XML
# config by adding a memory backing section to the
# XML config e.g.
# <memoryBacking>
# <hugepages/>
# </memoryBacking>
# This memory backing section should be added after the
# <memory> and <currentMemory> sections. This will add
# flags "-mem-prealloc -mem-path <path>" to the QEMU
# command line. The hugetlbfs_dir variable can be used
# to override the default <path> passed through by libvirt.
#
# if "-mem-prealloc" or "-mem-path <path>" are not passed
# through and a vhost device is detected then these options will
# be automatically added by this script. This script will detect
# the system hugetlbfs mount point to be used for <path>. The
# default <path> for this script can be overridden by the
# hugetlbfs_dir variable in the configuration section of this script.
#
#
# 4.e) Restart the libvirtd system process
# e.g. on Fedora "systemctl restart libvirtd.service"
#
#
# 4.f) Edit the Configuration Parameters section of this script
#      to point to the correct emulator location and set any
#      additional options
#
# The script modifies the libvirtd Qemu call by modifying/adding
# options based on the configuration parameters below.
# NOTE:
# emul_path and us_vhost_path must be set
# All other parameters are optional
#####################################################################
#############################################
# Configuration Parameters
#############################################
#Path to QEMU binary
emul_path = "/usr/local/bin/qemu-system-x86_64"
#Path to userspace vhost device file
# This filename should match the --dev-basename --dev-index parameters of
# the command used to launch the userspace vhost sample application e.g.
# if the sample app launch command is:
# ./build/vhost-switch ..... --dev-basename usvhost --dev-index 1
# then this variable should be set to:
# us_vhost_path = "/dev/usvhost-1"
us_vhost_path = "/dev/usvhost-1"
#List of additional user defined emulation options. These options will
#be added to all Qemu calls
emul_opts_user = []
#List of additional user defined emulation options for vhost only.
#These options will only be added to vhost enabled guests
emul_opts_user_vhost = []
#For all VHOST enabled VMs, the VM memory is preallocated from hugetlbfs
# Set this variable to one to enable this option for all VMs
use_huge_all = 0
#Instead of autodetecting, override the hugetlbfs directory by setting
#this variable
hugetlbfs_dir = ""
#############################################
#############################################
# ****** Do Not Modify Below this Line ******
#############################################
import sys, os, subprocess
import time
import signal
#List of open userspace vhost file descriptors
fd_list = []
#additional virtio device flags when using userspace vhost
vhost_flags = [ "csum=off",
                "gso=off",
                "guest_tso4=off",
                "guest_tso6=off",
                "guest_ecn=off"
              ]
#String of the path to the Qemu process pid
qemu_pid = "/tmp/%d-qemu.pid" % os.getpid()
#############################################
# Signal handler to kill Qemu subprocess
#############################################
def kill_qemu_process(signum, stack):
    pidfile = open(qemu_pid, 'r')
    pid = int(pidfile.read())
    os.killpg(pid, signal.SIGTERM)
    pidfile.close()
#############################################
# Find the system hugefile mount point.
# Note:
# if multiple hugetlbfs mount points exist
# then the first one found will be used
#############################################
def find_huge_mount():

    if (len(hugetlbfs_dir)):
        return hugetlbfs_dir

    huge_mount = ""

    if (os.access("/proc/mounts", os.F_OK)):
        f = open("/proc/mounts", "r")
        line = f.readline()
        while line:
            line_split = line.split(" ")
            if line_split[2] == 'hugetlbfs':
                huge_mount = line_split[1]
                break
            line = f.readline()
        f.close()
    else:
        print "/proc/mounts not found"
        exit (1)

    if len(huge_mount) == 0:
        print "Failed to find hugetlbfs mount point"
        exit (1)

    return huge_mount
#############################################
# Get a userspace Vhost file descriptor
#############################################
def get_vhost_fd():

    if (os.access(us_vhost_path, os.F_OK)):
        fd = os.open(us_vhost_path, os.O_RDWR)
    else:
        print ("US-Vhost file %s not found" % us_vhost_path)
        exit (1)

    return fd
#############################################
# Check for vhostfd. if found then replace
# with our own vhost fd and append any vhost
# flags onto the end
#############################################
def modify_netdev_arg(arg):

    global fd_list
    vhost_in_use = 0
    s = ''
    new_opts = []

    netdev_opts = arg.split(",")
    for opt in netdev_opts:
        #check if vhost is used
        if "vhost" == opt[:5]:
            vhost_in_use = 1
        else:
            new_opts.append(opt)

    #if using vhost append vhost options
    if vhost_in_use == 1:
        #append vhost on option
        new_opts.append('vhost=on')
        #append vhostfd option
        new_fd = get_vhost_fd()
        new_opts.append('vhostfd=' + str(new_fd))
        fd_list.append(new_fd)

    #concatenate all options
    for opt in new_opts:
        if len(s) > 0:
            s += ','
        s += opt

    return s
#############################################
# Main
#############################################
def main():
    global fd_list
    new_args = []
    num_cmd_args = len(sys.argv)
    emul_call = ''
    mem_prealloc_set = 0
    mem_path_set = 0
    num = 0

    #parse the parameters
    while (num < num_cmd_args):
        arg = sys.argv[num]

        #Check netdev +1 parameter for vhostfd
        if arg == '-netdev':
            num_vhost_devs = len(fd_list)
            new_args.append(arg)

            num += 1
            arg = sys.argv[num]
            mod_arg = modify_netdev_arg(arg)
            new_args.append(mod_arg)

            #append vhost flags if this is a vhost device
            # and -device is the next arg
            # i.e -device -opt1,-opt2,...,-opt3,%vhost
            if (num_vhost_devs < len(fd_list)):
                num += 1
                arg = sys.argv[num]
                if arg == '-device':
                    new_args.append(arg)
                    num += 1
                    new_arg = sys.argv[num]
                    for flag in vhost_flags:
                        new_arg = ''.join([new_arg, ',', flag])
                    new_args.append(new_arg)
                else:
                    new_args.append(arg)
        elif arg == '-mem-prealloc':
            mem_prealloc_set = 1
            new_args.append(arg)
        elif arg == '-mem-path':
            mem_path_set = 1
            new_args.append(arg)
        else:
            new_args.append(arg)

        num += 1

    #Set Qemu binary location
    emul_call += emul_path
    emul_call += " "

    #Add prealloc mem options if using vhost and not already added
    if ((len(fd_list) > 0) and (mem_prealloc_set == 0)):
        emul_call += "-mem-prealloc "

    #Add mempath mem options if using vhost and not already added
    if ((len(fd_list) > 0) and (mem_path_set == 0)):
        #Detect and add hugetlbfs mount point
        mp = find_huge_mount()
        mp = "".join(["-mem-path ", mp])
        emul_call += mp
        emul_call += " "

    #Add user options
    for opt in emul_opts_user:
        emul_call += opt
        emul_call += " "

    #Add user vhost-only options
    if len(fd_list) > 0:
        for opt in emul_opts_user_vhost:
            emul_call += opt
            emul_call += " "

    #Add updated libvirt options
    iter_args = iter(new_args)
    #skip 1st arg i.e. call to this script
    next(iter_args)
    for arg in iter_args:
        emul_call += str(arg)
        emul_call += " "

    emul_call += "-pidfile %s " % qemu_pid

    #Call QEMU
    process = subprocess.Popen(emul_call, shell=True, preexec_fn=os.setsid)

    for sig in [signal.SIGTERM, signal.SIGINT, signal.SIGHUP, signal.SIGQUIT]:
        signal.signal(sig, kill_qemu_process)

    process.wait()

    #Close usvhost files
    for fd in fd_list:
        os.close(fd)

    #Cleanup temporary files
    if os.access(qemu_pid, os.F_OK):
        os.remove(qemu_pid)

if __name__ == "__main__":
    main()
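The `-netdev` rewriting that the wrapper performs can be sketched in isolation. The snippet below (Python 3, using a hypothetical helper name `rewrite_netdev_arg` and a placeholder fd value rather than a descriptor opened from `us_vhost_path`) shows the same transformation: any `vhost` option is stripped, then `vhost=on` and a `vhostfd=<fd>` option are appended.

```python
def rewrite_netdev_arg(arg, fd):
    # Split the comma-separated -netdev option string and drop any
    # option beginning with "vhost" (mirrors the opt[:5] check above).
    opts = arg.split(",")
    kept = [o for o in opts if o[:5] != "vhost"]
    if len(kept) != len(opts):  # a vhost option was present
        kept += ["vhost=on", "vhostfd=%d" % fd]
    return ",".join(kept)

# A vhost-enabled netdev gains an explicit vhostfd; others pass through.
print(rewrite_netdev_arg("tap,id=hostnet0,ifname=tap0,vhost=on", 42))
```

Non-vhost `-netdev` arguments are returned unchanged, which is why the script can safely be used as the emulator for guests with a mix of vhost and non-vhost interfaces.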


@@ -252,11 +252,13 @@ usage(void)
daemon_usage();
vlog_usage();
printf("\nDPDK options:\n"
" --dpdk options Initialize DPDK datapath.\n");
" --dpdk options Initialize DPDK datapath.\n"
" --cuse_dev_name BASENAME override default character device name\n"
" for use with userspace vHost.\n");
printf("\nOther options:\n"
" --unixctl=SOCKET override default control socket name\n"
" -h, --help display this help message\n"
" -V, --version display version information\n");
" --unixctl=SOCKET override default control socket name\n"
" -h, --help display this help message\n"
" -V, --version display version information\n");
exit(EXIT_SUCCESS);
}