mirror of
https://github.com/openvswitch/ovs
synced 2025-08-31 06:15:47 +00:00
doc: Add "PMD" topic document
This continues the breakup of the huge DPDK "howto" into smaller components.
There are a couple of related changes included, such as using "Rx queue"
instead of "rxq" and noting how Tx queues cannot be configured.

Signed-off-by: Stephen Finucane <stephen@that.guru>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
committed by Ian Stokes
parent 6477dbb9d6
commit 31d0dae22a
@@ -35,6 +35,7 @@ DOC_SOURCE = \
	Documentation/topics/design.rst \
	Documentation/topics/dpdk/index.rst \
	Documentation/topics/dpdk/phy.rst \
	Documentation/topics/dpdk/pmd.rst \
	Documentation/topics/dpdk/ring.rst \
	Documentation/topics/dpdk/vhost-user.rst \
	Documentation/topics/testing.rst \
@@ -81,92 +81,6 @@ To stop ovs-vswitchd & delete bridge, run::
    $ ovs-appctl -t ovsdb-server exit
    $ ovs-vsctl del-br br0

PMD Thread Statistics
---------------------

To show current stats::

    $ ovs-appctl dpif-netdev/pmd-stats-show

To clear previous stats::

    $ ovs-appctl dpif-netdev/pmd-stats-clear

Port/RXQ Assigment to PMD Threads
---------------------------------

To show port/rxq assignment::

    $ ovs-appctl dpif-netdev/pmd-rxq-show

To change default rxq assignment to pmd threads, rxqs may be manually pinned to
desired cores using::

    $ ovs-vsctl set Interface <iface> \
        other_config:pmd-rxq-affinity=<rxq-affinity-list>

where:

- ``<rxq-affinity-list>`` is a CSV list of ``<queue-id>:<core-id>`` values

For example::

    $ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
        other_config:pmd-rxq-affinity="0:3,1:7,3:8"

This will ensure:

- Queue #0 pinned to core 3
- Queue #1 pinned to core 7
- Queue #2 not pinned
- Queue #3 pinned to core 8

After that PMD threads on cores where RX queues was pinned will become
``isolated``. This means that this thread will poll only pinned RX queues.

.. warning::
  If there are no ``non-isolated`` PMD threads, ``non-pinned`` RX queues will
  not be polled. Also, if provided ``core_id`` is not available (ex. this
  ``core_id`` not in ``pmd-cpu-mask``), RX queue will not be polled by any PMD
  thread.

If pmd-rxq-affinity is not set for rxqs, they will be assigned to pmds (cores)
automatically. The processing cycles that have been stored for each rxq
will be used where known to assign rxqs to pmd based on a round robin of the
sorted rxqs.

For example, in the case where here there are 5 rxqs and 3 cores (e.g. 3,7,8)
available, and the measured usage of core cycles per rxq over the last
interval is seen to be:

- Queue #0: 30%
- Queue #1: 80%
- Queue #3: 60%
- Queue #4: 70%
- Queue #5: 10%

The rxqs will be assigned to cores 3,7,8 in the following order:

Core 3: Q1 (80%) |
Core 7: Q4 (70%) | Q5 (10%)
core 8: Q3 (60%) | Q0 (30%)

To see the current measured usage history of pmd core cycles for each rxq::

    $ ovs-appctl dpif-netdev/pmd-rxq-show

.. note::

  A history of one minute is recorded and shown for each rxq to allow for
  traffic pattern spikes. An rxq's pmd core cycles usage changes due to traffic
  pattern or reconfig changes will take one minute before they are fully
  reflected in the stats.

Rxq to pmds assignment takes place whenever there are configuration changes
or can be triggered by using::

    $ ovs-appctl dpif-netdev/pmd-rxq-rebalance

QoS
---
@@ -31,3 +31,4 @@ The DPDK Datapath
   phy
   vhost-user
   ring
   pmd
@@ -113,3 +113,15 @@ tool::
For more information, refer to the `DPDK documentation <dpdk-drivers>`__.

.. _dpdk-drivers: http://dpdk.org/doc/guides/linux_gsg/linux_drivers.html

.. _dpdk-phy-multiqueue:

Multiqueue
----------

Poll Mode Driver (PMD) threads are the threads that do the heavy lifting for
the DPDK datapath. Correct configuration of PMD threads and the Rx queues they
utilize is a requirement in order to deliver the high performance possible with
DPDK acceleration. It is possible to configure multiple Rx queues for ``dpdk``
ports, thus ensuring this is not a bottleneck for performance. For information
on configuring PMD threads, refer to :doc:`pmd`.
Documentation/topics/dpdk/pmd.rst (new file, 161 lines)
@@ -0,0 +1,161 @@
..
      Licensed under the Apache License, Version 2.0 (the "License"); you may
      not use this file except in compliance with the License. You may obtain
      a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
      License for the specific language governing permissions and limitations
      under the License.

      Convention for heading levels in Open vSwitch documentation:

      =======  Heading 0 (reserved for the title in a document)
      -------  Heading 1
      ~~~~~~~  Heading 2
      +++++++  Heading 3
      '''''''  Heading 4

      Avoid deeper levels because they do not render well.
===========
PMD Threads
===========

Poll Mode Driver (PMD) threads are the threads that do the heavy lifting for
the DPDK datapath and perform tasks such as continuous polling of input ports
for packets, classifying packets once received, and executing actions on the
packets once they are classified.

PMD threads utilize Receive (Rx) and Transmit (Tx) queues, commonly known as
*rxq*\s and *txq*\s. While Tx queue configuration happens automatically, Rx
queues can be configured by the user. This can happen in one of two ways:

- For physical interfaces, configuration is done using the
  :program:`ovs-appctl` utility.

- For virtual interfaces, configuration is done using the :program:`ovs-appctl`
  utility, but this configuration must be reflected in the guest configuration
  (e.g. QEMU command line arguments).

The :program:`ovs-appctl` utility also provides a number of commands for
querying PMD threads and their respective queues. This, and all of the above,
is discussed here.

.. todo::

   Add an overview of Tx queues including numbers created, how they relate to
   PMD threads, etc.
PMD Thread Statistics
---------------------

To show current stats::

    $ ovs-appctl dpif-netdev/pmd-stats-show

To clear previous stats::

    $ ovs-appctl dpif-netdev/pmd-stats-clear
Port/Rx Queue Assignment to PMD Threads
---------------------------------------

.. todo::

   This needs a more detailed overview of *why* this should be done, along with
   the impact on things like NUMA affinity.

Correct configuration of PMD threads and the Rx queues they utilize is a
requirement in order to achieve maximum performance. This is particularly true
for enabling things like multiqueue for :ref:`physical <dpdk-phy-multiqueue>`
and :ref:`vhost-user <dpdk-vhost-user>` interfaces.

To show port/Rx queue assignment::

    $ ovs-appctl dpif-netdev/pmd-rxq-show

Rx queues may be manually pinned to cores. This will change the default Rx
queue assignment to PMD threads::

    $ ovs-vsctl set Interface <iface> \
        other_config:pmd-rxq-affinity=<rxq-affinity-list>

where:

- ``<rxq-affinity-list>`` is a CSV list of ``<queue-id>:<core-id>`` values

For example::

    $ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
        other_config:pmd-rxq-affinity="0:3,1:7,3:8"

This will ensure there are *4* Rx queues and that these queues are configured
like so:

- Queue #0 pinned to core 3
- Queue #1 pinned to core 7
- Queue #2 not pinned
- Queue #3 pinned to core 8
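Before applying an affinity list, its CSV format can be sanity-checked with a
small shell loop. This is a hypothetical helper for illustration only, not part
of OVS:

```shell
# Expand an <rxq-affinity-list> such as "0:3,1:7,3:8" into one
# "queue -> core" line per entry (hypothetical helper, not part of OVS).
affinity="0:3,1:7,3:8"
IFS=','
for entry in $affinity; do
    # Each entry is <queue-id>:<core-id>.
    echo "queue ${entry%%:*} -> core ${entry##*:}"
done
unset IFS
```

This prints one mapping per line, making it easy to spot a malformed entry
before handing the string to ``ovs-vsctl``.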
PMD threads on cores where Rx queues are *pinned* will become *isolated*. This
means that this thread will only poll the *pinned* Rx queues.

.. warning::

   If there are no *non-isolated* PMD threads, *non-pinned* Rx queues will not
   be polled. Also, if the provided ``<core-id>`` is not available (e.g. the
   ``<core-id>`` is not in ``pmd-cpu-mask``), the Rx queue will not be polled
   by any PMD thread.
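As a concrete illustration of the ``pmd-cpu-mask`` caveat, the hex mask that
covers cores 3, 7, and 8 (the cores used in this example) can be computed like
so; the ``ovs-vsctl`` line is shown as a comment for context only and assumes a
running OVS:

```shell
# Build a CPU mask with bits 3, 7 and 8 set, i.e. 0x188.
mask=0
for core in 3 7 8; do
    mask=$(( mask | (1 << core) ))
done
printf 'pmd-cpu-mask=0x%x\n' "$mask"
# To apply it (requires a running OVS):
#   $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x188
```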
If ``pmd-rxq-affinity`` is not set for Rx queues, they will be assigned to PMDs
(cores) automatically. Where known, the processing cycles that have been stored
for each Rx queue will be used to assign Rx queues to PMDs based on a round
robin of the sorted Rx queues. Take the following example, where there are five
Rx queues and three cores - 3, 7, and 8 - available, and the measured usage of
core cycles per Rx queue over the last interval is seen to be:

- Queue #0: 30%
- Queue #1: 80%
- Queue #3: 60%
- Queue #4: 70%
- Queue #5: 10%

The Rx queues will be assigned to the cores in the following order::

    Core 3: Q1 (80%) |
    Core 7: Q4 (70%) | Q5 (10%)
    Core 8: Q3 (60%) | Q0 (30%)
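The layout above can be reproduced with a simple greedy strategy: hand the next
busiest queue to whichever core currently carries the least load. This is an
illustrative sketch only, under that assumption; it is not the actual OVS
implementation:

```shell
# Queues sorted by measured load (descending), each as "queue:load".
load_3=0; load_7=0; load_8=0
for q in 1:80 4:70 3:60 0:30 5:10; do
    # Find the currently least-loaded core among 3, 7 and 8.
    best=3
    eval "bl=\$load_$best"
    for c in 7 8; do
        eval "cl=\$load_$c"
        if [ "$cl" -lt "$bl" ]; then best=$c; bl=$cl; fi
    done
    echo "Q${q%%:*} (${q##*:}%) -> core $best"
    eval "load_$best=\$(( bl + ${q##*:} ))"
done
```

Running this assigns Q1 to core 3, Q4 and Q5 to core 7, and Q3 and Q0 to
core 8, matching the table above.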
To see the current measured usage history of PMD core cycles for each Rx
queue::

    $ ovs-appctl dpif-netdev/pmd-rxq-show

.. note::

   A history of one minute is recorded and shown for each Rx queue to allow for
   traffic pattern spikes. Any changes in an Rx queue's PMD core cycles usage,
   due to traffic pattern or reconfig changes, will take one minute to be fully
   reflected in the stats.

Rx queue to PMD assignment takes place whenever there are configuration changes
or can be triggered by using::

    $ ovs-appctl dpif-netdev/pmd-rxq-rebalance
.. versionchanged:: 2.8.0

   Automatic assignment of Rx queues to PMDs and the two related commands,
   ``pmd-rxq-show`` and ``pmd-rxq-rebalance``, were added in OVS 2.8.0. Prior
   to this, behavior was round-robin and processing cycles were not taken into
   consideration. Tracking for stats was not available.

.. versionchanged:: 2.9.0

   The output of ``pmd-rxq-show`` was modified to include utilization as a
   percentage.
@@ -130,11 +130,10 @@ an additional set of parameters::
    -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
    -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2

In addition, QEMU must allocate the VM's memory on hugetlbfs. vhost-user ports
access a virtio-net device's virtual rings and packet buffers mapping the VM's
physical memory on hugetlbfs. To enable vhost-user ports to map the VM's memory
into their process address space, pass the following parameters to QEMU::

    -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on
    -numa node,memdev=mem -mem-prealloc
@@ -154,18 +153,18 @@ where:
    The number of vectors, which is ``$q`` * 2 + 2
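As a quick arithmetic check of the formula above (with ``$q`` = 4 queue pairs):

```shell
# vectors = q * 2 + 2; four queue pairs require 10 vectors.
q=4
vectors=$(( q * 2 + 2 ))
echo "$vectors"
```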
The vhost-user interface will be automatically reconfigured with the required
number of Rx and Tx queues after connection of the virtio device. Manual
configuration of ``n_rxq`` is not supported because OVS will work properly only
if ``n_rxq`` matches the number of queues configured in QEMU.

At least two PMDs should be configured for the vswitch when using multiqueue.
Using a single PMD will cause traffic to be enqueued to the same vhost queue
rather than being distributed among different vhost queues for a vhost-user
interface.

If traffic destined for a VM configured with multiqueue arrives to the vswitch
via a physical DPDK port, then the number of Rx queues should also be set to at
least two for that physical DPDK port. This is required to increase the
probability that a different PMD will handle the multiqueue transmission to the
guest using a different vhost queue.