Mirror of https://github.com/openvswitch/ovs (synced 2025-09-02 07:15:17 +00:00)
doc: Add "PMD" topic document
This continues the breakup of the huge DPDK "howto" into smaller
components. There are a couple of related changes included, such as using
"Rx queue" instead of "rxq" and noting how Tx queues cannot be configured.

Signed-off-by: Stephen Finucane <stephen@that.guru>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
committed by Ian Stokes
parent 6477dbb9d6
commit 31d0dae22a
Documentation/automake.mk

@@ -35,6 +35,7 @@ DOC_SOURCE = \
 	Documentation/topics/design.rst \
 	Documentation/topics/dpdk/index.rst \
 	Documentation/topics/dpdk/phy.rst \
+	Documentation/topics/dpdk/pmd.rst \
 	Documentation/topics/dpdk/ring.rst \
 	Documentation/topics/dpdk/vhost-user.rst \
 	Documentation/topics/testing.rst \
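The new topic only ships if it is listed in ``DOC_SOURCE``, as above. As a
minimal sketch for checking that the added page renders, assuming Sphinx is
installed and that, as in the OVS tree, the Sphinx ``conf.py`` lives under
``Documentation/`` (the output path is illustrative)::

    $ sphinx-build -b html Documentation/ Documentation/_build/html
    $ xdg-open Documentation/_build/html/topics/dpdk/pmd.html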
Documentation/howto/dpdk.rst

@@ -81,92 +81,6 @@ To stop ovs-vswitchd & delete bridge, run::
     $ ovs-appctl -t ovsdb-server exit
     $ ovs-vsctl del-br br0
 
-PMD Thread Statistics
----------------------
-
-To show current stats::
-
-    $ ovs-appctl dpif-netdev/pmd-stats-show
-
-To clear previous stats::
-
-    $ ovs-appctl dpif-netdev/pmd-stats-clear
-
-Port/RXQ Assigment to PMD Threads
----------------------------------
-
-To show port/rxq assignment::
-
-    $ ovs-appctl dpif-netdev/pmd-rxq-show
-
-To change default rxq assignment to pmd threads, rxqs may be manually pinned to
-desired cores using::
-
-    $ ovs-vsctl set Interface <iface> \
-        other_config:pmd-rxq-affinity=<rxq-affinity-list>
-
-where:
-
-- ``<rxq-affinity-list>`` is a CSV list of ``<queue-id>:<core-id>`` values
-
-For example::
-
-    $ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
-        other_config:pmd-rxq-affinity="0:3,1:7,3:8"
-
-This will ensure:
-
-- Queue #0 pinned to core 3
-- Queue #1 pinned to core 7
-- Queue #2 not pinned
-- Queue #3 pinned to core 8
-
-After that PMD threads on cores where RX queues was pinned will become
-``isolated``. This means that this thread will poll only pinned RX queues.
-
-.. warning::
-  If there are no ``non-isolated`` PMD threads, ``non-pinned`` RX queues will
-  not be polled. Also, if provided ``core_id`` is not available (ex. this
-  ``core_id`` not in ``pmd-cpu-mask``), RX queue will not be polled by any PMD
-  thread.
-
-If pmd-rxq-affinity is not set for rxqs, they will be assigned to pmds (cores)
-automatically. The processing cycles that have been stored for each rxq
-will be used where known to assign rxqs to pmd based on a round robin of the
-sorted rxqs.
-
-For example, in the case where here there are 5 rxqs and 3 cores (e.g. 3,7,8)
-available, and the measured usage of core cycles per rxq over the last
-interval is seen to be:
-
-- Queue #0: 30%
-- Queue #1: 80%
-- Queue #3: 60%
-- Queue #4: 70%
-- Queue #5: 10%
-
-The rxqs will be assigned to cores 3,7,8 in the following order:
-
-Core 3: Q1 (80%) |
-Core 7: Q4 (70%) | Q5 (10%)
-core 8: Q3 (60%) | Q0 (30%)
-
-To see the current measured usage history of pmd core cycles for each rxq::
-
-    $ ovs-appctl dpif-netdev/pmd-rxq-show
-
-.. note::
-
-  A history of one minute is recorded and shown for each rxq to allow for
-  traffic pattern spikes. An rxq's pmd core cycles usage changes due to traffic
-  pattern or reconfig changes will take one minute before they are fully
-  reflected in the stats.
-
-Rxq to pmds assignment takes place whenever there are configuration changes
-or can be triggered by using::
-
-    $ ovs-appctl dpif-netdev/pmd-rxq-rebalance
-
 QoS
 ---
Documentation/topics/dpdk/index.rst

@@ -31,3 +31,4 @@ The DPDK Datapath
    phy
    vhost-user
    ring
+   pmd
Documentation/topics/dpdk/phy.rst

@@ -113,3 +113,15 @@ tool::
 For more information, refer to the `DPDK documentation <dpdk-drivers>`__.
 
 .. _dpdk-drivers: http://dpdk.org/doc/guides/linux_gsg/linux_drivers.html
+
+.. _dpdk-phy-multiqueue:
+
+Multiqueue
+----------
+
+Poll Mode Driver (PMD) threads are the threads that do the heavy lifting for
+the DPDK datapath. Correct configuration of PMD threads and the Rx queues they
+utilize is a requirement in order to deliver the high performance possible with
+DPDK acceleration. It is possible to configure multiple Rx queues for ``dpdk``
+ports, thus ensuring this is not a bottleneck for performance. For information
+on configuring PMD threads, refer to :doc:`pmd`.
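As a brief, illustrative complement to the new section (the port name
``dpdk-p0`` is a placeholder), the number of Rx queues for a physical port is
requested via the ``n_rxq`` option::

    $ ovs-vsctl set Interface dpdk-p0 options:n_rxq=2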
Documentation/topics/dpdk/pmd.rst (new file, 161 lines)

@@ -0,0 +1,161 @@
+..
+      Licensed under the Apache License, Version 2.0 (the "License"); you may
+      not use this file except in compliance with the License. You may obtain
+      a copy of the License at
+
+          http://www.apache.org/licenses/LICENSE-2.0
+
+      Unless required by applicable law or agreed to in writing, software
+      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+      License for the specific language governing permissions and limitations
+      under the License.
+
+      Convention for heading levels in Open vSwitch documentation:
+
+      =======  Heading 0 (reserved for the title in a document)
+      -------  Heading 1
+      ~~~~~~~  Heading 2
+      +++++++  Heading 3
+      '''''''  Heading 4
+
+      Avoid deeper levels because they do not render well.
+
+===========
+PMD Threads
+===========
+
+Poll Mode Driver (PMD) threads are the threads that do the heavy lifting for
+the DPDK datapath and perform tasks such as continuous polling of input ports
+for packets, classifying packets once received, and executing actions on the
+packets once they are classified.
+
+PMD threads utilize Receive (Rx) and Transmit (Tx) queues, commonly known as
+*rxq*\s and *txq*\s. While Tx queue configuration happens automatically, Rx
+queues can be configured by the user. This can happen in one of two ways:
+
+- For physical interfaces, configuration is done using the
+  :program:`ovs-appctl` utility.
+
+- For virtual interfaces, configuration is done using the :program:`ovs-appctl`
+  utility, but this configuration must be reflected in the guest configuration
+  (e.g. QEMU command line arguments).
+
+The :program:`ovs-appctl` utility also provides a number of commands for
+querying PMD threads and their respective queues. This, and all of the above,
+is discussed here.
+
+.. todo::
+
+   Add an overview of Tx queues, including how many are created and how they
+   relate to PMD threads.
+
+PMD Thread Statistics
+---------------------
+
+To show current stats::
+
+    $ ovs-appctl dpif-netdev/pmd-stats-show
+
+To clear previous stats::
+
+    $ ovs-appctl dpif-netdev/pmd-stats-clear
+
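These statistics are reported per PMD thread. For context, the PMD threads
themselves are created according to the ``pmd-cpu-mask`` value; a minimal
sketch, assuming cores 1 and 2 are to be dedicated to PMDs (mask ``0x6`` sets
bits 1 and 2)::

    $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x6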
+Port/Rx Queue Assignment to PMD Threads
+---------------------------------------
+
+.. todo::
+
+   This needs a more detailed overview of *why* this should be done, along
+   with the impact on things like NUMA affinity.
+
+Correct configuration of PMD threads and the Rx queues they utilize is a
+requirement in order to achieve maximum performance. This is particularly true
+for enabling things like multiqueue for :ref:`physical <dpdk-phy-multiqueue>`
+and :ref:`vhost-user <dpdk-vhost-user>` interfaces.
+
+To show port/Rx queue assignment::
+
+    $ ovs-appctl dpif-netdev/pmd-rxq-show
+
+Rx queues may be manually pinned to cores. This will change the default Rx
+queue assignment to PMD threads::
+
+    $ ovs-vsctl set Interface <iface> \
+        other_config:pmd-rxq-affinity=<rxq-affinity-list>
+
+where:
+
+- ``<rxq-affinity-list>`` is a CSV list of ``<queue-id>:<core-id>`` values
+
+For example::
+
+    $ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
+        other_config:pmd-rxq-affinity="0:3,1:7,3:8"
+
+This will ensure there are *4* Rx queues and that these queues are configured
+like so:
+
+- Queue #0 pinned to core 3
+- Queue #1 pinned to core 7
+- Queue #2 not pinned
+- Queue #3 pinned to core 8
+
+PMD threads on cores where Rx queues are *pinned* will become *isolated*. This
+means that these threads will only poll the *pinned* Rx queues.
+
+.. warning::
+
+   If there are no *non-isolated* PMD threads, *non-pinned* Rx queues will not
+   be polled. Also, if the provided ``<core-id>`` is not available (e.g. the
+   ``<core-id>`` is not in ``pmd-cpu-mask``), the Rx queue will not be polled
+   by any PMD thread.
+
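A pinning can be sanity-checked with the same show command. The output below
is an illustrative sketch only; the exact field layout varies across releases
and ``dpdk-p0`` is a placeholder port name::

    $ ovs-appctl dpif-netdev/pmd-rxq-show
    pmd thread numa_id 0 core_id 3:
        isolated : true
        port: dpdk-p0   queue-id: 0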
+If ``pmd-rxq-affinity`` is not set for Rx queues, they will be assigned to
+PMDs (cores) automatically. Where known, the processing cycles that have been
+stored for each Rx queue will be used to assign Rx queues to PMDs based on a
+round robin of the sorted Rx queues. Take the following example, where there
+are five Rx queues and three cores - 3, 7, and 8 - available, and the measured
+usage of core cycles per Rx queue over the last interval is seen to be:
+
+- Queue #0: 30%
+- Queue #1: 80%
+- Queue #3: 60%
+- Queue #4: 70%
+- Queue #5: 10%
+
+The Rx queues will be assigned to the cores in the following order::
+
+    Core 3: Q1 (80%) |
+    Core 7: Q4 (70%) | Q5 (10%)
+    Core 8: Q3 (60%) | Q0 (30%)
+
+To see the current measured usage history of PMD core cycles for each Rx
+queue::
+
+    $ ovs-appctl dpif-netdev/pmd-rxq-show
+
+.. note::
+
+   A history of one minute is recorded and shown for each Rx queue to allow
+   for traffic pattern spikes. Any changes in an Rx queue's PMD core cycles
+   usage, due to traffic pattern or reconfig changes, will take one minute to
+   be fully reflected in the stats.
+
+Rx queue to PMD assignment takes place whenever there are configuration
+changes or can be triggered by using::
+
+    $ ovs-appctl dpif-netdev/pmd-rxq-rebalance
+
+.. versionchanged:: 2.8.0
+
+   Automatic assignment of Rx queues to PMDs and the two related commands,
+   ``pmd-rxq-show`` and ``pmd-rxq-rebalance``, were added in OVS 2.8.0. Prior
+   to this, behavior was round-robin and processing cycles were not taken into
+   consideration. Tracking for stats was not available.
+
+.. versionchanged:: 2.9.0
+
+   The output of ``pmd-rxq-show`` was modified to include utilization as a
+   percentage.
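Tying the above together, a sketch of a manual rebalance workflow: inspect the
current assignment, trigger a rebalance, then confirm the result::

    $ ovs-appctl dpif-netdev/pmd-rxq-show
    $ ovs-appctl dpif-netdev/pmd-rxq-rebalance
    $ ovs-appctl dpif-netdev/pmd-rxq-show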
Documentation/topics/dpdk/vhost-user.rst

@@ -130,11 +130,10 @@ an additional set of parameters::
     -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
     -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
 
-In addition, QEMU must allocate the VM's memory on hugetlbfs. vhost-user
-ports access a virtio-net device's virtual rings and packet buffers mapping the
-VM's physical memory on hugetlbfs. To enable vhost-user ports to map the VM's
-memory into their process address space, pass the following parameters to
-QEMU::
+In addition, QEMU must allocate the VM's memory on hugetlbfs. vhost-user ports
+access a virtio-net device's virtual rings and packet buffers mapping the VM's
+physical memory on hugetlbfs. To enable vhost-user ports to map the VM's memory
+into their process address space, pass the following parameters to QEMU::
 
     -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on
     -numa node,memdev=mem -mem-prealloc
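The ``mem-path`` used above presumes hugepages are already reserved and
mounted on the host; a minimal sketch of that prerequisite (the page count is
illustrative)::

    $ echo 2048 | sudo tee /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
    $ sudo mount -t hugetlbfs none /dev/hugepages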
@@ -154,18 +153,18 @@ where:
     The number of vectors, which is ``$q`` * 2 + 2
 
 The vhost-user interface will be automatically reconfigured with required
-number of rx and tx queues after connection of virtio device. Manual
+number of Rx and Tx queues after connection of virtio device. Manual
 configuration of ``n_rxq`` is not supported because OVS will work properly only
 if ``n_rxq`` will match number of queues configured in QEMU.
 
-A least 2 PMDs should be configured for the vswitch when using multiqueue.
+At least two PMDs should be configured for the vswitch when using multiqueue.
 Using a single PMD will cause traffic to be enqueued to the same vhost queue
 rather than being distributed among different vhost queues for a vhost-user
 interface.
 
 If traffic destined for a VM configured with multiqueue arrives to the vswitch
-via a physical DPDK port, then the number of rxqs should also be set to at
-least 2 for that physical DPDK port. This is required to increase the
+via a physical DPDK port, then the number of Rx queues should also be set to at
+least two for that physical DPDK port. This is required to increase the
 probability that a different PMD will handle the multiqueue transmission to the
 guest using a different vhost queue.
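The context lines of this hunk refer to ``$q`` (the queue count) and a vector
count of ``$q`` * 2 + 2. An illustrative QEMU command-line fragment under those
assumptions, with ``$q`` = 4 and therefore ``vectors=10`` (IDs, MAC address,
and socket path are placeholders)::

    -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1
    -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce,queues=4
    -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mq=on,vectors=10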