mirror of https://github.com/openvswitch/ovs synced 2025-08-28 12:58:00 +00:00

30 Commits

Author SHA1 Message Date
Ilya Maximets
4cf89cb074 dpdk: Remove deprecated pdump support.
DPDK pdump was deprecated in 2.13 release and didn't actually
work since 2.11.  Removing it.

More details in commit 4ae8c4617fd3 ("dpdk: Deprecate pdump support.")

Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2020-03-06 12:41:04 +01:00
Ian Stokes
127b6a6eea dpdk: Update to use DPDK 19.11.
This commit adds support for DPDK v19.11. It includes the following
changes:

1. travis: Enable compilation and linkage with dpdk 19.11.

2. sparse: Remove dpdk network headers copies.

   https://patchwork.ozlabs.org/patch/1185256/

3. dpdk: Migrate to new PDUMP API.

   https://patchwork.ozlabs.org/patch/1192971/

4. netdev-dpdk: Prefix network structures with rte_.

   https://patchwork.ozlabs.org/patch/1109733/

5. netdev-dpdk: Update by new color definitions.

   https://patchwork.ozlabs.org/patch/1086089/

6. docs: Update docs to reference 19.11.

7. docs: Add note regarding hotplug and igb_uio requirements.

For credit, all authors of the original commits to 'dpdk-latest' with the
above changes have been added as co-authors for this commit.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Co-authored-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Co-authored-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
Co-authored-by: Ophir Munk <ophirmu@mellanox.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2019-12-04 20:51:57 +00:00
Ilya Maximets
4ae8c4617f dpdk: Deprecate pdump support.
The conventional way for packet dumping in OVS is to use ovs-tcpdump
that works via traffic mirroring.  DPDK pdump could probably be used
for some lower level debugging, but it is not commonly used for
various reasons.

There are lots of limitations for using this functionality in practice.
Most of them are connected with running a secondary pdump process and
memory layout issues, like the requirement to disable ASLR in the kernel.
More details are available in DPDK guide:
https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html#multi-process-limitations

Besides the functional limitations, it's also hard to use this
functionality correctly.  The user must be sure that OVS and the pdump
utility are running on different CPU cores, which is hard because non-PMD
threads could float over available CPU cores.  This or any other
misconfiguration will likely lead to a crash of the pdump utility
and/or OVS.

Another problem is that the user must actually have this special pdump
utility in the system, and it might not be available in distributions.

This change disables pdump support by default by introducing a special
configuration option '--enable-dpdk-pdump'.  Deprecation warnings will
be shown to users at configuration time and at runtime.

We plan to completely remove this functionality from OVS in one
of the next releases.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2019-11-19 20:52:46 +00:00
David Marchand
d0d1a76eac dpdk: Remove unneeded log message copy.
No need to duplicate and null-terminate the passed buffer.
We can directly give it to the vlog subsystem using a dynamic precision
in the format string.
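
The dynamic-precision idea can be sketched as follows. This is only an
illustration: the function name 'format_log_chunk' is hypothetical, and
'snprintf' stands in for the vlog call, which is not reproduced here.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a log write callback.  'buf' is NOT assumed
 * to be null-terminated; only the first 'size' bytes are valid. */
static int
format_log_chunk(char *out, size_t out_size, const char *buf, size_t size)
{
    /* "%.*s" takes the precision as an extra 'int' argument and reads at
     * most that many bytes from 'buf', so no temporary null-terminated
     * copy of the buffer is needed. */
    return snprintf(out, out_size, "%.*s", (int) size, buf);
}
```

Calling it with a 7-byte prefix of an unterminated buffer copies exactly
those 7 bytes into 'out' and terminates the result.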

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2019-09-26 09:28:25 +01:00
Ilya Maximets
15d8655b1f dpdk: Use ovs-numa provided functions to manage thread affinity.
This reduces code duplication and avoids using Linux-specific
functions (this might be useful in the future if we try to allow
running OvS+DPDK on FreeBSD).

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: William Tu <u9012063@gmail.com>
2019-09-06 11:45:39 +03:00
Ilya Maximets
1276e3db89 dpif-netdev-perf: Fix TSC frequency for non-DPDK case.
Unlike 'rte_get_tsc_cycles()', which doesn't need any specific
initialization, 'rte_get_tsc_hz()' can be used only after a successful
call to 'rte_eal_init()'. 'rte_eal_init()' estimates the TSC frequency
for later use by 'rte_get_tsc_hz()'.  Strictly speaking, we're not allowed
to use 'rte_get_tsc_cycles()' before initializing DPDK either, but it
works this way for now and provides correct results.

This patch provides TSC frequency estimation code that will be used
in two cases:
  * DPDK is not compiled in, i.e. DPDK_NETDEV not defined.
  * DPDK compiled in but not initialized,
    i.e. other_config:dpdk-init=false
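
The commit message does not say how the frequency is estimated.  One
common approach, shown here purely as an assumption, is to read the cycle
counter and a wall clock at two points in time and divide.  The function
name 'estimate_tsc_hz' and its parameters are hypothetical.

```c
#include <stdint.h>

/* Estimates a counter frequency in Hz from two samples of a cycle counter
 * and two matching timestamps in nanoseconds.  Returns 0 if no time has
 * elapsed.  Illustrative only; not the code from the patch. */
static uint64_t
estimate_tsc_hz(uint64_t cycles_start, uint64_t cycles_end,
                uint64_t ns_start, uint64_t ns_end)
{
    uint64_t elapsed_cycles = cycles_end - cycles_start;
    uint64_t elapsed_ns = ns_end - ns_start;

    if (!elapsed_ns) {
        return 0;
    }
    /* Hz = cycles / seconds = cycles * 1e9 / nanoseconds.  Floating point
     * is used so that long measurement intervals do not overflow. */
    return (uint64_t) ((double) elapsed_cycles * 1e9 / (double) elapsed_ns);
}
```

For example, 240,000,000 cycles counted over 100 ms correspond to a
2.4 GHz counter.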

This change is mostly useful for AF_XDP netdev support, i.e. it allows
using the dpif-netdev/pmd-perf-show command and various PMD perf metrics.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: William Tu <u9012063@gmail.com>
2019-09-06 11:45:39 +03:00
Ilya Maximets
4f746d526d netdev-offload: Rename offload providers.
Flow API providers renamed to be consistent with parent module
'netdev-offload' and look more like each other.

'_rte_' replaced with more convenient '_dpdk_'.

We'll have following structure:

  Common code:
    lib/netdev-offload-provider.h
    lib/netdev-offload.c
    lib/netdev-offload.h

  Providers:
    lib/netdev-offload-tc.c
    lib/netdev-offload-dpdk.c

'netdev-offload-dummy' still resides inside netdev-dummy, but it
makes little sense to move it out of there.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Roi Dayan <roid@mellanox.com>
2019-06-11 09:39:36 +03:00
Ilya Maximets
b6cabb8f8f netdev: Split up netdev offloading to separate module.
New module 'netdev-offload' created to manage different flow API
implementations. All the generic and provider independent code moved
there from the 'netdev' module.

Flow API providers further encapsulated.

The only function that was changed is 'netdev_any_oor'.
Now it uses offloading related hmap instead of common 'netdev_shash'.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Acked-by: Roi Dayan <roid@mellanox.com>
2019-06-11 09:39:36 +03:00
Ilya Maximets
5fc5c50f3d netdev: Dynamic per-port Flow API.
Current issues with Flow API:

* OVS calls offloading functions regardless of successful
  flow API initialization. (ex. on init_flow_api failure)
* Static initialization of Flow API for a netdev_class forbids
  having different offloading types for different instances
  of netdev with the same netdev_class. (ex. different vports in
  'system' and 'netdev' datapaths at the same time)

Solution:

* Move Flow API from the netdev_class to netdev instance.
* Make Flow API dynamic, i.e. probe the APIs and choose the
  suitable one.

Side effects:

* Flow API providers are localized as much as possible in their modules.
* Now we have an ability to make runtime checks. For example,
  we could check if particular device supports features we
  need, like if dpdk device supports RSS+MARK action.
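
The "probe the APIs and choose the suitable one" idea can be sketched
roughly like this.  All names below ('struct flow_api',
'netdev_assign_flow_api', the two example providers) are hypothetical
simplifications for illustration, not the structures used in OVS.

```c
#include <stddef.h>
#include <string.h>

struct netdev;

/* Hypothetical, simplified provider description. */
struct flow_api {
    const char *type;
    int (*init_flow_api)(struct netdev *);   /* Returns 0 on success. */
};

/* The Flow API pointer lives in the netdev instance, not in its class. */
struct netdev {
    const char *name;
    const struct flow_api *flow_api;         /* NULL if nothing fits. */
};

/* Example providers: each one accepts only the devices it can handle. */
static int
example_tc_init(struct netdev *netdev)
{
    return strncmp(netdev->name, "dpdk", 4) ? 0 : -1;
}

static int
example_dpdk_init(struct netdev *netdev)
{
    return strncmp(netdev->name, "dpdk", 4) ? -1 : 0;
}

static const struct flow_api example_tc_api = {
    "example-tc", example_tc_init
};
static const struct flow_api example_dpdk_api = {
    "example-dpdk", example_dpdk_init
};

/* Probes the providers in order and binds the first one whose init
 * succeeds to this particular netdev instance. */
static int
netdev_assign_flow_api(struct netdev *netdev,
                       const struct flow_api *const *providers, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!providers[i]->init_flow_api(netdev)) {
            netdev->flow_api = providers[i];
            return 0;
        }
    }
    netdev->flow_api = NULL;
    return -1;
}
```

Because the choice is made per instance and only after a successful init,
callers can skip offloading entirely when 'flow_api' is NULL.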

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Roi Dayan <roid@mellanox.com>
2019-06-11 09:39:36 +03:00
Liliia Butorina
30e834dcb5 netdev-dpdk: Post-copy Live Migration support for vhost-user-client.
Post-copy Live Migration for vHost supported since DPDK 18.11 and
QEMU 2.12. New global config option 'vhost-postcopy-support' added
to control this feature. Ex.:

  ovs-vsctl set Open_vSwitch . other_config:vhost-postcopy-support=true

Changing this value requires restarting the daemon. It's safe to
enable this knob even if QEMU doesn't support post-copy LM.

Feature marked as experimental and disabled by default because a page
fault may cause a PMD thread on the destination host to hang while the
page is downloaded from the source.

Feature is not compatible with 'mlockall' and 'dequeue zero-copy'.
Support added only for vhost-user-client.

Signed-off-by: Liliia Butorina <l.butorina@partner.samsung.com>
Co-authored-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2019-05-24 15:31:28 +03:00
Ilya Maximets
450ff2bcd7 dpdk: Stop dumping memzones to stdout.
Information about memzones reserved on init is not very useful.
Anyway, we need to log it in a more civilized manner, i.e. through
the OVS logging subsystem.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2019-03-19 09:20:16 +00:00
Ilya Maximets
6455316bd2 dpdk: Fix case-sensitivity of dpdk-init knob.
Before supporting the DPDK initialization status in DB 'dpdk-init' was
just a boolean and 'smap_get_bool', which is case-insensitive, was used
to get the value.

Current code uses simple 'strcmp' that fails to recognize values like
"True". As a result this breaks different OVS configuration tools.
For example, kolla-ansible uses 'other_config:dpdk-init=True' but OVS
is not able to recognize it leading to broken installations.

'strcasecmp' should be used instead to fix the issue.
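
The difference between the two comparisons can be sketched as below.  The
function names are hypothetical; the accepted keywords 'true' and 'try'
are the ones mentioned for dpdk-init in commit 3e52fa5644cd.

```c
#include <stdbool.h>
#include <string.h>
#include <strings.h>    /* strcasecmp() (POSIX). */

/* Case-sensitive check: rejects "True", "TRUE", "Try", ... */
static bool
dpdk_init_requested_strict(const char *value)
{
    return value && (!strcmp(value, "true") || !strcmp(value, "try"));
}

/* Case-insensitive check: accepts any capitalization of the keywords. */
static bool
dpdk_init_requested(const char *value)
{
    return value && (!strcasecmp(value, "true")
                     || !strcasecmp(value, "try"));
}
```

With the strict variant, 'other_config:dpdk-init=True' is silently treated
as "not requested", which matches the failure described above.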

CC: Aaron Conole <aconole@redhat.com>
Fixes: 3e52fa5644cd ("dpdk: reflect status and version in the database")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2019-03-04 06:55:03 +00:00
Ilya Maximets
8411b6ccec dpdk: Limit DPDK memory usage.
Since 18.05 release, DPDK moved to dynamic memory model in which
hugepages could be allocated on demand. At the same time '--socket-mem'
option was re-defined as a size of pre-allocated memory, i.e. memory
that should be allocated at startup and could not be freed.
So, DPDK with a new memory model could allocate more hugepage memory
than specified in '--socket-mem' or '-m' options.

This change adds a new configurable 'other_config:dpdk-socket-limit'
which can be used to limit the amount of memory DPDK can use.
It uses the new DPDK option '--socket-limit'.
Ex.:

  ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-limit="1024,1024"

Also, in order to preserve old behaviour, if '--socket-limit' is not
specified, it will be defaulted to the amount of memory specified by
'--socket-mem' option, i.e. OVS will not be able to allocate more.
This is needed, for example, to disallow OVS to allocate more memory
than reserved for it by Nova in OpenStack installations.
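
The defaulting rule described above can be sketched as a small helper.
The function name 'effective_socket_limit' is hypothetical and the real
argument-building code is not reproduced here.

```c
#include <stddef.h>

/* Picks the value to pass to '--socket-limit'.  If the user did not
 * configure a limit, fall back to the '--socket-mem' value so that DPDK
 * cannot allocate more than what was pre-allocated.  Returns NULL if
 * neither value is available. */
static const char *
effective_socket_limit(const char *socket_limit, const char *socket_mem)
{
    if (socket_limit && socket_limit[0] != '\0') {
        return socket_limit;
    }
    return socket_mem;
}
```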

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2019-02-01 12:57:17 +00:00
Ilya Maximets
68c00e3eed dpdk: Use svec instead of re-inventing.
No need to implement dynamic vector to store arguments.
'svec' perfectly covers all the needed functionality.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2019-01-30 16:09:01 +00:00
Ilya Maximets
9c68ca3432 dpdk: Use dynamic string for socket-mem construction.
No need to allocate memory and use 'strcat' directly.
'dynamic-string' could do this for us.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2019-01-28 20:46:40 +00:00
Ian Stokes
43307ad0e2 dpdk: Support both shared and per port mempools.
This commit re-introduces the concept of shared mempools as the default
memory model for DPDK devices. Per port mempools are still available but
must be enabled explicitly by a user.

OVS previously used a shared mempool model for ports with the same MTU
and socket configuration. This was replaced by a per port mempool model
to address issues flagged by users such as:

https://mail.openvswitch.org/pipermail/ovs-discuss/2016-September/042560.html

However the per port model potentially requires an increase in memory
resource requirements to support the same number of ports and configuration
as the shared port model.

This is considered a blocking factor for current deployments of OVS
when upgrading to future OVS releases as a user may have to redimension
memory for the same deployment configuration. This may not be possible for
users.

This commit resolves the issue by re-introducing shared mempools as
the default memory behaviour in OVS DPDK but also refactors the memory
configuration code to allow for per port mempools.

This patch adds a new global config option, per-port-memory, that
controls the enablement of per port mempools for DPDK devices.

    ovs-vsctl set Open_vSwitch . other_config:per-port-memory=true

This value defaults to false; to enable per port memory support,
this field should be set to true when setting other global parameters
on init (such as "dpdk-socket-mem", for example). Changing the value at
runtime is not supported, and requires restarting the vswitch
daemon.

The mempool sweep functionality is also replaced with the
sweep functionality from OVS 2.9 found in commits

c77f692 (netdev-dpdk: Free mempool only when no in-use mbufs.)
a7fb0a4 (netdev-dpdk: Add mempool reuse/free debug.)

A new document to discuss the specifics of the memory models and example
memory requirement calculations is also added.

Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Tiago Lam <tiago.lam@intel.com>
Tested-by: Tiago Lam <tiago.lam@intel.com>
2018-07-06 12:46:26 +01:00
Marcin Rybka
7189d54c54 OVS-DPDK: Change "dpdk-socket-mem" default value.
When "dpdk-socket-mem" and "dpdk-alloc-mem" are not specified,
"dpdk-socket-mem" will be set to allocate 1024MB on each NUMA node.
This change will prevent OVS from failing when NIC is attached on
NUMA node 1 and higher. Patch contains documentation update.
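
As a rough sketch of the new default, the per-node value can be repeated
once per NUMA node to form the comma-separated list.  The helper name
'build_default_socket_mem' and the way the node count is passed in are
assumptions for illustration.

```c
#include <stddef.h>
#include <stdio.h>

#define DEFAULT_SOCKET_MEM_MB 1024    /* Per-node default from this commit. */

/* Writes "1024,1024,..." (one entry per NUMA node) into 'out'.
 * Returns 0 on success, -1 if 'out' is too small. */
static int
build_default_socket_mem(char *out, size_t out_size, int n_numa_nodes)
{
    size_t used = 0;

    if (!out_size) {
        return -1;
    }
    out[0] = '\0';
    for (int i = 0; i < n_numa_nodes; i++) {
        int n = snprintf(out + used, out_size - used, "%s%d",
                         i ? "," : "", DEFAULT_SOCKET_MEM_MB);

        if (n < 0 || (size_t) n >= out_size - used) {
            return -1;    /* Output would have been truncated. */
        }
        used += (size_t) n;
    }
    return 0;
}
```

With two NUMA nodes this yields "1024,1024", so memory is available on
node 1 as well as node 0.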

Signed-off-by: Marcin Rybka <marcinx.rybka@intel.com>
Co-authored-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Billy O'Mahony <billy.o.mahony@intel.com>
Tested-by: Hariprasad Govindharajan <hariprasad.govindharajan@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-06-08 17:27:56 +01:00
Aaron Conole
3e52fa5644 dpdk: reflect status and version in the database
The normal way of retrieving the running DPDK status involves parsing
log files and issuing various incantations of ovs-vsctl and ovs-appctl
commands to determine whether the rte_eal_init successfully started.

This commit adds two new records to reflect the dpdk version, and
the dpdk initialization status.

To support this, the other_config:dpdk-init configuration block supports
the 'true' and 'try' keywords now, instead of just 'true'.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-05-25 09:09:50 +01:00
Aaron Conole
d7e2509e2f dpdk: allow init to fail
It's possible for dpdk initialization to fail either due to an internal
error or an invalid configuration.  When that happens, it's rather
impolite to immediately abort without any details.

With this change, a failed dpdk initialization attempt will continue to
trigger a SIGABRT.  However, the failure details will be logged, and a
user or administrator may have more information to correct the issue.
A restart of OvS would still be required to re-attempt initialization.

The refactor to propagate the init error will be used in an upcoming
commit.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-05-25 09:09:50 +01:00
Ilya Maximets
9fd38f6867 netdev-dpdk: Limit rate of DPDK logs.
DPDK can produce a huge amount of logs. For example, when a mempool is
exhausted in a vhost-user port, the following message will be
printed on each call to 'rte_vhost_dequeue_burst()':

    |ERR|VHOST_DATA: Failed to allocate memory for mbuf.

These messages increase the ovs-vswitchd.log size extremely fast,
making it unreadable and non-parsable by common Linux utilities like
grep, less etc. Moreover, a continuously growing log could exhaust the
HDD space in a few hours, breaking normal operation of the whole system.

To avoid such issues, the DPDK log rate is limited to 600 messages per
minute. This value is high because we still want to see many big logs like
the vhost-user configuration sequence. The debug messages are treated
separately to avoid loss of errors/warnings when intensive debug logging
is enabled in DPDK.
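
A per-minute limit like this is often implemented as a token bucket.  The
sketch below is a generic version under that assumption; the struct and
function names are hypothetical, the burst size is an assumed parameter,
and this is not the OVS vlog rate limiter itself.

```c
#include <stdbool.h>

/* Hypothetical token bucket for log messages. */
struct log_rate_limit {
    unsigned int rate;     /* Tokens (messages) refilled per minute. */
    unsigned int burst;    /* Maximum number of tokens in the bucket. */
    double tokens;         /* Tokens currently available. */
    long long last_ms;     /* Time of the last refill, in milliseconds. */
};

static void
log_rate_limit_init(struct log_rate_limit *rl, unsigned int rate,
                    unsigned int burst, long long now_ms)
{
    rl->rate = rate;
    rl->burst = burst;
    rl->tokens = burst;
    rl->last_ms = now_ms;
}

/* Returns true if a message may be logged at time 'now_ms'. */
static bool
log_should_emit(struct log_rate_limit *rl, long long now_ms)
{
    if (now_ms > rl->last_ms) {
        /* Refill proportionally to the elapsed time, capped at 'burst'. */
        rl->tokens += (double) (now_ms - rl->last_ms) * rl->rate / 60000.0;
        if (rl->tokens > rl->burst) {
            rl->tokens = rl->burst;
        }
        rl->last_ms = now_ms;
    }
    if (rl->tokens < 1.0) {
        return false;      /* Bucket is empty: drop the message. */
    }
    rl->tokens -= 1.0;
    return true;
}
```

Keeping one bucket for debug messages and another for everything else
would give the separation described above: a flood of debug output can
then never consume the tokens needed for errors and warnings.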

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-03-23 12:53:36 +00:00
Matteo Croce
40c23a57b8 vswitchd: show DPDK version
Show DPDK version if Open vSwitch is compiled with DPDK support.
Version can be retrieved with `ovs-vswitchd --version` or from OVS logs.
Small change in ovs-ctl to avoid breakage on output change.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-01-26 20:49:18 +00:00
Mark Kavanagh
a14d1cc8a7 netdev-dpdk: vHost IOMMU support
DPDK v17.11 introduces support for the vHost IOMMU feature.
This is a security feature, which restricts the vhost memory
that a virtio device may access.

This feature also enables the vhost REPLY_ACK protocol, the
implementation of which is known to work in newer versions of
QEMU (i.e. v2.10.0), but is buggy in older versions (v2.7.0 -
v2.9.0, inclusive). As such, the feature is disabled by default
(and should remain so) for the aforementioned older QEMU
versions. Starting with QEMU v2.9.1, vhost-iommu-support can
safely be enabled, even without having an IOMMU device, with
no performance penalty.

This patch adds a new global config option, vhost-iommu-support,
that controls enablement of the vhost IOMMU feature:

    ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true

This value defaults to false; to enable IOMMU support, this field
should be set to true when setting other global parameters on init
(such as "dpdk-socket-mem", for example). Changing the value at
runtime is not supported, and requires restarting the vswitch daemon.

Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-08 21:42:54 +00:00
Ilya Maximets
736ca516f3 dpdk: Redirect DPDK log to OVS logging subsystem.
This should be helpful for having all the logs in one place.
'ovs-appctl vlog' commands for the 'dpdk' module can be used
to configure the log level. The lower bound for DPDK logging
(--log-level) can still be passed through the 'dpdk-extra' field.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2017-03-09 17:33:26 -08:00
Ben Pfaff
5575908b01 dpdk: Use VLOG_INFO_ONCE instead of open-coding it.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Andy Zhou <azhou@ovn.org>
2017-03-08 19:39:06 -08:00
nickcooper-zhangtonghao
6c4f08e23f dpdk: Fixes memory leak in dpdk_init__().
If users configure the 'vhost-sock-dir' for dpdk, the memory
allocated by xstrdup(ovs_rundir()) is not freed. This patch
makes process_vhost_flags xstrdup() either val or default_val,
depending on the configuration, and the caller must free new_val
when it is no longer needed.

Fixes: 01961bbdd34a ("dpdk: New module with some code from netdev-dpdk.")
CC: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: nickcooper-zhangtonghao <nic@opencloud.tech>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2017-02-10 14:47:01 -08:00
Daniele Di Proietto
ec2b070143 dpdk: Late initialization.
With this commit, we allow the user to set other_config:dpdk-init=true
after the process is started.  This makes it easier to start Open
vSwitch with DPDK using standard init scripts without restarting the
service.

This is still far from ideal, because initializing DPDK might still
abort the process (e.g. if there is not enough memory), so the user must
check the status of the process after setting dpdk-init to true.

Nonetheless, I think this is an improvement, because it doesn't require
restarting the whole unit.

CC: Aaron Conole <aconole@redhat.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Aaron Conole <aconole@redhat.com>
2017-01-10 18:39:14 -08:00
Aaron Conole
71e2a07ad0 lib/dpdk: No more deferred release
DPDK documentation was recently updated to reflect that DPDK does not
hold any references to, nor take ownership of, the argv/argc elements.
With that understanding, let's just release the memory asap.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-12-21 17:50:07 -08:00
Aaron Conole
fe11b9e0be lib/dpdk: fix double free on exit
The DPDK EAL library expects all argc/argv arguments passed on the
command line to be in the form:

    progname dpdk arguments program arguments

This means the argv array will look something like:
   argv[0] = progname
   argv[1..x] = dpdk arguments
   argv[x..y] = program arguments

When the eal initialization routine completes, it will modify the argv array
to set argv[ret] = progname, such that the arguments can then be passed to
something like getopts for further processing.

When the dpdk arguments rework was initially added, the assignment mentioned
above was not considered.  This means two errors were introduced:
1. Leak of the element at argv[ret]
2. Double-free of the element at argv[0]
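
Both errors, and one possible way to avoid them, can be sketched as
follows.  This is an illustration under stated assumptions rather than
the actual fix: 'fake_eal_init' is a hypothetical stand-in that only
mimics the argv[ret] = progname assignment, and the second 'owned' array
is just one way to remember the original pointers.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for EAL initialization as described above: it
 * overwrites argv[ret] with the program name, argv[0]. */
static int
fake_eal_init(int argc, char **argv, int ret)
{
    if (ret > 0 && ret < argc) {
        argv[ret] = argv[0];
    }
    return ret;
}

/* Returns a heap-allocated copy of 's', or NULL on allocation failure. */
static char *
dup_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);

    if (copy) {
        memcpy(copy, s, len);
    }
    return copy;
}

/* Builds a heap-allocated argv, runs the fake init and releases every
 * string exactly once.  Returns the number of strings freed, or -1 on
 * allocation failure. */
static int
init_and_release_args(const char *const *args, int argc, int ret)
{
    char **argv = calloc((size_t) argc, sizeof *argv);
    char **owned = calloc((size_t) argc, sizeof *owned);
    int freed = 0;
    int failed = 0;

    if (!argv || !owned) {
        free(argv);
        free(owned);
        return -1;
    }
    for (int i = 0; i < argc; i++) {
        owned[i] = argv[i] = dup_string(args[i]);
        failed |= !owned[i];
    }
    if (!failed) {
        fake_eal_init(argc, argv, ret);
    }
    /* Free through 'owned', not 'argv': after init, argv[ret] aliases
     * argv[0], so walking 'argv' would free argv[0] twice and never free
     * the string that used to be at argv[ret]. */
    for (int i = 0; i < argc; i++) {
        if (owned[i]) {
            free(owned[i]);
            freed++;
        }
    }
    free(owned);
    free(argv);
    return failed ? -1 : freed;
}
```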

Reported-by: Ilya Maximets <i.maximets@samsung.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2016-November/325442.html
Fixes: bab694097133 ("netdev-dpdk: Convert initialization from cmdline to db")
Signed-off-by: Aaron Conole <aconole@redhat.com>
2016-12-12 11:41:42 -08:00
Ciara Loftus
a0cbc627a6 dpdk: Fix DPDK pdump compilation
The rte_pdump header file was not included in the file that requires it.
Fix this.

Fixes: 01961bbdd34a ("dpdk: New module with some code from netdev-dpdk.")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
2016-10-13 11:08:30 -07:00
Daniele Di Proietto
01961bbdd3 dpdk: New module with some code from netdev-dpdk.
There's a lot of code in netdev-dpdk which is not at all related to the
netdev interface, mostly the library initialization code.

This commit moves it to a new 'dpdk' module, to simplify 'netdev-dpdk'.

Also a new module 'dpdk-stub' is introduced to implement some functions
when DPDK is not available.  This replaces the old 'netdev-nodpdk'
module.

Some redundant includes are removed or reorganized as a consequence.

No functional change.

CC: Aaron Conole <aconole@redhat.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Tested-by: Aaron Conole <aconole@redhat.com>
2016-10-12 16:31:06 -07:00