2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-28 21:07:47 +00:00

318 Commits

Author SHA1 Message Date
Ilya Maximets
ac1a9bb93f netdev-dpdk: Fix xstats leak on port destruction.
CC: Michal Weglicki <michalx.weglicki@intel.com>
Fixes: 971f4b394c6e ("netdev: Custom statistics.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-01-26 20:49:18 +00:00
Ilya Maximets
34eb086342 netdev-dpdk: Fix memory leak in netdev_dpdk_configure_xstats().
CC: Michal Weglicki <michalx.weglicki@intel.com>
Fixes: 971f4b394c6e ("netdev: Custom statistics.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-01-26 20:49:18 +00:00
Ilya Maximets
526259f22c netdev-dpdk: Fix memory leak in netdev_dpdk_get_custom_stats().
CC: Michal Weglicki <michalx.weglicki@intel.com>
Fixes: 971f4b394c6e ("netdev: Custom statistics.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-01-26 20:49:18 +00:00
Yuanhan Liu
5e75881868 netdev-dpdk: fix port addition for ports sharing same PCI id
Some NICs have only one PCI address associated with multiple ports. This
patch extends the dpdk-devargs option's format to cater for such devices.

To achieve that, this patch uses a new syntax that will be adapted and
implemented in future DPDK release (likely, v18.05):
    http://dpdk.org/ml/archives/dev/2017-December/084234.html

And since it's the DPDK duty to parse the (complete and full) syntax
and this patch is more likely to serve as an intermediate workaround,
here I take a simpler and shorter syntax from it (note it's allowed to
have only one category being provided):
    class=eth,mac=00:11:22:33:44:55:66

Also, old compatibility is kept. Users can still go on with using the
PCI id to add a port (if that's enough for them). Meaning, this patch
will not break anything.

This patch is basically based on the one from Ciara:
    https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/339496.html

Cc: Loftus Ciara <ciara.loftus@intel.com>
Cc: Thomas Monjalon <thomas@monjalon.net>
Cc: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-01-26 20:49:18 +00:00
Ian Stokes
f6f50552a3 netdev-dpdk: Fix requested MTU size validation.
This commit replaces MTU_TO_FRAME_LEN(mtu) with MTU_TO_MAX_FRAME_LEN(mtu)
in netdev_dpdk_set_mtu(), in order to determine if the total length of
the L2 frame with an MTU of ’mtu’ exceeds NETDEV_DPDK_MAX_PKT_LEN.

When setting an MTU we first check if the requested total frame length
(which includes associated L2 overhead) will exceed the maximum
frame length supported in netdev_dpdk_set_mtu(). The frame length is
calculated by MTU_TO_FRAME_LEN  as MTU + ETHER_HEADER + ETHER_CRC. The MTU
for the device will be set at a later stage in dpdk_eth_dev_init() using
rte_eth_dev_set_mtu(mtu).

However when using rte_eth_dev_set_mtu(mtu) the calculation used to check
that the frame does not exceed the max frame length for that device varies
between DPDK device drivers. For example ixgbe driver calculates the
frame length for a given MTU as

mtu + ETHER_HDR_LEN + ETHER_CRC_LEN

i40e driver calculates it as

mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + I40E_VLAN_TAG_SIZE * 2

em driver calculates it as

mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + VLAN_TAG_SIZE

Currently it is possible to set an MTU for a netdev_dpdk device that exceeds
the upper limit MTU for that devices DPDK driver. This leads to a segfault.
This is because the frame length comparison as is, does not take into account
the addition of the vlan tag overhead expected in the drivers. The
netdev_dpdk_set_mtu() call will incorrectly succeed but the subsequent
dpdk_eth_dev_init() will fail before the queues have been created for the
DPDK device. This coupled with assumptions regarding reconfiguration
requirements for the netdev will lead to a segfault when the rxq is polled
for this device.

A simple way to avoid this is by using MTU_TO_MAX_FRAME_LEN(mtu) when
validating a requested MTU in netdev_dpdk_set_mtu().
MTU_TO_MAX_FRAME_LEN(mtu) is equivalent to the following:

mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + (2 * VLAN_HEADER_LEN)

By using MTU_TO_MAX_FRAME_LEN at the netdev_dpdk_set_mtu() stage, OvS
now takes into account the maximum L2 overhead that a DPDK driver could
allow for in its frame size calculation. This allows OVS to flag an error
rather than the DPDK driver if the frame length exceeds the max DPDK frame
length. OVS can fail gracefully at this point and use the default MTU of
1500 to continue to configure the port.

Note: this fix is a work around, a better approach would be if DPDK devices
could report the maximum MTU value that can be requested on a per device
basis. This capability however is not currently available. A downside of
this patch is that the MTU upper limit will be reduced by 8 bytes for
DPDK devices that do not need to account for vlan tags in the frame length
driver calculations e.g. ixgbe devices upper MTU limit is reduced from
the OVS point of view from 9710 to 9702.

CC: Mark Kavanagh <mark.b.kavanagh@intel.com>
Fixes: 0072e931 ("netdev-dpdk: add support for jumbo frames")
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Co-authored-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
2018-01-26 20:49:18 +00:00
Flavio Leitner
b2e8b12f8a netdev-dpdk: add vhost-user get_status.
Expose relevant vhost-user information in status.

Signed-off-by: Flavio Leitner <fbl@sysclose.org>
Tested-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-01-17 18:12:46 +00:00
zhangliping
4c47ddde34 netdev-dpdk: fix ingress_policer leak on error path
Fix memory leak by freeing the policer if rte_meter_srtcm_config fails.

Fixes: 9509913aa722 ("netdev-dpdk.c: Add ingress-policing functionality.")
Signed-off-by: zhangliping <zhangliping02@baidu.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-01-17 18:11:28 +00:00
Michal Weglicki
971f4b394c netdev: Custom statistics.
- New get_custom_stats interface function is added to netdev. It
  allows particular netdev implementation to expose custom
  counters in dictionary format (counter name/counter value).
- New statistics are retrieved using experimenter code and
  are printed as a result to ofctl dump-ports.
- New counters are available for OpenFlow 1.4+.
- New statistics are printed to output via ofctl only if those
  are present in reply message.
- New statistics definition is added to include/openflow/intel-ext.h.
- Custom statistics are implemented only for dpdk-physical
  port type.
- DPDK-physical implementation uses xstats to collect statistics.
  Only dropped and error counters are exposed.

Co-authored-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-01-10 15:29:13 -08:00
Ilya Maximets
ad8b0b4fe7 netdev: Remove useless cutlen.
Cutlen already applied while processing OVS_ACTION_ATTR_OUTPUT.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com
2017-12-20 21:07:46 +00:00
Ilya Maximets
b30896c969 netdev: Remove unused may_steal.
Not needed anymore because 'may_steal' already handled on
dpif-netdev layer and always true.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com
2017-12-20 21:07:46 +00:00
Ilya Maximets
be48173310 netdev-dpdk: Add debug appctl to get mempool information.
New appctl 'netdev-dpdk/get-mempool-info' implemented to get result
of 'rte_mempool_list_dump()' function if no arguments passed and
'rte_mempool_dump()' if DPDK netdev passed as argument.

Could be used for debugging mbuf leaks and other mempool related
issues. Most useful in pair with `grep -v "cache_count.*=0"`.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Tested-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Antonio Fischetti <antonio.fischetti@intel.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-20 21:07:46 +00:00
Michal Weglicki
3eb8d4fa0d netdev-dpdk: extend netdev_dpdk_get_status to include if_type and if_descr
This commit extends netdev_dpdk_get_status API to include additional
driver-related information: if_type and if_descr.

v2->v3: Code rebase.
v3->v4: Minor comments applied.
v5->v6: Adds DPDK port specific description in documentation.

Co-authored-by: Michal Weglicki <michalx.weglicki@intel.com>
Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
Signed-off-by: Przemyslaw Szczerbik <przemyslawx.szczerbik@intel.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-08 21:42:54 +00:00
Mark Kavanagh
a14d1cc8a7 netdev-dpdk: vHost IOMMU support
DPDK v17.11 introduces support for the vHost IOMMU feature.
This is a security feature, which restricts the vhost memory
that a virtio device may access.

This feature also enables the vhost REPLY_ACK protocol, the
implementation of which is known to work in newer versions of
QEMU (i.e. v2.10.0), but is buggy in older versions (v2.7.0 -
v2.9.0, inclusive). As such, the feature is disabled by default
in (and should remain so), for the aforementioned older QEMU
verions. Starting with QEMU v2.9.1, vhost-iommu-support can
safely be enabled, even without having an IOMMU device, with
no performance penalty.

This patch adds a new global config option, vhost-iommu-support,
that controls enablement of the vhost IOMMU feature:

    ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true

This value defaults to false; to enable IOMMU support, this field
should be set to true when setting other global parameters on init
(such as "dpdk-socket-mem", for example). Changing the value at
runtime is not supported, and requires restarting the vswitch daemon.

Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-08 21:42:54 +00:00
Mark Kavanagh
5e925ccc2a netdev-dpdk: DPDK v17.11 upgrade
This commit adds support for DPDK v17.11:
- minor updates to accomodate DPDK API changes
- update references to DPDK version in Documentation
- update DPDK version in travis' linux-build script
- document DPDK v17.11 virtio driver bug

Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Jan Scheurich <jan.scheurich@ericsson.com>
Tested-by: Jan Scheurich <jan.scheurich@ericsson.com>
Tested-by: Guoshuai Li <ligs@dtdream.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-08 21:42:54 +00:00
Kevin Traynor
255b7bda98 netdev-dpdk: Remove uneeded call to rte_eth_dev_count().
The call to rte_eth_dev_count() was added as workaround
for rte_eth_dev_get_port_by_name() not handling cases
when there was no DPDK ports.

In versions of DPDK >= 17.02 rte_eth_dev_get_port_by_name()
does handle this case (DPDK commit f9ae888b1e19).
rte_eth_dev_count() is no longer needed so remove it.

Acked-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-08 21:42:54 +00:00
Ilya Maximets
b2e72a9c9d netdev-dpdk: Add comment about variables naming convention.
It'll be nice to document current naming convention for variables of
the following types used in netdev-dpdk:

	* netdev
	* netdev_dpdk
	* netdev_rxq
	* netdev_rxq_dpdk

to be sure that we will not return to chaos which was before
commit d46285a2206f ("netdev-dpdk: Consistent variable naming.").

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-12-08 21:42:54 +00:00
Ilya Maximets
3d0d5ab153 netdev-dpdk: Fix variables naming in set_admin_state function.
Function 'netdev_dpdk_set_admin_state()' was missed while fixing
variables naming according to the following convention:

    'struct netdev':'netdev'
    'struct netdev_dpdk':'dev'
    'struct netdev_rxq':'rxq'
    'struct netdev_rxq_dpdk':'rx'

Fixes: d46285a2206f ("netdev-dpdk: Consistent variable naming.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokess <ian.stokes@intel.com>
2017-12-08 21:42:54 +00:00
Ilya Maximets
af5b0dad30 netdev-dpdk: Fix mempool creation with large MTU.
Currently mempool name size limited to 25 characters by
RTE_MEMPOOL_NAMESIZE. netdev-dpdk tries to create mempool with the
following name pattern: "ovs_%{hash}_%{socket}_%{mtu}_%{n_mbuf}".

We have 3 chars for "ovs" + 4 chars for delimiters + 8 chars for
hash (because it's the 32 bit integer printed in hex) + 1 char for
socket_id (mostly 1, but it could be 2 on some systems; larger?) = 16.

Only 25 - 16 = 9 characters remains for mtu + n_mbufs.
Minimum usual value for mtu is 1500 --> 2030 (4 chars) after
dpdk_buf_size conversion and the minimum value for n_mbufs is 16384
(5 chars). So, all the 9 characters are used.

If we'll try to create port with mtu = 9500, mempool creation will
fail, because FRAME_LEN_TO_MTU(dpdk_buf_size(9500)) = 10222 (5 chars)
and this value will overflow the RTE_MEMPOOL_NAMESIZE limit.

Same issue will happen if we'll try to create port with big enough
number of queues or will try to create big enough number of PMD
threads (number of tx queues will enlarge the mempool requirements).

Fix that by removing the delimiters. To keep the readability (at least
partial) of the mempool names exact field sizes with zero padding
are used.

Following limits should be suitable for now:
 - Hash length: 8 chars (uint32_t in hex)
 - Socket ID  : 2 chars (For systems with up to 10 sockets)
 - MTU        : 5 chars (MTU (10^5 - 1) should be enough for now)
 - n_mbufs    : 7 chars (Up to 10^7 of mbufs)

   Total      : 22 + 3 (for "ovs") = 25

CC: Antonio Fischetti <antonio.fischetti@intel.com>
CC: Robert Wojciechowicz <robertx.wojciechowicz@intel.com>
Fixes: f06546a51dd8 ("Fix mempool names to reflect socket id.")
Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each port.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Antonio Fischetti <antonio.fischetti@intel.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Tested-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-11-17 16:26:33 +00:00
Ilya Maximets
daf22bf7a8 netdev-dpdk: Fix calling vhost API with negative vid.
Currently, rx and tx functions for vhost interfaces always obtain
'vid' twice. First time inside 'is_vhost_running' for checking
the value and the second time in enqueue/dequeue function calls to
send/receive packets. But second time we're not checking the
returned value. If vhost device will be destroyed between
checking and enqueue/dequeue, DPDK API will be called with
'-1' instead of valid 'vid'. DPDK API does not validate the 'vid'.
This leads to getting random memory value as a pointer to internal
device structure inside DPDK. Access by this pointer leads to
segmentation fault. For example:

  |00503|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_GET_VRING_BASE
  [New Thread 0x7fb6754910 (LWP 21246)]

  Program received signal SIGSEGV, Segmentation fault.
  rte_vhost_enqueue_burst at lib/librte_vhost/virtio_net.c:630
  630             if (dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF))
  (gdb) bt full
  #0  rte_vhost_enqueue_burst at lib/librte_vhost/virtio_net.c:630
          dev = 0xffffffff
  #1  __netdev_dpdk_vhost_send at lib/netdev-dpdk.c:1803
          tx_pkts = <optimized out>
          cur_pkts = 0x7f340084f0
          total_pkts = 32
          dropped = 0
          i = <optimized out>
          retries = 0
  ...
  (gdb) p *((struct netdev_dpdk *) netdev)
  $8 = { ... ,
        flags = (NETDEV_UP | NETDEV_PROMISC), ... ,
        vid = {v = -1},
        vhost_reconfigured = false, ... }

Issue can be reproduced by stopping DPDK application (testpmd) inside
guest while heavy traffic flows to this VM.

Fix that by obtaining and checking the 'vid' only once.

CC: Ciara Loftus <ciara.loftus@intel.com>
Fixes: 0a0f39df1d5a ("netdev-dpdk: Add support for DPDK 16.07")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Billy O'Mahony <billy.o.mahony@intel.com>
Acked-by: Billy O'Mahony <billy.o.mahony@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-11-16 16:24:11 +00:00
Ilya Maximets
bc57ed901f netdev-dpdk: Remove unused MAX_NB_MBUF.
MAX_NB_MBUF was used as a default mempool size for almost all ports.
Not used since new per-port mempool allocation introduced.
MIN_NB_MBUF still used as a lower limit.

CC: Robert Wojciechowicz <robertx.wojciechowicz@intel.com>
Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each port.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-11-16 16:24:11 +00:00
Ilya Maximets
24e78f9350 netdev-dpdk: Factor out struct dpdk_mp.
Since commit d555d9bded5f ("netdev-dpdk: Create separate memory pool
for each port."), struct dpdk_mp is redundant because each mempool
can be used by single port only and this port already contains all
the information we store in dpdk_mp.
There is no need to duplicate the information.
Fields of this structure currently used only to generate mempool name.
But it's required only while creation and after that we can use
mp->name directly from the struct rte_mempool.

Let's remove this structure and use struct rte_mempool directly
instead.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-11-16 16:24:11 +00:00
Ilya Maximets
173ef76bbf netdev-dpdk: Fix dpdk_mp leak in case of EEXIST.
In case of EEXIST, 'dpdk_mp_create()' will allocate yet another
'struct dpdk_mp' with same 'mp' pointer inside. We need to free
this structure to avoid the leak.

CC: Robert Wojciechowicz <robertx.wojciechowicz@intel.com>
CC: Antonio Fischetti <antonio.fischetti@intel.com>
Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each port.")
Fixes: b6b26021d2e2 ("netdev-dpdk: fix management of pre-existing mempools.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-11-16 16:24:11 +00:00
Mark Kavanagh
7ee94cbac8 netdev-dpdk: replace uint8_t with dpdk_port_t
netdev_dpdk_detach() declares a 'port_id' variable, of type uint8_t.
This variable should instead be of type dpdk_port_t.

Fixes: bb37956ac ("netdev-dpdk: Use uint8_t for port_id.")
CC: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-11-16 16:24:11 +00:00
Bhanuprakash Bodireddy
23d4d53f14 netdev-dpdk: Refactor netdev_dpdk structure.
This commit introduces below changes to netdev_dpdk structure.

- Mark cachelines and reorder few member variables.
- Maintain the grouping of related member variables.
- Add comment on the information on pad bytes where ever appropriate, so
  new members can be introduced in the future to fill the gaps.

  Below is how this structure looks with this commit.

                  Member                    size

         OVS_CACHE_LINE_MARKER cacheline0;
             dpdk_port_t port_id;            1
             bool attached;                  1
             ...

         OVS_CACHE_LINE_MARKER cacheline1;
             struct ovs_mutex;              48
             struct dpdk_mp *dpdk_mp;        8
             ...

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-11-03 13:36:53 -07:00
Ilya Maximets
ec6edc8cde netdev-dpdk: Fix mp_name leak on snprintf failure.
CC: Robert Wojciechowicz <robertx.wojciechowicz@intel.com>
CC: Antonio Fischetti <antonio.fischetti@intel.com>
Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each port.")
Fixes: 65056fd79694 ("netdev-dpdk: manage failure in mempool name creation.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-10-30 14:58:52 -07:00
antonio.fischetti@intel.com
a08a115d26 netdev-dpdk: Rename dpdk_mp_put as dpdk_mp_free.
For readability purposes dpdk_mp_put is renamed as dpdk_mp_free.

CC: Mark B Kavanagh <mark.b.kavanagh@intel.com>
CC: Darrell Ball <dlu998@gmail.com>
CC: Ciara Loftus <ciara.loftus@intel.com>
CC: Kevin Traynor <ktraynor@redhat.com>
CC: Aaron Conole <aconole@redhat.com>
Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-10-26 20:45:33 +01:00
antonio.fischetti@intel.com
ad9b5b9bc7 netdev-dpdk: Reword mp_size as n_mbufs.
For code readability purposes mp_size is renamed as n_mbufs
in dpdk_mp structure.
This parameter is passed to rte mempool creation functions
and is meant to contain the number of elements inside
the requested mempool.

CC: Ciara Loftus <ciara.loftus@intel.com>
CC: Aaron Conole <aconole@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-10-26 20:23:41 +01:00
antonio.fischetti@intel.com
65056fd796 netdev-dpdk: manage failure in mempool name creation.
In case a mempool name could not be generated log a message
and return a null mempool pointer to the caller.

CC: Mark B Kavanagh <mark.b.kavanagh@intel.com>
CC: Darrell Ball <dlu998@gmail.com>
CC: Ciara Loftus <ciara.loftus@intel.com>
CC: Kevin Traynor <ktraynor@redhat.com>
CC: Aaron Conole <aconole@redhat.com>
Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each port.")
Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-10-26 20:17:27 +01:00
antonio.fischetti@intel.com
837c1761be netdev-dpdk: skip init for existing mempools.
Skip initialization of mempool packet areas if this was already
done in a previous call to dpdk_mp_create.

CC: Darrell Ball <dlu998@gmail.com>
CC: Ciara Loftus <ciara.loftus@intel.com>
CC: Aaron Conole <aconole@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-10-26 20:08:01 +01:00
antonio.fischetti@intel.com
f06546a51d Fix mempool names to reflect socket id.
Create mempool names by considering also the NUMA socket number.
So a name reflects what socket the mempool is allocated on.
This change is needed for the NUMA-awareness feature.

CC: Mark B Kavanagh <mark.b.kavanagh@intel.com>
CC: Aaron Conole <aconole@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Reported-by: Ciara Loftus <ciara.loftus@intel.com>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each port.")
Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-10-26 19:49:29 +01:00
antonio.fischetti@intel.com
b6b26021d2 netdev-dpdk: fix management of pre-existing mempools.
Fix an issue on reconfiguration of pre-existing mempools.
This patch avoids to call dpdk_mp_put() - and erroneously
release the mempool - when it already exists.

CC: Mark B Kavanagh <mark.b.kavanagh@intel.com>
CC: Aaron Conole <aconole@redhat.com>
CC: Darrell Ball <dlu998@gmail.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Reported-by: Ciara Loftus <ciara.loftus@intel.com>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
Reported-by: Róbert Mulik <robert.mulik@ericsson.com>
Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each port.")
Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2017-10-26 19:46:58 +01:00
Zoltan Balogh
75fb914892 netdev-dpdk: reset packet_type for reused dp_packets.
DPDK uses dp-packet pool for storing received packets. The pool is
reused by rxq_recv funcions of the DPDK netdevs. The datapath is
capable to modify the packet_type property of packets. For instance
when encapsulated L3 packets are received on a ptap gre port.
In this case the packet_type property of struct dp_packet can be
modified and later the same dp_packet with the modified packet_type
can be reused in the rxq_rec function, so it can contain corrupted
data.

The dp_packet_batch_init_cutlen() in the rxq_recv functions iterates
over dp_packets and sets their cutlen. So I modified this function
to set packet_type to Ethernet for the dp_packets as well. I also
renamed this function because of the added functionality.

The dp_packet_batch_init_cutlen() iterates over batch->count dp_packet.
Therefore setting of batch->count = nb_rx needs to be done before the
former function is invoked. This is an additional fix.

Signed-off-by: Zoltan Balogh <zoltan.balogh@ericsson.com>
Signed-off-by: Laszlo Suru <laszlo.suru@ericsson.com>
Co-authored-by: Laszlo Suru <laszlo.suru@ericsson.com>
CC: Jan Scheurich <jan.scheurich@ericsson.com>
CC: Sugesh Chandran <sugesh.chandran@intel.com>
CC: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-22 02:58:13 -07:00
Bhanuprakash Bodireddy
fd57eebacb netdev-dpdk: Minor cleanup of netdev_dpdk_send__.
The variable 'cnt' is initialized and reused in multiple function calls
inside netdev_dpdk_send__() and is confusing sometimes. Instead introduce
'batch_cnt' to hold the original packet count and 'tx_cnt' to store
the final packet count resulting after filtering and qos operations.

Finally 'tx_cnt' packets gets transmitted on the respective 'qid'.

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-22 02:14:59 -07:00
Bhanuprakash Bodireddy
8a14bd7b6b netdev-dpdk: Cleanup dpdk_do_tx_copy.
Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-22 02:12:22 -07:00
Bhanuprakash Bodireddy
8a543eb03a netdev-dpdk: Use DP_PACKET_BATCH_FOR_EACH in netdev_dpdk_ring_send.
Use DP_PACKET_BATCH_FOR_EACH macro in netdev_dpdk_ring_send().

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-22 02:05:49 -07:00
Gao Zhenyu
3e90f7d753 netdev-dpdk: Execute QoS Checking before copying to mbuf.
In dpdk_do_tx_copy function, all packets were copied to mbuf first,
but QoS checking may drop some of them.
Move the QoS checking in front of copying data to mbuf, it helps to
reduce useless copy.

Signed-off-by: Zhenyu Gao <sysugaozhenyu@gmail.com>
Acked-by: ian.stokes@intel.com<mailto:ian.stokes@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-06 22:27:53 -07:00
Robert Wojciechowicz
d555d9bded netdev-dpdk: Create separate memory pool for each port.
Since it's possible to delete memory pool in DPDK
we can try to estimate better required memory size
when port is reconfigured, e.g. with different number
of rx queues.

CC: Kevin Traynor <ktraynor@redhat.com>
CC: Aaron Conole <aconole@redhat.com>
Signed-off-by: Robert Wojciechowicz <robertx.wojciechowicz@intel.com>
Co-authored-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Acked-by: Ian Stokes <ian.stokes@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-05 12:02:25 -07:00
wangzhike
894af647a8 netdev-dpdk: update vhost user client port status.
After ovs-vswitchd reboots, vhost user client port status is displayed as
LINK DOWN though the traffic is OK.

The problem is that the port may be udpated while the vhost_reconfigured
is false. Then the vhost_reconfigured is updated to true. As a result,
the vhost port status is kept as LINK-DOWN.

Signed-off-by: wangzhike <wangzhike@jd.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-09-05 12:02:25 -07:00
wangzhike
50986e7842 netdev-dpdk: vhost get stats fix.
In netdev_dpdk_vhost_get_stats, '+=' was used in a few places
where '=' was expected.

Signed-off-by: wangzhike <wangzhike@jd.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-08-25 14:49:55 -07:00
Lance Richardson
602c86681c netdev-dpdk: use 64-bit arithmetic when converting rates.
Force 64-bit arithmetic to be used when converting uint32_t rate
and burst parameters from kilobits per second to bytes per second,
avoiding incorrect behavior for rates exceeding UINT_MAX bits
per second.

Reported-by: "王志克" <wangzhike@jd.com>
Fixes: 9509913aa722 ("netdev-dpdk.c: Add ingress-policing functionality.")
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Acked-By: Mark Michelson <mmichels@redhat.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-08-25 14:35:31 -07:00
Darrell Ball
11d4c7a843 dp-packet: Refactor DPDK packet initialization.
DPDK uses dp-packet pools and manages the mbuf portion of
each packet. When a pool is created, partial initialization is
also done on the OVS portion (i.e. non-mbuf).  Since packet
memory is reused, this is not very useful for transient
fields and is also misleading.  Furthermore, some of these
transient fields are properly initialized for DPDK packets
entering OVS anyways, which is the only reasonable way to do this.
Another field, cutlen, is initialized in this manner in the pool
and intended to be reset when cutlen is applied on sending the
packet out. However, if cutlen context is set but the packet is
not sent out for some reason, then the packet header would be
corrupted in the memory pool.  It is better to just reset the
cutlen in the packets when received.  I did not detect a
degradation in performance, however, I would be willing to
have some degradation, since this is a proper way to handle
this.  In addition to initializing cutlen in received packets,
the other OVS transient fields are removed from the DPDK pool
initialization.

Acked-by: Sugesh Chandran <sugesh.chandran@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
2017-08-24 22:09:58 -07:00
Aaron Conole
fc56f5e0f5 netdev-dpdk: include dpdk PCI header directly
As part of a devargs rework in DPDK, the PCI header file was removed, and
needs to be directly included.  This isn't required to build with 17.05 or
earlier, but will be required should a future update happen.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Tested-By: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: Ciara Loftus <ciara.loftus@intel.com>
2017-08-10 13:44:24 -07:00
Darrell Ball
f8121b3912 dp-packet: Reset DPDK hwol flags on init.
Reset the DPDK hwol flags in dp_packet_init_.  The new hwol bad checksum
flag is uninitialized for non-dpdk ports and this is noticed as test
failures using netdev-dummy ports, when built with the --with-dpdk
flag set. Hence, in this case, packets may be falsely marked as having a
bad checksum. The existing APIs are simplified at the same time by
making them specific to either DPDK or otherwise; they also now
manage a single field.

Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2017-August/045081.html
Fixes: 7451af618e0d ("dp-packet : Update DPDK rx checksum validation functions.")
CC: Sugesh Chandran <sugesh.chandran@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-08-10 13:34:00 -07:00
Darrell Ball
0ee821c2e6 dpdk: Fix device cleanup.
Commit 5dcde09c80a8 was introduced to make detaching more
automatic without using an additional command beyond
ovs-vsctl del-port <br> <port>.

Sometimes, since commit 5dcde09c80a8, dpdk devices are
not detached when del-port is issued; command example:

sudo ovs-vsctl del-port br0 dpdk1

This can happen when vswitchd is (re)started with an existing
database and devices are already bound to dpdk.

A minimal recipe to reproduce the issue is:

1/ Starting with

darrell@prmh-nsx-perf-server125:~$ sudo ovs-vsctl show
1c50d8ee-b17f-4fac-a595-03b0da8c8275
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
        Port "dpdk1"
            Interface "dpdk1"
                type: dpdk
                options: {dpdk-devargs="0000:04:00.1"}
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
                options: {dpdk-devargs="0000:04:00.0"}

darrell@prmh-nsx-perf-server125:~$ /usr/src/dpdk-16.11/tools/dpdk-devbind.py --status

Network devices using DPDK-compatible driver

============================================
0000:04:00.0 'Ethernet Controller 10-Gigabit X540-AT2' drv=uio_pci_generic unused=ixgbe,vfio-pci
0000:04:00.1 'Ethernet Controller 10-Gigabit X540-AT2' drv=uio_pci_generic unused=ixgbe,vfio-pci

2/ restart vswitchd

3/ run
 sudo ovs-vsctl del-port br0 dpdk1

and find the interface is NOT detached; there is
no info log ‘Device '0000:04:00.1' detached’.

A more verbose discussion is here:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-June/333462.html
along with another possible solution.

Since we are nearing the end of a release, a safe approach is needed,
at this time.
One approach is to revert 5dcde09c80a8.  This patch does not do that
but reinstates the command ovs-appctl netdev-dpdk/detach to handle
cases when del-port will not work.

To detach the device, run the reinstated command
ovs-appctl netdev-dpdk/detach 0000:04:00.1
Observe console output
‘Device '0000:04:00.1' has been detached’

Fixes: 5dcde09c80a8 ("netdev-dpdk: Fix device leak on port deletion.")
CC: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Fischetti, Antonio <antonio.fischetti@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-08-02 10:18:04 -07:00
Michal Weglicki
f3e7ec2547 Update relevant artifacts to add support for DPDK 17.05.1.
Upgrading to DPDK 17.05.1 stable release adds new
significant features relevant to OVS, including,
but not limited to:
- tun/tap PMD,
- VFIO hotplug support,
- Generic flow API.

Following changes are applied:
- netdev-dpdk: Changes required by DPDK API modifications.
- doc: Because of DPDK API changes, backward compatibility
  with previous DPDK releases will be broken, thus all
  relevant documentation entries are updated.
- .travis: DPDK version change from 16.11.1 to 17.05.1.
- rhel/openvswitch-fedora.spec.in: DPDK version change
  from 16.11 to 17.05.1

Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com>
Acked-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Tested-by: Ian Stokes <ian.stokes@intel.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-08-02 10:18:00 -07:00
Mark Kavanagh
67fe6d6351 netdev-dpdk: use rte_eth_dev_set_mtu.
DPDK provides an API to set the MTU of compatible physical devices -
rte_eth_dev_set_mtu(). Prior to DPDK v16.07 however, this API was not
implemented in some DPDK PMDs (i40e, specifically). To allow the use
of jumbo frames with affected NICs in OvS-DPDK, MTU configuration was
achieved by setting the jumbo frame flag, and corresponding maximum
permitted Rx frame size, in an rte_eth_conf structure for the NIC
port, and subsequently invoking rte_eth_dev_configure() with that
configuration.

However, that method does not set the MTU field of the underlying DPDK
structure (rte_eth_dev) for the corresponding physical device;
consequently, rte_eth_dev_get_mtu() reports the incorrect MTU for an
OvS-DPDK phy device with non-standard MTU.

Resolve this issue by invoking rte_eth_dev_set_mtu() when setting up
or modifying the MTU of a DPDK phy port.

Fixes: 0072e93 ("netdev-dpdk: add support for jumbo frames")
Reported-by: Aaron Conole <aconole@redhat.com>
Reported-by: Vipin Varghese <vipin.varghese@intel.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Acked-by: Sugesh Chandran <sugesh.chandran@intel.com>
Tested-by: Sugesh Chandran <sugesh.chandran@intel.com>
Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-08-02 10:17:58 -07:00
Kevin Traynor
2cfe866fb3 netdev-dpdk: Log Rx checksum offload not supported.
Rx checksum offload is enabled by default on DPDK NICs where
supported. Previously Rx checksum offload not supported was
logged only once. It meant that if multiple NICs did not
support Rx checksum offload, it was only reported for the
first NIC configured.

Fixes: 1a2bb11817a4 ("netdev-dpdk: Enable Rx checksum offloading feature on DPDK physical ports.")
Reported-by: Darrell Ball <dlu998@gmail.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-07-11 22:09:22 -07:00
Kevin Traynor
d4f5282cf1 netdev-dpdk: Remove Rx checksum reconfigure.
Rx checksum offload is enabled by default on DPDK physical NICs
where available, with reconfiguration through
options:rx-checksum-offload. However, changing rx-checksum-offload
did not result in a reconfiguration of the NIC and wrong status is
reported for it.

As there seems to be diminishing reasons why a user would want
to disable Rx checksum offload, just remove the broken reconfiguration
option.

Fixes: 1a2bb11817a4 ("netdev-dpdk: Enable Rx checksum offloading feature on DPDK physical ports.")
Reported-by: Kevin Traynor <ktraynor@redhat.com>
Suggested-by: Sugesh Chandran <sugesh.chandran@intel.com>
Acked-by: Darrell Ball <dlu998@gmail.com>
Tested-by: Sugesh Chandran <sugesh.chandran@intel.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-07-11 22:09:19 -07:00
Ben Pfaff
875ab13020 userspace: Handling of versatile tunnel ports
In netdev_gre_build_header(), GRE protocol and VXLAN next_potocol is set based
on packet_type of flow. If it's about an Ethernet packet, it is set to
ETP_TYPE_TEB. Otherwise, if the name space is OFPHTN_ETHERNET, it is set
according to the name space type.

Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-06-27 17:28:30 -04:00
Santosh Shukla
31b88c9751 netdev-dpdk: round up mbuf_size to cache_line_size
Some pmd driver(e.g: vNIC thunderx PMD) want mbuf_size to be multiple of
cache_line_size. With out this fix, Netdev-dpdk initialization would
fail for those PMD.

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Tested-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ian Stokes <ian.stokes@intel.com>
2017-06-14 14:04:40 -07:00