mir/ovs - ovs - Mike's Git repositories

mirror of https://github.com/openvswitch/ovs synced 2025-08-22 18:07:40 +00:00

Author	SHA1	Message	Date
Róbert Mulik	f8b64a61bc	Configurable Link State Change (LSC) detection mode It is possible to set LSC detection mode to polling or interrupt mode for DPDK interfaces. The default is polling mode. To set interrupt mode, option dpdk-lsc-interrupt has to be set to true. For detailed description and usage see the dpdk install documentation. Signed-off-by: Robert Mulik <robert.mulik@ericsson.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-05-11 08:08:24 +01:00
Jan Scheurich	8492adc270	netdev: Add optional qfill output parameter to rxq_recv() If the caller provides a non-NULL qfill pointer and the netdev implementation supports reading the rx queue fill level, the rxq_recv() function returns the remaining number of packets in the rx queue after reception of the packet burst to the caller. If the implementation does not support this, it returns -ENOTSUP instead. Reading the remaining queue fill level should not substantially slow down the recv() operation. A first implementation is provided for ethernet and vhostuser DPDK ports in netdev-dpdk.c. This output parameter will be used in the upcoming commit for PMD performance metrics to supervise the rx queue fill level for DPDK vhostuser ports. Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Acked-by: Billy O'Mahony <billy.o.mahony@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-05-11 08:08:24 +01:00
Pablo Cascón	65a87968f4	netdev-dpdk: don't enable scatter for jumbo RX support for nfp Currently, RX of jumbo packets fails for NICs that do not support scatter. Scatter is not strictly needed for jumbo RX support. This change fixes the issue by not enabling scatter, but only for the PMD/NIC known not to need it for jumbo RX support. Note: this change is temporary and not needed for later releases of OVS/DPDK. Reported-by: Louis Peens <louis.peens@netronome.com> Signed-off-by: Pablo Cascón <pablo.cascon@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-05-11 08:08:24 +01:00
Kevin Traynor	91fccdad72	netdev-dpdk: Free mempool only when no in-use mbufs. DPDK mempools are freed when they are no longer needed. This can happen when a port is removed or a port's mtu is reconfigured so that a new mempool is used. It is possible that an mbuf is attempted to be returned to a freed mempool from NIC Tx queues and this can lead to a segfault. In order to prevent this, only free mempools when they are not needed and have no in-use mbufs. As this might not be possible immediately, create a free list of mempools and sweep it anytime a port tries to get a mempool. Fixes: 8d38823bdf8b ("netdev-dpdk: fix memory leak") Cc: mark.b.kavanagh81@gmail.com Cc: Ilya Maximets <i.maximets@samsung.com> Reported-by: Venkatesan Pradeep <venkatesan.pradeep@ericsson.com> Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-04-21 16:59:45 +01:00
Kevin Traynor	1dfebee971	netdev-dpdk: Remove 'error' from non error log. Presently, if OVS tries to setup more queues than are allowed by a specific NIC, OVS will handle this case by retrying with a lower amount of queues. Rather than reporting initial failed queue setups in the logs as ERROR, they are reported as INFO but contain the word 'error'. Unless a user has detailed knowledge of OVS-DPDK workings, this is confusing. Let's remove 'error' from the INFO log. Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-03-23 11:35:34 +00:00
Ilya Maximets	fa9f4eebd3	netdev-dpdk: Fix print format for dpdk port ids. Since 17.11 release DPDK uses uint16 for port_id. Format strings for printing functions must be updated accordingly. CC: Mark Kavanagh <mark.b.kavanagh@intel.com> Fixes: 5e925ccc2a6f ("netdev-dpdk: DPDK v17.11 upgrade") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-03-23 11:28:35 +00:00
Justin Pettit	e883448e3f	dp-packet: Add index to DP_PACKET_BATCH_FOR_EACH to prevent shadowing. Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2018-02-28 14:53:27 -08:00
Ciara Loftus	10087cba9d	netdev-dpdk: Add support for vHost dequeue zero copy (experimental) Zero copy is disabled by default. To enable it, set the 'dq-zero-copy' option to 'true' when configuring the Interface: ovs-vsctl set Interface dpdkvhostuserclient0 options:vhost-server-path=/tmp/dpdkvhostuserclient0 options:dq-zero-copy=true When packets from a vHost device with zero copy enabled are destined for a single 'dpdk' port, the number of tx descriptors on that 'dpdk' port must be set to a smaller value. 128 is recommended. This can be achieved like so: ovs-vsctl set Interface dpdkport options:n_txq_desc=128 Note: The sum of the tx descriptors of all 'dpdk' ports the VM will send to should not exceed 128. Due to this requirement, the feature is considered 'experimental'. Testing of the patch showed a ~8% improvement when switching 512B packets between vHost devices on different VMs on the same host when zero copy was enabled on the transmitting device. Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-01-31 14:04:35 +00:00
Ilya Maximets	ac1a9bb93f	netdev-dpdk: Fix xstats leak on port destruction. CC: Michal Weglicki <michalx.weglicki@intel.com> Fixes: 971f4b394c6e ("netdev: Custom statistics.") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-01-26 20:49:18 +00:00
Ilya Maximets	34eb086342	netdev-dpdk: Fix memory leak in netdev_dpdk_configure_xstats(). CC: Michal Weglicki <michalx.weglicki@intel.com> Fixes: 971f4b394c6e ("netdev: Custom statistics.") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-01-26 20:49:18 +00:00
Ilya Maximets	526259f22c	netdev-dpdk: Fix memory leak in netdev_dpdk_get_custom_stats(). CC: Michal Weglicki <michalx.weglicki@intel.com> Fixes: 971f4b394c6e ("netdev: Custom statistics.") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-01-26 20:49:18 +00:00
Yuanhan Liu	5e75881868	netdev-dpdk: fix port addition for ports sharing same PCI id Some NICs have only one PCI address associated with multiple ports. This patch extends the dpdk-devargs option's format to cater for such devices. To achieve that, this patch uses a new syntax that will be adapted and implemented in future DPDK release (likely, v18.05): http://dpdk.org/ml/archives/dev/2017-December/084234.html And since it's the DPDK duty to parse the (complete and full) syntax and this patch is more likely to serve as an intermediate workaround, here I take a simpler and shorter syntax from it (note it's allowed to have only one category being provided): class=eth,mac=00:11:22:33:44:55 Also, old compatibility is kept. Users can still go on with using the PCI id to add a port (if that's enough for them). Meaning, this patch will not break anything. This patch is basically based on the one from Ciara: https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/339496.html Cc: Loftus Ciara <ciara.loftus@intel.com> Cc: Thomas Monjalon <thomas@monjalon.net> Cc: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-01-26 20:49:18 +00:00
Ian Stokes	f6f50552a3	netdev-dpdk: Fix requested MTU size validation. This commit replaces MTU_TO_FRAME_LEN(mtu) with MTU_TO_MAX_FRAME_LEN(mtu) in netdev_dpdk_set_mtu(), in order to determine if the total length of the L2 frame with an MTU of ’mtu’ exceeds NETDEV_DPDK_MAX_PKT_LEN. When setting an MTU we first check if the requested total frame length (which includes associated L2 overhead) will exceed the maximum frame length supported in netdev_dpdk_set_mtu(). The frame length is calculated by MTU_TO_FRAME_LEN as MTU + ETHER_HEADER + ETHER_CRC. The MTU for the device will be set at a later stage in dpdk_eth_dev_init() using rte_eth_dev_set_mtu(mtu). However when using rte_eth_dev_set_mtu(mtu) the calculation used to check that the frame does not exceed the max frame length for that device varies between DPDK device drivers. For example ixgbe driver calculates the frame length for a given MTU as mtu + ETHER_HDR_LEN + ETHER_CRC_LEN i40e driver calculates it as mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + I40E_VLAN_TAG_SIZE * 2 em driver calculates it as mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + VLAN_TAG_SIZE Currently it is possible to set an MTU for a netdev_dpdk device that exceeds the upper limit MTU for that device's DPDK driver. This leads to a segfault. This is because the frame length comparison, as is, does not take into account the addition of the vlan tag overhead expected in the drivers. The netdev_dpdk_set_mtu() call will incorrectly succeed but the subsequent dpdk_eth_dev_init() will fail before the queues have been created for the DPDK device. This coupled with assumptions regarding reconfiguration requirements for the netdev will lead to a segfault when the rxq is polled for this device. A simple way to avoid this is by using MTU_TO_MAX_FRAME_LEN(mtu) when validating a requested MTU in netdev_dpdk_set_mtu(). 
MTU_TO_MAX_FRAME_LEN(mtu) is equivalent to the following: mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + (2 * VLAN_HEADER_LEN) By using MTU_TO_MAX_FRAME_LEN at the netdev_dpdk_set_mtu() stage, OvS now takes into account the maximum L2 overhead that a DPDK driver could allow for in its frame size calculation. This allows OVS to flag an error rather than the DPDK driver if the frame length exceeds the max DPDK frame length. OVS can fail gracefully at this point and use the default MTU of 1500 to continue to configure the port. Note: this fix is a work around, a better approach would be if DPDK devices could report the maximum MTU value that can be requested on a per device basis. This capability however is not currently available. A downside of this patch is that the MTU upper limit will be reduced by 8 bytes for DPDK devices that do not need to account for vlan tags in the frame length driver calculations e.g. ixgbe devices upper MTU limit is reduced from the OVS point of view from 9710 to 9702. CC: Mark Kavanagh <mark.b.kavanagh@intel.com> Fixes: 0072e931 ("netdev-dpdk: add support for jumbo frames") Signed-off-by: Ian Stokes <ian.stokes@intel.com> Co-authored-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org>	2018-01-26 20:49:18 +00:00
Flavio Leitner	b2e8b12f8a	netdev-dpdk: add vhost-user get_status. Expose relevant vhost-user information in status. Signed-off-by: Flavio Leitner <fbl@sysclose.org> Tested-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-01-17 18:12:46 +00:00
zhangliping	4c47ddde34	netdev-dpdk: fix ingress_policer leak on error path Fix memory leak by freeing the policer if rte_meter_srtcm_config fails. Fixes: 9509913aa722 ("netdev-dpdk.c: Add ingress-policing functionality.") Signed-off-by: zhangliping <zhangliping02@baidu.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2018-01-17 18:11:28 +00:00
Michal Weglicki	971f4b394c	netdev: Custom statistics. - New get_custom_stats interface function is added to netdev. It allows particular netdev implementation to expose custom counters in dictionary format (counter name/counter value). - New statistics are retrieved using experimenter code and are printed as a result to ofctl dump-ports. - New counters are available for OpenFlow 1.4+. - New statistics are printed to output via ofctl only if those are present in reply message. - New statistics definition is added to include/openflow/intel-ext.h. - Custom statistics are implemented only for dpdk-physical port type. - DPDK-physical implementation uses xstats to collect statistics. Only dropped and error counters are exposed. Co-authored-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2018-01-10 15:29:13 -08:00
Ilya Maximets	ad8b0b4fe7	netdev: Remove useless cutlen. Cutlen already applied while processing OVS_ACTION_ATTR_OUTPUT. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2017-12-20 21:07:46 +00:00
Ilya Maximets	b30896c969	netdev: Remove unused may_steal. Not needed anymore because 'may_steal' already handled on dpif-netdev layer and always true. Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2017-12-20 21:07:46 +00:00
Ilya Maximets	be48173310	netdev-dpdk: Add debug appctl to get mempool information. New appctl 'netdev-dpdk/get-mempool-info' implemented to get result of 'rte_mempool_list_dump()' function if no arguments passed and 'rte_mempool_dump()' if DPDK netdev passed as argument. Could be used for debugging mbuf leaks and other mempool related issues. Most useful in pair with `grep -v "cache_count.*=0"`. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Tested-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Acked-by: Antonio Fischetti <antonio.fischetti@intel.com> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2017-12-20 21:07:46 +00:00
Michal Weglicki	3eb8d4fa0d	netdev-dpdk: extend netdev_dpdk_get_status to include if_type and if_descr This commit extends netdev_dpdk_get_status API to include additional driver-related information: if_type and if_descr. v2->v3: Code rebase. v3->v4: Minor comments applied. v5->v6: Adds DPDK port specific description in documentation. Co-authored-by: Michal Weglicki <michalx.weglicki@intel.com> Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com> Signed-off-by: Przemyslaw Szczerbik <przemyslawx.szczerbik@intel.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2017-12-08 21:42:54 +00:00
Mark Kavanagh	a14d1cc8a7	netdev-dpdk: vHost IOMMU support DPDK v17.11 introduces support for the vHost IOMMU feature. This is a security feature, which restricts the vhost memory that a virtio device may access. This feature also enables the vhost REPLY_ACK protocol, the implementation of which is known to work in newer versions of QEMU (i.e. v2.10.0), but is buggy in older versions (v2.7.0 - v2.9.0, inclusive). As such, the feature is disabled by default (and should remain so) for the aforementioned older QEMU versions. Starting with QEMU v2.9.1, vhost-iommu-support can safely be enabled, even without having an IOMMU device, with no performance penalty. This patch adds a new global config option, vhost-iommu-support, that controls enablement of the vhost IOMMU feature: ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true This value defaults to false; to enable IOMMU support, this field should be set to true when setting other global parameters on init (such as "dpdk-socket-mem", for example). Changing the value at runtime is not supported, and requires restarting the vswitch daemon. Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2017-12-08 21:42:54 +00:00
Mark Kavanagh	5e925ccc2a	netdev-dpdk: DPDK v17.11 upgrade This commit adds support for DPDK v17.11: - minor updates to accommodate DPDK API changes - update references to DPDK version in Documentation - update DPDK version in travis' linux-build script - document DPDK v17.11 virtio driver bug Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Ciara Loftus <ciara.loftus@intel.com> Acked-by: Jan Scheurich <jan.scheurich@ericsson.com> Tested-by: Jan Scheurich <jan.scheurich@ericsson.com> Tested-by: Guoshuai Li <ligs@dtdream.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2017-12-08 21:42:54 +00:00
Kevin Traynor	255b7bda98	netdev-dpdk: Remove uneeded call to rte_eth_dev_count(). The call to rte_eth_dev_count() was added as a workaround for rte_eth_dev_get_port_by_name() not handling cases when there were no DPDK ports. In versions of DPDK >= 17.02 rte_eth_dev_get_port_by_name() does handle this case (DPDK commit f9ae888b1e19). rte_eth_dev_count() is no longer needed so remove it. Acked-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2017-12-08 21:42:54 +00:00
Ilya Maximets	b2e72a9c9d	netdev-dpdk: Add comment about variables naming convention. It'll be nice to document current naming convention for variables of the following types used in netdev-dpdk: * netdev * netdev_dpdk * netdev_rxq * netdev_rxq_dpdk to be sure that we will not return to chaos which was before commit d46285a2206f ("netdev-dpdk: Consistent variable naming."). Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2017-12-08 21:42:54 +00:00
Ilya Maximets	3d0d5ab153	netdev-dpdk: Fix variables naming in set_admin_state function. Function 'netdev_dpdk_set_admin_state()' was missed while fixing variables naming according to the following convention: 'struct netdev':'netdev' 'struct netdev_dpdk':'dev' 'struct netdev_rxq':'rxq' 'struct netdev_rxq_dpdk':'rx' Fixes: d46285a2206f ("netdev-dpdk: Consistent variable naming.") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2017-12-08 21:42:54 +00:00
Ilya Maximets	af5b0dad30	netdev-dpdk: Fix mempool creation with large MTU. Currently the mempool name size is limited to 25 characters by RTE_MEMPOOL_NAMESIZE. netdev-dpdk tries to create mempool with the following name pattern: "ovs_%{hash}_%{socket}_%{mtu}_%{n_mbuf}". We have 3 chars for "ovs" + 4 chars for delimiters + 8 chars for hash (because it's the 32 bit integer printed in hex) + 1 char for socket_id (mostly 1, but it could be 2 on some systems; larger?) = 16. Only 25 - 16 = 9 characters remain for mtu + n_mbufs. Minimum usual value for mtu is 1500 --> 2030 (4 chars) after dpdk_buf_size conversion and the minimum value for n_mbufs is 16384 (5 chars). So, all the 9 characters are used. If we try to create a port with mtu = 9500, mempool creation will fail, because FRAME_LEN_TO_MTU(dpdk_buf_size(9500)) = 10222 (5 chars) and this value will overflow the RTE_MEMPOOL_NAMESIZE limit. The same issue will happen if we try to create a port with a big enough number of queues or try to create a big enough number of PMD threads (number of tx queues will enlarge the mempool requirements). Fix that by removing the delimiters. To keep the readability (at least partial) of the mempool names exact field sizes with zero padding are used. 
Following limits should be suitable for now: - Hash length: 8 chars (uint32_t in hex) - Socket ID : 2 chars (For systems with up to 10 sockets) - MTU : 5 chars (MTU (10^5 - 1) should be enough for now) - n_mbufs : 7 chars (Up to 10^7 of mbufs) Total : 22 + 3 (for "ovs") = 25 CC: Antonio Fischetti <antonio.fischetti@intel.com> CC: Robert Wojciechowicz <robertx.wojciechowicz@intel.com> Fixes: f06546a51dd8 ("Fix mempool names to reflect socket id.") Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each port.") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Antonio Fischetti <antonio.fischetti@intel.com> Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Tested-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2017-11-17 16:26:33 +00:00
Ilya Maximets	daf22bf7a8	netdev-dpdk: Fix calling vhost API with negative vid. Currently, rx and tx functions for vhost interfaces always obtain 'vid' twice. First time inside 'is_vhost_running' for checking the value and the second time in enqueue/dequeue function calls to send/receive packets. But second time we're not checking the returned value. If vhost device will be destroyed between checking and enqueue/dequeue, DPDK API will be called with '-1' instead of valid 'vid'. DPDK API does not validate the 'vid'. This leads to getting random memory value as a pointer to internal device structure inside DPDK. Access by this pointer leads to segmentation fault. For example: |00503|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_GET_VRING_BASE [New Thread 0x7fb6754910 (LWP 21246)] Program received signal SIGSEGV, Segmentation fault. rte_vhost_enqueue_burst at lib/librte_vhost/virtio_net.c:630 630 if (dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF)) (gdb) bt full #0 rte_vhost_enqueue_burst at lib/librte_vhost/virtio_net.c:630 dev = 0xffffffff #1 __netdev_dpdk_vhost_send at lib/netdev-dpdk.c:1803 tx_pkts = <optimized out> cur_pkts = 0x7f340084f0 total_pkts = 32 dropped = 0 i = <optimized out> retries = 0 ... (gdb) p ((struct netdev_dpdk ) netdev) $8 = { ... , flags = (NETDEV_UP | NETDEV_PROMISC), ... , vid = {v = -1}, vhost_reconfigured = false, ... } Issue can be reproduced by stopping DPDK application (testpmd) inside guest while heavy traffic flows to this VM. Fix that by obtaining and checking the 'vid' only once. CC: Ciara Loftus <ciara.loftus@intel.com> Fixes: 0a0f39df1d5a ("netdev-dpdk: Add support for DPDK 16.07") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Tested-by: Billy O'Mahony <billy.o.mahony@intel.com> Acked-by: Billy O'Mahony <billy.o.mahony@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2017-11-16 16:24:11 +00:00
Ilya Maximets	bc57ed901f	netdev-dpdk: Remove unused MAX_NB_MBUF. MAX_NB_MBUF was used as a default mempool size for almost all ports. Not used since new per-port mempool allocation introduced. MIN_NB_MBUF still used as a lower limit. CC: Robert Wojciechowicz <robertx.wojciechowicz@intel.com> Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each port.") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Antonio Fischetti <antonio.fischetti@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2017-11-16 16:24:11 +00:00
Ilya Maximets	24e78f9350	netdev-dpdk: Factor out struct dpdk_mp. Since commit d555d9bded5f ("netdev-dpdk: Create separate memory pool for each port."), struct dpdk_mp is redundant because each mempool can be used by single port only and this port already contains all the information we store in dpdk_mp. There is no need to duplicate the information. Fields of this structure currently used only to generate mempool name. But it's required only while creation and after that we can use mp->name directly from the struct rte_mempool. Let's remove this structure and use struct rte_mempool directly instead. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Antonio Fischetti <antonio.fischetti@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2017-11-16 16:24:11 +00:00
Ilya Maximets	173ef76bbf	netdev-dpdk: Fix dpdk_mp leak in case of EEXIST. In case of EEXIST, 'dpdk_mp_create()' will allocate yet another 'struct dpdk_mp' with same 'mp' pointer inside. We need to free this structure to avoid the leak. CC: Robert Wojciechowicz <robertx.wojciechowicz@intel.com> CC: Antonio Fischetti <antonio.fischetti@intel.com> Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each port.") Fixes: b6b26021d2e2 ("netdev-dpdk: fix management of pre-existing mempools.") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Antonio Fischetti <antonio.fischetti@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2017-11-16 16:24:11 +00:00
Mark Kavanagh	7ee94cbac8	netdev-dpdk: replace uint8_t with dpdk_port_t netdev_dpdk_detach() declares a 'port_id' variable, of type uint8_t. This variable should instead be of type dpdk_port_t. Fixes: bb37956ac ("netdev-dpdk: Use uint8_t for port_id.") CC: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2017-11-16 16:24:11 +00:00
Bhanuprakash Bodireddy	23d4d53f14	netdev-dpdk: Refactor netdev_dpdk structure. This commit introduces below changes to netdev_dpdk structure. - Mark cachelines and reorder few member variables. - Maintain the grouping of related member variables. - Add comment on the information on pad bytes where ever appropriate, so new members can be introduced in the future to fill the gaps. Below is how this structure looks with this commit. Member size OVS_CACHE_LINE_MARKER cacheline0; dpdk_port_t port_id; 1 bool attached; 1 ... OVS_CACHE_LINE_MARKER cacheline1; struct ovs_mutex; 48 struct dpdk_mp *dpdk_mp; 8 ... Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-11-03 13:36:53 -07:00
Ilya Maximets	ec6edc8cde	netdev-dpdk: Fix mp_name leak on snprintf failure. CC: Robert Wojciechowicz <robertx.wojciechowicz@intel.com> CC: Antonio Fischetti <antonio.fischetti@intel.com> Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each port.") Fixes: 65056fd79694 ("netdev-dpdk: manage failure in mempool name creation.") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2017-10-30 14:58:52 -07:00
antonio.fischetti@intel.com	a08a115d26	netdev-dpdk: Rename dpdk_mp_put as dpdk_mp_free. For readability purposes dpdk_mp_put is renamed as dpdk_mp_free. CC: Mark B Kavanagh <mark.b.kavanagh@intel.com> CC: Darrell Ball <dlu998@gmail.com> CC: Ciara Loftus <ciara.loftus@intel.com> CC: Kevin Traynor <ktraynor@redhat.com> CC: Aaron Conole <aconole@redhat.com> Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com> Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2017-10-26 20:45:33 +01:00
antonio.fischetti@intel.com	ad9b5b9bc7	netdev-dpdk: Reword mp_size as n_mbufs. For code readability purposes mp_size is renamed as n_mbufs in dpdk_mp structure. This parameter is passed to rte mempool creation functions and is meant to contain the number of elements inside the requested mempool. CC: Ciara Loftus <ciara.loftus@intel.com> CC: Aaron Conole <aconole@redhat.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2017-10-26 20:23:41 +01:00
antonio.fischetti@intel.com	65056fd796	netdev-dpdk: manage failure in mempool name creation. In case a mempool name could not be generated log a message and return a null mempool pointer to the caller. CC: Mark B Kavanagh <mark.b.kavanagh@intel.com> CC: Darrell Ball <dlu998@gmail.com> CC: Ciara Loftus <ciara.loftus@intel.com> CC: Kevin Traynor <ktraynor@redhat.com> CC: Aaron Conole <aconole@redhat.com> Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each port.") Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2017-10-26 20:17:27 +01:00
antonio.fischetti@intel.com	837c1761be	netdev-dpdk: skip init for existing mempools. Skip initialization of mempool packet areas if this was already done in a previous call to dpdk_mp_create. CC: Darrell Ball <dlu998@gmail.com> CC: Ciara Loftus <ciara.loftus@intel.com> CC: Aaron Conole <aconole@redhat.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2017-10-26 20:08:01 +01:00
antonio.fischetti@intel.com	f06546a51d	Fix mempool names to reflect socket id. Create mempool names by considering also the NUMA socket number. So a name reflects what socket the mempool is allocated on. This change is needed for the NUMA-awareness feature. CC: Mark B Kavanagh <mark.b.kavanagh@intel.com> CC: Aaron Conole <aconole@redhat.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Reported-by: Ciara Loftus <ciara.loftus@intel.com> Tested-by: Ciara Loftus <ciara.loftus@intel.com> Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each port.") Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2017-10-26 19:49:29 +01:00
antonio.fischetti@intel.com	b6b26021d2	netdev-dpdk: fix management of pre-existing mempools. Fix an issue on reconfiguration of pre-existing mempools. This patch avoids calling dpdk_mp_put() - and erroneously releasing the mempool - when it already exists. CC: Mark B Kavanagh <mark.b.kavanagh@intel.com> CC: Aaron Conole <aconole@redhat.com> CC: Darrell Ball <dlu998@gmail.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Reported-by: Ciara Loftus <ciara.loftus@intel.com> Tested-by: Ciara Loftus <ciara.loftus@intel.com> Reported-by: Róbert Mulik <robert.mulik@ericsson.com> Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each port.") Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com> Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Signed-off-by: Ian Stokes <ian.stokes@intel.com>	2017-10-26 19:46:58 +01:00
Zoltan Balogh	75fb914892	netdev-dpdk: reset packet_type for reused dp_packets. DPDK uses dp-packet pool for storing received packets. The pool is reused by rxq_recv functions of the DPDK netdevs. The datapath is capable of modifying the packet_type property of packets. For instance when encapsulated L3 packets are received on a ptap gre port. In this case the packet_type property of struct dp_packet can be modified and later the same dp_packet with the modified packet_type can be reused in the rxq_recv function, so it can contain corrupted data. The dp_packet_batch_init_cutlen() in the rxq_recv functions iterates over dp_packets and sets their cutlen. So I modified this function to set packet_type to Ethernet for the dp_packets as well. I also renamed this function because of the added functionality. The dp_packet_batch_init_cutlen() iterates over batch->count dp_packet. Therefore setting of batch->count = nb_rx needs to be done before the former function is invoked. This is an additional fix. Signed-off-by: Zoltan Balogh <zoltan.balogh@ericsson.com> Signed-off-by: Laszlo Suru <laszlo.suru@ericsson.com> Co-authored-by: Laszlo Suru <laszlo.suru@ericsson.com> CC: Jan Scheurich <jan.scheurich@ericsson.com> CC: Sugesh Chandran <sugesh.chandran@intel.com> CC: Darrell Ball <dlu998@gmail.com> Signed-off-by: Darrell Ball <dlu998@gmail.com>	2017-09-22 02:58:13 -07:00
Bhanuprakash Bodireddy	fd57eebacb	netdev-dpdk: Minor cleanup of netdev_dpdk_send__. The variable 'cnt' is initialized and reused in multiple function calls inside netdev_dpdk_send__() and is confusing sometimes. Instead introduce 'batch_cnt' to hold the original packet count and 'tx_cnt' to store the final packet count resulting after filtering and qos operations. Finally 'tx_cnt' packets gets transmitted on the respective 'qid'. Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> Signed-off-by: Darrell Ball <dlu998@gmail.com>	2017-09-22 02:14:59 -07:00
Bhanuprakash Bodireddy	8a14bd7b6b	netdev-dpdk: Cleanup dpdk_do_tx_copy. Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> Signed-off-by: Darrell Ball <dlu998@gmail.com>	2017-09-22 02:12:22 -07:00
Bhanuprakash Bodireddy	8a543eb03a	netdev-dpdk: Use DP_PACKET_BATCH_FOR_EACH in netdev_dpdk_ring_send. Use DP_PACKET_BATCH_FOR_EACH macro in netdev_dpdk_ring_send(). Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> Signed-off-by: Darrell Ball <dlu998@gmail.com>	2017-09-22 02:05:49 -07:00
Gao Zhenyu	3e90f7d753	netdev-dpdk: Execute QoS Checking before copying to mbuf. In dpdk_do_tx_copy function, all packets were copied to mbuf first, but QoS checking may drop some of them. Move the QoS checking in front of copying data to mbuf, it helps to reduce useless copy. Signed-off-by: Zhenyu Gao <sysugaozhenyu@gmail.com> Acked-by: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Darrell Ball <dlu998@gmail.com>	2017-09-06 22:27:53 -07:00
Robert Wojciechowicz	d555d9bded	netdev-dpdk: Create separate memory pool for each port. Since it's possible to delete memory pool in DPDK we can try to estimate better required memory size when port is reconfigured, e.g. with different number of rx queues. CC: Kevin Traynor <ktraynor@redhat.com> CC: Aaron Conole <aconole@redhat.com> Signed-off-by: Robert Wojciechowicz <robertx.wojciechowicz@intel.com> Co-authored-by: Antonio Fischetti <antonio.fischetti@intel.com> Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com> Acked-by: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Darrell Ball <dlu998@gmail.com>	2017-09-05 12:02:25 -07:00
wangzhike	894af647a8	netdev-dpdk: update vhost user client port status. After ovs-vswitchd reboots, vhost user client port status is displayed as LINK DOWN though the traffic is OK. The problem is that the port may be updated while the vhost_reconfigured is false. Then the vhost_reconfigured is updated to true. As a result, the vhost port status is kept as LINK-DOWN. Signed-off-by: wangzhike <wangzhike@jd.com> Signed-off-by: Darrell Ball <dlu998@gmail.com>	2017-09-05 12:02:25 -07:00
wangzhike	50986e7842	netdev-dpdk: vhost get stats fix. In netdev_dpdk_vhost_get_stats, '+=' was used in a few places where '=' was expected. Signed-off-by: wangzhike <wangzhike@jd.com> Signed-off-by: Darrell Ball <dlu998@gmail.com>	2017-08-25 14:49:55 -07:00
Lance Richardson	602c86681c	netdev-dpdk: use 64-bit arithmetic when converting rates. Force 64-bit arithmetic to be used when converting uint32_t rate and burst parameters from kilobits per second to bytes per second, avoiding incorrect behavior for rates exceeding UINT_MAX bits per second. Reported-by: "王志克" <wangzhike@jd.com> Fixes: 9509913aa722 ("netdev-dpdk.c: Add ingress-policing functionality.") Signed-off-by: Lance Richardson <lrichard@redhat.com> Acked-By: Mark Michelson <mmichels@redhat.com> Acked-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Darrell Ball <dlu998@gmail.com>	2017-08-25 14:35:31 -07:00
Darrell Ball	11d4c7a843	dp-packet: Refactor DPDK packet initialization. DPDK uses dp-packet pools and manages the mbuf portion of each packet. When a pool is created, partial initialization is also done on the OVS portion (i.e. non-mbuf). Since packet memory is reused, this is not very useful for transient fields and is also misleading. Furthermore, some of these transient fields are properly initialized for DPDK packets entering OVS anyways, which is the only reasonable way to do this. Another field, cutlen, is initialized in this manner in the pool and intended to be reset when cutlen is applied on sending the packet out. However, if cutlen context is set but the packet is not sent out for some reason, then the packet header would be corrupted in the memory pool. It is better to just reset the cutlen in the packets when received. I did not detect a degradation in performance, however, I would be willing to have some degradation, since this is a proper way to handle this. In addition to initializing cutlen in received packets, the other OVS transient fields are removed from the DPDK pool initialization. Acked-by: Sugesh Chandran <sugesh.chandran@intel.com> Signed-off-by: Darrell Ball <dlu998@gmail.com>	2017-08-24 22:09:58 -07:00
Aaron Conole	fc56f5e0f5	netdev-dpdk: include dpdk PCI header directly As part of a devargs rework in DPDK, the PCI header file was removed, and needs to be directly included. This isn't required to build with 17.05 or earlier, but will be required should a future update happen. Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Tested-By: Timothy Redaelli <tredaelli@redhat.com> Acked-by: Ciara Loftus <ciara.loftus@intel.com>	2017-08-10 13:44:24 -07:00
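
Note on commit 602c86681c (64-bit arithmetic when converting rates): the pitfall it describes can be reproduced with a minimal sketch. The helper names below are hypothetical and are not taken from netdev-dpdk.c; the sketch only assumes, as the commit message states, that a uint32_t rate in kilobits per second is converted to bytes per second.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from netdev-dpdk.c): the multiplication result is
 * stored in 32 bits, so it wraps once rate_kbps * 1000 exceeds UINT32_MAX. */
static uint64_t
kbps_to_bytes_per_sec_32(uint32_t rate_kbps)
{
    uint32_t bits_per_sec = rate_kbps * UINT32_C(1000);  /* May wrap. */

    return bits_per_sec / 8;
}

/* Hypothetical helper: widening to 64 bits before multiplying keeps the
 * full product, which is the kind of fix the commit message describes. */
static uint64_t
kbps_to_bytes_per_sec_64(uint32_t rate_kbps)
{
    return (uint64_t) rate_kbps * 1000 / 8;
}

/* Self-check: 5,000,000 kbps (5 Gbit/s) exceeds UINT32_MAX bits per second. */
static void
rate_conversion_selfcheck(void)
{
    assert(kbps_to_bytes_per_sec_64(5000000) == UINT64_C(625000000));
    assert(kbps_to_bytes_per_sec_32(5000000) != UINT64_C(625000000));
}
```

For rates below roughly 4.29 Gbit/s both helpers agree, which is likely why a problem of this kind only surfaces with high rate limits.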

326 Commits