mir/ovs - ovs - Mike's Git repositories

mirror of https://github.com/openvswitch/ovs synced 2025-08-22 18:07:40 +00:00

Author	SHA1	Message	Date
Daniele Di Proietto	819f13bd39	netdev-dpdk: Acquire dev->stats_lock only once. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ben Pfaff <blp@ovn.org> Acked-by: Aaron Conole <aconole@redhat.com> Tested-by: Aaron Conole <aconole@redhat.com>	2016-10-12 15:30:04 -07:00
Daniele Di Proietto	78bd47cf44	netdev-dpdk: Use RCU for egress QoS. I think it's clearer to use RCU than to check for a pointer twice in the fast path (before and after taking the spinlock). Now the spinlock is integrated into 'qos_conf'. 'qos_conf' objects cannot be modified, so, instead of having 'qos_set()', we now have 'qos_is_equal()', which tells us if an object must be destroyed and recreated. With this patch we also avoid passing the netdev parameter to qos ops, since it was unused most of the time. Lastly, some duplication is removed. CC: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ben Pfaff <blp@ovn.org> Acked-by: Aaron Conole <aconole@redhat.com> Tested-by: Aaron Conole <aconole@redhat.com>	2016-10-12 15:25:45 -07:00
Daniele Di Proietto	2ae3d5421b	netdev-dpdk: Refactor dpdk_mp_get(). The error handling path in dpdk_mp_get() is getting complicated, it even requires a boolean variable. Simplify it by extracting the function dpdk_mp_create(). CC: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ben Pfaff <blp@ovn.org> Acked-by: Aaron Conole <aconole@redhat.com> Tested-by: Aaron Conole <aconole@redhat.com>	2016-10-12 15:16:34 -07:00
Ilya Maximets	b614c894ee	netdev-dpdk: Configure flow control only when necessary. It is not necessary to touch the physical device each time, if the configuration has not been changed. Also, few style issues fixed. Thread-safety annotation added to 'dpdk_set_rxq_config()'. It was missed while previous refactoring of the flow control configuration. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Tested-by: Sugesh Chandran <sugesh.chandran@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-09-30 10:59:00 -07:00
Ciara Loftus	b685696b8c	netdev-dpdk: Allow configurable queue sizes for 'dpdk' ports The 'options:n_rxq_desc' and 'n_txq_desc' fields allow the number of rx and tx descriptors for dpdk ports to be modified. By default the values are set to 2048, but can be modified to an integer between 1 and 4096 that is a power of two. The values can be modified at runtime, however require the NIC to restart when changed. Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Acked-by: Yunhong Jiang <yunhong.jiang@linux.intel.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-09-30 10:58:39 -07:00
Mark Kavanagh	58be5c0eec	netdev-dpdk: Fix coding style Coding style violations of the following conventions are present in netdev-dpdk.c: - limit lines to 79 characters - put a space after (but not before) the "sizeof" keyword - put a space between the () used in a cast and the expression whose type is cast: (void *) 0. Resolve occurrences of each, and any other minor style infractions. Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-09-29 11:41:33 -07:00
Mark Kavanagh	2391135ca7	netdev-dpdk: consistent naming for mbuf variables Pointers to struct rte_mbuf are typically denoted within functions as 'pkt'; similarly, arrays of, and pointer-to-pointer to, struct rte_mbuf are denoted by 'pkts'. Update discrepancies to the above convention for consistency. Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-09-29 11:41:07 -07:00
Ilya Maximets	c2adb102e2	netdev-dpdk: Introduce dpdk_mp_mutex. 'dpdk_mutex' protects two independent things: the list of dpdk devices and the list of memory pools. Let's split it in two to avoid global blocking inside 'netdev_dpdk.*_reconfigure()' where possible. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-09-29 11:04:11 -07:00
Ilya Maximets	4196454379	netdev-dpdk: More correct log message on vhost_driver_unregister failure. The current error message is incorrect for client mode. Fixes: c1ff66ac80b5 ("netdev-dpdk: vHost client mode and reconnect") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-09-23 13:27:21 -07:00
Ilya Maximets	6881885a3b	netdev-dpdk: Add missed lock in set_config for vhost client mode. 'vhost_driver_flags' and 'vhost_id' are mutable and must be protected by 'dev->mutex'. Fixes: 2d24d165d6a5 ("netdev-dpdk: Add new 'dpdkvhostuserclient' port type") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-09-23 13:23:14 -07:00
Ilya Maximets	5f88de0d9f	netdev-dpdk: Fix memory leak in dpdk_mp_{get, put}(). 'dmp' should be freed on failure and on put. Fixes: 8a9562d21a40 ("dpif-netdev: Add DPDK netdev.") Fixes: 8d38823bdf8b ("netdev-dpdk: fix memory leak") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-09-19 14:32:52 -07:00
Ciara Loftus	2d24d165d6	netdev-dpdk: Add new 'dpdkvhostuserclient' port type The 'dpdkvhostuser' port type no longer supports both server and client mode. Instead, 'dpdkvhostuser' ports are always 'server' mode and 'dpdkvhostuserclient' ports are always 'client' mode. Suggested-by: Daniele Di Proietto <diproiettod@vmware.com> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-09-19 14:02:06 -07:00
Ciara Loftus	5b9bf9e067	netdev-dpdk: Fix occurance of error log If NUMA information can't be derived from a vHost User device, only print an error if the VHOST_NUMA option is enabled in DPDK. Otherwise 'fail' silently. Fixes: 0a0f39df1d5a ("netdev-dpdk: Add support for DPDK 16.07") Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Reported-by: Ian Stokes <ian.stokes@intel.com> Tested-by: Ian Stokes <ian.stokes@intel.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-08-18 17:04:27 -07:00
Ilya Maximets	6b094bf47b	netdev-dpdk: Simplify send function for ETH devices. 'netdev_dpdk_send__()' function can be greatly simplified by using recently introduced 'netdev_dpdk_filter_packet_len()'. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-08-18 13:15:52 -07:00
Ilya Maximets	c6ec9d176d	netdev-dpdk: Fix vHost stats. This patch introduces the function 'netdev_dpdk_filter_packet_len()', which is intended to find and remove all packets with 'pkt_len > max_packet_len' from the Tx batch. It fixes inaccurate counting of 'tx_bytes' in the vHost case if there were dropped packets, and allows the send function to be simplified. Fixes: 0072e931b207 ("netdev-dpdk: add support for jumbo frames") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-08-18 13:15:52 -07:00
Ciara Loftus	c1ff66ac80	netdev-dpdk: vHost client mode and reconnect Until now, vHost ports in OVS have only been able to operate in 'server' mode whereby OVS creates and manages the vHost socket and essentially acts as the vHost 'server'. With this commit a new mode, 'client' mode, is available. In this mode, OVS acts as the vHost 'client' and connects to the socket created and managed by QEMU which now acts as the vHost 'server'. This mode allows for reconnect capability, which allows a vHost port to resume normal connectivity in event of switch reset. By default dpdkvhostuser ports still operate in 'server' mode. That is unless a valid 'vhost-server-path' is specified for a device like so: ovs-vsctl set Interface dpdkvhostuser0 options:vhost-server-path=/path/to/socket 'vhost-server-path' represents the full path of the vhost user socket that has been or will be created by QEMU. Once specified, the port stays in 'client' mode for the remainder of its lifetime. QEMU v2.7.0+ is required when using OVS in vHost client mode and QEMU in vHost server mode. Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-08-15 17:29:12 -07:00
Ciara Loftus	53f50d2400	netdev-dpdk: Consistent naming for vhost A mix of vhost_user_ and vhost_ is used when naming vhost functions. The 'user_' has been dropped for consistency. Also remove empty init functions for netdev dpdk classes. Suggested-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Daniele Di Proietto <diproiettod at vmware.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>	2016-08-15 17:29:12 -07:00
Ciara Loftus	4198764443	netdev-dpdk: Remove dpdkvhostcuse ports This commit removes the 'dpdkvhostcuse' port type from the userspace datapath. vhost-cuse ports are quickly becoming obsolete as the vhost-user port type begins to support a greater feature-set thanks to the addition of things like vhost-user multiqueue and potential upcoming features like vhost-user client-mode and vhost-user reconnect. The feature is also expected to be removed from DPDK soon. One potential drawback of the removal of this support is that a userspace vHost port type is not available in OVS for use with older versions of QEMU (pre v2.2). Considering v2.2 is nearly two years old this should however be a low impact change. Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2016-08-15 17:29:12 -07:00
Ciara Loftus	c3d062a777	netdev-dpdk: Do not attempt to initialise flow control for 'dpdkr' ports Only 'dpdk' ports support flow control. This patch stops 'dpdkr' ports from attempting to initialise this feature as this port type does not support it. Fixes: 9fd39370c12c ("netdev-dpdk: Add Flow Control support.") Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-08-15 13:10:18 -07:00
Ciara Loftus	7cd1261d13	netdev-dpdk: Use rte_eth_is_valid_port instead of manual check Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Acked-by: Mauricio Vásquez B <mauricio.vasquez@polito.it> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-08-15 13:10:18 -07:00
Mark Kavanagh	0072e931b2	netdev-dpdk: add support for jumbo frames Add support for Jumbo Frames to DPDK-enabled port types, using single-segment-mbufs. Using this approach, the amount of memory allocated to each mbuf to store frame data is increased to a value greater than 1518B (typical Ethernet maximum frame length). The increased space available in the mbuf means that an entire Jumbo Frame of a specific size can be carried in a single mbuf, as opposed to partitioning it across multiple mbuf segments. The amount of space allocated to each mbuf to hold frame data is defined dynamically by the user with ovs-vsctl, via the 'mtu_request' parameter. Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> [diproiettod@vmware.com rebased] Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-08-12 19:33:56 -07:00
Daniele Di Proietto	56abcf497b	vswitchd: Introduce 'mtu_request' column in Interface. The 'mtu_request' column can be used to set the MTU of a specific interface. This column is useful because it will allow changing the MTU of DPDK devices (implemented in a future commit), which are not accessible outside the ovs-vswitchd process, but it can be used for kernel interfaces as well. The current implementation of set_mtu() in netdev-dpdk is removed because it's broken. It will be reintroduced by a subsequent commit on this series. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2016-08-12 19:32:12 -07:00
Ciara Loftus	4b88d6787d	netdev-dpdk: add DPDK pdump capability This commit provides the ability to 'listen' on DPDK ports and save packets to a pcap file with a DPDK app that uses the librte_pdump library. One such app is the 'pdump' app that can be found in the DPDK 'app' directory. Instructions on how to use this can be found in INSTALL.DPDK-ADVANCED.md Pdump capability in OVS with DPDK will only be initialised if the CONFIG_RTE_LIBRTE_PMD_PCAP=y and CONFIG_RTE_LIBRTE_PDUMP=y options are set in DPDK. libpcap is required if the above configuration is used. Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-08-12 17:56:43 -07:00
Ilya Maximets	dd52de45b7	netdev-dpdk: vhost: Fix double free and use after free with QoS. While using QoS with vHost interfaces, 'netdev_dpdk_qos_run__()' will free mbufs while executing 'netdev_dpdk_policer_run()'. After that, the same mbufs will be freed at the end of '__netdev_dpdk_vhost_send()' if 'may_steal == true'. This behaviour will break the mempool. Also, 'netdev_dpdk_qos_run__()' will free packets even when we shouldn't ('may_steal == false'), which leads to the upper layers using already freed packets. Fix that by copying all packets that we can't steal, as is done for DPDK_DEV_ETH devices, and freeing only packets not freed by QoS. Fixes: 0bf765f753fd ("netdev_dpdk.c: Add QoS functionality.") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Tested-by: Ian Stokes <ian.stokes@intel.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-08-10 18:42:57 -07:00
Ilya Maximets	7f5f2bd0ce	netdev-dpdk: Avoid reconfiguration on reconnection of same vhost device. Binding/unbinding of virtio driver inside VM leads to reconfiguration of PMD threads. This behaviour may be abused by executing bind/unbind in an infinite loop to break normal networking on all ports attached to the same instance of Open vSwitch. Fix that by avoiding reconfiguration if it's not necessary. Number of queues will not be decreased to 1 on device disconnection but it's not very important in comparison with possible DOS attack from the inside of guest OS. Fixes: 81acebdaaf27 ("netdev-dpdk: Obtain number of queues for vhost ports from attached virtio.") Reported-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-08-09 17:35:50 -07:00
Ian Stokes	7ea266e953	netdev-dpdk: Fix egress policer error detection bug. When egress policer is set as a QoS type for a port, an error may occur during setup if incorrect parameters are used for the rte_meter. If this occurs the egress policer construct and set functions should free any allocated memory relevant to the policer and set the QoS configuration pointer to null. The netdev_dpdk_set_qos function should check the error value returned for any QoS construct/set calls with an assertion to avoid segfault. Also this commit modifies egress_policer_qos_set() to correctly lock the QoS spinlock while the egress policer configuration is updated to avoid segfault. Signed-off-by: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-08-09 15:01:10 -07:00
Bhanuprakash Bodireddy	27373420c8	netdev-dpdk: Fix dead initialization reported by clang. Clang reports that value stored to 'tok' during initialization is never read. Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-08-09 14:54:20 -07:00
Daniele Di Proietto	3f891bbea6	netdev-dpdk: Fix deadlock in destroy_device(). netdev_dpdk_vhost_destruct() calls rte_vhost_driver_unregister(), which can trigger the destroy_device() callback. destroy_device() will try to take two mutexes already held by netdev_dpdk_vhost_destruct(), causing a deadlock. This problem can be solved by dropping the mutexes before calling rte_vhost_driver_unregister(). The netdev_dpdk_vhost_destruct() and construct() call are already serialized by netdev_mutex. This commit also makes clear that dev->vhost_id is constant and can be accessed without taking any mutexes in the lifetime of the devices. Fixes: 8d38823bdf8b("netdev-dpdk: fix memory leak") Reported-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>	2016-08-09 11:15:29 -07:00
Ben Pfaff	13c1637f5b	smap: New function smap_get_ullong(). Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>	2016-08-08 11:00:37 -07:00
Maxime Coquelin	d03603c485	netdev-dpdk: When no QoS set, set type to empty string This patch sets *typep to an empty string instead of letting it uninitialized when no QoS configuration is set. It fixes the following vswitchd crash when no QoS has been set on vhost-user interface: $> ovs-appctl -t ovs-vswitchd qos/show vhost-user1 #0 0x00007efcbadf18d7 in raise () from /lib64/libc.so.6 #1 0x00007efcbadf353a in abort () from /lib64/libc.so.6 #2 0x000000000068d5be in ovs_abort_valist at lib/util.c:335 #3 0x0000000000693d90 in vlog_abort_valist at lib/vlog.c:1204 #4 0x0000000000693e17 in vlog_abort at lib/vlog.c:1218 #5 0x000000000068d3ae in ovs_assert_failure at lib/util.c:72 #6 0x000000000060425c in ds_put_format_valist at lib/dynamic-string.c:168 #7 0x00000000006042e7 in ds_put_format at lib/dynamic-string.c:142 #8 0x00000000005a9e75 in qos_unixctl_show at vswitchd/bridge.c:3185 #9 0x000000000068cda1 in process_command at lib/unixctl.c:347 #11 unixctl_server_run at lib/unixctl.c:400 #12 0x000000000040a3ff in main at vswitchd/ovs-vswitchd.c:113 Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-08-04 17:35:04 -07:00
Mark Kavanagh	8d38823bdf	netdev-dpdk: fix memory leak DPDK v16.07 introduces the ability to free memzones. Up until this point, DPDK memory pools created in OVS could not be destroyed, thus incurring a memory leak. Leverage the DPDK v16.07 rte_mempool API to free DPDK mempools when their associated reference count reaches 0 (this indicates that the memory pool is no longer in use). Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-08-04 15:45:48 -07:00
Ciara Loftus	0a0f39df1d	netdev-dpdk: Add support for DPDK 16.07 This commit introduces support for DPDK 16.07 and consequently breaks compatibility with DPDK 16.04. DPDK 16.07 introduces some changes to various APIs. These have been updated in OVS, including: * xstats API: changes to structure of xstats * vhost API: replace virtio-net references with 'vid' Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com> Tested-by: Robert Wojciechowicz <robertx.wojciechowicz@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-08-03 18:09:37 -07:00
Sugesh Chandran	9fd39370c1	netdev-dpdk: Add Flow Control support. Add support for flow control (MAC control frames) to DPDK-enabled physical port types. By default, flow control is OFF on both the rx and tx side. Flow control can be enabled/disabled either when adding a port to OVS or at run time. For example, to enable flow control on the tx side while adding a port, add the 'tx-flow-ctrl' option to the 'ovs-vsctl add-port' command line: 'ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:tx-flow-ctrl=true' Similarly, to enable rx flow control: 'ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:rx-flow-ctrl=true' And to enable flow control auto-negotiation: 'ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:flow-ctrl-autoneg=true' To turn ON tx flow control at run time (after the port has been added to OVS), the command line is: 'ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=true' Flow control parameters can be turned off by setting the respective parameter to 'false'. To disable flow control on the tx side: 'ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false' Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com> Acked-by: Bhanuprakash Bodireddy <Bhanuprakash.bodireddy@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-07-29 17:56:32 -07:00
Ilya Maximets	324c837485	dpif-netdev: XPS (Transmit Packet Steering) implementation. If the number of CPUs in pmd-cpu-mask is not divisible by the number of queues, and in a few more complex situations, there may be an unfair distribution of TX queue-ids between PMD threads. For example, if we have 2 ports with 4 queues and 6 CPUs in pmd-cpu-mask, such a distribution is possible: <------------------------------------------------------------------------> pmd thread numa_id 0 core_id 13: port: vhost-user1 queue-id: 1 port: dpdk0 queue-id: 3 pmd thread numa_id 0 core_id 14: port: vhost-user1 queue-id: 2 pmd thread numa_id 0 core_id 16: port: dpdk0 queue-id: 0 pmd thread numa_id 0 core_id 17: port: dpdk0 queue-id: 1 pmd thread numa_id 0 core_id 12: port: vhost-user1 queue-id: 0 port: dpdk0 queue-id: 2 pmd thread numa_id 0 core_id 15: port: vhost-user1 queue-id: 3 <------------------------------------------------------------------------> As we can see above, the dpdk0 port is polled by threads on cores 12, 13, 16 and 17. By design of dpif-netdev, there is only one TX queue-id assigned to each pmd thread. These queue-ids are sequential, similar to core-ids, and a thread will send packets to the queue with exactly this queue-id regardless of port. In the previous example: pmd thread on core 12 will send packets to tx queue 0 pmd thread on core 13 will send packets to tx queue 1 ... pmd thread on core 17 will send packets to tx queue 5 So, for the dpdk0 port, after truncation in netdev-dpdk: core 12 --> TX queue-id 0 % 4 == 0 core 13 --> TX queue-id 1 % 4 == 1 core 16 --> TX queue-id 4 % 4 == 0 core 17 --> TX queue-id 5 % 4 == 1 As a result, only 2 of 4 queues are used. To fix this issue, a form of XPS is implemented as follows: * TX queue-ids are allocated dynamically. * The first time a PMD thread tries to send packets to a new port, it allocates the least used TX queue for this port. * PMD threads periodically perform revalidation of allocated TX queue-ids. If a queue wasn't used in the last XPS_TIMEOUT_MS milliseconds, it will be freed during revalidation. * XPS is not used if there are enough TX queues. Reported-by: Zhihong Wang <zhihong.wang@intel.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-07-27 12:56:04 -07:00
xubinbin	b379037079	netdev-dpdk: remove duplicated code in netdev_dpdk_get_status "driver_name" was put into "args" twice, which is pointless, so remove the duplicated code. Signed-off-by: Binbin Xu <xu.binbin1@zte.com.cn> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-07-25 12:58:27 -07:00
William Tu	7d6d1a40dc	netdev-dpdk: Apply batch truncation API. Instead of looping into each packet and check whether to truncate, the patch moves it out of the loop and uses batch API. If truncation is not set, checking 'trunc' in 'struct dp_packet_batch' at per-batch basis can skip the per-packet checking overhead. Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-07-25 12:47:20 -07:00
Terry Wilson	ee89ea7b47	json: Move from lib to include/openvswitch. To easily allow both in- and out-of-tree building of the Python wrapper for the OVS JSON parser (e.g. w/ pip), move json.h to include/openvswitch. This also requires moving lib/{hmap,shash}.h. Both hmap.h and shash.h were #include-ing "util.h" even though the headers themselves did not use anything from there, but rather from include/openvswitch/util.h. Fixing that required including util.h in several C files mostly due to OVS_NOT_REACHED and things like xmalloc. Signed-off-by: Terry Wilson <twilson@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2016-07-22 17:09:17 -07:00
William Tu	64839cf432	netdev-provider: Apply batch object to netdev provider. Commit 1895cc8dbb64 ("dpif-netdev: create batch object") introduces batch process functions and 'struct dp_packet_batch' to associate with batch-level metadata. This patch applies the packet batch object to the netdev provider interface (dummy, Linux, BSD, and DPDK) so that batch APIs can be used in providers. With batch metadata visible in providers, optimizations can be introduced at per-batch level instead of per-packet. Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/145694197 Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-07-21 16:46:32 -07:00
Ilya Maximets	81acebdaaf	netdev-dpdk: Obtain number of queues for vhost ports from attached virtio. Currently, there are few inconsistencies in ways to configure number of queues for netdev device: * dpif-netdev can't know about exact number of queues allocated inside netdev. This leads to constant mapping of queue-ids to 'real' ones. * We are able to configure 'n_rxq' for vhost-user devices, but there is only one sane number of rx queues which must be used and configured manually (number of queues that allocated in QEMU). This patch disables configuration of 'n_rxq' for DPDK vHost devices. Configuration of rx and tx queues now automatically applied from connected virtio device. Standard reconfiguration mechanism was used to apply this changes. Also, now 'n_txq' and 'n_rxq' are always the real numbers of queues in the device. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-07-08 15:27:21 -07:00
Ilya Maximets	b59cc14e03	netdev-dpdk: Use instant sending instead of queueing of packets. The current implementation of TX packet queueing is broken in several ways: * TX queue flushing implemented on receive assumes that all core_id-s are sequential and start from zero. This may lead to a situation where packets are stuck in the queue forever, and it also affects latency. * For a long time the flushing logic has depended on an uninitialized 'txq_needs_locking', because it is usually calculated after 'netdev_dpdk_alloc_txq' but used inside this function to initialize 'flush_tx'. Testing shows no performance difference with and without queueing. Let's remove queueing entirely, because it doesn't work properly now and does not increase performance. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-07-06 15:46:35 -07:00
Ilya Maximets	1f5b157ece	netdev-dpdk: Fix using uninitialized link_status. 'rte_eth_link_get_nowait()' works only with physical ports. In case of vhost-user port, 'link' will stay uninitialized and there will be random messages in log about link status. Ex.: |dpdk(dpdk_watchdog2)|DBG|Port -1 Link Up - speed 10000 Mbps - full-duplex Fix that by calling 'check_link_status()' only for physical ports. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Aaron Conole <aconole@redhat.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-06-24 14:43:29 -07:00
William Tu	aaca4fe0ce	ofp-actions: Add truncate action. The patch adds a new action to support packet truncation. The new action is formatted as 'output(port=n,max_len=m)', as output to port n, with packet size being MIN(original_size, m). One use case is to enable port mirroring to send smaller packets to the destination port so that only useful packet information is mirrored/copied, saving some performance overhead of copying entire packet payload. Example use case is below as well as shown in the testcases: - Output to port 1 with max_len 100 bytes. - The output packet size on port 1 will be MIN(original_packet_size, 100). # ovs-ofctl add-flow br0 'actions=output(port=1,max_len=100)' - The scope of max_len is limited to output action itself. The following packet size of output:1 and output:2 will be intact. # ovs-ofctl add-flow br0 \ 'actions=output(port=1,max_len=100),output:1,output:2' - The Datapath actions shows: # Datapath actions: trunc(100),1,1,2 Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/140037134 Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org>	2016-06-24 09:17:00 -07:00
Ciara Loftus	db8f13b020	netdev-dpdk: NUMA Aware vHost User This commit allows for vHost User memory from QEMU, DPDK and OVS, as well as the servicing PMD, to all come from the same socket. The socket id of a vhost-user port used to be set to that of the master lcore. Now it is possible to update the socket id if it is detected (during VM boot) that the vhost device memory is not on this node. If this is the case, a new mempool is created from the new node, and the PMD thread currently servicing the port will no longer do so, in favour of a thread from the new node (if enabled in the pmd-cpu-mask). To avail of this functionality, one must enable the CONFIG_RTE_LIBRTE_VHOST_NUMA DPDK configuration option. Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-06-17 17:11:55 -07:00
Kevin Traynor	31871ee383	netdev-dpdk: Remove vhost send retries when no packets have been sent. If the guest is connected but not servicing the virt queue, this leads to vhost send retries until timeout. This is fine in isolation but if there are other high rate queues also being serviced by the same PMD it can lead to a performance hit on those queues. Change to only retry when at least some packets have been successfully sent on the previous attempt. Also, limit retries to avoid similar delays if packets are being sent at a very low rate due to few available descriptors. Reported-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> Signed-off-by: Kevin Traynor <kevin.traynor@intel.com> Acked-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-06-14 18:45:03 -07:00
Daniele Di Proietto	6930c7e01c	ovs-numa: Introduce function to set current thread affinity. This commit moves the code that sets the pmd threads affinity from netdev-dpdk to ovs-numa. There's one small part left in netdev-dpdk, to set the lcore_id. Now dpif-netdev will call both modules (ovs-numa and netdev-dpdk) when starting a pmd thread. This change will allow having a dummy implementation of the set affinity call, for testing purposes. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2016-06-07 11:15:01 -07:00
Zoltán Balogh	e543851d21	netdev-dpdk: vhost-user port link state fix OVS reports that the link state of a vhost-user port (type=dpdkvhostuser) is DOWN, even when traffic is running through the port between a Virtual Machine and the vSwitch. Changing the admin state with the "ovs-ofctl mod-port <BR> <PORT> up/down" command over OpenFlow affects neither the reported link state nor the traffic. The patch below does the following: - Triggers link state change by altering netdev's change_seq member. - Controls sending/receiving of packets through vhost-user port according to the port's current admin state. - Sets admin state of newly created vhost-user port to UP. Signed-off-by: Zoltán Balogh <zoltan.balogh@ericsson.com> Co-authored-by: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-06-02 14:26:02 -07:00
Ian Stokes	9509913aa7	netdev-dpdk.c: Add ingress-policing functionality. This patch provides the modifications required in netdev-dpdk.c and vswitch.xml to enable ingress policing for DPDK interfaces. This patch implements the necessary netdev functions to netdev-dpdk.c as well as various helper functions required for ingress policing. The vswitch.xml has been modified to explain the expected parameters and behaviour when using ingress policing. The INSTALL.DPDK.md guide has been modified to provide an example configuration of ingress policing. Signed-off-by: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-05-24 13:37:25 -07:00
Ian Stokes	f3926f297b	netdev-dpdk.c: Add generic policer functions. Add generic policer functions to avoid code duplication. Policing can be implemented on both egress and ingress paths. Currently the QoS egress-policer implementation uses its own specific run and packet handle policer functions. This patch makes the policer functions generic so that they can be used regardless of whether the policer is egress or ingress by just requiring a pointer to the rte_meter used for policing to be passed. Signed-off-by: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-05-24 13:34:23 -07:00
Daniele Di Proietto	050c60bfb5	netdev-dpdk: Use ->reconfigure() call to change rx/tx queues. This introduces in dpif-netdev and netdev-dpdk the first use for the newly introduced reconfigure netdev call. When a request to change the number of queues comes, netdev-dpdk will remember this and notify the upper layer via netdev_request_reconfigure(). The datapath, instead of periodically calling netdev_set_multiq(), can detect this and call reconfigure(). This mechanism can also be used to: * Automatically match the number of rxq with the one provided by qemu via the new_device callback. * Provide a way to change the MTU of dpdk devices at runtime. * Move a DPDK vhost device to the proper NUMA socket. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2016-05-23 10:27:42 -07:00
Daniele Di Proietto	790fb3b745	netdev: Add reconfigure request mechanism. A netdev provider, especially a PMD provider (like netdev DPDK) might not be able to change some of its parameters (such as MTU, or number of queues) without stopping everything and restarting. This commit introduces a mechanism that allows a netdev provider to request a restart (netdev_request_reconfigure()). The upper layer can be notified via netdev_wait_reconf_required() and netdev_is_reconf_required(). After closing all the rxqs the upper layer can finally call netdev_reconfigure(), to make sure that the new configuration is in place. This will be used by next commit to reconfigure rx and tx queues in netdev-dpdk. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>	2016-05-23 10:27:42 -07:00


326 Commits