mir/ovs - ovs - Mike's Git repositories

mirror of https://github.com/openvswitch/ovs synced 2025-08-23 10:28:00 +00:00

Author	SHA1	Message	Date
Daniele Di Proietto	3f891bbea6	netdev-dpdk: Fix deadlock in destroy_device(). netdev_dpdk_vhost_destruct() calls rte_vhost_driver_unregister(), which can trigger the destroy_device() callback. destroy_device() will try to take two mutexes already held by netdev_dpdk_vhost_destruct(), causing a deadlock. This problem can be solved by dropping the mutexes before calling rte_vhost_driver_unregister(). The netdev_dpdk_vhost_destruct() and construct() call are already serialized by netdev_mutex. This commit also makes clear that dev->vhost_id is constant and can be accessed without taking any mutexes in the lifetime of the devices. Fixes: 8d38823bdf8b("netdev-dpdk: fix memory leak") Reported-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>	2016-08-09 11:15:29 -07:00
Ben Pfaff	13c1637f5b	smap: New function smap_get_ullong(). Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com>	2016-08-08 11:00:37 -07:00
Maxime Coquelin	d03603c485	netdev-dpdk: When no QoS set, set type to empty string This patch sets *typep to an empty string instead of letting it uninitialized when no QoS configuration is set. It fixes the following vswitchd crash when no QoS has been set on vhost-user interface: $> ovs-appctl -t ovs-vswitchd qos/show vhost-user1 #0 0x00007efcbadf18d7 in raise () from /lib64/libc.so.6 #1 0x00007efcbadf353a in abort () from /lib64/libc.so.6 #2 0x000000000068d5be in ovs_abort_valist at lib/util.c:335 #3 0x0000000000693d90 in vlog_abort_valist at lib/vlog.c:1204 #4 0x0000000000693e17 in vlog_abort at lib/vlog.c:1218 #5 0x000000000068d3ae in ovs_assert_failure at lib/util.c:72 #6 0x000000000060425c in ds_put_format_valist at lib/dynamic-string.c:168 #7 0x00000000006042e7 in ds_put_format at lib/dynamic-string.c:142 #8 0x00000000005a9e75 in qos_unixctl_show at vswitchd/bridge.c:3185 #9 0x000000000068cda1 in process_command at lib/unixctl.c:347 #11 unixctl_server_run at lib/unixctl.c:400 #12 0x000000000040a3ff in main at vswitchd/ovs-vswitchd.c:113 Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Acked-by: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-08-04 17:35:04 -07:00
Mark Kavanagh	8d38823bdf	netdev-dpdk: fix memory leak DPDK v16.07 introduces the ability to free memzones. Up until this point, DPDK memory pools created in OVS could not be destroyed, thus incurring a memory leak. Leverage the DPDK v16.07 rte_mempool API to free DPDK mempools when their associated reference count reaches 0 (this indicates that the memory pool is no longer in use). Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-08-04 15:45:48 -07:00
Ciara Loftus	0a0f39df1d	netdev-dpdk: Add support for DPDK 16.07 This commit introduces support for DPDK 16.07 and consequently breaks compatibility with DPDK 16.04. DPDK 16.07 introduces some changes to various APIs. These have been updated in OVS, including: * xstats API: changes to structure of xstats * vhost API: replace virtio-net references with 'vid' Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com> Tested-by: Robert Wojciechowicz <robertx.wojciechowicz@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-08-03 18:09:37 -07:00
Sugesh Chandran	9fd39370c1	netdev-dpdk: Add Flow Control support. Add support for flow control (MAC control frames) to DPDK enabled physical port types. By default, flow control is OFF on both rx and tx side. Flow control can be enabled/disabled either when adding a port to OVS or at run time. For example: To enable flow control support at tx side while adding a port, add the 'tx-flow-ctrl' option to the 'ovs-vsctl add-port' command-line as below. 'ovs-vsctl add-port br0 dpdk0 -- \ set Interface dpdk0 type=dpdk options:tx-flow-ctrl=true' Similarly to enable rx flow control, 'ovs-vsctl add-port br0 dpdk0 -- \ set Interface dpdk0 type=dpdk options:rx-flow-ctrl=true' And to enable the flow control auto-negotiation, 'ovs-vsctl add-port br0 dpdk0 -- \ set Interface dpdk0 type=dpdk options:flow-ctrl-autoneg=true' To turn ON the tx flow control at run time (after the port has been added to OVS), the command-line input will be, 'ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=true' The flow control parameters can be turned off by setting the respective parameter to 'false'. To disable the flow control at tx side, 'ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false' Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com> Acked-by: Bhanuprakash Bodireddy <Bhanuprakash.bodireddy@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-07-29 17:56:32 -07:00
Ilya Maximets	324c837485	dpif-netdev: XPS (Transmit Packet Steering) implementation. If CPU number in pmd-cpu-mask is not divisible by the number of queues and in a few more complex situations there may be unfair distribution of TX queue-ids between PMD threads. For example, if we have 2 ports with 4 queues and 6 CPUs in pmd-cpu-mask such distribution is possible: <------------------------------------------------------------------------> pmd thread numa_id 0 core_id 13: port: vhost-user1 queue-id: 1 port: dpdk0 queue-id: 3 pmd thread numa_id 0 core_id 14: port: vhost-user1 queue-id: 2 pmd thread numa_id 0 core_id 16: port: dpdk0 queue-id: 0 pmd thread numa_id 0 core_id 17: port: dpdk0 queue-id: 1 pmd thread numa_id 0 core_id 12: port: vhost-user1 queue-id: 0 port: dpdk0 queue-id: 2 pmd thread numa_id 0 core_id 15: port: vhost-user1 queue-id: 3 <------------------------------------------------------------------------> As we can see above, the dpdk0 port is polled by threads on cores 12, 13, 16 and 17. By design of dpif-netdev, there is only one TX queue-id assigned to each pmd thread. These queue-ids are sequential, similar to core-ids. And a thread will send packets to the queue with exactly this queue-id regardless of port. In previous example: pmd thread on core 12 will send packets to tx queue 0 pmd thread on core 13 will send packets to tx queue 1 ... pmd thread on core 17 will send packets to tx queue 5 So, for dpdk0 port after truncating in netdev-dpdk: core 12 --> TX queue-id 0 % 4 == 0 core 13 --> TX queue-id 1 % 4 == 1 core 16 --> TX queue-id 4 % 4 == 0 core 17 --> TX queue-id 5 % 4 == 1 As a result only 2 of 4 queues are used. To fix this issue some kind of XPS is implemented in the following way: * TX queue-ids are allocated dynamically. * When a PMD thread first tries to send packets to a new port it allocates the least used TX queue for this port. * PMD threads periodically perform revalidation of allocated TX queue-ids. If a queue wasn't used in the last XPS_TIMEOUT_MS milliseconds it will be freed during revalidation. * XPS is not working if we have enough TX queues. Reported-by: Zhihong Wang <zhihong.wang@intel.com> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-07-27 12:56:04 -07:00
xubinbin	b379037079	netdev-dpdk: remove duplicated code in netdev_dpdk_get_status "driver_name" is put into "args" twice, which is meaningless, so remove the duplicated code. Signed-off-by: Binbin Xu <xu.binbin1@zte.com.cn> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-07-25 12:58:27 -07:00
William Tu	7d6d1a40dc	netdev-dpdk: Apply batch truncation API. Instead of looping into each packet and check whether to truncate, the patch moves it out of the loop and uses batch API. If truncation is not set, checking 'trunc' in 'struct dp_packet_batch' at per-batch basis can skip the per-packet checking overhead. Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-07-25 12:47:20 -07:00
Terry Wilson	ee89ea7b47	json: Move from lib to include/openvswitch. To easily allow both in- and out-of-tree building of the Python wrapper for the OVS JSON parser (e.g. w/ pip), move json.h to include/openvswitch. This also requires moving lib/{hmap,shash}.h. Both hmap.h and shash.h were #include-ing "util.h" even though the headers themselves did not use anything from there, but rather from include/openvswitch/util.h. Fixing that required including util.h in several C files mostly due to OVS_NOT_REACHED and things like xmalloc. Signed-off-by: Terry Wilson <twilson@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2016-07-22 17:09:17 -07:00
William Tu	64839cf432	netdev-provider: Apply batch object to netdev provider. Commit 1895cc8dbb64 ("dpif-netdev: create batch object") introduces batch process functions and 'struct dp_packet_batch' to associate with batch-level metadata. This patch applies the packet batch object to the netdev provider interface (dummy, Linux, BSD, and DPDK) so that batch APIs can be used in providers. With batch metadata visible in providers, optimizations can be introduced at per-batch level instead of per-packet. Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/145694197 Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-07-21 16:46:32 -07:00
Ilya Maximets	81acebdaaf	netdev-dpdk: Obtain number of queues for vhost ports from attached virtio. Currently, there are few inconsistencies in ways to configure number of queues for netdev device: * dpif-netdev can't know about exact number of queues allocated inside netdev. This leads to constant mapping of queue-ids to 'real' ones. * We are able to configure 'n_rxq' for vhost-user devices, but there is only one sane number of rx queues which must be used and configured manually (number of queues that allocated in QEMU). This patch disables configuration of 'n_rxq' for DPDK vHost devices. Configuration of rx and tx queues now automatically applied from connected virtio device. Standard reconfiguration mechanism was used to apply this changes. Also, now 'n_txq' and 'n_rxq' are always the real numbers of queues in the device. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-07-08 15:27:21 -07:00
Ilya Maximets	b59cc14e03	netdev-dpdk: Use instant sending instead of queueing of packets. Current implementation of TX packet queueing is broken in several ways: * TX queue flushing implemented on receive assumes that all core_id-s are sequential and start from zero. This may lead to a situation where packets are stuck in the queue forever and, also, this affects latency. * For a long time the flushing logic has depended on uninitialized 'txq_needs_locking', because it is usually calculated after 'netdev_dpdk_alloc_txq' but used inside of this function for initialization of 'flush_tx'. Testing shows no performance difference with and without queueing. Let's remove queueing entirely because it doesn't work properly now and also does not increase performance. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-07-06 15:46:35 -07:00
Ilya Maximets	1f5b157ece	netdev-dpdk: Fix using uninitialized link_status. 'rte_eth_link_get_nowait()' works only with physical ports. In case of vhost-user port, 'link' will stay uninitialized and there will be random messages in log about link status. Ex.: |dpdk(dpdk_watchdog2)|DBG|Port -1 Link Up - speed 10000 Mbps - full-duplex Fix that by calling 'check_link_status()' only for physical ports. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Aaron Conole <aconole@redhat.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-06-24 14:43:29 -07:00
William Tu	aaca4fe0ce	ofp-actions: Add truncate action. The patch adds a new action to support packet truncation. The new action is formatted as 'output(port=n,max_len=m)', as output to port n, with packet size being MIN(original_size, m). One use case is to enable port mirroring to send smaller packets to the destination port so that only useful packet information is mirrored/copied, saving some performance overhead of copying entire packet payload. Example use case is below as well as shown in the testcases: - Output to port 1 with max_len 100 bytes. - The output packet size on port 1 will be MIN(original_packet_size, 100). # ovs-ofctl add-flow br0 'actions=output(port=1,max_len=100)' - The scope of max_len is limited to output action itself. The following packet size of output:1 and output:2 will be intact. # ovs-ofctl add-flow br0 \ 'actions=output(port=1,max_len=100),output:1,output:2' - The Datapath actions shows: # Datapath actions: trunc(100),1,1,2 Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/140037134 Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org>	2016-06-24 09:17:00 -07:00
Ciara Loftus	db8f13b020	netdev-dpdk: NUMA Aware vHost User This commit allows for vHost User memory from QEMU, DPDK and OVS, as well as the servicing PMD, to all come from the same socket. The socket id of a vhost-user port used to be set to that of the master lcore. Now it is possible to update the socket id if it is detected (during VM boot) that the vhost device memory is not on this node. If this is the case, a new mempool is created from the new node, and the PMD thread currently servicing the port will no longer, in favour of a thread from the new node (if enabled in the pmd-cpu-mask). To avail of this functionality, one must enable the CONFIG_RTE_LIBRTE_VHOST_NUMA DPDK configuration option. Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-06-17 17:11:55 -07:00
Kevin Traynor	31871ee383	netdev-dpdk: Remove vhost send retries when no packets have been sent. If the guest is connected but not servicing the virt queue, this leads to vhost send retries until timeout. This is fine in isolation but if there are other high rate queues also being serviced by the same PMD it can lead to a performance hit on those queues. Change to only retry when at least some packets have been successfully sent on the previous attempt. Also, limit retries to avoid a similar delays if packets are being sent at a very low rate due to few available descriptors. Reported-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> Signed-off-by: Kevin Traynor <kevin.traynor@intel.com> Acked-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-06-14 18:45:03 -07:00
Daniele Di Proietto	6930c7e01c	ovs-numa: Introduce function to set current thread affinity. This commit moves the code that sets the pmd threads affinity from netdev-dpdk to ovs-numa. There's one small part left in netdev-dpdk, to set the lcore_id. Now dpif-netdev will call both modules (ovs-numa and netdev-dpdk) when starting a pmd thread. This change will allow having a dummy implementation of the set affinity call, for testing purposes. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2016-06-07 11:15:01 -07:00
Zoltán Balogh	e543851d21	netdev-dpdk: vhost-user port link state fix OVS reports that link state of a vhost-user port (type=dpdkvhostuser) is DOWN, even when traffic is running through the port between a Virtual Machine and the vSwitch. Changing admin state with the "ovs-ofctl mod-port <BR> <PORT> up/down" command over OpenFlow affects neither the reported link state nor the traffic. The patch below does the following: - Triggers link state change by altering netdev's change_seq member. - Controls sending/receiving of packets through vhost-user port according to the port's current admin state. - Sets admin state of newly created vhost-user port to UP. Signed-off-by: Zoltán Balogh <zoltan.balogh@ericsson.com> Co-authored-by: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-06-02 14:26:02 -07:00
Ian Stokes	9509913aa7	netdev-dpdk.c: Add ingress-policing functionality. This patch provides the modifications required in netdev-dpdk.c and vswitch.xml to enable ingress policing for DPDK interfaces. This patch implements the necessary netdev functions to netdev-dpdk.c as well as various helper functions required for ingress policing. The vswitch.xml has been modified to explain the expected parameters and behaviour when using ingress policing. The INSTALL.DPDK.md guide has been modified to provide an example configuration of ingress policing. Signed-off-by: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-05-24 13:37:25 -07:00
Ian Stokes	f3926f297b	netdev-dpdk.c: Add generic policer functions. Add generic policer functions to avoid code duplication. Policing can be implemented on both egress and ingress paths. Currently the QoS egress-policer implementation uses its own specific run and packet handle policer functions. This patch makes the policer functions generic so that they can be used regardless of whether the policer is egress or ingress by just requiring a pointer to the rte_meter used for policing to be passed. Signed-off-by: Ian Stokes <ian.stokes@intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-05-24 13:34:23 -07:00
Daniele Di Proietto	050c60bfb5	netdev-dpdk: Use ->reconfigure() call to change rx/tx queues. This introduces in dpif-netdev and netdev-dpdk the first use for the newly introduce reconfigure netdev call. When a request to change the number of queues comes, netdev-dpdk will remember this and notify the upper layer via netdev_request_reconfigure(). The datapath, instead of periodically calling netdev_set_multiq(), can detect this and call reconfigure(). This mechanism can also be used to: * Automatically match the number of rxq with the one provided by qemu via the new_device callback. * Provide a way to change the MTU of dpdk devices at runtime. * Move a DPDK vhost device to the proper NUMA socket. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>	2016-05-23 10:27:42 -07:00
Daniele Di Proietto	790fb3b745	netdev: Add reconfigure request mechanism. A netdev provider, especially a PMD provider (like netdev DPDK) might not be able to change some of its parameters (such as MTU, or number of queues) without stopping everything and restarting. This commit introduces a mechanism that allows a netdev provider to request a restart (netdev_request_reconfigure()). The upper layer can be notified via netdev_wait_reconf_required() and netdev_is_reconf_required(). After closing all the rxqs the upper layer can finally call netdev_reconfigure(), to make sure that the new configuration is in place. This will be used by next commit to reconfigure rx and tx queues in netdev-dpdk. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>	2016-05-23 10:27:42 -07:00
Kevin Traynor	d8e2f4ccf2	netdev-dpdk: Improve pthread_getaffinity_np() fail handling. Prevent pthread_setaffinity_np() being called with a potentially invalid cpu_set_t and add a default (core 0x1). Also, only call pthread_getaffinity_np() if no dpdk-lcore-mask specified. Signed-off-by: Kevin Traynor <kevin.traynor@intel.com> Acked-by: Aaron Conole <aconole@redhat.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-05-20 18:04:22 -07:00
Kevin Traynor	30149e2973	netdev-dpdk: Fix coremask logic. Only set the thread affinity back to the pre rte_eal_init() value when the user has not specified a coremask. Fixes: 88964e6428dc("netdev-dpdk: Autofill lcore coremask if absent") Signed-off-by: Kevin Traynor <kevin.traynor@intel.com> Acked-by: Aaron Conole <aconole@redhat.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-05-20 18:04:05 -07:00
Joe Stringer	f925682252	netdev-dpdk: Fix locking during get_stats. Clang complains: lib/netdev-dpdk.c:1860:1: error: mutex 'dev->mutex' is not locked on every path through here [-Werror,-Wthread-safety-analysis] } ^ lib/netdev-dpdk.c:1815:5: note: mutex acquired here ovs_mutex_lock(&dev->mutex); ^ ./include/openvswitch/thread.h:60:9: note: expanded from macro 'ovs_mutex_lock' ovs_mutex_lock_at(mutex, OVS_SOURCE_LOCATOR) ^ Fixes: d6e3feb57c44 ("Add support for extended netdev statistics based on RFC 2819.") Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-05-13 13:44:41 -07:00
Ciara Loftus	6eb51f9a45	netdev-dpdk: Print default vhost-sock-dir value & update documentation When no vhost-sock-dir value is provided, print the default location. Update the documentation to reflect the fact that vhost-sock-dir values are now subdirectory locations rather than full paths. Fixes: d8a8f353c23e ("netdev-dpdk: Restrict vhost_sock_dir") Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2016-05-06 15:37:50 -07:00
mweglicx	d6e3feb57c	Add support for extended netdev statistics based on RFC 2819. Implementation of new statistics extension for DPDK ports: - Add new counters definition to netdev struct and open flow, based on RFC2819. - Initialize netdev statistics as "filtered out" before passing it to particular netdev implementation (because of that change, statistics which are not collected are reported as filtered out, and some unit tests were modified in this respect). - New statistics are retrieved using experimenter code and are printed as a result to ofctl dump-ports. - New counters are available for OpenFlow 1.4+. - Add new vendor id: INTEL_VENDOR_ID. - New statistics are printed to output via ofctl only if those are present in reply message. - Add new file header: include/openflow/intel-ext.h which contains new statistics definition. - Extended statistics are implemented only for dpdk-physical and dpdk-vhost port types. - Dpdk-physical implementation uses xstats to collect statistics. - Dpdk-vhost implements only part of statistics (RX packet sized based counters). Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com> [blp@ovn.org made software devices more consistent] Signed-off-by: Ben Pfaff <blp@ovn.org>	2016-05-06 15:28:56 -07:00
Aaron Conole	52a57e36af	netdev-dpdk: Check dpdk-extra when reading db A previous patch introduced the ability to pass arbitrary EAL command line options via the dpdk_extras database entry. This commit enhances that by warning the user when such a configuration is detected and preferring the value in the database. Suggested-by: Sean K Mooney <sean.k.mooney@intel.com> Signed-off-by: Aaron Conole <aconole@redhat.com> Tested-by: Sean K Mooney <sean.k.mooney@intel.com> Tested-by: Kevin Traynor <kevin.traynor@intel.com> Acked-by: Panu Matilainen <pmatilai@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-04-29 15:07:39 -07:00
Aaron Conole	eac84432a4	netdev-dpdk: Allow arbitrary eal arguments A previous change moved some commonly used arguments from commandline to the database, and with it the ability to pass arbitrary arguments to EAL. This change allows arbitrary eal arguments to be provided via a new db entry 'other_config:dpdk-extra' which will tokenize the string and add it to the argument list. The only argument which will not be supported with this change is '--no-huge', which appears to break the system in other ways. Signed-off-by: Aaron Conole <aconole@redhat.com> Tested-by: Sean K Mooney <sean.k.mooney@intel.com> Tested-by: RobertX Wojciechowicz <robertx.wojciechowicz@intel.com> Tested-by: Kevin Traynor <kevin.traynor@intel.com> Acked-by: Panu Matilainen <pmatilai@redhat.com> Acked-by: Kevin Traynor <kevin.traynor@intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-04-29 15:07:39 -07:00
Aaron Conole	88964e6428	netdev-dpdk: Autofill lcore coremask if absent The user has control over the DPDK internal lcore coremask, but this parameter can be autofilled with a bit more intelligence. If the user does not fill this parameter in, we use the lowest set bit in the current task CPU affinity. Otherwise, we will reassign the current thread to the specified lcore mask, in addition to the dpdk lcore threads. Signed-off-by: Aaron Conole <aconole@redhat.com> Tested-by: Sean K Mooney <sean.k.mooney@intel.com> Tested-by: RobertX Wojciechowicz <robertx.wojciechowicz@intel.com> Tested-by: Kevin Traynor <kevin.traynor@intel.com> Acked-by: Panu Matilainen <pmatilai@redhat.com> Acked-by: Kevin Traynor <kevin.traynor@intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-04-29 15:07:39 -07:00
Aaron Conole	d8a8f353c2	netdev-dpdk: Restrict vhost_sock_dir Since the vhost-user sockets directory now comes from the database, it is possible for any user with database access to program an arbitrary filesystem location for the sockets directory. This could result in unprivileged users creating or deleting arbitrary filesystem files by using specially crafted names. To prevent this, 'vhost-sock-dir' is now relative to ovs_rundir() and must not contain "..". Signed-off-by: Aaron Conole <aconole@redhat.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-04-29 15:07:39 -07:00
Aaron Conole	bab6940971	netdev-dpdk: Convert initialization from cmdline to db Existing DPDK integration is provided by use of command line options which must be split out and passed to librte in a special manner. However, this forces any configuration to be passed by way of a special DPDK flag, and interferes with ovs+dpdk packaging solutions. This commit delays dpdk initialization until after the OVS database connection is established, at which point ovs initializes librte. It pulls all of the config data from the OVS database, and assembles a new argv/argc pair to be passed along. Signed-off-by: Aaron Conole <aconole@redhat.com> Acked-by: Kevin Traynor <kevin.traynor@intel.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-04-29 15:07:39 -07:00
Aaron Conole	563c98d86e	netdev-dpdk: Restore thread affinity after DPDK init When the DPDK init function is called, it changes the executing thread's CPU affinity to a single core specified in -c. This will result in the userspace bridge configuration thread being rebound, even if that is not the intent. This change fixes that behavior by rebinding to the original thread affinity after calling dpdk_init(). Co-authored-by: Kevin Traynor <kevin.traynor@intel.com> Signed-off-by: Kevin Traynor <kevin.traynor@intel.com> Signed-off-by: Aaron Conole <aconole@redhat.com> Tested-by: RobertX Wojciechowicz <robertx.wojciechowicz@intel.com> Tested-by: Sean K Mooney <sean.k.mooney@intel.com> Acked-by: Panu Matilainen <pmatilai@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-04-29 15:07:39 -07:00
mweglicx	362ca39639	Update relevant artifacts to add support for DPDK 16.04. Following changes are applied: - INSTALL.DPDK.md: CONFIG_RTE_BUILD_COMBINE_LIBS step has been removed because it is no longer present in DPDK configuration (combined library is created by default), - INSTALL.DPDK.md: VHost Cuse configuration is updated, - netdev-dpdk.c: Link speed definition is changed in DPDK and netdev_dpdk_get_features is updated accordingly, - netdev-dpdk.c: TSO and checksum offload has been disabled for vhostuser device. - .travis/linux-build.sh: DPDK version is updated and legacy flags have been removed in configuration. Signed-off-by: Michal Weglicki <michalx.weglicki@intel.com> Signed-off-by: Panu Matilainen <pmatilai@redhat.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-04-15 14:46:58 -07:00
Ben Warren	25d436fbd4	Move lib/ofp-print.h to include/openvswitch directory Signed-off-by: Ben Warren <ben@skyportsystems.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2016-04-14 16:38:32 -07:00
Daniele Di Proietto	d46285a220	netdev-dpdk: Consistent variable naming. In different functions we use different variable names ('netdev_', 'netdev', 'dev', 'vhost_dev', ...) for the same objects. This commit changes the code to comply with the following convention: 'struct netdev':'netdev' 'struct netdev_dpdk':'dev' 'struct virtio_net':'virtio_dev' 'struct netdev_rxq':'rxq' 'struct netdev_rxq_dpdk':'rx' Also, 'dev->up.' is replaced by 'netdev->', where 'netdev' was already defined. Suggested-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>	2016-04-07 18:59:30 -07:00
Ben Warren	417e7e66e1	list: Rename all functions in list.h with ovs_ prefix. This attempts to prevent namespace collisions with other list libraries Signed-off-by: Ben Warren <ben@skyportsystems.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2016-03-30 13:04:32 -07:00
Ben Warren	b19bab5b20	list: Remove lib/list.h completely. All code is now in include/openvswitch/list.h. Signed-off-by: Ben Warren <ben@skyportsystems.com> Acked-by: Ryan Moats <rmoats@us.ibm.com> Signed-off-by: Ben Pfaff <blp@ovn.org>	2016-03-30 13:01:21 -07:00
Ilya Maximets	f3ea2ad27f	netdev-dpdk: vhost: Fix txq enabling in the absence of notifications. According to QEMU documentation (docs/specs/vhost-user.txt) one queue should be enabled initially. More queues are enabled dynamically, by sending message VHOST_USER_SET_VRING_ENABLE. Currently all queues in OVS disabled by default. This breaks above specification. So, queue #0 should be enabled by default to support QEMU versions less than 2.5 and fix probable issues if QEMU will not send VHOST_USER_SET_VRING_ENABLE for queue #0 according to documentation. Also this will fix currently broken vhost-cuse support in OVS. Fixes: 585a5beaa2a4 ("netdev-dpdk: vhost-user: Fix sending packets to queues not enabled by guest.") Reported-by: Mauricio Vasquez B <mauricio.vasquezbernal@studenti.polito.it> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-03-29 13:46:11 -07:00
Pravin B Shelar	6b6e13293e	netdev: remove netdev_get_in4() Since netdev can have multiple IP address use generic api netdev_get_addr_list(). This also make it easier to handle IPv4 and IPv6 address across vswitchd layers. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2016-03-24 09:30:57 -07:00
Pravin B Shelar	a8704b5027	tunneling: Handle multiple ip address for given device. Device can have multiple IP address but netdev_get_in4/6() returns only one configured IPv6 address. Following patch fixes it. OVS router is also updated to return source ip address for given destination, This is required when interface has multiple IP address configured. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>	2016-03-24 09:30:57 -07:00
Ilya Maximets	c62da69597	netdev-dpdk: Fix crash when changing the vhost-user port. According to netdev-provider API: 'The "destruct" function is not allowed to fail.' netdev-dpdk breaks this restriction for vhost-user ports. This leads to SIGABRT or SIGSEGV in dpdk_watchdog thread because 'dealloc' will be called anyway indifferently to result of 'destruct'. For example, if we call # ovs-vsctl set interface vhost1 ofport_request=5 while QEMU still attached, we'll get: ------------------[cut]------------------ |dpdk|ERR|Can not remove port, vhost device still attached VHOST_CONFIG: socket created, fd:98 VHOST_CONFIG: fail to bind fd:98, remove file:/home/vhost1 and try again. |dpdk|ERR|vhost-user socket device setup failure for socket /home/vhost1 |bridge|WARN|could not open network device vhost1 (Unknown error -1) ovs-vswitchd(dpdk_watchdog1): lib/netdev-dpdk.c:532: ovs_mutex_lock_at() passed uninitialized ovs_mutex Program received signal SIGABRT, Aborted. ------------------[cut]------------------ Fix that by removing port anyway even when guest is still attached. Guest becomes an orphan in that case but OVS will not crash and will continue forwarding for other ports. VM restart required to restore connectivity. Fixes: 58397e6c1e6c ("netdev-dpdk: add dpdk vhost-cuse ports") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Kevin Traynor <kevin.traynor@intel.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-03-22 18:27:11 -07:00
Ilya Maximets	118c77b1a8	netdev: New field 'is_pmd' in netdev_class. Made to simplify creation of derived classes. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-03-16 17:03:07 -07:00
Yuanhan Liu	b00b4a8149	netdev-dpdk: fix mbuf leaks mbufs could be chained (by the "next" field of rte_mbuf struct), when an mbuf is not big enough to hold a big packet, say when TSO is enabled. rte_pktmbuf_free_seg() frees the head mbuf only, leading to mbuf leaks. This patch fixes it by invoking the right API, rte_pktmbuf_free(), to free all mbufs in the chain. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-03-09 17:28:21 -08:00
Ilya Maximets	6019cb6395	netdev-dpdk: Fix memory leak in netdev_dpdk_vhost_destruct(). Fixes: 4573fbd38fa1 ("netdev-dpdk: Add vhost-user multiqueue support") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-03-05 20:15:25 -08:00
Ian Stokes	0bf765f753	netdev_dpdk.c: Add QoS functionality. This patch provides the modifications required in netdev-dpdk.c and vswitch.xml to allow for a DPDK user space QoS algorithm. This patch adds a QoS configuration structure for netdev-dpdk and expected QoS operations 'dpdk_qos_ops'. Various helper functions are also supplied. Also included are the modifications required for vswitch.xml to allow a new QoS implementation for netdev-dpdk devices. This includes a new QoS type `egress-policer` as well as its expected QoS table entries. The QoS functionality implemented for DPDK devices is `egress-policer`. This can be used to drop egress packets at a configurable rate. The INSTALL.DPDK.md guide has also been modified to provide an example configuration of `egress-policer` QoS. Signed-off-by: Ian Stokes <ian.stokes@intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-03-02 17:46:27 -08:00
Mark Kavanagh	4be4d22c33	netdev-dpdk: clean up mbuf initialization Current mbuf initialization relies on magic numbers and does not accommodate mbufs of different sizes. Resolve this issue by ensuring that mbufs are always aligned to a 1k boundary (a typical DPDK NIC Rx buffer alignment). Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-02-28 11:19:37 -08:00
Ilya Maximets	585a5beaa2	netdev-dpdk: vhost-user: Fix sending packets to queues not enabled by guest. Currently virtio driver in guest operating system have to be configured to use exactly same number of queues. If number of queues will be less, some packets will get stuck in queues unused by guest and will not be received. Fix that by using new 'vring_state_changed' callback, which is available for vhost-user since DPDK 2.2. Implementation uses additional mapping from configured tx queues to enabled by virtio driver. This requires mandatory locking of TX queues in __netdev_dpdk_vhost_send(), but this locking was almost always anyway because of calling set_multiq with n_txq = 'ovs_numa_get_n_cores() + 1'. OVS_VHOST_MAX_QUEUE_NUM = 1024 chosen based on the fact that this is the maximum number of queues supported by QEMU. Fixes: 4573fbd38fa1 ("netdev-dpdk: Add vhost-user multiqueue support") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>	2016-02-24 13:34:10 -08:00
Daniele Di Proietto	1af27e8a4e	netdev-dpdk: Do not add vhost-user ports with '/' or '\' in name. This check prevents an obvious way for a vhost-user socket to escape the intended directory. There might be other ways to escape the directory (none comes to mind at the moment), but this is a problem that should be properly solved by mandatory access control. A similar check is done for a bridge name, since that name is used as part of a socket as well. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Flavio Leitner <fbl@sysclose.org>	2016-02-23 18:34:37 -08:00

249 Commits