mir/ovs - ovs - Mike's Git repositories

mirror of https://github.com/openvswitch/ovs synced 2025-08-22 18:07:40 +00:00

Author	SHA1	Message	Date
Alex Wang	5a03406477	dpif-netdev: Create multiple tx/rx queues when adding dpdk interface. Before this commit, ovs creates one tx and one rx queue for each dpdk interface and uses only one poll thread for handling I/O of all dpdk interfaces. An upcoming patch will allow multiple poll threads to be created. As a preparation, this commit changes dpif-netdev to create multiple tx/rx queues when the dpdk interface is added. Specifically, the number of rx queues will still be one per dpdk interface for this commit. But upcoming work will allow users to create multiple rx queues. The number of tx queues will be the number of cpu cores on the machine. Although not all the tx queues will be used, each poll thread will have its own queue for transmission on the dpdk interface. Signed-off-by: Alex Wang <alexw@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-09-15 11:43:48 -07:00
Alex Wang	5496878cbf	netdev: Add function for configuring tx and rx queues. This commit adds a new API to the 'struct netdev_class' which allows the user to configure the number of tx queues and rx queues of 'netdev'. Upcoming patches will use this function to set multiple tx/rx queues when adding the netdev to dpif-netdev. Currently, only netdev-dpdk module implements this function. Signed-off-by: Alex Wang <alexw@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-09-15 11:43:48 -07:00
Pravin B Shelar	2f9dd77fcd	ofproto: Do not update stats on fake bond interface. There are a couple of reasons to remove this support: * This is used in a very old OVS use case. It is much better to read stats directly from OVS. * A forthcoming commit will remove support for setting stats for vport. The stats update depends on stats-set. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2014-09-15 10:08:56 -07:00
Alex Wang	f00fa8cbad	netdev: Add n_txq to 'struct netdev'. This commit adds new variable n_txq to 'struct netdev' for recording the number of tx queues. Correspondingly, the send_() functions are extended to accept queue id as input argument. All 'netdev-' implementations will ignore the queue id since having multiple tx queues is not supported. Upcoming patches will start using it and create multiple tx queues for dpdk netdev. Signed-off-by: Alex Wang <alexw@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-09-12 11:30:58 -07:00
Alex Wang	7dec44fe1c	netdev: Add function for getting the numa node id of netdev. This commit adds a new API to the 'struct netdev_class' which allows the user to query the numa node id that the 'netdev' is on. Currently, only netdev-dpdk module implements this function. Signed-off-by: Alex Wang <alexw@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-09-12 11:30:58 -07:00
Alex Wang	e0a801c7fd	netdev-dpdk: Show interface status for dpdk0. This commit fixes a bug which prevents the display of interface status for dpdk0. Found by inspection. Signed-off-by: Alex Wang <alexw@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-09-02 21:48:28 -07:00
Alex Wang	34631d72fb	netdev-dpdk: Make memory pool name contain the socket id. This commit makes the memory pool name contain the socket id. Since the dpdk library does not allow creation of memory pools with the same name, this commit serves as a simple way of making each name unique. Signed-off-by: Alex Wang <alexw@nicira.com> Acked-by: Thomas Graf <tgraf@noironetworks.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-09-02 15:46:57 -07:00
Daniele Di Proietto	61a2647e15	packet-dpif: Add dpif_packet_{get, set}_hash() These functions are used to store the packet hash. 'netdev-dpdk' automatically sets this value to the RSS hash returned by the NIC. Other 'netdev's set it to 0 (which is an invalid hash value), so that callers can compute the hash on their own. If DPDK support is enabled, struct dpif_packet's member 'dp_hash' is removed and 'pkt.hash.rss' from the DPDK mbuf is used. This commit also configures DPDK devices to compute the RSS hash for UDP and IPv6 packets. Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-08-29 16:32:21 -07:00
Daniele Di Proietto	58f7c37b1f	netdev-dpdk: Use different constant for ring size DPDK rings must have a power-of-two size. Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-08-29 15:48:59 -07:00
Daniele Di Proietto	1304f1f8a7	netdev-dpdk: Keep calling rte_eth_tx_burst() until it returns 0 rte_eth_tx_burst() _should_ transmit every packet that it is passed unless the queue is full. Nonetheless, some implementations of rte_eth_tx_burst() (e.g. ixgbe_xmit_pkts_vec()) do not transmit more than a fixed number (32) of packets at a time. With this commit we assume that there's an error only if rte_eth_tx_burst() returns 0. Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-08-12 17:38:48 -07:00
Daniele Di Proietto	d731058395	netdev-dpdk: Move to DPDK 1.7.0 With this commit we move our DPDK support to 1.7.0. DPDK binaries (starting with dpdk 1.7.0) should be linked with --whole-archive to include pmd drivers Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-08-12 17:38:48 -07:00
Ethan Jackson	645b893423	style: Replace TODO with XXX. In accordance with CodingStyle. Signed-off-by: Ethan Jackson <ethan@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2014-08-05 14:13:20 -07:00
Daniele Di Proietto	3a10026527	netdev-dpdk: Increase tx queue size and rx batch size These values have been found to give the best throughput in simple cases (1 flow, 64-byte UDP packets). Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-07-22 20:06:02 -07:00
Daniele Di Proietto	db73f7166a	netdev-dpdk: Fix race condition with DPDK mempools in non pmd threads DPDK mempools rely on rte_lcore_id() to implement a thread-local cache. Our non pmd threads had rte_lcore_id() == 0. This allowed concurrent access to the "thread-local" cache, causing crashes. This commit resolves the issue with the following changes: - Every non pmd thread has the same lcore_id (0, for management reasons), which is not shared with any pmd thread (lcore_ids for pmd threads now start from 1) - DPDK mbufs must be allocated/freed in pmd threads. When there is the need to use mempools in non pmd threads, like in dpdk_do_tx_copy(), a mutex must be held. - The previous change no longer allows us to pass DPDK mbufs to handler threads: therefore this commit partially reverts 143859ec63d45e. Now packets are copied for upcall processing. We can remove the extra memcpy by processing upcalls in the pmd thread itself. With the introduction of the extra locking, the packet throughput will be lower in the following cases: - When using internal (tap) devices with DPDK devices on the same datapath. Anyway, to support internal devices efficiently, we needed DPDK KNI devices, which will be proper pmd devices and will not need this locking. - When packets are processed in the slow path by non pmd threads. This overhead can be avoided by handling the upcalls directly in pmd threads (a change that has already been proposed by Ryan Wilson) Also, the following two fixes have been introduced: - In dpdk_free_buf() use rte_pktmbuf_free_seg() instead of rte_mempool_put(). This allows OVS to run properly with the CONFIG_RTE_LIBRTE_MBUF_DEBUG DPDK option - Do not bulk free mbufs in a transmission queue. They may belong to different mempools Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-07-20 10:13:22 -07:00
Daniele Di Proietto	033e9df25f	netdev-dpdk: Refactor dpdk_class_init() The following changes were made: - Since we have two dpdk classes, we should split the initial operations needed by both classes from the initialization needed by each class. - The dpdk_ring class does not need an initialization function: it has been removed. This also prevents many testcases from failing, because dpdk_ring_class_init() was printing an unexpected log message (OVS_VSWITCHD_START at tests/ofproto-macros.at:54 checks for a specific set of startup log messages) - If the user doesn't pass the --dpdk option we do not register the dpdk* classes - Do not call VLOG_ERR if there are 0 dpdk ethernet devices. OVS can now be used with dpdk_ring devices. Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-07-17 11:16:32 -07:00
maryam.tahhan	95fb793ae7	netdev-dpdk: add dpdk rings to netdev-dpdk Shared memory ring patch This patch enables the client dpdk rings within the netdev-dpdk. It adds a new dpdk device called dpdkr (other naming suggestions?). This allows for the use of shared memory to communicate with other dpdk applications, on the host or within a virtual machine. Instructions for use are in INSTALL.DPDK. This has been tested on Intel multi-core platforms and with the client application within the host. Signed-off-by: Gerald Rogers <gerald.rogers@intel.com> Signed-off-by: Maryam Tahhan <maryam.tahhan@intel.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-07-16 09:43:15 -07:00
Ryan Wilson	f98d78641c	netdev-dpdk: Add OVS_UNLIKELY annotations in dpdk_do_tx_copy(). Since dropped packets due to large packet size or lack of memory are unlikely, it is best to add OVS_UNLIKELY annotations to these conditions. Signed-off-by: Ryan Wilson <wryan@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-06-30 13:48:49 -07:00
Ryan Wilson	175cf4de3f	netdev-dpdk: Fix memory leak in dpdk_do_tx_copy(). This patch fixes a bug where, if rte_pktmbuf_alloc() failed, the packets for which rte_pktmbuf_alloc() had already succeeded would not be sent and their memory would leak. Also, as a byproduct of using a local variable to record dropped packets, this reduces the locking of the netdev's mutex when multiple packets are dropped in dpdk_do_tx_copy(). Signed-off-by: Ryan Wilson <wryan@nicira.com> Acked-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-06-30 10:53:58 -07:00
Ryan Wilson	844f2d749a	netdev-dpdk: Set current timestamp when flushing TX queue. The current timestamp should be set every time the queue is flushed. Thus, if DRAIN_TSC timer cycles have passed since the last timestamp, the send queue should be flushed again. Signed-off-by: Ryan Wilson <wryan@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-06-30 10:53:58 -07:00
Ryan Wilson	b170db2aaa	netdev-dpdk: Refactor dpdk_queue_flush(). This patch refactors dpdk_queue_flush() to reuse code in dpdk_queue_pkts(). Signed-off-by: Ryan Wilson <wryan@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-06-30 10:53:53 -07:00
Polehn, Mike A	79f5354c4c	dpdk: High speed PMD physical NIC queue size Large TX and RX queues are needed for high speed 10 GbE physical NICs. Observed a 250% zero-loss improvement over the small NIC queue test in a port-to-port flow test. Signed-off-by: Mike A. Polehn <mike.a.polehn@intel.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-06-27 09:55:51 -07:00
Daniele Di Proietto	9477751009	netdev-dpdk: Disable NIC offloading and multiseg mbufs We do not use any offloading (now) or multiple segments per packet, so we might as well disable those features while configuring the NIC. This could give performance improvements. For ixgbe, for example, this change allows the driver to use a simpler tx routine, resulting in throughput improvements (~7.5%) Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Thomas Graf <tgraf@noironetworks.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-06-25 10:54:04 -07:00
Daniele Di Proietto	a28ddd11c6	netdev-dpdk: Fix coding style in TX/RX conf structs Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Thomas Graf <tgraf@noironetworks.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-06-25 10:54:02 -07:00
Daniele Di Proietto	1ebfe1ac52	netdev-dpdk: Count and delete every dropped packet Commit f4fd623c4c25 introduced a bug in netdev_dpdk_send(): if multiple consecutive packets exceed MTU, only the first one is deleted and counted. This should fix the bug Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Thomas Graf <tgraf@noironetworks.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-06-25 10:54:00 -07:00
Pravin B Shelar	e381def971	lib: Rename ofp to buf. dpif-packet contains an ofpbuf which points to packet data. Here 'buf' is a better name than 'ofp'. Following patch renames all remaining instances of the ofp variable. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>	2014-06-25 09:28:42 -07:00
Ben Pfaff	451450fa4b	netdev-dpdk: Coding style improvements. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Pritesh Kothari <pritesh.kothari@cisco.com>	2014-06-24 12:36:48 -07:00
Daniele Di Proietto	f4fd623c4c	netdev: netdev_send accepts multiple packets The netdev_send function has been modified to accept multiple packets, to allow netdev providers to amortize locking and queuing costs. This is especially true for netdev-dpdk. Later commits exploit the new API. Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-06-23 14:41:13 -07:00
Daniele Di Proietto	910885540a	dpif-netdev: use dpif_packet structure for packets This commit introduces a new data structure used for receiving packets from netdevs and passing them to dpifs. The purpose of this change is to allow storing some private data for each packet. The subsequent commits make use of it. Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-06-23 14:41:12 -07:00
Daniele Di Proietto	9441caf372	vswitchd: skip right number of arguments in dpdk_init() rte_eal_init() returns the number of parsed dpdk arguments to skip. dpdk_init() should add 1 to that number, because it has already skipped the "--dpdk" argument itself. This patch also makes sure the program name is ovs-vswitchd in rte_eal_init() and proctitle_init(). Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Signed-off-by: Ryan Wilson <wryan@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-06-23 14:41:09 -07:00
Ryan Wilson	143859ec63	dpif-netdev: Upcall: Remove an extra memcpy of packet data. When a bridge of datapath type netdev receives a packet, it copies the packet from the NIC to a buffer in userspace. Currently, when making an upcall, the packet is again copied to the upcall's buffer. However, this extra copy is not necessary when the datapath exists in userspace as the upcall can directly access the packet data. This patch eliminates this extra copy of the packet data in most cases. In cases where the packet may still be used later by callers of dp_netdev_execute_actions, making a copy of the packet data is still necessary. This patch also adds a dpdk_buf field to 'struct ofpbuf' when using DPDK. This field holds a pointer to the allocated DPDK buffer in the rte_mempool. Thus, an upcall packet ofpbuf allocated on the stack can now share data and free memory of a rte_mempool allocated ofpbuf. Signed-off-by: Ryan Wilson <wryan@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-06-04 15:48:30 -07:00
Daniele Di Proietto	d221ffa1e1	netdev-dpdk: create queues on configured NUMA node This patch makes sure that the tx and rx queues are allocated on the NUMA socket chosen at device initialization time, instead of NUMA socket 0. Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-06-04 15:39:49 -07:00
Daniele Di Proietto	7d08d53ed5	netdev-dpdk: receive up to NETDEV_MAX_RX_BATCH As per the netdev-provider interface, netdev_dpdk_rxq_recv should receive at most NETDEV_MAX_RX_BATCH packets. Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-06-04 15:38:45 -07:00
Daniele Di Proietto	a715f600c3	netdev-dpdk: use defined values for queues length Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>	2014-05-24 10:01:22 -07:00
Ben Pfaff	8ba0a5227f	ovs-thread: Make caller provide thread name when creating a thread. Thread names are occasionally very useful for debugging, but from time to time we've forgotten to set one. This commit adds the new thread's name as a parameter to the function to start a thread, to make that mistake impossible. This also simplifies code, since two function calls become only one. This makes a few other changes to the thread creation function: * Since it is no longer a direct wrapper around a pthread function, rename it to avoid giving that impression. * Remove 'pthread_attr_t *' param that every caller supplied as NULL. Change 'pthread *' parameter into a return value, for convenience. The system-stats code hadn't set a thread name, so this fixes that issue. This patch is a prerequisite for making RCU report the name of a thread that is blocking RCU synchronization, because the easiest way to do that is for ovsrcu_quiesce_end() to record the current thread's name. ovsrcu_quiesce_end() is called before the thread function is called, so it won't get a name set within the thread function itself. Setting the thread name earlier, as in this patch, avoids the problem. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Alex Wang <alexw@nicira.com>	2014-04-28 15:25:49 -07:00
Alex Wang	045c0d1a77	netdev-dpdk: Indicate the change of etheraddr and mtu. This commit makes the netdev-dpdk module signal the change of etheraddr and mtu by changing the global sequence number and incrementing its 'change_seq'. Signed-off-by: Alex Wang <alexw@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2014-04-10 12:55:28 -07:00
Alex Wang	3e912ffcbb	netdev: Add 'change_seq' back to netdev. This commit can be seen as a partial revert of commit da4a619179d (netdev: Globally track port status changes) by adding the 'change_seq' to 'struct netdev'. Signed-off-by: Alex Wang <alexw@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>	2014-04-10 12:55:28 -07:00
Pravin Shelar	b3cd9f9d6a	netdev-dpdk: Remove alloc from packet recv. On DPDK packet recv, ovs is given a pointer to an mbuf which has information about a packet, for example a pointer to the data and its size. By moving mbuf to ofpbuf we can let dpdk allocate ofpbuf and pass that to ovs for processing the packet. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>	2014-03-30 06:26:11 -07:00
Pravin Shelar	1f317cb5c2	ofpbuf: Introduce access api for base, data and size. These functions will be used by later patches. Following patch does not change functionality. Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-03-30 06:18:43 -07:00
Pravin	8617affff4	netdev-dpdk: Use multiple core for dpdk IO. DPDK needs _lcore_id to be set in order to use multiple cores. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@redhat.com>	2014-03-21 11:48:28 -07:00
Pravin	8a9562d21a	dpif-netdev: Add DPDK netdev. Following patch adds DPDK netdev-class to userspace datapath. Now OVS can use DPDK port for IO by just configuring DPDK port and then adding dpdk type port to userspace datapath. Refer to INSTALL.DPDK doc for further info. This is based on a patch from Gerald Rogers. Signed-off-by: Gerald Rogers <gerald.rogers@intel.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@redhat.com>	2014-03-21 11:48:28 -07:00

440 Commits
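Commit 34631d72fb makes each DPDK memory pool name unique by embedding the socket id, because DPDK refuses to create two pools with the same name. A minimal sketch of that naming idea in C follows. The helper name, the "ovs_mp_s%d" format, and the 32-byte limit are illustrative assumptions rather than the actual netdev-dpdk code, and no DPDK call is made.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Illustrative size limit (assumption); DPDK defines its own
 * RTE_MEMPOOL_NAMESIZE constant. */
#define MP_NAMESIZE_SKETCH 32

/* Hypothetical helper: writes a mempool name that embeds the NUMA socket id,
 * so pools created for different sockets get distinct names.  Returns false
 * if 'buf' was too small and the name was truncated (a truncated name could
 * collide with another pool's name). */
static bool
mempool_name_for_socket(char *buf, size_t size, int socket_id)
{
    assert(buf != NULL && size > 0);
    int n = snprintf(buf, size, "ovs_mp_s%d", socket_id);
    return n >= 0 && (size_t) n < size;
}
```

Checking the snprintf() return value matters here: silently truncated names would defeat the purpose of making them unique.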
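Commit 61a2647e15 treats a packet hash of 0 as invalid, so that netdevs without an RSS hash can leave it at 0 and callers compute one on their own. The sketch below shows that lazy caching pattern under stated assumptions: the packet struct and helper are hypothetical, and FNV-1a is used only as a simple stand-in hash, not the hash function OVS computes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet with a cached hash; 0 means "not provided". */
struct pkt_sketch {
    const uint8_t *data;
    size_t len;
    uint32_t hash;      /* e.g. the NIC's RSS hash, else 0 */
};

/* 32-bit FNV-1a, used here only as a simple stand-in for a flow hash. */
static uint32_t
fnv1a(const uint8_t *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Returns the cached hash, computing and caching it in software when the
 * device did not supply one.  A computed value of 0 is remapped to 1 so the
 * cached hash never looks "invalid". */
static uint32_t
pkt_get_or_compute_hash(struct pkt_sketch *pkt)
{
    assert(pkt != NULL);
    if (pkt->hash == 0) {
        uint32_t h = fnv1a(pkt->data, pkt->len);
        pkt->hash = h ? h : 1;
    }
    return pkt->hash;
}
```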
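Commit 58f7c37b1f notes that DPDK rings must have a power-of-two size. A small sketch of how a requested size could be validated or rounded up before a ring is created; both helpers are hypothetical and do not call the DPDK API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if n is a nonzero power of two. */
static bool
is_pow2(uint32_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical helper: rounds n up to the next power of two by smearing the
 * highest set bit of (n - 1) into every lower bit.  Valid for 1 <= n <= 2^31. */
static uint32_t
round_up_pow2(uint32_t n)
{
    assert(n >= 1 && n <= (UINT32_C(1) << 31));
    n--;
    n |= n >> 1;
    n |= n >> 2;
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;
    return n + 1;
}
```

For example, a requested size of 1000 becomes 1024, while 1024 is left unchanged.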
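Commit 1304f1f8a7 keeps calling rte_eth_tx_burst() until every packet is sent or the call returns 0, because some drivers accept at most 32 packets per call. The loop can be sketched as follows. tx_burst_fn and the two mock drivers are hypothetical stand-ins; the real rte_eth_tx_burst() also takes a port id and a queue id and operates on an array of struct rte_mbuf pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for rte_eth_tx_burst(): returns how many of the
 * 'n' packets starting at 'pkts' the driver accepted. */
typedef size_t (*tx_burst_fn)(void **pkts, size_t n);

/* Keeps transmitting until all 'cnt' packets are sent or the driver accepts
 * none.  Returns the number of packets NOT sent, which the caller would then
 * free and count as dropped. */
static size_t
tx_all(tx_burst_fn burst, void **pkts, size_t cnt)
{
    assert(burst != NULL);
    size_t sent = 0;
    while (sent < cnt) {
        size_t ret = burst(pkts + sent, cnt - sent);
        if (ret == 0) {
            break;      /* Only a zero return is treated as an error. */
        }
        sent += ret;
    }
    return cnt - sent;
}

static size_t mock_calls;   /* number of times a mock driver was invoked */

/* Mock driver that, like a vectorized PMD, takes at most 32 packets per call. */
static size_t
mock_burst_32(void **pkts, size_t n)
{
    (void) pkts;
    mock_calls++;
    return n < 32 ? n : 32;
}

/* Mock driver whose queue is always full. */
static size_t
mock_queue_full(void **pkts, size_t n)
{
    (void) pkts;
    (void) n;
    mock_calls++;
    return 0;
}
```

With the 32-packet mock, 100 packets go out in four calls (32 + 32 + 32 + 4); stopping after the first partial send would have dropped 68 of them.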
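Commit 1ebfe1ac52 fixes netdev_dpdk_send() so that every packet exceeding the MTU is deleted and counted, not just the first of several consecutive oversized packets. A hedged sketch of a drop loop with that property follows. It is not the actual OVS fix: struct pkt and its 'freed' flag are hypothetical stand-ins for real packet buffers and their release.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical packet type for illustration; not OVS's struct dpif_packet. */
struct pkt {
    size_t len;
    bool freed;
};

/* Drops (marks freed and counts) every packet longer than 'mtu' and compacts
 * the surviving packets to the front of 'pkts', preserving their order.
 * Returns the number of packets kept and adds the number dropped to
 * '*tx_dropped'. */
static size_t
drop_oversized(struct pkt **pkts, size_t cnt, size_t mtu,
               unsigned long *tx_dropped)
{
    assert(tx_dropped != NULL);
    size_t kept = 0;
    for (size_t i = 0; i < cnt; i++) {
        if (pkts[i]->len > mtu) {
            pkts[i]->freed = true;  /* stand-in for freeing the buffer */
            (*tx_dropped)++;        /* each oversized packet is counted */
        } else {
            pkts[kept++] = pkts[i];
        }
    }
    return kept;
}
```

Because each packet is examined independently, two or more consecutive oversized packets are all dropped and counted.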