mirror of https://github.com/openvswitch/ovs synced 2025-08-22 18:07:40 +00:00

67 Commits

Author SHA1 Message Date
Ethan Jackson
bce01e3a89 netdev-dpdk: Fix sparse warnings.
These are all minor style issues.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
2015-05-19 14:47:00 -07:00
Kevin Traynor
95e9881f84 netdev-dpdk: Add vhost enqueue retries.
The max allowed burst size for a single vhost enqueue is 32.
This code allows sending more packets than the burst size
to the vhost interface by adding a retry loop that calls
vhost enqueue multiple times. As this could potentially
block, a timeout is added.

Signed-off-by: Kevin Traynor <kevin.traynor@intel.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2015-05-11 21:58:14 -07:00
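
The retry loop described in the vhost enqueue commit above can be sketched roughly as follows. This is an illustration only, not the code from the commit: the enqueue callback, the demo device, and the names are hypothetical, and a bounded count of idle attempts (MAX_IDLE_TRIES) stands in for the timeout.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VHOST_BURST 32   /* Max packets per enqueue call, per the commit text. */
#define MAX_IDLE_TRIES 8     /* Hypothetical stand-in for the timeout. */

/* Hypothetical enqueue callback: returns how many of 'cnt' packets it took. */
typedef size_t (*enqueue_fn)(void *dev, void **pkts, size_t cnt);

/* Sends 'total' packets in chunks of at most MAX_VHOST_BURST.  Gives up after
 * MAX_IDLE_TRIES consecutive calls that accept nothing.  Returns the number
 * of packets actually enqueued. */
size_t
send_with_retries(void *dev, void **pkts, size_t total, enqueue_fn enqueue)
{
    size_t sent = 0;
    int idle = 0;

    while (sent < total && idle < MAX_IDLE_TRIES) {
        size_t want = total - sent;
        size_t n;

        if (want > MAX_VHOST_BURST) {
            want = MAX_VHOST_BURST;
        }
        n = enqueue(dev, pkts + sent, want);
        assert(n <= want);
        if (n == 0) {
            idle++;          /* Nothing accepted: count towards giving up. */
        } else {
            idle = 0;        /* Progress was made: reset the idle counter. */
            sent += n;
        }
    }
    return sent;
}

/* Demo device for exercising the loop: accepts packets until 'room' runs out. */
struct demo_vhost {
    size_t room;
};

size_t
demo_vhost_enqueue(void *dev, void **pkts, size_t cnt)
{
    struct demo_vhost *d = dev;
    size_t n = cnt < d->room ? cnt : d->room;

    (void) pkts;
    d->room -= n;
    return n;
}
```

A caller would treat total minus the return value as the number of dropped packets.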
Kevin Traynor
4345e1b5bf netdev-dpdk: Change phy rx burst size.
Change phy rx burst size from 192 to 32. This aligns the
burst size with the other dpdk interfaces and significantly
improves performance when forwarding to dpdk vhost ports.

Signed-off-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-05-11 21:58:12 -07:00
Mark Kavanagh
543342a41c DPDK: add support for v2.0.0
Update relevant artifacts to add support for DPDK v2.0.0
 - INSTALL.DPDK.md
 - travis build script
 - acinclude.m4: add 'mssse3' flag to OVS_CFLAGS
 - netdev-dpdk: fix build with unified offload types in DPDK v2.0.0

Note that this breaks compatibility with DPDK v1.8.0

Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-04-29 20:49:37 -07:00
Mark D. Gray
1b99bb0552 netdev-dpdk: Reset RSS hash on transmit
When using DPDK rings (dpdkr port type), packet buffers get shared
to consumers of the rings (e.g. Virtual Machines). The packet buffers
also include the RSS hash. This is a hash of a number of fields
in the packet and is used in order to do a fast lookup in the EMC.

However, if a consumer of the packet modifies the packet without
regenerating the RSS hash, the EMC will use the same hash for lookup
even though the packet may belong to a different flow. This would
cause unnecessary collisions in the EMC reducing performance in the
presence of multiple flows.

To avoid receiving an incorrect RSS hash on reception from a DPDK
ring, the RSS hash needs to be reset on transmission. This will reduce
performance of the forwarding path, as the RSS hash will need to be
calculated for every packet received from a dpdkr port, but it will behave
correctly in the presence of a large number of flows that get
modified by the consumer of a DPDK ring.

Signed-off-by: Mark D. Gray <mark.d.gray@intel.com>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-04-22 20:09:52 -07:00
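
As a rough illustration of the reset-on-transmit idea in the commit above, the sketch below clears the hash of every packet in a burst before it is handed to a ring. The struct and function names are hypothetical; the only assumption carried over from this history is that 0 marks the hash as not valid, so the receiver recomputes it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a packet buffer shared over a ring. */
struct demo_pkt {
    uint32_t rss_hash;   /* 0 means "no valid hash": receiver must recompute. */
};

/* Clears the hash of each packet in the burst before transmission. */
void
demo_reset_rss_hash(struct demo_pkt **pkts, size_t cnt)
{
    for (size_t i = 0; i < cnt; i++) {
        pkts[i]->rss_hash = 0;
    }
}
```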
Kevin Traynor
618f44f7a4 netdev-dpdk: Put cuse thread into quiescent state.
ovsrcu_synchronize() is used when setting virtio_dev to NULL.
This results in an ovsrcu_quiesce_end() call which means the
cuse thread may not go into quiescent state again for an
indefinite time. Add an ovsrcu_quiesce_start() call to prevent
this.

Signed-off-by: Kevin Traynor <kevin.traynor@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-03-27 11:12:39 -07:00
Daniele Di Proietto
da79ce2b71 netdev-dpdk: create smaller mempools in case of failure
If rte_mempool_create() fails with ENOMEM, try asking for a smaller
mempool. This patch enables OVS DPDK to run on systems without 1GB
hugepages.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2015-03-24 14:52:01 -07:00
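
A minimal sketch of the fallback described in the mempool commit above, under stated assumptions: the allocator is a hypothetical callback rather than rte_mempool_create() itself, and halving the request on each ENOMEM is an illustrative policy, not something the commit message specifies.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical allocator: returns 0 on success or an errno value. */
typedef int (*pool_create_fn)(unsigned n_mbufs, void *aux);

/* Tries to create a pool of 'n_mbufs', halving the request after each ENOMEM
 * while it stays at or above 'min_mbufs'.  Returns the size that succeeded,
 * or 0 if no attempt succeeded. */
unsigned
create_pool_with_fallback(unsigned n_mbufs, unsigned min_mbufs,
                          pool_create_fn create, void *aux)
{
    while (n_mbufs > 0 && n_mbufs >= min_mbufs) {
        int err = create(n_mbufs, aux);

        if (err == 0) {
            return n_mbufs;
        }
        if (err != ENOMEM) {
            return 0;        /* Other errors are not retried. */
        }
        n_mbufs /= 2;
    }
    return 0;
}

/* Demo allocator: succeeds only if the request fits in *(unsigned *) aux. */
int
demo_pool_create(unsigned n_mbufs, void *aux)
{
    return n_mbufs <= *(unsigned *) aux ? 0 : ENOMEM;
}
```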
Kevin Traynor
58397e6c1e netdev-dpdk: add dpdk vhost-cuse ports
This patch adds support for a new port type to userspace datapath
called dpdkvhost. This allows KVM (QEMU) to offload the servicing
of virtio-net devices to its associated dpdkvhost port. Instructions
for use are in INSTALL.DPDK.

This has been tested on Intel multi-core platforms and with clients
that have virtio-net interfaces.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Kevin Traynor <kevin.traynor@intel.com>
Signed-off-by: Maryam Tahhan <maryam.tahhan@intel.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2015-03-19 20:26:03 -07:00
Mark Kavanagh
b8e57534ec lib: upgrade to DPDK v1.8.0
DPDK v1.8.0 makes significant changes to struct rte_mbuf, including
removal of the 'pkt' and 'data' fields. The latter, formerly a
pointer, is now calculated via an offset from the start of the
segment buffer.  So dp_packet data is now also stored as an offset
from the base pointer.

Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
Signed-off-by: Rory Sexton <rory.sexton@intel.com>
Signed-off-by: Kevin Traynor <kevin.traynor@intel.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2015-03-04 10:05:39 -08:00
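
The pointer-versus-offset change in the DPDK v1.8.0 commit above can be pictured with a small sketch. The struct below is a hypothetical stand-in, not struct rte_mbuf or struct dp_packet; it only shows how a data pointer is derived from a base pointer plus an offset, and how setting the data pointer becomes an offset update.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical buffer that stores the data position as an offset. */
struct demo_buf {
    uint8_t *base;       /* Start of the segment buffer. */
    uint16_t data_off;   /* Offset of the packet data from 'base'. */
};

/* Returns the current data pointer, computed from base and offset. */
void *
demo_buf_data(const struct demo_buf *b)
{
    return b->base + b->data_off;
}

/* Moves the data pointer; 'data' must point inside the buffer at 'base'. */
void
demo_buf_set_data(struct demo_buf *b, void *data)
{
    b->data_off = (uint16_t) ((uint8_t *) data - b->base);
}
```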
Pravin B Shelar
cf62fa4c70 dp-packet: Remove ofpbuf dependency.
Currently dp-packet makes use of ofpbuf for managing packet
buffers, which complicates ofpbuf. By making dp-packet
independent of ofpbuf, both libraries can be optimized for
their own use case.
This also avoids the mapping operation between ofpbuf and dp_packet
in datapath upcalls.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-03-03 13:37:37 -08:00
Pravin B Shelar
e14deea0bd dpif_packet: Rename to dp_packet
dp_packet is a shorter and better name for the datapath packet
structure.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2015-03-03 13:37:34 -08:00
Mark D. Gray
ee32150e7f netdev-dpdk: set_miimon should return EOPNOTSUPP.
According to netdev-provider, this function should return
EOPNOTSUPP if not supported.

Signed-off-by: Mark D. Gray <mark.d.gray@intel.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2015-02-13 12:32:59 -08:00
Alex Wang
abb5943dbb netdev-dpdk: Allow changing NON_PMD_CORE_ID for testing purpose.
For testing purposes, developers may want to change NON_PMD_CORE_ID
and use a different core for non-pmd threads.  Since the netdev-dpdk
module is hard-coded to assert that non-pmd threads use core 0, such
a change will cause OVS to abort.

This commit fixes the assertion and allows changing NON_PMD_CORE_ID.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2015-02-10 17:38:03 -08:00
Thomas Graf
e6211adce4 lib: Move vlog.h to <openvswitch/vlog.h>
A new function vlog_insert_module() is introduced to avoid using
list_insert() from the vlog.h header.

Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-12-15 14:15:19 +01:00
Thomas Graf
55951e15e5 lib: Expose struct ovs_list definition in <openvswitch/list.h>
Expose the struct ovs_list definition in <openvswitch/list.h>. Keep the
list access API private for now.

Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-12-15 14:15:16 +01:00
Thomas Graf
ca6ba70092 list: Rename struct list to struct ovs_list
struct list is a common name and can't be used in public headers.

Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-12-15 14:15:12 +01:00
Pravin B Shelar
a36de779d7 openvswitch: Userspace tunneling.
This patch adds support for userspace tunneling. Tunneling
needs three more components. The first is a routing table, which is
populated by caching kernel routes. The second is an ARP cache, which is
built automatically by snooping ARP. The third is a tunnel protocol table,
which lists all listening protocols and is populated by vswitchd as tunnel
ports are added. GRE and VXLAN protocol support is added in this patch.

Tunneling works as follows:
On packet receive, vswitchd checks whether the packet is targeted to a tunnel
port. If it is, vswitchd inserts a tunnel pop action, which pops the
header and sends the packet to the tunnel port.
On packet xmit, rather than generating a Set tunnel action, it generates a
tunnel push action, which carries the tunnel header data. The datapath can use
the tunnel-push action data to generate the header for each packet and
forward the packet to the output port. Since the tunnel-push action
contains most of the packet header, vswitchd needs to look up the routing
table and ARP table to build this action.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-11-12 15:08:33 -08:00
David Verbeiren
7251515ea9 netdev-dpdk: Fix DPDK rings broken by multi queue
DPDK rings don't need one queue per PMD thread and don't support multiple
queues (set_multiq function is undefined). To fix operation with DPDK rings,
this patch ignores EOPNOTSUPP error on netdev_set_multiq() and provides, for
DPDK rings, a netdev send() function that ignores the provided queue id
(= PMD thread core id).

Suggested-by: Maryam Tahhan <maryam.tahhan@intel.com>
Signed-off-by: David Verbeiren <david.verbeiren@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-11-04 09:21:27 -08:00
Alex Wang
1b7a04e05b netdev-dpdk: Fix crash when there is no pci numa info.
When the kernel cannot obtain the pci numa info, the numa_node file
in the corresponding pci directory in sysfs will show -1.  Then the
rte_eth_dev_socket_id() function will return it to ovs.  On
current master, ovs assumes rte_eth_dev_socket_id() always
returns a non-negative value.  So using this -1 in pmd thread
creation will cause ovs to crash.

To fix the above issue, this commit makes ovs always check the
return value of rte_eth_dev_socket_id() and use numa node 0 if
the return value is negative.

Reported-by: Daniel Badea <daniel.badea@windriver.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
2014-09-25 14:17:54 -07:00
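
The check described in the numa info commit above amounts to clamping a negative socket id to node 0. A minimal sketch, with a hypothetical helper name and the socket id passed in as a plain int rather than obtained from rte_eth_dev_socket_id():

```c
#include <assert.h>

/* Maps a possibly negative socket id (for example -1 when sysfs has no numa
 * info) to a usable numa node, falling back to node 0. */
int
demo_sanitize_socket_id(int socket_id)
{
    return socket_id < 0 ? 0 : socket_id;
}
```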
Alex Wang
91968eb096 netdev-dpdk: Fix a bug in netdev_dpdk_set_multiq().
Commit 5a0340 (dpif-netdev: Create multiple tx/rx queues when
adding dpdk interface.) introduced a bug which causes
netdev_dpdk_set_multiq() to never reset the tx queues.  This bug
could cause a pmd thread to access unassigned memory, resulting in
a segfault.

This commit fixes the bug.

Reported-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
2014-09-19 11:57:29 -07:00
Alex Wang
ba0358a118 netdev-dpdk: Fix a typo.
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
2014-09-19 11:57:16 -07:00
Alex Wang
2654cc338b netdev-dpdk: Pass queue id to dpdk_do_tx_copy().
Since dpdk_do_tx_copy() will be called by both pmd and
non-pmd threads, it should take the queue id as input.
Currently ovs always uses NON_PMD_THREAD_TX_QUEUE
as the queue id, which causes unprotected concurrent access
to the same queue.

This commit fixes the issue by passing the queue id
to dpdk_do_tx_copy().

Reported-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
2014-09-18 17:26:13 -07:00
Alex Wang
b7ccaf673e netdev-dpdk: Fix thread-safety breach.
dpdk_eth_dev_init() must be called with dpdk_mutex held.  However,
netdev_dpdk_set_multiq() fails to follow this rule.  This commit
fixes this breach.

Found by clang.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
2014-09-15 14:35:41 -07:00
Alex Wang
476590621e netdev-dpdk: Make get_config() report correct queue info.
With the separation of tx queue and rx queue configuration
in netdev-dpdk module, the netdev_dpdk_get_config() can no
longer report 'n_rxq' as tx queue configuration.

This commit fixes the above issue.

Reported-by: Daniele Di Proietto <ddiproietto@vmware.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
2014-09-15 14:35:41 -07:00
Alex Wang
65f13b50c5 dpif-netdev: Create multiple pmd threads by default.
With this commit, ovs by default will create one pmd thread
for each numa node and pin the pmd thread to an available cpu
core on the numa node.

NON_PMD_CORE_ID (currently 0) is used to reserve a particular
cpu core for the I/O of all non-pmd threads.  No pmd thread
can be pinned to this reserved core.

As side-effects of this commit:

-  pmd thread will not be created, if there is no dpdk interface
   from the corresponding numa node added to ovs.

- the exact-match cache for non-pmd threads is removed from
  'struct dp_netdev'.  Instead, all non-pmd threads will use
  the exact-match cache defined in the 'struct dp_netdev_pmd_thread'
  for NON_PMD_CORE_ID.

- the rx packet processing functions are refactored to use
  'struct dp_netdev_pmd_thread' as input.

- the 'netdev_send()' function will be called with the proper
  queue id.

- both pmd and non-pmd threads can call dpif_netdev_execute(),
  so a per-thread key is used to help recognize the calling thread.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-15 11:43:49 -07:00
Alex Wang
95a596e3d9 netdev-dpdk: Remove the tx queue spinlock.
The previous commit makes OVS create one tx queue for each
cpu core, so each pmd thread will use a separate tx queue.
Also, tx of non-pmd threads on a dpdk interface all goes through
'NON_PMD_THREAD_TX_QUEUE', protected by the 'nonpmd_mempool_mutex'.
Therefore, the spinlock is no longer needed, and this commit
removes it from 'struct dpdk_tx_queue'.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-15 11:43:49 -07:00
Alex Wang
94143fc41e netdev-dpdk: Add indicator for flushing tx queue.
The previous commit makes OVS create one tx queue for each cpu
core.  An upcoming patch will allow multiple pmd threads to be
created and pinned to cpu cores.  So each pmd thread will use
the tx queue corresponding to its core id.

Moreover, pmd threads running on a different numa node than
the dpdk interface (called non-local pmd threads) will not
handle the rx of the interface.  Consequently, there needs to
be a way to flush the tx queues of the non-local pmd threads.

To address the queue flushing issue, this commit introduces a
new flag 'flush_tx' in the 'struct dpdk_tx_queue' which is
set if the queue is to be used by a non-local pmd thread.
Then, when enqueueing the tx pkts, if the flag is set, the tx
queue will always be flushed immediately after the enqueue.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-15 11:43:48 -07:00
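
The 'flush_tx' behaviour described in the commit above can be sketched as follows. Only the flag name comes from the commit message; the queue struct, its capacity, and the counters are hypothetical and exist just to make the behaviour observable.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_TXQ_LEN 4   /* Hypothetical queue capacity. */

struct demo_txq {
    bool flush_tx;       /* Set when the queue is used by a non-local pmd thread. */
    unsigned count;      /* Packets currently buffered. */
    unsigned flushed;    /* Demo counter: packets handed to the NIC so far. */
};

/* Pretends to transmit everything that is buffered. */
void
demo_txq_flush(struct demo_txq *q)
{
    q->flushed += q->count;
    q->count = 0;
}

/* Buffers 'n_pkts' packets, flushing whenever the queue fills up.  If
 * 'flush_tx' is set, whatever remains is flushed right after the enqueue. */
void
demo_txq_enqueue(struct demo_txq *q, unsigned n_pkts)
{
    for (unsigned i = 0; i < n_pkts; i++) {
        q->count++;
        if (q->count == DEMO_TXQ_LEN) {
            demo_txq_flush(q);
        }
    }
    if (q->flush_tx && q->count > 0) {
        demo_txq_flush(q);
    }
}
```

Without the flag, leftover packets stay buffered until a later flush, which a non-local pmd thread would never trigger because it does not poll the interface's rx queue.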
Alex Wang
5a03406477 dpif-netdev: Create multiple tx/rx queues when adding dpdk interface.
Before this commit, ovs creates one tx and one rx queue for
each dpdk interface and uses only one poll thread for handling
I/O of all dpdk interfaces.  An upcoming patch will allow multiple
poll threads to be created.  As preparation, this commit changes
dpif-netdev to create multiple tx/rx queues when the dpdk
interface is added.

Specifically, the number of rx queues will still be one per dpdk
interface for this commit, but upcoming work will allow the user to
create multiple rx queues.  The number of tx queues will be the
number of cpu cores on the machine.  Although not all the tx queues
will be used, each poll thread will have its own queue for
transmission on the dpdk interface.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-15 11:43:48 -07:00
Alex Wang
5496878cbf netdev: Add function for configuring tx and rx queues.
This commit adds a new API to the 'struct netdev_class' which
allows the user to configure the number of tx queues and rx queues
of a 'netdev'.  Upcoming patches will use this function to set
multiple tx/rx queues when adding the netdev to dpif-netdev.

Currently, only the netdev-dpdk module implements this function.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-15 11:43:48 -07:00
Pravin B Shelar
2f9dd77fcd ofproto: Do not update stats on fake bond interface.
There are a couple of reasons to remove this support:
*   This is used in a very old OVS use-case. It is much better
    to read stats directly from OVS.
*   A forthcoming commit will remove support for setting stats
    for vport. The stats update depends on stats-set.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-09-15 10:08:56 -07:00
Alex Wang
f00fa8cbad netdev: Add n_txq to 'struct netdev'.
This commit adds a new variable n_txq to 'struct netdev' for recording
the number of tx queues.  Correspondingly, the send_*() functions are
extended to accept a queue id as an input argument.

All 'netdev-*' implementations will ignore the queue id since having
multiple tx queues is not supported.  Upcoming patches will start
using it and create multiple tx queues for dpdk netdev.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-12 11:30:58 -07:00
Alex Wang
7dec44fe1c netdev: Add function for getting the numa node id of netdev.
This commit adds a new API to the 'struct netdev_class' which
allows the user to query the numa node id the 'netdev' is on.

Currently, only the netdev-dpdk module implements this function.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-12 11:30:58 -07:00
Alex Wang
e0a801c7fd netdev-dpdk: Show interface status for dpdk0.
This commit fixes a bug which prevents the display of interface
status for dpdk0.

Found by inspection.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-02 21:48:28 -07:00
Alex Wang
34631d72fb netdev-dpdk: Make memory pool name contain the socket id.
This commit makes the memory pool name contain the socket id.
Since the dpdk library does not allow creation of memory pools with
the same name, this commit serves as a simple way of making each
name unique.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-02 15:46:57 -07:00
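
One way to picture the naming scheme from the commit above is a helper that embeds the socket id in the pool name. The "demo_mp_" prefix and the function name are made up for illustration and are not taken from the commit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes a per-socket pool name into 'buf'.  Returns nonzero if the whole
 * name fit into 'size' bytes, zero if it was truncated or formatting failed. */
int
demo_format_pool_name(char *buf, size_t size, int socket_id)
{
    int n = snprintf(buf, size, "demo_mp_%d", socket_id);

    return n >= 0 && (size_t) n < size;
}
```

Two sockets then yield two distinct names, so creating one pool per socket does not collide on the name.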
Daniele Di Proietto
61a2647e15 packet-dpif: Add dpif_packet_{get, set}_hash()
These functions are used to store the packet hash. 'netdev-dpdk'
automatically sets this value to the RSS hash returned by the
NIC. Other 'netdev's set it to 0 (which is an invalid hash
value), so that callers can compute the hash on their own.

If DPDK support is enabled, struct dpif_packet's member
'dp_hash' is removed and 'pkt.hash.rss' from the DPDK mbuf is used.

This commit also configures DPDK devices to compute the RSS hash
for UDP and IPv6 packets.

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-08-29 16:32:21 -07:00
Daniele Di Proietto
58f7c37b1f netdev-dpdk: Use different constant for ring size
DPDK rings must have a power-of-two size.

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-08-29 15:48:59 -07:00
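
The power-of-two constraint mentioned in the ring size commit above is easy to check with bit arithmetic. The helpers below are a generic sketch with hypothetical names and are not part of the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if 'x' is a nonzero power of two. */
bool
demo_is_pow2(uint32_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Rounds 'x' up to the next power of two.  Assumes x <= 2^31; 0 and 1 both
 * map to 1. */
uint32_t
demo_round_up_pow2(uint32_t x)
{
    if (x <= 1) {
        return 1;
    }
    x--;
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x + 1;
}
```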
Daniele Di Proietto
1304f1f8a7 netdev-dpdk: Keep calling rte_eth_tx_burst() until it returns 0
rte_eth_tx_burst() _should_ transmit every packet that it is passed unless the
queue is full. Nonetheless, some implementations of rte_eth_tx_burst() (e.g.
ixgbe_xmit_pkts_vec()) do not transmit more than a fixed number (32) of
packets at a time.

With this commit we assume that there's an error only if rte_eth_tx_burst()
returns 0.

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-08-12 17:38:48 -07:00
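
The policy in the commit above (keep transmitting until a call returns 0) can be sketched with a callback in place of rte_eth_tx_burst(). The callback type, the demo NIC, and all names are hypothetical; the per-call cap of 32 in the usage below mirrors the example given in the commit message.

```c
#include <assert.h>

/* Hypothetical burst callback: returns how many of 'cnt' packets were sent. */
typedef unsigned (*tx_burst_fn)(void *dev, void **pkts, unsigned cnt);

/* Calls 'burst' repeatedly until every packet is sent or a call returns 0.
 * Returns the number of packets sent; the remainder would be dropped. */
unsigned
demo_tx_until_stalled(void *dev, void **pkts, unsigned cnt, tx_burst_fn burst)
{
    unsigned sent = 0;

    while (sent < cnt) {
        unsigned n = burst(dev, pkts + sent, cnt - sent);

        if (n == 0) {
            break;           /* Only a zero return is treated as an error. */
        }
        sent += n;
    }
    return sent;
}

/* Demo NIC: sends at most 'per_call' packets per call while 'room' lasts. */
struct demo_nic {
    unsigned per_call;
    unsigned room;
};

unsigned
demo_nic_burst(void *dev, void **pkts, unsigned cnt)
{
    struct demo_nic *nic = dev;
    unsigned n = cnt;

    (void) pkts;
    if (n > nic->per_call) {
        n = nic->per_call;
    }
    if (n > nic->room) {
        n = nic->room;
    }
    nic->room -= n;
    return n;
}
```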
Daniele Di Proietto
d731058395 netdev-dpdk: Move to DPDK 1.7.0
With this commit we move our DPDK support to 1.7.0.
DPDK binaries (starting with dpdk 1.7.0) should be linked with --whole-archive
to include pmd drivers.

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-08-12 17:38:48 -07:00
Ethan Jackson
645b893423 style: Replace TODO with XXX.
In accordance with CodingStyle.

Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-08-05 14:13:20 -07:00
Daniele Di Proietto
3a10026527 netdev-dpdk: Increase tx queue size and rx batch size
These values have been found to give the best throughput
in simple cases (1 flow, 64-byte UDP packets).

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-07-22 20:06:02 -07:00
Daniele Di Proietto
db73f7166a netdev-dpdk: Fix race condition with DPDK mempools in non pmd threads
DPDK mempools rely on rte_lcore_id() to implement a thread-local cache.
Our non pmd threads had rte_lcore_id() == 0. This allowed concurrent access to
the "thread-local" cache, causing crashes.

This commit resolves the issue with the following changes:

- Every non pmd thread has the same lcore_id (0, for management reasons), which
  is not shared with any pmd thread (lcore_id for pmd threads now starts from 1)
- DPDK mbufs must be allocated/freed in pmd threads. When there is the need to
  use mempools in non pmd threads, like in dpdk_do_tx_copy(), a mutex must be
  held.
- The previous change no longer allows us to pass DPDK mbufs to handler
  threads: therefore this commit partially reverts 143859ec63d45e. Now packets
  are copied for upcall processing. We can remove the extra memcpy by
  processing upcalls in the pmd thread itself.

With the introduction of the extra locking, the packet throughput will be lower
in the following cases:

- When using internal (tap) devices with DPDK devices on the same datapath.
  Anyway, to support internal devices efficiently, we needed DPDK KNI devices,
  which will be proper pmd devices and will not need this locking.
- When packets are processed in the slow path by non pmd threads. This overhead
  can be avoided by handling the upcalls directly in pmd threads (a change that
  has already been proposed by Ryan Wilson)

Also, the following two fixes have been introduced:
- In dpdk_free_buf() use rte_pktmbuf_free_seg() instead of rte_mempool_put().
  This allows OVS to run properly with CONFIG_RTE_LIBRTE_MBUF_DEBUG DPDK option
- Do not bulk free mbufs in a transmission queue. They may belong to different
  mempools

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-07-20 10:13:22 -07:00
Daniele Di Proietto
033e9df25f netdev-dpdk: Refactor dpdk_class_init()
The following changes were made:

- Since we have two dpdk classes, we should split the initial operations needed
  by both classes from the initialization needed by each class.
- The dpdk_ring class does not need an initialization function: it has been
  removed. This also prevents many testcases from failing, because
  dpdk_ring_class_init() was printing an unexpected log message
  (OVS_VSWITCHD_START at tests/ofproto-macros.at:54 checks for a specific set of
  startup log messages)
- If the user doesn't pass the --dpdk option we do not register the dpdk*
  classes
- Do not call VLOG_ERR if there are 0 dpdk ethernet devices. OVS can now be used
  with dpdk_ring devices.

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-07-17 11:16:32 -07:00
maryam.tahhan
95fb793ae7 netdev-dpdk: add dpdk rings to netdev-dpdk
Shared memory ring patch

This patch enables the client dpdk rings within the netdev-dpdk.  It adds
a new dpdk device called dpdkr (other naming suggestions?).  This allows
for the use of shared memory to communicate with other dpdk applications,
on the host or within a virtual machine.  Instructions for use are in
INSTALL.DPDK.

This has been tested on Intel multi-core platforms and with the client
application within the host.

Signed-off-by: Gerald Rogers <gerald.rogers@intel.com>
Signed-off-by: Maryam Tahhan <maryam.tahhan@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-07-16 09:43:15 -07:00
Ryan Wilson
f98d78641c netdev-dpdk: Add OVS_UNLIKELY annotations in dpdk_do_tx_copy().
Since dropped packets due to large packet size or lack of memory
are unlikely, it is best to add OVS_UNLIKELY annotations to these
conditions.

Signed-off-by: Ryan Wilson <wryan@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-06-30 13:48:49 -07:00
Ryan Wilson
175cf4de3f netdev-dpdk: Fix memory leak in dpdk_do_tx_copy().
This patch fixes a bug where, if rte_pktmbuf_alloc() failed,
packets that had already been allocated with rte_pktmbuf_alloc()
would not be sent and would leak memory.

Also, as a byproduct of using a local variable to record dropped
packets, this reduces the locking of the netdev's mutex when
multiple packets are dropped in dpdk_do_tx_copy().

Signed-off-by: Ryan Wilson <wryan@nicira.com>
Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-06-30 10:53:58 -07:00
Ryan Wilson
844f2d749a netdev-dpdk: Set current timestamp when flushing TX queue.
The current timestamp should be set every time the queue is flushed.
Thus, if DRAIN_TSC timer cycles have passed since the last timestamp,
the send queue should be flushed again.

Signed-off-by: Ryan Wilson <wryan@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-06-30 10:53:58 -07:00
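
The timestamp rule in the commit above can be sketched as a queue that records the time of every flush and drains again once enough cycles have passed. The struct, the counters, and the DEMO_DRAIN_TSC value are hypothetical; only the idea of comparing against a DRAIN_TSC interval comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_DRAIN_TSC 200000ULL   /* Hypothetical drain interval in cycles. */

struct demo_timed_txq {
    uint64_t tsc;        /* Timestamp of the last flush. */
    unsigned count;      /* Packets currently buffered. */
    unsigned flushed;    /* Demo counter: packets flushed so far. */
};

/* Flushes the queue and records the current timestamp on every flush. */
void
demo_timed_flush(struct demo_timed_txq *q, uint64_t now)
{
    q->flushed += q->count;
    q->count = 0;
    q->tsc = now;
}

/* Buffers 'n' packets and drains the queue if at least DEMO_DRAIN_TSC cycles
 * have passed since the last flush. */
void
demo_timed_enqueue(struct demo_timed_txq *q, unsigned n, uint64_t now)
{
    q->count += n;
    if (now - q->tsc >= DEMO_DRAIN_TSC) {
        demo_timed_flush(q, now);
    }
}
```

If the timestamp were not updated inside the flush, the interval check would keep comparing against a stale value and the queue would be drained on every enqueue after the first interval.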
Ryan Wilson
b170db2aaa netdev-dpdk: Refactor dpdk_queue_flush().
This patch refactors dpdk_queue_flush() to reuse code in
dpdk_queue_pkts().

Signed-off-by: Ryan Wilson <wryan@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-06-30 10:53:53 -07:00
Polehn, Mike A
79f5354c4c dpdk: High speed PMD physical NIC queue size
Large TX and RX queues are needed for high speed 10 GbE physical NICs.
Observed a 250% zero-loss improvement over the small NIC queue test for
the port-to-port flow test.

Signed-off-by: Mike A. Polehn <mike.a.polehn@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-06-27 09:55:51 -07:00
Daniele Di Proietto
9477751009 netdev-dpdk: Disable NIC offloading and multiseg mbufs
We do not use any offloading (now) or multiple segments per packet, so
we might as well disable those features while configuring the NIC.

This could give performance improvements. For ixgbe, for example, this change
allows the driver to use a simpler tx routine, resulting in throughput
improvements (~7.5%).

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-06-25 10:54:04 -07:00
Daniele Di Proietto
a28ddd11c6 netdev-dpdk: Fix coding style in TX/RX conf structs
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-06-25 10:54:02 -07:00