2
0
mirror of https://github.com/openvswitch/ovs synced 2025-08-30 05:47:55 +00:00

67 Commits

Author SHA1 Message Date
Michael Santana
a5cacea5f9 handlers: Create additional handler threads when using CPU isolation.
Additional threads are required to service upcalls when we have CPU
isolation (in per-cpu dispatch mode). The reason additional threads
are required is because it creates a more fair distribution. With more
threads we decrease the load of each thread as more threads would
decrease the number of cores each threads is assigned.

Adding additional threads also increases the chance OVS utilizes all
cores available to use. Some RPS schemas might make some handler
threads get all the workload while others get no workload. This tends
to happen when the handler thread count is low.

An example would be an RPS that sends traffic on all even cores on a
system with only the lower half of the cores available for OVS to use.
In this example we have as many handlers threads as there are
available cores. In this case 50% of the handler threads get all the
workload while the other 50% get no workload. Not only that, but OVS
is only utilizing half of the cores that it can use. This is the worst
case scenario.

The ideal scenario is to have as many threads as there are cores - in
this case we guarantee that all cores OVS can use are utilized

But, adding as many threads are there are cores could have a performance
hit when the number of active cores (which all threads have to share) is
very low. For this reason we avoid creating as many threads as there
are cores and instead meet somewhere in the middle.

The formula used to calculate the number of handler threads to create
is as follows:

handlers_n = min(next_prime(active_cores+1), total_cores)

Assume default behavior when total_cores <= 2, that is do not create
additional threads when we have less than 2 total cores on the system

Fixes: b1e517bd2f81 ("dpif-netlink: Introduce per-cpu upcall dispatch.")
Signed-off-by: Michael Santana <msantana@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-08-15 18:51:09 +02:00
Gaetan Rivet
6207205e58 ovs-thread: Fix barrier use-after-free.
When a thread is blocked on a barrier, there is no guarantee
regarding the moment it will resume, only that it will at some point in
the future.

One thread can resume first then proceed to destroy the barrier while
another thread has not yet awoken. When it finally happens, the second
thread will attempt a seq_read() on the barrier seq, while the first
thread have already destroyed it, triggering a use-after-free.

Introduce an additional indirection layer within the barrier.
A internal barrier implementation holds all the necessary elements
for a thread to safely block and destroy. Whenever a barrier is
destroyed, the internal implementation is left available to still
blocking threads if necessary. A reference counter is used to track
threads still using the implementation.

Note that current uses of ovs-barrier are not affected: RCU and
revalidators will not destroy their barrier immediately after blocking
on it.

Fixes: d8043da7182a ("ovs-thread: Implement OVS specific barrier.")
Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2022-01-18 15:12:01 +01:00
William Tu
884ca8aceb ovs-thread: Add pthread spin lock support.
The patch adds the basic spin lock functions:
ovs_spin_{lock, try_lock, unlock, init, destroy}.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
2019-07-19 15:14:18 +03:00
Ilya Maximets
f7e4685015 treewide: Clean up inclusions of netdev-dpdk header.
'netdev-dpdk.h' provides only 'netdev_dpdk_register' and
'free_dpdk_buf' which are not used in these files and should
not be used.
Leftovers from the already removed code.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2019-03-14 08:45:05 +00:00
Ilya Maximets
5f361a2a32 ovs-thread: Add thread safety annotation to cond_wait.
This fixes build with clang on FreeBSD:

  lib/ovs-thread.c:266:13: error:

  calling function 'pthread_cond_wait' requires holding mutex \
  'mutex->lock' exclusively [-Werror,-Wthread-safety-analysis]

      error = pthread_cond_wait(cond, &mutex->lock);
              ^

Fixes: 97be153858b4 ("clang: Add annotations for thread safety check.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-12-10 09:52:55 -08:00
Ilya Maximets
6fe27a7103 ovs-thread: Drop xpthread_meutex_{un}lock finctions.
There are no users of these functions.
This change fixes clang build on FreeBSD:

  lib/ovs-thread.c:158:1: error: \
      mutex 'mutex' is still held at the end of function \
      [-Werror,-Wthread-safety-analysis]
  XPTHREAD_FUNC1(pthread_mutex_lock, pthread_mutex_t *);
  ^
  lib/ovs-thread.c:138:5: note: expanded from macro 'XPTHREAD_FUNC1'
      }
      ^

Fixes: 4dff0893c376 ("ovs-atomic-pthreads: Use global shared locks for atomic_flag also.")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2018-12-10 09:51:46 -08:00
Eelco Chaudron
91b8ec6c14 ovs-thread: Fix thread id for threads not started with ovs_thread_create()
When ping-pong'in a live VM migration between two machines running
OVS-DPDK every now and then the ping misses would increase
dramatically. For example:
Acked-by: Ilya Maximets <i.maximets@samsung.com>

===========Stream Rate: 3Mpps===========
No Stream_Rate Downtime Totaltime Ping_Loss Moongen_Loss
 0       3Mpps      128     13974       115      7168374
 1       3Mpps      145     13620        17      1169770
 2       3Mpps      140     14499       116      7141175
 3       3Mpps      142     13358        16      1150606
 4       3Mpps      136     14004        16      1124020
 5       3Mpps      139     15494       214     13170452
 6       3Mpps      136     15610       217     13282413
 7       3Mpps      146     13194        17      1167512
 8       3Mpps      148     12871        16      1162655
 9       3Mpps      137     15615       214     13170656

I identified this issue being introduced in OVS commit,
f3e7ec254738 ("Update relevant artifacts to add support for DPDK 17.05.1.")
and more specific due to DPDK commit,
af1475918124 ("vhost: introduce API to start a specific driver").

The combined changes no longer have OVS start the vhost socket polling
thread at startup, but DPDK will do it on its own when the first vhost
client is started.

Figuring out the reason why this happens kept me puzzled for quite some time...
What happens is that the callbacks called from the vhost thread are
calling ovsrcu_synchronize() as part of destroy_device(). This will
end-up calling seq_wait__().

By default, all created threads outside of OVS will get thread id 0,
which is equal to the main ovs thread. So for example in the
seq_wait__() function above if the main thread is waiting already we
won't add ourselves as a waiter.

The fix below assigns OVSTHREAD_ID_UNSET to none OVS created threads,
which will get updated to a valid ID on the first call to
ovsthread_id_self().

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Fixes: f3e7ec254738 ("Update relevant artifacts to add support for DPDK
                      17.05.1.")
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
2018-06-08 17:27:56 +01:00
Justin Pettit
8a7903c632 Update mailing list archive pointers to the current server.
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2017-11-27 14:59:46 -08:00
Xiao Liang
fd016ae3fb lib: Move lib/poll-loop.h to include/openvswitch
Poll-loop is the core to implement main loop. It should be available in
libopenvswitch.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-11-03 10:47:55 -07:00
Alin Serdean
1e4eecb478 ovs-thread: Avoid pthread_rwlockattr_t on Windows.
A recent commit fixed ovs_rwlock_init() to pass the pthread_rwlockattr_t
that it initialized to pthread_rwlock_init().  According to POSIX
documentation this is correct, but on Windows the current implementation of
pthreads does not support a pre-initialized attribute.  Please see a fork
of the implementation
19fd5054b2/pthread_rwlock_init.c (L59-L63)
This is the same implementation as the official version found under:
ftp://sourceware.org/pub/pthreads-win32/)

A short debug output from `vswitch` to confirm the above:

>k
 Index  Function
--------------------------------------------------------------------------------
*1      ovs-vswitchd.exe!ovs_rwlock_init(const ovs_rwlock * l_=0x000001721c7da250)
 2      ovs-vswitchd.exe!open_dpif_backer(const char * type=0x000001721c7d8d60, dpif_backer * * backerp=0x000001721c7d89c0)
 3      ovs-vswitchd.exe!construct(ofproto * ofproto_=0x000001721c7d87d0)
 4      ovs-vswitchd.exe!ofproto_create(const char * datapath_name=0x000001721c7d86e0, const char * datapath_type=0x000001721c7d8750, ofproto * * ofprotop=0x000001721c7d80b8)
 5      ovs-vswitchd.exe!bridge_reconfigure(const ovsrec_open_vswitch * ovs_cfg=0x000001721c7e05b0)
 6      ovs-vswitchd.exe!bridge_run()
 7      ovs-vswitchd.exe!main(int argc=6, char * * argv=0x000001721c729e10)
 8      [External Code]

>? error
22
https://github.com/openvswitch/ovs/blob/master/lib/ovs-thread.c#L243

This patch is critical because the majority (over 800) of the unit tests
are failing.

Fixes: 1a15f390afd6 ("lib/ovs-thread: set prefer writer lock for ovs_rwlock_init()")
Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Acked-by: Shashank Ram <rams@vmware.com>
[blp@ovn.org changed the details of the approach]
Signed-off-by: Ben Pfaff <blp@ovn.org>
2017-01-04 09:16:06 -08:00
zangchuanqiang
1a15f390af lib/ovs-thread: set prefer writer lock for ovs_rwlock_init()
An alternative "writer nonrecursive" rwlock allows recursive
read-locks to succeed only if there are no threads waiting for the
write-lock. In the function ovs_rwlock_init(), there exist a problem,
the parameter of 'attr' is not used to set the attributes of ovs_rwlock 'l_',
just because use pthread_rwlock_init(&l->lock, NULL) to init l->lock.

The attr object needs to be passed to the pthread_rwlock_init()
call in order to make use of it.

Signed-off-by: zangchuanqiang <zangchuanqiang@huawei.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-12-22 16:21:01 -08:00
Andy Zhou
1b870ac0ca lib: Remove extra API dependency for ovs_thread_create()
When calling ovs_thread_create() without calling fatal_signal_init()
first, ovs_thread_create() some times asserts. This dependency is
subtle and not very obvious.

The root cause seems to be that, within ovs_thread_create(), the
multi-threaded state is declared before all initializations are done.

Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
2016-07-05 16:32:23 -07:00
David Marchand
be15ec48d7 lib: Use a more accurate value for CPU count (sched_getaffinity).
Relying on /proc/cpuinfo to count the number of available cores is not
the best option:

- The code is x86-specific.
- If the process is started with a different CPU affinity, then it will
  wrongly try to start too many threads (for an example, imagine an OVS
  daemon restricted to 4 CPU threads on a 128 threads system).

This commit removes /proc/cpuinfo parsing. For Linux systems, it
introduces instead a call to sched_getaffinity(), which is
architecture-independant, in order to retrieve the list of CPU threads
available to the current process and to count them. Other UNIX-like
systems only use _SC_NPROCESSORS_ONLN.

Signed-off-by: David Marchand <david.marchand@6wind.com>
Co-authored-by: Liu Xiaofeng <xiaofeng.liu@6wind.com>
Signed-off-by: Liu Xiaofeng <xiaofeng.liu@6wind.com>
Co-authored-by: Quentin Monnet <quentin.monnet@6wind.com>
Signed-off-by: Quentin Monnet <quentin.monnet@6wind.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-07-01 14:48:40 -07:00
Daniele Di Proietto
5724fca48c ovs-thread: Do not quiesce in ovs_mutex_cond_wait().
ovs_mutex_cond_wait() is used in many functions in dpif-netdev to
synchronize with pmd threads, but we can't guarantee that the callers do
not hold RCU references, so it's better to avoid quiescing.

In system_stats_thread_func() the code relied on ovs_mutex_cond_wait()
to introduce a quiescent state, so explicit calls to
ovsrcu_quiesce_start() and ovsrcu_quiesce_end() are added there.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
2016-05-23 10:27:42 -07:00
Ben Warren
417e7e66e1 list: Rename all functions in list.h with ovs_ prefix.
This attempts to prevent namespace collisions with other list libraries

Signed-off-by: Ben Warren <ben@skyportsystems.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-03-30 13:04:32 -07:00
Ben Warren
b19bab5b20 list: Remove lib/list.h completely.
All code is now in include/openvswitch/list.h.

Signed-off-by: Ben Warren <ben@skyportsystems.com>
Acked-by: Ryan Moats <rmoats@us.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-03-30 13:01:21 -07:00
Daniele Di Proietto
13b6d08790 ovs-thread: Do not always end quiescent state in ovs_thread_create().
A new thread must be started in a non quiescent state.  There is a call
to ovsrcu_quiesce_end() in ovsthread_wrapper(), to enforce this.

ovs_thread_create(), instead, is executed in the parent thread. It must
call ovsrcu_quiesce_end() on its first invocation, to put the main
thread in a non quiescent state.  On every other invocation, it doesn't
make sense to alter the calling thread state, so this commits wraps the
call to ovsrcu_quiesce_end() in an ovsthread_once construct.

This fixes a bug in ovs-rcu where the first call in the process to
ovsrcu_quiesce_start() will not be honored, because the calling thread
will need to create the 'urcu' thread (and creating a thread will
wrongly end its quiescent state).

ovsrcu_quiesce_start()
  ovs_rcu_quiesced()
    if (ovsthread_once_start(&once)) {
        ovs_thread_create("urcu") /*This will end the quiescent state*/
    }

This bug affects in particular ovs-vswitchd with DPDK.
In the DPDK case the first threads created are "vhost_thread" and
"dpdk_watchdog".  If dpdk_watchdog is the first to call
ovsrcu_quiesce_start() (via xsleep()), the call is not honored and
the RCU grace period lasts at least for DPDK_PORT_WATCHDOG_INTERVAL
(5s on current master).  If vhost_thread, on the other hand, is the
first to call ovsrcu_quiesce_start(), the call is not honored and the
RCU grace period lasts undefinitely, because no more calls to
ovsrcu_quiesce_start() are issued from vhost_thread.

For some reason (it's a race condition after all), on current master,
dpdk_watchdog will always be the first to call ovsrcu_quiesce_start(),
but with the upcoming DPDK database configuration changes, sometimes
vhost_thread will issue the first call to ovsrcu_quiesce_start().

Sample ovs-vswitchd.log:

2016-03-23T22:34:28.532Z|00004|ovs_rcu(urcu3)|WARN|blocked 8000 ms
waiting for vhost_thread2 to quiesce
2016-03-23T22:34:30.501Z|00118|ovs_rcu|WARN|blocked 8000 ms waiting for
vhost_thread2 to quiesce
2016-03-23T22:34:36.532Z|00005|ovs_rcu(urcu3)|WARN|blocked 16000 ms
waiting for vhost_thread2 to quiesce
2016-03-23T22:34:38.501Z|00119|ovs_rcu|WARN|blocked 16000 ms waiting for
vhost_thread2 to quiesce

The commit also adds a test for the ovs-rcu module to make sure that:
* A new thread is started in a non quiescent state.
* The first call to ovsrcu_quiesce_start() is honored.
* When a process becomes multithreaded the main thread is put in an
  active state

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
2016-03-25 11:15:45 -07:00
Alexandru Ardelean
5ffab04a5b lib/ovs-thread: make use of the pthread_attr object
The pthread_attr object needs to be passed to the pthread_create()
call in order to make use of it.

Fixes: 8147cec9ee (lib/ovs-thread: Ensure that thread stacks are
                   always at least 512 kB.)
Signed-off-by: Alexandru Ardelean <ardeleanalex@gmail.com>
Acked-by: Andy Zhou <azhou@ovn.org>
2016-03-10 19:25:38 -08:00
Alexandru Ardelean
8147cec9ee lib/ovs-thread: Ensure that thread stacks are always at least 512 kB.
This makes a difference for libc implementations (such as musl libc) that
have a really small default pthread stack size.

Will reference this discussion:
http://patchwork.ozlabs.org/patch/572340/

Reported-by: Robert McKay <robert@mckay.com>
Signed-off-by: Alexandru Ardelean <ardeleanalex@gmail.com>
[blp@ovn.org made style changes]
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-02-05 11:17:39 -08:00
William Tu
fa20477465 ovs-thread: Fix missing space.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Justin Pettit <jpettit@ovn.org>
2016-01-29 05:54:33 -08:00
Ilya Maximets
2f8932e840 poll: Suppress logging for pmd threads.
'Unreasonably long poll interval's are reasonable for PMD threads.
Also reporting of high CPU usage is not necessary.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
2016-01-11 09:59:49 -08:00
Ben Pfaff
5657f68636 ovs-thread: Fix memory leak in thread exit.
'n' is the number of keys, which are grouped into blocks of L2_SIZE
indexes.  Even if only one key in a block is allocated, the whole block has
a pointer to it that must be freed.  Thus, we need to round up instead of
down.

Reported-at: https://github.com/openvswitch/ovs/pull/87
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2015-11-10 14:11:28 -08:00
Ben Pfaff
40e7cf5607 configure: Stop avoiding -Wformat-zero-length.
Debian likes to enable -Wformat-zero-length, even over our code trying to
disable it.  It isn't too hard to make our code warning-free against this
option, so this commit both stops disabling it and fixes the warnings.

The first fix is to change set_subprogram_name() to take a plain string
instead of a format string, and to adjust its few callers.  This fixes one
warning since one of those callers passed in an empty string.

The second fix is to remove a test for ovs_scan() against an empty string.
I couldn't find a way to avoid a warning for this test, and it isn't too
valuable in any case.

This allows us to drop filtering for -Wformat from the Debian rules file,
so this commit removes it.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2015-06-10 09:19:39 -07:00
Gurucharan Shetty
d2843eba6d ovs_threads: Avoid running pthread destructors from main thread exit.
Windows uses pthreads-win32 library to provide the Linux pthread
functionality. It is observed that when the main thread calls
a pthread destructor after it exits, undefined behavior is seen
(e.g., junk values in data, causing pthread deadlocks).
Similar behavior has been seen by
other people as seen in the following email thread:
https://sourceware.org/ml/pthreads-win32/2003/msg00001.html

To avoid this, this commit de-registers the thread destructor
when the main thread exits (via the atexit handler).

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2015-05-27 09:43:36 -07:00
Daniele Di Proietto
d5c199ea7f netdev-dpdk: Properly support non pmd threads.
We used to reserve DPDK lcore 0 for non pmd operations, making it
difficult to use core 0 for packet processing.
DPDK 2.0 properly support non EAL threads with lcore LCORE_ID_ANY.

Using non EAL threads for non pmd threads, we do not need to reserve
any core for non pmd operations

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2015-05-22 11:28:19 -07:00
Thomas Graf
e6211adce4 lib: Move vlog.h to <openvswitch/vlog.h>
A new function vlog_insert_module() is introduced to avoid using
list_insert() from the vlog.h header.

Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-12-15 14:15:19 +01:00
Thomas Graf
55951e15e5 lib: Expose struct ovs_list definition in <openvswitch/list.h>
Expose the struct ovs_list definition in <openvswitch/list.h>. Keep the
list access API private for now.

Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-12-15 14:15:16 +01:00
Thomas Graf
ca6ba70092 list: Rename struct list to struct ovs_list
struct list is a common name and can't be used in public headers.

Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-12-15 14:15:12 +01:00
Jarno Rajahalme
9230662a87 lib/ovs-thread: Avoid atomic read in ovsthread_once_start().
We can use a normal bool and rely on the mutex_lock/unlock and an
atomic_thread_fence for synchronization.

Also flip the return value of ovsthread_once_start__() to match the
one of ovsthread_once_start().

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-08-29 16:15:44 -07:00
Jarno Rajahalme
6f0088655a lib/ovs-thread: Use atomic_count.
barrier->count is used as a simple counter and is not expected the
synchronize the state of any other variable, so we can use atomic_count,
which uses relaxed atomics.

Ditto for the 'next_id' within ovsthread_wrapper().

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-08-29 16:15:44 -07:00
Jarno Rajahalme
ab355e6763 lib/seq: Document acquire-release semantics.
Seq objects would be really hard to use if they did not provide
acquire-release semantics.  Currently they do that via
ovs_mutex_lock()/ovs_mutex_unlock(), respectively.  Document the
behavior so that it is safer to rely on that elsewhere.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-08-29 16:15:44 -07:00
Thomas Graf
9bdc2ca4a7 thread: Use explicit wide type when shifting > 32 bits
Without the explicit wide type, the shift operation may be performed
on a int which will result in implementation defined behaviour on a
system with more than 32 CPUs.

Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-08-29 10:47:19 -07:00
Daniele Di Proietto
db73f7166a netdev-dpdk: Fix race condition with DPDK mempools in non pmd threads
DPDK mempools rely on rte_lcore_id() to implement a thread-local cache.
Our non pmd threads had rte_lcore_id() == 0. This allowed concurrent access to
the "thread-local" cache, causing crashes.

This commit resolves the issue with the following changes:

- Every non pmd thread has the same lcore_id (0, for management reasons), which
  is not shared with any pmd thread (lcore_id for pmd threads now start from 1)
- DPDK mbufs must be allocated/freed in pmd threads. When there is the need to
  use mempools in non pmd threads, like in dpdk_do_tx_copy(), a mutex must be
  held.
- The previous change does not allow us anymore to pass DPDK mbufs to handler
  threads: therefore this commit partially revert 143859ec63d45e. Now packets
  are copied for upcall processing. We can remove the extra memcpy by
  processing upcalls in the pmd thread itself.

With the introduction of the extra locking, the packet throughput will be lower
in the following cases:

- When using internal (tap) devices with DPDK devices on the same datapath.
  Anyway, to support internal devices efficiently, we needed DPDK KNI devices,
  which will be proper pmd devices and will not need this locking.
- When packets are processed in the slow path by non pmd threads. This overhead
  can be avoided by handling the upcalls directly in pmd threads (a change that
  has already been proposed by Ryan Wilson)

Also, the following two fixes have been introduced:
- In dpdk_free_buf() use rte_pktmbuf_free_seg() instead of rte_mempool_put().
  This allows OVS to run properly with CONFIG_RTE_LIBRTE_MBUF_DEBUG DPDK option
- Do not bulk free mbufs in a transmission queue. They may belong to different
  mempools

Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
2014-07-20 10:13:22 -07:00
Alex Wang
d8043da718 ovs-thread: Implement OVS specific barrier.
Non-leader revalidator thread uses pthread_barrier_* functions in their
main loop to synchronize with leader thread.  However, since those threads
only call poll_block() intermittently, the poll interval check in
poll_block() can wrongly take the time since last call as poll interval
and issue the following warnings:

"Unreasonably long XXXXms poll interval".

To prevent it, this commit implements the barrier struct and operations
for OVS which allow thread to block on barrier via poll_block().

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-06-13 17:00:31 -07:00
Ansis Atteka
1481a7551d process: block signals while spawning child processes
Between fork() and execvp() calls in the process_start()
function both child and parent processes share the same
file descriptors.  This means that, if a child process
received a signal during this time interval, then it could
potentially write data to a shared file descriptor.

One such example is fatal signal handler, where, if
child process received SIGTERM signal, then it would
write data into pipe.  Then a read event would occur
on the other end of the pipe where parent process is
listening and this would make parent process to incorrectly
believe that it was the one who received SIGTERM.
Also, since parent process never reads data from this
pipe, then this bug would make parent process to consume
100% CPU by immediately waking up from the event loop.

This patch will help to avoid this problem by blocking
signals until child closes all its file descriptors.

Signed-off-by: Ansis Atteka <aatteka@nicira.com>
Reported-by: Suganya Ramachandran <suganyar@vmware.com>
Issue: 1255110
2014-05-30 10:06:10 -07:00
Ben Pfaff
6d765f17a8 ovs-thread: Issue better diagnostics for locking uninitialized mutexes.
This makes the message issued refer to the file and line that called
ovs_mutex_lock(), instead of to the file and line *inside*
ovs_mutex_lock().

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2014-05-08 09:20:09 -07:00
Ben Pfaff
05bf6d3c62 ovs-thread: Add checking for mutex and rwlock initialization.
With glibc, a mutex or rwlock filled with all-zero-bytes is properly
initialized for use, but this is not true for any other libc that OVS
supports.  However, OVS gets a lot more testing with glibc than any other
libc.  This means that developers keep introducing bugs that do not
manifest on the main development platform.

This commit should help avoid the problem, by reusing the existing 'where'
members to indicate whether a mutex or rwlock has been initialized.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-04-28 15:54:45 -07:00
Ben Pfaff
214694add2 ovs-rcu: Log a helpful warning when ovsrcu_synchronize() stalls.
This made it easier for me to find a thread that was causing stalls.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
2014-04-28 15:25:49 -07:00
Ben Pfaff
8ba0a5227f ovs-thread: Make caller provide thread name when creating a thread.
Thread names are occasionally very useful for debugging, but from time to
time we've forgotten to set one.  This commit adds the new thread's name
as a parameter to the function to start a thread, to make that mistake
impossible.  This also simplifies code, since two function calls become
only one.

This makes a few other changes to the thread creation function:

    * Since it is no longer a direct wrapper around a pthread function,
      rename it to avoid giving that impression.

    * Remove 'pthread_attr_t *' param that every caller supplied as NULL.

    * Change 'pthread *' parameter into a return value, for convenience.

The system-stats code hadn't set a thread name, so this fixes that issue.

This patch is a prerequisite for making RCU report the name of a thread
that is blocking RCU synchronization, because the easiest way to do that is
for ovsrcu_quiesce_end() to record the current thread's name.
ovsrcu_quiesce_end() is called before the thread function is called, so it
won't get a name set within the thread function itself.  Setting the thread
name earlier, as in this patch, avoids the problem.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
2014-04-28 15:25:49 -07:00
Ben Pfaff
595ef8b10a ovs-thread: Quiesce in xpthread_barrier_wait().
Otherwise the udpif revalidator threads can postpone RCU callbacks
essentially forever, especially if there are many revalidator threads and
little network traffic.

Reported-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
2014-04-28 15:25:48 -07:00
Ben Pfaff
51852a57a0 ovs-thread: Replace ovsthread_counter by more general ovsthread_stats.
This allows clients to do more than just increment a counter.  The
following commit will make the first use of that feature.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2014-03-19 07:47:12 -07:00
Ben Pfaff
0f2ea84841 ovs-rcu: New library.
RCU allows multiple threads to read objects in parallel without any
performance penalty.  The following commit will introduce the first use.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2014-03-18 16:34:28 -07:00
Gurucharan Shetty
40a9237d2b ovs-thread: We don't use fork in Windows.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-03-13 14:23:48 -07:00
Gurucharan Shetty
fdd73c2366 ovs-thread: count the number of cpu cores.
We use the number of cpu cores to determine the number
of threads that we spawn. We are not yet sure what is
the ideal number of OVS userspace threads that can run
on Hyper-V. Till we figure that out, use the same logic
of counting CPU cores in Windows too.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-03-11 20:52:35 -07:00
Joe Stringer
f0e4e85d19 ovs-thread: Add xpthread_barrier_*() wrappers.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-03-04 13:37:11 -08:00
Ben Pfaff
6b59b543c8 ovs-thread: Use fair (but nonrecursive) rwlocks on glibc.
glibc supports two kinds of rwlocks:

    - The default kind of rwlock always allows recursive read-locks to
      succeed, but threads blocked on acquiring the write-lock are treated
      unfairly, causing them to be delayed indefinitely as long as new
      readers continue to come along.

    - An alternative "writer nonrecursive" rwlock allows recursive
      read-locks to succeed only if there are no threads waiting for the
      write-lock.  Otherwise, recursive read-lock attempts deadlock in
      the presence of blocking write-lock attempts.  However, this kind
      of rwlock is fair to writer.

POSIX allows the latter behavior, which essentially means that any portable
pthread program cannot try to take read-locks recursively.  Since that's
true, we might as well use the latter kind of rwlock with glibc and get the
benefit of fairness of writers.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
2014-02-21 16:27:10 -08:00
Jarno Rajahalme
ea6f3f9a49 ovs-thread: Add support for pthread adaptive mutex
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-02-13 13:12:05 -08:00
Ben Pfaff
e9020da2d2 ovs-thread: Add new support for thread-specific data.
A couple of times I've wanted to create a dynamic data structure that has
thread-specific data, but I've not been able to do that because
PTHREAD_KEYS_MAX is so low (POSIX says at least 128, glibc is only a little
bigger at 1024).  This commit introduces a new form of thread-specific data
that supports a large number of items.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2014-01-14 14:45:10 -08:00
Ben Pfaff
ed27e010b9 dpif-netdev: Use new "ovsthread_counter" to track dp statistics.
ovsthread_counter is an abstract interface that could be implemented
different ways.  The initial implementation is simple but less than
optimally efficient.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-01-08 17:10:32 -08:00
Ben Pfaff
4974b2b811 ovs-thread: Fix crash by making count_cpu_count() return type a signed int.
ofproto_set_threads() uses the calculation MAX(count_cpu_cores() - 2, 1)
to decide on the default thread count.  However, count_cpu_cores() returns
0 if it can't count the number of cores, or 1 if there's only one core,
and that causes the calculation to come out as UINT_MAX-2 or UINT_MAX-1,
respectively, which causes a memory allocation failure later.

There are other ways to fix this problem, too, of course.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2013-12-13 15:06:37 -08:00