A recent commit fixed ovs_rwlock_init() to pass the pthread_rwlockattr_t
that it initialized to pthread_rwlock_init(). According to POSIX
documentation this is correct, but on Windows the current implementation of
pthreads does not support a pre-initialized attribute. Please see a fork
of the implementation
19fd5054b2/pthread_rwlock_init.c (L59-L63)
This is the same implementation as the official version found under:
ftp://sourceware.org/pub/pthreads-win32/)
A short debug output from `vswitch` to confirm the above:
>k
Index Function
--------------------------------------------------------------------------------
*1 ovs-vswitchd.exe!ovs_rwlock_init(const ovs_rwlock * l_=0x000001721c7da250)
2 ovs-vswitchd.exe!open_dpif_backer(const char * type=0x000001721c7d8d60, dpif_backer * * backerp=0x000001721c7d89c0)
3 ovs-vswitchd.exe!construct(ofproto * ofproto_=0x000001721c7d87d0)
4 ovs-vswitchd.exe!ofproto_create(const char * datapath_name=0x000001721c7d86e0, const char * datapath_type=0x000001721c7d8750, ofproto * * ofprotop=0x000001721c7d80b8)
5 ovs-vswitchd.exe!bridge_reconfigure(const ovsrec_open_vswitch * ovs_cfg=0x000001721c7e05b0)
6 ovs-vswitchd.exe!bridge_run()
7 ovs-vswitchd.exe!main(int argc=6, char * * argv=0x000001721c729e10)
8 [External Code]
>? error
22
https://github.com/openvswitch/ovs/blob/master/lib/ovs-thread.c#L243
This patch is critical because the majority (over 800) of the unit tests
are failing.
Fixes: 1a15f390af ("lib/ovs-thread: set prefer writer lock for ovs_rwlock_init()")
Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Acked-by: Shashank Ram <rams@vmware.com>
[blp@ovn.org changed the details of the approach]
Signed-off-by: Ben Pfaff <blp@ovn.org>
An alternative "writer nonrecursive" rwlock allows recursive
read-locks to succeed only if there are no threads waiting for the
write-lock. In the function ovs_rwlock_init(), there exist a problem,
the parameter of 'attr' is not used to set the attributes of ovs_rwlock 'l_',
just because use pthread_rwlock_init(&l->lock, NULL) to init l->lock.
The attr object needs to be passed to the pthread_rwlock_init()
call in order to make use of it.
Signed-off-by: zangchuanqiang <zangchuanqiang@huawei.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
When calling ovs_thread_create() without calling fatal_signal_init()
first, ovs_thread_create() some times asserts. This dependency is
subtle and not very obvious.
The root cause seems to be that, within ovs_thread_create(), the
multi-threaded state is declared before all initializations are done.
Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
Relying on /proc/cpuinfo to count the number of available cores is not
the best option:
- The code is x86-specific.
- If the process is started with a different CPU affinity, then it will
wrongly try to start too many threads (for an example, imagine an OVS
daemon restricted to 4 CPU threads on a 128 threads system).
This commit removes /proc/cpuinfo parsing. For Linux systems, it
introduces instead a call to sched_getaffinity(), which is
architecture-independant, in order to retrieve the list of CPU threads
available to the current process and to count them. Other UNIX-like
systems only use _SC_NPROCESSORS_ONLN.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Co-authored-by: Liu Xiaofeng <xiaofeng.liu@6wind.com>
Signed-off-by: Liu Xiaofeng <xiaofeng.liu@6wind.com>
Co-authored-by: Quentin Monnet <quentin.monnet@6wind.com>
Signed-off-by: Quentin Monnet <quentin.monnet@6wind.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
ovs_mutex_cond_wait() is used in many functions in dpif-netdev to
synchronize with pmd threads, but we can't guarantee that the callers do
not hold RCU references, so it's better to avoid quiescing.
In system_stats_thread_func() the code relied on ovs_mutex_cond_wait()
to introduce a quiescent state, so explicit calls to
ovsrcu_quiesce_start() and ovsrcu_quiesce_end() are added there.
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Tested-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Acked-by: Ben Pfaff <blp@ovn.org>
This attempts to prevent namespace collisions with other list libraries
Signed-off-by: Ben Warren <ben@skyportsystems.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
A new thread must be started in a non quiescent state. There is a call
to ovsrcu_quiesce_end() in ovsthread_wrapper(), to enforce this.
ovs_thread_create(), instead, is executed in the parent thread. It must
call ovsrcu_quiesce_end() on its first invocation, to put the main
thread in a non quiescent state. On every other invocation, it doesn't
make sense to alter the calling thread state, so this commits wraps the
call to ovsrcu_quiesce_end() in an ovsthread_once construct.
This fixes a bug in ovs-rcu where the first call in the process to
ovsrcu_quiesce_start() will not be honored, because the calling thread
will need to create the 'urcu' thread (and creating a thread will
wrongly end its quiescent state).
ovsrcu_quiesce_start()
ovs_rcu_quiesced()
if (ovsthread_once_start(&once)) {
ovs_thread_create("urcu") /*This will end the quiescent state*/
}
This bug affects in particular ovs-vswitchd with DPDK.
In the DPDK case the first threads created are "vhost_thread" and
"dpdk_watchdog". If dpdk_watchdog is the first to call
ovsrcu_quiesce_start() (via xsleep()), the call is not honored and
the RCU grace period lasts at least for DPDK_PORT_WATCHDOG_INTERVAL
(5s on current master). If vhost_thread, on the other hand, is the
first to call ovsrcu_quiesce_start(), the call is not honored and the
RCU grace period lasts undefinitely, because no more calls to
ovsrcu_quiesce_start() are issued from vhost_thread.
For some reason (it's a race condition after all), on current master,
dpdk_watchdog will always be the first to call ovsrcu_quiesce_start(),
but with the upcoming DPDK database configuration changes, sometimes
vhost_thread will issue the first call to ovsrcu_quiesce_start().
Sample ovs-vswitchd.log:
2016-03-23T22:34:28.532Z|00004|ovs_rcu(urcu3)|WARN|blocked 8000 ms
waiting for vhost_thread2 to quiesce
2016-03-23T22:34:30.501Z|00118|ovs_rcu|WARN|blocked 8000 ms waiting for
vhost_thread2 to quiesce
2016-03-23T22:34:36.532Z|00005|ovs_rcu(urcu3)|WARN|blocked 16000 ms
waiting for vhost_thread2 to quiesce
2016-03-23T22:34:38.501Z|00119|ovs_rcu|WARN|blocked 16000 ms waiting for
vhost_thread2 to quiesce
The commit also adds a test for the ovs-rcu module to make sure that:
* A new thread is started in a non quiescent state.
* The first call to ovsrcu_quiesce_start() is honored.
* When a process becomes multithreaded the main thread is put in an
active state
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
The pthread_attr object needs to be passed to the pthread_create()
call in order to make use of it.
Fixes: 8147cec9ee (lib/ovs-thread: Ensure that thread stacks are
always at least 512 kB.)
Signed-off-by: Alexandru Ardelean <ardeleanalex@gmail.com>
Acked-by: Andy Zhou <azhou@ovn.org>
'Unreasonably long poll interval's are reasonable for PMD threads.
Also reporting of high CPU usage is not necessary.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
'n' is the number of keys, which are grouped into blocks of L2_SIZE
indexes. Even if only one key in a block is allocated, the whole block has
a pointer to it that must be freed. Thus, we need to round up instead of
down.
Reported-at: https://github.com/openvswitch/ovs/pull/87
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Debian likes to enable -Wformat-zero-length, even over our code trying to
disable it. It isn't too hard to make our code warning-free against this
option, so this commit both stops disabling it and fixes the warnings.
The first fix is to change set_subprogram_name() to take a plain string
instead of a format string, and to adjust its few callers. This fixes one
warning since one of those callers passed in an empty string.
The second fix is to remove a test for ovs_scan() against an empty string.
I couldn't find a way to avoid a warning for this test, and it isn't too
valuable in any case.
This allows us to drop filtering for -Wformat from the Debian rules file,
so this commit removes it.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Windows uses pthreads-win32 library to provide the Linux pthread
functionality. It is observed that when the main thread calls
a pthread destructor after it exits, undefined behavior is seen
(e.g., junk values in data, causing pthread deadlocks).
Similar behavior has been seen by
other people as seen in the following email thread:
https://sourceware.org/ml/pthreads-win32/2003/msg00001.html
To avoid this, this commit de-registers the thread destructor
when the main thread exits (via the atexit handler).
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
We used to reserve DPDK lcore 0 for non pmd operations, making it
difficult to use core 0 for packet processing.
DPDK 2.0 properly support non EAL threads with lcore LCORE_ID_ANY.
Using non EAL threads for non pmd threads, we do not need to reserve
any core for non pmd operations
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
A new function vlog_insert_module() is introduced to avoid using
list_insert() from the vlog.h header.
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Expose the struct ovs_list definition in <openvswitch/list.h>. Keep the
list access API private for now.
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
struct list is a common name and can't be used in public headers.
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Ben Pfaff <blp@nicira.com>
We can use a normal bool and rely on the mutex_lock/unlock and an
atomic_thread_fence for synchronization.
Also flip the return value of ovsthread_once_start__() to match the
one of ovsthread_once_start().
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
barrier->count is used as a simple counter and is not expected the
synchronize the state of any other variable, so we can use atomic_count,
which uses relaxed atomics.
Ditto for the 'next_id' within ovsthread_wrapper().
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Seq objects would be really hard to use if they did not provide
acquire-release semantics. Currently they do that via
ovs_mutex_lock()/ovs_mutex_unlock(), respectively. Document the
behavior so that it is safer to rely on that elsewhere.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Without the explicit wide type, the shift operation may be performed
on a int which will result in implementation defined behaviour on a
system with more than 32 CPUs.
Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
DPDK mempools rely on rte_lcore_id() to implement a thread-local cache.
Our non pmd threads had rte_lcore_id() == 0. This allowed concurrent access to
the "thread-local" cache, causing crashes.
This commit resolves the issue with the following changes:
- Every non pmd thread has the same lcore_id (0, for management reasons), which
is not shared with any pmd thread (lcore_id for pmd threads now start from 1)
- DPDK mbufs must be allocated/freed in pmd threads. When there is the need to
use mempools in non pmd threads, like in dpdk_do_tx_copy(), a mutex must be
held.
- The previous change does not allow us anymore to pass DPDK mbufs to handler
threads: therefore this commit partially revert 143859ec63. Now packets
are copied for upcall processing. We can remove the extra memcpy by
processing upcalls in the pmd thread itself.
With the introduction of the extra locking, the packet throughput will be lower
in the following cases:
- When using internal (tap) devices with DPDK devices on the same datapath.
Anyway, to support internal devices efficiently, we needed DPDK KNI devices,
which will be proper pmd devices and will not need this locking.
- When packets are processed in the slow path by non pmd threads. This overhead
can be avoided by handling the upcalls directly in pmd threads (a change that
has already been proposed by Ryan Wilson)
Also, the following two fixes have been introduced:
- In dpdk_free_buf() use rte_pktmbuf_free_seg() instead of rte_mempool_put().
This allows OVS to run properly with CONFIG_RTE_LIBRTE_MBUF_DEBUG DPDK option
- Do not bulk free mbufs in a transmission queue. They may belong to different
mempools
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Non-leader revalidator thread uses pthread_barrier_* functions in their
main loop to synchronize with leader thread. However, since those threads
only call poll_block() intermittently, the poll interval check in
poll_block() can wrongly take the time since last call as poll interval
and issue the following warnings:
"Unreasonably long XXXXms poll interval".
To prevent it, this commit implements the barrier struct and operations
for OVS which allow thread to block on barrier via poll_block().
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Between fork() and execvp() calls in the process_start()
function both child and parent processes share the same
file descriptors. This means that, if a child process
received a signal during this time interval, then it could
potentially write data to a shared file descriptor.
One such example is fatal signal handler, where, if
child process received SIGTERM signal, then it would
write data into pipe. Then a read event would occur
on the other end of the pipe where parent process is
listening and this would make parent process to incorrectly
believe that it was the one who received SIGTERM.
Also, since parent process never reads data from this
pipe, then this bug would make parent process to consume
100% CPU by immediately waking up from the event loop.
This patch will help to avoid this problem by blocking
signals until child closes all its file descriptors.
Signed-off-by: Ansis Atteka <aatteka@nicira.com>
Reported-by: Suganya Ramachandran <suganyar@vmware.com>
Issue: 1255110
This makes the message issued refer to the file and line that called
ovs_mutex_lock(), instead of to the file and line *inside*
ovs_mutex_lock().
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
With glibc, a mutex or rwlock filled with all-zero-bytes is properly
initialized for use, but this is not true for any other libc that OVS
supports. However, OVS gets a lot more testing with glibc than any other
libc. This means that developers keep introducing bugs that do not
manifest on the main development platform.
This commit should help avoid the problem, by reusing the existing 'where'
members to indicate whether a mutex or rwlock has been initialized.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Thread names are occasionally very useful for debugging, but from time to
time we've forgotten to set one. This commit adds the new thread's name
as a parameter to the function to start a thread, to make that mistake
impossible. This also simplifies code, since two function calls become
only one.
This makes a few other changes to the thread creation function:
* Since it is no longer a direct wrapper around a pthread function,
rename it to avoid giving that impression.
* Remove 'pthread_attr_t *' param that every caller supplied as NULL.
* Change 'pthread *' parameter into a return value, for convenience.
The system-stats code hadn't set a thread name, so this fixes that issue.
This patch is a prerequisite for making RCU report the name of a thread
that is blocking RCU synchronization, because the easiest way to do that is
for ovsrcu_quiesce_end() to record the current thread's name.
ovsrcu_quiesce_end() is called before the thread function is called, so it
won't get a name set within the thread function itself. Setting the thread
name earlier, as in this patch, avoids the problem.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
Otherwise the udpif revalidator threads can postpone RCU callbacks
essentially forever, especially if there are many revalidator threads and
little network traffic.
Reported-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
This allows clients to do more than just increment a counter. The
following commit will make the first use of that feature.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
RCU allows multiple threads to read objects in parallel without any
performance penalty. The following commit will introduce the first use.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
We use the number of cpu cores to determine the number
of threads that we spawn. We are not yet sure what is
the ideal number of OVS userspace threads that can run
on Hyper-V. Till we figure that out, use the same logic
of counting CPU cores in Windows too.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
glibc supports two kinds of rwlocks:
- The default kind of rwlock always allows recursive read-locks to
succeed, but threads blocked on acquiring the write-lock are treated
unfairly, causing them to be delayed indefinitely as long as new
readers continue to come along.
- An alternative "writer nonrecursive" rwlock allows recursive
read-locks to succeed only if there are no threads waiting for the
write-lock. Otherwise, recursive read-lock attempts deadlock in
the presence of blocking write-lock attempts. However, this kind
of rwlock is fair to writer.
POSIX allows the latter behavior, which essentially means that any portable
pthread program cannot try to take read-locks recursively. Since that's
true, we might as well use the latter kind of rwlock with glibc and get the
benefit of fairness of writers.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
A couple of times I've wanted to create a dynamic data structure that has
thread-specific data, but I've not been able to do that because
PTHREAD_KEYS_MAX is so low (POSIX says at least 128, glibc is only a little
bigger at 1024). This commit introduces a new form of thread-specific data
that supports a large number of items.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
ovsthread_counter is an abstract interface that could be implemented
different ways. The initial implementation is simple but less than
optimally efficient.
Signed-off-by: Ben Pfaff <blp@nicira.com>
ofproto_set_threads() uses the calculation MAX(count_cpu_cores() - 2, 1)
to decide on the default thread count. However, count_cpu_cores() returns
0 if it can't count the number of cores, or 1 if there's only one core,
and that causes the calculation to come out as UINT_MAX-2 or UINT_MAX-1,
respectively, which causes a memory allocation failure later.
There are other ways to fix this problem, too, of course.
Signed-off-by: Ben Pfaff <blp@nicira.com>
On systems that provide /proc/cpuinfo similar to Linux on x86, this
should allow us to choose a better default value for the number of
upcall handler threads -- in particular, it avoids counting
hyper-thread cores. If /proc/cpuinfo cannot be parsed for any reason,
fall back to using sysconf().
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
I don't see any other way to make Clang realize that these are the real
mutex implementation functions.
I first noticed these warnings with Clang 1:3.4~svn188890-1~exp1.
I previously used version 1:3.4~svn187484-1~exp1.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
We've seen a number of deadlocks in the tree since thread safety was
introduced. So far, all of these are self-deadlocks, that is, a single
thread acquiring a lock and then attempting to re-acquire the same lock
recursively. When this has happened, the process simply hung, and it was
somewhat difficult to find the cause.
POSIX "error-checking" mutexes check for this specific problem (and
others). This commit switches from other types of mutexes to
error-checking mutexes everywhere that we can, that is, everywhere that
we're not using recursive mutexes. This ought to help find problems more
quickly in the future.
There might be performance advantages to other kinds of mutexes in some
cases. However, the existing mutex type choices were just guesses, so I'd
rather go for easy detection of errors until we know that other mutex
types actually perform better in specific cases. Also, I did a quick
microbenchmark of glibc mutex types on my host and found that the
error checking mutexes weren't any slower than the other types, at least
when the mutex is uncontended.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
I foresee a need for possibly large numbers of instances of "struct
seq" (which is introduced in an upcoming patch). Each struct seq
needs some per-thread data. POSIX has pthread_key_t for this, but
the number of keys can be fairly limited, to as few as 128. It is
reasonable to work around this by using a hash table indexed on the
current thread. That only works if one can get a thread identifier
that is hashable (pthread_t is not). This patch introduces a
hashable thread identifier.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Commit 97be153858 (clang: Add
annotations for thread safety check.) defined 'struct ovs_mutex'
variable in 'atomic_flag' in 'ovs-atomic-pthreads.h'. This
casued "mutex: has incomplete type" error in compilation when
'ovs-atomic-pthreads.h' is included.
This commit goes back to use 'pthread_mutex_t' for that variable
and adds test for the 'atomic_flag' related functions.
Reported-by: Gurucharan Shetty <gshetty@nicira.com>
Signed-off-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>