mirror of https://github.com/openvswitch/ovs synced 2025-10-15 14:17:18 +00:00
24 Commits

Author SHA1 Message Date
Ben Pfaff
07ece367fb ovs-atomic: Prefer Clang intrinsics over <stdatomic.h>.
On my Debian "jessie" system, <stdatomic.h> provided by GCC 4.9 is busted
when Clang 3.5 tries to use it.  Even a trivial program like this:

    #include <stdatomic.h>

    void
    foo(void)
    {
        _Atomic(int) x;
        atomic_fetch_add(&x, 1);
    }

yields:

     atomic.c:7:5: error: address argument to atomic operation must be a
        pointer to integer or pointer ('_Atomic(int) *' invalid)

The Clang-specific version of ovs-atomic.h still works, though, so this
commit works around the problem.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-11-11 13:55:02 -08:00
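The preference order described in commit 07ece367fb can be sketched roughly as follows. This is only an illustration, not the actual ovs-atomic.h selection logic: the SKETCH_FETCH_ADD macro and fetch_add_demo() are hypothetical names. Under Clang the sketch calls the compiler's __c11_atomic_fetch_add builtin directly; elsewhere it falls back to <stdatomic.h>.

```c
#include <assert.h>

#if defined(__clang__)
/* Clang: call the compiler's own C11 atomic builtin directly, so a
 * <stdatomic.h> shipped by another compiler is never involved. */
#define SKETCH_FETCH_ADD(OBJ, N) \
    __c11_atomic_fetch_add(OBJ, N, __ATOMIC_SEQ_CST)
#else
/* Other compilers: fall back to the standard header. */
#include <stdatomic.h>
#define SKETCH_FETCH_ADD(OBJ, N) atomic_fetch_add(OBJ, N)
#endif

/* Hypothetical demo: increments an atomic int once and returns its
 * new value. */
static int
fetch_add_demo(void)
{
    _Atomic(int) x = 0;
    int old = SKETCH_FETCH_ADD(&x, 1);   /* Returns the previous value. */

    assert(old == 0);
    return x;                            /* Implicit atomic load. */
}
```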
Gurucharan Shetty
ec2d2b5f03 ovs-atomics: Add atomic support Windows.
Before this change (i.e., with pthread locks for atomics on Windows),
the benchmark for cmap and hmap was as follows:

$ ./tests/ovstest.exe test-cmap benchmark 10000000 3 1
Benchmarking with n=10000000, 3 threads, 1.00% mutations:
cmap insert:  61070 ms
cmap iterate:  2750 ms
cmap search:  14238 ms
cmap destroy:  8354 ms

hmap insert:   1701 ms
hmap iterate:   985 ms
hmap search:   3755 ms
hmap destroy:  1052 ms

After this change, the benchmark is as follows:
$ ./tests/ovstest.exe test-cmap benchmark 10000000 3 1
Benchmarking with n=10000000, 3 threads, 1.00% mutations:
cmap insert:   3666 ms
cmap iterate:   365 ms
cmap search:   2016 ms
cmap destroy:  1331 ms

hmap insert:   1495 ms
hmap iterate:  1026 ms
hmap search:   4167 ms
hmap destroy:  1046 ms

So there is clearly a big improvement for cmap.

But the corresponding test on Linux (with gcc 4.6) yields the following:

./tests/ovstest test-cmap benchmark 10000000 3 1
Benchmarking with n=10000000, 3 threads, 1.00% mutations:
cmap insert:   3917 ms
cmap iterate:   355 ms
cmap search:    871 ms
cmap destroy:  1158 ms

hmap insert:   1988 ms
hmap iterate:  1005 ms
hmap search:   5428 ms
hmap destroy:   980 ms

So for this particular test, except for "cmap search", Windows and
Linux have similar performance. Windows is around 2.3x slower in "cmap search"
(2016 ms vs. 871 ms) compared to Linux. This has to be investigated.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
[With a lot of input and help from Jarno]
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-09-04 15:57:26 -07:00
Jarno Rajahalme
a36cd3ff27 lib/ovs-atomic: Add atomic_count.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-08-29 10:34:52 -07:00
Jarno Rajahalme
b119717bcd lib/ovs-atomic: Add helpers for relaxed atomic access.
When an atomic variable is not serving to synchronize threads about
the state of other (atomic or non-atomic) variables, no memory barrier
is needed with the atomic operation.  However, the default memory
order for an atomic operation is memory_order_seq_cst, which always
causes a system-wide locking of the memory bus and prevents both the
CPU and the compiler from reordering memory accesses across the
atomic operation.  This can add considerable stalls as each atomic
operation (regardless of memory order) always includes a memory
access.

In most cases we can let the compiler reorder memory accesses to
minimize the time we spend waiting for the completion of the atomic
memory accesses by using the relaxed memory order.  This patch adds
helpers to make such accesses a little easier on the eye (and the
fingers :-), but does not try to hide them completely.

Following patches make use of these and remove all the (implied)
memory_order_seq_cst use from the OVS code base.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-08-29 10:34:52 -07:00
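As a rough illustration of the relaxed-access helpers described in commit b119717bcd, the sketch below wraps the C11 explicit operations in small inline functions. The helper names (counter_read_relaxed, counter_add_relaxed) are hypothetical and chosen for this example; the names and signatures of the real OVS helpers are not shown in this log.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper: load without imposing any ordering on
 * surrounding memory accesses. */
static inline unsigned int
counter_read_relaxed(atomic_uint *counter)
{
    return atomic_load_explicit(counter, memory_order_relaxed);
}

/* Hypothetical helper: atomic add that only guarantees atomicity of
 * the read-modify-write itself. */
static inline void
counter_add_relaxed(atomic_uint *counter, unsigned int n)
{
    atomic_fetch_add_explicit(counter, n, memory_order_relaxed);
}

/* A statistics counter does not publish other data, so relaxed
 * ordering is sufficient here. */
static unsigned int
relaxed_counter_demo(void)
{
    atomic_uint hits;

    atomic_init(&hits, 0);
    counter_add_relaxed(&hits, 2);
    counter_add_relaxed(&hits, 3);
    return counter_read_relaxed(&hits);
}
```

Relaxed ordering is only appropriate when, as the commit message says, the variable is not used to synchronize threads about the state of other variables.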
Jarno Rajahalme
2864e627e5 lib/ovs-atomic: Clarified comments on ovs_refcount_unref().
ovs_refcount_unref() needs to synchronize with the other instances of
itself rather than with ovs_refcount_ref().

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-08-29 10:34:52 -07:00
Jarno Rajahalme
105a9298e9 lib/ovs-atomic: Native support for 32-bit 586 with GCC.
XenServer runs OVS in dom0, which is a 32-bit VM.  As the build
environment lacks support for atomics, locked pthread atomics were
used, with a considerable performance hit.

This patch adds native support for ovs-atomic with 32-bit Pentium and
higher CPUs, when compiled with an older GCC.  We use inline asm with
the cmpxchg8b instruction, which was introduced with the Intel
Pentium processors.  We do not expect anyone to run OVS on a 486 or
older processor.

cmap benchmark before the patch on 32-bit XenServer build (uses
ovs-atomic-pthread):

$ tests/ovstest test-cmap benchmark 2000000 8 0.1
Benchmarking with n=2000000, 8 threads, 0.10% mutations:
cmap insert:   8835 ms
cmap iterate:   379 ms
cmap search:   6242 ms
cmap destroy:  1145 ms

After:

$ tests/ovstest test-cmap benchmark 2000000 8 0.1
Benchmarking with n=2000000, 8 threads, 0.10% mutations:
cmap insert:    711 ms
cmap iterate:    68 ms
cmap search:    353 ms
cmap destroy:   209 ms

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-08-05 13:51:19 -07:00
Jarno Rajahalme
f31841d570 lib/ovs-atomic: Native support for x86_64 with GCC.
Some supported XenServer build environments lack compiler support for
atomic operations.  This patch provides native support for x86_64 on
GCC, which covers possible future 64-bit builds on XenServer.

Since this implementation is faster than the existing support prior to
GCC 4.7, especially for cmap inserts, we use this with GCC < 4.7 on
x86_64.

Example numbers with "tests/test-cmap benchmark 2000000 8 0.1" on a
quad-core hyperthreaded laptop, built with GCC 4.6 -O2:

Using ovs-atomic-pthreads on x86_64:

Benchmarking with n=2000000, 8 threads, 0.10% mutations:
cmap insert:   4725 ms
cmap iterate:   329 ms
cmap search:   5945 ms
cmap destroy:   911 ms

Using ovs-atomic-gcc4+ on x86_64:

Benchmarking with n=2000000, 8 threads, 0.10% mutations:
cmap insert:    845 ms
cmap iterate:    58 ms
cmap search:    308 ms
cmap destroy:   295 ms

With the native support provided by this patch:

Benchmarking with n=2000000, 8 threads, 0.10% mutations:
cmap insert:    530 ms
cmap iterate:    59 ms
cmap search:    305 ms
cmap destroy:   232 ms

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-08-05 13:51:19 -07:00
Jarno Rajahalme
15ba057e05 lib/ovs-atomic: Require memory_order be constant.
Compiler implementations may provide sub-optimal support for a
memory_order passed in as a run-time value (ref.
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html).

Document that OVS atomics require the memory order to be passed in as
a compile-time constant.

It should be noted, however, that when inlining is disabled (i.e.,
compiling without optimization) even compile-time constants may be
passed as run-time values to (non-inlined) functions.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-08-05 13:51:19 -07:00
Jarno Rajahalme
5e99c681c8 lib/ovs-atomic: Elaborate memory_order documentation.
The definition of memory_order_relaxed included a compiler barrier,
while it is not necessary, and indeed the following text on
atomic_thread_fence and atomic_signal_fence contradicted that.

memory_order_consume and memory_order_acq_rel are also more thoroughly
described.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-08-05 13:51:19 -07:00
Jarno Rajahalme
6969766b75 lib/ovs-atomic: Add ovs_refcount_unref_relaxed(), ovs_refcount_try_ref_rcu().
When a reference counted object is also RCU protected the deletion of
the object's memory is always postponed.  This allows
memory_order_relaxed to be used also for unreferencing, as RCU
quiescing provides a full memory barrier (it has to, or otherwise
there could be lingering accesses to objects after they are recycled).

Also, when access to the reference counted object is protected via a
mutex or a lock, the locking primitives provide the required memory
barrier functionality.

Also, add ovs_refcount_try_ref_rcu(), which takes a reference only if
the refcount is non-zero and returns true if a reference was taken,
false otherwise.  This can be used in combined RCU/refcount scenarios
where we have an RCU protected reference to a refcounted object, but
which may be unref'ed at any time.  If ovs_refcount_try_ref_rcu()
fails, the object may still be safely used until the current thread
quiesces.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-07-07 13:20:04 -07:00
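The "take a reference only if the refcount is non-zero" behaviour described in commit 6969766b75 can be sketched with a compare-exchange loop. This is a minimal illustration in plain C11, assuming a bare atomic_uint counter; refcount_try_ref() is a hypothetical name, and the real ovs_refcount_try_ref_rcu() may be implemented differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sketch: increments '*count' unless it is already zero.
 * Returns true if a reference was taken, false otherwise. */
static bool
refcount_try_ref(atomic_uint *count)
{
    unsigned int old = atomic_load_explicit(count, memory_order_relaxed);

    do {
        if (old == 0) {
            return false;       /* Object is already being destroyed. */
        }
        /* On failure the compare-exchange reloads 'old', so the zero
         * check above is repeated with the fresh value. */
    } while (!atomic_compare_exchange_weak_explicit(count, &old, old + 1,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed));
    return true;
}

/* Returns the final live count after one successful and one failed
 * attempt. */
static unsigned int
try_ref_demo(void)
{
    atomic_uint live, dead;

    atomic_init(&live, 1);
    atomic_init(&dead, 0);
    assert(refcount_try_ref(&live));     /* 1 -> 2 */
    assert(!refcount_try_ref(&dead));    /* Stays at 0. */
    return atomic_load_explicit(&live, memory_order_relaxed);
}
```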
Jarno Rajahalme
25045d755e lib/ovs-atomic: Add atomic compare_exchange.
Add support for atomic compare_exchange operations.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-07-07 13:19:45 -07:00
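For readers unfamiliar with compare-exchange, the following sketch shows the usual retry-loop idiom using the standard C11 interface rather than the OVS wrappers added in commit 25045d755e, whose exact signatures are not shown in this log. atomic_store_max() is a hypothetical helper that raises a shared value to at least 'candidate'.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper: atomically sets '*target' to the larger of its
 * current value and 'candidate'. */
static void
atomic_store_max(atomic_int *target, int candidate)
{
    int cur = atomic_load_explicit(target, memory_order_relaxed);

    /* If another thread changes '*target' first, the compare-exchange
     * fails and stores the value it found into 'cur'; the loop then
     * re-evaluates whether an update is still needed. */
    while (cur < candidate
           && !atomic_compare_exchange_weak_explicit(target, &cur, candidate,
                                                     memory_order_relaxed,
                                                     memory_order_relaxed)) {
        continue;
    }
}

static int
store_max_demo(void)
{
    atomic_int high_water;

    atomic_init(&high_water, 5);
    atomic_store_max(&high_water, 3);    /* No change: 3 < 5. */
    atomic_store_max(&high_water, 9);    /* Raised to 9. */
    return atomic_load_explicit(&high_water, memory_order_relaxed);
}
```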
Jarno Rajahalme
541bfad20a ovs-atomic: Use explicit memory order for ovs_refcount.
Use explicit variants of atomic operations for the ovs_refcount to
avoid the overhead of the default memory_order_seq_cst.

Adding a reference requires no memory ordering, as the calling thread
is already assumed to have protected access to the object being
reference counted.  Hence, memory_order_relaxed is used for
ovs_refcount_ref().  ovs_refcount_read() does not change the reference
count, so it can also use memory_order_relaxed.

Unreferencing an object needs a release barrier, so that none of the
accesses to the protected object are reordered after the atomic
decrement operation.  Additionally, an explicit acquire barrier is
needed before the object is recycled, to keep the subsequent accesses
to the object's memory from being reordered before the atomic
decrement operation.

This patch follows the memory ordering and argumentation discussed
here:

http://www.chaoticmind.net/~hcb/projects/boost.atomic/doc/atomic/usage_examples.html

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
2014-07-07 13:18:46 -07:00
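The memory ordering argued for in commit 541bfad20a can be sketched in plain C11 as follows. The struct and function names are hypothetical stand-ins, not the OVS definitions, and this unref returns a boolean for simplicity, which may differ from what ovs_refcount_unref() returns.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct refcount {               /* Hypothetical stand-in type. */
    atomic_uint count;
};

static void
refcount_init(struct refcount *rc)
{
    atomic_init(&rc->count, 1);
}

/* The caller already holds a reference, so no ordering is needed. */
static void
refcount_ref(struct refcount *rc)
{
    atomic_fetch_add_explicit(&rc->count, 1, memory_order_relaxed);
}

/* Returns true if the caller dropped the last reference and may now
 * recycle the object. */
static bool
refcount_unref(struct refcount *rc)
{
    /* Release: earlier accesses to the object cannot be reordered
     * after the decrement. */
    unsigned int old = atomic_fetch_sub_explicit(&rc->count, 1,
                                                 memory_order_release);
    if (old == 1) {
        /* Acquire: accesses made while recycling cannot be reordered
         * before the decrement. */
        atomic_thread_fence(memory_order_acquire);
        return true;
    }
    return false;
}

/* Returns how many unref calls reported "last reference". */
static int
refcount_demo(void)
{
    struct refcount rc;
    int destroyed = 0;

    refcount_init(&rc);                     /* count == 1 */
    refcount_ref(&rc);                      /* count == 2 */
    destroyed += refcount_unref(&rc);       /* count == 1, not last */
    destroyed += refcount_unref(&rc);       /* count == 0, last */
    return destroyed;
}
```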
Jarno Rajahalme
29d204effc INSTALL: Note about compiler atomics support.
OVS is slow when compiled with pthreads atomics.  Add a generic note
in INSTALL, with a reference to lib/ovs-atomic.h, where a new comment
provides additional detail.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-06-04 09:20:46 -07:00
Simon Horman
e09d61c41b ovs-atomic: Remove atomic_uint64_t and atomic_int64_t.
Some concern has been raised by Ben Pfaff that atomic_uint64_t may not
be portable, in particular on 32-bit platforms that do not have atomic
64-bit integers.

Now that there are no longer any users of atomic_uint64_t, remove it
entirely. Also remove atomic_int64_t, which has no users.

Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-05-16 09:48:20 -07:00
Ben Pfaff
8917f72cbb ovs-atomic: Delete atomic, atomic_flag, ovs_refcount destroy functions.
None of the atomic implementations need a destroy function anymore, so it's
"more standard" and more convenient for users to get rid of them.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2014-03-13 12:45:47 -07:00
Ben Pfaff
6a36690c20 ovs-atomic-types: Move into ovs-atomic.h.
Every implementation used this same code, so it makes sense to centralize
it.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2014-03-13 12:45:42 -07:00
Ben Pfaff
1bd2c9edc3 ovs-atomic: Use raw types, not structs, when locks are required.
Until now, the GCC 4+ and pthreads implementations of atomics have used
struct wrappers for their atomic types.  This had the advantage of allowing
a mutex to be wrapped in, in some cases, and of better type-checking by
preventing stray uses of atomic variables other than through one of the
atomic_*() functions or macros.  However, the mutex meant that an
atomic_destroy() function-like macro needed to be used.  The struct wrapper
also made it impossible to define new atomic types that were compatible
with each other without using a typedef.  For example, one could not simply
define a macro like
    #define ATOMIC(TYPE) struct { TYPE value; }
and then have two declarations like:
    ATOMIC(void *) x;
    ATOMIC(void *) y;
and do anything with these objects that requires type-compatibility, even
"&x == &y", because the two structs are not compatible.  One can do it
through a typedef:
    typedef ATOMIC(void *) atomic_voidp;
    atomic_voidp x, y;
but that is inconvenient, especially because of the need to invent a name
for the type.

This commit aims to ease the problem by getting rid of the wrapper structs
in the cases where the atomic library used them.  It gets rid of the
mutexes, in the cases where they are still needed, by using a global
array of mutexes instead.

This commit also defines the ATOMIC macro described above and documents
its use in ovs-atomic.h.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2014-03-13 12:45:22 -07:00
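The "global array of mutexes" approach from commit 1bd2c9edc3 can be sketched as below: atomic types are raw types, and a lock is chosen by hashing the object's address. Everything here is an assumption made for illustration (the array size, the hash, and the names lock_for() and locked_fetch_add()); the actual OVS implementation is not shown in this log.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* With raw types, ATOMIC(TYPE) is simply TYPE, so two separately
 * declared ATOMIC(int) objects have compatible types. */
#define ATOMIC(TYPE) TYPE

#define N_LOCKS 4               /* Arbitrary size for this sketch. */
static pthread_mutex_t locks[N_LOCKS] = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
};

/* Picks a mutex from the global array based on the object's address.
 * Unrelated objects may share a lock, which is safe but adds
 * contention. */
static pthread_mutex_t *
lock_for(const void *addr)
{
    uintptr_t h = (uintptr_t) addr;

    return &locks[(h >> 4) % N_LOCKS];
}

/* Locked read-modify-write; returns the previous value. */
static int
locked_fetch_add(ATOMIC(int) *obj, int n)
{
    pthread_mutex_t *mutex = lock_for(obj);
    int old;

    pthread_mutex_lock(mutex);
    old = *obj;
    *obj += n;
    pthread_mutex_unlock(mutex);
    return old;
}

static int
lock_table_demo(void)
{
    ATOMIC(int) x = 10;
    ATOMIC(int) y = 0;
    int old;

    assert(&x != &y);           /* Compiles: the types are compatible. */
    old = locked_fetch_add(&x, 5);
    assert(old == 10);
    return x;
}
```

Because no mutex is embedded in the object itself, nothing needs to be destroyed when the object goes away, which is what makes an atomic_destroy() function unnecessary for this scheme.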
Ben Pfaff
37bec3d330 ovs-atomic: Introduce a new 'struct ovs_refcount'.
This is a thin wrapper around an atomic_uint.  It is useful anyhow because
each ovs_refcount_ref() or ovs_refcount_unref() call saves a few lines of
code.

This commit also changes all the potential direct users over to use the new
data structure.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-01-08 17:13:30 -08:00
Ben Pfaff
c5f81b20da ovs-atomic: Add atomic_destroy() and use everywhere it is needed.
C11 is able to require that atomics don't need to be destroyed, but some
of the OVS implementations do.

Signed-off-by: Ben Pfaff <blp@nicira.com>
2014-01-08 17:13:30 -08:00
Ben Pfaff
4597d2a572 ovs-atomic: New functions atomic_flag_init(), atomic_flag_destroy().
Standard C11 doesn't need these functions because it is able to require
implementations not to need them.  But we can't construct a portable
implementation that does not need them in every case, so this commit adds
them.

These functions are only needed for atomic_flag objects that are
dynamically allocated (because statically allocated objects can use
ATOMIC_FLAG_INIT).  So far there aren't any of those, but an upcoming
commit will introduce one.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2014-01-08 17:13:28 -08:00
Ben Pfaff
29ab0cf77c ovs-atomic: Add native Clang implementation.
With this implementation I get warnings with Clang on GNU/Linux when the
previous patch is not applied.  This ought to make it easier to avoid
introducing new problems in the future even without building on FreeBSD.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2013-08-26 13:03:02 -07:00
Ben Pfaff
a54667e5d4 ovs-atomic: Fix typo in comment.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ed Maste <emaste@freebsd.org>
2013-08-21 09:58:38 -07:00
Ben Pfaff
15248032ea configure: Add configure-time check for GCC 4.0+ atomic built-ins.
We found out earlier that GCC sometimes produces an error only at link time
for atomic built-ins that are not supported on a platform.  This actually
tries the link at configure time and should thus reliably detect whether
the atomic built-ins are really supported.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2013-07-31 08:56:36 -07:00
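A configure-time link test of the kind described in commit 15248032ea might try to build a tiny program along these lines. This is a guess at the shape of such a probe, not the check actually used by the OVS configure script; probe_sync_builtins() is a hypothetical name, and only the legacy GCC __sync built-ins are exercised.

```c
#include <assert.h>

/* Hypothetical probe: if the platform lacks native support, GCC may
 * emit calls to out-of-line helpers for these built-ins, and the
 * failure then shows up only when linking. */
static int
probe_sync_builtins(void)
{
    int x = 1;
    int old = __sync_fetch_and_add(&x, 2);                  /* x == 3 */
    int swapped = __sync_bool_compare_and_swap(&x, 3, 7);   /* x == 7 */

    __sync_synchronize();                                   /* Full barrier. */
    return old == 1 && swapped && x == 7;
}
```

The point of linking rather than merely compiling is that a successful compile does not prove the built-ins are usable on the target.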
Ben Pfaff
31a3fc6e3e ovs-atomic: New library for atomic operations.
This library should prove useful for the threading changes coming up.
The following commit introduces one (very simple) user.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
2013-06-28 16:09:36 -07:00