GCC 14 started to advertise the c_atomic extension; older versions
didn't. Add a check for __clang__, so that GCC doesn't include headers
designed for Clang.
Another option would be to prefer stdatomic implementation instead,
but some older versions of Clang are not able to use stdatomic.h
supplied by GCC as described in commit:
07ece367fb5f ("ovs-atomic: Prefer Clang intrinsics over <stdatomic.h>.")
This change fixes OVS build with GCC on Fedora Rawhide (40).
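As a rough illustration of the check described above, the selection logic
might look like the sketch below. ATOMIC_IMPL_NAME and atomic_impl_name()
are hypothetical names used only here; the real header selects which
implementation header to include rather than defining a string.

```c
#include <assert.h>

/* Compilers without __has_extension get a stub that always answers "no". */
#ifndef __has_extension
#define __has_extension(x) 0
#endif

/* Hypothetical sketch: take the Clang path only when the compiler really is
 * Clang, because GCC 14 also reports c_atomic through __has_extension. */
#if defined(__clang__) && __has_extension(c_atomic)
#define ATOMIC_IMPL_NAME "clang"
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L \
      && !defined(__STDC_NO_ATOMICS__)
#define ATOMIC_IMPL_NAME "stdatomic"
#else
#define ATOMIC_IMPL_NAME "fallback"
#endif

/* Returns the name of the implementation chosen at compile time. */
static const char *
atomic_impl_name(void)
{
    const char *name = ATOMIC_IMPL_NAME;

    assert(name[0] != '\0');
    return name;
}
```

Without the defined(__clang__) term, the first branch would also be taken
by GCC 14, which is the situation this change avoids.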
Reported-by: Jakob Meng <code@jakobmeng.de>
Acked-by: Jakob Meng <jmeng@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Simon Horman <horms@ovn.org>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
ATOMIC_VAR_INIT has a trivial definition
`#define ATOMIC_VAR_INIT(value) (value)`,
is deprecated in C17/C++20, and will be removed in newer standards and in
newer GCC/Clang (e.g. https://reviews.llvm.org/D144196).
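Because the macro expands to its argument, dropping it amounts to plain
initialization. A minimal sketch, assuming a C11 <stdatomic.h>; the
variable and function names are made up for this example:

```c
#include <assert.h>
#include <stdatomic.h>

/* Before: static atomic_int counter = ATOMIC_VAR_INIT(0);
 * After:  plain initialization, which is what the macro expanded to. */
static atomic_int counter = 0;

/* Hypothetical helper: increments the counter and returns the new value. */
static int
bump_counter(void)
{
    /* atomic_fetch_add() returns the value held before the addition. */
    return atomic_fetch_add(&counter, 1) + 1;
}

static void
counter_demo(void)
{
    assert(bump_counter() == 1);
    assert(bump_counter() == 2);
}
```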
Signed-off-by: Fangrui Song <maskray@google.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
The atomic exchange operation is a useful primitive that should be
available as well. Most compilers already expose or offer a way
to use it, but a single symbol needs to be defined.
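For readers unfamiliar with the primitive, here is a minimal sketch of
what an atomic exchange does, written against C11 <stdatomic.h>.
swap_value() is a hypothetical helper for this example, not the symbol
added by this patch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper: atomically stores 'new_value' and returns the value
 * that was there before, as one indivisible step. */
static int
swap_value(atomic_int *var, int new_value)
{
    return atomic_exchange_explicit(var, new_value, memory_order_seq_cst);
}

static void
swap_demo(void)
{
    atomic_int state = 1;

    assert(swap_value(&state, 2) == 1);   /* The old value comes back. */
    assert(atomic_load(&state) == 2);     /* The new value is stored. */
}
```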
Signed-off-by: Gaetan Rivet <grive@u256.net>
Reviewed-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
G++ 5 does not implement the _Atomic keyword, which is part of C11 but not
C++11, so the existing <stdatomic.h> based atomic implementation doesn't
work. This commit adds a new implementation based on the C++11 <atomic>
header.
In this area, C++ is pickier about types than C, so a few of the
definitions in ovs-atomic.h have to be updated to use more precise types
for integer constants.
This updates the code that generates cxxtest.cc to #include <config.h>
(so that HAVE_ATOMIC is defined) and to automatically regenerate when the
program is reconfigured (because otherwise the #include <config.h> won't
get added without a "make clean" step).
"ovs-atomic.h" is not a public header, but apparently some code was
using it anyway.
Fixes: 9c463631e8145 ("ovs-atomic: Report error for contradictory configuration.")
Reported-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
This is essentially a revert of commit e09d61c41b4f ("ovs-atomic: Remove
atomic_uint64_t and atomic_int64_t.") My fear that some 32-bit platforms
did not support 64-bit integers seems overblown, because OVS 2.6.x uses
the 64-bit atomic_ullong and it is in Debian, which has tons of
architectures.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Simon Horman <simon.horman@netronome.com>
This patch enables atomics on x64 builds.
Reuse the atomics defined for x86 and add atomics for 64 bit reads/writes.
Before this patch the cmap test gives us:
$ ./tests/ovstest.exe test-cmap benchmark 10000000 3 1
Benchmarking with n=10000000, 3 threads, 1.00% mutations, batch size 1:
cmap insert: 20100 ms
cmap iterate: 2967 ms
batch search: 10929 ms
cmap destroy: 13489 ms
cmap insert: 20079 ms
cmap iterate: 2953 ms
cmap search: 10559 ms
cmap destroy: 13486 ms
hmap insert: 2021 ms
hmap iterate: 1162 ms
hmap search: 5152 ms
hmap destroy: 1158 ms
After this change we have:
$ ./tests/ovstest.exe test-cmap benchmark 10000000 3 1
Benchmarking with n=10000000, 3 threads, 1.00% mutations, batch size 1:
cmap insert: 2953 ms
cmap iterate: 267 ms
batch search: 2193 ms
cmap destroy: 2037 ms
cmap insert: 2909 ms
cmap iterate: 267 ms
cmap search: 2167 ms
cmap destroy: 2087 ms
hmap insert: 1853 ms
hmap iterate: 1086 ms
hmap search: 4395 ms
hmap destroy: 1140 ms
We should probably revisit this file and investigate it further to see if
we can squeeze out more performance.
As a side effect, this fixes the tests on x64, because usage of
`ovs-atomic-pthreads.h` is currently broken there.
Signed-off-by: Alin Gabriel Serdean <aserdean@ovn.org>
Suggested-by: Ben Pfaff <blp@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
A user reported that GCC 5.x was using the atomic fallback for GCC 4.x
because the test
#elif __GNUC__ >= 4 && __GNUC_MINOR__ >= 7
didn't include GCC 5. However, GCC 5+ has <stdatomic.h> and shouldn't use
any of the GCC-specific cases at all. I think that this user was actually
pulling our atomics out into third-party code that probably didn't define
HAVE_STDATOMIC_H properly, so this commit both avoids that problem for
them in the future and clarifies the intent of the ovs-atomic header.
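To see why the quoted test misses GCC 5, the comparison can be modelled
with ordinary functions. This is only an illustration of the version
arithmetic, not the change made by this commit; both function names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* The test as quoted: true for 4.7 through 4.9, but false for any 5.x
 * release whose minor number is below 7, because the minor number is
 * compared on its own. */
static bool
naive_at_least_4_7(int major, int minor)
{
    return major >= 4 && minor >= 7;
}

/* A conventional form: look at the minor number only when the major
 * number is exactly 4. */
static bool
at_least_4_7(int major, int minor)
{
    return major > 4 || (major == 4 && minor >= 7);
}

static void
version_demo(void)
{
    assert(naive_at_least_4_7(4, 9));
    assert(!naive_at_least_4_7(5, 1));    /* GCC 5.1 is wrongly rejected. */
    assert(at_least_4_7(5, 1));
    assert(!at_least_4_7(4, 6));
}
```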
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
This attempts to prevent namespace collisions with other list libraries.
Signed-off-by: Ben Warren <ben@skyportsystems.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
There was an unwanted change to ovs-atomic.h header made by the
recirculation patch, ee25964a60c6b2c6e60a4c5fbfc9e90cf304f970 commit.
This patch reverts that change.
Signed-off-by: Sorin Vinturis <svinturis@cloudbasesolutions.com>
Acked-by: Nithin Raju <nithin@vmware.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Recirculation support for the OVS extension.
Tested using PING and iperf with Driver Verifier enabled.
Signed-off-by: Sorin Vinturis <svinturis@cloudbasesolutions.com>
Co-authored-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
Reported-by: Sorin Vinturis <svinturis@cloudbasesolutions.com>
Reported-at: https://github.com/openvswitch/ovs-issues/issues/104
Acked-by: Nithin Raju <nithin@vmware.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
On my Debian "jessie" system, <stdatomic.h> provided by GCC 4.9 is busted
when Clang 3.5 tries to use it. Even a trivial program like this:
    #include <stdatomic.h>

    void
    foo(void)
    {
        _Atomic(int) x;
        atomic_fetch_add(&x, 1);
    }
yields:
atomic.c:7:5: error: address argument to atomic operation must be a
pointer to integer or pointer ('_Atomic(int) *' invalid)
The Clang-specific version of ovs-atomic.h still works, though, so this
commit works around the problem.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Before this change (i.e., with pthread locks for atomics on Windows),
the benchmark for cmap and hmap was as follows:
$ ./tests/ovstest.exe test-cmap benchmark 10000000 3 1
Benchmarking with n=10000000, 3 threads, 1.00% mutations:
cmap insert: 61070 ms
cmap iterate: 2750 ms
cmap search: 14238 ms
cmap destroy: 8354 ms
hmap insert: 1701 ms
hmap iterate: 985 ms
hmap search: 3755 ms
hmap destroy: 1052 ms
After this change, the benchmark is as follows:
$ ./tests/ovstest.exe test-cmap benchmark 10000000 3 1
Benchmarking with n=10000000, 3 threads, 1.00% mutations:
cmap insert: 3666 ms
cmap iterate: 365 ms
cmap search: 2016 ms
cmap destroy: 1331 ms
hmap insert: 1495 ms
hmap iterate: 1026 ms
hmap search: 4167 ms
hmap destroy: 1046 ms
So there is clearly a big improvement for cmap.
But the corresponding test on Linux (with gcc 4.6) yields the following:
./tests/ovstest test-cmap benchmark 10000000 3 1
Benchmarking with n=10000000, 3 threads, 1.00% mutations:
cmap insert: 3917 ms
cmap iterate: 355 ms
cmap search: 871 ms
cmap destroy: 1158 ms
hmap insert: 1988 ms
hmap iterate: 1005 ms
hmap search: 5428 ms
hmap destroy: 980 ms
So for this particular test, except for "cmap search", Windows and
Linux have similar performance. Windows is around 2.5x slower in "cmap search"
compared to Linux. This has to be investigated.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
[With a lot of inputs and help from Jarno]
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
When an atomic variable is not serving to synchronize threads about
the state of other (atomic or non-atomic) variables, no memory barrier
is needed with the atomic operation. However, the default memory
order for an atomic operation is memory_order_seq_cst, which always
causes a system-wide locking of the memory bus and prevents both the
CPU and the compiler from reordering memory accesses across the
atomic operation. This can add considerable stalls as each atomic
operation (regardless of memory order) always includes a memory
access.
In most cases we can let the compiler reorder memory accesses to
minimize the time we spend waiting for the completion of the atomic
memory accesses by using the relaxed memory order. This patch adds
helpers to make such accesses a little easier on the eye (and the
fingers :-), but does not try to hide them completely.
Following patches make use of these and remove all the (implied)
memory_order_seq_cst use from the OVS code base.
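A minimal sketch of the idea, assuming C11 <stdatomic.h>. The macro and
variable names below are hypothetical and chosen for this example; they
are not necessarily the helpers this patch adds.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helpers that spell out the relaxed memory order once, so
 * that call sites stay short. */
#define counter_add_relaxed(VAR, N) \
    atomic_fetch_add_explicit(VAR, N, memory_order_relaxed)
#define counter_read_relaxed(VAR) \
    atomic_load_explicit(VAR, memory_order_relaxed)

/* A statistics counter: no other data is published through it, so no
 * ordering with surrounding memory accesses is required. */
static atomic_ulong n_packets;

/* Counts one packet and returns the counter value read afterwards. */
static unsigned long
count_packet(void)
{
    counter_add_relaxed(&n_packets, 1);
    return counter_read_relaxed(&n_packets);
}

static void
relaxed_demo(void)
{
    assert(count_packet() == 1);
    assert(count_packet() == 2);
}
```

The operations remain atomic; only the ordering guarantees relative to
other memory accesses are dropped.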
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
ovs_refcount_unref() needs to synchronize with the other instances of
itself rather than with ovs_refcount_ref().
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
XenServer runs OVS in dom0, which is a 32-bit VM. As the build
environment lacks support for atomics, locked pthread atomics were
used, with a considerable performance hit.
This patch adds native support for ovs-atomic with 32-bit Pentium and
higher CPUs, when compiled with an older GCC. We use inline asm with
the cmpxchg8b instruction, which was introduced with the Intel
Pentium processors. We do not expect anyone to run OVS on a 486 or
older processor.
cmap benchmark before the patch on 32-bit XenServer build (uses
ovs-atomic-pthread):
$ tests/ovstest test-cmap benchmark 2000000 8 0.1
Benchmarking with n=2000000, 8 threads, 0.10% mutations:
cmap insert: 8835 ms
cmap iterate: 379 ms
cmap search: 6242 ms
cmap destroy: 1145 ms
After:
$ tests/ovstest test-cmap benchmark 2000000 8 0.1
Benchmarking with n=2000000, 8 threads, 0.10% mutations:
cmap insert: 711 ms
cmap iterate: 68 ms
cmap search: 353 ms
cmap destroy: 209 ms
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Some supported XenServer build environments lack compiler support for
atomic operations. This patch provides native support for x86_64 on
GCC, which covers possible future 64-bit builds on XenServer.
Since this implementation is faster than the existing support prior to
GCC 4.7, especially for cmap inserts, we use this with GCC < 4.7 on
x86_64.
Example numbers with "tests/test-cmap benchmark 2000000 8 0.1" on a
quad-core hyperthreaded laptop, built with GCC 4.6 -O2:
Using ovs-atomic-pthreads on x86_64:
Benchmarking with n=2000000, 8 threads, 0.10% mutations:
cmap insert: 4725 ms
cmap iterate: 329 ms
cmap search: 5945 ms
cmap destroy: 911 ms
Using ovs-atomic-gcc4+ on x86_64:
Benchmarking with n=2000000, 8 threads, 0.10% mutations:
cmap insert: 845 ms
cmap iterate: 58 ms
cmap search: 308 ms
cmap destroy: 295 ms
With the native support provided by this patch:
Benchmarking with n=2000000, 8 threads, 0.10% mutations:
cmap insert: 530 ms
cmap iterate: 59 ms
cmap search: 305 ms
cmap destroy: 232 ms
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Compiler implementations may provide sub-optimal support for a
memory_order passed in as a run-time value (ref.
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html).
Document that OVS atomics require the memory order to be passed in as
a compile-time constant.
It should be noted, however, that when inlining is disabled (i.e.,
compiling without optimization) even compile-time constants may be
passed as run-time values to (non-inlined) functions.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
The definition of memory_order_relaxed included a compiler barrier,
although none is necessary, and indeed the following text on
atomic_thread_fence and atomic_signal_fence contradicted that.
memory_order_consume and memory_order_acq_rel are also more thoroughly
described.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
When a reference counted object is also RCU protected the deletion of
the object's memory is always postponed. This allows
memory_order_relaxed to be used also for unreferencing, as RCU
quiescing provides a full memory barrier (it has to, or otherwise
there could be lingering accesses to objects after they are recycled).
Also, when access to the reference counted object is protected via a
mutex or a lock, the locking primitives provide the required memory
barrier functionality.
Also, add ovs_refcount_try_ref_rcu(), which takes a reference only if
the refcount is non-zero and returns true if a reference was taken,
false otherwise. This can be used in combined RCU/refcount scenarios
where we have an RCU protected reference to a refcounted object, but
which may be unref'ed at any time. If ovs_refcount_try_ref_rcu()
fails, the object may still be safely used until the current thread
quiesces.
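The "take a reference only if the count is non-zero" behavior can be
sketched with a compare-and-exchange loop. This is an illustration written
against C11 <stdatomic.h> under the relaxed ordering argued for above, not
the OVS implementation; try_ref() is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sketch: increments '*count' and returns true, unless the
 * count has already dropped to zero, in which case it returns false and
 * leaves the count untouched. */
static bool
try_ref(atomic_uint *count)
{
    unsigned int old = atomic_load_explicit(count, memory_order_relaxed);

    while (old != 0) {
        /* On failure, 'old' is updated to the value currently stored. */
        if (atomic_compare_exchange_weak_explicit(count, &old, old + 1,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
            return true;
        }
    }
    return false;
}

static void
try_ref_demo(void)
{
    atomic_uint live = 1;
    atomic_uint dead = 0;

    assert(try_ref(&live) && atomic_load(&live) == 2);
    assert(!try_ref(&dead) && atomic_load(&dead) == 0);
}
```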
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Use explicit variants of atomic operations for the ovs_refcount to
avoid the overhead of the default memory_order_seq_cst.
Adding a reference requires no memory ordering, as the calling thread
is already assumed to have protected access to the object being
reference counted. Hence, memory_order_relaxed is used for
ovs_refcount_ref(). ovs_refcount_read() does not change the reference
count, so it can also use memory_order_relaxed.
Unreferencing an object needs a release barrier, so that none of the
accesses to the protected object are reordered after the atomic
decrement operation. Additionally, an explicit acquire barrier is
needed before the object is recycled, to keep the subsequent accesses
to the object's memory from being reordered before the atomic
decrement operation.
This patch follows the memory ordering and argumentation discussed
here:
http://www.chaoticmind.net/~hcb/projects/boost.atomic/doc/atomic/usage_examples.html
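The ordering described above can be sketched as follows, assuming C11
<stdatomic.h>. The struct and function names are hypothetical stand-ins
for this example and do not reproduce the OVS code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference count for this sketch. */
struct refcount {
    atomic_uint count;
};

/* Taking a reference needs no ordering: the caller already has protected
 * access to the object. */
static void
refcount_ref(struct refcount *rc)
{
    atomic_fetch_add_explicit(&rc->count, 1, memory_order_relaxed);
}

/* Returns true if the caller dropped the last reference and may recycle
 * the object. */
static bool
refcount_unref(struct refcount *rc)
{
    /* Release: earlier accesses to the object cannot move after the
     * decrement. */
    unsigned int old = atomic_fetch_sub_explicit(&rc->count, 1,
                                                 memory_order_release);

    if (old == 1) {
        /* Acquire: accesses made while recycling the object cannot move
         * before the decrement. */
        atomic_thread_fence(memory_order_acquire);
        return true;
    }
    return false;
}

static void
refcount_demo(void)
{
    struct refcount rc;

    atomic_init(&rc.count, 1);
    refcount_ref(&rc);
    assert(!refcount_unref(&rc));   /* One reference remains. */
    assert(refcount_unref(&rc));    /* Last reference dropped. */
}
```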
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
OVS is slow when compiled with pthreads atomics. Add a generic note
in INSTALL, with a reference to lib/ovs-atomic.h, where a new comment
provides additional detail.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Some concern has been raised by Ben Pfaff that atomic_uint64_t may not
be portable, in particular on 32-bit platforms that do not have atomic
64-bit integers.
Now that there are no longer any users of atomic_uint64_t remove it
entirely. Also remove atomic_int64_t which has no users.
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Pfaff <blp@nicira.com>
None of the atomic implementations need a destroy function anymore, so it's
"more standard" and more convenient for users to get rid of them.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
Every implementation used this same code, so it makes sense to centralize
it.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
Until now, the GCC 4+ and pthreads implementations of atomics have used
struct wrappers for their atomic types. This had the advantage of allowing
a mutex to be wrapped in, in some cases, and of better type-checking by
preventing stray uses of atomic variables other than through one of the
atomic_*() functions or macros. However, the mutex meant that an
atomic_destroy() function-like macro needed to be used. The struct wrapper
also made it impossible to define new atomic types that were compatible
with each other without using a typedef. For example, one could not simply
define a macro like
#define ATOMIC(TYPE) struct { TYPE value; }
and then have two declarations like:
ATOMIC(void *) x;
ATOMIC(void *) y;
and do anything with these objects that require type-compatibility, even
"&x == &y", because the two structs are not compatible. One can do it
through a typedef:
typedef ATOMIC(void *) atomic_voidp;
atomic_voidp x, y;
but that is inconvenient, especially because of the need to invent a name
for the type.
This commit aims to ease the problem by getting rid of the wrapper structs
in the cases where the atomic library used them. It gets rid of the
mutexes, in the cases where they are still needed, by using a global
array of mutexes instead.
This commit also defines the ATOMIC macro described above and documents
its use in ovs-atomic.h.
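As a sketch of how such a macro behaves once no wrapper struct is
involved, here is one possible definition for a compiler with C11 atomics.
This definition is an assumption made for the example; other back ends
would define the macro differently.

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumed C11 definition for this sketch: no struct, so two uses of
 * ATOMIC(void *) name the same type. */
#define ATOMIC(TYPE) _Atomic(TYPE)

static int target;
static ATOMIC(void *) x;
static ATOMIC(void *) y;

static void
atomic_macro_demo(void)
{
    atomic_store(&x, &target);
    atomic_store(&y, &target);

    /* Both objects have the same type, so their addresses can be compared
     * without a cast or a typedef. */
    assert(&x != &y);
    assert(atomic_load(&x) == atomic_load(&y));
}
```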
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
This is a thin wrapper around an atomic_uint. It is useful anyhow because
each ovs_refcount_ref() or ovs_refcount_unref() call saves a few lines of
code.
This commit also changes all the potential direct users over to use the new
data structure.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Standard C11 doesn't need these functions because it is able to require
implementations not to need them. But we can't construct a portable
implementation that does not need them in every case, so this commit adds
them.
These functions are only needed for atomic_flag objects that are
dynamically allocated (because statically allocated objects can use
ATOMIC_FLAG_INIT). So far there aren't any of those, but an upcoming
commit will introduce one.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
With this implementation I get warnings with Clang on GNU/Linux when the
previous patch is not applied. This ought to make it easier to avoid
introducing new problems in the future even without building on FreeBSD.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
We found out earlier that GCC sometimes produces an error only at link time
for atomic built-ins that are not supported on a platform. This actually
tries the link at configure time and should thus reliably detect whether
the atomic built-ins are really supported.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>
This library should prove useful for the threading changes coming up.
The following commit introduces one (very simple) user.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Ethan Jackson <ethan@nicira.com>