MSVC converts a 64-bit read or write into two instructions (it uses 'mov',
as seen through cl //FAs). So an interrupt arriving between the two
instructions can make a 64-bit read/write non-atomic even when the value is
8-byte aligned. We therefore cannot use a simple assignment. Use a full
memory barrier function instead.
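
As a rough illustration of the idea (not the code from this patch), a
64-bit store and read can each be done with a single interlocked call
rather than a plain assignment. The helper names below are hypothetical,
the choice of InterlockedExchange64() for the store is an assumption, and
the gcc/clang branch is only a stand-in so the sketch builds outside MSVC:

```c
#include <assert.h>
#include <stdint.h>

#ifdef _MSC_VER
#include <windows.h>
#endif

/* Hypothetical helper, not taken from ovs-atomic-msvc.h. */
static void
sketch_store64(volatile int64_t *dst, int64_t value)
{
    /* The interlocked 64-bit functions expect 8-byte aligned operands. */
    assert(((uintptr_t) dst & 7) == 0);
#ifdef _MSC_VER
    /* One interlocked operation with a full barrier, instead of a plain
     * assignment that may be split into two 32-bit 'mov' instructions. */
    InterlockedExchange64((volatile LONGLONG *) dst, value);
#else
    /* Stand-in for non-MSVC compilers (assumption, for illustration). */
    __atomic_store_n(dst, value, __ATOMIC_SEQ_CST);
#endif
}

/* Hypothetical helper, not taken from ovs-atomic-msvc.h. */
static int64_t
sketch_read64(volatile int64_t *src)
{
#ifdef _MSC_VER
    /* OR-ing with 0 leaves the value unchanged and returns it atomically. */
    return InterlockedOr64((volatile LONGLONG *) src, 0);
#else
    return __atomic_load_n(src, __ATOMIC_SEQ_CST);
#endif
}
```

A caller would write sketch_store64(&counter, v) where it previously wrote
counter = v, and the matching read goes through sketch_read64(&counter).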
Reported-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
MSVC does not support C11-style atomics for the C compiler.
Windows has different Interlocked* functions for different data
sizes. ovs-atomic-msvc.h maps the API in ovs-atomic.h (which is similar
to C11 atomics) to the available atomic functions in Windows. In some
cases, this causes compiler warnings about mismatched data sizes because
the generated code has 'if else' conditions on different data sizes and
proper casting is not possible.
In the current OVS code base, we get one compiler warning through ovs-rcu.h
which says "‘void *’ differs in levels of indirection from LONGLONG."
This comes from the following in ovs-atomic-msvc.h for atomic_read64():
*(DST) = InterlockedOr64((int64_t volatile *) (SRC), 0);
when *DST is a void pointer (because InterlockedOr64 returns LONGLONG).
But this code path is only ever hit for 64-bit data. So it should be safe to
disable the warning. (Any real bugs in API calls would hopefully be caught
while compiling on Linux using gcc/clang).
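
A simplified sketch of why the warning appears and how it can be silenced
follows. The macro and function names are hypothetical, and the warning
number (C4047, "differs in levels of indirection") is an assumption, since
the message above does not quote it. The non-MSVC branch exists only so
the sketch also compiles with gcc/clang:

```c
#include <assert.h>
#include <stdint.h>

#ifdef _MSC_VER
#include <windows.h>
/* Assumed warning number for "differs in levels of indirection". */
#pragma warning(disable: 4047)
#define SKETCH_READ64(SRC) InterlockedOr64((int64_t volatile *) (SRC), 0)
#else
/* Stand-in for non-MSVC compilers (assumption, for illustration). */
#define SKETCH_READ64(SRC) \
    __atomic_load_n((int64_t volatile *) (SRC), __ATOMIC_SEQ_CST)
#endif

/* Hypothetical size-dispatching read.  Both branches are compiled for
 * every type of *DST, so the 64-bit branch is type-checked even when
 * *DST is a pointer or a 32-bit integer.  Only one branch ever runs. */
#define SKETCH_ATOMIC_READ(SRC, DST)        \
    do {                                    \
        if (sizeof *(DST) == 8) {           \
            *(DST) = SKETCH_READ64(SRC);    \
        } else {                            \
            *(DST) = *(SRC);                \
        }                                   \
    } while (0)

/* Example caller that takes the 64-bit branch. */
static int64_t
sketch_read_counter(int64_t *counter)
{
    int64_t value;

    assert(sizeof value == 8);
    SKETCH_ATOMIC_READ(counter, &value);
    return value;
}
```

With an integer *DST the 64-bit branch compiles cleanly. With a pointer
*DST the same branch assigns a LONGLONG to a pointer, which is the case
the disabled warning complains about.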
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Eitan Eliahu <eliahue@vmware.com>
Before this change (i.e., with pthread locks for atomics on Windows),
the benchmark for cmap and hmap was as follows:
$ ./tests/ovstest.exe test-cmap benchmark 10000000 3 1
Benchmarking with n=10000000, 3 threads, 1.00% mutations:
cmap insert: 61070 ms
cmap iterate: 2750 ms
cmap search: 14238 ms
cmap destroy: 8354 ms
hmap insert: 1701 ms
hmap iterate: 985 ms
hmap search: 3755 ms
hmap destroy: 1052 ms
After this change, the benchmark is as follows:
$ ./tests/ovstest.exe test-cmap benchmark 10000000 3 1
Benchmarking with n=10000000, 3 threads, 1.00% mutations:
cmap insert: 3666 ms
cmap iterate: 365 ms
cmap search: 2016 ms
cmap destroy: 1331 ms
hmap insert: 1495 ms
hmap iterate: 1026 ms
hmap search: 4167 ms
hmap destroy: 1046 ms
So there is clearly a big improvement for cmap.
But the corresponding test on Linux (with gcc 4.6) yields the following:
$ ./tests/ovstest test-cmap benchmark 10000000 3 1
Benchmarking with n=10000000, 3 threads, 1.00% mutations:
cmap insert: 3917 ms
cmap iterate: 355 ms
cmap search: 871 ms
cmap destroy: 1158 ms
hmap insert: 1988 ms
hmap iterate: 1005 ms
hmap search: 5428 ms
hmap destroy: 980 ms
So for this particular test, Windows and Linux have similar performance
except for "cmap search", where Windows is around 2.3x slower than Linux
(2016 ms vs. 871 ms). This has to be investigated.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
[With a lot of input and help from Jarno]
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>