Use generic names hash_add() and hash_finish() instead of mhash_*
equivalents. This makes future changes to hash implentations more
localized.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
ovsrcu_init() was named after the atomic_init(), but the semantics are
different enough to warrant a different name. Basically C11
atomic_init is defined in a way that allows the implementation to
assign the value without any syncronization, so in theory stores via
atomic_init could be seen by others after stores via atomic_set, even
if the atomic_init appeared earlier in the code.
ovsrcu_set_hidden() can be used to set an RCU protected variable when
it is not yet accessible by any active reader, but will be made
visible later via an ovsrcu_set call on some other pointer.
This patch also adds a new ovsrcu_init() that can be used to initilize
RCU protected variables when the readers are not yet executing. The
new ovsrcu_init() is implemented with atomic_init(), so it does not
provide any kind of syncronization.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
The documentation of the memory order argument of atomic operations
states that memory loads after an atomic load with a
memory_order_acquire cannot be moved before the atomic operation.
This still leaves open the possibility that non-atomic loads before
the atomic load could be moved after it, hence breaking down the
synchronization used for cmap_find().
This patch fixes this my using atomics (with appropriate memory_order)
also for the struct cmap_node pointer and hash values.
struct cmap_node itself is used for the pointer, since it already
contains an RCU pointer (which is atomic) for a struct cmap_node.
This also helps simplify some of the code previously dealing
separately with the cmap_node pointer in the bucket v.s. a cmap_node.
Hash values are also accessed using atomics. Otherwise it might be
legal for a compiler to read the hash values once, and keep using the
same values through all the retries.
These should have no effect on performance.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Add cmap_replace() and cmap_first(), as well as CMAP_FOR_EACH_SAFE and
CMAP_FOR_EACH_CONTINUE to make porting existing hmap using code a bit
easier.
CMAP_FOR_EACH_SAFE is useful in RCU postponed destructors, when it is
known that additional postponing is not needed.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Release memory ordering only affects visibility of stores, and is not
allowed on a memory read. Some compilers enforce this, making this code
fail to compile.
Reported-by: Alex Wang <alexw@nicira.com>
Reported-by: Kmindg G <kmindg@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>