cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
/*
|
2016-04-22 16:51:03 -07:00
|
|
|
* Copyright (c) 2014, 2016 Nicira, Inc.
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
*
|
|
|
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
|
|
|
* you may not use this file except in compliance with the License.
|
|
|
|
* You may obtain a copy of the License at:
|
|
|
|
*
|
|
|
|
* http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
*
|
|
|
|
* Unless required by applicable law or agreed to in writing, software
|
|
|
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
|
|
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
|
|
* See the License for the specific language governing permissions and
|
|
|
|
* limitations under the License.
|
|
|
|
*/
|
|
|
|
|
|
|
|
#ifndef CMAP_H
|
|
|
|
#define CMAP_H 1
|
|
|
|
|
|
|
|
#include <stdbool.h>
|
|
|
|
#include <stdint.h>
|
|
|
|
#include "ovs-rcu.h"
|
|
|
|
#include "util.h"
|
|
|
|
|
|
|
|
/* Concurrent hash map
|
|
|
|
* ===================
|
|
|
|
*
|
|
|
|
* A single-writer, multiple-reader hash table that efficiently supports
|
|
|
|
* duplicates.
|
|
|
|
*
|
|
|
|
*
|
|
|
|
* Thread-safety
|
|
|
|
* =============
|
|
|
|
*
|
|
|
|
* The general rules are:
|
|
|
|
*
|
2014-05-28 16:56:29 -07:00
|
|
|
* - Only a single thread may safely call into cmap_insert(),
|
|
|
|
* cmap_remove(), or cmap_replace() at any given time.
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
*
|
|
|
|
* - Any number of threads may use functions and macros that search or
|
|
|
|
* iterate through a given cmap, even in parallel with other threads
|
2014-05-28 16:56:29 -07:00
|
|
|
* calling cmap_insert(), cmap_remove(), or cmap_replace().
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
*
|
|
|
|
* There is one exception: cmap_find_protected() is only safe if no thread
|
2014-05-28 16:56:29 -07:00
|
|
|
* is currently calling cmap_insert(), cmap_remove(), or cmap_replace().
|
|
|
|
* (Use ordinary cmap_find() if that is not guaranteed.)
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
*
|
|
|
|
* - See "Iteration" below for additional thread safety rules.
|
|
|
|
*
|
|
|
|
* Writers must use special care to ensure that any elements that they remove
|
|
|
|
* do not get freed or reused until readers have finished with them. This
|
|
|
|
* includes inserting the element back into its original cmap or a different
|
|
|
|
* one. One correct way to do this is to free them from an RCU callback with
|
|
|
|
* ovsrcu_postpone().
|
|
|
|
*/
|
|
|
|
|
|
|
|
/* A concurrent hash map node, to be embedded inside the data structure being
|
|
|
|
* mapped.
|
|
|
|
*
|
|
|
|
* All nodes linked together on a chain have exactly the same hash value. */
|
|
|
|
struct cmap_node {
|
|
|
|
OVSRCU_TYPE(struct cmap_node *) next; /* Next node with same hash. */
|
|
|
|
};
|
|
|
|
|
|
|
|
static inline struct cmap_node *
|
|
|
|
cmap_node_next(const struct cmap_node *node)
|
|
|
|
{
|
|
|
|
return ovsrcu_get(struct cmap_node *, &node->next);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline struct cmap_node *
|
|
|
|
cmap_node_next_protected(const struct cmap_node *node)
|
|
|
|
{
|
|
|
|
return ovsrcu_get_protected(struct cmap_node *, &node->next);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Concurrent hash map. */
|
|
|
|
struct cmap {
|
|
|
|
OVSRCU_TYPE(struct cmap_impl *) impl;
|
|
|
|
};
|
|
|
|
|
2016-04-22 16:51:03 -07:00
|
|
|
/* Initializer for an empty cmap. */
|
|
|
|
#define CMAP_INITIALIZER { \
|
|
|
|
.impl = OVSRCU_INITIALIZER((struct cmap_impl *) &empty_cmap) \
|
|
|
|
}
|
|
|
|
extern OVS_ALIGNED_VAR(CACHE_LINE_SIZE) const struct cmap_impl empty_cmap;
|
|
|
|
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
/* Initialization. */
|
|
|
|
void cmap_init(struct cmap *);
|
|
|
|
void cmap_destroy(struct cmap *);
|
|
|
|
|
|
|
|
/* Count. */
|
|
|
|
size_t cmap_count(const struct cmap *);
|
|
|
|
bool cmap_is_empty(const struct cmap *);
|
|
|
|
|
2014-09-30 12:37:52 -07:00
|
|
|
/* Insertion and deletion. Return the current count after the operation. */
|
|
|
|
size_t cmap_insert(struct cmap *, struct cmap_node *, uint32_t hash);
|
|
|
|
static inline size_t cmap_remove(struct cmap *, struct cmap_node *,
|
|
|
|
uint32_t hash);
|
|
|
|
size_t cmap_replace(struct cmap *, struct cmap_node *old_node,
|
|
|
|
struct cmap_node *new_node, uint32_t hash);
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
|
|
|
/* Search.
|
|
|
|
*
|
|
|
|
* These macros iterate NODE over all of the nodes in CMAP that have hash value
|
|
|
|
* equal to HASH. MEMBER must be the name of the 'struct cmap_node' member
|
|
|
|
* within NODE.
|
|
|
|
*
|
|
|
|
* CMAP and HASH are evaluated only once. NODE is evaluated many times.
|
|
|
|
*
|
|
|
|
*
|
|
|
|
* Thread-safety
|
|
|
|
* =============
|
|
|
|
*
|
2014-09-24 10:39:20 -07:00
|
|
|
* CMAP_NODE_FOR_EACH will reliably visit each of the nodes starting with
|
|
|
|
* CMAP_NODE, even with concurrent insertions and deletions. (Of
|
|
|
|
* course, if nodes are being inserted or deleted, it might or might not visit
|
|
|
|
* the nodes actually being inserted or deleted.)
|
|
|
|
*
|
|
|
|
* CMAP_NODE_FOR_EACH_PROTECTED may only be used if the containing CMAP is
|
|
|
|
* guaranteed not to change during iteration. It may be only slightly faster.
|
|
|
|
*
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
* CMAP_FOR_EACH_WITH_HASH will reliably visit each of the nodes with the
|
|
|
|
* specified hash in CMAP, even with concurrent insertions and deletions. (Of
|
|
|
|
* course, if nodes with the given HASH are being inserted or deleted, it might
|
|
|
|
* or might not visit the nodes actually being inserted or deleted.)
|
|
|
|
*
|
|
|
|
* CMAP_FOR_EACH_WITH_HASH_PROTECTED may only be used if CMAP is guaranteed not
|
|
|
|
* to change during iteration. It may be very slightly faster.
|
|
|
|
*/
|
2014-09-24 10:39:20 -07:00
|
|
|
#define CMAP_NODE_FOR_EACH(NODE, MEMBER, CMAP_NODE) \
|
|
|
|
for (INIT_CONTAINER(NODE, CMAP_NODE, MEMBER); \
|
|
|
|
(NODE) != OBJECT_CONTAINING(NULL, NODE, MEMBER); \
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
ASSIGN_CONTAINER(NODE, cmap_node_next(&(NODE)->MEMBER), MEMBER))
|
2014-09-24 10:39:20 -07:00
|
|
|
#define CMAP_NODE_FOR_EACH_PROTECTED(NODE, MEMBER, CMAP_NODE) \
|
|
|
|
for (INIT_CONTAINER(NODE, CMAP_NODE, MEMBER); \
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
(NODE) != OBJECT_CONTAINING(NULL, NODE, MEMBER); \
|
|
|
|
ASSIGN_CONTAINER(NODE, cmap_node_next_protected(&(NODE)->MEMBER), \
|
|
|
|
MEMBER))
|
2014-09-24 10:39:20 -07:00
|
|
|
#define CMAP_FOR_EACH_WITH_HASH(NODE, MEMBER, HASH, CMAP) \
|
|
|
|
CMAP_NODE_FOR_EACH(NODE, MEMBER, cmap_find(CMAP, HASH))
|
|
|
|
#define CMAP_FOR_EACH_WITH_HASH_PROTECTED(NODE, MEMBER, HASH, CMAP) \
|
2018-02-24 15:34:39 +08:00
|
|
|
CMAP_NODE_FOR_EACH_PROTECTED(NODE, MEMBER, cmap_find_protected(CMAP, HASH))
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
2014-09-24 10:39:20 -07:00
|
|
|
const struct cmap_node *cmap_find(const struct cmap *, uint32_t hash);
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
struct cmap_node *cmap_find_protected(const struct cmap *, uint32_t hash);
|
|
|
|
|
2014-09-24 10:39:20 -07:00
|
|
|
/* Looks up multiple 'hashes', when the corresponding bit in 'map' is 1,
|
|
|
|
* and sets the corresponding pointer in 'nodes', if the hash value was
|
|
|
|
* found from the 'cmap'. In other cases the 'nodes' values are not changed,
|
|
|
|
* i.e., no NULL pointers are stored there.
|
|
|
|
* Returns a map where a bit is set to 1 if the corresponding 'nodes' pointer
|
|
|
|
* was stored, 0 otherwise.
|
|
|
|
* Generally, the caller wants to use CMAP_NODE_FOR_EACH to verify for
|
|
|
|
* hash collisions. */
|
|
|
|
unsigned long cmap_find_batch(const struct cmap *cmap, unsigned long map,
|
|
|
|
uint32_t hashes[],
|
|
|
|
const struct cmap_node *nodes[]);
|
|
|
|
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
/* Iteration.
|
|
|
|
*
|
|
|
|
*
|
|
|
|
* Thread-safety
|
|
|
|
* =============
|
|
|
|
*
|
|
|
|
* Iteration is safe even in a cmap that is changing concurrently. However:
|
|
|
|
*
|
|
|
|
* - In the presence of concurrent calls to cmap_insert(), any given
|
|
|
|
* iteration might skip some nodes and might visit some nodes more than
|
|
|
|
* once. If this is a problem, then the iterating code should lock the
|
|
|
|
* data structure (a rwlock can be used to allow multiple threads to
|
|
|
|
* iterate in parallel).
|
|
|
|
*
|
|
|
|
* - Concurrent calls to cmap_remove() don't have the same problem. (A
|
|
|
|
* node being deleted may be visited once or not at all. Other nodes
|
|
|
|
* will be visited once.)
|
|
|
|
*
|
2016-02-02 20:55:48 -08:00
|
|
|
* - If the cmap is changing, it is not safe to quiesce while iterating.
|
|
|
|
* Even if the changes are done by the same thread that's performing the
|
|
|
|
* iteration (Corollary: it is not safe to call cmap_remove() and quiesce
|
|
|
|
* in the loop body).
|
|
|
|
*
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
*
|
|
|
|
* Example
|
|
|
|
* =======
|
|
|
|
*
|
|
|
|
* struct my_node {
|
|
|
|
* struct cmap_node cmap_node;
|
|
|
|
* int extra_data;
|
|
|
|
* };
|
|
|
|
*
|
|
|
|
* struct cmap my_map;
|
2018-02-27 22:51:47 -08:00
|
|
|
* struct my_node *my_node;
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
*
|
2018-02-27 22:51:47 -08:00
|
|
|
* cmap_init(&my_map);
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
* ...add data...
|
2018-02-27 22:51:47 -08:00
|
|
|
* CMAP_FOR_EACH (my_node, cmap_node, &my_map) {
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
* ...operate on my_node...
|
|
|
|
* }
|
|
|
|
*
|
2014-07-29 09:02:23 -07:00
|
|
|
* CMAP_FOR_EACH is "safe" in the sense of HMAP_FOR_EACH_SAFE. That is, it is
|
|
|
|
* safe to free the current node before going on to the next iteration. Most
|
|
|
|
* of the time, though, this doesn't matter for a cmap because node
|
|
|
|
* deallocation has to be postponed until the next grace period. This means
|
|
|
|
* that this guarantee is useful only in deallocation code already executing at
|
|
|
|
* postponed time, when it is known that the RCU grace period has already
|
|
|
|
* expired.
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
*/
|
2014-06-11 11:07:43 -07:00
|
|
|
|
2014-07-29 09:02:23 -07:00
|
|
|
#define CMAP_CURSOR_FOR_EACH__(NODE, CURSOR, MEMBER) \
|
|
|
|
((CURSOR)->node \
|
2014-09-09 14:23:07 -07:00
|
|
|
? (INIT_CONTAINER(NODE, (CURSOR)->node, MEMBER), \
|
2014-07-29 09:02:23 -07:00
|
|
|
cmap_cursor_advance(CURSOR), \
|
|
|
|
true) \
|
|
|
|
: false)
|
|
|
|
|
|
|
|
#define CMAP_CURSOR_FOR_EACH(NODE, MEMBER, CURSOR, CMAP) \
|
|
|
|
for (*(CURSOR) = cmap_cursor_start(CMAP); \
|
|
|
|
CMAP_CURSOR_FOR_EACH__(NODE, CURSOR, MEMBER); \
|
2014-07-21 21:00:04 -07:00
|
|
|
)
|
|
|
|
|
2014-07-29 09:02:23 -07:00
|
|
|
#define CMAP_CURSOR_FOR_EACH_CONTINUE(NODE, MEMBER, CURSOR) \
|
|
|
|
while (CMAP_CURSOR_FOR_EACH__(NODE, CURSOR, MEMBER))
|
2014-05-28 16:56:29 -07:00
|
|
|
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
struct cmap_cursor {
|
|
|
|
const struct cmap_impl *impl;
|
|
|
|
uint32_t bucket_idx;
|
|
|
|
int entry_idx;
|
2014-07-21 21:00:04 -07:00
|
|
|
struct cmap_node *node;
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
};
|
|
|
|
|
2014-07-21 21:00:04 -07:00
|
|
|
struct cmap_cursor cmap_cursor_start(const struct cmap *);
|
|
|
|
void cmap_cursor_advance(struct cmap_cursor *);
|
2014-06-11 11:07:43 -07:00
|
|
|
|
2014-07-29 09:02:23 -07:00
|
|
|
#define CMAP_FOR_EACH(NODE, MEMBER, CMAP) \
|
2014-07-21 21:00:04 -07:00
|
|
|
for (struct cmap_cursor cursor__ = cmap_cursor_start(CMAP); \
|
2014-07-29 09:02:23 -07:00
|
|
|
CMAP_CURSOR_FOR_EACH__(NODE, &cursor__, MEMBER); \
|
2014-07-21 21:00:04 -07:00
|
|
|
)
|
2014-06-11 11:07:43 -07:00
|
|
|
|
2014-05-28 16:56:29 -07:00
|
|
|
static inline struct cmap_node *cmap_first(const struct cmap *);
|
|
|
|
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
/* Another, less preferred, form of iteration, for use in situations where it
|
|
|
|
* is difficult to maintain a pointer to a cmap_node. */
|
|
|
|
struct cmap_position {
|
|
|
|
unsigned int bucket;
|
|
|
|
unsigned int entry;
|
|
|
|
unsigned int offset;
|
|
|
|
};
|
|
|
|
|
|
|
|
struct cmap_node *cmap_next_position(const struct cmap *,
|
|
|
|
struct cmap_position *);
|
|
|
|
|
2014-05-28 16:56:29 -07:00
|
|
|
/* Returns the first node in 'cmap', in arbitrary order, or a null pointer if
|
|
|
|
* 'cmap' is empty. */
|
|
|
|
static inline struct cmap_node *
|
|
|
|
cmap_first(const struct cmap *cmap)
|
|
|
|
{
|
|
|
|
struct cmap_position pos = { 0, 0, 0 };
|
|
|
|
|
|
|
|
return cmap_next_position(cmap, &pos);
|
|
|
|
}
|
|
|
|
|
2014-09-30 12:37:52 -07:00
|
|
|
/* Removes 'node' from 'cmap'. The caller must ensure that 'cmap' cannot
|
|
|
|
* change concurrently (from another thread).
|
|
|
|
*
|
|
|
|
* 'node' must not be destroyed or modified or inserted back into 'cmap' or
|
|
|
|
* into any other concurrent hash map while any other thread might be accessing
|
|
|
|
* it. One correct way to do this is to free it from an RCU callback with
|
|
|
|
* ovsrcu_postpone().
|
|
|
|
*
|
|
|
|
* Returns the current number of nodes in the cmap after the removal. */
|
|
|
|
static inline size_t
|
|
|
|
cmap_remove(struct cmap *cmap, struct cmap_node *node, uint32_t hash)
|
|
|
|
{
|
|
|
|
return cmap_replace(cmap, node, NULL, hash);
|
|
|
|
}
|
|
|
|
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
#endif /* cmap.h */
|