cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
/*
|
2017-05-26 21:50:36 -07:00
|
|
|
|
* Copyright (c) 2008, 2009, 2010, 2013, 2014, 2016, 2017 Nicira, Inc.
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
*
|
|
|
|
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
|
|
|
|
* you may not use this file except in compliance with the License.
|
|
|
|
|
* You may obtain a copy of the License at:
|
|
|
|
|
*
|
|
|
|
|
* http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
|
*
|
|
|
|
|
* Unless required by applicable law or agreed to in writing, software
|
|
|
|
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
|
|
|
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
|
|
|
* See the License for the specific language governing permissions and
|
|
|
|
|
* limitations under the License.
|
|
|
|
|
*/
|
|
|
|
|
|
|
|
|
|
/* A non-exhaustive test for some of the functions and macros declared in
|
|
|
|
|
* cmap.h. */
|
|
|
|
|
|
|
|
|
|
#include <config.h>
|
2014-10-29 11:34:40 -07:00
|
|
|
|
#undef NDEBUG
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
#include "cmap.h"
|
2014-10-29 11:34:40 -07:00
|
|
|
|
#include <assert.h>
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
#include <getopt.h>
|
|
|
|
|
#include <string.h>
|
2014-10-29 11:34:40 -07:00
|
|
|
|
#include "bitmap.h"
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
#include "command-line.h"
|
|
|
|
|
#include "fat-rwlock.h"
|
|
|
|
|
#include "hash.h"
|
2016-07-12 16:37:34 -05:00
|
|
|
|
#include "openvswitch/hmap.h"
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
#include "ovstest.h"
|
|
|
|
|
#include "ovs-thread.h"
|
|
|
|
|
#include "random.h"
|
|
|
|
|
#include "timeval.h"
|
|
|
|
|
#include "util.h"
|
|
|
|
|
|
|
|
|
|
/* Sample cmap element. */
|
|
|
|
|
struct element {
|
|
|
|
|
int value;
|
|
|
|
|
struct cmap_node node;
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
typedef size_t hash_func(int value);
|
|
|
|
|
|
|
|
|
|
static int
|
|
|
|
|
compare_ints(const void *a_, const void *b_)
|
|
|
|
|
{
|
|
|
|
|
const int *a = a_;
|
|
|
|
|
const int *b = b_;
|
|
|
|
|
return *a < *b ? -1 : *a > *b;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* Verifies that 'cmap' contains exactly the 'n' values in 'values'. */
|
|
|
|
|
static void
|
|
|
|
|
check_cmap(struct cmap *cmap, const int values[], size_t n,
|
|
|
|
|
hash_func *hash)
|
|
|
|
|
{
|
2014-07-16 10:46:20 -07:00
|
|
|
|
int *sort_values, *cmap_values, *cmap_values2;
|
2014-06-11 11:07:43 -07:00
|
|
|
|
const struct element *e;
|
2014-09-24 10:39:20 -07:00
|
|
|
|
size_t i, batch_size;
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
|
2014-07-16 10:46:20 -07:00
|
|
|
|
struct cmap_position pos = { 0, 0, 0 };
|
|
|
|
|
struct cmap_node *node;
|
|
|
|
|
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
/* Check that all the values are there in iteration. */
|
|
|
|
|
sort_values = xmalloc(sizeof *sort_values * n);
|
|
|
|
|
cmap_values = xmalloc(sizeof *sort_values * n);
|
2014-07-16 10:46:20 -07:00
|
|
|
|
cmap_values2 = xmalloc(sizeof *sort_values * n);
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
|
2014-07-16 10:46:20 -07:00
|
|
|
|
/* Here we test cursor iteration */
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
i = 0;
|
2014-06-11 11:07:43 -07:00
|
|
|
|
CMAP_FOR_EACH (e, node, cmap) {
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
assert(i < n);
|
|
|
|
|
cmap_values[i++] = e->value;
|
|
|
|
|
}
|
|
|
|
|
assert(i == n);
|
2022-03-23 12:56:18 +01:00
|
|
|
|
assert(e == NULL);
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
|
2014-07-16 10:46:20 -07:00
|
|
|
|
/* Here we test iteration with cmap_next_position() */
|
|
|
|
|
i = 0;
|
|
|
|
|
while ((node = cmap_next_position(cmap, &pos))) {
|
2014-09-15 10:10:34 -07:00
|
|
|
|
e = OBJECT_CONTAINING(node, e, node);
|
2014-07-16 10:46:20 -07:00
|
|
|
|
|
|
|
|
|
assert(i < n);
|
|
|
|
|
cmap_values2[i++] = e->value;
|
|
|
|
|
}
|
|
|
|
|
assert(i == n);
|
|
|
|
|
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
memcpy(sort_values, values, sizeof *sort_values * n);
|
|
|
|
|
qsort(sort_values, n, sizeof *sort_values, compare_ints);
|
|
|
|
|
qsort(cmap_values, n, sizeof *cmap_values, compare_ints);
|
2014-07-16 10:46:20 -07:00
|
|
|
|
qsort(cmap_values2, n, sizeof *cmap_values2, compare_ints);
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
|
|
|
|
|
for (i = 0; i < n; i++) {
|
|
|
|
|
assert(sort_values[i] == cmap_values[i]);
|
2014-07-16 10:46:20 -07:00
|
|
|
|
assert(sort_values[i] == cmap_values2[i]);
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
}
|
|
|
|
|
|
2014-07-16 10:46:20 -07:00
|
|
|
|
free(cmap_values2);
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
free(cmap_values);
|
|
|
|
|
free(sort_values);
|
|
|
|
|
|
|
|
|
|
/* Check that all the values are there in lookup. */
|
|
|
|
|
for (i = 0; i < n; i++) {
|
|
|
|
|
size_t count = 0;
|
|
|
|
|
|
|
|
|
|
CMAP_FOR_EACH_WITH_HASH (e, node, hash(values[i]), cmap) {
|
|
|
|
|
count += e->value == values[i];
|
|
|
|
|
}
|
|
|
|
|
assert(count == 1);
|
2022-03-23 12:56:18 +01:00
|
|
|
|
assert(e == NULL);
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
}
|
|
|
|
|
|
2014-09-24 10:39:20 -07:00
|
|
|
|
/* Check that all the values are there in batched lookup. */
|
|
|
|
|
batch_size = n % BITMAP_ULONG_BITS + 1;
|
|
|
|
|
for (i = 0; i < n; i += batch_size) {
|
|
|
|
|
unsigned long map;
|
|
|
|
|
uint32_t hashes[sizeof map * CHAR_BIT];
|
|
|
|
|
const struct cmap_node *nodes[sizeof map * CHAR_BIT];
|
|
|
|
|
size_t count = 0;
|
|
|
|
|
int k, j;
|
|
|
|
|
|
|
|
|
|
j = MIN(n - i, batch_size); /* Actual batch size. */
|
|
|
|
|
map = ~0UL >> (BITMAP_ULONG_BITS - j);
|
|
|
|
|
|
|
|
|
|
for (k = 0; k < j; k++) {
|
|
|
|
|
hashes[k] = hash(values[i + k]);
|
|
|
|
|
}
|
|
|
|
|
map = cmap_find_batch(cmap, map, hashes, nodes);
|
|
|
|
|
|
2015-06-25 09:18:38 -07:00
|
|
|
|
ULLONG_FOR_EACH_1(k, map) {
|
2014-09-24 10:39:20 -07:00
|
|
|
|
CMAP_NODE_FOR_EACH (e, node, nodes[k]) {
|
|
|
|
|
count += e->value == values[i + k];
|
|
|
|
|
}
|
2022-03-23 12:56:18 +01:00
|
|
|
|
assert(e == NULL);
|
2014-09-24 10:39:20 -07:00
|
|
|
|
}
|
|
|
|
|
assert(count == j); /* j elements in a batch. */
|
|
|
|
|
}
|
|
|
|
|
|
2014-05-28 16:56:29 -07:00
|
|
|
|
/* Check that cmap_first() returns NULL only when cmap_is_empty(). */
|
|
|
|
|
assert(!cmap_first(cmap) == cmap_is_empty(cmap));
|
|
|
|
|
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
/* Check counters. */
|
|
|
|
|
assert(cmap_is_empty(cmap) == !n);
|
|
|
|
|
assert(cmap_count(cmap) == n);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static void
|
|
|
|
|
shuffle(int *p, size_t n)
|
|
|
|
|
{
|
|
|
|
|
for (; n > 1; n--, p++) {
|
|
|
|
|
int *q = &p[random_range(n)];
|
|
|
|
|
int tmp = *p;
|
|
|
|
|
*p = *q;
|
|
|
|
|
*q = tmp;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* Prints the values in 'cmap', plus 'name' as a title. */
|
|
|
|
|
static void OVS_UNUSED
|
|
|
|
|
print_cmap(const char *name, struct cmap *cmap)
|
|
|
|
|
{
|
|
|
|
|
struct cmap_cursor cursor;
|
|
|
|
|
struct element *e;
|
|
|
|
|
|
|
|
|
|
printf("%s:", name);
|
2014-06-11 11:07:43 -07:00
|
|
|
|
CMAP_CURSOR_FOR_EACH (e, node, &cursor, cmap) {
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
printf(" %d", e->value);
|
|
|
|
|
}
|
|
|
|
|
printf("\n");
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* Prints the 'n' values in 'values', plus 'name' as a title. */
|
|
|
|
|
static void OVS_UNUSED
|
|
|
|
|
print_ints(const char *name, const int *values, size_t n)
|
|
|
|
|
{
|
|
|
|
|
size_t i;
|
|
|
|
|
|
|
|
|
|
printf("%s:", name);
|
|
|
|
|
for (i = 0; i < n; i++) {
|
|
|
|
|
printf(" %d", values[i]);
|
|
|
|
|
}
|
|
|
|
|
printf("\n");
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static size_t
|
|
|
|
|
identity_hash(int value)
|
|
|
|
|
{
|
|
|
|
|
return value;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static size_t
|
|
|
|
|
good_hash(int value)
|
|
|
|
|
{
|
|
|
|
|
return hash_int(value, 0x1234abcd);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static size_t
|
|
|
|
|
constant_hash(int value OVS_UNUSED)
|
|
|
|
|
{
|
|
|
|
|
return 123;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* Tests basic cmap insertion and deletion. */
|
|
|
|
|
static void
|
2014-05-28 16:56:29 -07:00
|
|
|
|
test_cmap_insert_replace_delete(hash_func *hash)
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
{
|
|
|
|
|
enum { N_ELEMS = 1000 };
|
|
|
|
|
|
|
|
|
|
struct element elements[N_ELEMS];
|
2014-05-28 16:56:29 -07:00
|
|
|
|
struct element copies[N_ELEMS];
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
int values[N_ELEMS];
|
2016-04-22 16:51:03 -07:00
|
|
|
|
struct cmap cmap = CMAP_INITIALIZER;
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
size_t i;
|
|
|
|
|
|
|
|
|
|
for (i = 0; i < N_ELEMS; i++) {
|
|
|
|
|
elements[i].value = i;
|
|
|
|
|
cmap_insert(&cmap, &elements[i].node, hash(i));
|
|
|
|
|
values[i] = i;
|
|
|
|
|
check_cmap(&cmap, values, i + 1, hash);
|
|
|
|
|
}
|
|
|
|
|
shuffle(values, N_ELEMS);
|
|
|
|
|
for (i = 0; i < N_ELEMS; i++) {
|
2014-05-28 16:56:29 -07:00
|
|
|
|
copies[values[i]].value = values[i];
|
|
|
|
|
cmap_replace(&cmap, &elements[values[i]].node,
|
|
|
|
|
&copies[values[i]].node, hash(values[i]));
|
|
|
|
|
check_cmap(&cmap, values, N_ELEMS, hash);
|
|
|
|
|
}
|
|
|
|
|
shuffle(values, N_ELEMS);
|
|
|
|
|
for (i = 0; i < N_ELEMS; i++) {
|
|
|
|
|
cmap_remove(&cmap, &copies[values[i]].node, hash(values[i]));
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
check_cmap(&cmap, values + (i + 1), N_ELEMS - (i + 1), hash);
|
|
|
|
|
}
|
|
|
|
|
cmap_destroy(&cmap);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static void
|
|
|
|
|
run_test(void (*function)(hash_func *))
|
|
|
|
|
{
|
|
|
|
|
hash_func *hash_funcs[] = { identity_hash, good_hash, constant_hash };
|
|
|
|
|
size_t i;
|
|
|
|
|
|
|
|
|
|
for (i = 0; i < ARRAY_SIZE(hash_funcs); i++) {
|
|
|
|
|
function(hash_funcs[i]);
|
|
|
|
|
printf(".");
|
|
|
|
|
fflush(stdout);
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static void
|
2015-03-17 10:35:26 -04:00
|
|
|
|
run_tests(struct ovs_cmdl_context *ctx)
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
{
|
|
|
|
|
int n;
|
|
|
|
|
int i;
|
|
|
|
|
|
2015-03-17 10:35:26 -04:00
|
|
|
|
n = ctx->argc >= 2 ? atoi(ctx->argv[1]) : 100;
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
for (i = 0; i < n; i++) {
|
2014-05-28 16:56:29 -07:00
|
|
|
|
run_test(test_cmap_insert_replace_delete);
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
}
|
|
|
|
|
printf("\n");
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static int n_elems; /* Number of elements to insert. */
|
|
|
|
|
static int n_threads; /* Number of threads to search and mutate. */
|
|
|
|
|
static uint32_t mutation_frac; /* % mutations, as fraction of UINT32_MAX. */
|
2014-09-24 10:39:20 -07:00
|
|
|
|
static int n_batch; /* Number of elements in each batch. */
|
|
|
|
|
|
|
|
|
|
#define N_BATCH_MAX BITMAP_ULONG_BITS
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
|
|
|
|
|
static void benchmark_cmap(void);
|
2014-09-24 10:39:20 -07:00
|
|
|
|
static void benchmark_cmap_batched(void);
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
static void benchmark_hmap(void);
|
|
|
|
|
|
|
|
|
|
static int
|
|
|
|
|
elapsed(const struct timeval *start)
|
|
|
|
|
{
|
|
|
|
|
struct timeval end;
|
|
|
|
|
|
|
|
|
|
xgettimeofday(&end);
|
|
|
|
|
return timeval_to_msec(&end) - timeval_to_msec(start);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static void
|
2015-03-17 10:35:26 -04:00
|
|
|
|
run_benchmarks(struct ovs_cmdl_context *ctx)
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
{
|
2015-03-17 10:35:26 -04:00
|
|
|
|
n_elems = strtol(ctx->argv[1], NULL, 10);
|
|
|
|
|
n_threads = strtol(ctx->argv[2], NULL, 10);
|
|
|
|
|
mutation_frac = strtod(ctx->argv[3], NULL) / 100.0 * UINT32_MAX;
|
|
|
|
|
n_batch = ctx->argc > 4 ? strtol(ctx->argv[4], NULL, 10) : 1;
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
|
2014-09-24 10:39:20 -07:00
|
|
|
|
if (n_batch > N_BATCH_MAX) {
|
|
|
|
|
n_batch = N_BATCH_MAX;
|
|
|
|
|
}
|
|
|
|
|
printf("Benchmarking with n=%d, %d threads, %.2f%% mutations, batch size %d:\n",
|
|
|
|
|
n_elems, n_threads, (double) mutation_frac / UINT32_MAX * 100.,
|
|
|
|
|
n_batch);
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
|
2014-09-24 10:39:20 -07:00
|
|
|
|
if (n_batch > 0) {
|
|
|
|
|
benchmark_cmap_batched();
|
|
|
|
|
}
|
|
|
|
|
putchar('\n');
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
benchmark_cmap();
|
|
|
|
|
putchar('\n');
|
|
|
|
|
benchmark_hmap();
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* cmap benchmark. */
|
|
|
|
|
|
|
|
|
|
static struct element *
|
|
|
|
|
find(const struct cmap *cmap, int value)
|
|
|
|
|
{
|
|
|
|
|
struct element *e;
|
|
|
|
|
|
|
|
|
|
CMAP_FOR_EACH_WITH_HASH (e, node, hash_int(value, 0), cmap) {
|
|
|
|
|
if (e->value == value) {
|
|
|
|
|
return e;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
return NULL;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
struct cmap_aux {
|
|
|
|
|
struct ovs_mutex mutex;
|
|
|
|
|
struct cmap *cmap;
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
static void *
|
|
|
|
|
search_cmap(void *aux_)
|
|
|
|
|
{
|
|
|
|
|
struct cmap_aux *aux = aux_;
|
|
|
|
|
size_t i;
|
|
|
|
|
|
2014-09-24 10:39:20 -07:00
|
|
|
|
if (mutation_frac) {
|
|
|
|
|
for (i = 0; i < n_elems; i++) {
|
|
|
|
|
struct element *e;
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
|
2014-09-24 10:39:20 -07:00
|
|
|
|
if (random_uint32() < mutation_frac) {
|
|
|
|
|
ovs_mutex_lock(&aux->mutex);
|
|
|
|
|
e = find(aux->cmap, i);
|
|
|
|
|
if (e) {
|
|
|
|
|
cmap_remove(aux->cmap, &e->node, hash_int(e->value, 0));
|
|
|
|
|
}
|
|
|
|
|
ovs_mutex_unlock(&aux->mutex);
|
|
|
|
|
} else {
|
|
|
|
|
ignore(find(aux->cmap, i));
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
}
|
2014-09-24 10:39:20 -07:00
|
|
|
|
}
|
|
|
|
|
} else {
|
|
|
|
|
for (i = 0; i < n_elems; i++) {
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
ignore(find(aux->cmap, i));
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
return NULL;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static void
|
|
|
|
|
benchmark_cmap(void)
|
|
|
|
|
{
|
|
|
|
|
struct element *elements;
|
|
|
|
|
struct cmap cmap;
|
|
|
|
|
struct element *e;
|
|
|
|
|
struct timeval start;
|
|
|
|
|
pthread_t *threads;
|
|
|
|
|
struct cmap_aux aux;
|
|
|
|
|
size_t i;
|
|
|
|
|
|
|
|
|
|
elements = xmalloc(n_elems * sizeof *elements);
|
|
|
|
|
|
|
|
|
|
/* Insertions. */
|
|
|
|
|
xgettimeofday(&start);
|
|
|
|
|
cmap_init(&cmap);
|
|
|
|
|
for (i = 0; i < n_elems; i++) {
|
|
|
|
|
elements[i].value = i;
|
|
|
|
|
cmap_insert(&cmap, &elements[i].node, hash_int(i, 0));
|
|
|
|
|
}
|
|
|
|
|
printf("cmap insert: %5d ms\n", elapsed(&start));
|
|
|
|
|
|
|
|
|
|
/* Iteration. */
|
|
|
|
|
xgettimeofday(&start);
|
2014-06-11 11:07:43 -07:00
|
|
|
|
CMAP_FOR_EACH (e, node, &cmap) {
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
ignore(e);
|
|
|
|
|
}
|
|
|
|
|
printf("cmap iterate: %5d ms\n", elapsed(&start));
|
|
|
|
|
|
|
|
|
|
/* Search and mutation. */
|
|
|
|
|
xgettimeofday(&start);
|
|
|
|
|
aux.cmap = &cmap;
|
|
|
|
|
ovs_mutex_init(&aux.mutex);
|
|
|
|
|
threads = xmalloc(n_threads * sizeof *threads);
|
|
|
|
|
for (i = 0; i < n_threads; i++) {
|
|
|
|
|
threads[i] = ovs_thread_create("search", search_cmap, &aux);
|
|
|
|
|
}
|
|
|
|
|
for (i = 0; i < n_threads; i++) {
|
|
|
|
|
xpthread_join(threads[i], NULL);
|
|
|
|
|
}
|
|
|
|
|
free(threads);
|
|
|
|
|
printf("cmap search: %5d ms\n", elapsed(&start));
|
|
|
|
|
|
|
|
|
|
/* Destruction. */
|
|
|
|
|
xgettimeofday(&start);
|
2014-06-11 11:07:43 -07:00
|
|
|
|
CMAP_FOR_EACH (e, node, &cmap) {
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
cmap_remove(&cmap, &e->node, hash_int(e->value, 0));
|
|
|
|
|
}
|
|
|
|
|
cmap_destroy(&cmap);
|
|
|
|
|
printf("cmap destroy: %5d ms\n", elapsed(&start));
|
|
|
|
|
|
|
|
|
|
free(elements);
|
|
|
|
|
}
|
2014-09-24 10:39:20 -07:00
|
|
|
|
|
|
|
|
|
static size_t
|
|
|
|
|
find_batch(const struct cmap *cmap, const int value)
|
|
|
|
|
{
|
|
|
|
|
size_t i, ret;
|
|
|
|
|
const size_t end = MIN(n_batch, n_elems - value);
|
|
|
|
|
uint32_t hashes[N_BATCH_MAX];
|
|
|
|
|
const struct cmap_node *nodes[N_BATCH_MAX];
|
|
|
|
|
|
|
|
|
|
if (mutation_frac) {
|
|
|
|
|
for (i = 0; i < end; i++) {
|
|
|
|
|
if (random_uint32() < mutation_frac) {
|
|
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
hashes[i] = hash_int(value + i, 0);
|
|
|
|
|
}
|
|
|
|
|
} else {
|
|
|
|
|
for (i = 0; i < end; i++) {
|
|
|
|
|
hashes[i] = hash_int(value + i, 0);
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
ret = i;
|
|
|
|
|
|
2017-05-26 21:50:36 -07:00
|
|
|
|
unsigned long map = i ? ~0UL >> (BITMAP_ULONG_BITS - i) : 0;
|
2014-09-24 10:39:20 -07:00
|
|
|
|
map = cmap_find_batch(cmap, map, hashes, nodes);
|
|
|
|
|
|
2015-06-25 09:18:38 -07:00
|
|
|
|
ULLONG_FOR_EACH_1(i, map) {
|
2014-09-24 10:39:20 -07:00
|
|
|
|
struct element *e;
|
|
|
|
|
|
|
|
|
|
CMAP_NODE_FOR_EACH (e, node, nodes[i]) {
|
|
|
|
|
if (OVS_LIKELY(e->value == value + i)) {
|
|
|
|
|
ignore(e); /* Found result. */
|
|
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
return ret;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static void *
|
|
|
|
|
search_cmap_batched(void *aux_)
|
|
|
|
|
{
|
|
|
|
|
struct cmap_aux *aux = aux_;
|
|
|
|
|
size_t i = 0, j;
|
|
|
|
|
|
|
|
|
|
for (;;) {
|
|
|
|
|
struct element *e;
|
|
|
|
|
|
|
|
|
|
j = find_batch(aux->cmap, i);
|
|
|
|
|
i += j;
|
|
|
|
|
if (i >= n_elems) {
|
|
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
if (j < n_batch) {
|
|
|
|
|
ovs_mutex_lock(&aux->mutex);
|
|
|
|
|
e = find(aux->cmap, i);
|
|
|
|
|
if (e) {
|
|
|
|
|
cmap_remove(aux->cmap, &e->node, hash_int(e->value, 0));
|
|
|
|
|
}
|
|
|
|
|
ovs_mutex_unlock(&aux->mutex);
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
return NULL;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static void
|
|
|
|
|
benchmark_cmap_batched(void)
|
|
|
|
|
{
|
|
|
|
|
struct element *elements;
|
|
|
|
|
struct cmap cmap;
|
|
|
|
|
struct element *e;
|
|
|
|
|
struct timeval start;
|
|
|
|
|
pthread_t *threads;
|
|
|
|
|
struct cmap_aux aux;
|
|
|
|
|
size_t i;
|
|
|
|
|
|
|
|
|
|
elements = xmalloc(n_elems * sizeof *elements);
|
|
|
|
|
|
|
|
|
|
/* Insertions. */
|
|
|
|
|
xgettimeofday(&start);
|
|
|
|
|
cmap_init(&cmap);
|
|
|
|
|
for (i = 0; i < n_elems; i++) {
|
|
|
|
|
elements[i].value = i;
|
|
|
|
|
cmap_insert(&cmap, &elements[i].node, hash_int(i, 0));
|
|
|
|
|
}
|
|
|
|
|
printf("cmap insert: %5d ms\n", elapsed(&start));
|
|
|
|
|
|
|
|
|
|
/* Iteration. */
|
|
|
|
|
xgettimeofday(&start);
|
|
|
|
|
CMAP_FOR_EACH (e, node, &cmap) {
|
|
|
|
|
ignore(e);
|
|
|
|
|
}
|
|
|
|
|
printf("cmap iterate: %5d ms\n", elapsed(&start));
|
|
|
|
|
|
|
|
|
|
/* Search and mutation. */
|
|
|
|
|
xgettimeofday(&start);
|
|
|
|
|
aux.cmap = &cmap;
|
|
|
|
|
ovs_mutex_init(&aux.mutex);
|
|
|
|
|
threads = xmalloc(n_threads * sizeof *threads);
|
|
|
|
|
for (i = 0; i < n_threads; i++) {
|
|
|
|
|
threads[i] = ovs_thread_create("search", search_cmap_batched, &aux);
|
|
|
|
|
}
|
|
|
|
|
for (i = 0; i < n_threads; i++) {
|
|
|
|
|
xpthread_join(threads[i], NULL);
|
|
|
|
|
}
|
|
|
|
|
free(threads);
|
|
|
|
|
printf("batch search: %5d ms\n", elapsed(&start));
|
|
|
|
|
|
|
|
|
|
/* Destruction. */
|
|
|
|
|
xgettimeofday(&start);
|
|
|
|
|
CMAP_FOR_EACH (e, node, &cmap) {
|
|
|
|
|
cmap_remove(&cmap, &e->node, hash_int(e->value, 0));
|
|
|
|
|
}
|
|
|
|
|
cmap_destroy(&cmap);
|
|
|
|
|
printf("cmap destroy: %5d ms\n", elapsed(&start));
|
|
|
|
|
|
|
|
|
|
free(elements);
|
|
|
|
|
}
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
|
|
|
|
|
/* hmap benchmark. */
|
|
|
|
|
struct helement {
|
|
|
|
|
int value;
|
|
|
|
|
struct hmap_node node;
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
static struct helement *
|
|
|
|
|
hfind(const struct hmap *hmap, int value)
|
|
|
|
|
{
|
|
|
|
|
struct helement *e;
|
|
|
|
|
|
|
|
|
|
HMAP_FOR_EACH_WITH_HASH (e, node, hash_int(value, 0), hmap) {
|
|
|
|
|
if (e->value == value) {
|
|
|
|
|
return e;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
return NULL;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
struct hmap_aux {
|
|
|
|
|
struct hmap *hmap;
|
|
|
|
|
struct fat_rwlock fatlock;
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
static void *
|
|
|
|
|
search_hmap(void *aux_)
|
|
|
|
|
{
|
|
|
|
|
struct hmap_aux *aux = aux_;
|
|
|
|
|
size_t i;
|
|
|
|
|
|
2014-09-24 10:39:20 -07:00
|
|
|
|
if (mutation_frac) {
|
|
|
|
|
for (i = 0; i < n_elems; i++) {
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
if (random_uint32() < mutation_frac) {
|
|
|
|
|
struct helement *e;
|
|
|
|
|
|
|
|
|
|
fat_rwlock_wrlock(&aux->fatlock);
|
|
|
|
|
e = hfind(aux->hmap, i);
|
|
|
|
|
if (e) {
|
|
|
|
|
hmap_remove(aux->hmap, &e->node);
|
|
|
|
|
}
|
|
|
|
|
fat_rwlock_unlock(&aux->fatlock);
|
|
|
|
|
} else {
|
|
|
|
|
fat_rwlock_rdlock(&aux->fatlock);
|
|
|
|
|
ignore(hfind(aux->hmap, i));
|
|
|
|
|
fat_rwlock_unlock(&aux->fatlock);
|
|
|
|
|
}
|
2014-09-24 10:39:20 -07:00
|
|
|
|
}
|
|
|
|
|
} else {
|
|
|
|
|
for (i = 0; i < n_elems; i++) {
|
|
|
|
|
ignore(hfind(aux->hmap, i));
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
return NULL;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static void
|
|
|
|
|
benchmark_hmap(void)
|
|
|
|
|
{
|
|
|
|
|
struct helement *elements;
|
|
|
|
|
struct hmap hmap;
|
2022-03-23 12:56:17 +01:00
|
|
|
|
struct helement *e;
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
struct timeval start;
|
|
|
|
|
pthread_t *threads;
|
|
|
|
|
struct hmap_aux aux;
|
|
|
|
|
size_t i;
|
|
|
|
|
|
|
|
|
|
elements = xmalloc(n_elems * sizeof *elements);
|
|
|
|
|
|
|
|
|
|
xgettimeofday(&start);
|
|
|
|
|
hmap_init(&hmap);
|
|
|
|
|
for (i = 0; i < n_elems; i++) {
|
|
|
|
|
elements[i].value = i;
|
|
|
|
|
hmap_insert(&hmap, &elements[i].node, hash_int(i, 0));
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
printf("hmap insert: %5d ms\n", elapsed(&start));
|
|
|
|
|
|
|
|
|
|
xgettimeofday(&start);
|
|
|
|
|
HMAP_FOR_EACH (e, node, &hmap) {
|
|
|
|
|
ignore(e);
|
|
|
|
|
}
|
|
|
|
|
printf("hmap iterate: %5d ms\n", elapsed(&start));
|
|
|
|
|
|
|
|
|
|
xgettimeofday(&start);
|
|
|
|
|
aux.hmap = &hmap;
|
|
|
|
|
fat_rwlock_init(&aux.fatlock);
|
|
|
|
|
threads = xmalloc(n_threads * sizeof *threads);
|
|
|
|
|
for (i = 0; i < n_threads; i++) {
|
|
|
|
|
threads[i] = ovs_thread_create("search", search_hmap, &aux);
|
|
|
|
|
}
|
|
|
|
|
for (i = 0; i < n_threads; i++) {
|
|
|
|
|
xpthread_join(threads[i], NULL);
|
|
|
|
|
}
|
|
|
|
|
free(threads);
|
|
|
|
|
printf("hmap search: %5d ms\n", elapsed(&start));
|
|
|
|
|
|
|
|
|
|
/* Destruction. */
|
|
|
|
|
xgettimeofday(&start);
|
2022-03-23 12:56:17 +01:00
|
|
|
|
HMAP_FOR_EACH_SAFE (e, node, &hmap) {
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
hmap_remove(&hmap, &e->node);
|
|
|
|
|
}
|
|
|
|
|
hmap_destroy(&hmap);
|
|
|
|
|
printf("hmap destroy: %5d ms\n", elapsed(&start));
|
|
|
|
|
|
|
|
|
|
free(elements);
|
|
|
|
|
}
|
|
|
|
|
|
2015-03-16 12:01:55 -04:00
|
|
|
|
static const struct ovs_cmdl_command commands[] = {
|
2016-08-15 18:47:29 +00:00
|
|
|
|
{"check", NULL, 0, 1, run_tests, OVS_RO},
|
|
|
|
|
{"benchmark", NULL, 3, 4, run_benchmarks, OVS_RO},
|
|
|
|
|
{NULL, NULL, 0, 0, NULL, OVS_RO},
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
static void
|
2015-03-17 10:35:26 -04:00
|
|
|
|
test_cmap_main(int argc, char *argv[])
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
{
|
2015-03-17 10:35:26 -04:00
|
|
|
|
struct ovs_cmdl_context ctx = {
|
|
|
|
|
.argc = argc - optind,
|
|
|
|
|
.argv = argv + optind,
|
|
|
|
|
};
|
|
|
|
|
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
set_program_name(argv[0]);
|
2015-03-17 10:35:26 -04:00
|
|
|
|
ovs_cmdl_run_command(&ctx, commands);
|
cmap: New module for cuckoo hash table.
This implements an "optimistic concurrent cuckoo hash", a single-writer,
multiple-reader hash table data structure. The point of this data
structure is performance, so this commit message focuses on performance.
I tested the performance of cmap with the test-cmap utility included in
this commit. It takes three parameters for benchmarking:
- n, the number of elements to insert.
- n_threads, the number of threads to use for searching and
mutating the hash table.
- mutations, the percentage of operations that should modify the
hash table, from 0% to 100%.
e.g. "test-cmap 1000000 16 1" inserts one million elements, uses 16
threads, and 1% of the operations modify the hash table.
Any given run does the following for both hmap and cmap
implementations:
- Inserts n elements into a hash table.
- Iterates over all of the elements.
- Spawns n_threads threads, each of which searches for each of the
elements in the hash table, once, and removes the specified
percentage of them.
- Removes each of the (remaining) elements and destroys the hash
table.
and reports the time taken by each step,
The tables below report results for various parameters with a draft version
of this library. The tests were not formally rerun for the final version,
but the intermediate changes should only have improved performance, and
this seemed to be the case in some informal testing.
n_threads=16 was used each time, on a 16-core x86-64 machine. The compiler
used was Clang 3.5. (GCC yields different numbers but similar relative
results.)
The results show:
- Insertion is generally 3x to 5x faster in an hmap.
- Iteration is generally about 3x faster in a cmap.
- Search and mutation is 4x faster with .1% mutations and the
advantage grows as the fraction of mutations grows. This is
because a cmap does not require locking for read operations,
even in the presence of a writer.
With no mutations, however, no locking is required in the hmap
case, and the hmap is somewhat faster. This is because raw hmap
search is somewhat simpler and faster than raw cmap search.
- Destruction is faster, usually by less than 2x, in an hmap.
n=10,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 6132 2182 6136 2178 6111 2174 6124 2180
iterate: 370 1203 378 1201 370 1200 371 1202
search: 1375 8692 2393 28197 18402 80379 1281 1097
destroy: 1382 1187 1197 1034 324 241 1405 1205
n=1,000,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 311 25 310 60 311 59 310 60
iterate: 25 62 25 64 25 57 25 60
search: 115 637 197 2266 1803 7284 101 67
destroy: 103 64 90 59 25 13 104 66
n=100,000:
.1% mutations 1% mutations 10% mutations no mutations
cmap hmap cmap hmap cmap hmap cmap hmap
insert: 25 6 26 5 25 5 25 5
iterate: 1 3 1 3 1 3 2 3
search: 12 57 27 219 164 735 10 5
destroy: 5 3 6 3 2 1 6 4
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
2014-05-20 16:51:42 -07:00
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
OVSTEST_REGISTER("test-cmap", test_cmap_main);
|