
lib/cmap: cmap_find_batch().

Batching the cmap find improves the memory behavior with large cmaps
and can make searches twice as fast:

$ tests/ovstest test-cmap benchmark 2000000 8 0.1 16
Benchmarking with n=2000000, 8 threads, 0.10% mutations, batch size 16:
cmap insert:    533 ms
cmap iterate:    57 ms
batch search:   146 ms
cmap destroy:   233 ms

cmap insert:    552 ms
cmap iterate:    56 ms
cmap search:    299 ms
cmap destroy:   229 ms

hmap insert:    222 ms
hmap iterate:   198 ms
hmap search:   2061 ms
hmap destroy:   209 ms

Batch size 1 has a small performance penalty, but all other batch sizes
are faster than the non-batched cmap_find().  A batch size of 16 was
experimentally found to be better than 8 or 32, so
classifier_lookup_miniflow_batch() now performs the cmap find operations
in batches of 16.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Jarno Rajahalme
2014-09-24 10:39:20 -07:00
parent 55847abee8
commit 52a524eb20
7 changed files with 366 additions and 56 deletions


@@ -114,4 +114,13 @@ bool bitmap_is_all_zeros(const unsigned long *, size_t n);
    for ((IDX) = bitmap_scan(BITMAP, 1, 0, SIZE); (IDX) < (SIZE);     \
         (IDX) = bitmap_scan(BITMAP, 1, (IDX) + 1, SIZE))

/* More efficient access to a map of single ulong. */
#define ULONG_FOR_EACH_1(IDX, MAP)                        \
    for (unsigned long map__ = (MAP);                     \
         map__ && (((IDX) = raw_ctz(map__)), true);       \
         map__ = zero_rightmost_1bit(map__))

#define ULONG_SET0(MAP, OFFSET) ((MAP) &= ~(1UL << (OFFSET)))
#define ULONG_SET1(MAP, OFFSET) ((MAP) |= 1UL << (OFFSET))
#endif /* bitmap.h */