mirror of https://github.com/openvswitch/ovs synced 2025-10-11 13:57:52 +00:00

lib/cmap: cmap_find_batch().

Batching the cmap find improves the memory behavior with large cmaps
and can make searches twice as fast:

$ tests/ovstest test-cmap benchmark 2000000 8 0.1 16
Benchmarking with n=2000000, 8 threads, 0.10% mutations, batch size 16:
cmap insert:    533 ms
cmap iterate:    57 ms
batch search:   146 ms
cmap destroy:   233 ms

cmap insert:    552 ms
cmap iterate:    56 ms
cmap search:    299 ms
cmap destroy:   229 ms

hmap insert:    222 ms
hmap iterate:   198 ms
hmap search:   2061 ms
hmap destroy:   209 ms

Batch size 1 has a small performance penalty, but all other batch sizes
are faster than the non-batched cmap_find().  A batch size of 16 was
experimentally found to perform better than 8 or 32, so
classifier_lookup_miniflow_batch() now performs the cmap find operations
in batches of 16.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
This commit is contained in:
Jarno Rajahalme
2014-09-24 10:39:20 -07:00
parent 55847abee8
commit 52a524eb20
7 changed files with 366 additions and 56 deletions

@@ -2644,7 +2644,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
     enum { PKT_ARRAY_SIZE = NETDEV_MAX_RX_BATCH };
 #endif
     struct packet_batch batches[PKT_ARRAY_SIZE];
-    const struct miniflow *mfs[PKT_ARRAY_SIZE]; /* NULL at bad packets. */
+    const struct miniflow *mfs[PKT_ARRAY_SIZE]; /* May NOT be NULL. */
     struct cls_rule *rules[PKT_ARRAY_SIZE];
     struct dp_netdev *dp = pmd->dp;
     struct emc_cache *flow_cache = &pmd->flow_cache;
@@ -2652,7 +2652,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
     bool any_miss;
     for (i = 0; i < cnt; i++) {
-        mfs[i] = &keys[i].flow;
+        mfs[i] = &keys[i].flow; /* No bad packets! */
     }
     any_miss = !classifier_lookup_miniflow_batch(&dp->cls, mfs, rules, cnt);
     if (OVS_UNLIKELY(any_miss) && !fat_rwlock_tryrdlock(&dp->upcall_rwlock)) {