dpif-netdev: Add specialized generic scalar functions

This commit adds a number of specialized functions that handle
common miniflow fingerprints. This enables compiler optimizations,
resulting in higher performance. A quick description of how the
optimization works follows.

"Specialized functions" are "instances" of the generic implementation,
but the compiler is given extra context when compiling them. When
iterating miniflow data structures, the most useful value for
compile-time optimization is the loop trip count per unit.

To create a specialized function, start from a generic implementation
that uses a for() loop whose trip count is not known to the compiler
at compile time. The loop trip count is passed in as an argument to
the function:

    uint32_t miniflow_impl_generic(struct miniflow *mf, uint32_t loop_count)
    {
        for (uint32_t i = 0; i < loop_count; i++) {
            // do work
        }
    }

To "specialize" the function, call the generic implementation with
hard-coded numbers. These are compile-time constants:

    uint32_t miniflow_impl_loop5(struct miniflow *mf, uint32_t loop_count)
    {
        // use hard-coded constant for compile-time constant propagation
        return miniflow_impl_generic(mf, 5);
    }

Because the compiler knows the loop trip count at compile time, it can
perform an optimization known as "constant propagation". Combined with
inlining of the miniflow_impl_generic() function, the compiler can
unroll the loop 5x at *compile time* and produce "flat" code.

The last step in using the specialized functions is to choose the
specialized (or generic) implementation through a function pointer.
The function pointer is selected at subtable creation time, when the
miniflow fingerprint of the subtable is known. This technique is known
as "multiple dispatch" in some literature, as it uses multiple items of
information (the miniflow bit counts) to select the dispatch function.

By pointing the function pointer at the optimized implementation, OvS
benefits from the compile-time optimizations at runtime.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
Tested-by: Malvika Gupta <malvika.gupta@arm.com>
Acked-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
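
The specialization pattern described in the commit message can be
sketched outside of OvS as a small standalone example. This is an
illustration only, not the code added by this commit: sum_fn,
sum_impl(), sum_generic(), sum_count4(), sum_count8() and sum_probe()
are hypothetical names, and summing an array stands in for the real
miniflow work. It assumes the compiler inlines sum_impl() into each
wrapper, which is typical at -O2 but not guaranteed.

```c
#include <stdint.h>

/* Hypothetical function-pointer type; every variant shares this signature
 * so a caller can store any of them in the same pointer. */
typedef uint32_t (*sum_fn)(const uint32_t *values, uint32_t count);

/* Generic implementation: the loop trip count is only known at runtime,
 * unless this function is inlined into a caller that passes a constant. */
static inline uint32_t
sum_impl(const uint32_t *values, uint32_t count)
{
    uint32_t total = 0;

    for (uint32_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}

/* Fallback variant: forwards the runtime trip count unchanged. */
static uint32_t
sum_generic(const uint32_t *values, uint32_t count)
{
    return sum_impl(values, count);
}

/* Specialized variants: the literal trip count is a compile-time constant,
 * so after inlining the compiler is free to unroll the loop fully. */
static uint32_t
sum_count4(const uint32_t *values, uint32_t count)
{
    (void) count; /* Ignored; the constant below is used instead. */
    return sum_impl(values, 4);
}

static uint32_t
sum_count8(const uint32_t *values, uint32_t count)
{
    (void) count; /* Ignored; the constant below is used instead. */
    return sum_impl(values, 8);
}

/* Probe: called once at setup time to pick the best available variant
 * for a given trip count. */
sum_fn
sum_probe(uint32_t count)
{
    switch (count) {
    case 4:
        return sum_count4;
    case 8:
        return sum_count8;
    default:
        return sum_generic;
    }
}
```

A caller would invoke sum_probe() once, when the trip count becomes
known, store the returned pointer, and call through it on the hot path.
The cost of selecting a variant is paid at setup rather than per call,
which mirrors selecting the lookup function at subtable creation time.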
2025-09-05 08:45:23 +00:00 · 2019-07-18 14:03:06 +01:00
parent a0b36b3924
commit f54d8f004f
4 changed files with 68 additions and 2 deletions
--- a/lib/dpif-netdev-private.h
+++ b/lib/dpif-netdev-private.h
@@ -67,6 +67,14 @@ dpcls_subtable_lookup_generic(struct dpcls_subtable *subtable,
                              const struct netdev_flow_key *keys[],
                              struct dpcls_rule **rules);

+/* Probe function to select a specialized version of the generic lookup
+ * implementation. This provides performance benefit due to compile-time
+ * optimizations such as loop-unrolling. These are enabled by the compile-time
+ * constants in the specific function implementations.
+ */
+dpcls_subtable_lookup_func
+dpcls_subtable_generic_probe(uint32_t u0_bit_count, uint32_t u1_bit_count);
+
 /* A set of rules that all have the same fields wildcarded. */
 struct dpcls_subtable {
    /* The fields are only used by writers. */