
dpif-netdev: Allow PMD auto load balance with cross-numa.

Previously, auto load balance did not trigger a reassignment when
there was any cross-numa polling, because an rxq could be polled from a
different numa after reassignment, which could skew the estimates.

In the case where there is only one numa with pmds available, the
same numa will always poll before and after reassignment, so estimates
are valid. Allow PMD auto load balance to trigger a reassignment in
this case.

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
Tested-by: Sunil Pai G <sunil.pai.g@intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Authored by Kevin Traynor on 2021-03-18 11:34:04 +00:00; committed by Ilya Maximets
parent edcfd7176f
commit ec68a877db
3 changed files with 22 additions and 4 deletions
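
To make the new condition concrete, the following is a minimal, self-contained
C sketch of the decision the patch implements. It is illustrative only: it does
not use OVS data structures, and the struct and function names in it are
hypothetical. The idea is that a dry-run estimate for an rxq with no usable PMD
on its own NUMA node is only trusted when exactly one NUMA node has PMDs, since
that node necessarily polls the rxq both before and after reassignment.

/* Hypothetical, simplified model of what the dry run knows about PMDs. */
#include <stdbool.h>
#include <stdio.h>

struct pmd_numa_summary {
    bool local_numa_has_pmd;   /* Non-isolated PMD on the rxq's own NUMA node? */
    int numa_nodes_with_pmds;  /* NUMA nodes that have non-isolated PMDs. */
};

/* The dry-run estimate for an rxq is trusted if the rxq can be polled on its
 * local NUMA node, or if exactly one NUMA node has PMDs, because that same
 * node must poll the rxq both before and after any reassignment. */
static bool
dry_run_estimate_valid(const struct pmd_numa_summary *s)
{
    return s->local_numa_has_pmd || s->numa_nodes_with_pmds == 1;
}

int
main(void)
{
    struct pmd_numa_summary one_remote_numa = { false, 1 };
    struct pmd_numa_summary two_remote_numas = { false, 2 };

    printf("one remote NUMA node:  %s\n",
           dry_run_estimate_valid(&one_remote_numa) ? "estimate valid" : "abort");
    printf("two remote NUMA nodes: %s\n",
           dry_run_estimate_valid(&two_remote_numas) ? "estimate valid" : "abort");
    return 0;
}

In the real code below, rr_numa_list_count(&rr) == 1 plays the role of
numa_nodes_with_pmds == 1, and a failed rr_numa_list_lookup() plays the role of
local_numa_has_pmd being false.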

Documentation/topics/dpdk/pmd.rst

@@ -239,7 +239,9 @@ If not set, the default variance improvement threshold is 25%.
 PMD Auto Load Balancing doesn't currently work if queues are assigned
 cross NUMA as actual processing load could get worse after assignment
-as compared to what dry run predicts.
+as compared to what dry run predicts. The only exception is when all
+PMD threads are running on cores from a single NUMA node. In this case
+Auto Load Balancing is still possible.
 The minimum time between 2 consecutive PMD auto load balancing iterations can
 also be configured by::

NEWS

@@ -2,6 +2,9 @@ Post-v2.15.0
 ---------------------
    - In ovs-vsctl and vtep-ctl, the "find" command now accept new
      operators {in} and {not-in}.
+   - Userspace datapath:
+     * Auto load balancing of PMDs now partially supports cross-NUMA polling
+       cases, e.g if all PMD threads are running on the same NUMA node.
    - ovs-ctl:
      * New option '--no-record-hostname' to disable hostname configuration
        in ovsdb on startup.

lib/dpif-netdev.c

@@ -4887,6 +4887,12 @@ struct rr_numa {
     bool idx_inc;
 };
 
+static size_t
+rr_numa_list_count(struct rr_numa_list *rr)
+{
+    return hmap_count(&rr->numas);
+}
+
 static struct rr_numa *
 rr_numa_list_lookup(struct rr_numa_list *rr, int numa_id)
 {
@@ -5599,10 +5605,17 @@ get_dry_run_variance(struct dp_netdev *dp, uint32_t *core_list,
     for (int i = 0; i < n_rxqs; i++) {
         int numa_id = netdev_get_numa_id(rxqs[i]->port->netdev);
         numa = rr_numa_list_lookup(&rr, numa_id);
+        /* If there is no available pmd on the local numa but there is only one
+         * numa for cross-numa polling, we can estimate the dry run. */
+        if (!numa && rr_numa_list_count(&rr) == 1) {
+            numa = rr_numa_list_next(&rr, NULL);
+        }
         if (!numa) {
-            /* Abort if cross NUMA polling. */
-            VLOG_DBG("PMD auto lb dry run."
-                     " Aborting due to cross-numa polling.");
+            VLOG_DBG("PMD auto lb dry run: "
+                     "There's no available (non-isolated) PMD thread on NUMA "
+                     "node %d for port '%s' and there are PMD threads on more "
+                     "than one NUMA node available for cross-NUMA polling. "
+                     "Aborting.", numa_id, netdev_rxq_get_name(rxqs[i]->rx));
             goto cleanup;
         }
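
A note on the design of the hunk above: the new rr_numa_list_count() helper is
a thin wrapper around hmap_count() on the rr->numas hmap, and the pre-existing
rr_numa_list_next(&rr, NULL) call walks that same hmap and yields its first,
and here only, NUMA entry. The fallback therefore only borrows a remote NUMA
node when there is no ambiguity about which node would poll the rxq after
reassignment; otherwise the dry run still aborts, now with the more detailed
debug message introduced here.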