
netdev-dpdk: Add shared mempool config.

Mempools may currently be shared between DPDK ports based
on port MTU and NUMA. With a hint from the user we can
increase sharing across different MTUs and hence reduce
memory consumption in many cases.

For example, a port with MTU 9000, uses a mempool with an
mbuf size based on 9000 MTU. A port with MTU 1500, uses a
different mempool with an mbuf size based on 1500 MTU.

In this case, assuming same NUMA, both these ports could
share the 9000 MTU mempool.

The user must give a hint because the order in which ports are
created and MTUs are set may vary, and we need to ensure that
upgrades from older OVS versions do not require more memory.

This scheme can also prevent multiple mempools being created
for cases where a port is added picking up a default MTU and
an appropriate mempool, but later has its MTU changed to a
different value requiring a different mempool.

Example usage:

 $ ovs-vsctl --no-wait set Open_vSwitch . \
   other_config:shared-mempool-config=9000,1500:1,6000:1

Port added on NUMA 0:
* MTU 1500, use mempool based on 9000 MTU
* MTU 5000, use mempool based on 9000 MTU
* MTU 9000, use mempool based on 9000 MTU
* MTU 9300, use mempool based on 9300 MTU (existing behaviour)

Port added on NUMA 1:
* MTU 1500, use mempool based on 1500 MTU
* MTU 5000, use mempool based on 6000 MTU
* MTU 9000, use mempool based on 9000 MTU
* MTU 9300, use mempool based on 9300 MTU (existing behaviour)

Default behaviour is unchanged and mempools are still only created
when needed.
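
The list is only parsed when the DPDK layer initialises (see the
netdev_dpdk_register() change below), so a rough usage sketch, not part of
this patch, is to set the option above before enabling DPDK and then check
which mempool a port ends up with. 'dpdk0' is a placeholder port name; the
inspection uses the existing netdev-dpdk/get-mempool-info appctl command.

 $ ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
 $ ovs-appctl netdev-dpdk/get-mempool-info dpdk0  # 'dpdk0' is a placeholder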

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
Kevin Traynor 2022-06-24 11:13:23 +01:00 committed by Ian Stokes
parent eacc544c4d
commit 3757e9f8e9
6 changed files with 195 additions and 5 deletions

Documentation/topics/dpdk/memory.rst

@@ -213,3 +213,47 @@ Example 3: (2 rxq, 2 PMD, 9000 MTU)
Number of mbufs = (2 * 2048) + (3 * 2048) + (1 * 32) + (16384) = 26656
Mbuf size = 10176 Bytes
Memory required = 26656 * 10176 = 271 MB
Shared Mempool Configuration
----------------------------
To increase sharing of mempools, a user can configure the MTUs that
mempools are based on by using ``shared-mempool-config``.
An MTU configured by the user is adjusted to an mbuf size used for mempool
creation and stored. If a port is subsequently added that has an MTU which can
be accommodated by this mbuf size, that size will be used for mempool creation/reuse.
This can increase sharing by consolidating mempools for ports with different
MTUs that would otherwise use separate mempools. It can also avoid an extra
mempool being created after a port is added but before its MTU is changed to a
different value.
For example, on a 2 NUMA system::
$ ovs-vsctl --no-wait set Open_vSwitch . \
other_config:shared-mempool-config=9000,1500:1,6000:1
In this case, OVS stores the mbuf sizes based on the following MTUs.
* NUMA 0: 9000
* NUMA 1: 1500, 6000, 9000
Ports added will use mempools with the mbuf sizes based on the above MTUs where
possible. If more than one is suitable, the one closest to the port MTU will
be selected.
Port added on NUMA 0:
* MTU 1500, use mempool based on 9000 MTU
* MTU 6000, use mempool based on 9000 MTU
* MTU 9000, use mempool based on 9000 MTU
* MTU 9300, use mempool based on 9300 MTU (existing behaviour)
Port added on NUMA 1:
* MTU 1500, use mempool based on 1500 MTU
* MTU 6000, use mempool based on 6000 MTU
* MTU 9000, use mempool based on 9000 MTU
* MTU 9300, use mempool based on 9300 MTU (existing behaviour)
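
To make the adjustment described above concrete, the following standalone
sketch (not part of the patch) computes the per-mbuf data room for a given
MTU and the adjusted MTU that ports are matched against. The macro names
mirror those used in netdev-dpdk.c, but the constant values here are
assumptions for illustration and may differ in a given build; the "Mbuf size"
figures earlier in this document appear to additionally account for per-mbuf
metadata, which this sketch ignores.

#include <stdio.h>

#define ETHER_HDR_LEN          14    /* Assumed Ethernet header length. */
#define ETHER_CRC_LEN          4     /* Assumed CRC length. */
#define VLAN_HEADER_LEN        4     /* Assumed 802.1Q tag length. */
#define RTE_PKTMBUF_HEADROOM   128   /* Assumed DPDK default headroom. */
#define NETDEV_DPDK_MBUF_ALIGN 1024  /* Assumed data room alignment. */

#define ROUND_UP(x, y) ((((x) + (y) - 1) / (y)) * (y))

/* Largest frame an MTU can produce: L2 header + CRC + two VLAN tags. */
#define MTU_TO_MAX_FRAME_LEN(mtu) \
    ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN + 2 * VLAN_HEADER_LEN)
#define FRAME_LEN_TO_MTU(len) ((len) - ETHER_HDR_LEN - ETHER_CRC_LEN)

/* Per-mbuf data room for a given MTU (mirrors dpdk_buf_size()). */
static int
buf_size(int mtu)
{
    return ROUND_UP(MTU_TO_MAX_FRAME_LEN(mtu), NETDEV_DPDK_MBUF_ALIGN)
           + RTE_PKTMBUF_HEADROOM;
}

int
main(void)
{
    int mtus[] = { 1500, 6000, 9000, 9300 };

    for (size_t i = 0; i < sizeof mtus / sizeof mtus[0]; i++) {
        int sz = buf_size(mtus[i]);

        /* The adjusted MTU is what sharing is decided on: a port can use a
         * configured mempool if the port's own adjusted MTU is not larger. */
        printf("MTU %4d -> data room %5d, adjusted MTU %5d\n",
               mtus[i], sz, FRAME_LEN_TO_MTU(sz));
    }
    return 0;
}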

NEWS

@@ -47,6 +47,9 @@ Post-v2.17.0
* Delay creating or reusing a mempool for vhost ports until the VM
is started. A failure to create a mempool will now be logged only
when the VM is started.
* New configuration knob 'other_config:shared-mempool-config' to set the
MTUs that shared mempool mbuf sizes are based on. This allows interfaces
with different MTUs to share mempools.
- Userspace datapath:
* Improved multi-thread scalability of the userspace connection tracking.
* 'dpif-netdev/subtable-lookup-prio-get' appctl command renamed to

lib/dpdk.c

@@ -518,7 +518,7 @@ dpdk_init__(const struct smap *ovs_other_config)
RTE_PER_LCORE(_lcore_id) = NON_PMD_CORE_ID;
/* Finally, register the dpdk classes */
netdev_dpdk_register();
netdev_dpdk_register(ovs_other_config);
netdev_register_flow_api_provider(&netdev_offload_dpdk);
return true;
}

lib/netdev-dpdk.c

@@ -53,6 +53,7 @@
#include "openvswitch/dynamic-string.h"
#include "openvswitch/list.h"
#include "openvswitch/match.h"
#include "openvswitch/ofp-parse.h"
#include "openvswitch/ofp-print.h"
#include "openvswitch/shash.h"
#include "openvswitch/vlog.h"
@@ -370,7 +371,15 @@ struct dpdk_mp {
int socket_id;
int refcount;
struct ovs_list list_node OVS_GUARDED_BY(dpdk_mp_mutex);
};
};
struct user_mempool_config {
int adj_mtu;
int socket_id;
};
static struct user_mempool_config *user_mempools = NULL;
static int n_user_mempools;
/* There should be one 'struct dpdk_tx_queue' created for
* each netdev tx queue. */
@@ -572,6 +581,44 @@ dpdk_buf_size(int mtu)
+ RTE_PKTMBUF_HEADROOM;
}
static int
dpdk_get_user_adjusted_mtu(int port_adj_mtu, int port_mtu, int port_socket_id)
{
int best_adj_user_mtu = INT_MAX;
for (unsigned i = 0; i < n_user_mempools; i++) {
int user_adj_mtu, user_socket_id;
user_adj_mtu = user_mempools[i].adj_mtu;
user_socket_id = user_mempools[i].socket_id;
if (port_adj_mtu > user_adj_mtu
|| (user_socket_id != INT_MAX
&& user_socket_id != port_socket_id)) {
continue;
}
if (user_adj_mtu < best_adj_user_mtu) {
/* This is the lowest valid user MTU found so far. */
best_adj_user_mtu = user_adj_mtu;
if (best_adj_user_mtu == port_adj_mtu) {
/* Found an exact fit, no need to keep searching. */
break;
}
}
}
if (best_adj_user_mtu == INT_MAX) {
VLOG_DBG("No user configured shared mempool mbuf sizes found "
"suitable for port with MTU %d, NUMA %d.", port_mtu,
port_socket_id);
best_adj_user_mtu = port_adj_mtu;
} else {
VLOG_DBG("Found user configured shared mempool with mbufs "
"of size %d, suitable for port with MTU %d, NUMA %d.",
MTU_TO_FRAME_LEN(best_adj_user_mtu), port_mtu,
port_socket_id);
}
return best_adj_user_mtu;
}
/* Allocates an area of 'sz' bytes from DPDK. The memory is zero'ed.
*
* Unlike xmalloc(), this function can return NULL on failure. */
@@ -795,6 +842,10 @@ dpdk_mp_get(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
/* Check if shared memory is being used, if so check existing mempools
* to see if reuse is possible. */
if (!per_port_mp) {
/* If the user has defined shared mempools, check if one is suitable
* and get the new buffer size. */
mtu = dpdk_get_user_adjusted_mtu(mtu, dev->requested_mtu,
dev->requested_socket_id);
LIST_FOR_EACH (dmp, list_node, &dpdk_mp_list) {
if (dmp->socket_id == dev->requested_socket_id
&& dmp->mtu == mtu) {
@@ -5337,6 +5388,56 @@ netdev_dpdk_rte_flow_tunnel_item_release(struct netdev *netdev,
#endif /* ALLOW_EXPERIMENTAL_API */
static void
parse_user_mempools_list(const char *mtus)
{
char *list, *copy, *key, *value;
int error = 0;
if (!mtus) {
return;
}
n_user_mempools = 0;
list = copy = xstrdup(mtus);
while (ofputil_parse_key_value(&list, &key, &value)) {
int socket_id, mtu, adj_mtu;
if (!str_to_int(key, 0, &mtu) || mtu < 0) {
error = EINVAL;
VLOG_WARN("Invalid user configured shared mempool MTU.");
break;
}
if (!str_to_int(value, 0, &socket_id)) {
/* No socket specified. It will apply to all NUMA nodes. */
socket_id = INT_MAX;
} else if (socket_id < 0) {
error = EINVAL;
VLOG_WARN("Invalid user configured shared mempool NUMA.");
break;
}
user_mempools = xrealloc(user_mempools, (n_user_mempools + 1) *
sizeof(struct user_mempool_config));
adj_mtu = FRAME_LEN_TO_MTU(dpdk_buf_size(mtu));
user_mempools[n_user_mempools].adj_mtu = adj_mtu;
user_mempools[n_user_mempools].socket_id = socket_id;
n_user_mempools++;
VLOG_INFO("User configured shared mempool set for: MTU %d, NUMA %s.",
mtu, socket_id == INT_MAX ? "ALL" : value);
}
if (error) {
VLOG_WARN("User configured shared mempools will not be used.");
n_user_mempools = 0;
free(user_mempools);
user_mempools = NULL;
}
free(copy);
}
#define NETDEV_DPDK_CLASS_COMMON \
.is_pmd = true, \
.alloc = netdev_dpdk_alloc, \
@@ -5420,8 +5521,12 @@ static const struct netdev_class dpdk_vhost_client_class = {
};
void
netdev_dpdk_register(void)
netdev_dpdk_register(const struct smap *ovs_other_config)
{
const char *mempoolcfg = smap_get(ovs_other_config,
"shared-mempool-config");
parse_user_mempools_list(mempoolcfg);
netdev_register_provider(&dpdk_class);
netdev_register_provider(&dpdk_vhost_class);
netdev_register_provider(&dpdk_vhost_client_class);

lib/netdev-dpdk.h

@@ -20,6 +20,7 @@
#include <config.h>
#include "openvswitch/compiler.h"
#include "smap.h"
struct dp_packet;
struct netdev;
@@ -28,7 +29,7 @@ struct netdev;
#include <rte_flow.h>
void netdev_dpdk_register(void);
void netdev_dpdk_register(const struct smap *);
void free_dpdk_buf(struct dp_packet *);
bool netdev_dpdk_flow_api_supported(struct netdev *);
@@ -150,7 +151,7 @@ netdev_dpdk_rte_flow_tunnel_item_release(
#else
static inline void
netdev_dpdk_register(void)
netdev_dpdk_register(const struct smap *ovs_other_config OVS_UNUSED)
{
/* Nothing */
}

vswitchd/vswitch.xml

@@ -490,6 +490,43 @@
</p>
</column>
<column name="other_config" key="shared-mempool-config">
<p>Specifies the DPDK shared mempool configuration.</p>
<p>Value should be set in the following form:</p>
<p>
<code>other_config:shared-mempool-config=&lt;
user-shared-mempool-mtu-list&gt;</code>
</p>
<p>where</p>
<p>
<ul>
<li>
&lt;user-shared-mempool-mtu-list&gt; ::=
NULL | &lt;non-empty-list&gt;
</li>
<li>
&lt;non-empty-list&gt; ::= &lt;user-mtus&gt; |
&lt;user-mtus&gt; ,
&lt;non-empty-list&gt;
</li>
<li>
&lt;user-mtus&gt; ::= &lt;mtu-all-socket&gt; |
&lt;mtu-socket-pair&gt;
</li>
<li>
&lt;mtu-all-socket&gt; ::= &lt;mtu&gt;
</li>
<li>
&lt;mtu-socket-pair&gt; ::= &lt;mtu&gt; : &lt;socket-id&gt;
</li>
</ul>
</p>
<p>
Changing this value requires restarting the daemon if dpdk-init has
already been set to true.
</p>
</column>
<column name="other_config" key="tx-flush-interval"
type='{"type": "integer",
"minInteger": 0, "maxInteger": 1000000}'>