Memory Pool in DPDK (DPDK Series, Part 25)

1. Memory pool

There is not much mystery in a memory pool: almost every framework has one. No framework can avoid using memory, and calling new or malloc directly for every allocation is not just inelegant; the real problem is that memory is genuinely hard to manage, and pool management has to be tuned to the application scenario. A pool with a single fixed object size is the entry-level version; multiple sizes that adapt dynamically is the advanced version; dynamic generation combined with static group management is the expert version.
In short, the goal is to strike a balance between efficiency and resource usage across allocation, management and reclamation. Different scenarios strike that balance differently and emphasize different things, so practical requirements should always drive the design.
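
As a reference point for what follows, here is a minimal sketch of the "entry-level" single-size pool in plain C. It is not DPDK code; every name in it (simple_pool, simple_pool_create, and so on) is made up for illustration:

#include <stdlib.h>
#include <stddef.h>

struct simple_pool {
	size_t obj_size;   /* fixed size of every object */
	void  *free_list;  /* singly linked list of free slots */
	void  *base;       /* backing memory, released with the pool */
};

static struct simple_pool *
simple_pool_create(size_t obj_size, size_t n)
{
	/* each slot must at least hold the "next" pointer of the free list */
	if (obj_size < sizeof(void *))
		obj_size = sizeof(void *);

	struct simple_pool *p = malloc(sizeof(*p));
	if (p == NULL)
		return NULL;
	p->obj_size = obj_size;
	p->base = malloc(obj_size * n);
	if (p->base == NULL) {
		free(p);
		return NULL;
	}

	/* thread every slot onto the free list */
	p->free_list = NULL;
	for (size_t i = 0; i < n; i++) {
		void *obj = (char *)p->base + i * obj_size;
		*(void **)obj = p->free_list;
		p->free_list = obj;
	}
	return p;
}

static void *
simple_pool_get(struct simple_pool *p)
{
	void *obj = p->free_list;
	if (obj != NULL)
		p->free_list = *(void **)obj; /* pop */
	return obj;
}

static void
simple_pool_put(struct simple_pool *p, void *obj)
{
	*(void **)obj = p->free_list;         /* push */
	p->free_list = obj;
}

Allocation and release are just a pointer pop and push, which is where the speed of a pool comes from; what DPDK adds on top of this idea is mainly about doing it safely and quickly across many cores.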

2. Memory pool in DPDK

As a manager of fixed-size allocations, the mempool exists to allocate and reclaim memory quickly. Since the required size usually cannot be determined exactly, a memory pool tends to waste a small amount of memory; that is the accepted compromise. DPDK's memory pool is implemented in three parts:
1. The mempool node object. These objects are stored on a global queue and can be looked up by a unique name. The node itself is only a control structure, not the actual memory area.
2. The actual memory storage area: contiguous memory reserved through rte_memzone, used to hold the pool's objects.
3. The ring lock-free queue. Being lock-free, it is safe under multithreading and is used to manage the mempool objects.
The memory pool as a whole reaches its objects through this lock-free ring. At the same time, to reduce multi-core contention, a per-core local_cache object buffer is introduced to minimize concurrent access to the ring by multiple cores.
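
Before going into the source, here is a hedged sketch of how an application typically drives the mempool API that the rest of this article walks through. The pool name and the sizes are arbitrary illustration values, and rte_eal_init() is assumed to have run already:

#include <rte_lcore.h>
#include <rte_mempool.h>

/* Create a pool of 4096 fixed-size (256-byte) objects on the caller's
 * socket with a 32-object per-core cache, then take and return one object. */
static int example_mempool_usage(void)
{
	struct rte_mempool *mp;
	void *obj;

	mp = rte_mempool_create("example_pool", 4096, 256, 32,
				0,          /* no private data area      */
				NULL, NULL, /* no pool constructor       */
				NULL, NULL, /* no per-object initializer */
				rte_socket_id(), 0);
	if (mp == NULL)
		return -1;

	if (rte_mempool_get(mp, &obj) == 0) { /* served from cache or ring */
		/* ... use the object ... */
		rte_mempool_put(mp, obj);     /* goes back to the cache */
	}

	rte_mempool_free(mp);
	return 0;
}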

3. Data structure and source code

The data structure of the memory pool, rte_mempool, was briefly analyzed earlier in this series, so only the relevant code snippet is given here:

struct rte_mempool {
	char name[RTE_MEMZONE_NAMESIZE]; /* name of the mempool */
	RTE_STD_C11
	union {
		void *pool_data;             /* ring or pool used to store objects */
		uint64_t pool_id;            /* identifier for an external mempool manager */
	};
	void *pool_config;               /* optional arguments for the ops alloc callback */
.......
#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
	/* per-lcore statistics, compiled in only for debug builds */
	struct rte_mempool_debug_stats stats[RTE_MAX_LCORE];
#endif
}  __rte_cache_aligned;
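
Because every pool is registered on a global list under the name[] field above, it can later be retrieved by name. A small hedged example ("example_pool" is the illustration name used earlier):

#include <stdio.h>
#include <rte_errno.h>
#include <rte_mempool.h>

/* Look up a previously created pool by the name stored in mp->name. */
static struct rte_mempool *find_example_pool(void)
{
	struct rte_mempool *mp = rte_mempool_lookup("example_pool");
	if (mp == NULL)
		printf("pool not found, rte_errno=%d\n", rte_errno);
	return mp;
}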

1. Create

/* create the mempool */
struct rte_mempool *
rte_mempool_create(const char * name, unsigned n, unsigned elt_size,
	unsigned cache_size, unsigned private_data_size,
	rte_mempool_ctor_t * mp_init, void * mp_init_arg,
	rte_mempool_obj_cb_t * obj_init, void * obj_init_arg,
	int socket_id, unsigned flags)
{
	int ret;
	struct rte_mempool * mp;

	mp = rte_mempool_create_empty(name, n, elt_size, cache_size,
		private_data_size, socket_id, flags);
	if (mp == NULL)
		return NULL;

	/*
	 * Since we have 4 combinations of the SP/SC/MP/MC examine the flags to
	 * set the correct index into the table of ops structs.
	 */
	if ((flags & MEMPOOL_F_SP_PUT) && (flags & MEMPOOL_F_SC_GET))
		ret = rte_mempool_set_ops_byname(mp, "ring_sp_sc", NULL);
	else if (flags & MEMPOOL_F_SP_PUT)
		ret = rte_mempool_set_ops_byname(mp, "ring_sp_mc", NULL);
	else if (flags & MEMPOOL_F_SC_GET)
		ret = rte_mempool_set_ops_byname(mp, "ring_mp_sc", NULL);
	else
		ret = rte_mempool_set_ops_byname(mp, "ring_mp_mc", NULL);

	if (ret)
		goto fail;

	/* call the mempool priv initializer */
	if (mp_init)
		mp_init(mp, mp_init_arg);

	if (rte_mempool_populate_default(mp) < 0)
		goto fail;

	/* call the object initializers */
	if (obj_init)
		rte_mempool_obj_iter(mp, obj_init, obj_init_arg);

	return mp;

 fail:
	rte_mempool_free(mp);
	return NULL;
}
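
As a hedged illustration of the flag handling above, the following call ends up with the "ring_sp_sc" ops, i.e. a single-producer/single-consumer ring. The pool name and sizes are arbitrary:

#include <rte_lcore.h>
#include <rte_mempool.h>

/* A pool intended for exactly one producing and one consuming thread,
 * created without a per-core cache. */
static struct rte_mempool *create_spsc_pool(void)
{
	return rte_mempool_create("spsc_pool", 1024, 128,
				  0, 0,       /* no cache, no private data */
				  NULL, NULL, NULL, NULL,
				  rte_socket_id(),
				  MEMPOOL_F_SP_PUT | MEMPOOL_F_SC_GET);
}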

rte_mempool_create() builds on the rte_mempool_create_empty() function, shown below together with the small rte_mempool_cache_free() helper from the same file:

/*
 * Free a cache. It's the responsibility of the user to make sure that any
 * remaining objects in the cache are flushed to the corresponding
 * mempool.
 */
void
rte_mempool_cache_free(struct rte_mempool_cache *cache)
{
	rte_free(cache);
}

/* create an empty mempool */
struct rte_mempool *
rte_mempool_create_empty(const char *name, unsigned n, unsigned elt_size,
	unsigned cache_size, unsigned private_data_size,
	int socket_id, unsigned flags)
{
	char mz_name[RTE_MEMZONE_NAMESIZE];
	struct rte_mempool_list *mempool_list;
	struct rte_mempool *mp = NULL;
	struct rte_tailq_entry *te = NULL;
	const struct rte_memzone *mz = NULL;
	size_t mempool_size;
	unsigned int mz_flags = RTE_MEMZONE_1GB|RTE_MEMZONE_SIZE_HINT_ONLY;
	struct rte_mempool_objsz objsz;
	unsigned lcore_id;
	int ret;

	/* compilation-time checks */
	RTE_BUILD_BUG_ON((sizeof(struct rte_mempool) &
			  RTE_CACHE_LINE_MASK) != 0);
	RTE_BUILD_BUG_ON((sizeof(struct rte_mempool_cache) &
			  RTE_CACHE_LINE_MASK) != 0);
#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
	RTE_BUILD_BUG_ON((sizeof(struct rte_mempool_debug_stats) &
			  RTE_CACHE_LINE_MASK) != 0);
	RTE_BUILD_BUG_ON((offsetof(struct rte_mempool, stats) &
			  RTE_CACHE_LINE_MASK) != 0);
#endif

	mempool_list = RTE_TAILQ_CAST(rte_mempool_tailq.head, rte_mempool_list);

	/* asked for zero items */
	if (n == 0) {
		rte_errno = EINVAL;
		return NULL;
	}

	/* asked cache too big */
	if (cache_size > RTE_MEMPOOL_CACHE_MAX_SIZE ||
	    CALC_CACHE_FLUSHTHRESH(cache_size) > n) {
		rte_errno = EINVAL;
		return NULL;
	}

	/* "no cache align" imply "no spread" */
	if (flags & MEMPOOL_F_NO_CACHE_ALIGN)
		flags |= MEMPOOL_F_NO_SPREAD;

	/* calculate mempool object sizes. */
	if (!rte_mempool_calc_obj_size(elt_size, flags, &objsz)) {
		rte_errno = EINVAL;
		return NULL;
	}

	rte_mcfg_mempool_write_lock();

	/*
	 * reserve a memory zone for this mempool: private data is
	 * cache-aligned
	 */
	private_data_size = (private_data_size +
			     RTE_MEMPOOL_ALIGN_MASK) & (~RTE_MEMPOOL_ALIGN_MASK);


	/* try to allocate tailq entry */
	te = rte_zmalloc("MEMPOOL_TAILQ_ENTRY", sizeof(*te), 0);
	if (te == NULL) {
		RTE_LOG(ERR, MEMPOOL, "Cannot allocate tailq entry!\n");
		goto exit_unlock;
	}

	mempool_size = MEMPOOL_HEADER_SIZE(mp, cache_size);
	mempool_size += private_data_size;
	mempool_size = RTE_ALIGN_CEIL(mempool_size, RTE_MEMPOOL_ALIGN);

	ret = snprintf(mz_name, sizeof(mz_name), RTE_MEMPOOL_MZ_FORMAT, name);
	if (ret < 0 || ret >= (int)sizeof(mz_name)) {
		rte_errno = ENAMETOOLONG;
		goto exit_unlock;
	}

	mz = rte_memzone_reserve(mz_name, mempool_size, socket_id, mz_flags);
	if (mz == NULL)
		goto exit_unlock;

	/* init the mempool structure */
	mp = mz->addr;
	memset(mp, 0, MEMPOOL_HEADER_SIZE(mp, cache_size));
	ret = strlcpy(mp->name, name, sizeof(mp->name));
	if (ret < 0 || ret >= (int)sizeof(mp->name)) {
		rte_errno = ENAMETOOLONG;
		goto exit_unlock;
	}
	mp->mz = mz;
	mp->size = n;
	mp->flags = flags;
	mp->socket_id = socket_id;
	mp->elt_size = objsz.elt_size;
	mp->header_size = objsz.header_size;
	mp->trailer_size = objsz.trailer_size;
	/* Size of default caches, zero means disabled. */
	mp->cache_size = cache_size;
	mp->private_data_size = private_data_size;
	STAILQ_INIT(&mp->elt_list);
	STAILQ_INIT(&mp->mem_list);

	/*
	 * local_cache pointer is set even if cache_size is zero.
	 * The local_cache points to just past the elt_pa[] array.
	 */
	mp->local_cache = (struct rte_mempool_cache *)
		RTE_PTR_ADD(mp, MEMPOOL_HEADER_SIZE(mp, 0));

	/* Init all default caches. */
	if (cache_size != 0) {
		for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++)
			mempool_cache_init(&mp->local_cache[lcore_id],
					   cache_size);
	}

	te->data = mp;

	rte_mcfg_tailq_write_lock();
	TAILQ_INSERT_TAIL(mempool_list, te, next);
	rte_mcfg_tailq_write_unlock();
	rte_mcfg_mempool_write_unlock();

	return mp;

exit_unlock:
	rte_mcfg_mempool_write_unlock();
	rte_free(te);
	rte_mempool_free(mp);
	return NULL;
}

As mentioned earlier, the data structure definition of the memory pool has three important parts: the rte_mempool header, the rte_mempool_cache array and the mempool private data, all of which are created in the function above. The resulting entry is hung on the static variable rte_mempool_tailq (of type rte_tailq_elem).
The sizes of these three structures are then added up to obtain the mempool header size together with the cache area for all cores; see the layout sketch below.
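
A hedged sketch that prints this layout from an existing pool. It relies only on the MEMPOOL_HEADER_SIZE macro and the fields that appear in the code above, plus rte_mempool_get_priv() from the public API:

#include <stdio.h>
#include <rte_mempool.h>

/* The control memzone holds, in order: the rte_mempool header, the
 * per-core cache array (local_cache points just past the header), and
 * the cache-aligned private data area. */
static void dump_mempool_layout(struct rte_mempool *mp)
{
	printf("header + caches : %zu bytes\n",
	       (size_t)MEMPOOL_HEADER_SIZE(mp, mp->cache_size));
	printf("local_cache at  : %p\n", (void *)mp->local_cache);
	printf("private area at : %p (%u bytes)\n",
	       rte_mempool_get_priv(mp), mp->private_data_size);
}

The real memory, that is, the actual storage for the pool's elements, is created afterwards in the rte_mempool_populate_default() function: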

/* Default function to populate the mempool: allocate memory in memzones,
 * and populate them. Return the number of objects added, or a negative
 * value on error.
 */
int
rte_mempool_populate_default(struct rte_mempool *mp)
{
	unsigned int mz_flags = RTE_MEMZONE_1GB|RTE_MEMZONE_SIZE_HINT_ONLY;
	char mz_name[RTE_MEMZONE_NAMESIZE];
	const struct rte_memzone *mz;
	ssize_t mem_size;
	size_t align, pg_sz, pg_shift = 0;
	rte_iova_t iova;
	unsigned mz_id, n;
	int ret;
	bool need_iova_contig_obj;
	size_t max_alloc_size = SIZE_MAX;

	ret = mempool_ops_alloc_once(mp);
	if (ret != 0)
		return ret;

	/* mempool must not be populated */
	if (mp->nb_mem_chunks != 0)
		return -EEXIST;

	/*
	 * the following section calculates page shift and page size values.
	 *
	 * these values impact the result of calc_mem_size operation, which
	 * returns the amount of memory that should be allocated to store the
	 * desired number of objects. when not zero, it allocates more memory
	 * for the padding between objects, to ensure that an object does not
	 * cross a page boundary. in other words, page size/shift are to be set
	 * to zero if mempool elements won't care about page boundaries.
	 * there are several considerations for page size and page shift here.
	 *
	 * if we don't need our mempools to have physically contiguous objects,
	 * then just set page shift and page size to 0, because the user has
	 * indicated that there's no need to care about anything.
	 *
	 * if we do need contiguous objects (if a mempool driver has its
	 * own calc_size() method returning min_chunk_size = mem_size),
	 * there is also an option to reserve the entire mempool memory
	 * as one contiguous block of memory.
	 *
	 * if we require contiguous objects, but not necessarily the entire
	 * mempool reserved space to be contiguous, pg_sz will be != 0,
	 * and the default ops->populate() will take care of not placing
	 * objects across pages.
	 *
	 * if our IO addresses are physical, we may get memory from bigger
	 * pages, or we might get memory from smaller pages, and how much of it
	 * we require depends on whether we want bigger or smaller pages.
	 * However, requesting each and every memory size is too much work, so
	 * what we'll do instead is walk through the page sizes available, pick
	 * the smallest one and set up page shift to match that one. We will be
	 * wasting some space this way, but it's much nicer than looping around
	 * trying to reserve each and every page size.
	 *
	 * If we fail to get enough contiguous memory, then we'll go and
	 * reserve space in smaller chunks.
	 */

	need_iova_contig_obj = !(mp->flags & MEMPOOL_F_NO_IOVA_CONTIG);
	ret = rte_mempool_get_page_size(mp, &pg_sz);
	if (ret < 0)
		return ret;

	if (pg_sz != 0)
		pg_shift = rte_bsf32(pg_sz);

	for (mz_id = 0, n = mp->size; n > 0; mz_id++, n -= ret) {
		size_t min_chunk_size;

		mem_size = rte_mempool_ops_calc_mem_size(
			mp, n, pg_shift, &min_chunk_size, &align);

		if (mem_size < 0) {
			ret = mem_size;
			goto fail;
		}

		ret = snprintf(mz_name, sizeof(mz_name),
			RTE_MEMPOOL_MZ_FORMAT "_%d", mp->name, mz_id);
		if (ret < 0 || ret >= (int)sizeof(mz_name)) {
			ret = -ENAMETOOLONG;
			goto fail;
		}

		/* if we're trying to reserve contiguous memory, add appropriate
		 * memzone flag.
		 */
		if (min_chunk_size == (size_t)mem_size)
			mz_flags |= RTE_MEMZONE_IOVA_CONTIG;

		/* Allocate a memzone, retrying with a smaller area on ENOMEM */
		do {
			mz = rte_memzone_reserve_aligned(mz_name,
				RTE_MIN((size_t)mem_size, max_alloc_size),
				mp->socket_id, mz_flags, align);

			if (mz != NULL || rte_errno != ENOMEM)
				break;

			max_alloc_size = RTE_MIN(max_alloc_size,
						(size_t)mem_size) / 2;
		} while (mz == NULL && max_alloc_size >= min_chunk_size);

		if (mz == NULL) {
			ret = -rte_errno;
			goto fail;
		}

		if (need_iova_contig_obj)
			iova = mz->iova;
		else
			iova = RTE_BAD_IOVA;

		if (pg_sz == 0 || (mz_flags & RTE_MEMZONE_IOVA_CONTIG))
			ret = rte_mempool_populate_iova(mp, mz->addr,
				iova, mz->len,
				rte_mempool_memchunk_mz_free,
				(void *)(uintptr_t)mz);
		else
			ret = rte_mempool_populate_virt(mp, mz->addr,
				mz->len, pg_sz,
				rte_mempool_memchunk_mz_free,
				(void *)(uintptr_t)mz);
		if (ret < 0) {
			rte_memzone_free(mz);
			goto fail;
		}
	}

	return mp->size;

 fail:
	rte_mempool_free_memchunks(mp);
	return ret;
}
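
Once rte_mempool_populate_default() succeeds, the pool holds mp->size free objects. A quick, hedged way to confirm that from application code, using the standard counting helpers of the mempool API:

#include <stdio.h>
#include <rte_mempool.h>

/* Right after creation, all objects should be available and none in use. */
static void check_pool_counts(const struct rte_mempool *mp)
{
	printf("size=%u avail=%u in_use=%u\n",
	       mp->size,
	       rte_mempool_avail_count(mp),
	       rte_mempool_in_use_count(mp));
}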

The ring itself is created through the following functions:

static int
mempool_ops_alloc_once(struct rte_mempool *mp)
{
	int ret;

	/* create the internal ring if not already done */
	if ((mp->flags & MEMPOOL_F_POOL_CREATED) == 0) {
		ret = rte_mempool_ops_alloc(mp);
		if (ret != 0)
			return ret;
		mp->flags |= MEMPOOL_F_POOL_CREATED;
	}
	return 0;
}
/* wrapper to allocate an external mempool's private (pool) data. */
int
rte_mempool_ops_alloc(struct rte_mempool *mp)
{
	struct rte_mempool_ops *ops;

	ops = rte_mempool_get_ops(mp->ops_index);
	return ops->alloc(mp);
}

Which allocation routine actually runs depends on how the alloc callback in rte_mempool_ops was initialized:

/*
 * The following 4 declarations of mempool ops structs address
 * the need for the backward compatible mempool handlers for
 * single/multi producers and single/multi consumers as dictated by the
 * flags provided to the rte_mempool_create function
 */
static const struct rte_mempool_ops ops_mp_mc = {
	.name = "ring_mp_mc",
	.alloc = common_ring_alloc,
	.free = common_ring_free,
	.enqueue = common_ring_mp_enqueue,
	.dequeue = common_ring_mc_dequeue,
	.get_count = common_ring_get_count,
};
MEMPOOL_REGISTER_OPS(ops_mp_mc);
MEMPOOL_REGISTER_OPS(ops_sp_sc);
MEMPOOL_REGISTER_OPS(ops_mp_sc);
MEMPOOL_REGISTER_OPS(ops_sp_mc);

Starting from the registration macro, the whole flow can be followed. The nesting is quite deep here, and you can follow it level by level if you are interested. Overall, rte_ring_create_elem() creates the ring and inserts it into the global rte_ring_tailq, and then the actually reserved memory is inserted into the pool's linked list by mempool_add_elem(), which is called from rte_mempool_ops_populate().
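
Any handler registered through MEMPOOL_REGISTER_OPS can also be selected by name on an empty, not-yet-populated pool. A hedged sketch that mirrors what rte_mempool_create() does internally (the "ring_sp_sc" name comes from the ops table above; the pool name and sizes are arbitrary):

#include <rte_lcore.h>
#include <rte_mempool.h>

/* Build a pool "by hand": create it empty, pick the ops by name,
 * then populate it. */
static struct rte_mempool *create_pool_by_hand(void)
{
	struct rte_mempool *mp;

	mp = rte_mempool_create_empty("byhand_pool", 1024, 128,
				      0, 0, rte_socket_id(), 0);
	if (mp == NULL)
		return NULL;

	if (rte_mempool_set_ops_byname(mp, "ring_sp_sc", NULL) != 0 ||
	    rte_mempool_populate_default(mp) < 0) {
		rte_mempool_free(mp);
		return NULL;
	}
	return mp;
}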

2. Use

Using the pool means allocating objects from it:

// below is the mbuf-level call
static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)
{
	struct rte_mbuf *m;

	if (rte_mempool_get(mp, (void **)&m) < 0)
		return NULL;
	MBUF_RAW_ALLOC_CHECK(m);
	return m;
}
// below is the mempool interface
static __rte_always_inline int
rte_mempool_get_bulk(struct rte_mempool *mp, void **obj_table, unsigned int n)
{
	struct rte_mempool_cache *cache;
	cache = rte_mempool_default_cache(mp, rte_lcore_id());
	return rte_mempool_generic_get(mp, obj_table, n, cache);
}

static __rte_always_inline int
rte_mempool_get(struct rte_mempool *mp, void **obj_p)
{
	return rte_mempool_get_bulk(mp, obj_p, 1);
}

rte_mempool_get() eventually calls rte_mempool_get_bulk() to obtain objects. Objects are taken from the local cache first; if the cache does not hold enough, more are fetched from the rte_ring and refilled into the local_cache.
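
For burst-oriented processing, the bulk variants shown above avoid touching the cache and ring once per object. A hedged sketch (the burst size of 32 is arbitrary):

#include <rte_mempool.h>

/* Take a burst of objects in one call and give them all back.
 * rte_mempool_get_bulk() is all-or-nothing: it either fills the whole
 * table or returns an error. */
static void burst_example(struct rte_mempool *mp)
{
	void *objs[32];

	if (rte_mempool_get_bulk(mp, objs, 32) == 0) {
		/* ... use the 32 objects ... */
		rte_mempool_put_bulk(mp, objs, 32);
	}
}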

3. Recycle

Recycling means freeing objects back to the pool:

static __rte_always_inline void
rte_mbuf_raw_free(struct rte_mbuf *m)
{
	RTE_ASSERT(RTE_MBUF_DIRECT(m));
	RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1);
	RTE_ASSERT(m->next == NULL);
	RTE_ASSERT(m->nb_segs == 1);
	__rte_mbuf_sanity_check(m, 0);
	rte_mempool_put(m->pool, m);
}

static __rte_always_inline void
rte_mempool_put(struct rte_mempool *mp, void *obj)
{
	rte_mempool_put_bulk(mp, &obj, 1);
}
static __rte_always_inline void
rte_mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
		     unsigned int n)
{
	struct rte_mempool_cache *cache;
	cache = rte_mempool_default_cache(mp, rte_lcore_id());
	rte_mempool_generic_put(mp, obj_table, n, cache);
}

The put path can be traced the same way: objects go back into the per-core cache first, and when the cache exceeds its flush threshold the excess is flushed back to the ring.
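
Besides the built-in per-core local_cache, the API also supports a user-owned cache, which is what the rte_mempool_cache_free() helper shown earlier pairs with. A hedged sketch using rte_mempool_cache_create(), the generic get/put seen above and rte_mempool_cache_flush() (the cache size of 32 is arbitrary):

#include <rte_lcore.h>
#include <rte_mempool.h>

/* Use an explicit, user-owned cache instead of the built-in per-core one.
 * The cache must be flushed back to the pool before it is freed. */
static void user_cache_example(struct rte_mempool *mp)
{
	struct rte_mempool_cache *cache;
	void *obj;

	cache = rte_mempool_cache_create(32, rte_socket_id());
	if (cache == NULL)
		return;

	if (rte_mempool_generic_get(mp, &obj, 1, cache) == 0)
		rte_mempool_generic_put(mp, &obj, 1, cache);

	rte_mempool_cache_flush(cache, mp); /* hand cached objects back */
	rte_mempool_cache_free(cache);
}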

4. Summary

Memory management will remain an important topic in computing for the foreseeable future, so it is well worth learning some memory-management algorithms and related techniques. Keep in mind that a memory pool trades space for time: it deliberately gives up some memory in exchange for less fragmentation and faster allocation. For programmers, balance is always the key word, and this balance is not a simple average but a balance of the system as a whole; the data layer may be skewed one way and the application another, yet the whole remains balanced. Think this through and it becomes clear how important memory is as a fundamental resource for development and applications.

Source: blog.csdn.net/fpcc/article/details/131872791