Linux memory management: Slab Allocator makes its debut


Previous articles in this series:

  • Linux memory management: Bootmem takes the lead — the Bootmem boot-time memory allocator
  • Linux memory management: The Buddy System is long overdue — the buddy system memory allocator
  • Linux memory management: Slab makes its debut — the slab memory allocator

This is the third article in the source code analysis column

The column is divided into several major modules: memory management, device management, system startup, and others.

Memory management is covered in three parts: Bootmem, the Buddy System, and Slab. This article explains the Slab startup process.


Structure

Let's first look at the memory allocation interfaces Linux provides:

  • kmalloc: allocates through the slab allocator by calling kmem_cache_alloc
  • vmalloc: allocates linearly contiguous but physically discontiguous addresses from the high memory area
  • kzalloc: similar to kmalloc, but zeroes the allocated memory
  • malloc: allocates user-space memory

The bootmem, buddy, and slab systems are collectively known as the three pillars of Linux memory management.

  • bootmem is the coarse-grained memory manager used during system initialization
  • buddy is the large-block memory manager once the system is up and running
  • slab is the small-block memory manager once the system is up and running

The buddy system allocates memory in units of pages. Page-granular allocation wastes memory for small objects, so pages must be subdivided into smaller units.

The Linux kernel has three mechanisms for carving pages into small allocations: slab, slub, and slob.

What is the slab allocator?

The slab allocator manages lists of slabs. A slab consists of pages, and those pages are carved into managed objects of a specific size. Depending on allocation requests, each object is either "in use" or "free".

The managed objects may be structures widely used in the kernel, such as task or page structures, or simply memory regions of a specific size.

The slab layer is the kernel's object-based cache mechanism; at its core, the actual physical pages are allocated by the buddy system.

It serves two purposes:

  • Providing fine-grained memory regions for kernel allocations
  • Acting as a cache, mainly for objects that are frequently allocated and freed

The memory area of a cache is divided into multiple slabs, each slab consisting of one or more contiguous page frames. These page frames hold both allocated objects and free objects.

Several basic concepts:

  • struct cache: the descriptor that manages slabs; most of its information serves to size the slab unit up or down, or to track the number of objects managed within a slab
  • cache_cache: the cache whose slabs hold the kmem_cache cache descriptors themselves; that is, each cache descriptor is placed in a slab managed by cache_cache

kmem_cache and kmem_list3

The kmem_cache structure:

struct kmem_cache {
/* 1) per-cpu data, touched during every alloc/free */
	struct array_cache *array[NR_CPUS]; /* per-CPU data */
/* 2) Cache tunables. Protected by cache_chain_mutex */
	unsigned int batchcount; /* number of objects transferred per refill of a per-CPU list */
	unsigned int limit; /* maximum number of objects a per-CPU list may hold */
	unsigned int shared;

	unsigned int buffer_size; /* size of each cache object */
	u32 reciprocal_buffer_size;
/* 3) touched by every alloc & free from the backend */

	unsigned int flags;		/* cache attribute flags */
	unsigned int num;		/* number of objects per slab */

/* 4) cache_grow/shrink */
	/* order of pgs per slab (2^n) */
	unsigned int gfporder; /* page allocation unit */

	/* force GFP flags, e.g. GFP_DMA */
	gfp_t gfpflags;

	size_t colour;			/* cache colouring range */
	unsigned int colour_off;	/* colour offset */
	struct kmem_cache *slabp_cache;
	unsigned int slab_size;
	unsigned int dflags;		/* dynamic flags */

	/* constructor func */
	void (*ctor)(void *obj);

/* 5) cache creation/removal */
	const char *name;
	struct list_head next; /* links this cache into the global cache list */

/* 6) statistics */
#if STATS
	unsigned long num_active;
	unsigned long num_allocations;
	unsigned long high_mark;
	unsigned long grown;
	unsigned long reaped;
	unsigned long errors;
	unsigned long max_freeable;
	unsigned long node_allocs;
	unsigned long node_frees;
	unsigned long node_overflow;
	atomic_t allochit;
	atomic_t allocmiss;
	atomic_t freehit;
	atomic_t freemiss;
#endif
#if DEBUG
	int obj_offset;
	int obj_size;
#endif
	/* kept last so the struct can be sized to nr_node_ids slots
	 * instead of MAX_NUMNODES (see kmem_cache_init()) */
	struct kmem_list3 *nodelists[MAX_NUMNODES]; /* kmem_list3 entries, one per node */
};

Member variables worth noting:

struct list_head next: links the cache into the global cache list

struct kmem_list3 *nodelists[MAX_NUMNODES]: stores, for each node, the head nodes of the three slab lists

struct array_cache *array[NR_CPUS]: the per-CPU local caches managing each CPU's object list

struct array_cache {
	unsigned int avail; /* number of available per-CPU objects */
	unsigned int limit; /* maximum number of objects */
	unsigned int batchcount; /* number of objects obtainable from a slab at once */
	unsigned int touched; /* whether the cache has been used recently */
	spinlock_t lock;
	void *entry[];	 /* addresses of the slab objects used by this CPU */
};

Placing an array_cache per CPU improves performance when allocating memory.
The entry[] array stores the addresses of slab objects currently used by this CPU, rather than positions in the slab lists.

In the Linux kernel source, slab, slub, and slob all define their cache structure as struct kmem_cache.

As for kmem_list3, the 3 means slabs are managed in three ways: one kmem_list3 holds the head nodes of three slab lists.

/*
 * The slab lists for all objects.
 */
struct kmem_list3 {
	struct list_head slabs_partial;	/* partially-used slabs */
	struct list_head slabs_full; /* fully-used slabs */
	struct list_head slabs_free; /* completely free slabs */
	unsigned long free_objects; /* number of allocatable objects */
	unsigned int free_limit;  /* maximum number of unallocated objects allowed */
	unsigned int colour_next;	/* per-node cache colouring */
	spinlock_t list_lock;
	struct array_cache *shared;	/* shared within the same node */
	struct array_cache **alien;	/* shared with other nodes */
	unsigned long next_reap;	/* next update time */
	int free_touched;		/* whether the cache has been used */
};

For how slab objects are organized, see "Memory management principles of modern operating systems: the slab allocator in Linux 2.6.xx as an example".

kmem_list3_init

Before slab initialization completes, kmalloc cannot allocate the objects needed during the initialization process; only static global variables or the buddy system are available. Static global variables are used here, and after initialization they are replaced with dynamically allocated slab objects.

The slab allocator comes after the buddy allocator, so it can use the buddy system's full functionality.

During kernel initialization, the slab allocator is initialized by the kmem_cache_init function, but that initialization itself needs memory allocated from slabs, creating a chicken-and-egg problem. To solve it, the slab allocator divides initialization into five stages:

static enum {
	NONE,
	PARTIAL_AC,
	PARTIAL_L3,
	EARLY,
	FULL
} g_cpucache_up;

NONE

At this stage, the slab allocator can provide only the cache_cache cache, and it has to build that cache from static data (cache_cache and its static cache_array).

kmem_cache_init

Initializes the slab allocator.

	for (i = 0; i < NUM_INIT_LISTS; i++) {
		/* NUM_INIT_LISTS = 3 * MAX_NUMNODES */
		kmem_list3_init(&initkmem_list3[i]);
		if (i < MAX_NUMNODES)
			cache_cache.nodelists[i] = NULL;
	}

The kmem_list3_init() function initializes a struct kmem_list3 instance. In the slab allocator, the cache structure's nodelists member assigns one struct kmem_list3 to each node.

Each struct kmem_list3 contains three linked lists, and each list maintains a number of slabs:

  • slabs_partial: the slabs on this list still have some available objects
  • slabs_full: the slabs on this list have no available objects
  • slabs_free: every object on these slabs is available
static void kmem_list3_init(struct kmem_list3 *parent)
{
	INIT_LIST_HEAD(&parent->slabs_full);
	INIT_LIST_HEAD(&parent->slabs_partial);
	INIT_LIST_HEAD(&parent->slabs_free);
	parent->shared = NULL;
	parent->alien = NULL;
	parent->colour_next = 0;
	spin_lock_init(&parent->list_lock);
	parent->free_objects = 0;
	parent->free_touched = 0;
}
#define NUM_INIT_LISTS (3 * MAX_NUMNODES)  /* three lists per node */

#ifdef CONFIG_NODES_SHIFT
#define NODES_SHIFT     CONFIG_NODES_SHIFT
#else
#define NODES_SHIFT     0
#endif

#define MAX_NUMNODES    (1 << NODES_SHIFT)  /* maximum number of nodes the machine supports */

initkmem_list3 is a static variable.

Memory at this point can come only from the buddy system or from static variables. Freeing buddy-allocated memory later would be more troublesome, so static variables are used here and replaced later; the __initdata attribute means the data is automatically discarded after system startup.

struct kmem_list3 __initdata initkmem_list3[NUM_INIT_LISTS];

set_up_list3s

	set_up_list3s(&cache_cache, CACHE_CACHE);

cache_cache is also a static variable:

/* internal cache of cache description objs */
static struct kmem_cache cache_cache = {
	.batchcount = 1,
	.limit = BOOT_CPUCACHE_ENTRIES,
	.shared = 1,
	.buffer_size = sizeof(struct kmem_cache),
	.name = "kmem_cache",
};

/*
 * For setting up all the kmem_list3s for cache whose buffer_size is same as
 * size of kmem_list3.
 */
static void __init set_up_list3s(struct kmem_cache *cachep, int index)
{
	int node;

	for_each_online_node(node) {
		/* cachep points to the cache's struct kmem_cache;
		 * index is the offset of this cache's entries in initkmem_list3;
		 * point each nodelists member at the matching struct kmem_list3
		 * in the initkmem_list3 array */
		cachep->nodelists[node] = &initkmem_list3[index + node];
		cachep->nodelists[node]->next_reap = jiffies +
		    REAPTIMEOUT_LIST3 +
		    ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
	}
}

One may wonder why something as inexplicable as CACHE_CACHE is used as a subscript here. It turns out that the initkmem_list3 entries starting at index CACHE_CACHE are reserved for cache_cache's nodelists[], and each kmem_list3 there manages its own three linked lists.

The slab allocator manages a cache with struct kmem_cache, whose nodelists member assigns one struct kmem_list3 per node; each struct kmem_list3 maintains slabs built from that node's physical pages. This function therefore sets up the kmem_list3 structures for each node of the cache.

The function is only used during the slab allocator's initialization phase, so it is marked __init.

	node = numa_node_id();

	/* 1) create the cache_cache */
	INIT_LIST_HEAD(&cache_chain);
	list_add(&cache_cache.next, &cache_chain);

The slab allocator first initializes the global cache list cache_chain, then inserts cache_cache into it.

static inline void INIT_LIST_HEAD(struct list_head *list) /* initialize the list head */
{
	list->next = list;
	list->prev = list;
}

#ifndef CONFIG_DEBUG_LIST
static inline void __list_add(struct list_head *new,
			      struct list_head *prev,
			      struct list_head *next) /* insert into the list */
{
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
}
#else
extern void __list_add(struct list_head *new,
			      struct list_head *prev,
			      struct list_head *next);
#endif
	cache_cache.colour_off = cache_line_size();
	cache_cache.array[smp_processor_id()] = &initarray_cache.cache;
	cache_cache.nodelists[node] = &initkmem_list3[CACHE_CACHE + node];

	/*
	 * struct kmem_cache size depends on nr_node_ids, which
	 * can be less than MAX_NUMNODES.
	 */
	cache_cache.buffer_size = offsetof(struct kmem_cache, nodelists) +
				 nr_node_ids * sizeof(struct kmem_list3 *);
#if DEBUG
	cache_cache.obj_size = cache_cache.buffer_size;
#endif
	cache_cache.buffer_size = ALIGN(cache_cache.buffer_size,
					cache_line_size());
	cache_cache.reciprocal_buffer_size =
		reciprocal_value(cache_cache.buffer_size);

Several attributes are then set:

  • colour_off: cache_cache's colouring offset is set to the cache line size
  • array: the local cache is pointed at static data
  • buffer_size: the cache object length is computed from struct kmem_cache and aligned

Regarding the colouring attribute colour: to make better use of the CPU's hardware cache, slab colouring places objects of different slabs at different starting offsets within the slab, so that objects from different slabs tend to land on different hardware cache lines. This makes it less likely that objects from slabs of the same cache evict one another.

slab colouring and the CPU hardware cache

initarray_cache is static data:

static struct arraycache_init initarray_cache __initdata =
    { {0, BOOT_CPUCACHE_ENTRIES, 1, 0} };

struct arraycache_init {
	struct array_cache cache;
	void *entries[BOOT_CPUCACHE_ENTRIES];
};

struct array_cache {
	unsigned int avail;
	unsigned int limit;
	unsigned int batchcount;
	unsigned int touched;
	spinlock_t lock;
	void *entry[];
};

array_cache is called the local cache. When allocating an object, the slab allocator generally searches the local cache first. If the local cache is empty, objects from the shared cache are moved into the local cache; if the shared cache is empty too, objects are taken from a slab; and if no slab has free objects, a new slab is allocated and the caches are refilled.

  • kmem_cache->array[]: the per-CPU local caches
  • kmem_cache->shared: the shared cache, holding objects that overflow from the local caches of all CPUs on the same node
  • kmem_cache->alien: holds cached objects belonging to slabs on other nodes. When an object allocated on one node is freed on another (slab->nodeid != numa_node_id()), it is added to the alien cache for the object's home node; otherwise it goes to the local or shared cache. When the alien cache is full, cache_free_alien() moves the objects back to the home node's shared cache.

cache_estimate

Calculates how many physical pages the buddy allocator must provide to build cache_cache's slabs.

In each loop iteration, cache_estimate computes the number of physical pages a cache_cache slab would occupy, the number of cache objects that slab could maintain, and finally the length of the wasted memory.

	for (order = 0; order < MAX_ORDER; order++) {
		cache_estimate(order, cache_cache.buffer_size,
			cache_line_size(), 0, &left_over, &cache_cache.num);
		if (cache_cache.num)
			break;
	}

order is the buddy-system allocation order.

static void cache_estimate(unsigned long gfporder, size_t buffer_size,
			   size_t align, int flags, size_t *left_over,
			   unsigned int *num)
{
	int nr_objs;
	size_t mgmt_size;
	size_t slab_size = PAGE_SIZE << gfporder; /* length of the memory obtained from buddy */

	if (flags & CFLGS_OFF_SLAB) {
		/* management data is kept outside the slab */
		mgmt_size = 0;
		nr_objs = slab_size / buffer_size;  /* maximum number of objects this memory can hold */

		if (nr_objs > SLAB_LIMIT)
			nr_objs = SLAB_LIMIT;
	} else {
		nr_objs = (slab_size - sizeof(struct slab)) /
			  (buffer_size + sizeof(kmem_bufctl_t));

		if (slab_mgmt_size(nr_objs, align) + nr_objs*buffer_size
		       > slab_size)
			nr_objs--;

		if (nr_objs > SLAB_LIMIT)
			nr_objs = SLAB_LIMIT;

		mgmt_size = slab_mgmt_size(nr_objs, align);
	}
	*num = nr_objs; /* number of allocatable cache objects */
	*left_over = slab_size - nr_objs*buffer_size - mgmt_size; /* leftover length */
}

Slab management data comes in two modes:

  • On-slab management: the management data is stored inside the slab itself (the kmem_bufctl_t chain is terminated with BUFCTL_END)

  • Off-slab management: a separate struct slab whose s_mem field marks the start of the objects (in this case the colouring is also handled outside the slab)

Once workable parameters for building a slab are found, the order of physical pages each slab occupies is stored in cache_cache.gfporder:

	cache_cache.gfporder = order;
	cache_cache.colour = left_over / cache_cache.colour_off; /* determine the colouring range */
	cache_cache.slab_size = ALIGN(cache_cache.num * sizeof(kmem_bufctl_t) +
				      sizeof(struct slab), cache_line_size()); /* length of the slab management data */

kmem_cache_create

Next come the general caches, whose object lengths are powers of 2; these are also called anonymous caches.

The slab allocator computes the length of cache_array (struct arraycache_init) and establishes the corresponding general cache with kmem_cache_create.

struct cache_sizes {
	size_t		 	cs_size;
	struct kmem_cache	*cs_cachep;
#ifdef CONFIG_ZONE_DMA
	struct kmem_cache	*cs_dmacachep;
#endif
};

struct cache_sizes malloc_sizes[] = {
#define CACHE(x) { .cs_size = (x) },
#include <linux/kmalloc_sizes.h>
	CACHE(ULONG_MAX)
#undef CACHE
};
EXPORT_SYMBOL(malloc_sizes);

/* Must match cache_sizes above. Out of line to keep cache footprint low. */
struct cache_names {
	char *name;
	char *name_dma;
};

static struct cache_names __initdata cache_names[] = {
#define CACHE(x) { .name = "size-" #x, .name_dma = "size-" #x "(DMA)" },
#include <linux/kmalloc_sizes.h>
	{NULL,}
#undef CACHE
};

In include/linux/kmalloc_sizes.h:

#if (PAGE_SIZE == 4096)
	CACHE(32)
#endif
	CACHE(64)
#if L1_CACHE_BYTES < 64
	CACHE(96)
#endif
	CACHE(128)
#if L1_CACHE_BYTES < 128
	CACHE(192)
#endif
	CACHE(256)
	CACHE(512)
	CACHE(1024)
	CACHE(2048)
	CACHE(4096)
	CACHE(8192)
	CACHE(16384)
	CACHE(32768)
	CACHE(65536)
	CACHE(131072)
#if KMALLOC_MAX_SIZE >= 262144
	CACHE(262144)
#endif
#if KMALLOC_MAX_SIZE >= 524288
	CACHE(524288)
#endif
#if KMALLOC_MAX_SIZE >= 1048576
	CACHE(1048576)
#endif
#if KMALLOC_MAX_SIZE >= 2097152
	CACHE(2097152)
#endif
#if KMALLOC_MAX_SIZE >= 4194304
	CACHE(4194304)
#endif
#if KMALLOC_MAX_SIZE >= 8388608
	CACHE(8388608)
#endif
#if KMALLOC_MAX_SIZE >= 16777216
	CACHE(16777216)
#endif
#if KMALLOC_MAX_SIZE >= 33554432
	CACHE(33554432)
#endif

The body of kmem_cache_create is fairly complex, but its main job is to create a cache:

  • name specifies the cache's name
  • size specifies the object length
  • align specifies the alignment
  • flags gives the cache creation flags
  • ctor points to the object constructor function
struct kmem_cache *
kmem_cache_create (const char *name, size_t size, size_t align,
	unsigned long flags, void (*ctor)(void *))
{
	size_t left_over, slab_size, ralign;
	struct kmem_cache *cachep = NULL, *pc;

The slab allocator represents a cache with struct kmem_cache, which holds the cache's basic information, including the local caches, the shared cache, and the cache's slab lists.

The creation process populates this data.

First come basic sanity checks: the name must exist, we must not be in interrupt context, and the object size must be no smaller than BYTES_PER_WORD and no larger than KMALLOC_MAX_SIZE.

	/*
	 * Sanity checks... these are all serious usage bugs.
	 */
	if (!name || in_interrupt() || (size < BYTES_PER_WORD) ||
	    size > KMALLOC_MAX_SIZE) {
		printk(KERN_ERR "%s: Early error in slab %s\n", __func__,
				name);
		BUG();
	}

As for the cache's name here, look at the call site: the cache is created from the sizes array entry selected by INDEX_AC.

This is a general cache, with sizes[INDEX_AC].cs_cachep pointing to its kmem_cache structure.

Note: this cache's struct kmem_list3 still comes from the static initkmem_list3 array, just like cache_cache's.

	sizes[INDEX_AC].cs_cachep = kmem_cache_create(names[INDEX_AC].name,
					sizes[INDEX_AC].cs_size,
					ARCH_KMALLOC_MINALIGN,
					ARCH_KMALLOC_FLAGS|SLAB_PANIC,
					NULL);

The index is worth exploring in depth: its main purpose is to find, among the CACHE() entries of the general caches, the one whose size is suitable for allocating this structure.

static __always_inline int index_of(const size_t size)
{
	extern void __bad_size(void);

	if (__builtin_constant_p(size)) {
		int i = 0;

#define CACHE(x) \
	if (size <= x) \
		return i; \
	else \
		i++;
#include <linux/kmalloc_sizes.h>
#undef CACHE
		__bad_size();
	} else
		__bad_size();
	return 0;
}

#define INDEX_AC index_of(sizeof(struct arraycache_init))
#define INDEX_L3 index_of(sizeof(struct kmem_list3))

The general caches are visible on any system using slab:

liuzixuan@lzx-ubuntu ~ # cat /proc/slabinfo | awk -v OFS="\t" -F' ' '{print $1,$3,$4,$5,$6}'
kmalloc-8k	96	8192	4	8
kmalloc-4k	3128	4096	8	8
kmalloc-2k	1760	2048	16	8
kmalloc-1k	4096	1024	16	4
kmalloc-512	4112	512	16	2
kmalloc-256	1808	256	16	1
kmalloc-192	14679	192	21	1
kmalloc-128	1888	128	32	1
kmalloc-96	6090	96	42	1
kmalloc-64	57728	64	64	1
kmalloc-32	70912	32	128	1
kmalloc-16	16896	16	256	1
kmalloc-8	10240	8	512	1

Next, get_online_cpus() pins the set of available CPUs, and mutex_lock() takes the cache_chain_mutex:

	/*
	 * We use cache_chain_mutex to ensure a consistent view of
	 * cpu_online_mask as well.  Please see cpuup_callback
	 */
	get_online_cpus();
	mutex_lock(&cache_chain_mutex);

Iterate through all caches

	list_for_each_entry(pc, &cache_chain, next) {
		char tmp;
		int res;

		/*
		 * This happens when the module gets unloaded and doesn't
		 * destroy its slab cache and no-one else reuses the vmalloc
		 * area of the module.  Print a warning.
		 */
		res = probe_kernel_address(pc->name, tmp);
		if (res) {
			printk(KERN_ERR
			       "SLAB: cache with size %d has lost its name\n",
			       pc->buffer_size);
			continue;
		}

		if (!strcmp(pc->name, name)) {
			/* caches with duplicate names are not allowed */
			printk(KERN_ERR
			       "kmem_cache_create: duplicate cache %s\n", name);
			dump_stack();
			goto oops;
		}
	}

Alignment-related operations are then performed based on the flags provided at creation time; the final alignment ends up in ralign:

	if (size & (BYTES_PER_WORD - 1)) {
		size += (BYTES_PER_WORD - 1);
		size &= ~(BYTES_PER_WORD - 1);
	}

	/* calculate the final buffer alignment: */

	/* 1) arch recommendation: can be overridden for debug */
	if (flags & SLAB_HWCACHE_ALIGN) {
		/*
		 * Default alignment: as specified by the arch code.  Except if
		 * an object is really small, then squeeze multiple objects into
		 * one cacheline.
		 */
		ralign = cache_line_size();
		while (size <= ralign / 2)
			ralign /= 2;
	} else {
		ralign = BYTES_PER_WORD;
	}

	/*
	 * Redzoning and user store require word alignment or possibly larger.
	 * Note this will be overridden by architecture or caller mandated
	 * alignment if either is greater than BYTES_PER_WORD.
	 */
	if (flags & SLAB_STORE_USER)
		ralign = BYTES_PER_WORD;

	if (flags & SLAB_RED_ZONE) {
		ralign = REDZONE_ALIGN;
		/* If redzoning, ensure that the second redzone is suitably
		 * aligned, by adjusting the object size accordingly. */
		size += REDZONE_ALIGN - 1;
		size &= ~(REDZONE_ALIGN - 1);
	}

	/* 2) arch mandated alignment */
	if (ralign < ARCH_SLAB_MINALIGN) {
		ralign = ARCH_SLAB_MINALIGN;
	}
	/* 3) caller mandated alignment */
	if (ralign < align) {
		ralign = align;
	}

The kmem_cache_zalloc function allocates a new struct kmem_cache from cache_cache:

	cachep = kmem_cache_zalloc(&cache_cache, GFP_KERNEL);
	if (!cachep)
		goto oops;

Next, determine whether the slab's management data is kept on-slab or off-slab by default.

The decision depends on slab_early_init and size:

	/*
	 * Determine if the slab management is 'on' or 'off' slab.
	 * (bootstrapping cannot cope with offslab caches so don't do
	 * it too early on.)
	 */
	if ((size >= (PAGE_SIZE >> 3)) && !slab_early_init)
		/*
		 * Size is large, assume best to place the slab management obj
		 * off-slab (should allow better packing of objs).
		 */
		flags |= CFLGS_OFF_SLAB;
	size = ALIGN(size, align); /* aligned length of a cache object */
	/* compute the objects per slab and the physical pages each slab occupies */
	left_over = calculate_slab_order(cachep, size, align, flags);
	if (!cachep->num) {
		printk(KERN_ERR
		       "kmem_cache_create: couldn't create cache %s.\n", name);
		kmem_cache_free(&cache_cache, cachep);
		cachep = NULL;
		goto oops;
	}
	slab_size = ALIGN(cachep->num * sizeof(kmem_bufctl_t)
			  + sizeof(struct slab), align);

The remaining fields are then filled in:

	if (flags & CFLGS_OFF_SLAB && left_over >= slab_size) {
		flags &= ~CFLGS_OFF_SLAB;
		left_over -= slab_size;
	}

	if (flags & CFLGS_OFF_SLAB) {
		/* really off slab. No need for manual alignment */
		slab_size =
		    cachep->num * sizeof(kmem_bufctl_t) + sizeof(struct slab);
	}

	cachep->colour_off = cache_line_size();
	/* Offset must be a multiple of the alignment. */
	if (cachep->colour_off < align)
		cachep->colour_off = align;
	cachep->colour = left_over / cachep->colour_off;
	cachep->slab_size = slab_size;
	cachep->flags = flags;
	cachep->gfpflags = 0;
	if (CONFIG_ZONE_DMA_FLAG && (flags & SLAB_CACHE_DMA))
		cachep->gfpflags |= GFP_DMA;
	cachep->buffer_size = size;
	cachep->reciprocal_buffer_size = reciprocal_value(size);

	if (flags & CFLGS_OFF_SLAB) {
		cachep->slabp_cache = kmem_find_general_cachep(slab_size, 0u);
		/*
		 * This is a possibility for one of the malloc_sizes caches.
		 * But since we go off slab only for object size greater than
		 * PAGE_SIZE/8, and malloc_sizes gets created in ascending order,
		 * this should not happen at all.
		 * But leave a BUG_ON for some lucky dude.
		 */
		BUG_ON(ZERO_OR_NULL_PTR(cachep->slabp_cache));
	}
	cachep->ctor = ctor;
	cachep->name = name;

Set up the cache's local caches, shared cache, and slab lists:

	if (setup_cpu_cache(cachep)) {
		__kmem_cache_destroy(cachep);
		cachep = NULL;
		goto oops;
	}

If setup succeeds, the cache is inserted into the system-wide cache list cache_chain:

	/* cache setup completed, link it into the list */
	list_add(&cachep->next, &cache_chain);

The function finally returns the cache pointer cachep, and EXPORT_SYMBOL() exports the function for use by other parts of the kernel:

	return cachep;

kmem_cache_alloc

Allocates a usable object from the given cache:

void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
{
	void *ret = __cache_alloc(cachep, flags, __builtin_return_address(0));

	trace_kmem_cache_alloc(_RET_IP_, ret,
			       obj_size(cachep), cachep->buffer_size, flags);

	return ret;
}
static __always_inline void *
__cache_alloc(struct kmem_cache *cachep, gfp_t flags, void *caller)
{
	unsigned long save_flags;
	void *objp;

	lockdep_trace_alloc(flags);

	if (slab_should_failslab(cachep, flags))
		return NULL;

	cache_alloc_debugcheck_before(cachep, flags);
	local_irq_save(save_flags);
	objp = __do_cache_alloc(cachep, flags); /* allocate */
	local_irq_restore(save_flags);
	objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
	prefetchw(objp);

	if (unlikely((flags & __GFP_ZERO) && objp))
		memset(objp, 0, obj_size(cachep));

	return objp;
}
static __always_inline void *
__do_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
{
	return ____cache_alloc(cachep, flags);
}

To speed up object allocation, the slab allocator builds a per-CPU object stack for each cache, so free objects can be handed out and taken back quickly. These stacks live in struct kmem_cache's array member: each entry is a struct array_cache, one per CPU.

In fact, cache_cache manages the kmem_cache structures of all the other caches.

static inline void *____cache_alloc(struct kmem_cache *cachep, gfp_t flags)
{
	void *objp;
	struct array_cache *ac;

	check_irq_off();

	ac = cpu_cache_get(cachep); /* get this CPU's local cache */
	if (likely(ac->avail)) {
		STATS_INC_ALLOCHIT(cachep);
		ac->touched = 1;
		objp = ac->entry[--ac->avail];
	} else {
		STATS_INC_ALLOCMISS(cachep);
		objp = cache_alloc_refill(cachep, flags);
	}
	return objp;
}

At this stage the slab allocator can already allocate struct kmem_cache objects. Next it uses the length of struct arraycache_init to build a general cache matching that length; that cache can then supply the array_cache objects that every cache's local caches need. Only one CPU is running at this stage.

The allocator next creates the caches corresponding to the system's general cache sizes and, if CONFIG_ZONE_DMA is enabled, a DMA general cache for each size as well:

	while (sizes->cs_size != ULONG_MAX) {
		/*
		 * For performance, all the general caches are L1 aligned.
		 * This should be particularly beneficial on SMP boxes, as it
		 * eliminates "false sharing".
		 * Note for systems short on memory removing the alignment will
		 * allow tighter packing of the smaller caches.
		 */
		if (!sizes->cs_cachep) {
			sizes->cs_cachep = kmem_cache_create(names->name,
					sizes->cs_size,
					ARCH_KMALLOC_MINALIGN,
					ARCH_KMALLOC_FLAGS|SLAB_PANIC,
					NULL);
		}
#ifdef CONFIG_ZONE_DMA
		sizes->cs_dmacachep = kmem_cache_create(
					names->name_dma,
					sizes->cs_size,
					ARCH_KMALLOC_MINALIGN,
					ARCH_KMALLOC_FLAGS|SLAB_CACHE_DMA|
						SLAB_PANIC,
					NULL);
#endif
		sizes++;
		names++;
	}

PARTIAL_AC

Since cache_cache's original local cache, initarray_cache, is static data, and the general cache for struct arraycache_init is now usable, kmalloc() allocates memory for the ptr pointer:

	/* 4) Replace the bootstrap head arrays */
	{
		struct array_cache *ptr;

		ptr = kmalloc(sizeof(struct arraycache_init), GFP_KERNEL);

The local cache data of cache_cache is migrated into ptr's memory, ptr's data is initialized, and cache_cache's local cache pointer is switched to ptr:

		memcpy(ptr, cpu_cache_get(&cache_cache),
		       sizeof(struct arraycache_init));
		/*
		 * Do not assume that spinlocks can be initialized via memcpy:
		 */
		spin_lock_init(&ptr->lock);

		cache_cache.array[smp_processor_id()] = ptr;
		local_irq_enable();

Call kmalloc() again so that ptr points to newly allocated memory; the local cache of the struct array_cache general cache is then migrated in the same way.

		ptr = kmalloc(sizeof(struct arraycache_init), GFP_KERNEL);

		local_irq_disable();
		BUG_ON(cpu_cache_get(malloc_sizes[INDEX_AC].cs_cachep)
		       != &initarray_generic.cache);
		memcpy(ptr, cpu_cache_get(malloc_sizes[INDEX_AC].cs_cachep),
		       sizeof(struct arraycache_init));

		malloc_sizes[INDEX_AC].cs_cachep->array[smp_processor_id()] =
		    ptr;

PARTIAL_L3

At this point the slab allocator is ready to replace the static slab-list data with memory provided by the slab allocator itself.

	/* 5) Replace the bootstrap kmem_list3's */
	{
    
    
		int nid;

		for_each_online_node(nid) {
    
     // iterate over all online nodes
			init_list(&cache_cache, &initkmem_list3[CACHE_CACHE + nid], nid);

			init_list(malloc_sizes[INDEX_AC].cs_cachep,
				  &initkmem_list3[SIZE_AC + nid], nid);

			if (INDEX_AC != INDEX_L3) {
    
    
				init_list(malloc_sizes[INDEX_L3].cs_cachep,
					  &initkmem_list3[SIZE_L3 + nid], nid);
			}
		}
	}

On each iteration, the function replaces the static kmem_list3 slab lists of cache_cache, of the struct array_cache general cache, and (when distinct) of the struct kmem_list3 general cache with memory allocated by the slab allocator.

init_list

init_list allocates a new struct kmem_list3, copies the original static kmem_list3 slab-list data into the new kmem_list3, and points the cache's slab lists at the new kmem_list3.

cachep points to the cache, list points to the static slab list, and nodeid identifies the node.

/*
 * swap the static kmem_list3 with kmalloced memory
 */
static void init_list(struct kmem_cache *cachep, struct kmem_list3 *list,
			int nodeid)
{
    
    
	struct kmem_list3 *ptr;
	// allocate memory on the specified node
	ptr = kmalloc_node(sizeof(struct kmem_list3), GFP_KERNEL, nodeid);
	BUG_ON(!ptr);

	local_irq_disable();
	memcpy(ptr, list, sizeof(struct kmem_list3));
	/*
	 * Do not assume that spinlocks can be initialized via memcpy:
	 */
	spin_lock_init(&ptr->list_lock);
	// migrate all data on the slab lists to the lists in ptr
	MAKE_ALL_LISTS(cachep, ptr, nodeid);
	cachep->nodelists[nodeid] = ptr;
	local_irq_enable();
}

EARLY

At this stage all data related to the slab allocator is allocated by the slab allocator itself; the static bootstrap data is no longer used.

FULL

The slab allocator is fully initialized and ready for use.

	/* Done! */
	g_cpucache_up = FULL;

SUPPLEMENT

An object is a cache object (i.e. a memory area). Each slab list manages many cache objects, and the void *entry[] array in struct array_cache also points at cache objects, so these memory areas can be handed out directly.

struct slab {
    
    
	struct list_head list;
	unsigned long colouroff;
	void *s_mem;		/* including colour offset */
	unsigned int inuse;	/* num of objs active in slab */
	kmem_bufctl_t free;
	unsigned short nodeid;
};

Cache allocation process:

  • The general order of cache allocation is: local cache -> shared cache -> slab lists -> buddy system

  • When there is no available cache object in the cache's local cache or shared cache, the cache looks in the slabs_partial and slabs_free slab lists and refills the local cache from there.

  • If the number of cache objects maintained on the local cache exceeds the upper limit, the local cache releases the cache objects back to the shared cache

  • If the number of cache objects maintained in the shared cache exceeds its upper limit, the shared cache releases objects back to the slab lists

Cache classification

  • General (normal) cache: it does not target a specific kernel object. It first provides a cache for the kmem_cache structure itself, saved in cache_cache (this variable is the first element of the cache_chain list), followed by the fixed-size caches that back kmalloc().
  • Dedicated cache: created for specific objects according to the kernel's needs. Every object managed by one slab has the same size, so a request for that size goes straight to the slab of that size and is served from it.

The biggest difference between them is that the general caches already have memory allocated: kmalloc() can obtain an object directly, and kfree() does not really release the memory. A dedicated cache, by contrast, has to go through steps such as finding a location, allocating pages from the buddy system, and building the slab.

Now that we have a general-purpose cache, why do we need a dedicated cache?

When a certain data structure in your code needs to be allocated and released very frequently and has high performance requirements, you can consider creating a dedicated cache.

For memory areas that are expected to be used frequently, a set of dedicated buffers of the specific size can be created to avoid memory fragmentation; for less frequently used memory areas, the general power-of-two buffers suffice. Even though that scheme produces some fragmentation, it has little impact on overall system performance.


Origin blog.csdn.net/qq_48322523/article/details/128252796