Linux memory management: a detailed explanation of the slab allocator

Linux has an allocation algorithm called the buddy system, which mainly solves the problem of allocating contiguous memory pages. The buddy allocator works in units of memory pages (4 KB): each allocation returns 2^order pages (order = 0, 1, 2, ..., 9). But sometimes we only need a small memory area (for example, 32 bytes), and serving such a request from the buddy allocator wastes almost a whole page. To solve the small-allocation problem, Linux uses the slab allocation algorithm.
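The waste is easy to quantify. The following is a small user-space illustration (not kernel code): a 32-byte request served by the smallest possible buddy block, 2^0 pages, leaves over 99% of the block unused.

```c
#include <stdio.h>

int main(void)
{
    const unsigned long page_size = 4096;
    unsigned long request = 32;              /* bytes actually needed  */
    unsigned long granted = page_size << 0;  /* smallest buddy block: 2^0 pages */

    printf("requested %lu B, buddy grants %lu B, waste %.1f%%\n",
           request, granted,
           100.0 * (double)(granted - request) / (double)granted);
    return 0;
}
```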

Related data structures

The slab algorithm has two important data structures: kmem_cache_t and slab_t. Let's look at the kmem_cache_t structure first:

```c
struct kmem_cache_s {
    struct list_head    slabs_full;
    struct list_head    slabs_partial;
    struct list_head    slabs_free;
    unsigned int        objsize;
    unsigned int        flags;
    unsigned int        num;
    spinlock_t          spinlock;

    /* 2) slab additions/removals */
    /* order of pgs per slab (2^n) */
    unsigned int        gfporder;

    /* force GFP flags, e.g. GFP_DMA */
    unsigned int        gfpflags;

    size_t              colour;
    unsigned int        colour_off;
    unsigned int        colour_next;
    kmem_cache_t        *slabp_cache;
    ...
    struct list_head    next;
    ...
};
```

The more important fields in the kmem_cache_t structure are:

  1. slabs_full: list of fully allocated slabs
  2. slabs_partial: list of partially allocated slabs
  3. slabs_free: list of slabs with no objects allocated yet
  4. objsize: the size of each stored object
  5. num: the number of objects one slab can store
  6. gfporder: a slab is composed of 2^gfporder memory pages
  7. colour/colour_off/colour_next: the size of the colouring area (discussed later)

The slab_t structure is defined as follows:

```c
typedef struct slab_s {
    struct list_head    list;
    unsigned long       colouroff;
    void                *s_mem;
    unsigned int        inuse;
    kmem_bufctl_t       free;
} slab_t;
```

The purpose of each field of the slab_t structure is as follows:

  1. list: links the slab into one of the full/partial/free lists
  2. colouroff: the colouring offset of this slab
  3. s_mem: the starting memory address of the stored objects
  4. inuse: how many objects have been allocated
  5. free: the index of the first free object (head of the free-object chain)

The relationship between them is as follows: each kmem_cache_t heads three lists of slabs (full, partial and free), and each slab_t on those lists manages the objects stored in its pages.

A point worth explaining here: a slab is divided into multiple objects (each can be understood as a structure instance), and these objects are the smallest units allocated by the slab algorithm. A slab is generally composed of one or more contiguous memory pages (no more than 2^5 = 32 pages in the 2.4 kernel, the MAX_GFP_ORDER limit).

The slabs on the slabs_free list of a kmem_cache_t are the main candidates for memory reclaim. Because objects are allocated from and released back to a slab, a single slab moves between the three lists over its lifetime. For example, when all objects in a slab have been allocated, it moves from the slabs_partial list to the slabs_full list. When an object is allocated from a slab on the slabs_free list, that slab moves from slabs_free to slabs_partial. When all objects in a slab have been released, it moves from slabs_partial back to slabs_free.

Slab allocator initialization

The initialization of the slab allocator is completed by the kmem_cache_init() function, as follows:

```c
void __init kmem_cache_init(void)
{
    size_t left_over;

    init_MUTEX(&cache_chain_sem);
    INIT_LIST_HEAD(&cache_chain);

    kmem_cache_estimate(0, cache_cache.objsize, 0,
            &left_over, &cache_cache.num);
    if (!cache_cache.num)
        BUG();

    cache_cache.colour = left_over / cache_cache.colour_off;
    cache_cache.colour_next = 0;
}
```

This function mainly initializes the variable cache_cache, a structure variable of type kmem_cache_t, defined as follows:

```c
static kmem_cache_t cache_cache = {
    slabs_full:       LIST_HEAD_INIT(cache_cache.slabs_full),
    slabs_partial:    LIST_HEAD_INIT(cache_cache.slabs_partial),
    slabs_free:       LIST_HEAD_INIT(cache_cache.slabs_free),
    objsize:          sizeof(kmem_cache_t),
    flags:            SLAB_NO_REAP,
    spinlock:         SPIN_LOCK_UNLOCKED,
    colour_off:       L1_CACHE_BYTES,
    name:             "kmem_cache",
};
```

Why do we need such an object? The kmem_cache_t structure is itself a small memory object, so it too should be allocated by the slab allocator. But that creates a chicken-and-egg problem: when the system is initializing, the slab allocator has not been set up yet, so it cannot be used to allocate the first kmem_cache_t object. The only way out is to define a static kmem_cache_t variable, and that is exactly what cache_cache is: the cache used to manage the slab allocator itself.
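The bootstrap pattern can be shown generically (a user-space sketch, not the kernel's code): an allocator whose control structures come from itself needs exactly one control structure that is never allocated, only declared.

```c
#include <stddef.h>

/* Sketch of a self-hosting allocator's control block (invented names). */
struct cache {
    const char *name;
    size_t      objsize;
};

/* The first cache cannot come from an allocator that does not exist yet,
 * so it is a static, compile-time object -- just like cache_cache.
 * Every later struct cache can then be allocated from this one. */
static struct cache bootstrap_cache = {
    .name    = "cache_of_caches",
    .objsize = sizeof(struct cache),
};
```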

As can be seen from the code above, the objsize field of cache_cache is set to sizeof(kmem_cache_t), so this cache is dedicated to allocating the kmem_cache_t objects of all the other caches.

The kmem_cache_init() function calls kmem_cache_estimate() to calculate how many objects of size cache_cache.objsize fit in one slab, and stores the result in the cache_cache.num field. A slab can rarely be divided into objects with nothing left over. For example, a 4096-byte slab divided into 22-byte objects holds 186 of them, leaving 4 bytes that cannot be used; this leftover memory becomes the colouring area. The purpose of colouring is to stagger the starting offsets of different slabs so that the CPU can cache them more effectively. This is purely an optimization and has little impact on the slab allocation algorithm itself: even without colouring, the algorithm still works.
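The arithmetic can be reproduced with a simplified user-space version of what kmem_cache_estimate() computes. The real function also accounts for the slab_t header and the kmem_bufctl_t index array when the management data lives on the slab; this sketch ignores them, matching the simplified numbers in the text above.

```c
#include <stdio.h>

int main(void)
{
    unsigned long slab_bytes = 4096;  /* one page, i.e. gfporder == 0 */
    unsigned long objsize    = 22;

    unsigned long num       = slab_bytes / objsize;        /* 186 objects */
    unsigned long left_over = slab_bytes - num * objsize;  /* 4 bytes     */

    printf("num=%lu left_over=%lu (colouring area)\n", num, left_over);
    return 0;
}
```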

Applying for a kmem_cache_t object

kmem_cache_t is the structure that manages and allocates objects, so to use the slab allocator you must first apply for a kmem_cache_t object of your own. This is done by the kmem_cache_create() function:

```c
kmem_cache_t *kmem_cache_create(
    const char *name,
    size_t size,
    size_t offset,
    unsigned long flags,
    void (*ctor)(void *, kmem_cache_t *, unsigned long),
    void (*dtor)(void *, kmem_cache_t *, unsigned long))
{
    ...
    cachep = (kmem_cache_t *) kmem_cache_alloc(&cache_cache, SLAB_KERNEL);
    if (!cachep)
        goto opps;
    memset(cachep, 0, sizeof(kmem_cache_t));
    ...
    /* grow gfporder until the slab wastes an acceptable amount of space */
    do {
        unsigned int break_flag = 0;
cal_wastage:
        kmem_cache_estimate(cachep->gfporder, size, flags,
                        &left_over, &cachep->num);
        if (break_flag)
            break;
        if (cachep->gfporder >= MAX_GFP_ORDER)
            break;
        if (!cachep->num)
            goto next;
        if (flags & CFLGS_OFF_SLAB && cachep->num > offslab_limit) {
            /* Oops, this num of objs will cause problems. */
            cachep->gfporder--;
            break_flag++;
            goto cal_wastage;
        }

        if (cachep->gfporder >= slab_break_gfp_order)
            break;

        if ((left_over * 8) <= (PAGE_SIZE << cachep->gfporder))
            break;    /* Acceptable internal fragmentation. */
next:
        cachep->gfporder++;
    } while (1);

    if (flags & CFLGS_OFF_SLAB && left_over >= slab_size) {
        flags &= ~CFLGS_OFF_SLAB;
        left_over -= slab_size;
    }

    /* Offset must be a multiple of the alignment. */
    offset += (align - 1);
    offset &= ~(align - 1);
    if (!offset)
        offset = L1_CACHE_BYTES;
    cachep->colour_off = offset;
    cachep->colour = left_over / offset;

    cachep->flags = flags;
    cachep->gfpflags = 0;
    if (flags & SLAB_CACHE_DMA)
        cachep->gfpflags |= GFP_DMA;
    spin_lock_init(&cachep->spinlock);
    cachep->objsize = size;
    INIT_LIST_HEAD(&cachep->slabs_full);
    INIT_LIST_HEAD(&cachep->slabs_partial);
    INIT_LIST_HEAD(&cachep->slabs_free);

    if (flags & CFLGS_OFF_SLAB)
        cachep->slabp_cache = kmem_find_general_cachep(slab_size, 0);
    cachep->ctor = ctor;
    cachep->dtor = dtor;
    strcpy(cachep->name, name);

    down(&cache_chain_sem);
    {
        struct list_head *p;

        list_for_each(p, &cache_chain) {
            kmem_cache_t *pc = list_entry(p, kmem_cache_t, next);
            /* (duplicate-name sanity check elided) */
        }
    }

    list_add(&cachep->next, &cache_chain);
    up(&cache_chain_sem);
opps:
    return cachep;
}
```

The kmem_cache_create() function is fairly long, so the listing above drops some less important parts to show the principle more clearly.

In kmem_cache_create(), kmem_cache_alloc() is first called to apply for a kmem_cache_t object; note that the cache_cache variable is what gets passed in. After the kmem_cache_t object is obtained, its fields are initialized:

  1. Calculate how many pages are needed as the size of one slab (the gfporder loop).
  2. Calculate how many objects one slab can hold (num).
  3. Calculate the colouring information (colour and colour_off).
  4. Initialize the slabs_full / slabs_partial / slabs_free lists.
  5. Add the new kmem_cache_t object to the cache_chain list.
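As a usage sketch under the 2.4-era interface shown above (struct my_obj, my_cachep and my_cache_setup are invented names for illustration), a kernel subsystem would create its own cache like this:

```c
#include <linux/slab.h>
#include <linux/errno.h>

/* Hypothetical object type to be served from its own cache. */
struct my_obj {
    int  id;
    char payload[28];
};

static kmem_cache_t *my_cachep;

int my_cache_setup(void)
{
    /* arguments: name, object size, offset, flags, ctor, dtor */
    my_cachep = kmem_cache_create("my_obj_cache", sizeof(struct my_obj),
                                  0, 0, NULL, NULL);
    if (!my_cachep)
        return -ENOMEM;
    return 0;
}
```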

Object allocation

After the kmem_cache_t object has been created, objects are requested from it with the kmem_cache_alloc() function. Its code (together with its helper) is as follows:

```c
static inline void *
kmem_cache_alloc_one_tail(kmem_cache_t *cachep, slab_t *slabp)
{
    void *objp;

    slabp->inuse++;
    objp = slabp->s_mem + slabp->free * cachep->objsize;
    slabp->free = slab_bufctl(slabp)[slabp->free];

    if (unlikely(slabp->free == BUFCTL_END)) {
        list_del(&slabp->list);
        list_add(&slabp->list, &cachep->slabs_full);
    }
    return objp;
}

static inline void *
__kmem_cache_alloc(kmem_cache_t *cachep, int flags)
{
    unsigned long save_flags;
    void *objp;
    struct list_head *slabs_partial, *entry;
    slab_t *slabp;

    kmem_cache_alloc_head(cachep, flags);
try_again:
    local_irq_save(save_flags);

    slabs_partial = &(cachep)->slabs_partial;
    entry = slabs_partial->next;

    if (unlikely(entry == slabs_partial)) {
        struct list_head *slabs_free;
        slabs_free = &(cachep)->slabs_free;
        entry = slabs_free->next;
        if (unlikely(entry == slabs_free))
            goto alloc_new_slab;
        list_del(entry);
        list_add(entry, slabs_partial);
    }

    slabp = list_entry(entry, slab_t, list);
    objp = kmem_cache_alloc_one_tail(cachep, slabp);

    local_irq_restore(save_flags);
    return objp;

alloc_new_slab:
    local_irq_restore(save_flags);
    if (kmem_cache_grow(cachep, flags))
        goto try_again;
    return NULL;
}
```

The listing above shows kmem_cache_alloc() with its helper functions expanded inline. Its main steps are:

  1. Check whether the slabs_partial list of the kmem_cache_t object has an available slab; if so, allocate an object directly from it.
  2. If slabs_partial is empty, look for an available slab in the slabs_free list; if one is found, allocate an object from it and move the slab onto the slabs_partial list.
  3. If slabs_free is empty too, call kmem_cache_grow() to request a new slab and retry the allocation.

The structure of a slab, from low to high addresses, is: first the colouring area, then the slab management structure (slab_t), then the free-object index array (the kmem_bufctl_t entries), and finally the object entities themselves. The s_mem field of the slab structure points to the starting address of the object entity area.

When allocating an object, the free field of the slab structure is consulted first to see whether a free object is available: it stores the index of the first node of the free-object chain.
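The free-chain mechanics can be simulated in ordinary user-space C (a mock, not kernel code; NUM_OBJS and the variable names are made up). In the kernel, slab_bufctl(slabp) is the index array sitting right after slab_t, where entry i holds the index of the next free object; allocation pops the head of this chain and freeing pushes an index back on, exactly as in kmem_cache_alloc_one_tail() and kmem_cache_free_one().

```c
#include <stdio.h>

#define NUM_OBJS   8
#define BUFCTL_END 0xffffffffU   /* end-of-chain marker, as in the kernel */

static unsigned int bufctl[NUM_OBJS];  /* bufctl[i] = index of next free obj */
static unsigned int free_head;         /* mirrors slabp->free */

static void slab_init(void)
{
    for (unsigned int i = 0; i < NUM_OBJS - 1; i++)
        bufctl[i] = i + 1;             /* chain: 0 -> 1 -> ... -> END */
    bufctl[NUM_OBJS - 1] = BUFCTL_END;
    free_head = 0;
}

static int alloc_obj(void)
{
    if (free_head == BUFCTL_END)
        return -1;                     /* slab full: would move to slabs_full */
    unsigned int objnr = free_head;
    free_head = bufctl[objnr];         /* pop the head of the free chain */
    return (int)objnr;
}

static void free_obj(unsigned int objnr)
{
    bufctl[objnr] = free_head;         /* push the index back on the chain */
    free_head = objnr;
}

int main(void)
{
    slab_init();
    int a = alloc_obj(), b = alloc_obj();
    printf("allocated objects %d and %d\n", a, b);
    free_obj((unsigned int)a);
    printf("next allocation reuses index %d\n", alloc_obj());
    return 0;
}
```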

Object release

Releasing an object is relatively simple: it is done by the kmem_cache_free() function, which eventually calls kmem_cache_free_one(). The code is as follows:

```c
static inline void
kmem_cache_free_one(kmem_cache_t *cachep, void *objp)
{
    slab_t *slabp;

    /* find the slab the object belongs to via its page descriptor */
    slabp = GET_PAGE_SLAB(virt_to_page(objp));

    {
        /* compute the object's index and push it onto the free chain */
        unsigned int objnr = (objp - slabp->s_mem) / cachep->objsize;
        slab_bufctl(slabp)[objnr] = slabp->free;
        slabp->free = objnr;
    }

    /* fixup slab chains */
    {
        int inuse = slabp->inuse;
        if (unlikely(!--slabp->inuse)) {
            /* Was partial or full, now empty. */
            list_del(&slabp->list);
            list_add(&slabp->list, &cachep->slabs_free);
        } else if (unlikely(inuse == cachep->num)) {
            /* Was full. */
            list_del(&slabp->list);
            list_add(&slabp->list, &cachep->slabs_partial);
        }
    }
}
```

When an object is released, its index is first pushed onto the slab's free-object chain, and then the slab is moved to the appropriate list according to its usage:

  1. If all objects in the slab are now free, the slab is moved to the slabs_free list.
  2. If the slab was previously full, it is moved from slabs_full to slabs_partial.
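Continuing the hypothetical my_obj example from earlier (invented names; SLAB_KERNEL is the 2.4-era allocation flag), a complete allocate/use/release cycle might look like this:

```c
struct my_obj *obj;

obj = kmem_cache_alloc(my_cachep, SLAB_KERNEL);
if (obj) {
    obj->id = 1;
    /* ... use the object ... */
    kmem_cache_free(my_cachep, obj);  /* index goes back on the free chain */
}
```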

Origin: blog.csdn.net/qq_40989769/article/details/113050687