article | content
---|---
Linux memory management: Bootmem takes the lead | Bootmem, the boot-stage memory allocator
Linux memory management: The Buddy System is long overdue | Buddy System memory allocator
Linux memory management: Slab makes its debut | Slab memory allocator
This is the third article in the source code analysis column. The column is mainly divided into four major modules: memory management, device management, system startup, and other parts. Memory management is covered in three parts: Bootmem, Buddy System and Slab. This article explains the Slab startup process.
Structure
Let's first look at several Linux memory allocation methods (a minimal usage sketch follows the list):
- kmalloc: allocates through the slab allocator; the work is done by kmem_cache_alloc
- vmalloc: allocates linearly (virtually) contiguous but physically discontiguous addresses, from the high memory area
- kzalloc: like kmalloc, but zeroes the allocated memory
- malloc: allocates user-space memory
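A minimal kernel-module sketch of these APIs (module boilerplate assumed; error handling trimmed; kfree and vfree both tolerate NULL):

#include <linux/module.h>
#include <linux/slab.h>      /* kmalloc, kzalloc, kfree */
#include <linux/vmalloc.h>   /* vmalloc, vfree */

static int __init alloc_demo_init(void)
{
	char *a = kmalloc(64, GFP_KERNEL);  /* physically contiguous */
	char *b = kzalloc(64, GFP_KERNEL);  /* like kmalloc, but zeroed */
	char *c = vmalloc(4 * PAGE_SIZE);   /* virtually contiguous only */

	kfree(a);
	kfree(b);
	vfree(c);
	return 0;
}

static void __exit alloc_demo_exit(void) { }

module_init(alloc_demo_init);
module_exit(alloc_demo_exit);
MODULE_LICENSE("GPL");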
The bootmem system, buddy system and slab system are collectively called the three pillars of Linux memory management:
- bootmem is the coarse memory management system used during system initialization
- buddy is the large-granularity memory management system used once the system is running
- slab is the small-granularity memory management system used once the system is running
The buddy system allocates memory in units of pages, which is wasteful for small requests, so memory must also be carved into smaller units. The Linux kernel has three mechanisms for allocating small pieces: slab, slub and slob.
What is the slab allocator?

The slab allocator manages lists of slabs. A slab consists of pages, and those pages are carved into managed objects of a specific size. Depending on the allocation requests it has served, each object is either "in use" or "free".
The managed objects may be structures widely used in the kernel, such as task and page structures, or simply memory areas of a specific size.
The slab layer is the kernel's cache mechanism and is managed in terms of objects; at its core, the actual physical pages are allocated by the buddy system. It serves two functions:
- providing fine-grained memory regions for kernel allocations
- acting as a cache, mainly for objects that are frequently allocated and released
The memory area of a cache is divided into multiple slabs; each slab consists of one or more contiguous page frames, and those page frames hold both allocated objects and free objects.
Several basic concepts:
- cache: the structure that manages slabs. Most of its information controls growing or shrinking slabs, or the number of objects managed within a slab.
- cache_cache: the cache that manages the kmem_cache cache-descriptor structures. That is, each cache's descriptor information is placed in a slab, and cache_cache manages those particular slabs.
kmem_cache and kmem_list3
The kmem_cache structure:
struct kmem_cache {
/* 1) per-cpu data, touched during every alloc/free */
struct array_cache *array[NR_CPUS]; // per-CPU data
/* 2) Cache tunables. Protected by cache_chain_mutex */
unsigned int batchcount; // number of objects moved into the per-CPU array at once
unsigned int limit; // maximum number of objects the per-CPU list may hold
unsigned int shared;
unsigned int buffer_size; // length of each object held in this cache
u32 reciprocal_buffer_size;
/* 3) touched by every alloc & free from the backend */
unsigned int flags; /* cache attribute flags */
unsigned int num; /* number of objects per slab */
/* 4) cache_grow/shrink */
/* order of pgs per slab (2^n) */
unsigned int gfporder; // page allocation unit (order)
/* force GFP flags, e.g. GFP_DMA */
gfp_t gfpflags; // GFP flags
size_t colour; /* cache colouring range */
unsigned int colour_off; /* colour offset */
struct kmem_cache *slabp_cache;
unsigned int slab_size;
unsigned int dflags; /* dynamic flags */
/* constructor func */
void (*ctor)(void *obj);
/* 5) cache creation/removal */
const char *name;
struct list_head next; // links this cache into the cache list
/* 6) statistics */
#if STATS
unsigned long num_active;
unsigned long num_allocations;
unsigned long high_mark;
unsigned long grown;
unsigned long reaped;
unsigned long errors;
unsigned long max_freeable;
unsigned long node_allocs;
unsigned long node_frees;
unsigned long node_overflow;
atomic_t allochit;
atomic_t allocmiss;
atomic_t freehit;
atomic_t freemiss;
#endif
#if DEBUG
int obj_offset;
int obj_size;
#endif
struct kmem_list3 *nodelists[MAX_NUMNODES]; // kmem_list3 entries arranged by node
};
Several member variables to note:
- struct list_head next: links the cache into the cache_chain list
- struct kmem_list3 *nodelists[MAX_NUMNODES]: stores, for each node, the head nodes of the three slab lists
- struct array_cache *array[NR_CPUS]: manages the per-CPU object lists
The array_cache structure that manages the cached objects:
struct array_cache {
unsigned int avail; // number of available objects in this per-CPU cache
unsigned int limit; // maximum number of objects
unsigned int batchcount; // number of objects obtained from a slab at once
unsigned int touched; // whether the cache has been used recently
spinlock_t lock;
void *entry[]; // per-CPU list of object addresses from the slabs in use
};
The purpose of giving each CPU its own array_cache is to improve performance when allocating memory. The entry[] array stores the addresses of the slab objects currently used by this CPU, not positions in the slab linked lists (a small push/pop sketch follows).
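Using struct array_cache as just shown (the helper names pop_object and push_object are hypothetical, not kernel functions), entry[] behaves as a LIFO stack of object pointers:

/* hypothetical helpers illustrating entry[] as a per-CPU object stack */
static void *pop_object(struct array_cache *ac)
{
	if (ac->avail == 0)
		return NULL;                   /* empty: a refill is needed */
	return ac->entry[--ac->avail];         /* pop the most recently freed object */
}

static void push_object(struct array_cache *ac, void *objp)
{
	if (ac->avail < ac->limit)
		ac->entry[ac->avail++] = objp; /* push the freed object back */
	/* otherwise batchcount objects would be flushed to the shared cache */
}

This is exactly the pattern ____cache_alloc uses later in this article.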
In the Linux kernel source, slab, slub and slob all represent a cache with struct kmem_cache. As for kmem_list3, the 3 indicates that slabs are managed in three ways: a kmem_list3 is an array holding the head nodes of three slab linked lists.
/*
* The slab lists for all objects.
*/
struct kmem_list3 {
struct list_head slabs_partial; // partially used slabs
struct list_head slabs_full; // fully used slabs
struct list_head slabs_free; // completely free slabs
unsigned long free_objects; // number of objects available for allocation
unsigned int free_limit; // maximum number of unallocated objects allowed
unsigned int colour_next; /* per-node cache coloring */
spinlock_t list_lock;
struct array_cache *shared; /* shared within the same node */
struct array_cache **alien; /* shared across different nodes */
unsigned long next_reap; /* time of the next update (reap) */
int free_touched; /* whether the cache has been used recently */
};
For how slab organizes its objects, see "Memory management principles of modern operating systems: the slab allocator, taking Linux 2.6.xx as an example".
kmem_list3_init
Before slab is initialized, kmalloc cannot allocate the objects needed during the initialization process; only static global variables or the buddy system can be used. Static global variables are chosen here, and after initialization they are replaced by objects dynamically allocated from slab.
Global variables

The slab allocator comes after the buddy allocator, so it can use the full functionality of buddy.
In the kernel initialization phase, slab initialization is performed by calling the kmem_cache_init function, but that function itself needs memory allocated from slab, so a "chicken or egg" problem arises. To solve it, the slab allocator divides initialization into five stages:
static enum {
NONE,
PARTIAL_AC,
PARTIAL_L3,
EARLY,
FULL
} g_cpucache_up;
NONE: at this stage the slab allocator can only provide the single cache_cache cache, and it has to build that cache with static data (a static array_cache).
kmem_cache_init

Initialize the slab allocator:
for (i = 0; i < NUM_INIT_LISTS; i++) {
// NUM_INIT_LISTS = 3 * MAX_NUMNODES
kmem_list3_init(&initkmem_list3[i]);
if (i < MAX_NUMNODES)
cache_cache.nodelists[i] = NULL;
}
The kmem_list3_init() function initializes a struct kmem_list3 instance. In slab, the struct kmem_cache structure contains the nodelists member, which designates one struct kmem_list3 for each node.
Each struct kmem_list3 structure contains three linked lists, and each list maintains a number of slabs:
- slabs_partial: the slabs maintained here still contain some available objects
- slabs_full: the slabs maintained here have no available objects
- slabs_free: every object in the slabs maintained here is available
static void kmem_list3_init(struct kmem_list3 *parent)
{
INIT_LIST_HEAD(&parent->slabs_full);
INIT_LIST_HEAD(&parent->slabs_partial);
INIT_LIST_HEAD(&parent->slabs_free);
parent->shared = NULL;
parent->alien = NULL;
parent->colour_next = 0;
spin_lock_init(&parent->list_lock);
parent->free_objects = 0;
parent->free_touched = 0;
}
#define NUM_INIT_LISTS (3 * MAX_NUMNODES) // three lists for each possible node
#ifdef CONFIG_NODES_SHIFT
#define NODES_SHIFT CONFIG_NODES_SHIFT
#else
#define NODES_SHIFT 0
#endif
#define MAX_NUMNODES (1 << NODES_SHIFT) // maximum number of nodes the machine supports
initkmem_list3 is a static variable.

Memory at this point can come only from the buddy system or from static variables. Freeing buddy-allocated memory at this stage is troublesome, so a static variable is used directly and replaced later; its __initdata attribute means it is automatically discarded after the system starts.
struct kmem_list3 __initdata initkmem_list3[NUM_INIT_LISTS];
set_up_list3s
set_up_list3s(&cache_cache, CACHE_CACHE);
cache_cache is also a static variable:
/* internal cache of cache description objs */
static struct kmem_cache cache_cache = {
.batchcount = 1,
.limit = BOOT_CPUCACHE_ENTRIES,
.shared = 1,
.buffer_size = sizeof(struct kmem_cache),
.name = "kmem_cache",
};
(The full definition of struct kmem_cache was listed above. One detail worth keeping from the source comments: nodelists[] is placed at the end of kmem_cache so the array can be sized to nr_node_ids slots instead of MAX_NUMNODES, see kmem_cache_init(); [MAX_NUMNODES] is still used because cache_cache is statically defined, so the maximum number of nodes must be reserved, and no fields may be added after nodelists[].)
/*
* For setting up all the kmem_list3s for cache whose buffer_size is same as
* size of kmem_list3.
*/
static void __init set_up_list3s(struct kmem_cache *cachep, int index)
{
int node;
for_each_online_node(node) {
// cachep points to the cache's struct kmem_cache
// index is the offset of this cache's entries within initkmem_list3
// point each nodelists member at the corresponding struct kmem_list3 in the initkmem_list3 array
cachep->nodelists[node] = &initkmem_list3[index + node];
cachep->nodelists[node]->next_reap = jiffies +
REAPTIMEOUT_LIST3 +
((unsigned long)cachep) % REAPTIMEOUT_LIST3;
}
}
One might always wonder why something as cryptic as CACHE_CACHE is used as a subscript. It turns out the entries at the beginning of initkmem_list3 are reserved for cache_cache: each nodelists[] entry of cache_cache points to one kmem_list3, and each kmem_list3 manages a set of three linked lists.
The slab allocator manages a cache object with struct kmem_cache, whose nodelists member assigns one struct kmem_list3 to each node in the system. A struct kmem_list3 maintains slabs built from the physical pages of its node, so this function sets up the kmem_list3 structure of the cache object for every node.
This function is only used in the initial phase of the slab allocator, so it is marked __init.
node = numa_node_id();
/* 1) create the cache_cache */
INIT_LIST_HEAD(&cache_chain);
list_add(&cache_cache.next, &cache_chain);
The slab allocator first initializes the global cache list cache_chain and then inserts cache_cache into that list.
static inline void INIT_LIST_HEAD(struct list_head *list) // initialize the list head
{
list->next = list;
list->prev = list;
}
#ifndef CONFIG_DEBUG_LIST
static inline void __list_add(struct list_head *new,
struct list_head *prev,
struct list_head *next) // insert new between prev and next
{
next->prev = new;
new->next = next;
new->prev = prev;
prev->next = new;
}
#else
extern void __list_add(struct list_head *new,
struct list_head *prev,
struct list_head *next);
#endif
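As a user-space sketch of the same intrusive-list pattern (struct fake_cache and its names are made up for illustration):

#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

static void list_add(struct list_head *new, struct list_head *head)
{
	/* insert right after the head, as the kernel's list_add does */
	new->next = head->next;
	new->prev = head;
	head->next->prev = new;
	head->next = new;
}

struct fake_cache { const char *name; struct list_head next; };

int main(void)
{
	struct list_head cache_chain;
	struct fake_cache cc = { .name = "cache_cache" };

	INIT_LIST_HEAD(&cache_chain);
	list_add(&cc.next, &cache_chain); /* mirrors list_add(&cache_cache.next, &cache_chain) */
	printf("first cache: %s\n", cc.name);
	return 0;
}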
cache_cache.colour_off = cache_line_size();
cache_cache.array[smp_processor_id()] = &initarray_cache.cache;
cache_cache.nodelists[node] = &initkmem_list3[CACHE_CACHE + node];
/*
* struct kmem_cache size depends on nr_node_ids, which
* can be less than MAX_NUMNODES.
*/
cache_cache.buffer_size = offsetof(struct kmem_cache, nodelists) +
nr_node_ids * sizeof(struct kmem_list3 *);
#if DEBUG
cache_cache.obj_size = cache_cache.buffer_size;
#endif
cache_cache.buffer_size = ALIGN(cache_cache.buffer_size,
cache_line_size());
cache_cache.reciprocal_buffer_size =
reciprocal_value(cache_cache.buffer_size);
Several cache_cache attributes are specified here:
- colour_off: the colouring offset length (the cache line size)
- array: the local cache, pointed at static data
- buffer_size: the length of the cached struct kmem_cache object, computed and then aligned to the cache line size
Regarding the colouring attribute colour: to improve CPU cache utilization, slab colouring is a scheme that makes objects in different slabs use different lines of the hardware cache. By placing objects at different starting offsets in different slabs, the objects are likely to land in different CPU cache lines, which ensures that objects from the same cache are less likely to evict one another (a worked example of the arithmetic follows).
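A small user-space sketch of the colouring arithmetic (the values left_over = 200 and colour_off = 64 are assumptions picked for illustration):

#include <stdio.h>

int main(void)
{
	unsigned long left_over  = 200; /* unused bytes per slab (assumed) */
	unsigned long colour_off = 64;  /* cache_line_size() (assumed)     */
	unsigned long colour     = left_over / colour_off; /* 3 colours    */
	unsigned int  colour_next = 0;

	for (int s = 0; s < 5; s++) {
		printf("slab %d: objects start at offset %lu\n",
		       s, (unsigned long)colour_next * colour_off);
		if (++colour_next >= colour)
			colour_next = 0; /* offsets cycle 0, 64, 128, 0, ... */
	}
	return 0;
}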
initarray_cache here is static data:

static struct arraycache_init initarray_cache __initdata =
	{ {0, BOOT_CPUCACHE_ENTRIES, 1, 0} };
struct arraycache_init {
struct array_cache cache;
void *entries[BOOT_CPUCACHE_ENTRIES];
};
struct array_cache {
unsigned int avail;
unsigned int limit;
unsigned int batchcount;
unsigned int touched;
spinlock_t lock;
void *entry[];
};
array_cache is called the local cache. When slab allocates an object, it generally searches the local cache first. If the local cache is empty, objects from the shared cache are moved into the local cache; if the shared cache is also empty, objects are taken from the slab lists; and if no slab has free objects, a new slab is allocated and the caches are refilled.
- kmem_cache->array[]: the local cache
- kmem_cache->shared: the shared cache, which holds the objects that overflow from the local caches of all CPUs on the same node
- kmem_cache->alien: holds cached slab objects belonging to other nodes. When an object allocated on one node is released on another (slab->nodeid != numa_node_id()), it is added to the alien cache of the node the object belongs to; otherwise it goes to the local or shared cache. When the alien cache is full, cache_free_alien() is called to move the objects back to the owning node's shared cache.
cache_estimate

Calculates the number of physical pages the buddy allocator needs to provide for cache_cache to build a slab.
On each loop iteration, cache_estimate is called to compute the number of physical pages a cache_cache slab would occupy, the number of objects such a slab could maintain, and finally the wasted memory length.
for (order = 0; order < MAX_ORDER; order++) {
cache_estimate(order, cache_cache.buffer_size,
cache_line_size(), 0, &left_over, &cache_cache.num);
if (cache_cache.num)
break;
}
order refers to the order of the buddy system (a slab occupies 2^order pages).
static void cache_estimate(unsigned long gfporder, size_t buffer_size,
size_t align, int flags, size_t *left_over,
unsigned int *num)
{
int nr_objs;
size_t mgmt_size;
size_t slab_size = PAGE_SIZE << gfporder; // length of the memory obtained from buddy
if (flags & CFLGS_OFF_SLAB) {
// should the management data be kept outside the slab?
mgmt_size = 0;
nr_objs = slab_size / buffer_size; // maximum number of objects this memory can hold
if (nr_objs > SLAB_LIMIT)
nr_objs = SLAB_LIMIT;
} else {
nr_objs = (slab_size - sizeof(struct slab)) /
(buffer_size + sizeof(kmem_bufctl_t));
if (slab_mgmt_size(nr_objs, align) + nr_objs*buffer_size
> slab_size)
nr_objs--;
if (nr_objs > SLAB_LIMIT)
nr_objs = SLAB_LIMIT;
mgmt_size = slab_mgmt_size(nr_objs, align);
}
*num = nr_objs; // number of allocatable cache objects
*left_over = slab_size - nr_objs*buffer_size - mgmt_size; // leftover length
}
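A worked, user-space sketch of the on-slab branch (every size below is an assumption picked for illustration; the real code additionally aligns the management area with slab_mgmt_size()):

#include <stdio.h>

int main(void)
{
	unsigned long slab_size   = 4096; /* PAGE_SIZE << 0: an order-0 slab */
	unsigned long slab_hdr    = 32;   /* sizeof(struct slab), assumed    */
	unsigned long bufctl      = 4;    /* sizeof(kmem_bufctl_t), assumed  */
	unsigned long buffer_size = 128;  /* object length, assumed          */

	/* each object costs buffer_size bytes plus one kmem_bufctl_t entry */
	unsigned long nr_objs = (slab_size - slab_hdr) / (buffer_size + bufctl);
	unsigned long mgmt    = slab_hdr + nr_objs * bufctl;
	unsigned long left    = slab_size - nr_objs * buffer_size - mgmt;

	/* prints num = 30, left_over = 104 */
	printf("num = %lu, left_over = %lu\n", nr_objs, left);
	return 0;
}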
Slab management data comes in two modes:
- internal (on-slab) management: the management data is placed inside the slab, with the object descriptor array terminated by BUFCTL_END
- external (off-slab) management: the struct slab lives outside the slab, and s_mem specifies the starting position of the objects (in this case colouring is also handled externally)
If suitable data for building a slab is found, the order of physical pages a slab occupies is stored in cache_cache.gfporder:

cache_cache.gfporder = order;
cache_cache.colour = left_over / cache_cache.colour_off; // determine the colouring range
cache_cache.slab_size = ALIGN(cache_cache.num * sizeof(kmem_bufctl_t) +
sizeof(struct slab), cache_line_size()); // length of the slab management data
kmem_cache_create

General caches: the object lengths in these caches are powers of 2; they are also called anonymous caches. The slab allocator computes the struct array_cache length, picks the general cache matching that length, and creates it with kmem_cache_create.
struct cache_sizes {
size_t cs_size;
struct kmem_cache *cs_cachep;
#ifdef CONFIG_ZONE_DMA
struct kmem_cache *cs_dmacachep;
#endif
};
struct cache_sizes malloc_sizes[] = {
#define CACHE(x) { .cs_size = (x) },
#include <linux/kmalloc_sizes.h>
CACHE(ULONG_MAX)
#undef CACHE
};
EXPORT_SYMBOL(malloc_sizes);
/* Must match cache_sizes above. Out of line to keep cache footprint low. */
struct cache_names {
char *name;
char *name_dma;
};
static struct cache_names __initdata cache_names[] = {
#define CACHE(x) { .name = "size-" #x, .name_dma = "size-" #x "(DMA)" },
#include <linux/kmalloc_sizes.h>
{ NULL, }
#undef CACHE
};
In include/linux/kmalloc_sizes.h:
#if (PAGE_SIZE == 4096)
CACHE(32)
#endif
CACHE(64)
#if L1_CACHE_BYTES < 64
CACHE(96)
#endif
CACHE(128)
#if L1_CACHE_BYTES < 128
CACHE(192)
#endif
CACHE(256)
CACHE(512)
CACHE(1024)
CACHE(2048)
CACHE(4096)
CACHE(8192)
CACHE(16384)
CACHE(32768)
CACHE(65536)
CACHE(131072)
#if KMALLOC_MAX_SIZE >= 262144
CACHE(262144)
#endif
#if KMALLOC_MAX_SIZE >= 524288
CACHE(524288)
#endif
#if KMALLOC_MAX_SIZE >= 1048576
CACHE(1048576)
#endif
#if KMALLOC_MAX_SIZE >= 2097152
CACHE(2097152)
#endif
#if KMALLOC_MAX_SIZE >= 4194304
CACHE(4194304)
#endif
#if KMALLOC_MAX_SIZE >= 8388608
CACHE(8388608)
#endif
#if KMALLOC_MAX_SIZE >= 16777216
CACHE(16777216)
#endif
#if KMALLOC_MAX_SIZE >= 33554432
CACHE(33554432)
#endif
The kmem_cache_create function body is relatively complex, but its main job is simply to create a cache. Its parameters:
- name: the name of the cache
- size: the length of the cached objects
- align: the cache alignment
- flags: cache creation flags
- ctor: points to the constructor applied to the cache's objects
struct kmem_cache *
kmem_cache_create (const char *name, size_t size, size_t align,
unsigned long flags, void (*ctor)(void *))
{
size_t left_over, slab_size, ralign;
struct kmem_cache *cachep = NULL, *pc;
The slab allocator represents a cache with struct kmem_cache, which holds the cache's basic information, including the local cache, the shared cache and the cache's slab lists. The creation process populates the data the cache requires.
First, basic sanity checks are performed: whether the cache name exists, whether we are currently in interrupt context, and whether the cached object is smaller than BYTES_PER_WORD or larger than KMALLOC_MAX_SIZE.
/*
* Sanity checks... these are all serious usage bugs.
*/
if (!name || in_interrupt() || (size < BYTES_PER_WORD) ||
size > KMALLOC_MAX_SIZE) {
printk(KERN_ERR "%s: Early error in slab %s\n", __func__,
name);
BUG();
}
As for the cache name here, look at the call site: the cache is created from the sizes array entry found via INDEX_AC. This is the general cache; sizes[INDEX_AC].cs_cachep refers to a kmem_cache structure. Note: the struct kmem_list3 data is still managed through cache_cache.
sizes[INDEX_AC].cs_cachep = kmem_cache_create(names[INDEX_AC].name,
sizes[INDEX_AC].cs_size,
ARCH_KMALLOC_MINALIGN,
ARCH_KMALLOC_FLAGS|SLAB_PANIC,
NULL);
This index is worth a closer look: its purpose is to find, among the general caches generated with CACHE(), the one whose size is suitable for allocating this structure (a small sketch follows the definitions below).
static __always_inline int index_of(const size_t size)
{
extern void __bad_size(void);
if (__builtin_constant_p(size)) {
int i = 0;
#define CACHE(x) \
if (size <=x) \
return i; \
else \
i++;
#include <linux/kmalloc_sizes.h>
#undef CACHE
__bad_size();
} else
__bad_size();
return 0;
}
#define INDEX_AC index_of(sizeof(struct arraycache_init))
#define INDEX_L3 index_of(sizeof(struct kmem_list3))
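A user-space sketch of what index_of() computes (the ladder below assumes PAGE_SIZE == 4096 and L1_CACHE_BYTES < 64, matching the listing above, and reproduces only the first few sizes):

#include <stdio.h>

static const unsigned long sizes[] = { 32, 64, 96, 128, 192, 256, 512, 1024 };

static int index_of(unsigned long size)
{
	/* return the first general cache large enough for the request */
	for (unsigned int i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
		if (size <= sizes[i])
			return (int)i;
	return -1; /* the kernel calls __bad_size() here instead */
}

int main(void)
{
	/* e.g. a 100-byte structure lands in the size-128 cache */
	printf("index_of(100) = %d (size-%lu)\n",
	       index_of(100), sizes[index_of(100)]);
	return 0;
}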
The general caches are visible on any system using slab (columns: cache name, total objects, object size, objects per slab, pages per slab):
liuzixuan@lzx-ubuntu ~ # cat /proc/slabinfo | awk -v OFS="\t" -F' ' '{print $1,$3,$4,$5,$6}'
kmalloc-8k 96 8192 4 8
kmalloc-4k 3128 4096 8 8
kmalloc-2k 1760 2048 16 8
kmalloc-1k 4096 1024 16 4
kmalloc-512 4112 512 16 2
kmalloc-256 1808 256 16 1
kmalloc-192 14679 192 21 1
kmalloc-128 1888 128 32 1
kmalloc-96 6090 96 42 1
kmalloc-64 57728 64 64 1
kmalloc-32 70912 32 128 1
kmalloc-16 16896 16 256 1
kmalloc-8 10240 8 512 1
Next, pin the set of available CPUs with get_online_cpus and then take the lock with mutex_lock:
/*
* We use cache_chain_mutex to ensure a consistent view of
* cpu_online_mask as well. Please see cpuup_callback
*/
get_online_cpus();
mutex_lock(&cache_chain_mutex);
Iterate through all caches
list_for_each_entry(pc, &cache_chain, next) {
char tmp;
int res;
/*
* This happens when the module gets unloaded and doesn't
* destroy its slab cache and no-one else reuses the vmalloc
* area of the module. Print a warning.
*/
res = probe_kernel_address(pc->name, tmp);
if (res) {
printk(KERN_ERR
"SLAB: cache with size %d has lost its name\n",
pc->buffer_size);
continue;
}
if (!strcmp(pc->name, name)) {
// caches with duplicate names are not allowed
printk(KERN_ERR
"kmem_cache_create: duplicate cache %s\n", name);
dump_stack();
goto oops;
}
}
Alignment-related operations are performed next, based on the flags provided when the cache is created; the final alignment ends up stored in ralign (a small worked example follows the code).
if (size & (BYTES_PER_WORD - 1)) {
size += (BYTES_PER_WORD - 1);
size &= ~(BYTES_PER_WORD - 1);
}
/* calculate the final buffer alignment: */
/* 1) arch recommendation: can be overridden for debug */
if (flags & SLAB_HWCACHE_ALIGN) {
/*
* Default alignment: as specified by the arch code. Except if
* an object is really small, then squeeze multiple objects into
* one cacheline.
*/
ralign = cache_line_size();
while (size <= ralign / 2)
ralign /= 2;
} else {
ralign = BYTES_PER_WORD;
}
/*
* Redzoning and user store require word alignment or possibly larger.
* Note this will be overridden by architecture or caller mandated
* alignment if either is greater than BYTES_PER_WORD.
*/
if (flags & SLAB_STORE_USER)
ralign = BYTES_PER_WORD;
if (flags & SLAB_RED_ZONE) {
ralign = REDZONE_ALIGN;
/* If redzoning, ensure that the second redzone is suitably
* aligned, by adjusting the object size accordingly. */
size += REDZONE_ALIGN - 1;
size &= ~(REDZONE_ALIGN - 1);
}
/* 2) arch mandated alignment */
if (ralign < ARCH_SLAB_MINALIGN) {
ralign = ARCH_SLAB_MINALIGN;
}
/* 3) caller mandated alignment */
if (ralign < align) {
ralign = align;
}
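A quick user-space sketch of the SLAB_HWCACHE_ALIGN halving loop (assuming cache_line_size() == 64 and a 20-byte object):

#include <stdio.h>

int main(void)
{
	unsigned long size = 20, ralign = 64;

	/* if the object is much smaller than a cache line, squeeze
	 * several objects into one line by halving the alignment */
	while (size <= ralign / 2)
		ralign /= 2;

	printf("ralign = %lu\n", ralign); /* 32: two 20-byte objects per line */
	return 0;
}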
The kmem_cache_zalloc function then allocates a new struct kmem_cache from cache_cache:
cachep = kmem_cache_zalloc(&cache_cache, GFP_KERNEL);
if (!cachep)
goto oops;
Next, determine whether the slab management data is by default located inside or outside the slab. The conditions used for the judgment are slab_early_init and size:
* Determine if the slab management is 'on' or 'off' slab.
* (bootstrapping cannot cope with offslab caches so don't do
* it too early on.)
*/
if ((size >= (PAGE_SIZE >> 3)) && !slab_early_init)
/*
* Size is large, assume best to place the slab management obj
* off-slab (should allow better packing of objs).
*/
flags |= CFLGS_OFF_SLAB;
size = ALIGN(size, align); // length of the cache object after alignment
// compute the number of objects each slab of this cache maintains and the number of physical pages it occupies
left_over = calculate_slab_order(cachep, size, align, flags);
if (!cachep->num) {
printk(KERN_ERR
"kmem_cache_create: couldn't create cache %s.\n", name);
kmem_cache_free(&cache_cache, cachep);
cachep = NULL;
goto oops;
}
slab_size = ALIGN(cachep->num * sizeof(kmem_bufctl_t)
+ sizeof(struct slab), align);
Filling in the cache data:
if (flags & CFLGS_OFF_SLAB && left_over >= slab_size) {
flags &= ~CFLGS_OFF_SLAB;
left_over -= slab_size;
}
if (flags & CFLGS_OFF_SLAB) {
/* really off slab. No need for manual alignment */
slab_size =
cachep->num * sizeof(kmem_bufctl_t) + sizeof(struct slab);
}
cachep->colour_off = cache_line_size();
/* Offset must be a multiple of the alignment. */
if (cachep->colour_off < align)
cachep->colour_off = align;
cachep->colour = left_over / cachep->colour_off;
cachep->slab_size = slab_size;
cachep->flags = flags;
cachep->gfpflags = 0;
if (CONFIG_ZONE_DMA_FLAG && (flags & SLAB_CACHE_DMA))
cachep->gfpflags |= GFP_DMA;
cachep->buffer_size = size;
cachep->reciprocal_buffer_size = reciprocal_value(size);
if (flags & CFLGS_OFF_SLAB) {
cachep->slabp_cache = kmem_find_general_cachep(slab_size, 0u);
/*
* This is a possibility for one of the malloc_sizes caches.
* But since we go off slab only for object size greater than
* PAGE_SIZE/8, and malloc_sizes gets created in ascending order,
* this should not happen at all.
* But leave a BUG_ON for some lucky dude.
*/
BUG_ON(ZERO_OR_NULL_PTR(cachep->slabp_cache));
}
cachep->ctor = ctor;
cachep->name = name;
Set up the local cache, the shared cache and the cache's slab lists:
if (setup_cpu_cache(cachep)) {
__kmem_cache_destroy(cachep);
cachep = NULL;
goto oops;
}
If setup succeeds, the cache is inserted into the system cache list cache_chain:
/* cache setup completed, link it into the list */
list_add(&cachep->next, &cache_chain);
The function finally returns the pointer cachep to the cache, and EXPORT_SYMBOL() exports the function for use by other parts of the kernel.
return cachep;
kmem_cache_alloc

Used to allocate a usable object from a cache:
void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
{
void *ret = __cache_alloc(cachep, flags, __builtin_return_address(0));
trace_kmem_cache_alloc(_RET_IP_, ret,
obj_size(cachep), cachep->buffer_size, flags);
return ret;
}
static __always_inline void *
__cache_alloc(struct kmem_cache *cachep, gfp_t flags, void *caller)
{
unsigned long save_flags;
void *objp;
lockdep_trace_alloc(flags);
if (slab_should_failslab(cachep, flags))
return NULL;
cache_alloc_debugcheck_before(cachep, flags);
local_irq_save(save_flags);
objp = __do_cache_alloc(cachep, flags); // do the allocation
local_irq_restore(save_flags);
objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
prefetchw(objp);
if (unlikely((flags & __GFP_ZERO) && objp))
memset(objp, 0, obj_size(cachep));
return objp;
}
static __always_inline void *
__do_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
{
return ____cache_alloc(cachep, flags);
}
To speed up the allocation of cache objects, the slab allocator builds a per-CPU object stack for each cache, so available objects can be taken from and returned to the stack quickly. The stacks are maintained with the array member of struct kmem_cache: each member of that array corresponds to one CPU's stack, a struct array_cache.

cache_cache, for its part, is the cache that manages the struct kmem_cache structures of all the other caches.
static inline void *____cache_alloc(struct kmem_cache *cachep, gfp_t flags)
{
void *objp;
struct array_cache *ac;
check_irq_off();
ac = cpu_cache_get(cachep); // get this CPU's cache stack (array_cache)
if (likely(ac->avail)) {
STATS_INC_ALLOCHIT(cachep);
ac->touched = 1;
objp = ac->entry[--ac->avail];
} else {
STATS_INC_ALLOCMISS(cachep);
objp = cache_alloc_refill(cachep, flags);
}
return objp;
}
At this stage the slab allocator can already hand out struct kmem_cache objects. Next, it uses the length of struct array_cache to locate the general cache matching that length; that cache can then provide the struct array_cache objects the allocator uses as local caches. Only one CPU is running at this stage.
The slab allocator next creates the caches corresponding to the system's general caches and, if the CONFIG_ZONE_DMA macro is enabled, DMA variants of the general caches as well.
while (sizes->cs_size != ULONG_MAX) {
/*
* For performance, all the general caches are L1 aligned.
* This should be particularly beneficial on SMP boxes, as it
* eliminates "false sharing".
* Note for systems short on memory removing the alignment will
* allow tighter packing of the smaller caches.
*/
if (!sizes->cs_cachep) {
sizes->cs_cachep = kmem_cache_create(names->name,
sizes->cs_size,
ARCH_KMALLOC_MINALIGN,
ARCH_KMALLOC_FLAGS|SLAB_PANIC,
NULL);
}
#ifdef CONFIG_ZONE_DMA
sizes->cs_dmacachep = kmem_cache_create(
names->name_dma,
sizes->cs_size,
ARCH_KMALLOC_MINALIGN,
ARCH_KMALLOC_FLAGS|SLAB_CACHE_DMA|
SLAB_PANIC,
NULL);
#endif
sizes++;
names++;
}
PARTIAL_AC

Since cache_cache's local cache has so far been maintained with the static data initarray_cache, and the struct array_cache general cache is now usable, memory is allocated for the ptr pointer with kmalloc():
/* 4) Replace the bootstrap head arrays */
{
struct array_cache *ptr;
ptr = kmalloc(sizeof(struct arraycache_init), GFP_KERNEL);
Migrate cache_cache's local-cache data into the memory at ptr, initialize ptr's own data, and then point cache_cache's local cache at ptr:
memcpy(ptr, cpu_cache_get(&cache_cache),
sizeof(struct arraycache_init));
/*
* Do not assume that spinlocks can be initialized via memcpy:
*/
spin_lock_init(&ptr->lock);
cache_cache.array[smp_processor_id()] = ptr;
local_irq_enable();
kmalloc() is called again so that ptr points to newly allocated memory, this time replacing the static local cache of the general array_cache cache:
ptr = kmalloc(sizeof(struct arraycache_init), GFP_KERNEL);
local_irq_disable();
BUG_ON(cpu_cache_get(malloc_sizes[INDEX_AC].cs_cachep)
!= &initarray_generic.cache);
memcpy(ptr, cpu_cache_get(malloc_sizes[INDEX_AC].cs_cachep),
sizeof(struct arraycache_init));
malloc_sizes[INDEX_AC].cs_cachep->array[smp_processor_id()] =
ptr;
PARTIAL_L3

At this point the slab allocator is ready to migrate the static slab list data into memory provided by slab itself.
/* 5) Replace the bootstrap kmem_list3's */
{
int nid;
for_each_online_node(nid) {
// iterate over all online nodes
init_list(&cache_cache, &initkmem_list3[CACHE_CACHE + nid], nid);
init_list(malloc_sizes[INDEX_AC].cs_cachep,
&initkmem_list3[SIZE_AC + nid], nid);
if (INDEX_AC != INDEX_L3) {
init_list(malloc_sizes[INDEX_L3].cs_cachep,
&initkmem_list3[SIZE_L3 + nid], nid);
}
}
}
On each pass, the function replaces the memory occupied by the static struct kmem_list3 slab lists of cache_cache and of the struct array_cache / struct kmem_list3 general caches with slab-allocated memory.
init_list

init_list creates a new struct kmem_list3, copies the original kmem_list3's slab list data into the new one, and points the cache's slab lists at the new kmem_list3.
cachep points to the cache, list points to the slab lists, and nodeid indicates the node.
/*
* swap the static kmem_list3 with kmalloced memory
*/
static void init_list(struct kmem_cache *cachep, struct kmem_list3 *list,
int nodeid)
{
struct kmem_list3 *ptr;
// allocate memory on the specified node
ptr = kmalloc_node(sizeof(struct kmem_list3), GFP_KERNEL, nodeid);
BUG_ON(!ptr);
local_irq_disable();
memcpy(ptr, list, sizeof(struct kmem_list3));
/*
* Do not assume that spinlocks can be initialized via memcpy:
*/
spin_lock_init(&ptr->list_lock);
// migrate all the data on the slab lists to the lists in ptr
MAKE_ALL_LISTS(cachep, ptr, nodeid);
cachep->nodelists[nodeid] = ptr;
local_irq_enable();
}
EARLY

From this stage on, the slab allocator allocates all of its own data through the slab allocator itself and no longer uses static data.
FULL

The slab allocator is now fully operational:
/* Done! */
g_cpucache_up = FULL;
SUPPLEMENT
An object is a cached object (i.e. a memory area). Each slab list manages many cached objects, and the void *entry[] of struct array_cache can also point at cached objects directly, so those memory areas can be handed out.
struct slab {
struct list_head list; /* which kmem_list3 list this slab is on */
unsigned long colouroff; /* colouring offset of the first object */
void *s_mem; /* including colour offset */
unsigned int inuse; /* num of objs active in slab */
kmem_bufctl_t free; /* index of the first free object */
unsigned short nodeid; /* node the slab's pages came from */
};
The cache allocation process (a sketch of the chain follows the list):
- the general order of allocation is: local cache -> shared cache -> slab lists -> buddy system
- when neither the cache's local cache nor its shared cache has an available object, the cache looks in the slabs_partial and slabs_free lists and refills the local cache from there
- if the number of objects held in the local cache exceeds its limit, the local cache releases objects back to the shared cache
- if the number of objects held in the shared cache exceeds its limit, the shared cache releases them back to the slab lists
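As a sketch of that ordering (all helper names below are hypothetical, not kernel functions):

/* pseudocode for the allocation fallback chain */
void *alloc_object(struct kmem_cache *cachep)
{
	void *obj;

	if ((obj = pop_local(cachep)))      /* 1) per-CPU local cache      */
		return obj;
	if (refill_from_shared(cachep))     /* 2) node-level shared cache  */
		return pop_local(cachep);
	if (refill_from_slab_lists(cachep)) /* 3) slabs_partial/slabs_free */
		return pop_local(cachep);
	if (cache_grow_from_buddy(cachep))  /* 4) new slab from buddy      */
		return pop_local(cachep);
	return NULL;                        /* out of memory               */
}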
Cache classification
- Normal (general-purpose) caches do not target specific kernel objects. The first of them is the cache for the kmem_cache structure itself, saved in cache_cache (this variable is the first element of the cache_chain list). The rest are the kmalloc size caches: each slab manages objects of one consistent size, so a request for that many bytes goes directly to the slab of the fixed size and allocates from it.
- Dedicated caches are created for specific objects according to the kernel's needs.
The biggest difference between them is that the general caches already have memory allocated: kmalloc() obtains it directly, and kfree() does not really release it. A dedicated cache instead requires steps such as finding the location, allocating memory from the buddy system, and building the slab.
Now that we have general-purpose caches, why do we need dedicated ones? When some data structure in your code is allocated and released very frequently and performance matters, consider creating a dedicated cache for it (a minimal sketch follows). For memory areas expected to be used frequently, creating a set of dedicated buffers of that specific size avoids memory fragmentation; for less-used memory areas, a power-of-2 general buffer is enough, and even if that mode produces some fragmentation, the impact on overall system performance is small.
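A minimal sketch of creating and using a dedicated cache from a module (struct my_item and all names here are hypothetical; error handling trimmed; the five-argument kmem_cache_create prototype matches the 2.6-era code quoted in this article):

#include <linux/module.h>
#include <linux/slab.h>

struct my_item {          /* hypothetical frequently-allocated object */
	int id;
	char payload[120];
};

static struct kmem_cache *my_cachep;

static int __init my_init(void)
{
	struct my_item *it;

	my_cachep = kmem_cache_create("my_item_cache", sizeof(struct my_item),
				      0, SLAB_HWCACHE_ALIGN, NULL);
	if (!my_cachep)
		return -ENOMEM;

	it = kmem_cache_alloc(my_cachep, GFP_KERNEL); /* from the local cache */
	if (it) {
		it->id = 1;
		kmem_cache_free(my_cachep, it); /* back to the local cache */
	}
	return 0;
}

static void __exit my_exit(void)
{
	kmem_cache_destroy(my_cachep);
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");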