Linux memory management: principles of memory allocation and memory recycling


Articles in this series:

Linux memory management: Bootmem takes the lead (the Bootmem boot-time memory allocator)
Linux memory management: the Buddy System is long overdue (the buddy system memory allocator)
Linux memory management: Slab makes its debut (the Slab memory allocator)
Linux memory management: principles of memory allocation and memory recycling (this article)

This is the fourth article in the source code analysis column

It is mainly divided into four major modules for analysis: memory management, device management, system startup and other parts.

Memory management is divided into three parts: Bootmem, the Buddy System, and Slab. Of course, beyond memory initialization there must also be memory allocation and memory reclaim.

Some TODO items will be added later.


Reclaim Memory

Basic Concept

When system memory pressure is high, Linux reclaims memory from each zone that is under pressure.

Memory recycling is mainly performed on anonymous pages and file pages.

  • For anonymous pages, infrequently used pages are filtered out during reclaim, written to the swap partition, and then released to the buddy system as free page frames.
  • For file pages, if a page is clean there is nothing to write back and it is released directly to the buddy system; a dirty page is first written back to disk and then released to the buddy system.

This has a drawback, though: it generates a lot of I/O pressure. Therefore the system sets watermarks for each zone; memory reclaim is performed only when the number of free page frames falls below a watermark, otherwise no reclaim takes place.

Memory reclaim is performed per zone. Each zone generally has three watermarks:

  • watermark[WMARK_MIN]: the minimum watermark, used by the slow allocation path after fast allocation fails. If even slow allocation cannot satisfy the request at this level, direct memory reclaim and fast memory reclaim are performed.
  • watermark[WMARK_LOW]: the low watermark, the default threshold for fast allocation. If a zone's number of free pages falls below this threshold during allocation, fast memory reclaim is performed on that zone.
  • watermark[WMARK_HIGH]: the high watermark, a number of free pages the zone is considered comfortable with. When a zone performs memory reclaim, the goal is generally to bring its free page count back up to this value.
liuzixuan@liuzixuan-ubuntu ~ # cat /proc/zoneinfo
Node 0, zone   Normal
  pages free     5179
        min      4189
        low      5236
        high     6283
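
The low and high watermarks are derived from min (which in turn comes from min_free_kbytes). In kernels of this generation, setup_per_zone_pages_min() sets them to roughly min + min/4 and min + min/2; below is a tiny user-space sketch of that arithmetic, checked against the zoneinfo output above (the exact formula can differ between kernel versions):

#include <stdio.h>

int main(void)
{
	unsigned long pages_min  = 4189;                       /* min from /proc/zoneinfo above */
	unsigned long pages_low  = pages_min + pages_min / 4;  /* roughly min * 1.25 */
	unsigned long pages_high = pages_min + pages_min / 2;  /* roughly min * 1.5  */

	printf("min=%lu low=%lu high=%lu\n", pages_min, pages_low, pages_high);
	/* prints: min=4189 low=5236 high=6283 -- matching the zoneinfo output */
	return 0;
}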

Zone memory reclaim mainly targets three things: slab caches, pages on the lru lists, and buffer_head objects. The lru lists manage the memory pages used in process address space and cover three kinds of pages: anonymous pages, file pages, and shmem pages.

The prerequisite for a memory page to be reclaimable is page->_count == 0.
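
In other words, a page frame can go back to the buddy system only after its last reference is dropped. A sketch of the idiom, essentially what __free_pages() boils down to in this kernel generation (the wrapper name here is made up; free_hot_page() was later replaced by free_hot_cold_page()):

/* hypothetical wrapper illustrating the reference-count check */
void release_one_page(struct page *page, unsigned int order)
{
	if (put_page_testzero(page)) {        /* drops page->_count; true when it reaches 0 */
		if (order == 0)
			free_hot_page(page);          /* single page: goes via the per-CPU hot list */
		else
			__free_pages_ok(page, order); /* higher order: straight back to the buddy lists */
	}
}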

Memory Allocator

Memory allocation with alloc_page and alloc_pages generally goes through __alloc_pages_nodemask -> __alloc_pages_internal.

static inline struct page *
__alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
		struct zonelist *zonelist, nodemask_t *nodemask)
{
    
    
	return __alloc_pages_internal(gfp_mask, order, zonelist, nodemask);
}

__alloc_pages_internal generally first performs a fast allocation via get_page_from_freelist using the low watermark, and then, if that fails, a slow allocation using the min watermark.
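
For reference, callers normally do not invoke __alloc_pages_nodemask directly but go through wrappers such as alloc_pages()/__get_free_pages(). A minimal kernel-context sketch (error handling trimmed):

	/* allocate 4 contiguous page frames (order 2), use them, free them */
	struct page *pages = alloc_pages(GFP_KERNEL, 2);   /* ends up in __alloc_pages_nodemask */
	if (pages) {
		void *addr = page_address(pages);              /* kernel virtual address (lowmem) */
		memset(addr, 0, 4 * PAGE_SIZE);
		__free_pages(pages, 2);                        /* the order must match the allocation */
	}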

struct page *
__alloc_pages_internal(gfp_t gfp_mask, unsigned int order,
			struct zonelist *zonelist, nodemask_t *nodemask)
{
    
    
    // ...
	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
			zonelist, high_zoneidx, ALLOC_WMARK_LOW|ALLOC_CPUSET);
	if (page)
		goto got_pg;

Fast memory allocation

The fast allocation path calls get_page_from_freelist to find a suitable zone in the zonelist using the low watermark. If a zone does not reach the low watermark, fast memory reclaim is performed on it, and the allocation is attempted again afterwards.

  • gfp_mask: the gfp mask used for the allocation request
  • order: the order of the physical memory request
  • zonelist: the node's zonelist, i.e. its array of zones
  • alloc_flags: the allocation flags after conversion
  • high_zoneidx: the highest zone the allocation is allowed to use

alloc_flags is the set of flags used internally by the buddy allocator when allocating memory; it determines some allocation behaviors:

/* The ALLOC_WMARK bits are used as an index to zone->watermark */
#define ALLOC_WMARK_MIN		WMARK_MIN
#define ALLOC_WMARK_LOW		WMARK_LOW
#define ALLOC_WMARK_HIGH	WMARK_HIGH
#define ALLOC_NO_WATERMARKS	0x04 /* don't check watermarks at all */
 
/* Mask to get the watermark bits */
#define ALLOC_WMARK_MASK	(ALLOC_NO_WATERMARKS-1)
 
/*
 * Only MMU archs have async oom victim reclaim - aka oom_reaper so we
 * cannot assume a reduced access to memory reserves is sufficient for
 * !MMU
 */
#ifdef CONFIG_MMU
#define ALLOC_OOM		0x08
#else
#define ALLOC_OOM		ALLOC_NO_WATERMARKS
#endif
 
#define ALLOC_HARDER		 0x10 /* try to alloc harder */
#define ALLOC_HIGH		 0x20 /* __GFP_HIGH set */
#define ALLOC_CPUSET		 0x40 /* check for correct cpuset */
#define ALLOC_CMA		 0x80 /* allow allocations from CMA areas */
#ifdef CONFIG_ZONE_DMA32
#define ALLOC_NOFRAGMENT	0x100 /* avoid mixing pageblock types */
#else
#define ALLOC_NOFRAGMENT	  0x0
#endif
#define ALLOC_KSWAPD		0x800 /* allow waking of kswapd, __GFP_KSWAPD_RECLAIM set */

The flags mean the following:

  • ALLOC_WMARK_XXX: which watermark to check when allocating memory
  • ALLOC_NO_WATERMARKS: do not check any watermark when allocating
  • ALLOC_OOM: set when memory is insufficient and the OOM path is involved, allowing deeper access to reserves
  • ALLOC_HARDER: try harder to allocate; whether the MIGRATE_HIGHATOMIC reserve may be used
  • ALLOC_HIGH: same effect as __GFP_HIGH
  • ALLOC_CPUSET: whether cpuset checks control the allocation
  • ALLOC_CMA: allows allocating memory from CMA areas
  • ALLOC_NOFRAGMENT: if set, use the no-fallback policy when memory is insufficient: avoid mixing pageblock types, i.e. avoid creating external fragmentation
  • ALLOC_KSWAPD: allows waking kswapd when memory is insufficient
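
Note that ALLOC_WMARK_MIN/LOW/HIGH are not independent bits but a small index packed into the low bits of alloc_flags, and ALLOC_WMARK_MASK extracts it (newer kernels index zone->watermark[] with it directly, while the older code below selects pages_min/pages_low/pages_high with an if/else chain). A small user-space demo of the bit arithmetic, reusing the values from the header above:

#include <stdio.h>

enum { WMARK_MIN, WMARK_LOW, WMARK_HIGH };     /* 0, 1, 2 as in the kernel */
#define ALLOC_WMARK_MIN     WMARK_MIN
#define ALLOC_WMARK_LOW     WMARK_LOW
#define ALLOC_WMARK_HIGH    WMARK_HIGH
#define ALLOC_NO_WATERMARKS 0x04
#define ALLOC_WMARK_MASK    (ALLOC_NO_WATERMARKS - 1)
#define ALLOC_CPUSET        0x40

int main(void)
{
	int alloc_flags = ALLOC_WMARK_LOW | ALLOC_CPUSET;   /* the flags fast allocation uses */

	printf("watermark index = %d\n", alloc_flags & ALLOC_WMARK_MASK);   /* -> 1, i.e. WMARK_LOW */
	return 0;
}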

get_page_from_freelist is the buddy allocator's first attempt at satisfying the request. The core idea is that, while memory is sufficient, the physical pages are taken from the freelist of the requested order in the appropriate zone.

get_page_from_freelist

/*
 * get_page_from_freelist goes through the zonelist trying to allocate
 * a page.
 */
static struct page *
get_page_from_freelist(gfp_t gfp_mask, nodemask_t *nodemask, unsigned int order,
		struct zonelist *zonelist, int high_zoneidx, int alloc_flags)
{
    
    
	struct zoneref *z;
	struct page *page = NULL;
	int classzone_idx;
	struct zone *zone, *preferred_zone;
	nodemask_t *allowednodes = NULL;/* zonelist_cache approximation */
	int zlc_active = 0;		/* set if using zonelist_cache */
	int did_zlc_setup = 0;		/* just call zlc_setup() one time */

Obtain the zone index (classzone_idx):

	classzone_idx = zone_idx(preferred_zone);

Now look at the zonelist_scan label: it walks the zonelist looking for a zone with enough free pages.

zonelist_scan:
	/*
	 * Scan zonelist, looking for a zone with enough free.
	 * See also cpuset_zone_allowed() comment in kernel/cpuset.c.
	 */
	for_each_zone_zonelist_nodemask(zone, z, zonelist,
						high_zoneidx, nodemask) {
    
    
		if (NUMA_BUILD && zlc_active &&
			!zlc_zone_worth_trying(zonelist, z, allowednodes))
				continue;
		if ((alloc_flags & ALLOC_CPUSET) &&
			!cpuset_zone_allowed_softwall(zone, gfp_mask))
				goto try_next_zone;

		if (!(alloc_flags & ALLOC_NO_WATERMARKS)) {
    
    
			unsigned long mark;
			int ret;
			if (alloc_flags & ALLOC_WMARK_MIN)
				mark = zone->pages_min;
			else if (alloc_flags & ALLOC_WMARK_LOW)
				mark = zone->pages_low;
			else
				mark = zone->pages_high;

			if (zone_watermark_ok(zone, order, mark,
				    classzone_idx, alloc_flags))
				goto try_this_zone;

			if (zone_reclaim_mode == 0)
				goto this_zone_full;

			ret = zone_reclaim(zone, gfp_mask, order);
			switch (ret) {
    
    
			case ZONE_RECLAIM_NOSCAN:
				/* did not scan */
				goto try_next_zone;
			case ZONE_RECLAIM_FULL:
				/* scanned but unreclaimable */
				goto this_zone_full;
			default:
				/* did we reclaim enough */
				if (!zone_watermark_ok(zone, order, mark,
						classzone_idx, alloc_flags))
					goto this_zone_full;
			}
		}

The for_each_zone_zonelist_nodemask macro expands to:

for (z = first_zones_zonelist(zonelist, high_zoneidx, nodemask, &zone);	/* first suitable zone */
	zone;
	z = next_zones_zonelist(++z, high_zoneidx, nodemask, &zone))	/* get the next zone in the zonelist */
{
	/* loop body */
}
static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist,
					enum zone_type highest_zoneidx,
					nodemask_t *nodes,
					struct zone **zone)
{
    
    
	return next_zones_zonelist(zonelist->_zonerefs, highest_zoneidx, nodes,
								zone);
}

struct zoneref *next_zones_zonelist(struct zoneref *z,
					enum zone_type highest_zoneidx,
					nodemask_t *nodes,
					struct zone **zone)
{
    
    
	/*
	 * Find the next suitable zone to use for the allocation.
	 * Only filter based on nodemask if it's set
	 */
	if (likely(nodes == NULL))
		while (zonelist_zone_idx(z) > highest_zoneidx)
			z++;
	else
		while (zonelist_zone_idx(z) > highest_zoneidx ||
				(z->zone && !zref_in_nodemask(z, nodes)))
			z++;

	*zone = zonelist_zone(z); // get the zone referenced by this zoneref
	return z;
}
		if (NUMA_BUILD && zlc_active &&
            // the node that z->zone belongs to does not allow allocation, or the zone is already full
			!zlc_zone_worth_trying(zonelist, z, allowednodes)) 
				continue;
		if ((alloc_flags & ALLOC_CPUSET) &&
            // cpuset checking is enabled and this zone is not allowed to allocate memory for the current cpuset
			!cpuset_zone_allowed_softwall(zone, gfp_mask))
				goto try_next_zone;

zlc_zone_worth_trying checks whether the node allows allocation and whether the zone is already full.

static int zlc_zone_worth_trying(struct zonelist *zonelist, struct zoneref *z,
						nodemask_t *allowednodes)
{
    
    
	struct zonelist_cache *zlc;	/* cached zonelist speedup info */
	int i;				/* index of *z in zonelist zones */
	int n;				/* node that zone *z is on */

	zlc = zonelist->zlcache_ptr; // get the zonelist_cache pointer
	if (!zlc)
		return 1;

	i = z - zonelist->_zonerefs; // index of this zoneref in the _zonerefs array
	n = zlc->z_to_n[i];

	/* This zone is worth trying if it is allowed but not full */
	return node_isset(n, *allowednodes) && !test_bit(i, zlc->fullzones);
}
struct zonelist_cache {
    
    
	unsigned short z_to_n[MAX_ZONES_PER_ZONELIST];		/* zone->nid */
	DECLARE_BITMAP(fullzones, MAX_ZONES_PER_ZONELIST);	/* zone full? */
	unsigned long last_full_zap;		/* when last zap'd (jiffies) */
};

The two functions doing the actual test are node_isset and test_bit:

#define node_isset(node, nodemask) test_bit((node), (nodemask).bits)

static inline int test_bit(int nr, const volatile unsigned long *addr)
{
    
    
	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
}
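
The bit arithmetic is simply "which long word, which bit inside that word". A user-space sketch of the same extraction:

#include <stdio.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* same logic as the generic test_bit(): pick the word, shift, mask */
static int my_test_bit(int nr, const unsigned long *addr)
{
	return 1UL & (addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG));
}

int main(void)
{
	unsigned long nodemask[2] = { 0 };

	nodemask[0] |= 1UL << 3;                                /* pretend node 3 is allowed */
	printf("node 3 set? %d\n", my_test_bit(3, nodemask));   /* 1 */
	printf("node 5 set? %d\n", my_test_bit(5, nodemask));   /* 0 */
	return 0;
}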

The cpuset_zone_allowed_softwall function works similarly and is not summarized here.

Get the watermark for fast memory allocation

		if (!(alloc_flags & ALLOC_NO_WATERMARKS)) {
    
    
			unsigned long mark;
			int ret;
			if (alloc_flags & ALLOC_WMARK_MIN)
				mark = zone->pages_min; // use the min watermark
			else if (alloc_flags & ALLOC_WMARK_LOW)
				mark = zone->pages_low; // use the low watermark
			else
				mark = zone->pages_high; // use the high watermark
zone_watermark_ok

Check whether the zone has enough pages to allocate:

			if (zone_watermark_ok(zone, order, mark,
				    classzone_idx, alloc_flags))
				goto try_this_zone;

Check the watermark; mark may be any of min, low, or high. From this function we can see that an allocation of 2^order pages must satisfy the following conditions (a worked example follows the function below):

  • Besides the page frames being allocated, the zone must still have at least min free page frames (plus the lowmem_reserve).
  • Besides the page frames being allocated, for each order k from 1 up to the requested order, the free pages sitting in blocks of order k and above must total at least min/2^k.
/*
 * Return 1 if free pages are above 'mark'. This takes into account the order
 * of the allocation.
 */
int zone_watermark_ok(struct zone *z, int order, unsigned long mark,
		      int classzone_idx, int alloc_flags)
{
    
    
	/* free_pages my go negative - that's OK */
	long min = mark;
    // number of free pages, from vm_stat[NR_FREE_PAGES]
    // minus the pages about to be allocated, (1 << order)
	long free_pages = zone_page_state(z, NR_FREE_PAGES) - (1 << order) + 1;
	int o;

	if (alloc_flags & ALLOC_HIGH)
		min -= min / 2;
	if (alloc_flags & ALLOC_HARDER)
		min -= min / 4;
	// lowmem_reserve is the number of pages to keep in reserve
	if (free_pages <= min + z->lowmem_reserve[classzone_idx]) 
		return 0;
	// excluding the pages being allocated, the free pages from order k up to order 10 must total at least min/(2^k)
	for (o = 0; o < order; o++) {
    
    
		/* At the next order, this order's pages become unavailable */
		free_pages -= z->free_area[o].nr_free << o;

		/* Require fewer higher order pages to be free */
		min >>= 1;

		if (free_pages <= min)
			return 0;
	}
	return 1;
}
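
A worked example of this check, re-implemented in user space with hypothetical numbers (an order-2 request against the Normal zone shown earlier; the per-order free_area counts are made up, and the ALLOC_HIGH/ALLOC_HARDER adjustments and lowmem_reserve are omitted):

#include <stdio.h>

#define MAX_ORDER 11

/* simplified zone_watermark_ok(): same arithmetic as the function above */
static int watermark_ok(long free_pages, int order, long min,
			const long nr_free[MAX_ORDER])
{
	int o;

	free_pages -= (1 << order) - 1;        /* pages we are about to take */
	if (free_pages <= min)
		return 0;
	for (o = 0; o < order; o++) {
		free_pages -= nr_free[o] << o;     /* blocks of this order are too small to serve us */
		min >>= 1;                         /* only min/2^(o+1) is required among the higher orders */
		if (free_pages <= min)
			return 0;
	}
	return 1;
}

int main(void)
{
	/* a hypothetical, badly fragmented split of roughly 5179 free pages */
	long nr_free[MAX_ORDER] = { 3000, 800, 100, 20, 5, 1, 0, 0, 0, 0, 0 };
	long free_pages = 5179, min = 4189;

	printf("order-0 ok? %d\n", watermark_ok(free_pages, 0, min, nr_free));   /* 1 */
	printf("order-2 ok? %d\n", watermark_ok(free_pages, 2, min, nr_free));   /* 0: too little memory in order >= 2 blocks */
	return 0;
}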

If zone_watermark_ok passes the watermark check, we jump to try_this_zone and allocate the page frames there, i.e. the buffered_rmqueue function.

buffered_rmqueue

This is the buddy system's core function for allocating pages from a designated zone.

static struct page *buffered_rmqueue(struct zone *preferred_zone,
			struct zone *zone, int order, gfp_t gfp_flags)
{
    
    
	unsigned long flags;
	struct page *page;
	int cold = !!(gfp_flags & __GFP_COLD);
	int cpu;
	int migratetype = allocflags_to_migratetype(gfp_flags); // convert the gfp flags into a migrate type

again:
	cpu  = get_cpu();
	if (likely(order == 0)) {
    
      // a single page: allocate from the per-CPU pcp list (hot/cold pages)
		struct per_cpu_pages *pcp;
		// get this CPU's per-CPU page cache for the zone
		pcp = &zone_pcp(zone, cpu)->pcp;
		local_irq_save(flags);
        // the list is empty, most likely because the migrate type cached last time differs from this request
		if (!pcp->count) {
    
    
            // take pages from the buddy system and refill the per-CPU cache
			pcp->count = rmqueue_bulk(zone, 0,
					pcp->batch, &pcp->list, migratetype);
            // if the list is still empty, the buddy system has no pages left either: allocation fails
			if (unlikely(!pcp->count))
				goto failed;
		}
		
		/* Find a page of the appropriate migrate type */
        // if the page does not need to be cache-hot (hardware cache, not the per-CPU page cache), take the last node of the list and return it
		if (cold) {
    
    
			list_for_each_entry_reverse(page, &pcp->list, lru)
				if (page_private(page) == migratetype)
					break;
		} else {
    
     // if cache hotness matters, take the first page of the list: it was released to the per-CPU cache most recently, so it is hotter
			list_for_each_entry(page, &pcp->list, lru)
				if (page_private(page) == migratetype)
					break;
		}

		/* Allocate more to the pcp list if necessary */
		if (unlikely(&page->lru == &pcp->list)) {
    
    
			pcp->count += rmqueue_bulk(zone, 0,
					pcp->batch, &pcp->list, migratetype);
			page = list_entry(pcp->list.next, struct page, lru);
		}
		// remove the page from the per-CPU cache list and decrement the per-CPU count
		list_del(&page->lru);
		pcp->count--;
    // multiple pages requested: skip the per-CPU page cache and allocate straight from the buddy system
	} else {
    
     // allocate from the free list of the requested migratetype
		spin_lock_irqsave(&zone->lock, flags); // disable interrupts and take the zone lock
        
		page = __rmqueue(zone, order, migratetype);
		spin_unlock(&zone->lock); // release the lock first; interrupts stay off until the statistics below are updated
		if (!page)
			goto failed;
	}
	// event statistics, used for debugging
	__count_zone_vm_events(PGALLOC, zone, 1 << order);
	zone_statistics(preferred_zone, zone);
	local_irq_restore(flags); // restore interrupts
	put_cpu();

	VM_BUG_ON(bad_range(zone, page));
	if (prep_new_page(page, order, gfp_flags))
		goto again;
	return page;

failed:
	local_irq_restore(flags);
	put_cpu();
	return NULL;
}

The function __rmqueueis divided into two situations:

  • Quick allocation, __rmqueue_smallest: allocate directly from the free list of the specified migrate type
  • Slow allocation, __rmqueue_fallback: when the specified migrate type's list does not have enough memory, fall back to the backup lists
/*
 * Do the hard work of removing an element from the buddy allocator.
 * Call me with the zone->lock already held.
 */
static struct page *__rmqueue(struct zone *zone, unsigned int order,
						int migratetype)
{
    
    
	struct page *page;
	// fast path
	page = __rmqueue_smallest(zone, order, migratetype);

	if (unlikely(!page))
		page = __rmqueue_fallback(zone, order, migratetype);

	return page;
}
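
For reference, __rmqueue_smallest essentially walks upward from the requested order until it finds a non-empty free list of the wanted migrate type, removes one block, and expand() gives the unused halves back as smaller buddies. A simplified user-space model of that search-and-split logic (not the kernel code; the free lists are reduced to per-order counters):

#include <stdio.h>

#define MAX_ORDER 11

/* toy model of __rmqueue_smallest() + expand() */
static int rmqueue_smallest(long nr_free[MAX_ORDER], unsigned int order)
{
	unsigned int current_order;

	for (current_order = order; current_order < MAX_ORDER; current_order++) {
		if (nr_free[current_order] == 0)
			continue;                      /* nothing of this size, try a bigger block */
		nr_free[current_order]--;          /* take one block of this order */
		while (current_order > order) {    /* expand(): split the remainder back down */
			current_order--;
			nr_free[current_order]++;      /* one buddy of each smaller order is freed */
		}
		return 0;                          /* success: a block of the requested order was handed out */
	}
	return -1;                             /* no block large enough */
}

int main(void)
{
	long nr_free[MAX_ORDER] = { 0, 0, 0, 0, 1 };   /* only a single order-4 block is free */

	if (rmqueue_smallest(nr_free, 1) == 0)         /* ask for an order-1 block */
		printf("left over: order1=%ld order2=%ld order3=%ld\n",
		       nr_free[1], nr_free[2], nr_free[3]);   /* 1 1 1 */
	return 0;
}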

That completes the successful path. If the zone_watermark_ok check fails instead, the function continues further down.


Reaching this point means pages should be reclaimed from the zone

That is, when the watermark check fails there are not enough free pages, so a certain amount of memory reclaim is done here.

			ret = zone_reclaim(zone, gfp_mask, order);

zone_reclaim performs some page reclaim on the zone.

It reports success only when at least 2^order page frames have been reclaimed; even if some page frames were reclaimed but the target was not reached, it reports failure.

int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
{
    
    
	int node_id;
	int ret;
	// both are below their configured minimums
	if (zone_pagecache_reclaimable(zone) <= zone->min_unmapped_pages &&
	    zone_page_state(zone, NR_SLAB_RECLAIMABLE) <= zone->min_slab_pages)
		return ZONE_RECLAIM_FULL;

	if (zone_is_all_unreclaimable(zone)) // the zone is flagged as unreclaimable
		return ZONE_RECLAIM_FULL;
	// if __GFP_WAIT is not set (wait == 0), do not go any further
    // if PF_MEMALLOC is set, the caller is itself the memory reclaim path, so do not recurse
	if (!(gfp_mask & __GFP_WAIT) || (current->flags & PF_MEMALLOC))
		return ZONE_RECLAIM_NOSCAN;

	node_id = zone_to_nid(zone); // node id of this zone
    // the zone's node has CPUs but is not the node we are currently running on
	if (node_state(node_id, N_CPU) && node_id != numa_node_id())
		return ZONE_RECLAIM_NOSCAN;
	// another task is already reclaiming this zone
	if (zone_test_and_set_flag(zone, ZONE_RECLAIM_LOCKED))
		return ZONE_RECLAIM_NOSCAN;

	ret = __zone_reclaim(zone, gfp_mask, order); // reclaim pages from this zone
	zone_clear_flag(zone, ZONE_RECLAIM_LOCKED);  // release the reclaim lock

	if (!ret)
		count_vm_event(PGSCAN_ZONE_RECLAIM_FAILED);

	return ret;
}

Note PF_MEMALLOC and __GFP_WAIT here: PF_MEMALLOC is a process flag bit, and subsystems other than memory management generally should not use it.

Continue down into __zone_reclaim:

static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
{
    
    
	/* Minimum pages needed in order to stay on node */
	const unsigned long nr_pages = 1 << order;
	struct task_struct *p = current;
	struct reclaim_state reclaim_state;
	int priority;
	struct scan_control sc = {
    
     // scan control parameters
		.may_writepage = !!(zone_reclaim_mode & RECLAIM_WRITE),
		.may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP),
		.may_swap = 1,
		.swap_cluster_max = max_t(unsigned long, nr_pages,
					SWAP_CLUSTER_MAX),
		.gfp_mask = gfp_mask,
		.swappiness = vm_swappiness,
		.order = order,
		.isolate_pages = isolate_pages_global,
	};
	unsigned long slab_reclaimable;

	disable_swap_token();
	cond_resched();
	/*
	 * We need to be able to allocate from the reserves for RECLAIM_SWAP
	 * and we also need to be able to write out pages for RECLAIM_WRITE
	 * and RECLAIM_SWAP.
	 */
	p->flags |= PF_MEMALLOC | PF_SWAPWRITE;
	reclaim_state.reclaimed_slab = 0;
	p->reclaim_state = &reclaim_state;

	if (zone_pagecache_reclaimable(zone) > zone->min_unmapped_pages) {
    
    
		priority = ZONE_RECLAIM_PRIORITY;
		do {
    
    
			note_zone_scanning_priority(zone, priority);
			shrink_zone(priority, zone, &sc); // reclaim memory
			priority--;
		} while (priority >= 0 && sc.nr_reclaimed < nr_pages);
	}

	slab_reclaimable = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
	if (slab_reclaimable > zone->min_slab_pages) {
    
    
		while (shrink_slab(sc.nr_scanned, gfp_mask, order) &&
			zone_page_state(zone, NR_SLAB_RECLAIMABLE) >
				slab_reclaimable - nr_pages)
			;
		sc.nr_reclaimed += slab_reclaimable -
			zone_page_state(zone, NR_SLAB_RECLAIMABLE);
	}

	p->reclaim_state = NULL;
	current->flags &= ~(PF_MEMALLOC | PF_SWAPWRITE);
	return sc.nr_reclaimed >= nr_pages;
}

shrink_zone and shrink_slab will be explained later.

			switch (ret) {
    
    
			case ZONE_RECLAIM_NOSCAN:
				/* did not scan */
				goto try_next_zone;
			case ZONE_RECLAIM_FULL:
				/* scanned but unreclaimable */
				goto this_zone_full;
			default:
				/* did we reclaim enough */
				if (!zone_watermark_ok(zone, order, mark, 
						classzone_idx, alloc_flags))
					goto this_zone_full;
			}

If all goes well, memory allocation proceeds:

try_this_zone:
		page = buffered_rmqueue(preferred_zone, zone, order, gfp_mask);
		if (page)
			break;

If the zone turns out to be full:

this_zone_full:
		if (NUMA_BUILD)
			zlc_mark_zone_full(zonelist, z);

Continue with the next zone:

try_next_zone:
		if (NUMA_BUILD && !did_zlc_setup) {
    
    
			/* we do zlc_setup after the first zone is tried */
			allowednodes = zlc_setup(zonelist, alloc_flags);
			zlc_active = 1;
			did_zlc_setup = 1;
		}
	}

And loop one more time if needed:

	if (unlikely(NUMA_BUILD && page == NULL && zlc_active)) {
    
    
		/* Disable zlc cache for second zonelist scan */
		zlc_active = 0;
		goto zonelist_scan;
	}

Slow memory allocation

Slow memory allocation: if fast allocation fails, i.e. no zone in the zonelist could satisfy the request, the min watermark is used for slow allocation, which may involve:

  • Asynchronous memory compaction
  • Direct memory reclaim
  • Light synchronous memory compaction (and, as a last resort, the OOM killer)

First, wake the kswapd kernel thread of each node:

	for_each_zone_zonelist(zone, z, zonelist, high_zoneidx)
		wakeup_kswapd(zone, order);  // wake each node's kswapd kernel thread
/*
 * A zone is low on free memory, so wake its kswapd task to service it.
 */
void wakeup_kswapd(struct zone *zone, int order)
{
    
    
	pg_data_t *pgdat;

	if (!populated_zone(zone))
		return;

	pgdat = zone->zone_pgdat;
	// check the watermark
	if (zone_watermark_ok(zone, order, zone->pages_low, 0, 0))
		return;
	if (pgdat->kswapd_max_order < order)
		pgdat->kswapd_max_order = order;
	if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))  // cpuset permission check
		return;
	if (!waitqueue_active(&pgdat->kswapd_wait))  
		return;
	wake_up_interruptible(&pgdat->kswapd_wait);  
}

Then lower the requirement and retry fast allocation using the min watermark:

	alloc_flags = ALLOC_WMARK_MIN;
	if ((unlikely(rt_task(p)) && !in_interrupt()) || !wait)
		alloc_flags |= ALLOC_HARDER;
	if (gfp_mask & __GFP_HIGH)
		alloc_flags |= ALLOC_HIGH;
	if (wait)
		alloc_flags |= ALLOC_CPUSET;

Several macro definitions here

  • ALLOC_HARDER: indicates trying harder to allocate memory
  • ALLOC_HIGH: indicates the caller set __GFP_HIGH (a high-priority request)
  • ALLOC_CPUSET: indicates checking whether the cpuset allows allocating these memory pages
	page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist,
						high_zoneidx, alloc_flags);
	if (page)
		goto got_pg;

The author describes this as a sequence of "five strokes". The first stroke is the casual one: allocate against the low watermark, i.e. call get_page_from_freelist() directly. The second stroke, used here, allocates against the min watermark, adding the ALLOC_WMARK_MIN, ALLOC_HARDER and ALLOC_HIGH flags.
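
A small user-space demo of how those flags come together, mirroring the alloc_flags computation quoted above (the rt_task() special case is ignored; the ALLOC_* values are the ones from the header quoted earlier, the gfp bits are the usual values of this kernel generation):

#include <stdio.h>

#define __GFP_WAIT   0x10u
#define __GFP_HIGH   0x20u
#define __GFP_IO     0x40u
#define __GFP_FS     0x80u
#define GFP_ATOMIC   (__GFP_HIGH)
#define GFP_KERNEL   (__GFP_WAIT | __GFP_IO | __GFP_FS)

#define ALLOC_WMARK_MIN 0x00
#define ALLOC_HARDER    0x10
#define ALLOC_HIGH      0x20
#define ALLOC_CPUSET    0x40

/* mirrors the slow-path alloc_flags setup shown above */
static int slow_path_flags(unsigned int gfp_mask)
{
	int wait = gfp_mask & __GFP_WAIT;
	int alloc_flags = ALLOC_WMARK_MIN;

	if (!wait)
		alloc_flags |= ALLOC_HARDER;       /* atomic callers get to try harder */
	if (gfp_mask & __GFP_HIGH)
		alloc_flags |= ALLOC_HIGH;
	if (wait)
		alloc_flags |= ALLOC_CPUSET;       /* sleeping callers respect cpusets */
	return alloc_flags;
}

int main(void)
{
	printf("GFP_KERNEL -> alloc_flags 0x%x\n", slow_path_flags(GFP_KERNEL));   /* 0x40 */
	printf("GFP_ATOMIC -> alloc_flags 0x%x\n", slow_path_flags(GFP_ATOMIC));   /* 0x30 */
	return 0;
}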

The next call skips the watermark check entirely: alloc_flags is set to ALLOC_NO_WATERMARKS.

rebalance:
	if (((p->flags & PF_MEMALLOC) || unlikely(test_thread_flag(TIF_MEMDIE)))
			&& !in_interrupt()) {
    
    
		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
    
    
nofail_alloc:
			/* go through the zonelist yet again, ignoring mins */
			page = get_page_from_freelist(gfp_mask, nodemask, order,
				zonelist, high_zoneidx, ALLOC_NO_WATERMARKS); // allocate without checking any watermark
			if (page)
				goto got_pg;
			if (gfp_mask & __GFP_NOFAIL) {
    
    
				congestion_wait(WRITE, HZ/50);
				goto nofail_alloc;
			}
		}
		goto nopage;
	}
try_to_free_pages

Next, memory pages are obtained by synchronously freeing memory; the main function is try_to_free_pages.

	cpuset_update_task_memory_state();
	p->flags |= PF_MEMALLOC;

	lockdep_set_current_reclaim_state(gfp_mask);
	reclaim_state.reclaimed_slab = 0;
	p->reclaim_state = &reclaim_state;

	did_some_progress = try_to_free_pages(zonelist, order,
						gfp_mask, nodemask);

	p->reclaim_state = NULL;
	lockdep_clear_current_reclaim_state();
	p->flags &= ~PF_MEMALLOC;

The function body is

unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
				gfp_t gfp_mask, nodemask_t *nodemask)
{
    
    
	struct scan_control sc = {
    
     // scan control structure
		.gfp_mask = gfp_mask,
		.may_writepage = !laptop_mode,
		.swap_cluster_max = SWAP_CLUSTER_MAX,
		.may_unmap = 1,
		.may_swap = 1,
		.swappiness = vm_swappiness,
		.order = order,
		.mem_cgroup = NULL,
		.isolate_pages = isolate_pages_global,
		.nodemask = nodemask,
	};

	return do_try_to_free_pages(zonelist, &sc);
}

One conceptual point about nodes needs clarifying. The attempts so far allocate memory, and when memory is short they look for page frames on other nodes (zone_reclaim does reclaim memory, but in essence we are still hunting for memory on other nodes). From here on, memory is reclaimed directly on this node.

do_try_to_free_pages is full of calls to shrink_zones and shrink_slab; it is the main logic for reclaiming pages.

static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
					struct scan_control *sc)
{
    
    
	int priority;
	unsigned long ret = 0;
	unsigned long total_scanned = 0;
	struct reclaim_state *reclaim_state = current->reclaim_state;
	unsigned long lru_pages = 0;
	struct zoneref *z;
	struct zone *zone;
	enum zone_type high_zoneidx = gfp_zone(sc->gfp_mask);

	delayacct_freepages_start();

	if (scanning_global_lru(sc))
		count_vm_event(ALLOCSTALL);
	/*
	 * mem_cgroup will not do shrink_slab.
	 */
	if (scanning_global_lru(sc)) {
    
    
		for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
    
    

			if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
				continue;

			lru_pages += zone_lru_pages(zone);
		}
	}

	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
    
    
		sc->nr_scanned = 0;
		if (!priority)
			disable_swap_token();
		shrink_zones(priority, zonelist, sc);
		/*
		 * Don't shrink slabs when reclaiming memory from
		 * over limit cgroups
		 */
		if (scanning_global_lru(sc)) {
    
    
			shrink_slab(sc->nr_scanned, sc->gfp_mask, lru_pages);
			if (reclaim_state) {
    
    
				sc->nr_reclaimed += reclaim_state->reclaimed_slab;
				reclaim_state->reclaimed_slab = 0;
			}
		}
		total_scanned += sc->nr_scanned;
		if (sc->nr_reclaimed >= sc->swap_cluster_max) {
    
    
			ret = sc->nr_reclaimed;
			goto out;
		}

		/*
		 * Try to write back as many pages as we just scanned.  This
		 * tends to cause slow streaming writers to write data to the
		 * disk smoothly, at the dirtying rate, which is nice.   But
		 * that's undesirable in laptop mode, where we *want* lumpy
		 * writeout.  So in laptop mode, write out the whole world.
		 */
		if (total_scanned > sc->swap_cluster_max +
					sc->swap_cluster_max / 2) {
    
    
			wakeup_pdflush(laptop_mode ? 0 : total_scanned);
			sc->may_writepage = 1;
		}

		/* Take a nap, wait for some writeback to complete */
		if (sc->nr_scanned && priority < DEF_PRIORITY - 2)
			congestion_wait(WRITE, HZ/10);
	}
	/* top priority shrink_zones still had more to do? don't OOM, then */
	if (!sc->all_unreclaimable && scanning_global_lru(sc))
		ret = sc->nr_reclaimed;
out:
	/*
	 * Now that we've scanned all the zones at this priority level, note
	 * that level within the zone so that the next thread which performs
	 * scanning of this zone will immediately start out at this priority
	 * level.  This affects only the decision whether or not to bring
	 * mapped pages onto the inactive list.
	 */
	if (priority < 0)
		priority = 0;

	if (scanning_global_lru(sc)) {
    
    
		for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
    
    

			if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
				continue;

			zone->prev_priority = priority;
		}
	} else
		mem_cgroup_record_reclaim_priority(sc->mem_cgroup, priority);

	delayacct_freepages_end();

	return ret;
}

Back in the allocation path, if reclaim made some progress, the allocation is retried:

	if (likely(did_some_progress)) {
    
    
		page = get_page_from_freelist(gfp_mask, nodemask, order,
					zonelist, high_zoneidx, alloc_flags); // retry the allocation after reclaiming
		if (page)
			goto got_pg;

TODO: this is too complicated; we'll come back to it later.

The last resort is the OOM mechanism: if no pages can be allocated at all, some process is killed and its pages are taken (somewhat brutal), the so-called out-of-memory killer.

	} else if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
    
    
		if (!try_set_zone_oom(zonelist, gfp_mask)) {
    
    
			schedule_timeout_uninterruptible(1);
			goto restart;
		}

		/*
		 * Go through the zonelist yet one more time, keep
		 * very high watermark here, this is only to catch
		 * a parallel oom killing, we must fail if we're still
		 * under heavy pressure.
		 */
		page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask,
			order, zonelist, high_zoneidx,
			ALLOC_WMARK_HIGH|ALLOC_CPUSET); // a feint: demanding ALLOC_WMARK_HIGH clearly cannot succeed; it only catches a parallel OOM kill
		if (page) {
    
    
			clear_zonelist_oom(zonelist, gfp_mask);
			goto got_pg;
		}

		/* The OOM killer will not help higher order allocs so fail */
		if (order > PAGE_ALLOC_COSTLY_ORDER) {
    
    
			clear_zonelist_oom(zonelist, gfp_mask);
			goto nopage;
		}

		out_of_memory(zonelist, gfp_mask, order); // kill a process to free its memory
		clear_zonelist_oom(zonelist, gfp_mask);
		goto restart;

scan_control

The scan control structure mainly holds the variables and parameters of one memory reclaim or memory compaction pass; some intermediate results are stored here as well.

It is used mainly in memory reclaim and memory compaction.

struct scan_control {
    
     
	/* Incremented by the number of inactive pages that were scanned */
	unsigned long nr_scanned;  // number of page frames already scanned

	/* Number of pages freed so far during a call to shrink_zones() */
	unsigned long nr_reclaimed;   // number of page frames already reclaimed

	/* This context's GFP mask */
	gfp_t gfp_mask;  // allocation flags used by the original request

	int may_writepage;  // whether writeback is allowed

	/* Can mapped pages be reclaimed? */
	int may_unmap;  // whether unmapping is allowed, i.e. clearing every page table entry that maps the page

	/* Can pages be swapped as part of reclaim? */
	int may_swap;  // whether swapping out is allowed

	/* This context's SWAP_CLUSTER_MAX. If freeing memory for
	 * suspend, we effectively ignore SWAP_CLUSTER_MAX.
	 * In this context, it doesn't matter that we scan the
	 * whole list at once. */
	int swap_cluster_max;

	int swappiness;

	int all_unreclaimable;

	int order;  // order of the original allocation; scanning only happens when an allocation finds memory short

	/* Which cgroup do we reclaim from */
	struct mem_cgroup *mem_cgroup;  // target memcg; NULL when reclaiming for the whole zone

	/*
	 * Nodemask of nodes allowed by the caller. If NULL, all nodes
	 * are scanned.
	 */
	nodemask_t	*nodemask;  // mask of the nodes allowed to be scanned

	/* Pluggable isolate pages callback */
	unsigned long (*isolate_pages)(unsigned long nr, struct list_head *dst,
			unsigned long *scanned, int order, int mode,
			struct zone *z, struct mem_cgroup *mem_cont,
			int active, int file);
};

A word on memory compression: when memory is tight, frequently writing memory data back to disk causes heavy I/O, which not only shortens flash lifespan but also seriously hurts system performance. Memory compression techniques were introduced for this reason; the mainstream ones are:

  • zSwap: a compressed cache in front of swap space, generally compressing anonymous pages
  • zRam: uses memory to emulate a block device, generally compressing anonymous pages
  • zCache: generally compresses file pages

Anonymous pages are pages not backed by a file, such as a process's heap and stack. They cannot be written back to a disk file, but they can be swapped out by setting aside a dedicated swap partition on disk or by using a swap file.

Regarding active and inactive pages: whether a page is active is generally judged by whether it is being accessed frequently by applications. If a page's referenced flag is not set, it is considered inactive and should be moved to the inactive list; if the flag is set, the page has been accessed recently and should be moved to the active list.

As time goes by, the least active pages end up at the tail of the inactive list. When memory is insufficient, the kernel swaps these pages out first; since they have hardly been used from creation to eviction, by the LRU principle swapping them out does the least damage to the system.
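
A toy user-space model of this two-list scheme (purely illustrative, not kernel code): pages get a referenced flag as a second chance, a second access promotes them to the active side, and reclaim only evicts pages that are both inactive and unreferenced.

#include <stdio.h>

#define NPAGES 4

struct toy_page { int id; int active; int referenced; };

/* loosely mirrors mark_page_accessed(): referenced once -> flag it,
 * referenced again -> promote to the active side */
static void mark_accessed(struct toy_page *p)
{
	if (!p->referenced)
		p->referenced = 1;
	else
		p->active = 1;
}

/* one reclaim pass: clear flags (second chance), demote idle active
 * pages, evict cold inactive ones */
static void shrink(struct toy_page pages[], int n)
{
	for (int i = 0; i < n; i++) {
		struct toy_page *p = &pages[i];
		if (p->referenced)
			p->referenced = 0;
		else if (p->active)
			p->active = 0;
		else
			printf("evict page %d\n", p->id);
	}
}

int main(void)
{
	struct toy_page pages[NPAGES] = { {0}, {1}, {2}, {3} };

	mark_accessed(&pages[1]);      /* touched once: keeps its second chance */
	mark_accessed(&pages[2]);
	mark_accessed(&pages[2]);      /* touched twice: promoted to active */
	shrink(pages, NPAGES);         /* evicts only pages 0 and 3 */
	return 0;
}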

Memory Shrink

Memory reclaim is generally performed against a zone; it may also be performed against a single memcg.

  • Each pass tries to reclaim 2^(order+1) page frames, enough to satisfy this allocation with a little to spare; if the inactive lru lists do not even contain that many pages, the target check is dropped.
  • Zone memory reclaim works hand in hand with zone memory compaction, so when a zone is reclaimed, free page frames keep being produced until the needs of memory compaction are also satisfied.

Following the reclaim path above, the three main functions are shrink_zone, shrink_list and shrink_slab.

shrink_zone

static void shrink_zone(int priority, struct zone *zone,
				struct scan_control *sc)
{
    
    
	unsigned long nr[NR_LRU_LISTS];
	unsigned long nr_to_scan;
	unsigned long percent[2];	/* anon @ 0; file @ 1 */
	enum lru_list l;
	unsigned long nr_reclaimed = sc->nr_reclaimed;
	unsigned long swap_cluster_max = sc->swap_cluster_max;

get_scan_ratio

	get_scan_ratio(zone, sc, percent);

Generally, when the physical memory is not enough, there are two options:

  • Swap some anonymous pages out to the swap partition
  • Write dirty page cache data back to disk, or simply drop clean pages

The swappiness value controls the relative weight given to swapping anonymous pages versus reclaiming the page cache:

	/*
	 * With swappiness at 100, anonymous and file have the same priority.
	 * This scanning priority is essentially the inverse of IO cost.
	 */
	anon_prio = sc->swappiness;
	file_prio = 200 - sc->swappiness;

First, determine whether swap is disabled entirely:

static void get_scan_ratio(struct zone *zone, struct scan_control *sc,
					unsigned long *percent)
{
    
    
	unsigned long anon, file, free;
	unsigned long anon_prio, file_prio;
	unsigned long ap, fp;
	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);

	/* If we have no swap space, do not bother scanning anon pages. */
	if (!sc->may_swap || (nr_swap_pages <= 0)) {
    
    
		percent[0] = 0;
		percent[1] = 100;
		return;
	}

Count the anonymous pages and the page cache pages:

	anon  = zone_nr_pages(zone, sc, LRU_ACTIVE_ANON) +
		zone_nr_pages(zone, sc, LRU_INACTIVE_ANON);
	file  = zone_nr_pages(zone, sc, LRU_ACTIVE_FILE) +
		zone_nr_pages(zone, sc, LRU_INACTIVE_FILE);

If the number of free pages plus the number of page cache pages is below the high watermark, scanning is forced entirely onto anonymous pages (everything goes toward swap):

	if (scanning_global_lru(sc)) {
    
    
		free  = zone_page_state(zone, NR_FREE_PAGES);
		/* If we have very few page cache pages,
		   force-scan anon pages. */
		if (unlikely(file + free <= zone->pages_high)) {
    
      
			percent[0] = 100;
			percent[1] = 0;
			return;
		}
	}

Calculate the proportions:

	anon_prio = sc->swappiness;
	file_prio = 200 - sc->swappiness;

	/*
	 * The amount of pressure on anon vs file pages is inversely
	 * proportional to the fraction of recently scanned pages on
	 * each list that were recently referenced and in active use.
	 */
	ap = (anon_prio + 1) * (reclaim_stat->recent_scanned[0] + 1);
	ap /= reclaim_stat->recent_rotated[0] + 1;

	fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1);
	fp /= reclaim_stat->recent_rotated[1] + 1;

	/* Normalize to percentages */
	percent[0] = 100 * ap / (ap + fp + 1);
	percent[1] = 100 - percent[0];
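
A worked example of this weighting with hypothetical counters and the usual default swappiness of 60: file pages that keep getting re-referenced (large recent_rotated) push the scanning pressure toward anonymous pages, and vice versa.

#include <stdio.h>

/* redoes the percent[] arithmetic of get_scan_ratio() with made-up statistics */
int main(void)
{
	unsigned long swappiness = 60;                /* default of /proc/sys/vm/swappiness */
	unsigned long anon_prio = swappiness;         /* 60  */
	unsigned long file_prio = 200 - swappiness;   /* 140 */

	/* hypothetical per-zone reclaim statistics */
	unsigned long anon_scanned = 1000, anon_rotated = 100;    /* anon pages rarely re-referenced */
	unsigned long file_scanned = 2000, file_rotated = 1500;   /* file pages heavily reused */

	unsigned long ap = (anon_prio + 1) * (anon_scanned + 1) / (anon_rotated + 1);
	unsigned long fp = (file_prio + 1) * (file_scanned + 1) / (file_rotated + 1);
	unsigned long percent_anon = 100 * ap / (ap + fp + 1);

	printf("anon %% = %lu, file %% = %lu\n", percent_anon, 100 - percent_anon);
	/* prints roughly 76% anon / 24% file: pressure shifts onto anonymous pages */
	return 0;
}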

Back in shrink_zone, a pass then works out how many pages to scan on each lru list:

	for_each_evictable_lru(l) {
    
    
		int file = is_file_lru(l);
		unsigned long scan;

		scan = zone_nr_pages(zone, sc, l);
		if (priority) {
    
    
			scan >>= priority;
			scan = (scan * percent[file]) / 100;
		}
		if (scanning_global_lru(sc)) {
    
    
			zone->lru[l].nr_scan += scan;
			nr[l] = zone->lru[l].nr_scan;
			if (nr[l] >= swap_cluster_max)
				zone->lru[l].nr_scan = 0;
			else
				nr[l] = 0;
		} else
			nr[l] = scan;
	}

The for_each_evictable_lru macro expands to:

for (l = 0; l <= LRU_ACTIVE_FILE; l++)

The loop variable l is of type enum lru_list:

enum lru_list {
    
    
	LRU_INACTIVE_ANON = LRU_BASE,
	LRU_ACTIVE_ANON = LRU_BASE + LRU_ACTIVE,
	LRU_INACTIVE_FILE = LRU_BASE + LRU_FILE,
	LRU_ACTIVE_FILE = LRU_BASE + LRU_FILE + LRU_ACTIVE,
#ifdef CONFIG_UNEVICTABLE_LRU
	LRU_UNEVICTABLE,
#else
	LRU_UNEVICTABLE = LRU_ACTIVE_FILE, /* avoid compiler errors in dead code */
#endif
	NR_LRU_LISTS
};

As you can see, LRU_ACTIVE_FILE = LRU_BASE + LRU_FILE + LRU_ACTIVE is the last of the evictable lists.

So this loop walks the four evictable lru lists, i.e. all the lists that can be reclaimed.


shrink_list

The lru lists are traversed in the order LRU_INACTIVE_ANON, LRU_ACTIVE_ANON, LRU_INACTIVE_FILE, LRU_ACTIVE_FILE:

	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
					nr[LRU_INACTIVE_FILE]) {
    
    
		for_each_evictable_lru(l) {
    
    
			if (nr[l]) {
    
    
				nr_to_scan = min(nr[l], swap_cluster_max);
				nr[l] -= nr_to_scan;
				// reclaim from the list of this lru type
				nr_reclaimed += shrink_list(l, nr_to_scan,
							    zone, sc, priority);
			}
		}
		/*
		 * On large memory systems, scan >> priority can become
		 * really large. This is fine for the starting priority;
		 * we want to put equal scanning pressure on each zone.
		 * However, if the VM has a harder time of freeing pages,
		 * with multiple processes reclaiming pages, the total
		 * freeing target can get unreasonably large.
		 */
		if (nr_reclaimed > swap_cluster_max &&
			priority < DEF_PRIORITY && !current_is_kswapd())
			break;
	}

Remember how, back in vfs_cache_init, the directory entry cache registered itself on shrinker_list? That is what shrink_slab reclaims from. shrink_list, meanwhile, handles the lru lists, with the main cases being:

  • LRU_ACTIVE_FILE: active file pages; shrink_active_list is called to process the active file lru list
  • LRU_ACTIVE_ANON: active anonymous pages; shrink_active_list is called, but only when there are too few inactive anonymous pages
  • Everything else goes through shrink_inactive_list, which processes the inactive lru lists
static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
	struct zone *zone, struct scan_control *sc, int priority)
{
    
    
	int file = is_file_lru(lru);

	if (lru == LRU_ACTIVE_FILE) {
    
    
		shrink_active_list(nr_to_scan, zone, sc, priority, file);
		return 0;
	}

	if (lru == LRU_ACTIVE_ANON && inactive_anon_is_low(zone, sc)) {
    
    
		shrink_active_list(nr_to_scan, zone, sc, priority, file);
		return 0;
	}
	return shrink_inactive_list(nr_to_scan, zone, sc, priority, file);
}

Record the amount of reclaimed memory:

	sc->nr_reclaimed = nr_reclaimed;

When there are too few inactive anonymous pages, shrink_active_list is called:

	/*
	 * Even if we did not try to evict anon pages at all, we want to
	 * rebalance the anon lru active/inactive ratio.
	 */
	if (inactive_anon_is_low(zone, sc))
		shrink_active_list(SWAP_CLUSTER_MAX, zone, sc, priority, 0);

If too many dirty pages are written back, sleep here for a while.

	throttle_vm_writeout(sc->gfp_mask);

shrink_active_list

Let's pick this function and look at it in detail:

static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
			struct scan_control *sc, int priority, int file)
{
    
    
	unsigned long pgmoved;
	int pgdeactivate = 0;
	unsigned long pgscanned;
	LIST_HEAD(l_hold);	/* The pages which were snipped off */
	LIST_HEAD(l_inactive);
	struct page *page;
	struct pagevec pvec;
	enum lru_list lru;
	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);

	lru_add_drain();
	spin_lock_irq(&zone->lru_lock);
	pgmoved = sc->isolate_pages(nr_pages, &l_hold, &pgscanned, sc->order,
					ISOLATE_ACTIVE, zone,
					sc->mem_cgroup, 1, file);
	/*
	 * zone->pages_scanned is used for detect zone's oom
	 * mem_cgroup remembers nr_scan by itself.
	 */
	if (scanning_global_lru(sc)) {
    
    
		zone->pages_scanned += pgscanned;
	}
	reclaim_stat->recent_scanned[!!file] += pgmoved;

	if (file)
		__mod_zone_page_state(zone, NR_ACTIVE_FILE, -pgmoved);
	else
		__mod_zone_page_state(zone, NR_ACTIVE_ANON, -pgmoved);
	spin_unlock_irq(&zone->lru_lock);

	pgmoved = 0;
	while (!list_empty(&l_hold)) {
    
    
		cond_resched();
		page = lru_to_page(&l_hold);
		list_del(&page->lru);

		if (unlikely(!page_evictable(page, NULL))) {
    
    
			putback_lru_page(page);
			continue;
		}

		/* page_referenced clears PageReferenced */
		if (page_mapping_inuse(page) &&
		    page_referenced(page, 0, sc->mem_cgroup))
			pgmoved++;

		list_add(&page->lru, &l_inactive);
	}

	/*
	 * Move the pages to the [file or anon] inactive list.
	 */
	pagevec_init(&pvec, 1);
	lru = LRU_BASE + file * LRU_FILE;

	spin_lock_irq(&zone->lru_lock);
	/*
	 * Count referenced pages from currently used mappings as
	 * rotated, even though they are moved to the inactive list.
	 * This helps balance scan pressure between file and anonymous
	 * pages in get_scan_ratio.
	 */
	reclaim_stat->recent_rotated[!!file] += pgmoved;

	pgmoved = 0;
	while (!list_empty(&l_inactive)) {
    
    
		page = lru_to_page(&l_inactive);
		prefetchw_prev_lru_page(page, &l_inactive, flags);
		VM_BUG_ON(PageLRU(page));
		SetPageLRU(page);
		VM_BUG_ON(!PageActive(page));
		ClearPageActive(page);

		list_move(&page->lru, &zone->lru[lru].list);
		mem_cgroup_add_lru_list(page, lru);
		pgmoved++;
		if (!pagevec_add(&pvec, page)) {
    
    
			__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
			spin_unlock_irq(&zone->lru_lock);
			pgdeactivate += pgmoved;
			pgmoved = 0;
			if (buffer_heads_over_limit)
				pagevec_strip(&pvec);
			__pagevec_release(&pvec);
			spin_lock_irq(&zone->lru_lock);
		}
	}
	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
	pgdeactivate += pgmoved;
	__count_zone_vm_events(PGREFILL, zone, pgscanned);
	__count_vm_events(PGDEACTIVATE, pgdeactivate);
	spin_unlock_irq(&zone->lru_lock);
	if (buffer_heads_over_limit)
		pagevec_strip(&pvec);
	pagevec_release(&pvec);
}

Fast memory reclamation

In get_page_from_freelist(), while traversing the zonelist, each zone is checked before allocation: if the zone's free memory after the allocation would fall below threshold + reserved page frames, fast memory reclaim is performed on that zone.

The threshold may be any of min, low, or high.

Direct memory reclamation

Direct memory reclaim happens in the slow allocation path. In slow allocation, the kswapd kernel threads of all nodes are woken first, then get_page_from_freelist is called with the min threshold to try to take contiguous page frames from the zones of the zonelist. If that fails, asynchronous compaction runs; after asynchronous compaction, get_page_from_freelist is tried again with the min threshold, and if that also fails, direct memory reclaim is performed.

kswapd memory recycling

kswapd->balance_pgdat()->kswapd_shrink_zone()->shrink_zone()

During allocation, whenever get_page_from_freelist() cannot obtain contiguous page frames from the zonelist's zones at the low threshold and gfp_mask does not contain __GFP_NO_KSWAPD, the kswapd kernel thread is woken, and memory reclaim then proceeds inside kswapd.


Origin blog.csdn.net/qq_48322523/article/details/128307150