Last time I explained that physical memory is described by a three-level structure: node, zone, and page. How many nodes exist depends on whether the system is NUMA or UMA. Assuming a UMA architecture, there is only one node.
This section focuses on the zone and its data structure. It shows how pages are managed inside a zone and introduces the buddy allocator.
struct zone {
    unsigned long _watermark[NR_WMARK];
    unsigned long watermark_boost;
    unsigned long nr_reserved_highatomic;
    long lowmem_reserve[MAX_NR_ZONES];
    const char *name;
    struct free_area free_area[MAX_ORDER];
    unsigned long flags;
};
- _watermark: each zone has three watermark values
- WMARK_MIN: the minimum watermark. Free memory at or below this level means memory is critically short.
- WMARK_LOW: the low watermark. Memory is starting to get tight, and the kernel thread kswapd needs to be woken up to reclaim pages.
- WMARK_HIGH: the high watermark. Memory is still sufficient.
enum zone_watermarks {
    WMARK_MIN,
    WMARK_LOW,
    WMARK_HIGH,
    NR_WMARK
};
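To make the three levels concrete, here is a minimal sketch of how a free page count could be compared against them. It is only an illustration, not the kernel's actual check: `enum pressure` and `classify_free_pages()` are hypothetical names, and the real allocator also accounts for `lowmem_reserve`, the allocation order, and other factors.

```c
#include <assert.h>

/* Same ordering as the kernel's enum zone_watermarks. */
enum zone_watermarks { WMARK_MIN, WMARK_LOW, WMARK_HIGH, NR_WMARK };

/* Hypothetical pressure levels, used only for this illustration. */
enum pressure { PRESSURE_NONE, PRESSURE_WAKE_KSWAPD, PRESSURE_CRITICAL };

/* Hypothetical helper: compare a zone's free page count to its watermarks. */
enum pressure classify_free_pages(unsigned long free,
                                  const unsigned long wmark[NR_WMARK])
{
    /* The watermarks are expected to be ordered min <= low <= high. */
    assert(wmark[WMARK_MIN] <= wmark[WMARK_LOW]);
    assert(wmark[WMARK_LOW] <= wmark[WMARK_HIGH]);

    if (free <= wmark[WMARK_MIN])
        return PRESSURE_CRITICAL;      /* memory is critically short */
    if (free <= wmark[WMARK_LOW])
        return PRESSURE_WAKE_KSWAPD;   /* background reclaim should start */
    return PRESSURE_NONE;              /* enough free memory */
}
```

With the Normal zone values from the /proc/zoneinfo output later in this article (min 1251, low 9254, high 9566), a free count of 126204 pages is classified as PRESSURE_NONE.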
- lowmem_reserve: pages in this zone that are held back from allocations which could also be served by a higher zone, so that the zone is not exhausted by fallback requests. These values appear as "protection" in /proc/zoneinfo.
- free_area: maintains the free pages. The array index is the order of the free block, where a block of order n is 2^n contiguous pages. MAX_ORDER is currently 11, so the orders run from 0 to 10. The free_area structure is:
struct free_area {
    struct list_head free_list[MIGRATE_TYPES];
    unsigned long nr_free;
};
- free_list: links together the free blocks of this order, with one list per migrate type
- nr_free: the number of free blocks of this order
Each order is further divided into several groups according to migrate type:
enum migratetype {
    MIGRATE_UNMOVABLE,
    MIGRATE_MOVABLE,
    MIGRATE_RECLAIMABLE,
#ifdef CONFIG_CMA
    MIGRATE_CMA,
#endif
    MIGRATE_PCPTYPES, /* the number of types on the pcp lists */
    MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,
#ifdef CONFIG_MEMORY_ISOLATION
    MIGRATE_ISOLATE, /* can't allocate from here */
#endif
    MIGRATE_TYPES
};
- MIGRATE_UNMOVABLE: pages that cannot be moved
- MIGRATE_MOVABLE: pages that can be moved. When memory becomes fragmented, these pages can be migrated to free up larger contiguous ranges.
- MIGRATE_RECLAIMABLE: pages that can be reclaimed
- MIGRATE_CMA: pages reserved for the Contiguous Memory Allocator (CMA)
- MIGRATE_PCPTYPES: the number of migrate types kept on the per-CPU (pcp) lists. It is a count rather than a type of its own.
- MIGRATE_HIGHATOMIC: pages reserved for high-order atomic allocations
- MIGRATE_ISOLATE: isolated pages; nothing can be allocated from here
Pictured as a diagram, the relationship is: a zone contains one free_area per order, and each free_area contains one free list per migrate type.
You can view this information with cat /proc/pagetypeinfo. Here is the output on my current device:
root:/ # cat /proc/pagetypeinfo
Page block order: 10
Pages per block: 1024
Free pages count per migrate type at order 0 1 2 3 4 5 6 7 8 9 10
Node 0, zone Normal, type Unmovable 801 290 24 5 4 7 2 0 1 1 0
Node 0, zone Normal, type Movable 288 296 69 18 87 34 14 5 1 1 55
Node 0, zone Normal, type Reclaimable 0 3 1 1 1 1 0 1 1 0 0
Node 0, zone Normal, type CMA 12 6 5 3 3 2 1 3 2 2 57
Node 0, zone Normal, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Normal, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Movable, type Unmovable 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Movable, type Movable 1980 1089 913 347 35 8 4 2 3 0 653
Node 0, zone Movable, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Movable, type CMA 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Movable, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
Node 0, zone Movable, type Isolate 0 0 0 0 0 0 0 0 0 0 0
Number of blocks type Unmovable Movable Reclaimable CMA HighAtomic Isolate
Node 0, zone Normal 535 498 49 72 0 0
Node 0, zone Movable 0 768 0 0 0 0
Number of mixed blocks Unmovable Movable Reclaimable CMA HighAtomic Isolate
Node 0, zone Normal 0 15 1 0 0 0
Node 0, zone Movable 0 0 0 0 0 0
This clearly shows how many free blocks remain for each zone, migrate type, and order. The per-order totals for each zone are also available from cat /proc/buddyinfo:
root:/ # cat /proc/buddyinfo
Node 0, zone Normal 338 355 130 37 99 43 17 9 5 4 112
Node 0, zone Movable 1604 1204 912 348 35 8 4 2 3 0 653
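Each column in this output is the number of free blocks at one order, and a block of order n covers 2^n pages. The following is a small sketch of turning one line into a page count. `buddy_free_pages()` is a hypothetical helper written for this article, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ORDER 11  /* orders 0..10, matching the 11 columns above */

/* Hypothetical helper: total free pages represented by per-order block counts. */
unsigned long buddy_free_pages(const unsigned long nr_free[], size_t orders)
{
    unsigned long total = 0;

    assert(orders <= MAX_ORDER);
    for (size_t order = 0; order < orders; order++)
        total += nr_free[order] << order;  /* each block holds 2^order pages */
    return total;
}
```

Applied to the two lines above, this gives 125080 pages for zone Normal and 681212 pages for zone Movable. These are close to, but not the same as, the nr_free_pages values in the zoneinfo output below (126204 and 680267), presumably because the files were read at different moments.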
Over time, the highest-order blocks are gradually split into lower-order blocks. When a large contiguous block is requested at that point, the kernel has to defragment (compact) memory first.
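The splitting can be sketched with a toy model that tracks only the per-order block counts. This is a simplification under stated assumptions: `struct toy_zone` and `toy_alloc()` are hypothetical, there are no page lists or migrate types, and merging freed buddies back together is not modeled.

```c
#include <assert.h>

#define MAX_ORDER 11

/* Simplified stand-in for a zone's free_area[]: block counts only. */
struct toy_zone {
    unsigned long nr_free[MAX_ORDER];
};

/* Hypothetical helper: take one block of the requested order.
 * Returns 0 on success, -1 if no block of that order or higher is free. */
int toy_alloc(struct toy_zone *z, unsigned int order)
{
    unsigned int cur;

    assert(order < MAX_ORDER);

    /* Find the smallest order >= the request that still has a free block. */
    for (cur = order; cur < MAX_ORDER; cur++)
        if (z->nr_free[cur] > 0)
            break;
    if (cur == MAX_ORDER)
        return -1;

    z->nr_free[cur]--;
    /* Split downwards: each halving keeps one half and frees its buddy. */
    while (cur > order) {
        cur--;
        z->nr_free[cur]++;
    }
    return 0;
}
```

Starting from a single free order-10 block, one order-0 allocation leaves one free block at each of the orders 0 through 9 and none at order 10. A following order-10 request then fails in this model even though 1023 pages are still free, which is the fragmentation problem described above.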
Detailed information about each zone is available from cat /proc/zoneinfo:
root:/ # cat /proc/zoneinfo
Node 0, zone Normal
pages free 126204
min 1251
low 9254
high 9566
spanned 1308544
present 1180543
managed 1136476
protection: (0, 24576)
nr_free_pages 126204
nr_zone_inactive_anon 984
nr_zone_active_anon 61238
nr_zone_inactive_file 423539
nr_zone_active_file 122889
nr_zone_unevictable 987
nr_zone_write_pending 288
nr_mlock 987
nr_page_table_pages 13969
nr_kernel_stack 36784
nr_bounce 0
nr_zspages 0
nr_free_cma 60532
Node 0, zone Movable
pages free 680267
min 866
low 6404
high 6620
spanned 786432
present 786432
managed 786432
protection: (0, 0)
nr_free_pages 680267
nr_zone_inactive_anon 0
nr_zone_active_anon 104777
nr_zone_inactive_file 0
nr_zone_active_file 0
nr_zone_unevictable 121
nr_zone_write_pending 0
nr_mlock 121
nr_page_table_pages 0
nr_kernel_stack 0
nr_bounce 0
nr_zspages 0
nr_free_cma 0
The output shows that there is currently only one node, that its two zones are Normal and Movable, and what the watermark values of each zone are.
Zones in turn are managed by struct pglist_data. On a NUMA machine each node has its own pglist_data structure, while on a UMA machine a single pglist_data structure describes the entire memory.
/*
 * On NUMA machines, each NUMA node would have a pg_data_t to describe
 * it's memory layout. On UMA machines there is a single pglist_data which
 * describes the whole memory.
 *
 * Memory statistics and page replacement data structures are maintained on a
 * per-zone basis.
 */
typedef struct pglist_data {
    struct zone node_zones[MAX_NR_ZONES];
    struct zonelist node_zonelists[MAX_ZONELISTS];
    int nr_zones;
    /*
     * This is a per-node reserve of pages that are not available
     * to userspace allocations.
     */
    unsigned long totalreserve_pages;
    /* Fields commonly accessed by the page reclaim scanner */
    struct lruvec lruvec;
    unsigned long flags;
} pg_data_t;
- node_zones: the array of zones that belong to this node; nr_zones records how many of them are in use
- node_zonelists: the fallback zone lists. When allocation from the preferred zone fails, the allocator walks the list to find available pages in another zone.
- totalreserve_pages: the total number of reserved pages on this node, which are not available to user-space allocations
- lruvec: allocated pages are added to the LRU lists kept here, which page reclaim uses to choose pages to free
node_zonelists holds up to two lists, ZONELIST_FALLBACK and ZONELIST_NOFALLBACK. A UMA system has only one zonelist, ZONELIST_FALLBACK.
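The fallback idea can be sketched as a walk over zones in preference order. This is a simplified illustration, not the kernel's allocation path: `struct sketch_zone` and `pick_zone()` are hypothetical, and the only check made is against a single low watermark.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a zone: a name, a free count, and one watermark. */
struct sketch_zone {
    const char *name;
    unsigned long free_pages;
    unsigned long wmark_low;
};

/* Hypothetical helper: walk the zones in preference order and return the
 * first one whose free pages would stay above its low watermark after the
 * allocation, or NULL if every zone is too short on memory. */
struct sketch_zone *pick_zone(struct sketch_zone *zonelist[], size_t nr,
                              unsigned long nr_pages)
{
    assert(zonelist != NULL);

    for (size_t i = 0; i < nr; i++) {
        struct sketch_zone *z = zonelist[i];

        if (z->free_pages >= nr_pages &&
            z->free_pages - nr_pages > z->wmark_low)
            return z;  /* preferred or fallback zone can serve the request */
    }
    return NULL;       /* the caller would have to reclaim memory or fail */
}
```

For example, with a list ordered Movable then Normal, a request is served from Movable while that zone stays above its low watermark, and falls back to Normal once it does not.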
The LRU is split into several lists according to LRU type. The common ones are inactive anonymous pages, active anonymous pages, inactive file pages, and active file pages, which correspond to the nr_zone_inactive_anon, nr_zone_active_anon, nr_zone_inactive_file, and nr_zone_active_file counters in the zoneinfo output above.
The memory layout of a node can be completely described by the pglist_data structure.
- Through pglist_data we know how many zones there are. Each zone has a free_area array that records the free blocks of each order and the migrate type each block belongs to.
- When pages are allocated, the request is checked against the watermarks of the zone. When memory runs short, the kernel thread kswapd is woken up to reclaim memory.
- Allocated pages are linked onto the LRU lists. When memory is insufficient, the LRU algorithm finds the pages that have not been used recently, and those pages are released.