Physical memory management: a detailed look at the zone

Last time, I explained that physical memory is described by a three-level structure: node, zone, and page. How many nodes exist depends on whether the system is NUMA or UMA. This article assumes a UMA architecture, so there is only one node.

This section focuses on the zone, and in particular on the struct zone data structure. It shows how pages are managed within a zone and introduces the buddy allocator.

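struct zone {
    /* Per-zone watermarks, indexed by enum zone_watermarks */
    unsigned long _watermark[NR_WMARK];
    unsigned long watermark_boost;

    unsigned long nr_reserved_highatomic;
    /* Pages held back from allocations that could use a higher zone */
    long lowmem_reserve[MAX_NR_ZONES];

    const char      *name;
    /* Free blocks of the buddy allocator, one entry per order */
    struct free_area    free_area[MAX_ORDER];
    unsigned long       flags;
};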
  • Watermarks: each zone has three watermark values
    • WMARK_MIN: the minimum watermark. Free memory at or below this level means the zone is critically short of memory.
    • WMARK_LOW: the low watermark. Free memory is getting tight, and the kernel thread kswapd is woken to reclaim pages.
    • WMARK_HIGH: the high watermark. Free memory is sufficient, and kswapd can stop reclaiming.
enum zone_watermarks {
    WMARK_MIN,
    WMARK_LOW,
    WMARK_HIGH,
    NR_WMARK
};
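
To make the three levels concrete, here is a minimal sketch of how a free page count might be compared against the watermarks. It is a toy model, not kernel code: classify_zone, kswapd_can_stop, and the zone_state enum are hypothetical names invented for this illustration, and the real allocator's check takes more inputs than a single free page count.

```c
#include <stdbool.h>

/* Same enumerators as the kernel enum shown above. */
enum zone_watermarks { WMARK_MIN, WMARK_LOW, WMARK_HIGH, NR_WMARK };

/* Hypothetical states, used by this sketch only. */
enum zone_state {
    ZONE_OK,           /* above the low watermark */
    ZONE_WAKE_KSWAPD,  /* at or below low: background reclaim should start */
    ZONE_CRITICAL      /* at or below min: memory is critically short */
};

/* Classify a zone by comparing its free pages with the min and low marks. */
enum zone_state classify_zone(unsigned long free_pages,
                              const unsigned long wmark[NR_WMARK])
{
    if (free_pages <= wmark[WMARK_MIN])
        return ZONE_CRITICAL;
    if (free_pages <= wmark[WMARK_LOW])
        return ZONE_WAKE_KSWAPD;
    return ZONE_OK;
}

/* Background reclaim is finished once free pages exceed the high mark. */
bool kswapd_can_stop(unsigned long free_pages,
                     const unsigned long wmark[NR_WMARK])
{
    return free_pages > wmark[WMARK_HIGH];
}
```

With watermarks of min 1251, low 9254, and high 9566 pages, a zone with 5000 free pages falls into the ZONE_WAKE_KSWAPD state, and in this model reclaim continues until the free count rises above 9566.
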
  • lowmem_reserve: pages reserved in this zone. Allocations that could have been satisfied from a higher zone may not dip into this reserve, so memory remains available for requests that can only use this zone, including the work needed to free more memory when the system runs short.
  • free_area: maintains the free pages. The array index is the order of the block, so free_area[n] tracks free blocks of 2^n contiguous pages. MAX_ORDER is currently 11, so the orders run from 0 to 10. The free_area structure is:
struct free_area {
    struct list_head    free_list[MIGRATE_TYPES];
    unsigned long       nr_free;
};
  • free_list: links together the free blocks of this order, one list per migrate type
  • nr_free: the number of free blocks of this order
  • Each order is further divided into groups by migrate type, as listed in enum migratetype
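
Since the free_area index is an order, a request for some number of pages first has to be rounded up to a power of two. The sketch below shows that arithmetic. The helper order_for_pages is a hypothetical name for this illustration (the kernel has its own helpers that work on byte sizes), and MAX_ORDER is fixed at 11 as stated in the text.

```c
#define MAX_ORDER 11  /* as in the text: valid orders are 0..10 */

/*
 * Return the smallest order whose block (1 << order pages) can hold
 * nr_pages, or -1 if the request is zero or larger than the biggest block.
 */
int order_for_pages(unsigned long nr_pages)
{
    if (nr_pages == 0)
        return -1;
    for (int order = 0; order < MAX_ORDER; order++) {
        if ((1UL << order) >= nr_pages)
            return order;
    }
    return -1;
}
```

A request for 3 pages maps to order 2 (a block of 4 pages), and 1024 pages maps to order 10, the largest block this model can hand out.
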

enum migratetype {
    MIGRATE_UNMOVABLE,
    MIGRATE_MOVABLE,
    MIGRATE_RECLAIMABLE,
#ifdef CONFIG_CMA
    MIGRATE_CMA,
#endif
    MIGRATE_PCPTYPES, /* the number of types on the pcp lists */
    MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,
#ifdef CONFIG_MEMORY_ISOLATION
    MIGRATE_ISOLATE,    /* can't allocate from here */
#endif
    MIGRATE_TYPES
};
  • MIGRATE_UNMOVABLE: pages that cannot be moved
  • MIGRATE_MOVABLE: pages that can be moved. When memory becomes fragmented, these pages can be migrated to create larger contiguous free ranges
  • MIGRATE_RECLAIMABLE: pages that can be reclaimed
  • MIGRATE_CMA: pages reserved for CMA (Contiguous Memory Allocator) users
  • MIGRATE_PCPTYPES: not a migrate type itself, but the number of migrate types kept on the per-CPU page lists
  • MIGRATE_HIGHATOMIC: pages reserved for high-order atomic allocations
  • MIGRATE_ISOLATE: isolated pages; nothing can be allocated from here
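
The role of MIGRATE_PCPTYPES as a count rather than a type is easier to see with the enum values written out. The sketch below copies the enum from the listing above, assuming both CONFIG_CMA and CONFIG_MEMORY_ISOLATION are enabled. The helpers on_pcp_lists and migratetype_name are hypothetical names for this illustration; the display strings are the labels that /proc/pagetypeinfo prints.

```c
#include <stdbool.h>

/* Copy of the enum above with both config options assumed enabled. */
enum migratetype {
    MIGRATE_UNMOVABLE,                      /* 0 */
    MIGRATE_MOVABLE,                        /* 1 */
    MIGRATE_RECLAIMABLE,                    /* 2 */
    MIGRATE_CMA,                            /* 3 */
    MIGRATE_PCPTYPES,                       /* 4: count of the types above */
    MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,  /* 4: first type past the count */
    MIGRATE_ISOLATE,                        /* 5 */
    MIGRATE_TYPES                           /* 6: size of free_list[] */
};

/* Labels in enum order, as they appear in /proc/pagetypeinfo. */
static const char *const names[MIGRATE_TYPES] = {
    "Unmovable", "Movable", "Reclaimable", "CMA", "HighAtomic", "Isolate"
};

/* Only the types numbered below MIGRATE_PCPTYPES live on per-CPU lists. */
bool on_pcp_lists(enum migratetype mt)
{
    return (unsigned int)mt < MIGRATE_PCPTYPES;
}

/* Map a migrate type to its label, or "?" if it is out of range. */
const char *migratetype_name(enum migratetype mt)
{
    return (unsigned int)mt < MIGRATE_TYPES ? names[mt] : "?";
}
```

Because MIGRATE_HIGHATOMIC reuses the value of MIGRATE_PCPTYPES, it is the first type that is not on the per-CPU lists, and free_list ends up with six entries in this configuration.
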

The relationship between these structures can be pictured as follows: a zone holds one free_area per order, and each free_area holds one free list per migrate type.

You can view this information on a running system with cat /proc/pagetypeinfo. On my current device:

root:/ # cat /proc/pagetypeinfo
Page block order: 10
Pages per block:  1024
 
Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    0, zone   Normal, type    Unmovable    801    290     24      5      4      7      2      0      1      1      0
Node    0, zone   Normal, type      Movable    288    296     69     18     87     34     14      5      1      1     55
Node    0, zone   Normal, type  Reclaimable      0      3      1      1      1      1      0      1      1      0      0
Node    0, zone   Normal, type          CMA     12      6      5      3      3      2      1      3      2      2     57
Node    0, zone   Normal, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone  Movable, type    Unmovable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone  Movable, type      Movable   1980   1089    913    347     35      8      4      2      3      0    653
Node    0, zone  Movable, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone  Movable, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone  Movable, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone  Movable, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
 
Number of blocks type     Unmovable      Movable  Reclaimable          CMA   HighAtomic      Isolate
Node 0, zone   Normal          535          498           49           72            0            0
Node 0, zone  Movable            0          768            0            0            0            0
 
Number of mixed blocks    Unmovable      Movable  Reclaimable          CMA   HighAtomic      Isolate
Node 0, zone   Normal            0           15            1            0            0            0
Node 0, zone  Movable            0            0            0            0            0            0
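
As a quick sanity check on the "Number of blocks" table, the block counts can be converted back into pages. The sketch below does that arithmetic, taking the page block order of 10 from the first line of the output. The function name pageblock_pages is a hypothetical helper for this illustration.

```c
#define PAGEBLOCK_ORDER 10  /* "Page block order: 10" in the output above */

/* Sum the per-migrate-type block counts and convert blocks to pages. */
unsigned long pageblock_pages(const unsigned long blocks[], int ntypes)
{
    unsigned long total = 0;

    for (int i = 0; i < ntypes; i++)
        total += blocks[i];
    return total << PAGEBLOCK_ORDER;  /* 1024 pages per block */
}
```

For the Movable zone, 768 blocks give 786432 pages, which equals the "present" count that /proc/zoneinfo reports for that zone. For the Normal zone, 535 + 498 + 49 + 72 = 1154 blocks give 1181696 pages, slightly more than its "present" count of 1180543.
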

The pagetypeinfo output shows how many free blocks remain for each zone, migrate type, and order. The per-order totals for each zone are also available from cat /proc/buddyinfo:

root:/ # cat /proc/buddyinfo
Node 0, zone   Normal    338    355    130     37     99     43     17      9      5      4    112
Node 0, zone  Movable   1604   1204    912    348     35      8      4      2      3      0    653
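
Each column of buddyinfo is a count of free blocks, not of pages, so the number of free pages is the sum of count times 2^order. A minimal sketch of that calculation, with buddy_free_pages as a hypothetical helper name:

```c
/*
 * nr_free[order] is the number of free blocks of that order, as printed
 * in one row of /proc/buddyinfo. Each block holds 1 << order pages.
 */
unsigned long buddy_free_pages(const unsigned long nr_free[], int n_orders)
{
    unsigned long pages = 0;

    for (int order = 0; order < n_orders; order++)
        pages += nr_free[order] << order;
    return pages;
}
```

Applied to the two rows above, this gives 125080 free pages for Normal and 681212 for Movable. These are close to, but not the same as, the nr_free_pages values in /proc/zoneinfo (126204 and 680267), presumably because the files were read at different moments.
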

Over time, the highest-order blocks are gradually split into lower-order blocks. When a large contiguous allocation is then requested, the kernel may have to defragment memory (memory compaction) to satisfy it.
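
The way large blocks break down can be sketched with counters alone. The toy model below keeps only nr_free per order and ignores migrate types, page addresses, and merging on free, so it is an illustration of the splitting idea rather than the kernel's implementation. The names toy_free_area and toy_alloc are hypothetical.

```c
#define MAX_ORDER 11

/* Toy version of struct free_area: only the block counter is kept. */
struct toy_free_area {
    unsigned long nr_free;
};

/*
 * Allocate one block of the given order. If no block of that order is
 * free, take the smallest larger block and split it in halves. Each
 * split leaves one free "buddy" behind at the next lower order.
 * Returns the order the block was taken from, or -1 on failure.
 */
int toy_alloc(struct toy_free_area area[MAX_ORDER], int order)
{
    if (order < 0 || order >= MAX_ORDER)
        return -1;

    for (int cur = order; cur < MAX_ORDER; cur++) {
        if (area[cur].nr_free == 0)
            continue;
        area[cur].nr_free--;                /* take one block of order cur */
        for (int split = cur; split > order; split--)
            area[split - 1].nr_free++;      /* leftover half stays free */
        return cur;
    }
    return -1;
}
```

Starting with a single free order-10 block, one order-0 allocation consumes that block and leaves one free block at each of the orders 9 down to 0. A later order-10 request then fails in this model even though 1023 pages are free, which is the situation that defragmentation is meant to repair.
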

Detailed information about each zone is available from cat /proc/zoneinfo:

root:/ # cat /proc/zoneinfo
Node 0, zone   Normal
  pages free     126204
        min      1251
        low      9254
        high     9566
        spanned  1308544
        present  1180543
        managed  1136476
        protection: (0, 24576)
      nr_free_pages 126204
      nr_zone_inactive_anon 984
      nr_zone_active_anon 61238
      nr_zone_inactive_file 423539
      nr_zone_active_file 122889
      nr_zone_unevictable 987
      nr_zone_write_pending 288
      nr_mlock     987
      nr_page_table_pages 13969
      nr_kernel_stack 36784
      nr_bounce    0
      nr_zspages   0
      nr_free_cma  60532
Node 0, zone  Movable
  pages free     680267
        min      866
        low      6404
        high     6620
        spanned  786432
        present  786432
        managed  786432
        protection: (0, 0)
      nr_free_pages 680267
      nr_zone_inactive_anon 0
      nr_zone_active_anon 104777
      nr_zone_inactive_file 0
      nr_zone_active_file 0
      nr_zone_unevictable 121
      nr_zone_write_pending 0
      nr_mlock     121
      nr_page_table_pages 0
      nr_kernel_stack 0
      nr_bounce    0
      nr_zspages   0
      nr_free_cma  0

This shows that the system currently has a single node with two zones, Normal and Movable, along with the watermark values of each zone.

Zones are in turn managed by struct pglist_data. On a NUMA machine each node has its own pglist_data structure; on a UMA machine a single pglist_data structure describes the whole of memory.
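
The "protection" line in this output corresponds to the zone's lowmem_reserve array. The sketch below shows, in simplified form, how such a reserve could enter the watermark check: the reserve that applies depends on the highest zone the request was allowed to use. The function toy_watermark_ok is a hypothetical, reduced model; the kernel's own check also accounts for other reserves and for whether a free block of the requested order actually exists.

```c
#include <stdbool.h>

/*
 * Simplified watermark test. "mark" is one of the zone's watermarks and
 * "lowmem_reserve" is the protection entry selected by the highest zone
 * the allocation could have used.
 */
bool toy_watermark_ok(unsigned long free_pages, unsigned int order,
                      unsigned long mark, unsigned long lowmem_reserve)
{
    /* Pages the request consumes beyond the first one. */
    unsigned long extra = (1UL << order) - 1;

    if (free_pages <= extra)
        return false;
    return free_pages - extra > mark + lowmem_reserve;
}
```

Taking the Normal zone's values (min 1251, protection 24576 for requests that could have used the Movable zone), a hypothetical free count of 20000 pages would pass the check for a request limited to Normal (reserve 0) but fail it for a request falling back from Movable, since 20000 is below 1251 + 24576.
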

/*
 * On NUMA machines, each NUMA node would have a pg_data_t to describe
 * it's memory layout. On UMA machines there is a single pglist_data which
 * describes the whole memory.
 *
 * Memory statistics and page replacement data structures are maintained on a
 * per-zone basis.
 */
typedef struct pglist_data {
    struct zone node_zones[MAX_NR_ZONES];
    struct zonelist node_zonelists[MAX_ZONELISTS];
    int nr_zones;
 
    /*
     * This is a per-node reserve of pages that are not available
     * to userspace allocations.
     */
    unsigned long       totalreserve_pages;
 
    /* Fields commonly accessed by the page reclaim scanner */
    struct lruvec       lruvec;
 
    unsigned long       flags;
 
} pg_data_t;
  • node_zones: the zones that belong to this node
  • node_zonelists: the fallback zone lists. When allocation from the preferred zone fails, the allocator walks the list to find another zone with available pages
  • totalreserve_pages: the total number of reserved pages
  • lruvec: the LRU lists. Allocated anonymous and file pages are added to an LRU list, which is used for page reclaim

 

node_zonelists holds two kinds of zonelist, ZONELIST_FALLBACK and ZONELIST_NOFALLBACK. A UMA system has only one, ZONELIST_FALLBACK.

The LRU is split into separate lists by type. The common ones are active anonymous pages, inactive anonymous pages, active file pages, and inactive file pages.
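
The fallback behaviour can be sketched as a walk over an ordered array of zones. This is a toy model: toy_zone, zone_has_room, and pick_zone are hypothetical names, and the only test applied is a simplified comparison against the low watermark.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for a zone: just what the walk below needs. */
struct toy_zone {
    const char *name;
    unsigned long free_pages;
    unsigned long wmark_low;
};

/* True if the zone stays above its low watermark after the allocation. */
static bool zone_has_room(const struct toy_zone *z, unsigned int order)
{
    unsigned long extra = (1UL << order) - 1;

    return z->free_pages > extra && z->free_pages - extra > z->wmark_low;
}

/*
 * Walk the zonelist in preference order and return the first zone that
 * can serve the request, or NULL if every zone is too short of memory.
 */
const struct toy_zone *pick_zone(const struct toy_zone *const *zonelist,
                                 size_t nr_zones, unsigned int order)
{
    for (size_t i = 0; i < nr_zones; i++) {
        if (zone_has_room(zonelist[i], order))
            return zonelist[i];
    }
    return NULL;
}
```

For a request that prefers the Movable zone, the list would be ordered Movable first and Normal second. If Movable has dropped to its low watermark, pick_zone returns the Normal zone instead.
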
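
The four common lists, plus a fifth for unevictable pages (the nr_zone_unevictable counter in the zoneinfo output), can be indexed from two properties of a page: whether it is file-backed and whether it is active. The sketch below is modeled on the kernel's enum lru_list; the helper lru_index is a hypothetical name for this illustration.

```c
#include <stdbool.h>

#define LRU_BASE   0
#define LRU_ACTIVE 1  /* added for active pages */
#define LRU_FILE   2  /* added for file-backed pages */

enum lru_list {
    LRU_INACTIVE_ANON = LRU_BASE,                        /* 0 */
    LRU_ACTIVE_ANON   = LRU_BASE + LRU_ACTIVE,           /* 1 */
    LRU_INACTIVE_FILE = LRU_BASE + LRU_FILE,             /* 2 */
    LRU_ACTIVE_FILE   = LRU_BASE + LRU_FILE + LRU_ACTIVE, /* 3 */
    LRU_UNEVICTABLE,                                     /* 4 */
    NR_LRU_LISTS                                         /* 5 */
};

/* Pick the list for a page from its file/active/unevictable properties. */
enum lru_list lru_index(bool is_file, bool is_active, bool unevictable)
{
    if (unevictable)
        return LRU_UNEVICTABLE;
    return (enum lru_list)(LRU_BASE + (is_file ? LRU_FILE : 0)
                                    + (is_active ? LRU_ACTIVE : 0));
}
```

An active file page lands on list 3 (LRU_ACTIVE_FILE) and an inactive anonymous page on list 0 (LRU_INACTIVE_ANON).
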

The layout of memory can be described completely by the pglist_data structure.

  • pglist_data tells us which zones exist. Each zone has a free_area array that records the free blocks of every order, and within each order the blocks are grouped by migrate type.
  • Page allocations are checked against the zone's watermarks. When memory runs low, the kernel thread kswapd is woken to reclaim memory.
  • Allocated pages are linked onto the LRU lists. When memory is short, the LRU algorithm identifies the pages that have been used least recently, and those pages are released.

 

Origin blog.csdn.net/longwang155069/article/details/105451267