(Five) Linux memory management: zone_sizes_init

Background

  • Read the fucking source code! --By Lu Xun
  • A picture is worth a thousand words. --By Gorky

Description:

  1. Kernel version: 4.14
  2. ARM64 processor, Cortex-A53, dual-core
  3. Tools used: Source Insight 3.5, Visio

1 Introduction

In (Four) Linux memory model: Sparse Memory Model, we analyzed the top half of the bootmem_init function. This time we come to its bottom half, which mainly revolves around the zone_sizes_init function it expands into.
A quick recap: the code of bootmem_init() is as follows:

void __init bootmem_init(void)
{
    unsigned long min, max;

    min = PFN_UP(memblock_start_of_DRAM());
    max = PFN_DOWN(memblock_end_of_DRAM());

    early_memtest(min << PAGE_SHIFT, max << PAGE_SHIFT);

    max_pfn = max_low_pfn = max;

    arm64_numa_init();
    /*
     * Sparsemem tries to allocate bootmem in memory_present(), so must be
     * done after the fixed reservations.
     */
    arm64_memory_present();

    sparse_init();
    zone_sizes_init(min, max);

    memblock_dump_all();
}

In Linux, physical memory is managed in address regions called zones. Without too much preamble, here is the call graph of the zone_sizes_init function:

Note that this analysis assumes ARM64 with UMA (only one Node); in addition, code paths behind macros that are not enabled are not analyzed in depth. Let's start exploring!

2. Data Structure

The key structures are shown in the figure below.
Under the NUMA framework, each Node corresponds to a struct pglist_data; a UMA architecture has only one unique struct pglist_data structure. For example, on ARM64 UMA the global variable struct pglist_data __refdata contig_page_data is used.

Key fields of struct pglist_data

struct zone node_zones[];            // the zones of this node, e.g. ZONE_DMA, ZONE_NORMAL
struct zonelist node_zonelists[];

unsigned long node_start_pfn;        // starting page frame number of the node
unsigned long node_present_pages;    // total number of usable pages
unsigned long node_spanned_pages;    // total number of pages, including holes

wait_queue_head_t kswapd_wait;       // wait queue used by the page reclaim thread
struct task_struct *kswapd;          // the page reclaim thread

Key fields of struct zone

unsigned long watermark[];           // watermarks WMARK_MIN/WMARK_LOW/WMARK_HIGH, used by the page allocator and kswapd reclaim
long lowmem_reserve[];               // memory reserved in the zone
struct pglist_data *zone_pgdat;      // points to the owning pglist_data
struct per_cpu_pageset *pageset;     // per-CPU page lists, reducing spinlock contention

unsigned long zone_start_pfn;        // starting page frame number of the zone
unsigned long managed_pages;         // number of pages managed by the Buddy System
unsigned long spanned_pages;         // total pages in the zone, including holes
unsigned long present_pages;         // number of pages actually present in the zone

struct free_area free_area[];        // lists for managing free pages

In broad strokes: struct pglist_data describes the memory of a single Node (all memory on a UMA architecture), which is further divided into different zone areas; each zone describes the pages in its region, including free pages, pages managed by the Buddy System, and so on.

3. zone

The code:

enum zone_type {
#ifdef CONFIG_ZONE_DMA
    /*
     * ZONE_DMA is used when there are devices that are not able
     * to do DMA to all of addressable memory (ZONE_NORMAL). Then we
     * carve out the portion of memory that is needed for these devices.
     * The range is arch specific.
     *
     * Some examples
     *
     * Architecture     Limit
     * ---------------------------
     * parisc, ia64, sparc  <4G
     * s390         <2G
     * arm          Various
     * alpha        Unlimited or 0-16MB.
     *
     * i386, x86_64 and multiple other arches
     *          <16M.
     */
    ZONE_DMA,
#endif
#ifdef CONFIG_ZONE_DMA32
    /*
     * x86_64 needs two ZONE_DMAs because it supports devices that are
     * only able to do DMA to the lower 16M but also 32 bit devices that
     * can only do DMA areas below 4G.
     */
    ZONE_DMA32,
#endif
    /*
     * Normal addressable memory is in ZONE_NORMAL. DMA operations can be
     * performed on pages in ZONE_NORMAL if the DMA devices support
     * transfers to all addressable memory.
     */
    ZONE_NORMAL,
#ifdef CONFIG_HIGHMEM
    /*
     * A memory area that is only addressable by the kernel through
     * mapping portions into its own address space. This is for example
     * used by i386 to allow the kernel to address the memory beyond
     * 900MB. The kernel will set up special mappings (page
     * table entries on i386) for each page that the kernel needs to
     * access.
     */
    ZONE_HIGHMEM,
#endif
    ZONE_MOVABLE,
#ifdef CONFIG_ZONE_DEVICE
    ZONE_DEVICE,
#endif
    __MAX_NR_ZONES

};

The generic memory management code has to handle a variety of architectures (x86, ARM, MIPS, ...); to reduce complexity, only the zones relevant to your architecture need to be enabled. On the platform I currently use, only ZONE_DMA and ZONE_NORMAL are configured. The boot log output is shown in the figure below:

Why is there nothing in the ZONE_NORMAL region? Tracing through the code shows that ZONE_DMA starts at the beginning of memory and its boundary cannot exceed 4G; since the memory I use is only 512M, all of it falls into that region.

As the enum above shows, ZONE_DMA is guarded by a config macro, while ZONE_NORMAL exists on every architecture. So why is a ZONE_DMA region needed at all? See the figure:

Therefore, if the addressable range of all devices covers the whole memory region, a single ZONE_NORMAL is sufficient.

4. calculate_node_totalpages

As the name suggests, this function counts the number of pages in the Node; one picture explains it all:

  • As analyzed in the earlier articles, physical memory is maintained by memblock; the overall memory range may contain holes, shown as the hole parts in the figure;
  • For each type of zone, the spanned page frames are counted separately (holes may exist within them), and the actually available pages are computed as present_pages;
  • The Node managing all the zones sums the spanned_pages and present_pages of each zone.

At the end of this calculation, the basic page-frame information is in place for management.

5. free_area_init_core

In short, the free_area_init_core function mainly initializes the fields of the struct pglist_data structure and each zone it manages. Let's look at the code:

/*
 * Set up the zone data structures:
 *   - mark all pages reserved
 *   - mark all memory queues empty
 *   - clear the memory bitmaps
 *
 * NOTE: pgdat should get zeroed by caller.
 */
static void __paginginit free_area_init_core(struct pglist_data *pgdat)
{
    enum zone_type j;
    int nid = pgdat->node_id;

    pgdat_resize_init(pgdat);
#ifdef CONFIG_NUMA_BALANCING
    spin_lock_init(&pgdat->numabalancing_migrate_lock);
    pgdat->numabalancing_migrate_nr_pages = 0;
    pgdat->numabalancing_migrate_next_window = jiffies;
#endif
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
    spin_lock_init(&pgdat->split_queue_lock);
    INIT_LIST_HEAD(&pgdat->split_queue);
    pgdat->split_queue_len = 0;
#endif
    init_waitqueue_head(&pgdat->kswapd_wait);
    init_waitqueue_head(&pgdat->pfmemalloc_wait);
#ifdef CONFIG_COMPACTION
    init_waitqueue_head(&pgdat->kcompactd_wait);
#endif
    pgdat_page_ext_init(pgdat);
    spin_lock_init(&pgdat->lru_lock);
    lruvec_init(node_lruvec(pgdat));

    pgdat->per_cpu_nodestats = &boot_nodestats;

    for (j = 0; j < MAX_NR_ZONES; j++) {
        struct zone *zone = pgdat->node_zones + j;
        unsigned long size, realsize, freesize, memmap_pages;
        unsigned long zone_start_pfn = zone->zone_start_pfn;

        size = zone->spanned_pages;
        realsize = freesize = zone->present_pages;

        /*
         * Adjust freesize so that it accounts for how much memory
         * is used by this zone for memmap. This affects the watermark
         * and per-cpu initialisations
         */
        memmap_pages = calc_memmap_size(size, realsize);
        if (!is_highmem_idx(j)) {
            if (freesize >= memmap_pages) {
                freesize -= memmap_pages;
                if (memmap_pages)
                    printk(KERN_DEBUG
                           "  %s zone: %lu pages used for memmap\n",
                           zone_names[j], memmap_pages);
            } else
                pr_warn("  %s zone: %lu pages exceeds freesize %lu\n",
                    zone_names[j], memmap_pages, freesize);
        }

        /* Account for reserved pages */
        if (j == 0 && freesize > dma_reserve) {
            freesize -= dma_reserve;
            printk(KERN_DEBUG "  %s zone: %lu pages reserved\n",
                    zone_names[0], dma_reserve);
        }

        if (!is_highmem_idx(j))
            nr_kernel_pages += freesize;
        /* Charge for highmem memmap if there are enough kernel pages */
        else if (nr_kernel_pages > memmap_pages * 2)
            nr_kernel_pages -= memmap_pages;
        nr_all_pages += freesize;

        /*
         * Set an approximate value for lowmem here, it will be adjusted
         * when the bootmem allocator frees pages into the buddy system.
         * And all highmem pages will be managed by the buddy system.
         */
        zone->managed_pages = is_highmem_idx(j) ? realsize : freesize;
#ifdef CONFIG_NUMA
        zone->node = nid;
#endif
        zone->name = zone_names[j];
        zone->zone_pgdat = pgdat;
        spin_lock_init(&zone->lock);
        zone_seqlock_init(zone);
        zone_pcp_init(zone);

        if (!size)
            continue;

        set_pageblock_order();
        setup_usemap(pgdat, zone, zone_start_pfn, size);
        init_currently_empty_zone(zone, zone_start_pfn, size);
        memmap_init(size, nid, j, zone_start_pfn);
    }
}
  • Initialize the locks, wait queues and other internally used fields of struct pglist_data;

Then each zone is traversed and the following initialization is performed:

  • Based on the zone's spanned_pages and present_pages, calc_memmap_size is called to compute memmap_pages, the number of pages occupied by the struct page structures needed to manage this zone;

  • The zone's freesize represents the usable area, from which memmap_pages and the DMA_RESERVE region are subtracted; as the boot log of my development board shows, memmap uses 2048 pages and the DMA reserve is 0;

  • nr_kernel_pages and nr_all_pages are computed; to illustrate the relationship between these two counters and the pages, here is a figure (since my platform has only a ZONE_DMA area and ARM64 has no ZONE_HIGHMEM region, it is not a typical case, so ARM32 is used as the example):

  • The various locks used by the zone are initialized;

  • The usemap is allocated and initialized, along with the free_area[] used by the Buddy System, the lruvec, the pcp lists, and so on;

  • memmap_init()->memmap_init_zone(): based on the PFN, this function finds the corresponding struct page via pfn_to_page, initializes that structure, and sets the MIGRATE_MOVABLE flag to indicate the page is movable;

Finally, looking back at the bootmem_init function, we can see that it essentially completes the initialization of the Linux physical memory framework, including Node, Zone and Page Frame, along with the corresponding data structures.

For best results, read this together with the article (Four) Linux memory model: Sparse Memory Model!

To be continued...

Origin www.cnblogs.com/LoyenWang/p/11568481.html