Linux 3.14.12 memory management notes: building the memory management framework (1)

In a conventional computer architecture, the entire physical memory hangs off a single bus, and the CPU takes the same amount of time to access any part of the memory space. This kind of memory organization is known as UMA (Uniform Memory Architecture). But as computers evolved, new server architectures appeared, especially multi-CPU ones, in which it is hard to guarantee that every part of physical memory takes the same time to access. In a multi-CPU system, several CPUs share one bus; each CPU has its own local physical memory, yet it can also reach the physical memory of the other CPUs over the bus, and there may additionally be shared physical memory that multiple CPUs can access. This creates a new situation: different pieces of physical memory sit at different positions, so their access times differ and uniformity can no longer be guaranteed. A memory configuration like this is called NUMA (Non-Uniform Memory Architecture). Strictly speaking, no system is completely UMA: even on an ordinary single-CPU machine the access times of RAM, ROM and other physical storage differ; but looked at purely in terms of RAM, it is UMA. There is also a structure called MPP (Massive Parallel Processing), in which multiple SMP server nodes are connected by some interconnect and cooperate on the same task; seen from the outside by the user, it appears to be a single server system.

Back to the topic: let's focus on NUMA. Once the NUMA memory architecture was introduced, a corresponding management mechanism was needed to support it; Linux started supporting NUMA as early as version 2.4. With the new management mechanism came the concept of the Node (storage node): memory with the same access time is grouped into one storage node. So in version 3.14.12, the version analyzed here, Linux divides physical memory management into three levels: Node (storage node), Zone (management zone) and Page (page).

[Figure: the Node / Zone / Page three-level hierarchy]

A node is described by the data structure pg_data_t; each NUMA node has one pg_data_t recording the memory layout information of that node. Its member node_zones is an array of struct zone holding the management information of the zones; each pg_data_t has several zones, usually three: ZONE_DMA, ZONE_NORMAL and ZONE_HIGHMEM.
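
For reference, here is an abridged sketch of pg_data_t (struct pglist_data, declared in include/linux/mmzone.h); only the members relevant to this discussion are shown, and the real structure carries many more fields:

【file:/include/linux/mmzone.h】(abridged)
typedef struct pglist_data {
    struct zone node_zones[MAX_NR_ZONES];          /* the zones of this node */
    struct zonelist node_zonelists[MAX_ZONELISTS]; /* fallback order for allocations */
    int nr_zones;                                  /* number of populated zones */
    unsigned long node_start_pfn;                  /* first page frame number of the node */
    unsigned long node_present_pages;              /* total number of physical pages */
    unsigned long node_spanned_pages;              /* page range size, including holes */
    int node_id;                                   /* this node's ID */
} pg_data_t;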

ZONE_DMA exists because some devices cannot directly address all of memory, so this zone is carved out specifically for those devices to use; in an x86 environment it usually covers memory below 16MB.

ZONE_NORMAL sits right behind ZONE_DMA and is the region directly mapped into the kernel's linear address space; in an x86 environment it typically spans 16MB-896MB.

ZONE_HIGHMEM is whatever physical memory remains after ZONE_DMA and ZONE_NORMAL. This zone cannot be directly mapped by the kernel; in an x86 environment it is usually the memory above 896MB.
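
Putting the three together, the zone indices on 32-bit x86 look roughly like this (an abridged sketch of enum zone_type from include/linux/mmzone.h; the real enum is assembled with #ifdefs and also defines ZONE_DMA32 and ZONE_MOVABLE):

【file:/include/linux/mmzone.h】(abridged)
enum zone_type {
    ZONE_DMA,     /* below 16MB: for devices with limited DMA reach */
    ZONE_NORMAL,  /* 16MB-896MB: directly mapped by the kernel */
    ZONE_HIGHMEM, /* above 896MB: mapped on demand */
    __MAX_NR_ZONES
};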

Why does high memory exist at all? We know the kernel space is 1GB in size (the linear range 3G-4G). How many page global directory (PGD) entries does it take to map that 1GB? Easy to work out: 256, since on 32-bit x86 (non-PAE) each PGD entry maps 4MB. With so many kernel threads around, is 1GB of physical memory always enough? Obviously not; so what if we want to use more than 1GB? Any memory we use must be mapped first, so how many PGD entries should be set aside for such on-demand mappings? The Linux design reserves 32 PGD entries, one eighth of the 256. How much address space do those 32 entries correspond to? A quick count gives 128MB, which means the directly mapped memory space is 896MB. Memory beyond 896MB is treated as high memory: every time it is used, a mapping must be set up first, which costs some resources. That is why high memory should not be used casually, and that is where it comes from.
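
The arithmetic above can be double-checked with a few lines of user-space C (the values assume the classic 3G/1G split on 32-bit x86 without PAE):

#include <stdio.h>

int main(void)
{
    unsigned long kernel_space = 1UL << 30; /* linear range 3G-4G: 1GB */
    unsigned long pgd_entry = 4UL << 20;    /* one PGD entry maps 4MB */

    unsigned long entries = kernel_space / pgd_entry; /* 256 entries */
    unsigned long reserved = 32 * pgd_entry;          /* 32 entries -> 128MB */
    unsigned long lowmem = kernel_space - reserved;   /* 896MB direct mapping */

    printf("PGD entries covering kernel space: %lu\n", entries);
    printf("reserved for high memory mappings: %luMB\n", reserved >> 20);
    printf("directly mapped low memory: %luMB\n", lowmem >> 20);
    return 0;
}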

With that background, let's look at how the memory management framework initialization is implemented, starting from initmem_init():

【file:/arch/x86/mm/init_32.c】
#ifndef CONFIG_NEED_MULTIPLE_NODES
void __init initmem_init(void)
{
#ifdef CONFIG_HIGHMEM
    highstart_pfn = highend_pfn = max_pfn;
    if (max_pfn > max_low_pfn)
        highstart_pfn = max_low_pfn;
    printk(KERN_NOTICE "%ldMB HIGHMEM available.\n",
        pages_to_mb(highend_pfn - highstart_pfn));
    high_memory = (void *) __va(highstart_pfn * PAGE_SIZE - 1) + 1;
#else
    high_memory = (void *) __va(max_low_pfn * PAGE_SIZE - 1) + 1;
#endif
 
    memblock_set_node(0, (phys_addr_t)ULLONG_MAX, &memblock.memory, 0);
    sparse_memory_present_with_active_regions(0);
 
#ifdef CONFIG_FLATMEM
    max_mapnr = IS_ENABLED(CONFIG_HIGHMEM) ? highend_pfn : max_low_pfn;
#endif
    __vmalloc_start_set = true;
 
    printk(KERN_NOTICE "%ldMB LOWMEM available.\n",
            pages_to_mb(max_low_pfn));
 
    setup_bootmem_allocator();
}
#endif /* !CONFIG_NEED_MULTIPLE_NODES */

Here high_memory is initialized to the virtual address just past the top low-memory page frame (max_low_pfn, or highstart_pfn when CONFIG_HIGHMEM is enabled). Then memblock_set_node() is called; as its name suggests, it sets node information on the memblock regions that the memblock algorithm established earlier.
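
For reference, __va() on x86 simply adds the direct-mapping offset PAGE_OFFSET (0xC0000000 under the default 3G/1G split) to a physical address:

【file:/arch/x86/include/asm/page.h】
#define __va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET))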

The implementation of memblock_set_node():

【file:/mm/memblock.c】
/**
 * memblock_set_node - set node ID on memblock regions
 * @base: base of area to set node ID for
 * @size: size of area to set node ID for
 * @type: memblock type to set node ID for
 * @nid: node ID to set
 *
 * Set the nid of memblock @type regions in [@base,@base+@size) to @nid.
 * Regions which cross the area boundaries are split as necessary.
 *
 * RETURNS:
 * 0 on success, -errno on failure.
 */
int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
                      struct memblock_type *type, int nid)
{
    int start_rgn, end_rgn;
    int i, ret;
 
    ret = memblock_isolate_range(type, base, size, &start_rgn, &end_rgn);
    if (ret)
        return ret;
 
    for (i = start_rgn; i < end_rgn; i++)
        memblock_set_region_node(&type->regions[i], nid);
 
    memblock_merge_regions(type);
    return 0;
}

memblock_set_node() mainly makes three related calls: memblock_isolate_range(), memblock_set_region_node() and memblock_merge_regions().

First, memblock_isolate_range():

【file:/mm/memblock.c】
/**
 * memblock_isolate_range - isolate given range into disjoint memblocks
 * @type: memblock type to isolate range for
 * @base: base of range to isolate
 * @size: size of range to isolate
 * @start_rgn: out parameter for the start of isolated region
 * @end_rgn: out parameter for the end of isolated region
 *
 * Walk @type and ensure that regions don't cross the boundaries defined by
 * [@base,@base+@size). Crossing regions are split at the boundaries,
 * which may create at most two more regions. The index of the first
 * region inside the range is returned in *@start_rgn and end in *@end_rgn.
 *
 * RETURNS:
 * 0 on success, -errno on failure.
 */
static int __init_memblock memblock_isolate_range(struct memblock_type *type,
                    phys_addr_t base, phys_addr_t size,
                    int *start_rgn, int *end_rgn)
{
    phys_addr_t end = base + memblock_cap_size(base, &size);
    int i;
 
    *start_rgn = *end_rgn = 0;
 
    if (!size)
        return 0;
 
    /* we'll create at most two more regions */
    while (type->cnt + 2 > type->max)
        if (memblock_double_array(type, base, size) < 0)
            return -ENOMEM;
 
    for (i = 0; i < type->cnt; i++) {
        struct memblock_region *rgn = &type->regions[i];
        phys_addr_t rbase = rgn->base;
        phys_addr_t rend = rbase + rgn->size;
 
        if (rbase >= end)
            break;
        if (rend <= base)
            continue;
 
        if (rbase < base) {
            /*
             * @rgn intersects from below. Split and continue
             * to process the next region - the new top half.
             */
            rgn->base = base;
            rgn->size -= base - rbase;
            type->total_size -= base - rbase;
            memblock_insert_region(type, i, rbase, base - rbase,
                           memblock_get_region_node(rgn),
                           rgn->flags);
        } else if (rend > end) {
            /*
             * @rgn intersects from above. Split and redo the
             * current region - the new bottom half.
             */
            rgn->base = end;
            rgn->size -= end - rbase;
            type->total_size -= end - rbase;
            memblock_insert_region(type, i--, rbase, end - rbase,
                           memblock_get_region_node(rgn),
                           rgn->flags);
        } else {
            /* @rgn is fully contained, record it */
            if (!*end_rgn)
                *start_rgn = i;
            *end_rgn = i + 1;
        }
    }
 
    return 0;
}

The main job of this function is splitting. While the memblock algorithm was being built up, contiguous memory regions were merged (provided their flags matched), so the existing regions do not necessarily line up with the node's memory range, which is delimited by the parameters base and size. If a region in memblock falls entirely within the node's memory range, its index is recorded into *start_rgn (for the first such region) and its index plus one into *end_rgn, to be returned to the caller. If a region straddles a boundary of the node's memory range, the current region is trimmed so that it ends at the boundary, and the other part is inserted into memblock's region management via memblock_insert_region(), completing the split.
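
To make the splitting concrete, here is a toy walk-through with made-up values (plain user-space C, not kernel code): a single region covering [0MB, 100MB) is isolated against the node range [16MB, 64MB), which leaves three regions, with *start_rgn/*end_rgn bracketing the middle one.

#include <stdio.h>

struct toy_region { unsigned long base, size; }; /* mirrors memblock_region */

int main(void)
{
    const unsigned long MB = 1UL << 20;
    struct toy_region rgn = { 0, 100 * MB };     /* existing region [0, 100MB) */
    unsigned long base = 16 * MB, end = 64 * MB; /* node range to isolate */

    /* intersects from below: trim at @base, the lower piece is re-inserted */
    struct toy_region low = { rgn.base, base - rgn.base };       /* [0, 16MB) */
    /* intersects from above: trim at @end, the upper piece stays behind */
    struct toy_region high = { end, rgn.base + rgn.size - end }; /* [64, 100MB) */
    /* the remainder is fully contained: start_rgn/end_rgn point at it */
    struct toy_region mid = { base, end - base };                /* [16, 64MB) */

    printf("low  [%3luMB, %3luMB) keeps its old node ID\n",
           low.base / MB, (low.base + low.size) / MB);
    printf("mid  [%3luMB, %3luMB) gets the new node ID\n",
           mid.base / MB, (mid.base + mid.size) / MB);
    printf("high [%3luMB, %3luMB) keeps its old node ID\n",
           high.base / MB, (high.base + high.size) / MB);
    return 0;
}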

Next, look at the memblock_insert_region() function:

【file:/mm/memblock.c】
/**
 * memblock_insert_region - insert new memblock region
 * @type: memblock type to insert into
 * @idx: index for the insertion point
 * @base: base address of the new region
 * @size: size of the new region
 * @nid: node id of the new region
 * @flags: flags of the new region
 *
 * Insert new memblock region [@base,@base+@size) into @type at @idx.
 * @type must already have extra room to accomodate the new region.
 */
static void __init_memblock memblock_insert_region(struct memblock_type *type,
                           int idx, phys_addr_t base,
                           phys_addr_t size,
                           int nid, unsigned long flags)
{
    struct memblock_region *rgn = &type->regions[idx];
 
    BUG_ON(type->cnt >= type->max);
    memmove(rgn + 1, rgn, (type->cnt - idx) * sizeof(*rgn));
    rgn->base = base;
    rgn->size = size;
    rgn->flags = flags;
    memblock_set_region_node(rgn, nid);
    type->cnt++;
    type->total_size += size;
}

Here memmove() shifts back the region information sitting at and behind the insertion point, then the freed slot is filled in and memblock_set_region_node() stamps the given node ID onto the newly inserted region; in the split case above, that keeps the original region's node number on the piece that was split out.

memblock_set_region_node() itself is nothing more than an assignment:

【file:/mm/memblock.h】
static inline void memblock_set_region_node(struct memblock_region *r, int nid)
{
    r->nid = nid;
}

So, returning to memblock_set_node(): after memblock_isolate_range() has delimited the range, memblock_set_region_node() is called on each region within it to set the node ID, and memblock_merge_regions(), which merges adjacent regions back together, was already analyzed earlier.

And finally back to initmem_init(): after memblock_set_node() returns, the next call made by the function is sparse_memory_present_with_active_regions().

The "sparse memory" here touches on the concept of Linux memory models. The Linux kernel has three memory models: flat memory, discontiguous memory and sparse memory. Respectively, they mean:

  • Flat memory: as the name suggests, physical memory is flat and contiguous, and the whole system has only one node.
  • Discontiguous memory: physical memory is not contiguous and holes may exist in it, so the system divides physical memory into multiple nodes, although the memory inside each node is flat and contiguous. Notably, this model is not only for NUMA environments; a UMA environment may likewise have multiple nodes.
  • Sparse memory: physical memory is not contiguous, and even the memory inside a node may be discontiguous, so the system may have one or more nodes. In addition, this model is the basis for memory hot-plug (how a model is selected is sketched below).
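
Which model is in effect is chosen at build time through Kconfig. On a simple 32-bit x86 build, the .config would typically contain something like the lines below (illustrative only; the exact symbols present depend on the configuration):

CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y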

Now look at the implementation of sparse_memory_present_with_active_regions():

【file:/mm/page_alloc.c】
/**
 * sparse_memory_present_with_active_regions - Call memory_present for each active range
 * @nid: The node to call memory_present for. If MAX_NUMNODES, all nodes will be used.
 *
 * If an architecture guarantees that all ranges registered with
 * add_active_ranges() contain no holes and may be freed, this
 * function may be used instead of calling memory_present() manually.
 */
void __init sparse_memory_present_with_active_regions(int nid)
{
    unsigned long start_pfn, end_pfn;
    int i, this_nid;
 
    for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, &this_nid)
        memory_present(this_nid, start_pfn, end_pfn);
}

Inside, for_each_mem_pfn_range() is a macro built for iteration, and since CONFIG_HAVE_MEMORY_PRESENT is not defined in this experimental environment, memory_present() is an empty function. We won't dig into it further for now.
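
For completeness: when CONFIG_HAVE_MEMORY_PRESENT is not defined, the kernel supplies an empty inline stub, roughly as follows (reproduced from memory, so treat the exact form as a sketch):

【file:/include/linux/mmzone.h】
#ifdef CONFIG_HAVE_MEMORY_PRESENT
void memory_present(int nid, unsigned long start, unsigned long end);
#else
static inline void memory_present(int nid,
                unsigned long start, unsigned long end) {}
#endif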

Finally, look at setup_bootmem_allocator(), which initmem_init() calls right before it returns:

【file:/arch/x86/mm/init_32.c】
void __init setup_bootmem_allocator(void)
{
    printk(KERN_INFO " mapped low ram: 0 - %08lx\n",
         max_pfn_mapped<<PAGE_SHIFT);
    printk(KERN_INFO " low ram: 0 - %08lx\n", max_low_pfn<<PAGE_SHIFT);
}

This function was originally used to initialize the bootmem management algorithm, but since the x86 environment has switched to the memblock management algorithm, only the printing of information is kept.
