2. Linux-3.14.12 memory management notes: the memblock algorithm [boot stage (2)]

The previous note covered the preparatory work done before memblock initialization; with that out of the way, we now return to the memblock algorithm itself. memblock is a very simple algorithm.

The memblock implementation stores all of its state in one global variable, and initialization, allocation, and free all operate by changing the state of the memory blocks recorded there. So let us start from the data structures.

The global variable is named memblock (tagged with the __initdata_memblock section attribute) and is a struct memblock, defined as:

【file:/include/linux/memblock.h】
struct memblock {
    bool bottom_up; /* is bottom up direction? */
    phys_addr_t current_limit;
    struct memblock_type memory;
    struct memblock_type reserved;
};

The meaning of its members:

  • bottom_up: whether the allocator works from low addresses (just past the tail of the kernel image, same below) toward high ones, or from high addresses toward low ones;
  • current_limit: the upper address limit applied to allocations made through memblock_alloc() and memblock_alloc_base(..., MEMBLOCK_ALLOC_ACCESSIBLE);
  • memory: the available, allocatable memory;
  • reserved: memory that has already been allocated;

memory and reserved are the two critical data structures here: initialization, allocation, and free in the memblock algorithm all revolve around them.

Next, look at the definition of struct memblock_type, the type of both memory and reserved:

【file:/include/linux/memblock.h】
struct memblock_type {
    unsigned long cnt; /* number of regions */
    unsigned long max; /* size of the allocated array */
    phys_addr_t total_size; /* size of all regions */
    struct memblock_region *regions;
};

cnt and max record, respectively, the current number of regions of this type (memory/reserved) and the maximum number the array can hold; total_size is the combined size of all regions of this type; and regions points to the array storing the per-region information (base address, size, flags, and so on):

【file:/include/linux/memblock.h】
struct memblock_region {
    phys_addr_t base;
    phys_addr_t size;
    unsigned long flags;
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
    int nid;
#endif
};

Those are all the main structures of the memblock algorithm; their overall relationship is shown below:

[figure: the global struct memblock contains two struct memblock_type members (memory and reserved), each of which points to an array of struct memblock_region]

Now go back and look at the definition of the global memblock variable:

【file:/mm/memblock.c】
static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock;
static struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock;
struct memblock memblock __initdata_memblock = {
    .memory.regions = memblock_memory_init_regions,
    .memory.cnt = 1, /* empty dummy entry */
    .memory.max = INIT_MEMBLOCK_REGIONS,
    
    .reserved.regions = memblock_reserved_init_regions,
    .reserved.cnt = 1, /* empty dummy entry */
    .reserved.max = INIT_MEMBLOCK_REGIONS,
 
    .bottom_up = false,
    .current_limit = MEMBLOCK_ALLOC_ANYWHERE,
};

The initializer sets bottom_up to false, meaning allocations proceed from high addresses toward low ones, and sets current_limit to MEMBLOCK_ALLOC_ANYWHERE, i.e. ~0 (0xFFFFFFFF on 32-bit x86). It also points memory.regions and reserved.regions at statically defined arrays, giving the memblock algorithm working space before any allocator exists.
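For reference, the two allocation-limit macros used above live in the same header; this is quoted from memory of the 3.14 source, so verify against your tree:

【file:/include/linux/memblock.h】
#define MEMBLOCK_ALLOC_ANYWHERE (~(phys_addr_t)0)
#define MEMBLOCK_ALLOC_ACCESSIBLE 0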

Next, analyze memblock initialization. On x86 the initialization function is memblock_x86_fill(), reached via the following call path:

start_kernel()                          #/init/main.c
 └-> setup_arch()                       #/arch/x86/kernel/setup.c
      └-> memblock_x86_fill()           #/arch/x86/kernel/e820.c

Its implementation:

【file:/arch/x86/kernel/e820.c】
void __init memblock_x86_fill(void)
{
    int i;
    u64 end;
 
    /*
     * EFI may have more than 128 entries
     * We are safe to enable resizing, beause memblock_x86_fill()
     * is rather later for x86
     */
    memblock_allow_resize();
 
    for (i = 0; i < e820.nr_map; i++) {
        struct e820entry *ei = &e820.map[i];
 
        end = ei->addr + ei->size;
        if (end != (resource_size_t)end)
            continue;
 
        if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN)
            continue;
 
        memblock_add(ei->addr, ei->size);
    }
 
    /* throw away partial pages */
    memblock_trim_memory(PAGE_SIZE);
 
    memblock_dump_all();
}

Looking at the implementation: the call to memblock_allow_resize() does nothing more than set the flag memblock_can_resize; the loop in the middle walks the e820 memory-layout information and calls memblock_add() for each usable entry; finally, after the loop, memblock_trim_memory() and memblock_dump_all() are called for post-processing. First look at the implementation of memblock_add():

【file:/mm/memblock.c】
int __init_memblock memblock_add(phys_addr_t base, phys_addr_t size)
{
    return memblock_add_region(&memblock.memory, base, size,
                   MAX_NUMNODES, 0);
}

memblock_add() is mainly a wrapper around memblock_add_region(); note in particular that the target of the operation is memblock.memory (available, allocatable memory), which is exactly where the e820 memory information is meant to end up. Now look down into the implementation of memblock_add_region():

【file:/mm/memblock.c】
/**
 * memblock_add_region - add new memblock region
 * @type: memblock type to add new region into
 * @base: base address of the new region
 * @size: size of the new region
 * @nid: nid of the new region
 * @flags: flags of the new region
 *
 * Add new memblock region [@base,@base+@size) into @type. The new region
 * is allowed to overlap with existing ones - overlaps don't affect already
 * existing regions. @type is guaranteed to be minimal (all neighbouring
 * compatible regions are merged) after the addition.
 *
 * RETURNS:
 * 0 on success, -errno on failure.
 */
static int __init_memblock memblock_add_region(struct memblock_type *type,
                phys_addr_t base, phys_addr_t size,
                int nid, unsigned long flags)
{
    bool insert = false;
    phys_addr_t obase = base;
    phys_addr_t end = base + memblock_cap_size(base, &size);
    int i, nr_new;
 
    if (!size)
        return 0;
 
    /* special case for empty array */
    if (type->regions[0].size == 0) {
        WARN_ON(type->cnt != 1 || type->total_size);
        type->regions[0].base = base;
        type->regions[0].size = size;
        type->regions[0].flags = flags;
        memblock_set_region_node(&type->regions[0], nid);
        type->total_size = size;
        return 0;
    }
repeat:
    /*
     * The following is executed twice. Once with %false @insert and
     * then with %true. The first counts the number of regions needed
     * to accomodate the new area. The second actually inserts them.
     */
    base = obase;
    nr_new = 0;
 
    for (i = 0; i < type->cnt; i++) {
        struct memblock_region *rgn = &type->regions[i];
        phys_addr_t rbase = rgn->base;
        phys_addr_t rend = rbase + rgn->size;
 
        if (rbase >= end)
            break;
        if (rend <= base)
            continue;
        /*
         * @rgn overlaps. If it separates the lower part of new
         * area, insert that portion.
         */
        if (rbase > base) {
            nr_new++;
            if (insert)
                memblock_insert_region(type, i++, base,
                               rbase - base, nid,
                               flags);
        }
        /* area below @rend is dealt with, forget about it */
        base = min(rend, end);
    }
 
    /* insert the remaining portion */
    if (base < end) {
        nr_new++;
        if (insert)
            memblock_insert_region(type, i, base, end - base,
                           nid, flags);
    }
 
    /*
     * If this was the first round, resize array and repeat for actual
     * insertions; otherwise, merge and return.
     */
    if (!insert) {
        while (type->cnt + nr_new > type->max)
            if (memblock_double_array(type, obase, size) < 0)
                return -ENOMEM;
        insert = true;
        goto repeat;
    } else {
        memblock_merge_regions(type);
        return 0;
    }
}

The behavior of memblock_add_region() can be summarized as follows (a worked illustration comes after the list):

  1. If the region array managed by memblock is still empty, record the new range in the first slot and return;
  2. Otherwise, check the new range for overlap with existing regions; overlapping portions are skipped and only the non-overlapping parts are inserted;
  3. If the regions[] array runs out of slots, memblock_double_array() is called to grow it;
  4. Finally, memblock_merge_regions() merges adjacent compatible regions.
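As a worked illustration with made-up addresses (not taken from any real e820 map), consider three successive calls against an initially empty memblock.memory:

memblock_add(0x0000, 0x2000);   /* regions: [0x0000-0x2000) */
memblock_add(0x3000, 0x1000);   /* regions: [0x0000-0x2000) [0x3000-0x4000) */
memblock_add(0x1000, 0x3000);   /* overlaps both existing regions: only the
                                   gap [0x2000-0x3000) is newly inserted,
                                   then memblock_merge_regions() collapses
                                   all three entries into [0x0000-0x4000) */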

It is now clear what memblock_x86_fill() is doing: it converts the e820 memory-layout map into memblock.memory regions under the management of the memblock algorithm, marking that memory as available.

Back in memblock_x86_fill(), two post-processing functions run after the loop exits: memblock_trim_memory() and memblock_dump_all(). The implementation of memblock_trim_memory():

【file:/mm/memblock.c】
void __init_memblock memblock_trim_memory(phys_addr_t align)
{
    int i;
    phys_addr_t start, end, orig_start, orig_end;
    struct memblock_type *mem = &memblock.memory;
 
    for (i = 0; i < mem->cnt; i++) {
        orig_start = mem->regions[i].base;
        orig_end = mem->regions[i].base + mem->regions[i].size;
        start = round_up(orig_start, align);
        end = round_down(orig_end, align);
 
        if (start == orig_start && end == orig_end)
            continue;
 
        if (start < end) {
            mem->regions[i].base = start;
            mem->regions[i].size = end - start;
        } else {
            memblock_remove_region(mem, i);
            i--;
        }
    }
}

This function trims memblock.memory against the given alignment: each region is shrunk to aligned boundaries, and a region left with no aligned space is removed entirely, so no partial pages remain. The collated information is finally dumped through memblock_dump_all() as debug output, which will not be analyzed here.
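A small worked example with made-up values, taking align = PAGE_SIZE = 0x1000:

orig_start = 0x1234, orig_end = 0x5678:
    start = round_up(0x1234, 0x1000)   = 0x2000
    end   = round_down(0x5678, 0x1000) = 0x5000
    => the region is shrunk to [0x2000-0x5000)

orig_start = 0x1234, orig_end = 0x1789:
    start = 0x2000, end = 0x1000, so start >= end
    => the region cannot hold even one aligned page and is removed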

At this point the initialization of memblock memory management is finished. Next, look at the allocation and free algorithms. The memblock allocation and free interfaces are:

memblock_alloc() and memblock_free().
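Before diving into the implementations, a typical early-boot usage looks roughly like this; a hypothetical sketch, not a quote from the kernel, and __va() is only valid once the direct mapping covers the returned physical address:

phys_addr_t pa;
void *p;

pa = memblock_alloc(PAGE_SIZE, PAGE_SIZE); /* one page, page-aligned;
                                              panics internally on failure */
p = __va(pa);                              /* direct-mapped virtual address */
memset(p, 0, PAGE_SIZE);
/* ... use the page; if it later turns out to be unneeded: */
memblock_free(pa, PAGE_SIZE);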

The implementation of memblock_alloc() (size is the requested size, align the byte alignment):

【file:/mm/memblock.c】
phys_addr_t __init memblock_alloc(phys_addr_t size, phys_addr_t align)
{
    return memblock_alloc_base(size, align, MEMBLOCK_ALLOC_ACCESSIBLE);
}

The flag MEMBLOCK_ALLOC_ACCESSIBLE means the caller must be able to access the memory, i.e. the allocation is limited to current_limit; the function simply wraps memblock_alloc_base():

【file:/mm/memblock.c】
phys_addr_t __init memblock_alloc_base(phys_addr_t size, phys_addr_t align, phys_addr_t max_addr)
{
    phys_addr_t alloc;
 
    alloc = __memblock_alloc_base(size, align, max_addr);
 
    if (alloc == 0)
        panic("ERROR: Failed to allocate 0x%llx bytes below 0x%llx.\n",
              (unsigned long long) size, (unsigned long long) max_addr);
 
    return alloc;
}

Continue into __memblock_alloc_base(), which wraps memblock_alloc_base_nid(), passing NUMA_NO_NODE as the node parameter; after all, NUMA has not been initialized at this point:

【file:/mm/memblock.c】
phys_addr_t __init __memblock_alloc_base(phys_addr_t size, phys_addr_t align, phys_addr_t max_addr)
{
    return memblock_alloc_base_nid(size, align, max_addr, NUMA_NO_NODE);
}
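(Incidentally, NUMA_NO_NODE just means "no node preference"; from memory it is defined in include/linux/numa.h as below, but verify against the tree.)

#define NUMA_NO_NODE (-1)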

Then memblock_alloc_base_nid():

【file:/mm/memblock.c】
static phys_addr_t __init memblock_alloc_base_nid(phys_addr_t size,
                    phys_addr_t align, phys_addr_t max_addr,
                    int nid)
{
    phys_addr_t found;
 
    if (!align)
        align = SMP_CACHE_BYTES;
 
    found = memblock_find_in_range_node(size, align, 0, max_addr, nid);
    if (found && !memblock_reserve(found, size))
        return found;
 
    return 0;
}

The two key functions to pay attention to here are memblock_find_in_range_node() and memblock_reserve().

First, the implementation of memblock_find_in_range_node():

【file:/mm/memblock.c】
/**
 * memblock_find_in_range_node - find free area in given range and node
 * @size: size of free area to find
 * @align: alignment of free area to find
 * @start: start of candidate range
 * @end: end of candidate range, can be %MEMBLOCK_ALLOC_{ANYWHERE|ACCESSIBLE}
 * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
 *
 * Find @size free area aligned to @align in the specified range and node.
 *
 * When allocation direction is bottom-up, the @start should be greater
 * than the end of the kernel image. Otherwise, it will be trimmed. The
 * reason is that we want the bottom-up allocation just near the kernel
 * image so it is highly likely that the allocated memory and the kernel
 * will reside in the same node.
 *
 * If bottom-up allocation failed, will try to allocate memory top-down.
 *
 * RETURNS:
 * Found address on success, 0 on failure.
 */
phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
                    phys_addr_t align, phys_addr_t start,
                    phys_addr_t end, int nid)
{
    int ret;
    phys_addr_t kernel_end;
 
    /* pump up @end */
    if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
        end = memblock.current_limit;
 
    /* avoid allocating the first page */
    start = max_t(phys_addr_t, start, PAGE_SIZE);
    end = max(start, end);
    kernel_end = __pa_symbol(_end);
 
    /*
     * try bottom-up allocation only when bottom-up mode
     * is set and @end is above the kernel image.
     */
    if (memblock_bottom_up() && end > kernel_end) {
        phys_addr_t bottom_up_start;
 
        /* make sure we will allocate above the kernel */
        bottom_up_start = max(start, kernel_end);
 
        /* ok, try bottom-up allocation first */
        ret = __memblock_find_range_bottom_up(bottom_up_start, end,
                              size, align, nid);
        if (ret)
            return ret;
 
        /*
         * we always limit bottom-up allocation above the kernel,
         * but top-down allocation doesn't have the limit, so
         * retrying top-down allocation may succeed when bottom-up
         * allocation failed.
         *
         * bottom-up allocation is expected to be fail very rarely,
         * so we use WARN_ONCE() here to see the stack trace if
         * fail happens.
         */
        WARN_ONCE(1, "memblock: bottom-up allocation failed, "
                 "memory hotunplug may be affected\n");
    }
 
    return __memblock_find_range_top_down(start, end, size, align, nid);
}

Roughly: the function first pins down end. From the call chain above, end is in fact MEMBLOCK_ALLOC_ACCESSIBLE, so it is replaced by memblock.current_limit. start is then raised to at least PAGE_SIZE so that the first page is never handed out. memblock_bottom_up() returns memblock.bottom_up, which we saw initialized to false (though not permanently: it is set to true during NUMA initialization), so here the search falls through to __memblock_find_range_top_down() to find the memory. Its implementation:

【file:/mm/memblock.c】
/**
 * __memblock_find_range_top_down - find free area utility, in top-down
 * @start: start of candidate range
 * @end: end of candidate range, can be %MEMBLOCK_ALLOC_{ANYWHERE|ACCESSIBLE}
 * @size: size of free area to find
 * @align: alignment of free area to find
 * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
 *
 * Utility called from memblock_find_in_range_node(), find free area top-down.
 *
 * RETURNS:
 * Found address on success, 0 on failure.
 */
static phys_addr_t __init_memblock
__memblock_find_range_top_down(phys_addr_t start, phys_addr_t end,
                   phys_addr_t size, phys_addr_t align, int nid)
{
    phys_addr_t this_start, this_end, cand;
    u64 i;
 
    for_each_free_mem_range_reverse(i, nid, &this_start, &this_end, NULL) {
        this_start = clamp(this_start, start, end);
        this_end = clamp(this_end, start, end);
 
        if (this_end < size)
            continue;
 
        cand = round_down(this_end - size, align);
        if (cand >= this_start)
            return cand;
    }
 
    return 0;
}

__memblock_find_range_top_down() iterates with the for_each_free_mem_range_reverse macro, which wraps __next_free_mem_range_rev(). That iterator extracts memory blocks from memblock.memory one at a time and checks each against memblock.reserved, so that the returned [this_start, this_end) range does not intersect any reserved memory. The range is then clamped to [start, end]; if it is still large enough, the candidate address is computed by stepping back size bytes from this_end (this being top-down allocation) and rounding down to align, and it is returned provided it does not fall below this_start. In this way a memory block satisfying the request is found.
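A quick worked example with made-up numbers: suppose a free range clamps to [0x00100000-0x00234000) and the request is size = 0x4000 with align = 0x1000. Then cand = round_down(0x00234000 - 0x4000, 0x1000) = 0x00230000, which is >= this_start, so 0x00230000 is returned: the highest aligned block of the requested size within that range.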

Incidentally, the implementations of __memblock_find_range_bottom_up() and __memblock_find_range_top_down() are almost identical; the only difference between them is the search direction.
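For comparison, here is the bottom-up counterpart, reproduced from memory of the 3.14 source (verify against /mm/memblock.c); it walks the free ranges forward and rounds the candidate up from the start of each range instead of down from the end:

【file:/mm/memblock.c】
static phys_addr_t __init_memblock
__memblock_find_range_bottom_up(phys_addr_t start, phys_addr_t end,
                   phys_addr_t size, phys_addr_t align, int nid)
{
    phys_addr_t this_start, this_end, cand;
    u64 i;
 
    for_each_free_mem_range(i, nid, &this_start, &this_end, NULL) {
        this_start = clamp(this_start, start, end);
        this_end = clamp(this_end, start, end);
 
        cand = round_up(this_start, align);
        if (cand < this_end && this_end - cand >= size)
            return cand;
    }
 
    return 0;
}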

Now that a suitable memory block can be found, return to memblock_alloc_base_nid() and its call to the other key function, memblock_reserve():

【file:/mm/memblock.c】
int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
{
    return memblock_reserve_region(base, size, MAX_NUMNODES, 0);
}

Then look at memblock_reserve_region():

【file:/mm/memblock.c】
static int __init_memblock memblock_reserve_region(phys_addr_t base,
                           phys_addr_t size,
                           int nid,
                           unsigned long flags)
{
    struct memblock_type *_rgn = &memblock.reserved;
 
    memblock_dbg("memblock_reserve: [%#016llx-%#016llx] flags %#02lx %pF\n",
             (unsigned long long)base,
             (unsigned long long)base + size - 1,
             flags, (void *)_RET_IP_);
 
    return memblock_add_region(_rgn, base, size, nid, flags);
}

As can be seen, memblock_reserve_region() uses memblock_add_region() to add the given block's information to memblock.reserved.
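memblock_reserve() is also called directly during boot to mark ranges as off-limits; for instance, x86's setup_arch() reserves the kernel image itself with a call along these lines (paraphrased from arch/x86/kernel/setup.c, so treat it as a sketch):

/* keep the kernel image (from _text through __bss_stop) out of the allocator */
memblock_reserve(__pa_symbol(_text),
         (unsigned long)__bss_stop - (unsigned long)_text);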

Finally, take a look at the implementation of memblock_free():

【file:/mm/memblock.c】
int __init_memblock memblock_free(phys_addr_t base, phys_addr_t size)
{
    memblock_dbg(" memblock_free: [%#016llx-%#016llx] %pF\n",
             (unsigned long long)base,
             (unsigned long long)base + size - 1,
             (void *)_RET_IP_);
 
    return __memblock_remove(&memblock.reserved, base, size);
}

This function mainly wraps __memblock_remove(), operating on memblock.reserved.

Then look at __memblock_remove():

【file:/mm/memblock.c】
static int __init_memblock __memblock_remove(struct memblock_type *type,
                         phys_addr_t base, phys_addr_t size)
{
    int start_rgn, end_rgn;
    int i, ret;
 
    ret = memblock_isolate_range(type, base, size, &start_rgn, &end_rgn);
    if (ret)
        return ret;
 
    for (i = end_rgn - 1; i >= start_rgn; i--)
        memblock_remove_region(type, i);
    return 0;
}

This function calls two key helpers, memblock_isolate_range() and memblock_remove_region(). First memblock_isolate_range():

【file:/mm/memblock.c】
/**
 * memblock_isolate_range - isolate given range into disjoint memblocks
 * @type: memblock type to isolate range for
 * @base: base of range to isolate
 * @size: size of range to isolate
 * @start_rgn: out parameter for the start of isolated region
 * @end_rgn: out parameter for the end of isolated region
 *
 * Walk @type and ensure that regions don't cross the boundaries defined by
 * [@base,@base+@size). Crossing regions are split at the boundaries,
 * which may create at most two more regions. The index of the first
 * region inside the range is returned in *@start_rgn and end in *@end_rgn.
 *
 * RETURNS:
 * 0 on success, -errno on failure.
 */
static int __init_memblock memblock_isolate_range(struct memblock_type *type,
                    phys_addr_t base, phys_addr_t size,
                    int *start_rgn, int *end_rgn)
{
    phys_addr_t end = base + memblock_cap_size(base, &size);
    int i;
 
    *start_rgn = *end_rgn = 0;
 
    if (!size)
        return 0;
 
    /* we'll create at most two more regions */
    while (type->cnt + 2 > type->max)
        if (memblock_double_array(type, base, size) < 0)
            return -ENOMEM;
 
    for (i = 0; i < type->cnt; i++) {
        struct memblock_region *rgn = &type->regions[i];
        phys_addr_t rbase = rgn->base;
        phys_addr_t rend = rbase + rgn->size;
 
        if (rbase >= end)
            break;
        if (rend <= base)
            continue;
 
        if (rbase < base) {
            /*
             * @rgn intersects from below. Split and continue
             * to process the next region - the new top half.
             */
            rgn->base = base;
            rgn->size -= base - rbase;
            type->total_size -= base - rbase;
            memblock_insert_region(type, i, rbase, base - rbase,
                           memblock_get_region_node(rgn),
                           rgn->flags);
        } else if (rend > end) {
            /*
             * @rgn intersects from above. Split and redo the
             * current region - the new bottom half.
             */
            rgn->base = end;
            rgn->size -= end - rbase;
            type->total_size -= end - rbase;
            memblock_insert_region(type, i--, rbase, end - rbase,
                           memblock_get_region_node(rgn),
                           rgn->flags);
        } else {
            /* @rgn is fully contained, record it */
            if (!*end_rgn)
                *start_rgn = i;
            *end_rgn = i + 1;
        }
    }
 
    return 0;
}

As can be seen, memblock_isolate_range() mainly locates the region entries covering the specified range, splitting any entry that straddles a boundary, and returns the covering index range through the out parameters. Then the implementation of memblock_remove_region():

【file:/mm/memblock.c】
static void __init_memblock memblock_remove_region(struct memblock_type *type, unsigned long r)
{
    type->total_size -= type->regions[r].size;
    memmove(&type->regions[r], &type->regions[r + 1],
        (type->cnt - (r + 1)) * sizeof(type->regions[r]));
    type->cnt--;
 
    /* Special case for empty arrays */
    if (type->cnt == 0) {
        WARN_ON(type->total_size != 0);
        type->cnt = 1;
        type->regions[0].base = 0;
        type->regions[0].size = 0;
        type->regions[0].flags = 0;
        memblock_set_region_node(&type->regions[0], MAX_NUMNODES);
    }
}

Its job is simply to remove the entry at the given index from the region array of the given type (here, memblock.reserved).

Putting the two together makes __memblock_remove() easy to understand: memblock_isolate_range() carves memblock.reserved along the boundaries of the range being freed, so that the range is covered exactly by whole entries, and returns the indexes of those entries through start_rgn and end_rgn; when it returns, memblock_remove_region() deletes those entries from memblock.reserved one by one. At that point the free is complete. A worked illustration follows.
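Illustration with made-up addresses: suppose memblock.reserved holds the single region [0x1000-0x9000) and memblock_free(0x3000, 0x2000) is called.

memblock_isolate_range() splits the region at both boundaries:
    [0x1000-0x3000) [0x3000-0x5000) [0x5000-0x9000)
and returns start_rgn = 1, end_rgn = 2 (the fully covered middle entry).
__memblock_remove() then removes entries end_rgn-1 down to start_rgn:
    [0x1000-0x3000) [0x5000-0x9000)
so [0x3000-0x5000) no longer counts as allocated.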

A brief summary: the memblock algorithm manages allocatable memory in memblock.memory and allocated memory in memblock.reserved; as soon as a block is added to memblock.reserved, it counts as allocated. One key point deserves attention: allocation only adds the requested block to memblock.reserved, and nothing is deleted from or modified in memblock.memory, which is why both allocation and free concentrate entirely on memblock.reserved. The algorithm is not very efficient, but it is reasonable: the initialization phase has few complicated memory-usage scenarios, and in many places memory is allocated once and used permanently.

Origin www.cnblogs.com/linhaostudy/p/11579103.html