1. Introduction
Starting with this article, we move on to the memory part of this study. Following on from the earlier articles about task_struct, I will first explain the structure that manages a task's address space, mm_struct, and briefly review physical memory and virtual memory. For the detailed background and concepts, please refer to the CSAPP book. I will not go into much detail here and will assume that you already understand how virtual memory is mapped to physical memory. In the following articles, we will continue with the management of physical memory and the memory mapping of user mode and kernel mode.
2. Review of basic concepts
- The hierarchy of CPU, cache, main memory, and secondary storage exists because the faster a device is, the more expensive it is. For the sake of economy (that is, because we are poor), a multi-level architecture was designed, and the CPU contains an MMU
- Physical memory is limited, and letting multiple processes share physical memory directly is a security problem, so virtual memory was introduced
- The layout of virtual memory follows the structure of the ELF file and contains parts such as the heap, the memory-mapping area, the stack, and the data segment
- Given this layout, memory requested on the heap is dynamic memory
- Virtual memory gives each process its own address space, which is mapped onto physical memory for execution, so a unit of mapping between physical and virtual memory is needed: the page
- To manage virtual memory, page tables and multi-level page tables were introduced
- To speed up the mapping, the TLB in the CPU was introduced
- To meet the need for sharing, shared memory in the memory-mapping area was introduced
- Because memory becomes fragmented, fragmentation management and garbage collectors were introduced
3. Process memory management
For a process, we need to consider what has to be stored in memory in both user mode and kernel mode.
User mode includes
- Code segment
- Global variables
- String constants
- Function stack, including function calls, local variables, function parameters, etc.
- Heap: memory allocated by malloc, etc.
- Memory mappings: for example, when a program calls into glibc, the glibc code, which exists as an .so file, also needs to be placed in memory.
Kernel mode includes
- Kernel code
- Global variables in the kernel
- task_struct
- Kernel stack
- Memory dynamically allocated inside the kernel
- The table that maps virtual addresses to physical addresses
In kernel mode, a process is managed through task_struct, and task_struct has the following memory-related member variables
struct mm_struct *mm;
struct mm_struct *active_mm;
/* Per-thread vma caching: */
struct vmacache vmacache;
The mm_struct structure is fairly complicated, so we will introduce it step by step. First, let's look at how the address space is divided between kernel mode and user mode. highest_vm_end stores the highest address currently used in the virtual address space, while task_size is the size of the user-mode address space.
struct mm_struct {
......
unsigned long task_size; /* size of task vm space */
unsigned long highest_vm_end; /* highest vma end address */
......
}
task_size is defined as follows. As the comments show, on a 32-bit system user mode gets 3 GB of the 4 GB virtual address space. On a 64-bit system the address space is huge, so an unused region is left between kernel mode and user mode for isolation. User mode uses only 47 bits, which is 128 TB (minus one guard page). Kernel mode also gets 128 TB, located at the highest addresses.
#ifdef CONFIG_X86_32
/*
* User space process size: 3GB (default).
*/
#define TASK_SIZE PAGE_OFFSET
#define TASK_SIZE_MAX TASK_SIZE
/*
config PAGE_OFFSET
hex
default 0xC0000000
depends on X86_32
*/
#else
/*
* User space process size. 47bits minus one guard page.
*/
#define TASK_SIZE_MAX ((1UL << 47) - PAGE_SIZE)
#define TASK_SIZE (test_thread_flag(TIF_ADDR32) ? \
IA32_PAGE_OFFSET : TASK_SIZE_MAX)
......
3.1 User mode memory structure
For user mode, mm_struct has the following member variables
- mmap_base: the starting address of the memory-mapping area
- mmap_legacy_base: the base address of the mapping area for bottom-up allocation. On 32-bit it is fixed at TASK_UNMAPPED_BASE; on 64-bit, virtual addresses are randomized, so it is TASK_UNMAPPED_BASE + mmap_rnd()
- hiwater_rss: high-water mark of RSS usage
- hiwater_vm: high-water mark of virtual memory usage
- total_vm: total number of pages mapped
- locked_vm: the number of pages that are locked and cannot be swapped out
- pinned_vm: the number of pages that can be neither swapped out nor moved
- data_vm: the number of pages holding data
- exec_vm: the number of pages holding executable code
- stack_vm: the number of pages holding the stack
- arg_lock: a spinlock introduced to protect concurrent access to the fields listed below
- start_code and end_code: start and end position of the executable code
- start_data and end_data: start and end position of the initialized data
- start_brk: the starting position of the heap
- brk: the current end position of the heap
- start_stack: the start position of the stack; the current end of the stack is held in the stack pointer register
- arg_start and arg_end: the position of the argument list, located at the highest addresses of the stack
- env_start and env_end: the position of the environment variables, located at the highest addresses of the stack
struct mm_struct {
......
unsigned long mmap_base; /* base of mmap area */
unsigned long mmap_legacy_base; /* base of mmap area in bottom-up allocations */
......
unsigned long hiwater_rss; /* High-watermark of RSS usage */
unsigned long hiwater_vm; /* High-water virtual memory usage */
unsigned long total_vm; /* Total pages mapped */
unsigned long locked_vm; /* Pages that have PG_mlocked set */
atomic64_t pinned_vm; /* Refcount permanently increased */
unsigned long data_vm; /* VM_WRITE & ~VM_SHARED & ~VM_STACK */
unsigned long exec_vm; /* VM_EXEC & ~VM_WRITE & ~VM_STACK */
unsigned long stack_vm; /* VM_STACK */
spinlock_t arg_lock; /* protect the below fields */
unsigned long start_code, end_code, start_data, end_data;
unsigned long start_brk, brk, start_stack;
unsigned long arg_start, arg_end, env_start, env_end;
unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */
......
}
With these member variables we can lay out the position of each part in user mode, but we also need a structure that describes the attributes of these areas, namely vm_area_struct
struct mm_struct {
......
struct vm_area_struct *mmap; /* list of VMAs */
struct rb_root mm_rb;
......
}
The definition of vm_area_struct is as follows. The structures form a doubly linked list through vm_next and vm_prev; that is, a series of vm_area_struct instances describes each area that a process has allocated in user mode.
- vm_start and vm_end: the start and end of the area
- vm_rb: the node of a red-black tree that links all vm_area_struct instances together for easy insertion, deletion, and lookup
- rb_subtree_gap: stores the gap between the current area and the previous area, for use in later allocations
- vm_mm: points to the mm_struct that this structure belongs to
- vm_page_prot: the access permissions of this area; vm_flags holds the flag bits
- rb and rb_subtree_last (inside shared): linkage into the interval tree of the backing address space
- anon_vma and anon_vma_chain: anonymous mapping. A virtual memory area can be mapped to physical memory or to a file. When it is mapped to physical memory, it is called an anonymous mapping. When it is mapped to a file, vm_file specifies the mapped file and vm_pgoff stores the offset
- vm_ops: a pointer to the function pointers used to operate on this structure
- vm_private_data: private data storage
/*
* This struct defines a memory VMM memory area. There is one of these
* per VM-area/task. A VM area is any part of the process virtual memory
* space that has a special rule for the page-fault handlers (ie a shared
* library, the executable area etc).
*/
struct vm_area_struct {
/* The first cache line has the info for VMA tree walking. */
unsigned long vm_start; /* Our start address within vm_mm. */
unsigned long vm_end; /* The first byte after our end address
within vm_mm. */
/* linked list of VM areas per task, sorted by address */
struct vm_area_struct *vm_next, *vm_prev;
struct rb_node vm_rb;
/*
* Largest free memory gap in bytes to the left of this VMA.
* Either between this VMA and vma->vm_prev, or between one of the
* VMAs below us in the VMA rbtree and its ->vm_prev. This helps
* get_unmapped_area find a free area of the right size.
*/
unsigned long rb_subtree_gap;
/* Second cache line starts here. */
struct mm_struct *vm_mm; /* The address space we belong to. */
pgprot_t vm_page_prot; /* Access permissions of this VMA. */
unsigned long vm_flags; /* Flags, see mm.h. */
/*
* For areas with an address space and backing store,
* linkage into the address_space->i_mmap interval tree.
*/
struct {
struct rb_node rb;
unsigned long rb_subtree_last;
} shared;
/*
* A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
* list, after a COW of one of the file pages. A MAP_SHARED vma
* can only be in the i_mmap tree. An anonymous MAP_PRIVATE, stack
* or brk vma (with NULL file) can only be in an anon_vma list.
*/
struct list_head anon_vma_chain; /* Serialized by mmap_sem & page_table_lock */
struct anon_vma *anon_vma; /* Serialized by page_table_lock */
/* Function pointers to deal with this struct. */
const struct vm_operations_struct *vm_ops;
/* Information about our backing store: */
unsigned long vm_pgoff; /* Offset (within vm_file) in PAGE_SIZE
units */
struct file * vm_file; /* File we map to (can be NULL). */
void * vm_private_data; /* was vm_pte (shared mem) */
atomic_long_t swap_readahead_info;
#ifndef CONFIG_MMU
struct vm_region *vm_region; /* NOMMU mapping region */
#endif
#ifdef CONFIG_NUMA
struct mempolicy *vm_policy; /* NUMA policy for the VMA */
#endif
struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
} __randomize_layout;
For an mm_struct, its many vm_area_struct instances are built when the ELF file is loaded, that is, in load_elf_binary(). After this function parses the ELF file format, it sets up the memory mappings, mainly as follows
- It calls setup_new_exec to set the base of the memory-mapping area, mmap_base
- It calls setup_arg_pages to set up the vm_area_struct of the stack. Inside, mm->arg_start is set to point to the bottom of the stack, and current->mm->start_stack is the bottom of the stack
- elf_map maps the code part of the ELF file into memory
- set_brk sets up the vm_area_struct of the heap and sets current->mm->start_brk = current->mm->brk inside, that is, the heap is still empty
- load_elf_interp maps the interpreter that the program depends on (the dynamic linker, itself an .so file) into the memory-mapping area
static int load_elf_binary(struct linux_binprm *bprm)
{
......
setup_new_exec(bprm);
......
/* Do this so that we can load the interpreter, if need be. We will
change some of these later */
retval = setup_arg_pages(bprm, randomize_stack_top(STACK_TOP),
executable_stack);
......
error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt,
elf_prot, elf_flags, total_size);
......
/* Calling set_brk effectively mmaps the pages that we need
* for the bss and break sections. We must do this before
* mapping in the interpreter, to make sure it doesn't wind
* up getting placed where the bss needs to go.
*/
retval = set_brk(elf_bss, elf_brk, bss_prot);
......
elf_entry = load_elf_interp(&loc->interp_elf_ex,
interpreter,
&interp_map_addr,
load_bias, interp_elf_phdata);
......
current->mm->end_code = end_code;
current->mm->start_code = start_code;
current->mm->start_data = start_data;
current->mm->end_data = end_data;
current->mm->start_stack = bprm->p;
......
}
3.2 Kernel mode memory structure
Because 32-bit and 64-bit systems differ so much in the size of the address space, their kernel-mode layouts differ as well. We discuss the two separately here.
3.2.1 32-bit kernel mode structure
The virtual address space in kernel mode is independent of any process: after entering the kernel through a system call, all processes see the same kernel virtual address space. The following figure shows the layout of the 32-bit kernel-mode virtual address space.
- Direct mapping area
The first 896 MB is the direct mapping area, which is mapped directly onto physical memory. Subtracting 3 GB from a virtual address in this area gives the location of the corresponding physical memory. The kernel provides two macros for this:
  - __pa(vaddr) returns the physical address associated with the virtual address vaddr;
  - __va(paddr) computes the virtual address corresponding to the physical address paddr.
Access to this part of the virtual address space still goes through paging, but the page table entries are simple: a one-to-one correspondence is sufficient.
When the system starts, the first 1 MB of physical memory is already occupied. The kernel code segment is loaded starting at 1 MB, followed by the kernel global variables, BSS, and so on, that is, everything an ELF image contains. In this way, the kernel code segment, global variables, and BSS are also mapped into the virtual address space above 3 GB. The actual physical memory layout can be viewed in /proc/iomem; the details differ with each system and configuration.
- high_memory
The name high memory comes from the division of the physical address space into three zones on the x86 architecture: ZONE_DMA, ZONE_NORMAL, and ZONE_HIGHMEM. ZONE_HIGHMEM is high memory.
High memory is a term from the point of view of the memory management module looking at physical memory, and refers to the area above the 896 MB direct mapping area. **Apart from the memory management module, the rest of the kernel operates on virtual addresses.** The memory management module works directly with physical addresses and performs the allocation and mapping of virtual addresses. The purpose of high memory is to let the limited kernel address space of a 32-bit system reach a much larger physical memory: borrow a piece of this logical address space, create a mapping to the physical memory you want to access (that is, fill in the kernel page table), use it temporarily, and return it after use.
- Kernel dynamic mapping space (noncontiguous memory allocation)
The region between VMALLOC_START and VMALLOC_END is called the kernel dynamic mapping space. Just as a user-mode process requests memory with malloc, kernel mode can request memory from this region with vmalloc. Kernel mode has its own page tables, managed separately from those of user mode.
- Permanent kernel mapping area (permanent kernel mapping)
The region between PKMAP_BASE and FIXADDR_START is called the permanent kernel mapping area; its address range is 4G-8M to 4G-4M. When alloc_pages() returns a struct page for physical memory in high memory, kmap can be called to map it into this region. Because the number of permanent mappings is limited, a mapping should be released with kunmap() once the high memory is no longer needed.
- Fixed mapping area
The region between FIXADDR_START and FIXADDR_TOP (0xFFFFF000) is called the fixed mapping area and is mainly used to meet special needs.
- Temporary kernel mapping
Temporary kernel mappings are implemented with kmap_atomic and kunmap_atomic. They are mainly used when the kernel needs to write to physical memory (main memory), for example when writing files.
3.2.2 64-bit kernel mode structure
Because the 64-bit kernel address space is huge, there is no need to plan it as carefully as in 32-bit mode, and large unused regions are simply set aside for protection. The structure is shown in the figure below.
- The kernel part starts at 0xffff800000000000, but it begins with an unused hole of 8 TB.
- The 64 TB of virtual address space starting at __PAGE_OFFSET_BASE (0xffff880000000000) is the direct mapping area: subtracting PAGE_OFFSET from a virtual address in this area gives the physical address. In most cases, the mapping between virtual and physical addresses is still established through page tables.
- The 32 TB from VMALLOC_START (0xffffc90000000000) to VMALLOC_END (0xffffe90000000000) is reserved for vmalloc.
- The 1 TB starting at VMEMMAP_START (0xffffea0000000000) stores the struct page descriptors of the physical pages.
- The 512 MB starting at __START_KERNEL_map (0xffffffff80000000) holds the kernel code segment, global variables, BSS, and so on. This region corresponds to the start of physical memory: subtracting __START_KERNEL_map gives the physical address. This looks similar to the direct mapping area, but the two do not conflict, because there is an 8 TB hole before the direct mapping area, which is far beyond the location where the kernel code is loaded in physical memory.
4. Summary
This article has looked at the memory layout of user mode and kernel mode in some detail. Building on this, we can go on to analyze memory management and memory mapping in later articles.
Code information
[1] linux/include/linux/mm_types.h