Linux operating system study notes (8): task address space management

1. Introduction

  Starting with this article, we move on to memory. Following on from the earlier articles on tasks, I start from task_struct and explain the address-space management structure mm_struct, with a brief review of physical memory and virtual memory. For the underlying fundamentals and concepts, see the CSAPP book; I will not go into much detail here and will assume the reader already understands how virtual addresses map to physical ones. Later articles continue with the management of physical memory and with memory mapping in user mode and kernel mode.

2. Review of basic concepts

  • The layered hierarchy of CPU, cache, and main memory exists because faster devices are more expensive, so for economy's sake a multi-level design is used; the CPU also contains an MMU
  • Physical memory is limited, and letting multiple processes share it directly raises security problems, which is why virtual memory was designed
  • A process's virtual memory is laid out following the structure of the ELF file, with regions such as the data segment, heap, memory-mapping area, and stack
  • Within that layout, memory requested from the heap is dynamic memory
  • Virtual memory gives each process its own address space and maps it onto physical memory for execution, so a mapping scheme between the two is needed: paging
  • To manage virtual memory, page tables and multi-level page tables appeared
  • To speed up address translation, the TLB in the CPU appeared
  • To meet the need for sharing, shared memory in the memory-mapping area appeared
  • Because memory becomes fragmented, fragmentation-management schemes and garbage collectors appeared

3. Process memory management

  For a process, we need to consider what has to be kept in memory in user mode and in kernel mode.

  User mode includes

  • Code segment
  • Global variables
  • Constant strings
  • Function stack, including function calls, local variables, function parameters, etc.
  • Heap: memory allocated by malloc, etc.
  • Memory mappings: for example, calls into glibc need the glibc code, which ships as a .so file, to be mapped into memory as well.

  Kernel mode includes

  • Kernel code
  • Global variables in the kernel
  • task_struct
  • Kernel stack
  • Memory dynamically allocated in the kernel
  • The virtual-to-physical address mapping tables

  In the kernel, a process is managed through task_struct, which has the following memory-related member variables:

	struct mm_struct		*mm;
	struct mm_struct		*active_mm;
	/* Per-thread vma caching: */
	struct vmacache			vmacache;

  The mm_struct structure is fairly complicated, so we introduce it step by step. First, let's look at how the address space is divided between kernel mode and user mode. highest_vm_end stores the highest address currently in use in the virtual address space, while task_size is the size of the user-mode address space.

struct mm_struct {
......
	unsigned long task_size;	/* size of task vm space */
	unsigned long highest_vm_end;	/* highest vma end address */
......
}

  task_size is defined as follows. As the comments show, on 32-bit x86 user mode gets 3 GB of the 4 GB virtual address space. On 64-bit the address space is so large that an unused region is left between user mode and kernel mode for isolation. User mode uses only 47 bits, that is, 128 TB (minus one guard page); kernel mode is likewise given 128 TB, located at the top of the address space.

#ifdef CONFIG_X86_32
/*
 * User space process size: 3GB (default).
 */
#define TASK_SIZE    PAGE_OFFSET
#define TASK_SIZE_MAX    TASK_SIZE
/*
config PAGE_OFFSET
        hex
        default 0xC0000000
        depends on X86_32
*/
#else
/*
 * User space process size. 47bits minus one guard page.
*/
#define TASK_SIZE_MAX  ((1UL << 47) - PAGE_SIZE)
#define TASK_SIZE    (test_thread_flag(TIF_ADDR32) ? \
          IA32_PAGE_OFFSET : TASK_SIZE_MAX)
......

3.1 User mode memory structure

  For user mode, mm_struct has the following member variables:

  • mmap_base: the starting address of the memory-mapping area
  • mmap_legacy_base: the base address used for bottom-up mmap allocations; on 32-bit it is the fixed TASK_UNMAPPED_BASE, and on 64-bit, where virtual address randomization applies, it is TASK_UNMAPPED_BASE + mmap_rnd()
  • hiwater_rss: high-water mark of RSS usage
  • hiwater_vm: high-water mark of virtual memory usage
  • total_vm: total number of pages mapped
  • locked_vm: number of pages that are locked and cannot be swapped out
  • pinned_vm: number of pages that can be neither swapped out nor moved
  • data_vm: number of pages holding data
  • exec_vm: number of pages holding executable code
  • stack_vm: number of pages holding the stack
  • arg_lock: a spinlock protecting concurrent access to the fields listed below it
  • start_code and end_code: start and end of the executable code
  • start_data and end_data: start and end of the initialized data
  • start_brk: the start of the heap
  • brk: the current end of the heap
  • start_stack: the start of the stack; the current end of the stack is held in the stack-pointer register
  • arg_start and arg_end: location of the argument list, at the highest addresses of the stack
  • env_start and env_end: location of the environment variables, at the highest addresses of the stack

struct mm_struct {
......
	unsigned long mmap_base;	/* base of mmap area */
	unsigned long mmap_legacy_base;	/* base of mmap area in bottom-up allocations */    
......
	unsigned long hiwater_rss; /* High-watermark of RSS usage */
	unsigned long hiwater_vm;  /* High-water virtual memory usage */
	unsigned long total_vm;	   /* Total pages mapped */
	unsigned long locked_vm;   /* Pages that have PG_mlocked set */
	atomic64_t    pinned_vm;   /* Refcount permanently increased */
	unsigned long data_vm;	   /* VM_WRITE & ~VM_SHARED & ~VM_STACK */
	unsigned long exec_vm;	   /* VM_EXEC & ~VM_WRITE & ~VM_STACK */
	unsigned long stack_vm;	   /* VM_STACK */    
	spinlock_t arg_lock; /* protect the below fields */
	unsigned long start_code, end_code, start_data, end_data;
	unsigned long start_brk, brk, start_stack;
	unsigned long arg_start, arg_end, env_start, env_end;
	unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */    
......
}

  These member variables fix the position of each region in user mode, but we also need a structure that describes the attributes of those regions, namely vm_area_struct.

struct mm_struct {
......
	struct vm_area_struct *mmap;		/* list of VMAs */
	struct rb_root mm_rb;  
......    
}

  The definition of vm_area_struct is as follows. The structures are chained into a doubly linked list through vm_next and vm_prev; that is, a series of vm_area_struct instances describes each region the process has allocated in user mode.

  • vm_start and vm_end: the start and end of the region
  • vm_rb: the node in a red-black tree that links all of the vm_area_struct instances together, for fast insertion, deletion, and lookup
  • rb_subtree_gap: the largest free gap to the left of this region within its subtree, used when searching for space for a later allocation
  • vm_mm: points to the mm_struct this region belongs to
  • vm_page_prot: the access permissions of the region's pages; vm_flags holds the flag bits
  • rb and rb_subtree_last: for file-backed regions, the linkage into the interval tree of the backing address_space (i_mmap)
  • anon_vma and anon_vma_chain: anonymous mapping. A virtual memory region can be mapped to physical memory or to a file. Mapping directly to physical memory is called anonymous mapping. When mapping to a file, vm_file specifies the mapped file and vm_pgoff stores the offset within it.
  • vm_ops: pointer to a table of function pointers that operate on this structure
  • vm_private_data: private data storage

/*
 * This struct defines a memory VMM memory area. There is one of these
 * per VM-area/task.  A VM area is any part of the process virtual memory
 * space that has a special rule for the page-fault handlers (ie a shared
 * library, the executable area etc).
 */
struct vm_area_struct {
	/* The first cache line has the info for VMA tree walking. */
	unsigned long vm_start;		/* Our start address within vm_mm. */
	unsigned long vm_end;		/* The first byte after our end address
					   within vm_mm. */
	/* linked list of VM areas per task, sorted by address */
	struct vm_area_struct *vm_next, *vm_prev;
	struct rb_node vm_rb;
	/*
	 * Largest free memory gap in bytes to the left of this VMA.
	 * Either between this VMA and vma->vm_prev, or between one of the
	 * VMAs below us in the VMA rbtree and its ->vm_prev. This helps
	 * get_unmapped_area find a free area of the right size.
	 */
	unsigned long rb_subtree_gap;
	/* Second cache line starts here. */
	struct mm_struct *vm_mm;	/* The address space we belong to. */
	pgprot_t vm_page_prot;		/* Access permissions of this VMA. */
	unsigned long vm_flags;		/* Flags, see mm.h. */
	/*
	 * For areas with an address space and backing store,
	 * linkage into the address_space->i_mmap interval tree.
	 */
	struct {
		struct rb_node rb;
		unsigned long rb_subtree_last;
	} shared;
	/*
	 * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
	 * list, after a COW of one of the file pages.	A MAP_SHARED vma
	 * can only be in the i_mmap tree.  An anonymous MAP_PRIVATE, stack
	 * or brk vma (with NULL file) can only be in an anon_vma list.
	 */
	struct list_head anon_vma_chain; /* Serialized by mmap_sem & page_table_lock */
	struct anon_vma *anon_vma;	/* Serialized by page_table_lock */
	/* Function pointers to deal with this struct. */
	const struct vm_operations_struct *vm_ops;
	/* Information about our backing store: */
	unsigned long vm_pgoff;		/* Offset (within vm_file) in PAGE_SIZE
					   units */
	struct file * vm_file;		/* File we map to (can be NULL). */
	void * vm_private_data;		/* was vm_pte (shared mem) */
	atomic_long_t swap_readahead_info;
#ifndef CONFIG_MMU
	struct vm_region *vm_region;	/* NOMMU mapping region */
#endif
#ifdef CONFIG_NUMA
	struct mempolicy *vm_policy;	/* NUMA policy for the VMA */
#endif
	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
} __randomize_layout;

  The many vm_area_struct instances of an mm_struct are built when the ELF file is loaded, that is, in load_elf_binary(). After parsing the ELF file format, this function sets up the memory mappings, mainly through the following steps:

  • Call setup_new_exec to set the base of the memory-mapping area, mmap_base
  • Call setup_arg_pages to set up the stack's vm_area_struct; it sets mm->arg_start to point to the bottom of the stack, and current->mm->start_stack is the bottom of the stack
  • elf_map maps the code portions of the ELF file into memory
  • set_brk sets up the heap's vm_area_struct; inside it sets current->mm->start_brk = current->mm->brk, that is, the heap is still empty
  • load_elf_interp maps the .so files the program depends on into the memory-mapping area

static int load_elf_binary(struct linux_binprm *bprm)
{
......
    setup_new_exec(bprm);
......
	/* Do this so that we can load the interpreter, if need be.  We will
	   change some of these later */    
    retval = setup_arg_pages(bprm, randomize_stack_top(STACK_TOP),
         executable_stack);
......
    error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt,
        elf_prot, elf_flags, total_size);
......
	/* Calling set_brk effectively mmaps the pages that we need
	 * for the bss and break sections.  We must do this before
	 * mapping in the interpreter, to make sure it doesn't wind
	 * up getting placed where the bss needs to go.
	 */    
    retval = set_brk(elf_bss, elf_brk, bss_prot);
......
    elf_entry = load_elf_interp(&loc->interp_elf_ex,
              interpreter,
              &interp_map_addr,
              load_bias, interp_elf_phdata);
......
    current->mm->end_code = end_code;
    current->mm->start_code = start_code;
    current->mm->start_data = start_data;
    current->mm->end_data = end_data;
    current->mm->start_stack = bprm->p;
......
}

3.2 Kernel mode memory structure

  Because 32-bit and 64-bit systems differ so much in address-space size, their layouts differ as well. We discuss the two separately.

3.2.1 32-bit kernel mode structure

  The kernel-mode virtual address space is independent of any particular process: once a process enters the kernel through a system call, it sees the same virtual address space as every other process. The following figure shows the layout of the 32-bit kernel-mode virtual address space.

[Figure: virtual address space layout of the 32-bit kernel]

  1. Direct mapping area

  The first 896M is the direct mapping area, which maps straight onto physical memory. Subtracting 3G from a virtual address in this area gives the corresponding physical address. The kernel provides two macros for this:

  • __pa(vaddr) returns the physical address corresponding to the virtual address vaddr;

  • __va(paddr) computes the virtual address corresponding to the physical address paddr.

  Accesses to these virtual addresses still go through paging, but the page-table mapping is simple: a fixed one-to-one correspondence.

  When the system starts, the first 1M of physical memory is already occupied. The kernel code segment is loaded starting at 1M, followed by the kernel's global variables, BSS, and so on, all of which are part of the kernel's ELF image. As a result, the kernel code segment, global variables, and BSS are also mapped into the virtual address space above 3G. The actual physical memory layout can be viewed in /proc/iomem; the details vary with each system and configuration.

  2. high_memory

  The term high memory comes from the x86 division of the physical address space into three zones: ZONE_DMA, ZONE_NORMAL, and ZONE_HIGHMEM. ZONE_HIGHMEM is high memory.

  High memory is a name used from the memory management module's view of physical memory; it refers to the physical memory above the 896M direct mapping area. Apart from the memory management module, the rest of the kernel operates on virtual addresses. The memory management module works directly with physical addresses and performs virtual address allocation and mapping. The purpose of high memory is to let the limited kernel address space of a 32-bit system reach physical memory that cannot be directly mapped: the kernel borrows a range of this address space, creates a mapping to the physical memory it wants to access (that is, fills in the kernel page table), uses it for a while, and releases it afterwards.

  3. Kernel dynamic mapping space (noncontiguous memory allocation)

  The region between VMALLOC_START and VMALLOC_END is called the kernel dynamic mapping space. Just as a user-mode process requests memory with malloc, kernel code can request memory here with vmalloc. Kernel mode has its own page tables, managed separately from those of user mode.

  4. Permanent kernel mapping area (permanent kernel mapping)

  The region from PKMAP_BASE to FIXADDR_START is called the permanent kernel mapping area; it spans the addresses from 4G-8M to 4G-4M. When alloc_pages() returns a struct page for physical memory in high memory, kmap() can be called to map it into this region. Because the number of permanent mappings is limited, a mapping should be released with kunmap() once the high memory is no longer needed.

  5. Fixed mapping area

  The region from FIXADDR_START to FIXADDR_TOP (0xFFFFF000) is called the fixed mapping area and is mainly used to meet special needs.

  6. Temporary kernel mapping

  Temporary kernel mappings are implemented with kmap_atomic and kunmap_atomic. They are mainly used when the kernel needs to write to a physical page for a short time, for example when writing a file.

3.2.2 64-bit kernel mode structure

  Because the 64-bit kernel address space is so large, it does not need to be planned as carefully as on 32-bit; large unused regions are simply left in place as protection. The layout is shown in the figure below.

[Figure: virtual address space layout of the 64-bit kernel]

  • The kernel part starts at 0xffff800000000000, beginning with an 8T unused gap.
  • The 64T of virtual address space starting at __PAGE_OFFSET_BASE (0xffff880000000000) is the direct mapping area; subtracting PAGE_OFFSET from a virtual address here gives the physical address. In most cases the translation between virtual and physical addresses is still carried out through page tables.
  • The 32T from VMALLOC_START (0xffffc90000000000) to VMALLOC_END (0xffffe90000000000) is reserved for vmalloc.
  • The 1T starting at VMEMMAP_START (0xffffea0000000000) stores the struct page structures that describe physical pages.
  • The 512M starting at __START_KERNEL_map (0xffffffff80000000) holds the kernel code segment, global variables, BSS, and so on. This region maps the start of physical memory, so subtracting __START_KERNEL_map from a virtual address gives the physical address. This resembles the direct mapping area, but the two do not conflict, because the direct mapping area is preceded by an 8T empty region, which is far beyond the location where the kernel code is loaded in physical memory.

Summary

  This article examined in some detail how memory is laid out in user mode and in kernel mode. With this foundation, later articles can analyze how memory is managed and mapped.

Code information

[1] linux/include/linux/mm_types.h

Origin blog.csdn.net/u013354486/article/details/106960441