Detailed explanation of Linux kernel space and high-end memory

Reprinted from: http://blog.csdn.net/tommy_wxie/article/details/17122923

Summary: The Linux operating system and its drivers run in kernel space, while applications run in user space. The two cannot exchange data simply by passing pointers: Linux uses virtual memory, so user-space data may be swapped out, and when kernel code dereferences a user-space pointer the corresponding data may not be in memory. User-space memory mapping uses a segment-page scheme, while kernel space has its own rules; this article discusses the address mapping of kernel space.

 
Linux kernel address space division


Usually the 32-bit Linux kernel divides the virtual address space into 0~3G for user space and 3~4G for kernel space (note that the kernel can use only 1G of linear addresses). This division applies to 32-bit kernels; 64-bit kernels divide the address space differently.


The origin of Linux kernel high-end memory


When kernel module code or a kernel thread accesses memory, the addresses in the code are logical addresses. To reach real physical memory they must be mapped one-to-one to physical addresses. For example, the physical address corresponding to the logical address 0xc0000003 is 0x3, the physical address corresponding to 0xc0000004 is 0x4, and so on. The relationship between the logical address and the physical address is

Physical address = logical address - 0xC0000000

This is the address translation relationship of the kernel address space. Note that the kernel's virtual addresses are at the "high end", but the physical memory they map to is at the low end.


Logical address    Physical memory address
0xc0000000         0x0
0xc0000001         0x1
0xc0000002         0x2
0xc0000003         0x3
...                ...
0xe0000000         0x20000000
...                ...
0xffffffff         0x3fffffff
Under this simple mapping, the kernel logical address range 0xc0000000 ~ 0xffffffff corresponds to the physical memory range 0x0 ~ 0x3fffffff, so only 1G of physical memory can be accessed. If 8G of physical memory is installed in the machine, the kernel can access only the first 1G; the remaining 7G is unreachable, because the whole kernel address space is already mapped to the physical range 0x0 ~ 0x3fffffff. How, then, should the kernel access the memory at physical address 0x40000000? Code needs a logical address to access memory, and the range 0xc0000000 ~ 0xffffffff is used up, so memory at physical address 0x40000000 and above cannot be accessed.


Obviously, the kernel address space 0xc0000000 ~ 0xffffffff cannot all be used for simple address mapping. Therefore, on the x86 architecture the kernel address space is divided into three parts: ZONE_DMA, ZONE_NORMAL and ZONE_HIGHMEM. ZONE_HIGHMEM is high-end memory, and this is the origin of the concept of high-end memory.

On x86, the three zones (offsets measured from 3G) are as follows:


ZONE_DMA       start of memory ~ 16MB
ZONE_NORMAL    16MB ~ 896MB
ZONE_HIGHMEM   896MB ~ end (1G)


Understanding Linux kernel high-end memory


We explained the origin of high-end memory earlier. Linux divides the kernel address space into three parts, ZONE_DMA, ZONE_NORMAL, and ZONE_HIGHMEM. The high-end memory (ZONE_HIGHMEM) address space ranges from 0xF8000000 to 0xFFFFFFFF (896MB to 1024MB). So how does the kernel use this 128MB of high-end memory address space to access all physical memory?


When the kernel wants to access memory whose physical address is above 896MB, it finds a free range of logical addresses of the required size within 0xF8000000 ~ 0xFFFFFFFF and borrows it for a while: it maps this logical address range to the physical memory it wants to access (that is, it fills in the kernel PTE page table entries), uses it temporarily, and returns it afterwards. Other code can then borrow the same address range to access other physical memory. In this way a limited address space is enough to access all physical memory. As shown below.



For example, suppose the kernel wants to access 1MB of physical memory starting at 2G, that is, the physical address range 0x80000000 ~ 0x800FFFFF. Before the access, it finds 1MB of free address space. Suppose the free range found is 0xF8700000 ~ 0xF87FFFFF; this 1MB of logical address space is then mapped to the physical range 0x80000000 ~ 0x800FFFFF. The mapping is as follows:


Logical address    Physical memory address
0xF8700000         0x80000000
0xF8700001         0x80000001
0xF8700002         0x80000002
...                ...
0xF87FFFFF         0x800FFFFF
When the kernel has finished accessing the physical memory 0x80000000 ~ 0x800FFFFF, it releases the linear address range 0xF8700000 ~ 0xF87FFFFF. Other processes or code can then use the addresses 0xF8700000 ~ 0xF87FFFFF to access other physical memory.


From the above description we can see the basic idea of high-end memory: borrow a range of address space, establish a temporary mapping, and release it after use, so that the same address space can be reused over and over to access all physical memory.


Seeing this, some people can't help but ask: what if a kernel thread or module keeps occupying a certain logical address range and never releases it? If this happens, the kernel's high-end memory address space becomes increasingly scarce. As long as a range is occupied and not released, no other physical memory can be mapped through it, and physical memory that has no mapping cannot be accessed.

Linux kernel high-end memory division


The kernel divides the high-end memory into three parts: VMALLOC_START~VMALLOC_END, PKMAP_BASE~FIXADDR_START and FIXADDR_START~4G.




For high-end memory, you can obtain the corresponding page through alloc_page() or similar functions, but to access the actual physical memory the page must be converted to a linear address (why? Think about how the MMU accesses physical memory). In other words, we need to find a linear address range for the page in high-end memory; this process is called high-end memory mapping.


Corresponding to the three parts of high-end memory, there are three ways of mapping high-end memory:


Mapping to the "kernel dynamic mapping space" (noncontiguous memory allocation)
This method is simple: when memory in the "kernel dynamic mapping space" is requested through vmalloc(), the pages may be taken from high-end memory (see the implementation of vmalloc), so high-end memory may end up mapped into the "kernel dynamic mapping space".


Permanent kernel mapping (persistent kernel mapping)
If the page in high-end memory was obtained through alloc_page(), how do we find a linear address for it?
The kernel reserves a linear address range for this purpose, from PKMAP_BASE to FIXADDR_START, for mapping high-end memory. On the 2.6 kernel this range lies between 4G-8M and 4G-4M. It is called the "kernel permanent mapping space" or "permanent kernel mapping space". This space uses the same page directory as the other spaces: for the kernel it is swapper_pg_dir, and for ordinary processes it is the one pointed to by the CR3 register. Normally this space is 4M in size, so a single page table is enough, and the kernel locates that page table through pkmap_page_table. kmap() maps a page into this space. Since the space is 4M, at most 1024 pages can be mapped at the same time, so pages that are no longer in use should be released from this space (that is, unmapped) in time. kunmap() releases the linear address that corresponds to a page from this space.


Temporary kernel mapping
The kernel reserves some linear space between FIXADDR_START and FIXADDR_TOP for special needs. This space is called "fixed mapping space". In this space, there is a part for temporary mapping of high-end memory.


This space has the following characteristics:
(1) Each CPU occupies its own part of the space.
(2) The part occupied by each CPU is divided into multiple small slots. Each slot is one page in size and is used for one purpose; the purposes are defined by km_type in kmap_types.h.


When a temporary mapping is to be made, the purpose of the mapping must be specified. The purpose selects the corresponding slot, and the address of that slot is used as the mapping address. This means that a new temporary mapping overwrites the previous mapping made for the same purpose. Temporary mapping is done with kmap_atomic().






Frequently Asked Questions:


1. Does user space (process) have the concept of high-end memory?


User processes have no concept of high memory. High memory exists only in kernel space. A user process can access at most 3G of physical memory, while kernel code can access all physical memory.


 


2. Is there high-end memory in the 64-bit kernel?


In practice there is no high-end memory in the 64-bit Linux kernel, because the 64-bit kernel can support more than 512GB of memory. High memory would exist only if the physical memory installed on the machine exceeded the range of the kernel address space.


 


3. How much physical memory can a user process access? How much physical memory can kernel code access?


A 32-bit system user process can access up to 3GB, and kernel code can access all physical memory.


A 64-bit system user process can access more than 512GB, and the kernel code can access all physical memory.


 


4. What is the relationship between high-end memory and physical addresses, logical addresses, and linear addresses?


High-end memory is directly related only to logical addresses; it is not directly related to linear addresses or physical addresses.


 


5. Why not allocate all the address space to the kernel?


If all the address space were given to the kernel, how would user processes use memory? And how could we ensure that the kernel's use of memory does not conflict with that of user processes?




(1) Let's ignore Linux's support for segmented memory mapping. In protected mode, whether the CPU runs in user mode or kernel mode, the addresses accessed by the executing program are virtual addresses. The MMU reads the value of the control register CR3 as the pointer to the current page directory, and then, following the paging mechanism (see the related documentation), converts the virtual address into a real physical address so that the CPU can actually access physical memory.


(2) On 32-bit Linux each process has a 4G address space. When a process accesses an address in its virtual address space, how is it kept from being confused with the virtual address spaces of other processes? Each process has its own page directory (PGD), and Linux stores the pointer to this directory in the process's memory structure, task_struct->mm->pgd (mm is a struct mm_struct). Whenever a process is scheduled (schedule()) and is about to run, the Linux kernel loads CR3 with the PGD pointer of that process (switch_mm()).


(3) When a new process is created, a new page directory PGD must be created for it, and the kernel page directory entries are copied from the kernel page directory swapper_pg_dir to the corresponding location in the new process's PGD. The specific process is as follows:
do_fork() --> copy_mm() --> mm_init() --> pgd_alloc() --> set_pgd_fast() --> get_pgd_slow() --> memcpy(&PGD + USER_PTRS_PER_PGD, swapper_pg_dir + USER_PTRS_PER_PGD, (PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t))
In this way the page directory of each process is divided into two parts. The first part is "user space", which maps the process's own address space (0x0000 0000-0xBFFF FFFF), that is, 3G bytes of virtual addresses; the second part is "system space", which maps 1G bytes of virtual addresses (0xC000 0000-0xFFFF FFFF). The second part of the page directory is the same for every process in a Linux system. So, from the process's point of view, each process has 4G bytes of virtual space: the lower 3G bytes are its own user space, and the highest 1G bytes are the system space shared with all other processes and the kernel.


(4) Now suppose we have the following scenario:
Process A sets the "hostname" of the computer on the network through the system call sethostname(const char *name, size_t len).
This scenario necessarily involves transferring data from user space to kernel space. name is an address in user space, and the system call must copy its contents to an address in the kernel. Let's look at some details of this process. A system call is made by storing its parameters in the registers ebx, ecx, edx, esi and edi (at most 5 parameters; this scenario has two, name and len), storing the system call number in the register eax, and then executing the interrupt instruction "int 0x80" so that process A enters system space. Since the privilege level at which the process runs is less than or equal to the level 3 set on the trap gate for system calls, the process can enter system space unimpeded and execute system_call(), the handler installed for int 0x80. Since system_call() belongs to kernel space and its privilege level (DPL) is 0, the CPU switches to the kernel stack, that is, the system-space stack of process A. When the kernel creates the task_struct structure for a new process, it allocates two consecutive pages, that is, 8K (for example, #define alloc_task_struct() ((struct task_struct *) __get_free_pages(GFP_KERNEL, 1))), and uses about 1K at the bottom for the task_struct, while the rest is used as the system-space stack. So when the process moves from user space to system space, the stack pointer esp becomes (alloc_task_struct()+8192). This is why system-space code can use the macro current (see its implementation) to obtain the task_struct address of the current process.
Every time the process enters system space from user space, the user stack SS, the user stack pointer ESP, EFLAGS, the user-space CS and EIP have been pushed onto the system stack in turn. system_call() then pushes eax, SAVE_ALL pushes ES, DS, EAX, EBP, EDI, ESI, EDX, ECX and EBX in sequence, and finally the handler at sys_call_table+4*%EAX is called, in this case sys_sethostname().


(5) In sys_sethostname(), after some protection checks, copy_from_user(to, from, n) is called, where to points to system_utsname.nodename in kernel space, for example 0xE625A000, and from points to user space, for example 0x8010FE00. Process A is now in the kernel, running in system space. The MMU translates virtual addresses to physical addresses according to the process's PGD, and the data is copied from user space to system space. Before copying, the kernel first checks the validity of the user-space address and length. It does not check whether the whole range of that length starting at the user-space address is actually mapped. If an address in the range is unmapped or lacks the required read/write permission, it is treated as a bad address and a page fault is raised, which is handled by the page fault handler. The call sequence is: copy_from_user()->generic_copy_from_user()->access_ok()+__copy_user_zeroing().


(6) Summary:
* The process address space is 0~4G.
* A process can access only 0~3G in user mode; it can access 3G~4G only after entering kernel mode.
* A process enters kernel mode through system calls.
* The 3G~4G part of the virtual space is the same for every process.
* Entering kernel mode from user mode does not change CR3, but it does change the stack.


Linux simplifies the segmentation mechanism so that the virtual address and the linear address are always identical. The virtual address space of Linux is therefore also 0 to 4G. The Linux kernel divides these 4G bytes into two parts. The highest 1G bytes (virtual addresses 0xC0000000 to 0xFFFFFFFF) are used by the kernel and are called "kernel space". The lower 3G bytes (virtual addresses 0x00000000 to 0xBFFFFFFF) are used by each process and are called "user space". Because every process can enter the kernel through system calls, the Linux kernel is shared by all processes in the system. So, from the point of view of a specific process, each process has 4G bytes of virtual space.
    Linux uses a two-level protection mechanism: level 0 is used by the kernel, and level 3 is used by user programs. As the figure shows (the picture cannot be reproduced here), each process has its own private user space (0~3G), which is invisible to other processes in the system. The highest 1G bytes, the virtual kernel space, are shared by all processes and the kernel.
1. Mapping from virtual kernel space to physical space
  Kernel code and data are stored in kernel space, while user program code and data are stored in the user space of the process. Both kernel space and user space lie in virtual space. The reader may ask: when the system starts, aren't the kernel's code and data loaded into physical memory? Why are they also in virtual memory? This is related to the compiler, and the discussion below will make this point clear.
Although kernel space occupies the highest 1G bytes of each virtual space, its mapping to physical memory always starts from the lowest address (0x00000000). For kernel space the address mapping is a very simple linear mapping: 0xC0000000 is the displacement between the physical address and the linear address, and it is called PAGE_OFFSET in the Linux code.




Let's take a look at the description and definition of the kernel-space address mapping in include/asm-i386/page.h:
/*
* This handles the memory map.. We could make this a config
* option, but too many people screw it up, and too few need
* it.
*
* A __PAGE_OFFSET of 0xC0000000 means that the kernel has
* a virtual address space of one gigabyte, which limits the
* amount of physical memory you can use to about 950MB. 
*
* If you want more physical memory than this then see the CONFIG_HIGHMEM4G
* and CONFIG_HIGHMEM64G options in the kernel configuration.
*/


#define __PAGE_OFFSET           (0xC0000000)
……
#define PAGE_OFFSET             ((unsigned long)__PAGE_OFFSET)
#define __pa(x)                 ((unsigned long)(x)-PAGE_OFFSET)
#define __va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET))
The source comment notes that if your physical memory is larger than about 950MB, you need to enable the CONFIG_HIGHMEM4G or CONFIG_HIGHMEM64G option when compiling the kernel; we do not consider this situation for now. If the physical memory is smaller than 950MB, then for kernel space, given a virtual address x, its physical address is "x - PAGE_OFFSET", and given a physical address x, its virtual address is "x + PAGE_OFFSET".
Note again that the macro __pa() only maps a kernel-space virtual address to a physical address. It never applies to user space, where address mapping is much more complicated.
2. Kernel image
  In the following description, we refer to the code and data of the kernel as the kernel image. When the system starts, the Linux kernel image is loaded at physical address 0x00100000, that is, starting at 1MB (the first 1MB is reserved for other uses). In normal operation, however, the entire kernel image should lie in the virtual kernel space. The linker therefore adds the offset PAGE_OFFSET to all symbol addresses when linking the kernel image, so that the kernel image starts at address 0xC0100000 in kernel space.
For example, a process's page directory PGD (a kernel data structure) is in kernel space. On a process switch, the register CR3 must be set to point to the page directory PGD of the new process. The starting address of the directory is a virtual address in kernel space, but CR3 requires a physical address, so __pa() is used for the address translation. There is such a line in mmu_context.h:
asm volatile("movl %0,%%cr3": :"r" (__pa(next->pgd)));
This is a line of inline assembly. It converts the page directory start address next->pgd of the next process into a physical address through __pa(), places it in a register, and then uses the mov instruction to write it into the CR3 register. After this statement, CR3 points to the page directory PGD of the new process next.
