(Compiled notes) User space, kernel space, and memory mapping

Kernel space and user space

  Modern operating systems use virtual memory. On a 32-bit operating system, the address space (virtual address space) is 4G (2 to the 32nd power bytes). The core of the operating system is the kernel, which is separate from ordinary applications and can access both the protected memory space and the underlying hardware devices. To keep user processes from operating on the kernel directly, and so keep the kernel secure, the operating system divides the virtual address space into two parts: kernel space and user space. On 32-bit Linux, the highest 1G bytes (virtual addresses 0xC0000000 to 0xFFFFFFFF) are used by the kernel and are called kernel space, while the lower 3G bytes (virtual addresses 0x00000000 to 0xBFFFFFFF) are used by each process and are called user space. Every process can enter the kernel through system calls. In Linux, each process has its own independent user space, while the kernel space is shared by all processes. When the kernel switches processes, the user space mapping is switched and the kernel space stays the same.
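
  To make the split concrete, here is a minimal sketch in C that classifies a 32-bit virtual address using the boundary given above. The macro and function names are made up for illustration, and the 3G/1G split is assumed to be the default one described in this section.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the default 3G/1G split on 32-bit Linux described above. */
#define KERNEL_SPACE_START 0xC0000000u

/* True if the 32-bit virtual address lies in kernel space. */
bool is_kernel_address(uint32_t vaddr)
{
    return vaddr >= KERNEL_SPACE_START;
}

/* True if the 32-bit virtual address lies in user space. */
bool is_user_address(uint32_t vaddr)
{
    return vaddr < KERNEL_SPACE_START;
}
```

  With these helpers, 0xBFFFFFFF is the last user-space address and 0xC0000000 is the first kernel-space address.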

  With user space and kernel space separated, the overall internal structure of Linux can be divided into three layers, from bottom to top: hardware -> kernel space -> user space, as shown in the following figure:

[Figure: the three layers of a Linux system: hardware, kernel space, user space]

User Mode and Kernel Mode

  When a process executes a system call and traps into kernel code, it is said to be running in kernel mode (kernel state). The processor is then executing kernel code at the highest privilege level (level 0). While a process is in kernel mode, the kernel code it executes uses that process's kernel stack. Each process has its own kernel stack.

  When a process is executing its own user code, it is said to be running in user mode (user state). The processor is then executing user code at the lowest privilege level (level 3). If a user program is interrupted while it is running, the process can also loosely be said to be in kernel mode for the duration of the interrupt, because the interrupt handler uses the kernel stack of the current process.

Note: The two sections above are adapted from: http://www.cnblogs.com/Anker/p/3269106.html

Logical, linear, and physical addresses

  Before explaining high memory and memory mapping, let's review what logical, linear, and physical addresses are. If you already know this, feel free to skip ahead.

Logical address

  A logical address is the offset, relative to a segment, that a program generates. For example, in C pointer programming you can read the value of a pointer variable itself (for instance one produced by the & operator). That value is a logical address: it is relative to the start of the current process's data segment and has nothing to do with absolute physical addresses. In Intel real mode there is no paging and no descriptor-based address translation; the CPU forms the physical address directly from the segment register and the offset (segment x 16 + offset), so the logical address equals the physical address only when the segment value is 0. In Intel protected mode, the logical address is the offset within the limit of the executing program's code segment (assuming the code segment and data segment coincide). Application programmers only deal with logical addresses; the segmentation and paging mechanisms are completely transparent to them and concern only system programmers. Application programmers can manipulate memory directly, but only within the memory segments that the operating system has allocated to the program.

Linear address

  A linear address is the intermediate layer in the translation from a logical address to a physical address. Program code generates a logical address, that is, an offset within a segment; adding the base address of the corresponding segment produces a linear address. If paging is enabled, the linear address is then translated into a physical address. If paging is not enabled, the linear address is the physical address. The linear address space of the Intel 80386 is 4G (2 to the 32nd power, that is, addressing with a 32-bit address bus).
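
  The segment step described above is just an addition. The following C sketch shows it; the function name is hypothetical, and segment limit and privilege checks that real hardware performs are deliberately left out.

```c
#include <stdint.h>

/*
 * linear address = segment base + logical address (offset in the segment).
 * Sketch only: limit and privilege checks are omitted.
 */
uint32_t logical_to_linear(uint32_t segment_base, uint32_t offset)
{
    return segment_base + offset;
}
```

  With a segment base of 0 (a "flat" segment), the linear address is numerically the same as the logical address.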

Physical address

  A physical address is the address signal that appears on the CPU's external address bus to address physical memory; it is the final result of address translation. If paging is enabled, linear addresses are converted to physical addresses using entries in the page directory and page tables. If paging is not enabled, the linear address is the physical address.
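
  As a rough illustration of how the page directory and page table are indexed, the sketch below splits a 32-bit linear address the way classic two-level x86 paging with 4KB pages does: 10 bits of page directory index, 10 bits of page table index, and a 12-bit offset within the page. The struct and function names are made up for this example, and other paging modes (PAE, large pages) are not covered.

```c
#include <stdint.h>

/* The three fields of a 32-bit linear address under two-level 4KB paging. */
struct linear_parts {
    uint32_t dir_index;    /* bits 31..22: index into the page directory */
    uint32_t table_index;  /* bits 21..12: index into the page table */
    uint32_t page_offset;  /* bits 11..0:  offset within the 4KB page */
};

struct linear_parts split_linear_address(uint32_t linear)
{
    struct linear_parts p;
    p.dir_index   = linear >> 22;
    p.table_index = (linear >> 12) & 0x3FFu;
    p.page_offset = linear & 0xFFFu;
    return p;
}
```

  The physical address is then the frame address found in the selected page table entry plus page_offset.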

Virtual address

  Virtual memory means that a computer appears to have much more memory than it actually has, so programmers can write and run programs that use far more memory than the system physically contains. This lets many large projects run on systems with limited memory. An apt analogy: you do not need a track that runs all the way from Shanghai to Beijing to get a train there. A long enough stretch of rail (say 3km) will do. You immediately re-lay the rails from behind the train in front of it, and as long as you work fast enough to keep up, the train runs as if on a complete track. This is the job of virtual memory management.

  In the Linux 0.11 kernel, each program (process) is given a virtual memory space with a total capacity of 64MB, so the logical address range of a program is 0x0000000 up to (but not including) 0x4000000. Logical addresses are sometimes also called virtual addresses, because, like the virtual memory space, a logical address is independent of the actual physical memory capacity. For kernel addresses on 32-bit x86 Linux, the "gap" between a logical address and the corresponding physical address is 0xC0000000, because the virtual address -> linear address -> physical address mapping differs by exactly this value, which is specified by the operating system.

  Logical addresses (or virtual addresses) are converted to linear addresses automatically by the CPU's segmentation mechanism. If paging is not enabled, the linear address is the physical address. If paging is enabled, system software has to take part in the translation from linear to physical addresses, specifically by setting up the page directory and page table entries.

Note: The section above is taken from: http://blog.csdn.net/do2jiang/article/details/4512417

High memory

The origin of high memory

  In a traditional 32-bit x86 Linux system, when kernel code or a kernel thread accesses memory, the address used in the code is a logical address, and it must be mapped to a real physical address. For kernel space this is a fixed one-to-one mapping: logical address 0xC0000003 corresponds to physical address 0x3, logical address 0xC0000004 corresponds to physical address 0x4, and so on. The relationship between the physical address and the logical address is therefore:

Physical Address = Logical Address – 0xC0000000
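
  The formula can be written down directly in C. This is only an illustrative user-space sketch with made-up names; inside the kernel, the `__pa()` and `__va()` macros play this role for directly mapped addresses.

```c
#include <stdint.h>

/* Assumption: the 0xC0000000 offset of the 32-bit layout described above. */
#define DIRECT_MAP_OFFSET 0xC0000000u

/* Physical Address = Logical Address - 0xC0000000 */
uint32_t direct_map_virt_to_phys(uint32_t vaddr)
{
    return vaddr - DIRECT_MAP_OFFSET;
}

/* The inverse: Logical Address = Physical Address + 0xC0000000 */
uint32_t direct_map_phys_to_virt(uint32_t paddr)
{
    return paddr + DIRECT_MAP_OFFSET;
}
```

  The two helpers are only meaningful for addresses inside the directly mapped range; they do not check their arguments.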

  Note that under this translation the kernel's virtual addresses are at the "high end" of the address space, while the physical memory they map to is at the low end. The logical addresses available to the kernel are 0xC0000000-0xFFFFFFFF, which correspond to physical addresses 0x00000000-0x3FFFFFFF, a total of 1G of memory. In other words, if the machine has more than 1G of physical memory, the part above 1G cannot be reached by the kernel through this mapping. High memory was introduced to solve this problem.

  Because physical memory beyond what fits in the 1G kernel space cannot be mapped one-to-one, the Linux kernel divides the kernel space into three zones: ZONE_DMA, ZONE_NORMAL and ZONE_HIGHMEM. The three zones are laid out as follows:

ZONE_DMA      the first 16MB
ZONE_NORMAL   16MB - 896MB
ZONE_HIGHMEM  896MB - end (1G)
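
  As a small sketch of the layout above, the following C function classifies an offset within the 1G kernel space into one of the three zones. The enum, macro, and function names are hypothetical and are not the kernel's own definitions; only the 16MB and 896MB boundaries come from the list above.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not the kernel's own enum. */
enum mem_zone { MEM_ZONE_DMA, MEM_ZONE_NORMAL, MEM_ZONE_HIGHMEM };

/* Convert a size in megabytes to bytes. */
#define MB(n) ((uint32_t)(n) << 20)

/* Classify a byte offset from the start of the 1G kernel space. */
enum mem_zone zone_for_offset(uint32_t offset)
{
    if (offset < MB(16))
        return MEM_ZONE_DMA;      /* the first 16MB */
    if (offset < MB(896))
        return MEM_ZONE_NORMAL;   /* 16MB up to 896MB */
    return MEM_ZONE_HIGHMEM;      /* 896MB and above */
}
```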

Understanding high memory

  The previous section said that high memory solves the problem of the kernel being unable to access physical memory above 1G. So how does it work? In principle it is simple. When the kernel needs to access memory above 1G, for example the 1MB of physical memory at 0x50000000-0x500FFFFF, it temporarily reserves a 1MB range of addresses in the ZONE_HIGHMEM region and maps that range to the physical memory it needs to access. When the kernel is done, it releases the 1MB range again, which completes the access to memory above 1G.

Memory mapping (mmap)

The basic concept of mmap

  mmap is a way of memory-mapping files: a file or other object is mapped into the address space of a process, establishing a one-to-one correspondence between the file's disk addresses and a range of virtual addresses in the process's virtual address space. Once the mapping is in place, the process can read and write this range of memory through pointers, and the system automatically writes dirty pages back to the corresponding file on disk. File operations are thus completed without calling system calls such as read and write. Conversely, modifications made to this region from kernel space are directly visible in user space, which makes it possible to share a file between different processes. As shown below:

[Figure: layout of a process's virtual address space, including the memory mapping region]

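  As the figure above shows, a process's virtual address space consists of multiple virtual memory areas. A virtual memory area is a homogeneous interval in the process's virtual address space, that is, a contiguous address range with the same characteristics. The text segment (code segment), initialized data segment, BSS segment, heap, stack, and memory mapping shown in the figure are each an independent virtual memory area. The address space that serves memory mappings lies in the free region between the heap and the stack.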

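  The Linux kernel uses the vm_area_struct structure to represent an independent virtual memory area. Because virtual memory areas of different kinds differ in function and internal mechanism, a process uses multiple vm_area_struct structures to represent its different types of virtual memory areas. The vm_area_struct structures are linked together in a linked list or a tree so that they can be looked up quickly, as shown in the following figure:

[Figure: the vm_area_struct structures of a process linked together]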

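  The vm_area_struct structure contains the start and end addresses of the area and other related information. It also contains a vm_ops pointer, through which all the functions that can be used on this area are reachable. In this way, any information a process needs for an operation on a virtual memory area can be obtained from its vm_area_struct. What the mmap function does is create a new vm_area_struct structure and connect it to the file's physical address on disk. The specific steps are covered in the next section.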

How mmap memory mapping works

Overall, the implementation of an mmap memory mapping can be divided into three stages:

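Stage 1: the process starts the mapping, and a virtual mapping area is created for it in the virtual address space

  • The process calls the library function mmap in user space. Prototype: void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);
  • The kernel searches the virtual address space of the current process for a free, contiguous range of virtual addresses that meets the requirements.
  • A vm_area_struct structure is allocated for this virtual area, and its fields are initialized.
  • The new virtual area structure (vm_area_struct) is inserted into the process's list or tree of virtual address areas.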

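Stage 2: the kernel-space mmap function (different from the user-space function) is called to establish the one-to-one mapping between the file's physical addresses and the process's virtual addresses

  • After the new virtual address area has been allocated for the mapping, the kernel uses the file pointer of the file to be mapped to find the corresponding file descriptor in the file descriptor table, and through the file descriptor reaches the file structure (struct file) for this file in the kernel's "set of open files". Each file structure maintains the information related to this open file.
  • Through the file structure, the kernel reaches the file_operations module and calls the kernel function mmap, whose prototype is: int mmap(struct file *filp, struct vm_area_struct *vma). This differs from the user-space library function.
  • The kernel mmap function locates the file's physical address on disk through the inode module of the virtual file system.
  • The remap_pfn_range function builds the page tables, which establishes the mapping between the file addresses and the virtual address area. At this point, no data in main memory is associated with this range of virtual addresses yet.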

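Stage 3: the process accesses the mapped range, which triggers a page fault and causes the file contents to be copied into physical memory (main memory)

Note: The first two stages only create the virtual area and set up the address mapping; no file data has been copied into main memory. The file is actually read only when the process performs a read or write.

  • The process's read or write accesses the mapped range of its virtual address space. The page table lookup finds that this range is not backed by a physical page. Only the address mapping has been established so far and the data on disk has not been copied into memory, so a page fault is raised.
  • The page fault handler performs a series of checks and, once it has confirmed that the access is legal, the kernel starts demand paging.
  • The paging process first looks for the required page in the swap cache. If it is not there, the nopage function is called to load the missing page from disk into main memory.
  • The process can then read or write this main memory. If a write changes the contents, the system automatically writes the dirty page back to the corresponding disk address after some time, which completes the write to the file.

Note: Modified dirty pages are not written back to the file immediately but after some delay. You can call msync() to force synchronization, so that what was written is saved to the file right away.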
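
  From user space, the three stages and the msync() note above can be exercised with a few lines of C. The sketch below is only an illustration under some assumptions: the helper name write_byte_via_mmap is made up, the file is created if it does not exist, and ftruncate() is used to set the file to exactly `length` bytes before mapping (which would shrink a larger existing file).

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map `length` bytes of the file at `path`, store `value` at byte `index`
 * through a pointer, and force the change to the file with msync().
 * Returns 0 on success and -1 on failure.
 */
int write_byte_via_mmap(const char *path, size_t length, size_t index, char value)
{
    if (length == 0 || index >= length)
        return -1;

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* Set the file size to `length` so the whole mapping is backed by it. */
    if (ftruncate(fd, (off_t)length) != 0) {
        close(fd);
        return -1;
    }

    /* Stages 1 and 2: create the mapping. No file data is copied yet. */
    char *map = mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (map == MAP_FAILED)
        return -1;

    /* Stage 3: the first access faults the page in; the store dirties it. */
    map[index] = value;

    /* Write the dirty page back now instead of waiting for the system. */
    int rc = msync(map, length, MS_SYNC);
    munmap(map, length);
    return rc == 0 ? 0 : -1;
}
```

  MAP_SHARED is what makes the store visible in the file; with MAP_PRIVATE the write would stay in a private copy of the page and never reach the disk.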

Disclaimer: The content of the above chapter is taken from: http://www.cnblogs.com/huxiao-tee/p/4660352.html

vm_struct and vm_area_struct

  A brief explanation of the two structures vm_struct and vm_area_struct is in order. First, both represent a contiguous range of virtual addresses, which may be non-contiguous once mapped to physical addresses. Second, the virtual addresses represented by vm_area_struct are used by processes, while those represented by vm_struct are used by the kernel. As described above, the kernel address space is divided into three parts, ZONE_DMA, ZONE_NORMAL and ZONE_HIGHMEM. The first two are mapped one-to-one to physical addresses, while ZONE_HIGHMEM is managed by temporarily borrowing and mapping addresses so that memory above 1G can be accessed. The kernel virtual addresses used by vm_struct lie in the ZONE_HIGHMEM part of the address space.


http://blog4jimmy.com/2018/01/348.html
