Linux virtual memory

CPU time and main memory are scarce resources in an operating system, and every running process shares them. The operating system uses the CPU scheduler to allocate CPU time [1] and introduces a virtual memory system to manage physical memory. This article analyzes why the operating system needs virtual memory.

Before explaining why virtual memory is necessary, we need to understand what virtual memory is and what role it plays in the operating system. Like other abstractions in software engineering, virtual memory is a middle layer between the operating system's physical memory and the process. It hides physical memory from the process and offers it a simpler, easier-to-use interface together with more sophisticated functionality.

virtual-memory-layer

Figure 1-Process and the middle layer of the operating system

If we designed an operating system from scratch, letting processes access physical addresses in main memory directly would be a very natural decision. Early operating systems did work this way: a process used the physical address (Physical Address) of the target memory to access its contents directly. Modern operating systems, however, have introduced virtual memory. The virtual address held by the process is converted to a physical address [2] by the memory management unit (Memory Management Unit, MMU), and memory is then accessed through that physical address:

virtual-memory-system

Figure 2-Virtual Memory System

Main memory is a relatively scarce resource. Although its sequential reads are only about an order of magnitude faster than a disk's, it offers extremely fast random access: random reads from memory are about 100,000 times faster than from disk [3]. Making full use of memory's random access speed is an effective way to improve program execution efficiency.

The operating system manages memory in units of pages. When a process accesses data that is not in memory, the operating system may load that data into memory page by page; the memory management unit (MMU) in the figure above takes part in this process by translating addresses and signalling when a page is missing. As an abstraction layer, the operating system's virtual memory plays the following three key roles:

Virtual memory can use main memory as a cache to speed up processes' access to disk;
Virtual memory can provide each process with an independent memory space, simplify program linking and loading, and share memory through dynamic libraries;
Virtual memory can control process access to physical memory, isolate the access rights of different processes, and improve system security.

Cache

We can think of virtual memory as a region of space on disk. When part of this region is accessed frequently, that data is cached in main memory in units of pages to speed up the CPU's access to it. Virtual memory uses the larger disk as "memory" and uses main memory as a cache for acceleration, so the upper layer believes that the operating system's memory is both large and fast, even though the large disk is not fast and the fast memory is not large.

virtual-memory-cache

Figure 3-Virtual memory, main memory and disk

A virtual page (Virtual Page, VP) in virtual memory can be in one of three states: unallocated (Unallocated), uncached (Uncached), or cached (Cached). Unallocated pages are pages the process has not requested, that is, free virtual memory; they occupy no space on the disk that backs virtual memory. Uncached pages exist only on disk, while cached pages have been loaded into main memory. In the figure above, the green virtual pages are backed by physical pages (Physical Page, PP) in main memory, so they are cached; the yellow virtual pages exist only on disk, so they are not cached in physical memory.

When a user program accesses a virtual page that is not cached, the hardware triggers a page fault (Page Fault, PF). In some cases the accessed page has already been loaded into physical memory, but the user program's page table (Page Table) has no corresponding entry, so we only need to establish the mapping between virtual and physical memory in the page table. In other cases the operating system needs to load the uncached virtual page from disk into physical memory [4].

page-fault

Figure 4-Page fault interrupt of virtual memory
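The two page fault cases described above can be sketched with a toy model. This is only an illustration, not how the kernel is implemented: the class ToyVirtualMemory, its fields, and the disk block identifiers are hypothetical names, and the labels "minor fault" and "major fault" are the usual Linux terms for the two cases rather than terms taken from this article.

```python
from enum import Enum


class PageState(Enum):
    UNALLOCATED = "unallocated"  # never requested by the process
    UNCACHED = "uncached"        # allocated, but not mapped to a physical page
    CACHED = "cached"            # mapped to a physical page in main memory


class ToyVirtualMemory:
    """Hypothetical model: one page table plus a pool of pages already in memory."""

    def __init__(self):
        self.backing = {}      # virtual page number -> disk block id
        self.page_table = {}   # virtual page number -> physical page number
        self.resident = {}     # disk block id -> physical page number
        self.next_frame = 0

    def allocate(self, vpn, block):
        """Reserve a virtual page backed by a disk block (nothing is loaded yet)."""
        self.backing[vpn] = block

    def state(self, vpn):
        if vpn not in self.backing:
            return PageState.UNALLOCATED
        if vpn in self.page_table:
            return PageState.CACHED
        return PageState.UNCACHED

    def access(self, vpn):
        """Return (physical page number, event) for an access to a virtual page."""
        if vpn not in self.backing:
            raise MemoryError(f"virtual page {vpn} is unallocated")
        if vpn in self.page_table:
            return self.page_table[vpn], "hit"
        block = self.backing[vpn]
        if block in self.resident:
            # The data is already in physical memory; only the mapping is missing.
            event = "minor fault"
        else:
            # The data must be "read from disk" into a fresh physical page.
            self.resident[block] = self.next_frame
            self.next_frame += 1
            event = "major fault"
        self.page_table[vpn] = self.resident[block]
        return self.page_table[vpn], event
```

In this sketch, two virtual pages backed by the same disk block end up sharing one physical page: the first access pays for the load, and the second only needs a new page table entry.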

Because main memory is limited, when it has no free space left the operating system evicts a suitable physical page back to disk to make room for the new page. The process of choosing which page to evict is called page replacement (Page Replacement). Page faults and page replacement are both part of the operating system's paging (Paging) mechanism, whose purpose is to make full use of memory as a cache for disk and so improve program efficiency.
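As a rough illustration of page replacement, the sketch below evicts the least recently used page when memory is full. LRU is an assumption chosen for simplicity; the article does not say which policy Linux uses, and real kernels use more elaborate approximations. The class name LRUPageCache and its fields are hypothetical.

```python
from collections import OrderedDict


class LRUPageCache:
    """Hypothetical main memory holding a fixed number of pages, with LRU eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = OrderedDict()  # virtual page number -> contents, oldest first
        self.evicted = []            # pages written back to "disk", in order

    def access(self, vpn):
        """Touch a page; return True on a hit, False on a page fault."""
        if vpn in self.frames:
            self.frames.move_to_end(vpn)  # mark as most recently used
            return True
        if len(self.frames) >= self.capacity:
            victim, _ = self.frames.popitem(last=False)  # least recently used
            self.evicted.append(victim)
        self.frames[vpn] = f"data-of-page-{vpn}"  # stand-in for loading from disk
        return False
```

With a capacity of two pages, accessing pages 1, 2, 1, 3 in that order evicts page 2, because page 1 was touched more recently.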

Memory management

Virtual memory can provide an independent memory space for each running process, creating the illusion that every process has memory of its own. On a 64-bit operating system each process has 256 TiB of address space, with kernel space and user space each taking 128 TiB [5]; some operating systems use 57-bit virtual addresses to provide 128 PiB of address space [6]. Because the virtual address space of each process is completely independent, every process can use the whole user-space range from 0x0000000000000000 to 0x00007FFFFFFFFFFF.

virtual-memory-space

Figure 5-Virtual memory space of the operating system
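The sizes quoted above follow directly from the width of the virtual address. The short calculation below checks them, assuming 48-bit virtual addresses for the 256 TiB case (which is what that figure implies) and an even split between kernel space and user space; the helper name address_space_bytes is hypothetical.

```python
TIB = 2 ** 40  # bytes in one tebibyte
PIB = 2 ** 50  # bytes in one pebibyte


def address_space_bytes(bits):
    """Number of bytes addressable with the given number of address bits."""
    return 2 ** bits


total_48 = address_space_bytes(48)       # 48-bit virtual addresses
user_half = total_48 // 2                # user space gets the lower half
highest_user_address = user_half - 1     # last byte of the lower half
total_57 = address_space_bytes(57)       # 57-bit virtual addresses

print(total_48 // TIB, "TiB in total")
print(user_half // TIB, "TiB for user space")
print(f"{highest_user_address:#018x} is the highest user-space address")
print(total_57 // PIB, "PiB with 57-bit addresses")
```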

The virtual address space is only a logical structure in the operating system. As we said above, an application ultimately needs to access content in physical memory or on disk. Because the operating system adds virtual memory as a middle layer, we also need an address translator that converts a process's virtual addresses into physical addresses. The page table is an important data structure in the virtual memory system: each process's page table stores the mapping from virtual pages to physical pages. To store the mapping data for 128 TiB of virtual memory on a 64-bit operating system, Linux introduced four-level page tables in 2.6.10 [7] and five-level page tables in 4.11 [8]. More levels may be introduced in the future to support full 64-bit virtual addresses.

four-level-page-tables

Figure 6-Four-level page table structure

In the four-level page table structure shown in the figure above, the operating system uses the lowest 12 bits of the virtual address as the page offset. The remaining 36 bits are divided into four groups of 9 bits, and each group is the index of the current level's entry within the table of the level above it. Every virtual address can be resolved to its physical address by walking this multi-level page table.
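A minimal sketch of this bit layout follows. It assumes a 48-bit virtual address with a 12-bit offset and four 9-bit indexes, as described above; the nested dictionaries standing in for page tables and the function names are hypothetical, not kernel data structures.

```python
PAGE_OFFSET_BITS = 12  # lowest 12 bits: offset inside the page
INDEX_BITS = 9         # 9 bits of index per page table level
LEVELS = 4


def split_virtual_address(vaddr):
    """Return ([top-level index, ..., last-level index], page offset)."""
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    rest = vaddr >> PAGE_OFFSET_BITS
    indexes = []
    for _ in range(LEVELS):
        indexes.append(rest & ((1 << INDEX_BITS) - 1))
        rest >>= INDEX_BITS
    indexes.reverse()  # the highest bits index the top-level table
    return indexes, offset


def translate(root, vaddr):
    """Walk nested dicts that stand in for the four page table levels."""
    indexes, offset = split_virtual_address(vaddr)
    node = root
    for index in indexes:
        node = node[index]  # a KeyError here models a missing mapping
    physical_page_number = node
    return (physical_page_number << PAGE_OFFSET_BITS) | offset
```

For example, a virtual address built from the indexes 1, 2, 3, 4 and the offset 5 walks four tables and lands at offset 5 inside whatever physical page the last-level entry names.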

Because virtual addresses are translated through a multi-level page table structure, multiple processes can share physical memory through virtual memory. The article "Why Redis snapshots use child processes" described how copy-on-write relies on this feature of virtual memory: when we call fork in Linux to create a child process, the kernel in fact only copies the parent's page tables. As shown in the figure below, the parent and child processes point to the same physical memory through different page tables:

process-shared-memory

Figure 7-Shared memory between processes

Virtual memory not only lets processes share physical memory at fork time and provides the copy-on-write mechanism, it also lets them share common dynamic libraries to reduce physical memory usage. All processes may call the same operating system kernel code, and C programs also call the same standard library.
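The observable effect of fork and copy-on-write can be seen from user space with a short Unix-only sketch. It cannot show the shared physical pages themselves; it only shows that a write made by the child after fork is invisible to the parent. The function name fork_demo is hypothetical.

```python
import os


def fork_demo():
    """Fork, let the child overwrite a buffer, and compare both views of it."""
    data = bytearray(b"parent")
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: writing to the buffer makes the kernel give the child its own
        # copy of the affected page, so the parent's memory is left untouched.
        try:
            os.close(read_fd)
            data[:] = b"child!"
            os.write(write_fd, bytes(data))
            os.close(write_fd)
        finally:
            os._exit(0)  # leave immediately without running parent cleanup code
    # Parent: read what the child saw, then reap the child.
    os.close(write_fd)
    child_view = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return bytes(data), child_view


if __name__ == "__main__":
    parent_view, child_view = fork_demo()
    print("parent still sees:", parent_view)
    print("child saw:", child_view)
```

The parent's buffer keeps its original contents even though the child modified the "same" variable, which is the behavior copy-on-write is designed to preserve.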

Besides sharing memory, an independent virtual address space also simplifies memory allocation. When a user program requests heap memory from the operating system, the operating system can allocate several contiguous virtual pages, and these virtual pages can map to non-contiguous pages in physical memory.
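The idea that contiguous virtual pages need not be contiguous in physical memory can be sketched as follows. This is a simplification under stated assumptions: the page table is a plain dictionary, the free list is a Python list, and physical pages are assigned eagerly, whereas a real kernel typically assigns them on first access. The function name allocate_virtual_range is hypothetical.

```python
def allocate_virtual_range(page_table, free_frames, first_vpn, count):
    """Map `count` consecutive virtual pages to whatever physical pages are free."""
    if count > len(free_frames):
        raise MemoryError("not enough free physical pages")
    for i in range(count):
        # Any free physical page will do; they do not have to be adjacent.
        page_table[first_vpn + i] = free_frames.pop()
    return range(first_vpn, first_vpn + count)
```

Given the scattered free physical pages 9, 2 and 7, a request for three pages starting at virtual page 100 yields the consecutive virtual pages 100, 101 and 102, each backed by a different, non-adjacent physical page.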

Memory protection

A user program should not modify its read-only code segment, read or modify code and data structures in the kernel, or access the private memory of other processes. If the memory accesses of user processes could not be restricted, an attacker could read and modify the memory of other processes and compromise the security of the system.

If each process holds an independent virtual address space, the page table can be understood as a "join table" between the process and physical pages. It can store the access relationship between the process and each physical page, including read, write, and execute permissions:

virtual-memory-permission

Figure 8-Read permission, write permission and execute permission

The memory management unit can determine whether the current process has permission to access the target physical memory. In this way all permission management converges in the virtual memory system, which reduces the number of code paths where risks could arise.
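A permission check of this kind can be sketched with a flag set per page table entry. The names Perm, ProtectionFault and check_access are hypothetical, and the dictionary-based page table is only a model of what the hardware and kernel do.

```python
from enum import Flag, auto


class Perm(Flag):
    NONE = 0
    READ = auto()
    WRITE = auto()
    EXEC = auto()


class ProtectionFault(Exception):
    """Raised when an access is not allowed by the page table entry."""


def check_access(page_table, vpn, wanted):
    """Return the physical page number if every wanted permission is granted."""
    entry = page_table.get(vpn)
    if entry is None:
        raise ProtectionFault(f"virtual page {vpn} is not mapped")
    frame, granted = entry
    if (wanted & granted) != wanted:
        raise ProtectionFault(f"{wanted} denied on virtual page {vpn}")
    return frame
```

A code page mapped with read and execute permissions can be executed through check_access, but a write to it raises ProtectionFault, which is the toy equivalent of the protection the read-only code segment receives.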

Summary

The design approach behind virtual memory is a common one in software engineering: by combining the advantages of disk and memory, a middle layer schedules resources more sensibly to improve their utilization and provides a coherent, unified abstraction. Similar caching logic is also common in real-world business scenarios.

The operating system's virtual memory is a very complex component, and no engineer can understand all of its details. Understanding its overall design is still very valuable, because many software design approaches can be found in it. Let's return to today's question: why does the Linux operating system need virtual memory?

Virtual memory can combine the advantages of disk and physical memory to provide storage that seems fast enough and large enough for the process;
Virtual memory can provide independent memory space for processes and introduce a multi-layer page table structure to translate virtual memory into physical memory. Physical memory can be shared between processes to reduce overhead, and it can also simplify program linking, loading, and memory allocation processes;
Virtual memory can control process access to physical memory, isolate the access rights of different processes, and improve system security.

Finally, here are some more open-ended related questions that interested readers can think through:

Why is the page table structure of each layer only responsible for 9-bit virtual address addressing?
How many levels of page tables would an operating system need to address a full 64-bit virtual address space?

If you have questions about this article or want to know more about the reasons behind design decisions in software engineering, you can leave a comment below the post. The author will reply to questions related to this article promptly and pick suitable topics for follow-up articles.