The main function of the memory management unit (MMU) is to translate virtual addresses into physical addresses. It can also implement memory protection, cache control, bus arbitration, and bank switching. This article describes in detail why the MMU emerged, what it does, and how it works.

Background: why the MMU emerged

In the early days of computing, memory was very limited, generally only tens to hundreds of KB, and programs were correspondingly small, so KB-level memory was still sufficient. As computer technology developed, however, applications kept growing, and programmers eventually faced a problem: the program was too large to fit in memory.

The original solution was to divide the program into many pieces called overlays. Overlay 0 runs first and, when it finishes, calls another overlay to continue running.

Although the operating system swaps the overlays in and out, the programmer must first divide the program into overlays, which is time-consuming, laborious, and tedious work.

A better way was needed to solve this problem fundamentally.

That way was soon found: virtual memory.

Virtual Memory

The basic idea of virtual memory is:

The total size of programs, data, and stacks can exceed the size of physical memory. The operating system keeps the currently used parts in memory, and saves other unused parts on disk.

For example, given a 16 MB program and a machine with only 4 MB of memory, the OS can choose which 4 MB to keep in memory at any moment and exchange program fragments between memory and disk as needed, so that the 16 MB program runs on a machine with 4 MB of RAM. The programmer does not need to divide the 16 MB program before running it.

At any time, there is a set of addresses that a program can generate on a computer, which we call an address range.

The size of this range is determined by the number of address bits of the CPU, for example:

For a 32-bit CPU, the address range is 0x0 ~ 0xFFFF FFFF (4 GB).

For a 64-bit CPU, the address range is 0x0 ~ 0xFFFF FFFF FFFF FFFF (16 EB).

This range is the address range that our program can generate. We call this address range a virtual address space, and an address in this space is called a virtual address.

Corresponding to the virtual address space and the virtual address are the physical address space and the physical address. Most of the time, the physical address space of a system is smaller than its virtual address space. A simple example illustrates the difference: for a 32-bit x86 host with 256 MB of memory, the virtual address space is 0x0 ~ 0xFFFF FFFF (4 GB), while the physical address space is 0x0000 0000 ~ 0x0FFF FFFF (256 MB).
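The sizes in this example follow directly from the number of address bits. A minimal sketch of the arithmetic (the function names are illustrative only):

```python
def address_space_size(bits: int) -> int:
    """Number of distinct byte addresses expressible with `bits` address bits."""
    return 1 << bits


def highest_address(size_in_bytes: int) -> int:
    """Highest valid address in a space of the given size that starts at 0."""
    return size_in_bytes - 1


virtual_size = address_space_size(32)      # 32-bit virtual address space
physical_size = 256 * 1024 * 1024          # 256 MB of installed memory

print(hex(highest_address(virtual_size)))   # top of the virtual address space
print(hex(highest_address(physical_size)))  # top of the physical address space
```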

On machines that do not use virtual memory, the address issued by the program is sent directly to the memory bus, and the physical memory location with that address is read or written. When virtual memory is used, the virtual address is not sent directly to the memory bus but to the memory management unit (MMU).

Physically, the MMU may be one chip or a group of chips, and on ARM processors it is configured through the system control coprocessor (CP15). Its function is to map virtual addresses to physical addresses.

The CPU core sees the virtual address (VA), the logical address used in the program.
Caches and the MMU use the modified virtual address (MVA); on ARM cores with the Fast Context Switch Extension, MVA = (pid << 25) | VA for VAs below 32 MB.
The actual physical devices use the physical address (PA).
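The MVA formula can be sketched as follows. This assumes the ARM Fast Context Switch Extension rule, in which only virtual addresses in the lowest 32 MB are relocated by the process ID and higher addresses pass through unchanged. The function and constant names are illustrative.

```python
FCSE_WINDOW = 1 << 25  # 32 MB: only VAs below this are relocated (assumed FCSE rule)


def modified_virtual_address(pid: int, va: int) -> int:
    """Combine a process ID with a VA to form the MVA seen by caches and MMU."""
    if va < FCSE_WINDOW:
        return (pid << 25) | va   # MVA = (pid << 25) | VA
    return va                     # addresses at or above 32 MB are not modified


print(hex(modified_virtual_address(1, 0x1000)))
```

With this scheme, two processes that both use VA 0x1000 produce different MVAs, so their cache and TLB entries do not collide.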

MMU physical composition

The MMU is a hardware unit in the processor; usually there is one MMU per core. The MMU includes a TLB (Translation Lookaside Buffer) and a Table Walk Unit. In layman's terms, the TLB is a cache of the page table: it caches translations from virtual addresses to physical addresses. The Table Walk Unit walks the page table in memory to complete the translation from virtual address to physical address.

The TLB caches recently used page table entries. It can hold entries for every supported page size, and can split an entry into smaller units when needed.

The TLB in the MMU has two levels: an instruction L1 TLB, a data L1 TLB, and an L2 TLB that is shared by instructions and data.

When the TLB holds no translation for the virtual address (VA) issued by the program, the Table Walk Unit fetches the page table entry from the system cache or from memory, and the fetched entry replaces an old entry in the TLB.
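The interplay between the TLB and the Table Walk Unit can be sketched as a small cache in front of a page table. This is a simplified software model: the class name, the capacity of two entries, and the oldest-first replacement policy are all assumptions made for illustration, and real hardware differs.

```python
from collections import OrderedDict


class TinyTLB:
    """Toy TLB: caches page -> frame translations in front of a page table."""

    def __init__(self, page_table, capacity=2):
        self.page_table = page_table    # stands in for the page table in memory
        self.capacity = capacity        # hypothetical number of TLB entries
        self.entries = OrderedDict()    # cached translations, oldest first
        self.walks = 0                  # how many table walks were needed

    def lookup(self, page):
        if page in self.entries:        # TLB hit: no memory access needed
            return self.entries[page]
        self.walks += 1                 # TLB miss: walk the page table
        frame = self.page_table[page]
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # replace the oldest entry
        self.entries[page] = frame
        return frame


tlb = TinyTLB({0: 2, 5: 3, 2: 6})
tlb.lookup(0)   # miss, triggers a table walk
tlb.lookup(0)   # hit, served from the TLB
print(tlb.walks)
```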

MMU working mechanism

Most systems that use virtual memory use a technique called paging.

The virtual address space is divided into units called pages, and the corresponding physical address space is divided into page frames.

Pages and page frames must be the same size.

Next, an example with figures illustrates how pages are mapped to page frames by the MMU:

In this example, the machine generates 16-bit addresses, so its virtual addresses range from 0x0000 to 0xFFFF (64 KB), but it has only 32 KB of physical memory. It can therefore run a 64 KB program, but the program cannot be loaded into memory all at once.

This machine must have external storage (such as disk or flash) that can hold the 64 KB program, so that program fragments can be brought in when needed.

In this example, the page size is 4 KB, and the page frame size is the same (this must be guaranteed, because transfers between memory and external storage are always in units of pages). The 64 KB virtual address space and the 32 KB physical memory therefore contain 16 pages and 8 page frames respectively.

Execute the following instruction:

MOV REG, 0    // load the value at address 0 into register REG

The virtual address 0 is sent to the MMU, which sees that it falls within page 0 (addresses 0 to 4095). From the above figure, we can see that page 0 is mapped to page frame 2 (addresses 8192 to 12287).

Therefore, the MMU translates the virtual address into a physical address 8192, and sends address 8192 on the address bus.

The memory knows nothing about the MMU's mapping; it just sees a read request for address 8192 and executes it. The MMU thus maps virtual addresses 0 to 4095 to the corresponding physical addresses 8192 to 12287.

MOV REG, 20500
is translated to ---->  MOV REG, 12308

Because virtual address 20500 is 20 bytes from the start of virtual page 5 (virtual addresses 20480 to 24575), and virtual page 5 is mapped to page frame 3 (addresses 12288 to 16383), it is mapped to physical address 12288 + 20 = 12308.
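The two translations above can be reproduced with a few lines of arithmetic. This is a minimal sketch that uses only the two mappings mentioned so far (page 0 to frame 2, page 5 to frame 3); the names PAGE_SIZE, PAGE_TABLE and translate are illustrative.

```python
PAGE_SIZE = 4096  # 4 KB pages, as in the example

# page number -> page frame number (only the mappings named in the text)
PAGE_TABLE = {0: 2, 5: 3}


def translate(virtual_address: int) -> int:
    """Map a virtual address to a physical address via the page table."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    frame = PAGE_TABLE[page]          # KeyError here would mean "not mapped"
    return frame * PAGE_SIZE + offset


print(translate(0))       # 8192
print(translate(20500))   # 12308
```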

MOV REG, 32780

The virtual address 32780 falls within page 8. From the above figure, we can see that page 8 is not mapped (it is marked with an X). What happens in this case?

The MMU notices that this page is not mapped, so it notifies the CPU that a page fault has occurred. The operating system must then handle the page fault: it picks a rarely used page frame among the 8 physical page frames, writes the contents of that page frame to external storage (this action is called page copy), maps the page being referenced (page 8 in this example) to the page frame just released (this action is called modifying the mapping), and then re-executes the faulting instruction (MOV REG, 32780).

Assume that the operating system decides to free page frame 1. First, it marks the virtual page that occupied that frame as unmapped, so that any subsequent access to virtual addresses 4K to 8K causes a fault and lets the operating system take appropriate action.

Second, it changes the page frame number of virtual page 8 from X to 1, so when MOV REG, 32780 is re-executed, the MMU maps 32780 to 4108 (32780 is 12 bytes into page 8, and page frame 1 starts at 4096).
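The page fault sequence can be sketched as a small simulation. It assumes that virtual page 1 (addresses 4K to 8K) is the page currently held in page frame 1, which is what the text implies. The exception class and function names are hypothetical, and the disk transfer is only indicated by a comment.

```python
PAGE_SIZE = 4096


class PageFault(Exception):
    """Raised by the simulated MMU when a page is not mapped."""

    def __init__(self, page):
        super().__init__(page)
        self.page = page


# page -> frame; None plays the role of the X (unmapped) mark.
# Assumption: page 1 currently occupies frame 1.
page_table = {0: 2, 1: 1, 5: 3, 8: None}


def translate(va):
    page, offset = divmod(va, PAGE_SIZE)
    frame = page_table.get(page)
    if frame is None:
        raise PageFault(page)
    return frame * PAGE_SIZE + offset


def handle_page_fault(page, victim_page):
    """OS side: free the victim's frame and give it to the faulting page."""
    frame = page_table[victim_page]
    # A real OS would write the victim's contents to external storage here
    # and load the contents of the faulting page into the frame.
    page_table[victim_page] = None   # future accesses to the victim now fault
    page_table[page] = frame         # modify the mapping


def access(va, victim_page=1):
    try:
        return translate(va)
    except PageFault as fault:
        handle_page_fault(fault.page, victim_page)
        return translate(va)         # re-execute the faulting access


print(access(32780))
```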

We already know that most systems that use virtual memory use a technique called paging. As in the example above, the virtual address space is divided into a set of pages of the same size, and each page has a page number that identifies it (the page number is generally its index in the set, similar to an array index in C/C++).

In the above example, the page number of 0~4K is 0, the page number of 4~8K is 1, the page number of 8~12K is 2, and so on.

The virtual address (note: a specific address, not a space) is divided by the MMU into two parts: the first part is the page index, and the second part is the offset relative to the start of the page.

We again use the 16-bit machine, with the following figure, as an example. Virtual address 8196 is sent to the MMU, which maps it to a physical address. A 16-bit CPU can generate addresses 0~64K in total; with 4 KB pages, the space is divided into 16 pages. The first part of the virtual address must therefore be able to express 16 values (so that every page can be indexed), which means it needs at least 4 bits.

The page index of this address is 0010 (binary), so the indexed page is page 2; the second part is 000000000100 (binary), so the offset is 4.

Page 2 is mapped to page frame 6 (see the figure above), and page frame 6 covers physical addresses 24K~28K. So the MMU maps virtual address 8196 to physical address 24580 (page frame start address + offset = 24576 + 4 = 24580).

Similarly, if we read virtual address 1026, whose binary code is 0000010000000010, then page index = 0000 = 0 and offset = 010000000010 = 1026.

The page number is 0, and page 0 is mapped to page frame 2, whose physical address range is 8192~12287, so the MMU maps virtual address 1026 to physical address 9218 (page frame start address + offset = 8192 + 1026 = 9218).
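The split into page index and offset is a pair of bit operations. A minimal sketch for the 16-bit machine, assuming a 4-bit page index and a 12-bit offset as derived above (the helper names are illustrative):

```python
OFFSET_BITS = 12                       # 4 KB pages -> 12 offset bits
OFFSET_MASK = (1 << OFFSET_BITS) - 1   # 0x0FFF


def split(virtual_address: int):
    """Return (page index, offset) of a 16-bit virtual address."""
    return virtual_address >> OFFSET_BITS, virtual_address & OFFSET_MASK


def join(frame: int, offset: int) -> int:
    """Build the physical address from a page frame number and an offset."""
    return (frame << OFFSET_BITS) | offset


page, offset = split(8196)
print(format(8196, "016b"), page, offset)
print(join(6, offset))                 # page 2 is mapped to page frame 6
```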

The above is how the MMU works. In short: whenever the CPU issues a virtual address, that address is sent to the MMU, which looks up the corresponding physical address through the page table and sends the physical address to the memory bus. This process is called virtual-to-physical address mapping.

Virtual memory management

Modern operating systems generally adopt virtual memory management, which requires the support of the MMU in the processor.

First, two concepts: virtual address and physical address.

If the processor has no MMU, or has an MMU that is not enabled, the memory address issued by the CPU execution unit is sent directly to the chip pins and received by the physical memory chip; this is called a physical address (PA).
If the processor has the MMU enabled, the memory address issued by the CPU execution unit is intercepted by the MMU. The address from the CPU to the MMU is called a virtual address (VA); the MMU translates it into another address and sends that to the external address pins of the CPU chip. That is, the VA is mapped to a PA.

On a 32-bit processor, the internal address bus connected to the CPU execution unit is 32 bits wide, but the external address bus after the MMU's translation is not necessarily 32 bits.

That is to say, the virtual address space and the physical address space are independent: the virtual address space of a 32-bit processor is 4 GB, while the physical address space can be larger or smaller than 4 GB.

The MMU maps VA to PA in units of pages, and the page size of a 32-bit processor is usually 4KB.

For example:

Through one mapping entry, the MMU can map the VA page 0xB7001000 ~ 0xB7001FFF to the PA page 0x2000 ~ 0x2FFF.

If the CPU execution unit wants to access the virtual address 0xB7001008, the actually accessed physical address is 0x2008.

A page in physical memory is called a physical page frame. Which virtual page is mapped to which physical page frame is described by the page table. The page table is stored in physical memory, and the MMU looks it up to determine which PA a VA maps to.

The operating system and the MMU cooperate as follows: when the operating system initializes, or allocates or releases memory, it executes instructions that fill in the page table in physical memory, and then executes instructions that configure the MMU, telling it where the page table is located in physical memory.

Once this is set up, every time the CPU executes an instruction that accesses memory, the MMU is automatically triggered to do the table lookup and address translation. The translation is done by hardware; no instructions are needed to make the MMU perform it.
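This division of labor can be sketched with a toy model in which the page table lives in (simulated) physical memory and the MMU only knows its base address. Everything here is an assumption made for illustration: a single-level table with 4-byte entries, a sparse dictionary standing in for physical memory, and a table base of 0x100000. Real 32-bit MMUs typically use multi-level tables.

```python
PAGE_SHIFT = 12          # 4 KB pages
OFFSET_MASK = 0xFFF
ENTRY_SIZE = 4           # assumed size of one page table entry in bytes

phys_mem = {}            # sparse model of physical memory: address -> word
mmu_table_base = None    # hypothetical MMU register holding the table address


def os_map_page(table_base, vpn, pfn):
    """OS side: write one page table entry into physical memory."""
    phys_mem[table_base + vpn * ENTRY_SIZE] = pfn


def os_set_table_base(table_base):
    """OS side: tell the MMU where the page table is."""
    global mmu_table_base
    mmu_table_base = table_base


def mmu_translate(va):
    """Hardware side: look up the entry in memory and form the PA."""
    vpn = va >> PAGE_SHIFT
    pfn = phys_mem[mmu_table_base + vpn * ENTRY_SIZE]
    return (pfn << PAGE_SHIFT) | (va & OFFSET_MASK)


os_map_page(0x100000, vpn=0xB7001, pfn=0x2)   # VA page 0xB7001xxx -> PA page 0x2xxx
os_set_table_base(0x100000)
print(hex(mmu_translate(0xB7001008)))
```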

The variables and functions in a program each have an address. After the program is compiled, these addresses become addresses in instructions; when the CPU interprets and executes the instructions, they become the memory addresses issued by the CPU execution unit. So when the MMU is enabled, the addresses used in the program are all virtual addresses, and each one causes the MMU to perform table lookup and address translation.

Then why design such a complicated memory management mechanism? What are the benefits of adding an extra layer of VA to PA conversion?

In addition to address translation, the MMU also provides memory protection. Most architectures distinguish user mode from privileged mode. The operating system can set access permissions for each memory page in the page table: some pages may not be accessed at all, some may be accessed only when the CPU is in privileged mode, and some may be accessed in both user mode and privileged mode. Access permissions are further divided into readable, writable, and executable.

With these settings in place, when the CPU accesses a VA, the MMU checks whether the CPU is currently in user mode or privileged mode, and whether the access is to read data, write data, or fetch an instruction. If the access matches the page's permissions, it is allowed and the VA is translated into a PA; otherwise an exception is generated.
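The permission check can be sketched as follows. The entry layout, the field names, and the AccessFault exception are hypothetical and chosen for readability; real page table entries pack this information into bit fields whose layout depends on the architecture.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096


class AccessFault(Exception):
    """Stands in for the exception the MMU raises on a denied access."""


@dataclass
class PageEntry:
    frame: int             # page frame this page is mapped to
    readable: bool
    writable: bool
    executable: bool
    user_accessible: bool  # False means privileged mode only


def check_and_translate(entry, offset, access, privileged):
    """Return the PA if the access is allowed, else raise AccessFault.

    `access` is one of "read", "write" or "fetch".
    """
    if not privileged and not entry.user_accessible:
        raise AccessFault("user-mode access to a privileged page")
    allowed = {
        "read": entry.readable,
        "write": entry.writable,
        "fetch": entry.executable,
    }[access]
    if not allowed:
        raise AccessFault(f"{access} not permitted on this page")
    return entry.frame * PAGE_SIZE + offset
```

For example, a read-only code page that user mode may access would allow "read" and "fetch" from user mode, but a "write" to it would raise AccessFault.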

Exception handling is similar to interrupt handling. The difference is that an interrupt is generated by an external device, while an exception is generated by the CPU: the cause of an interrupt has nothing to do with the instruction the CPU is currently executing, whereas an exception is caused by a problem with that instruction. For example, a memory access instruction that the MMU finds to violate permissions, or a division instruction whose divisor is 0, causes an exception.

User Space and Kernel Space

Usually the operating system divides the virtual address space into user space and kernel space. For example, on the 32-bit x86 platform, the Linux virtual address space is 0x00000000 - 0xFFFFFFFF: the first 3 GB (0x00000000 - 0xBFFFFFFF) is user space, and the last 1 GB (0xC0000000 - 0xFFFFFFFF) is kernel space.
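The 3 GB / 1 GB split can be expressed as a single comparison against the start of kernel space. A minimal sketch using the boundary given above (the names KERNEL_BASE and region are illustrative):

```python
KERNEL_BASE = 0xC0000000   # start of kernel space in the 3 GB / 1 GB split
ADDRESS_MAX = 0xFFFFFFFF   # top of the 32-bit virtual address space


def region(virtual_address: int) -> str:
    """Classify a 32-bit virtual address as user space or kernel space."""
    if not 0 <= virtual_address <= ADDRESS_MAX:
        raise ValueError("not a 32-bit address")
    return "kernel" if virtual_address >= KERNEL_BASE else "user"


print(region(0xBFFFFFFF), region(0xC0000000))
```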

The user program is loaded into the user space and executed in user mode. It cannot access the data in the kernel, nor can it jump to the kernel code for execution.

This can protect the kernel. If a process accesses an illegal address, at most this process will crash without affecting the stability of the kernel and the entire system.

When the CPU generates interrupts and exceptions, it will not only jump to the interrupt or exception service routine, but also automatically switch modes from user mode to privileged mode, so the interrupt or exception service routine can jump to the kernel code for execution.

In fact, the entire kernel is composed of various interrupt and exception handlers.

To sum up:

Under normal circumstances, the processor executes the user program in the user mode. In the event of an interrupt or exception, the processor switches to the privileged mode to execute the kernel program. After the interrupt or exception is processed, the processor returns to the user mode to continue executing the user program.

Segmentation fault

A segmentation fault arises as follows: the user program accesses a VA; the MMU checks and finds that the program has no right to access it, so the MMU generates an exception; the CPU switches from user mode to privileged mode and jumps to the kernel's exception service routine; the kernel interprets the exception as a segmentation fault and terminates the process that caused it.
