In-depth understanding of the TLB cache

Today I want to share a good article about the TLB. I hope it helps everyone solidify their fundamentals and understand the computer system more deeply.

TLB is the abbreviation of Translation Lookaside Buffer. First of all, recall that the role of the MMU is to translate virtual addresses into physical addresses.

How the MMU works

The mapping relationship between virtual addresses and physical addresses is stored in the page table, and page tables today are hierarchical. 64-bit systems generally use 3 to 5 levels; a common configuration is a 4-level page table, so let's take that as our example. The four levels are the PGD, PUD, PMD, and PTE page tables. The hardware has a page table base address register, which stores the base address of the PGD (top-level) page table.

(Figure: Linux paging mechanism)

Starting from the page table base address register, the MMU walks from the PGD all the way down to the PTE and finally obtains the physical address (the physical address is stored in the PTE). It is like locating your home on a map: to find your address, I first confirm you are in China, then narrow down to a province, then to a city, and finally to your home. Same principle, found step by step. As you can see, this process is cumbersome. But if, after finding your home the first time, I write down your name and address, then the next time you only need to tell me your name and I can give you the address directly, without searching level by level.

A four-level page table lookup requires four memory accesses. You can imagine the latency; it significantly hurts performance. An example of the lookup process is shown in the figure below. We may expand on it in detail another time, so a general impression is enough here.
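To make those four memory accesses concrete, here is a minimal sketch of the walk in C, assuming a typical 48-bit layout with 9 index bits per level and 4KB pages. The entry format and the phys_to_virt() helper are simplified assumptions for illustration, not the kernel's actual code:

```c
#include <stdint.h>

#define PAGE_SHIFT    12
#define TABLE_ENTRIES 512                      /* 4KB table / 8-byte entries */
#define ADDR_MASK     0x0000fffffffff000ULL    /* entry bits 47:12 */
#define FLAG_PRESENT  0x1ULL

/* Hypothetical helper: turn a physical table address into a usable pointer. */
extern uint64_t *phys_to_virt(uint64_t pa);

/* Walk PGD -> PUD -> PMD -> PTE; each level costs one memory access. */
uint64_t walk(uint64_t pgd_base_pa, uint64_t va)
{
    static const int shift[4] = { 39, 30, 21, 12 }; /* index position per level */
    uint64_t table_pa = pgd_base_pa;    /* from the page table base register */

    for (int level = 0; level < 4; level++) {
        uint64_t *table = phys_to_virt(table_pa);
        uint64_t entry  = table[(va >> shift[level]) & (TABLE_ENTRIES - 1)];

        if (!(entry & FLAG_PRESENT))
            return 0;                   /* no mapping: translation fault */
        table_pa = entry & ADDR_MASK;   /* next-level table, or final frame */
    }
    return table_pa | (va & ((1ULL << PAGE_SHIFT) - 1)); /* frame + offset */
}
```

Each loop iteration is one dependent memory access, which is exactly why a lookup that has to fall through to the walk is so expensive.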

(Figure: page table walk)

What is the essence of the TLB

The TLB is actually a cache.

A data cache caches addresses (virtual or physical) together with data. The TLB caches virtual addresses and the physical addresses they map to. The TLB is looked up by virtual address; it has no other option, because the physical address is precisely what is not yet known.

So the TLB is a virtually addressed cache. Once a TLB exists in hardware, the virtual-to-physical translation process changes: the virtual address is first sent to the TLB to check for a hit, and on a hit the physical address is obtained directly.

Otherwise, the page table is walked level by level to obtain the physical address, and the virtual-to-physical mapping is then cached in the TLB. Since the TLB is a virtual cache (VIVT), does it suffer from the aliasing and ambiguity problems? If so, how do software and hardware cooperate to solve them?
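The new flow can be sketched in a few lines of C. This is a conceptual model only: tlb_lookup() and tlb_fill() are hypothetical stand-ins for what the hardware does, and walk() is the multi-level lookup sketched earlier:

```c
#include <stdint.h>
#include <stdbool.h>

/* Placeholders standing in for hardware behavior and earlier code. */
extern bool     tlb_lookup(uint64_t va, uint64_t *pa);
extern void     tlb_fill(uint64_t va, uint64_t pa);
extern uint64_t walk(uint64_t pgd_base_pa, uint64_t va);
extern uint64_t current_pgd_base;

/* Translation flow once a TLB sits in front of the page tables. */
uint64_t translate(uint64_t va)
{
    uint64_t pa;

    if (tlb_lookup(va, &pa))             /* hit: physical address directly */
        return pa;

    pa = walk(current_pgd_base, va);     /* miss: multi-level page table walk */
    tlb_fill(va, pa);                    /* cache the mapping for next time */
    return pa;
}
```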

What makes the TLB special

The smallest unit of virtual-to-physical mapping is a 4KB page. So the TLB actually has no need to store the lower 12 bits of either the virtual address or the physical address (the lower 12 bits are identical, so storing them would be pointless).

In addition, on a hit the entire translation is taken out of the TLB at once, so the virtual address needs no offset field. Is an index field required? That depends on how the cache is organized.

If it is a fully associative cache, no index is needed. If it is a multi-way set-associative cache, an index is still required.

The figure below shows an example of a four-way set-associative TLB. Today's 64-bit CPUs do not extend their addressing range to the full 64 bits: the 64-bit address space is enormous, and nothing that large is needed yet.

Therefore, to simplify the design and reduce hardware cost, only part of the virtual address bits are actually used. Here we take a 48-bit virtual address as an example.
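To make the fields concrete, suppose (purely as an assumption for this example) a 4-way set-associative TLB with 64 sets and 4KB pages. A 48-bit virtual address then splits into a 12-bit page offset (never stored), a 6-bit index selecting the set, and a 30-bit tag:

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define SET_BITS   6                 /* assumed: 64 sets */
#define NUM_SETS   (1u << SET_BITS)

/* Split a 48-bit virtual address into the fields a 4-way
 * set-associative TLB would use. The page offset is never stored. */
static inline uint32_t tlb_index(uint64_t va)
{
    return (va >> PAGE_SHIFT) & (NUM_SETS - 1);   /* bits 17:12 */
}

static inline uint64_t tlb_tag(uint64_t va)
{
    return va >> (PAGE_SHIFT + SET_BITS);         /* bits 47:18 */
}
```

With 4 ways per set, the tag of the incoming address is compared against all four entries of the selected set in parallel.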

Aliasing problem with TLB

Let me consider the first question: does aliasing exist? We know that a PIPT data cache has no aliasing problem. A physical address is unique, and one physical address corresponds to exactly one piece of data; different physical addresses, however, may hold identical data.

That is to say, the mapping from physical address to data is one-to-one, while the reverse can be many-to-one. Because of the TLB's special role, what it stores is the correspondence between virtual addresses and physical addresses.

Therefore, for a single process, one virtual address corresponds to one physical address at any given time, while one physical address can be mapped by multiple virtual addresses.

Comparing the TLB against a PIPT data cache, we can see that the TLB has no aliasing problem. A VIVT cache, on the other hand, does: the VA must still be translated into a PA, the data actually lives at the PA, and a lot can happen in between, which is what introduces the problem.

TLB ambiguity problem

We know that different processes see the same virtual address range, so with multiple processes the same virtual address can map to different physical addresses in different processes. This creates the ambiguity problem.

For example, process A maps virtual address 0x2000 to physical address 0x4000, while process B maps 0x2000 to physical address 0x5000. While process A runs, the 0x2000-to-0x4000 mapping is cached in the TLB. After a switch to process B, when B accesses 0x2000 it hits that stale entry and fetches data from physical address 0x4000.

This is the ambiguity. To eliminate it, we can borrow the approach used for VIVT data caches and invalidate the entire TLB on every process switch. The switched-in process then cannot hit any stale entries, but at a cost in performance.
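A tiny simulation of that scenario, using a flat fully-associative TLB (sizes and names are illustrative only). Without the flush in context_switch(), process B's access to 0x2000 would hit A's stale entry and read physical address 0x4000:

```c
#include <stdint.h>
#include <string.h>

#define TLB_ENTRIES 16

struct tlb_entry { uint64_t vpn, pfn; int valid; };
static struct tlb_entry tlb[TLB_ENTRIES];

extern void set_page_table_base(uint64_t pgd_base);  /* hypothetical */

/* Without an ASID, the only safe option on a process switch is to
 * invalidate every entry; otherwise B would hit A's 0x2000 -> 0x4000. */
void flush_tlb_all(void)
{
    memset(tlb, 0, sizeof(tlb));
}

void context_switch(uint64_t next_pgd_base)
{
    flush_tlb_all();                    /* eliminate the ambiguity */
    set_page_table_base(next_pgd_base); /* switch address spaces */
}
```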


How to avoid flushing the TLB as much as possible

The first thing to explain is that "flush" here means invalidate. We know that on a process switch, to avoid ambiguity, we must actively flush the entire TLB. That flush can be avoided if we can distinguish the TLB entries of different processes.

We know how Linux distinguishes processes: each process has a unique process ID. It would be great if the TLB, when judging a hit, compared a process identifier in addition to the tag! That way, TLB entries belonging to different processes could be told apart.

Although process A and process B use the same virtual address, their process IDs differ, so process B naturally will not hit process A's TLB entry. Therefore the TLB adds an ASID (Address Space ID) to the match.

The ASID works much like the process ID: it distinguishes the TLB entries of different processes. With it, the TLB no longer needs to be flushed on a process switch, but software is still required to manage and assign ASIDs.

How to manage ASIDs

The ASID and the process ID are definitely not the same thing; don't confuse the two. The process ID can take values over a wide range, but an ASID is generally 8 or 16 bits, so it can only distinguish 256 or 65536 processes. Our example uses an 8-bit ASID.

Therefore a one-to-one correspondence between process IDs and ASIDs is impossible. We must assign each process an ASID, and a process's ASID is generally not equal to its process ID.

Every time a new process is created, it is assigned a new ASID. Once the ASIDs are exhausted, all TLBs are flushed and the ASIDs are reallocated from scratch.

Therefore, to avoid flushing the TLB entirely, the number of running processes would ideally have to be no more than 256. In practice that is not the case, so software and hardware must cooperate to manage ASIDs.

To manage each process, the Linux kernel has a task_struct structure, where we can store the ASID assigned to the current process. The page table base address register has spare bits that can also hold the ASID. On a process switch, the page table base address and the ASID (obtained from the task_struct) can be written into the page table base address register together.
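Below is a minimal sketch of that scheme, assuming an 8-bit ASID, a simple bitmap allocator, and a set_ttbr() helper standing in for the page table base register write; it mirrors the idea described above, not the kernel's actual implementation:

```c
#include <stdint.h>
#include <string.h>

#define NUM_ASIDS 256                       /* 8-bit ASID */

static uint8_t asid_bitmap[NUM_ASIDS / 8];  /* 1 bit per ASID */

extern void flush_tlb_all(void);
extern void set_ttbr(uint64_t value);       /* hypothetical register write */

/* Hand out the next free ASID; when all 256 are taken, flush the TLB,
 * clear the bitmap, and start over (reassigning ASIDs to live tasks
 * is omitted here for brevity). */
uint16_t asid_alloc(void)
{
    for (;;) {
        for (int i = 0; i < NUM_ASIDS; i++) {
            if (!(asid_bitmap[i / 8] & (1u << (i % 8)))) {
                asid_bitmap[i / 8] |= 1u << (i % 8);
                return (uint16_t)i;
            }
        }
        flush_tlb_all();                    /* ASIDs exhausted */
        memset(asid_bitmap, 0, sizeof(asid_bitmap));
    }
}

/* On a switch, write the base address and ASID into the register
 * together; the field layout is illustrative (the PGD base is 4KB
 * aligned, so its low bits are free to carry the ASID). */
struct task { uint64_t pgd_base; uint16_t asid; };

void switch_mm(struct task *next)
{
    set_ttbr(next->pgd_base | next->asid);
}
```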

When looking up the TLB, the hardware can compare both the tag and the ASID (the ASID held in the page table base register against the ASID stored in the TLB entry). Only if both match is it a TLB hit; otherwise it is a miss. On a miss, the multi-level page table walk is needed to find the physical address, which is then cached in the TLB together with the current ASID.

Kernel space shared by multiple processes

We know that kernel space and user space are separate, and that kernel space is shared by all processes. Since kernel space is shared, when process A switches to process B and B accesses an address in kernel space, B should be able to reuse the TLB entries cached while A was running. But because the ASIDs now differ, those lookups miss.

We call the globally shared mappings of kernel space global mappings, and the per-process mappings non-global mappings.

Therefore a bit is introduced in the last-level page table entry, the non-global (nG) bit, to indicate whether a mapping is global. When a virtual-to-physical mapping is cached in the TLB, the nG bit is stored along with it.

When judging a TLB hit, if the tags are equal, the hardware then checks whether the entry is a global mapping; if so, it declares a hit immediately, without comparing the ASID. Only when the mapping is not global is the ASID compared as the final hit condition.
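The complete hit condition can then be written down directly; the entry layout and field names below are illustrative:

```c
#include <stdint.h>
#include <stdbool.h>

struct tlb_entry {
    uint64_t tag;      /* VPN bits above the index */
    uint64_t pfn;      /* physical frame number */
    uint16_t asid;     /* owning address space ID */
    bool     global;   /* from the nG bit: kernel/global mapping */
    bool     valid;
};

/* Hit rule described above: tags must match, and the entry must either
 * be a global mapping or carry the current ASID. */
static bool tlb_hit(const struct tlb_entry *e, uint64_t tag,
                    uint16_t current_asid)
{
    return e->valid && e->tag == tag &&
           (e->global || e->asid == current_asid);
}
```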

When should the TLB be flushed

Finally, let's summarize when the TLB should be flushed.

  • When the ASIDs have all been allocated (exhausted), the entire TLB needs to be flushed. ASIDs can be managed with a bitmap, and the whole bitmap should be cleared after flushing the TLB.
  • When we establish a page table mapping, we need to flush the TLB entry corresponding to that virtual address. One's first impression may be that a flush is only required when an existing mapping is modified, but in fact a flush is required whenever a mapping is established. The reason is that when you create a mapping, you don't know whether a mapping existed before. For example, when creating a mapping from virtual address A to physical address B, we can't know whether a mapping from A to some physical address C already existed, so the TLB is flushed uniformly whenever a mapping relationship is established, as sketched below.
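A schematic of that second rule, where flush_tlb_va() is a hypothetical per-address invalidation (in the Linux kernel this role is played by interfaces such as flush_tlb_page()):

```c
#include <stdint.h>

extern void flush_tlb_va(uint64_t va);  /* hypothetical per-address flush */

/* Establishing a mapping always flushes the TLB entry for that virtual
 * address, since an older A -> C mapping may still be cached. */
void map_page(uint64_t *pte_slot, uint64_t va, uint64_t pa, uint64_t prot)
{
    *pte_slot = (pa & ~0xfffULL) | prot;    /* write the new PTE */
    flush_tlb_va(va);                       /* invalidate any stale entry */
}
```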

Author of the original text: Learn Embedded Together

 


Origin: blog.csdn.net/youzhangjing_/article/details/132088646