In-depth understanding of the working principle of Cache

This article covers the concept of the Cache, how it works, and an introduction to maintaining cache coherence.

1. Why do we need a Cache

1.1 Why do we need a Cache

Let's start with a figure to explain why a Cache is needed.

The figure above shows the development of CPU performance versus memory (DRAM) access performance.

We can see that as technology and design have evolved, CPU computing performance has improved dramatically, while DRAM performance has grown far more slowly.

This creates a problem: storage limits the development of computing.

Large capacity and high speed are hard to get at the same time.

How do we solve this problem? Start by looking at the patterns in how programs access data.

Let's look at a piece of code:

    for (j = 0; j < 100; j = j + 1)
        for (i = 0; i < 5000; i = i + 1)
            x[i][j] = 2 * x[i][j];   /* the loop code repeats, and every access falls inside the array x */

We can see that, because of the loops, the same instructions are executed over and over, and the data we access is concentrated within the same region of memory.

In other words, speaking more formally, the data we access has locality.

We only need to put this data into a small, fast storage so that related data can be accessed quickly.

To sum up, the Cache is a small storage unit that exploits data locality to give the CPU high-speed access to memory.

1.2 The Cache in a real system

Let's look at the Cache in a real system.

As shown in the figure above, the storage hierarchy of the whole system includes the CPU registers, the L1/L2/L3 Cache, DRAM, and the hard disk.

When accessing data, the CPU looks in the registers first; if the data is not in the registers, it checks the L1 Cache; if it is not in the L1 Cache, it checks the L2 Cache, and so on, until it finally reaches the hard disk.

At the same time, we can see the trade-off between speed and capacity: the smaller the capacity, the faster the access!
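To make this search order concrete, here is a minimal sketch of the fall-through lookup (the Level type and its lookup callbacks are hypothetical, for illustration only, not a real API):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    typedef struct Level {
        const char *name;                              /* e.g. "L1", "L2", "DRAM"      */
        bool (*lookup)(uint32_t addr, uint8_t *out);   /* true on a hit at this level  */
        struct Level *next;                            /* next larger, slower level    */
    } Level;

    /* Walk the hierarchy from the fastest level down until some level hits. */
    uint8_t access_data(Level *lvl, uint32_t addr) {
        uint8_t v = 0;
        for (; lvl != NULL; lvl = lvl->next) {
            if (lvl->lookup(addr, &v))
                return v;   /* hit: served from this level */
        }
        return v;           /* in a real system this would fault in from disk */
    }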

One concept needs to be clarified here.

Data is transferred between the CPU and the Cache in units of words, while data is transferred between the Cache and main memory in units of blocks; a block is typically around 64 bytes.
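As a quick illustration of this block granularity (a minimal sketch assuming the ~64-byte block size mentioned above; the address is arbitrary), the low bits of an address select a byte within a block and the remaining bits name the block:

    #include <stdint.h>
    #include <stdio.h>

    #define BLOCK_SIZE  64   /* bytes per block, as above          */
    #define OFFSET_BITS 6    /* log2(64): bits of the byte offset  */

    int main(void) {
        uint32_t addr   = 0x12345;
        uint32_t offset = addr & (BLOCK_SIZE - 1);   /* byte within the block */
        uint32_t block  = addr >> OFFSET_BITS;       /* which block of memory */
        printf("address 0x%x -> block %u, offset %u\n", addr, block, offset);
        return 0;
    }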

The typical composition of the Cache in a modern SoC is shown below.

1.3 Cache classification

The Cache can be divided into several categories according to different criteria.

  • Divided by data type: I-Cache and D-Cache. The I-Cache holds instructions and the D-Cache holds data. The biggest difference between the two is that data in the D-Cache can be written back, while the I-Cache is read-only.

  • Divided by size: small Cache and large Cache. A Cache smaller than 4KB per way (sets and ways are introduced later) is called a small Cache and is mostly used for the L1 Cache; anything larger than 4KB is called a large Cache and is mostly used for the L2 and other Caches.

  • Divided by location: inner Cache and outer Cache. Inner Caches are part of the CPU microarchitecture, such as the L1 and L2 Caches in the figure above; Caches that are not part of the CPU microarchitecture are called outer Caches.

  • Divided by data relationship: inclusive vs. exclusive Cache. If the lower-level Cache contains all the data held in the upper-level Cache, it is called inclusive; if not, it is called exclusive. For example, if every block in the L2 Cache is also present in the L3 Cache, the L3 Cache is inclusive; if a block is guaranteed to live in only one of the two, the Caches are exclusive.

2. The working principle of Cache

To understand how the Cache works, we need to answer four questions:

  • How is data placed?

  • How is data looked up?

  • How is data replaced?

  • If a write occurs, how does the Cache handle it?

2.1 How data is placed

This question is also easy to answer. Let's use a simple example to illustrate it.

Suppose main memory has 32 blocks, and our Cache has a total of 8 Cache lines (one Cache line holds one block of data).

Suppose we want to put block 12 of main memory into the Cache.

Where in the Cache should it be placed?

There are three methods:

  • Fully associative: the block can be placed anywhere in the Cache.

  • Direct mapped: the block is only allowed to go into one specific line of the Cache, namely line 12 mod 8 = 4.

  • Set associative: the block can be placed in a few specific lines of the Cache. For example, in a 2-way set-associative Cache there are 4 sets in total, so block 12 maps to set 12 mod 4 = 0 and can be placed in either line 0 or line 1.

It can be seen that fully associative and direct mapped are the two extreme cases of a set-associative Cache.
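A minimal sketch of the three placement rules, using the 8-line Cache and block 12 from this example:

    #include <stdio.h>

    #define NUM_LINES 8
    #define NUM_WAYS  2                        /* 2-way set associative */
    #define NUM_SETS  (NUM_LINES / NUM_WAYS)   /* 8 / 2 = 4 sets        */

    int main(void) {
        int block = 12;

        /* Direct mapped: exactly one legal line. */
        printf("direct mapped: line %d\n", block % NUM_LINES);   /* 12 mod 8 = 4 */

        /* Set associative: one set, any way inside it. */
        int set = block % NUM_SETS;                              /* 12 mod 4 = 0 */
        printf("set associative: line %d or %d\n",
               set * NUM_WAYS, set * NUM_WAYS + 1);              /* lines 0 and 1 */

        /* Fully associative: any line at all. */
        printf("fully associative: any of lines 0..%d\n", NUM_LINES - 1);
        return 0;
    }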

Different placement methods have two main impacts:

1. The more ways per set, the larger the tag-comparison circuitry, but the higher the Cache utilization and the lower the probability of a Cache miss.

2. The fewer ways per set, the smaller the comparison circuitry, but blocks get replaced more often.

This is easy to understand: the more places a memory block may occupy in the Cache, the more work it naturally takes to find it.

2.2 How to find data in the Cache

In fact, finding data is a comparison process, as shown in the figure below.

Addresses are in units of bytes.

However, data is exchanged between the Cache and main memory in units of blocks (a block in a modern Cache is typically around 64 bytes), so the lowest few bits of the address form the block offset.

Since we use set associativity, the next few bits indicate which set the block is stored in; this is the index.

A set holds several blocks of data, so we compare the remaining address bits against the Tags stored in the set. If a Tag in the set matches, the data we are accessing is in the Cache and can be used directly.

For example, take a 2-way set-associative Cache, as shown in the figure below.

T stands for Tag. By comparing the Tags directly, we know whether the access is a hit; on a hit, the corresponding block is simply fetched according to the index (the set number).

As shown in the figure, the index selects the set, and the Tags within the set are then compared in parallel to determine whether the data is in the Cache. The figure shows a 2-way set-associative Cache, meaning the two ways are compared in parallel.
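A minimal sketch of this lookup in code (the parameters are assumptions for illustration: 64-byte blocks, 4 sets, 2 ways, 32-bit addresses):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define OFFSET_BITS 6    /* 64-byte block */
    #define INDEX_BITS  2    /* 4 sets        */
    #define NUM_WAYS    2

    typedef struct {
        bool     valid;
        uint32_t tag;
        uint8_t  data[1 << OFFSET_BITS];
    } CacheLine;

    static CacheLine cache[1 << INDEX_BITS][NUM_WAYS];

    /* Split the address, pick the set by index, compare Tags across the ways. */
    CacheLine *lookup(uint32_t addr) {
        uint32_t index = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
        uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);

        for (int way = 0; way < NUM_WAYS; way++) {
            CacheLine *line = &cache[index][way];
            if (line->valid && line->tag == tag)
                return line;   /* hit */
        }
        return NULL;           /* miss */
    }

In hardware the way comparisons happen simultaneously; the loop here only stands in for that parallel Tag compare.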

What if the data is not in the Cache? That involves another question:

if the data is not in the Cache, how does the Cache replace data to make room?

2.3 How to replace the data in the Cache

How is the data in the cache replaced? This is relatively simple and straightforward.

  • Random replacement: when a Cache miss occurs, a randomly chosen block is replaced.

  • Least recently used (LRU): the most recently used blocks are replaced last; the block that has gone unused the longest is evicted first.

  • First in, first out (FIFO): the block that has been in the Cache the longest is evicted first.

In practice, random replacement is not used much; LRU or FIFO can be chosen according to the actual situation.
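As a sketch, LRU can be approximated with a small age counter per line (one of many possible implementations, assumed here purely for illustration):

    #include <stdint.h>

    #define NUM_WAYS 2

    typedef struct {
        uint32_t tag;
        uint32_t age;   /* 0 = used just now; larger = older */
    } LruLine;

    /* The victim is the line in the set with the largest age. */
    int choose_victim(LruLine set[NUM_WAYS]) {
        int victim = 0;
        for (int way = 1; way < NUM_WAYS; way++)
            if (set[way].age > set[victim].age)
                victim = way;
        return victim;
    }

    /* On every access, age all lines and reset the line just used. */
    void touch(LruLine set[NUM_WAYS], int used_way) {
        for (int way = 0; way < NUM_WAYS; way++)
            set[way].age++;
        set[used_way].age = 0;
    }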

When is data brought into the Cache? There are also several strategies.

  • No allocation: if the Cache misses, the access is forwarded directly to main memory, and the fetched data is not written into the Cache.

  • Allocate on read miss: if the data is not in the Cache on a read, it is read from main memory and written into the Cache.

  • Allocate on write miss: if the data is not in the Cache on a write, the block is first loaded into the Cache and then written.
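A minimal sketch of how a simulator might express these choices (the helper functions and policy flags are hypothetical, not a real CPU interface):

    #include <stdbool.h>
    #include <stdint.h>

    static bool read_allocate  = true;    /* fill the Cache on a read miss  */
    static bool write_allocate = false;   /* fill the Cache on a write miss */

    /* Hypothetical helpers a Cache simulator would provide. */
    extern uint8_t mem_read(uint32_t addr);
    extern void    mem_write(uint32_t addr, uint8_t v);
    extern uint8_t cache_read(uint32_t addr);
    extern void    cache_write(uint32_t addr, uint8_t v);
    extern void    cache_fill(uint32_t addr);   /* load the whole block */

    uint8_t handle_read_miss(uint32_t addr) {
        if (read_allocate) {
            cache_fill(addr);           /* bring the block in ...          */
            return cache_read(addr);    /* ... then serve from the Cache   */
        }
        return mem_read(addr);          /* no allocation: bypass the Cache */
    }

    void handle_write_miss(uint32_t addr, uint8_t v) {
        if (write_allocate) {
            cache_fill(addr);           /* load first, then write          */
            cache_write(addr, v);
        } else {
            mem_write(addr, v);         /* write straight to main memory   */
        }
    }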


2.4 What happens on a write operation

After all, the Cache is only a temporary buffer.

If a write occurs, the data in the Cache and in main memory become inconsistent. How do we make sure write operations are handled correctly?

There are also three strategies.

  • Write-through: write the data into the Cache and into main memory at the same time. This significantly hurts write speed.

  • Write-back: write the data into the Cache first, and write it back to main memory only when the block is evicted from the Cache.

  • Write-through with a write buffer: a compromise between write-through and write-back. Writes first go into a queue and are drained to main memory gradually. Repeated writes to the same data are merged in the queue, avoiding frequent writes to main memory.
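A minimal write-back sketch using a dirty bit per line (the dirty bit is the usual mechanism; mem_write_block is a hypothetical stand-in for the bus write):

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        bool     valid;
        bool     dirty;      /* line has been written and differs from memory */
        uint32_t tag;
        uint8_t  data[64];
    } WbLine;

    extern void mem_write_block(uint32_t tag, const uint8_t *data);

    /* Write-back store: update only the Cache now and mark the line dirty. */
    void cache_store(WbLine *line, uint32_t offset, uint8_t v) {
        line->data[offset] = v;
        line->dirty = true;
    }

    /* On eviction, a dirty line must be flushed to main memory first. */
    void evict(WbLine *line) {
        if (line->valid && line->dirty)
            mem_write_block(line->tag, line->data);
        line->valid = false;
        line->dirty = false;
    }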

3. Cache coherence

Cache coherence is one of the problems that comes with the Cache.

Why does the Cache need to handle coherence?

Mainly because of multi-core systems. Suppose core 0 reads a piece of data from main memory and then writes it, while core 1 also reads that same data from main memory. Core 1 does not know the data has been changed; the copy in core 1's Cache is stale, and errors will occur.

Cache coherence guarantees that multi-core accesses proceed without such errors.

There are two main strategies for Cache coherence.

Strategy 1: snooping-based coherence

In this strategy, every Cache monitors (snoops on) the write operations of the other Caches. When data in one Cache is written, there are two ways to handle it.

Write-update protocol: when one Cache is written, all other Caches holding the data are simply updated.

Write-invalidate protocol: when one Cache is written, the corresponding data block in all other Caches is invalidated.

Because snooping on every Cache is costly, strategy 1 is only used in very simple systems.

Strategy 2: directory-based coherence

This strategy maintains a table in main memory that records which Caches each data block has been written into, so that the corresponding states can be updated. Generally speaking, this strategy is used more often. It is further divided into the following common protocols.

  • SI : for a data block there are two states, shared and invalid. If a block in the shared state is written, the other Caches are simply notified to invalidate their copies.

  • MSI : for a data block there are three states: modified, shared, and invalid. The modified state indicates that the data exists only in this Cache and has been modified; main memory is updated when the block is evicted from the Cache. The benefit is avoiding a large number of writes to main memory. Also, when writing a block that is currently invalid, we must make sure no other Cache holds it in the M state; if one does, that Cache is responsible for writing the block back to main memory first.

  • MESI : for a data block there are four states: modified, exclusive, shared, and invalid. The exclusive state indicates that the data exists in no other Cache; to write it, the Cache simply changes the state to M.

We will focus on MESI. In the figure, the black lines are CPU accesses and the red lines are bus accesses, i.e. accesses from other Caches.

When the current state is I and a processor read (PrRd) occurs:

  • If other Caches hold this data and one of them has it in the M state, that Cache first writes the M-state block back to main memory, and then the read proceeds; otherwise the data is read directly. The final state becomes S.

  • If no other Cache holds this data, the state changes directly to E.

When the current state is S:

  • On a processor read, it stays in S.

  • On a processor write, it moves to M.

  • If another Cache writes the block, it moves to I.

When the current state is E:

  • On a processor read, it stays in E.

  • On a processor write, it moves to M.

  • If another Cache reads the block, it moves to S.

When the current state is M:

  • On a processor read, it stays in M.

  • On a processor write, it stays in M.

  • If another Cache reads the block, the data is written back to main memory and the state changes to S.
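A minimal sketch of these transitions as a state machine (a simplification: the write-backs and invalidation messages are noted only in comments, and bus events from other Caches are condensed to BusRd/BusWr):

    #include <stdbool.h>

    typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } MesiState;
    typedef enum { PR_RD, PR_WR, BUS_RD, BUS_WR } MesiEvent;

    /* other_copies: whether any other Cache holds this block (consulted when
     * filling from I; a remote M copy is written back before we read it). */
    MesiState mesi_next(MesiState s, MesiEvent e, bool other_copies) {
        switch (s) {
        case INVALID:
            if (e == PR_RD) return other_copies ? SHARED : EXCLUSIVE;
            if (e == PR_WR) return MODIFIED;    /* after invalidating other copies */
            return INVALID;
        case SHARED:
            if (e == PR_WR)  return MODIFIED;
            if (e == BUS_WR) return INVALID;    /* another Cache wrote the block   */
            return SHARED;                      /* PrRd (or BusRd): stay in S      */
        case EXCLUSIVE:
            if (e == PR_WR)  return MODIFIED;   /* no bus traffic needed           */
            if (e == BUS_RD) return SHARED;
            if (e == BUS_WR) return INVALID;
            return EXCLUSIVE;                   /* PrRd: stay in E                 */
        case MODIFIED:
            if (e == BUS_RD) return SHARED;     /* after writing the block back    */
            if (e == BUS_WR) return INVALID;    /* after writing the block back    */
            return MODIFIED;                    /* PrRd / PrWr: stay in M          */
        }
        return INVALID;
    }

The function above only tracks states; real hardware must also perform the write-backs and invalidations noted in the comments.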

4. Summary

The Cache occupies a very important position in computer architecture. This article has covered the main aspects of the Cache; any specific point can be studied further in depth.

Source: blog.csdn.net/youzhangjing_/article/details/131555709