Detailed Explanation of the MESI Cache Coherence Protocol

CPU cache structure

We know that the CPU computes very quickly, while fetching data from disk is slow enough to dominate overall efficiency, which is why DRAM (main memory) exists. But even memory is far slower than the CPU's arithmetic speed, so CPU designers added SRAM (caches) inside the chip to bridge the mismatch between CPU speed and memory I/O.

Because of its cost, cache capacity is limited, so we keep the data we are about to use close to the CPU rather than fetching it from memory on every access. CPUs generally provide three levels of cache, called L1, L2, and L3. L1 is split into two parts: L1P, which stores program instructions, and L1D, which stores data. L1 and L2 are private to a single core (by some accounts L2 is shared between two adjacent cores, though I have not verified this), while L3 is shared by all the cores of one CPU (after all, a well-equipped machine may have two CPUs). When the CPU needs data, it searches L1 first; on a miss it falls through to L2, then L3, then DRAM.
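The lookup order described above can be sketched as a toy model. This is not hardware-accurate (the level names and fill policy are simplifications for illustration): each level is a dict, and a miss at one level falls through to the next, slower one.

```python
# Minimal sketch of the L1 -> L2 -> L3 -> DRAM lookup order: each level
# is modeled as a dict, and a miss falls through to the next level.
levels = [
    ("L1", {}),          # per-core, smallest and fastest
    ("L2", {}),          # per-core (or shared by a core pair on some designs)
    ("L3", {"x": 42}),   # shared by all cores of the CPU
]
DRAM = {"x": 42, "y": 7}

def load(addr):
    """Search L1, L2, L3 in order; fall back to DRAM on a full miss."""
    for name, cache in levels:
        if addr in cache:
            return cache[addr], name
    # Miss in every cache level: fetch from DRAM and fill the caches.
    value = DRAM[addr]
    for _, cache in levels:
        cache[addr] = value
    return value, "DRAM"

print(load("x"))  # served by L3
print(load("y"))  # full miss: served from DRAM, then cached everywhere
print(load("y"))  # now served by L1
```

A real cache would also promote an L3 hit into L1/L2 and evict old lines; the sketch only shows the search order.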



Cache coherence issues

Because the CPU has multiple cores, each with its own private cache (think of it as the core's working memory), two cores that need to process data at the same memory address will each copy that data from memory into their own cache. If the threads running on the two cores both modify the value and then write it back, there is a problem: the cores are independent of each other, so one write-back is bound to overwrite the other (along with the whole family of problems that concurrent reads and writes cause).
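The lost-update scenario just described can be shown with a toy model (the names `memory`, `core_a`, `core_b` are made up for illustration): each "core" copies x into a private dict, increments its copy, and writes back, and one increment disappears.

```python
# Toy illustration of the lost-update problem: two cores each copy x from
# "memory" into a private cache, increment their copy, and write back.
memory = {"x": 0}

core_a = dict(memory)   # core A copies x into its own cache
core_b = dict(memory)   # core B does the same, independently

core_a["x"] += 1        # A increments its private copy -> 1
core_b["x"] += 1        # B increments its private copy -> 1

memory.update(core_a)   # A writes back: memory["x"] == 1
memory.update(core_b)   # B writes back, overwriting A: still 1, not 2

print(memory["x"])  # 1 -- one of the two increments was lost
```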

Solution

Here are several approaches that can be used to deal with the cache coherence issue described above:

Share a single cache

This approach is similar to solving a multi-threading problem with synchronization: if all cores share one cache, then while one core is using a piece of data, the other cores must block and wait, which also hurts CPU efficiency.

Add a lock on the bus

This approach allows only one core at a time to load DRAM data into its cache, without modifying the hardware structure. From how the CPU works, we know that it fetches data from RAM over the bus that connects the two. This method is as bad as the one above: locking the bus blocks even cores that have nothing to do with the contended data from accessing RAM at all. It is effectively a pessimistic lock, and its efficiency is poor.

The cache coherence protocol: MESI

MESI ensures consistency between the caches of a multi-core CPU and memory by locking individual cache lines. The scheme resembles a read-write lock: reads of the same address can proceed concurrently, while writes to the same address are exclusive.

MESI

MESI takes its name from the first letters of the four words Modified, Exclusive, Shared, and Invalid, which represent the four states a cache line can be in:

| State | Description | Snooping task |
| --- | --- | --- |
| Modified (M) | The cache line is valid, but its data has been modified by the current core and is now inconsistent with DRAM. This core's line is set to M, and the copies in all other cores are set to I. | Snoop the bus for any attempt to read this line from DRAM (do not let others read stale data); such a request must be delayed until this core has written the line back to main memory and moved to the S state. |
| Exclusive (E) | The cache line is valid and its data matches DRAM. The data exists only in the current core's cache; this core holds it exclusively. | Snoop the bus for reads of this line from DRAM; as soon as another core reads it, the state must change to S. |
| Shared (S) | The cache line is valid and present in several cores, and every copy matches DRAM. | Listen for events in which another cache invalidates this line or takes it exclusively, and set the local state to I. |
| Invalid (I) | The cache line is invalid; to use the data, the latest copy must be loaded from DRAM. | No snooping required. |
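The four states can be written down as a small enum; a minimal sketch:

```python
from enum import Enum

class MESI(Enum):
    MODIFIED  = "M"  # dirty: this core's copy differs from DRAM
    EXCLUSIVE = "E"  # clean, and only this core holds the line
    SHARED    = "S"  # clean, and other cores may hold the line too
    INVALID   = "I"  # stale: must be re-fetched from DRAM before use

print([s.value for s in MESI])  # ['M', 'E', 'S', 'I']
```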

For now there is no need to digest the "snooping task" column; first we need to understand two concepts:

Cache line

The smallest unit of storage in a CPU cache is called a cache line, typically 64 bytes in size. The four states above must be stored in the cache line itself, so each line needs only 2 bits of state (a flag field; the line also carries a tag field used to locate it).
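To make the tag concrete, here is an illustrative decomposition of an address into offset / index / tag bits, assuming 64-byte lines and 64 sets (both numbers are examples for a set-associative layout, not a description of any particular CPU):

```python
# Illustrative split of a memory address into (tag, set index, byte offset),
# assuming 64-byte cache lines and 64 sets. Example numbers only.
LINE_SIZE = 64          # bytes per cache line -> 6 offset bits
NUM_SETS  = 64          # sets in the cache    -> 6 index bits

def split_address(addr):
    offset = addr % LINE_SIZE                # byte within the line
    index  = (addr // LINE_SIZE) % NUM_SETS  # which set to look in
    tag    = addr // (LINE_SIZE * NUM_SETS)  # compared to locate the line
    return tag, index, offset

print(split_address(0x12345))  # (18, 13, 5)
```

The cache compares the stored tag against the address's tag bits; the state flag lives alongside the tag as per-line metadata.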


Snooping

From the above we know that when a core modifies its own cache line, the change must be propagated to the other cores so they can update their state. Under MESI, therefore, each cache controller not only knows about its own operations but also snoops on the operations of every other cache.

We can summarize each core's cache operations as four events:

  • local read: the CPU core reads from its own local cache
  • local write: the CPU core writes to its own local cache
  • remote read: another CPU core reads, from DRAM, the memory address held in this core's cache line
  • remote write: another CPU core writes, in DRAM, to the memory address held in this core's cache line

Each core's cache snoops these events and updates the flag bits of the affected cache line in its own cache; the CPU then uses this flag to decide how to handle the cached data.
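The snooping idea can be sketched in a few lines (class and method names here are invented for illustration): every cache controller registers with the bus, and each core's operation is broadcast so the *other* controllers can react.

```python
# Simplified sketch of bus snooping: a local write by one core is seen by
# every other cache controller as a remote write on that line.
class Bus:
    def __init__(self):
        self.controllers = []

    def attach(self, controller):
        self.controllers.append(controller)

    def broadcast(self, source, event, line):
        for c in self.controllers:
            if c is not source:          # a core does not snoop itself
                c.snoop(event, line)

class CacheController:
    def __init__(self, name, bus):
        self.name, self.bus, self.seen = name, bus, []
        bus.attach(self)

    def local_write(self, line):
        # A local write appears to every other controller as a remote write.
        self.bus.broadcast(self, "remote_write", line)

    def snoop(self, event, line):
        self.seen.append((event, line))

bus = Bus()
ca, cb = CacheController("CA", bus), CacheController("CB", bus)
ca.local_write("X")
print(cb.seen)  # [('remote_write', 'X')] -- CB observed CA's write
print(ca.seen)  # [] -- CA does not snoop its own operation
```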

MESI state transition diagram

Each of MESI's four states reacts to the four events above by transitioning according to the type of operation:

Here we use two CPU cores, CA (core A) and CB (core B), and a cached line of data X to walk through the transitions shown above, starting from a state where X is present in CA:

State M (Modified): X exists only in CA's cache, and its value differs from the value in RAM.

| Event | Behavior | Next state |
| --- | --- | --- |
| local read | Read directly from CA's cache; the state does not change. | M |
| local write | Modify CA's cached data directly; the state does not change. | M |
| remote read | CB needs the latest data, so CA's value of X is written back to RAM first; CB then reads X from RAM, and the cache lines in both CA and CB are set to S. | S |
| remote write | CA's value of X is written back to RAM first; CB then reads X and modifies it. CA's state becomes I and CB's becomes M. | I |

State S (Shared): both CA and CB hold X, and both copies match the value in RAM.

| Event | Behavior | Next state |
| --- | --- | --- |
| local read | Read directly from CA's cache; the state does not change. | S |
| local write | CA modifies its cached copy directly; CA's state changes to M and CB's becomes I. | M |
| remote read | CB reads the same data as CA; the state stays unchanged. | S |
| remote write | This is the mirror of CA's local write above: CB's line becomes M and CA's becomes I. | I |

State E (Exclusive): only CA holds X, and its value matches RAM (matching means it never needs to be written back).

| Event | Behavior | Next state |
| --- | --- | --- |
| local read | Read directly from CA's cache; the state does not change. | E |
| local write | CA modifies its cached copy directly; the state changes to M. | M |
| remote read | CB issues a read; CA and CB now share X, so the state changes to S. | S |
| remote write | CA sets its copy of X to I. | I |

State I (Invalid): the behavior depends on whether CB holds X, and on the state of CB's copy.

| Event | Behavior | Next state |
| --- | --- | --- |
| local read | If CB does not hold X, CA reads it from RAM and its state becomes E. If CB holds X in state M, CB must write it back to RAM first; CA then reads it, and both lines become S. If CB holds X in state S or E, CA reads it directly and both CA and CB end up in S. | E or S |
| local write | CA must fetch the data from RAM. If CB does not hold X, CA fetches it directly, modifies it, and sets its state to M. If CB holds X in state M, CB writes it back to RAM first; CA then reads the latest value into its cache, modifies it, and sets its state to M, while CB becomes I. If CB holds X in state S or E, CA reads and modifies it, sets its state to M, and CB becomes I. | M |
| remote read | The line is already invalid; other cores' reads and writes do not concern it. The state stays unchanged. | I |
| remote write | Same as above: the state stays unchanged. | I |
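The four tables above can be condensed into a single next-state function. This is a minimal sketch that tracks only the state of the observing core; the write-backs and bus messages from the "Behavior" column are noted in comments, not modeled, and the I-state local events are left out because their result depends on the other cores' states.

```python
# Next-state function for one core's cache line, condensed from the four
# transition tables above. Only the state change itself is modeled.
TRANSITIONS = {
    # (current state, observed event) -> next state
    ("M", "local_read"):   "M",
    ("M", "local_write"):  "M",
    ("M", "remote_read"):  "S",  # after the dirty line is written back to DRAM
    ("M", "remote_write"): "I",
    ("S", "local_read"):   "S",
    ("S", "local_write"):  "M",  # the other sharers are invalidated
    ("S", "remote_read"):  "S",
    ("S", "remote_write"): "I",
    ("E", "local_read"):   "E",
    ("E", "local_write"):  "M",
    ("E", "remote_read"):  "S",
    ("E", "remote_write"): "I",
    ("I", "remote_read"):  "I",  # stale line: others' traffic is irrelevant
    ("I", "remote_write"): "I",
    # I + local_read / local_write are not fixed entries: the result is
    # E or S after a read, and M after a write, depending on other cores.
}

def next_state(state, event):
    return TRANSITIONS[(state, event)]

print(next_state("M", "remote_read"))   # S
print(next_state("E", "local_write"))   # M
print(next_state("S", "remote_write"))  # I
```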

All of this may look complicated, but the principle is actually simple. Once you understand it, you can derive these states yourself, mainly by holding on to a few rules:

  1. When a core wants to read, check whether any other core holds the line in state M. If so, that M copy must be written back to RAM first, and only then is the latest data read. In other words: whenever a read has to go to RAM, no cache line anywhere in the CPU may remain in state M; any such line is flushed to RAM first.
  2. When a core wants to write, more is involved, but the core rule is: at the moment of the write, no other core may hold the line in state M. If one does, it writes back to RAM first; the writer then fetches the latest data and modifies it, and once the modification completes, the copies in all other cores become invalid.
  3. All of these operations guarantee two things: no core's modification can be silently overwritten, and every core's read observes the latest value present anywhere in the CPU's cache hierarchy.
  4. Many of the transitions above are mirror images of each other: for example, a remote read at CA is simply a local read at CB.

Let's briefly walk through the process once more:

CA and CB each cache line X; CB can stand in for any number of other cores.

First, the single-core case with only CA:

  • CA reads X for the first time; X's state is now E.
  • CA modifies the value of X; X's state is now M.
  • If CA writes the value of X back to RAM, X's state returns to E.

Now let CB join in:

  • CA reads X for the first time; X's state is E.
    • CB also reads X, so X must be shared: both CA's and CB's copies become S.
    • CB wants to write X. CB first reads X, so both copies briefly become S; CB then modifies X, its state becomes M, and it sends the modification message onto the bus. When CA snoops it, CA sets its local X to I, because CB now holds the latest data that has not yet reached RAM, so CA can neither use its own stale copy nor fetch a fresh one. (My own understanding; take it with a grain of salt.)
  • CA modifies the value of X; X's state is M.
    • CB wants to read X. The CPU must not serve a read while modified data has not been committed to RAM, so CA first writes X back to RAM; CB then reads X, and when CA snoops the read event, both CA and CB end up in S.
    • CB wants to write X. Since CA modified X before CB, CA must write its copy back to RAM first; CB then re-reads X and modifies it, at which point CA's copy becomes invalid (I) and CB's, now held exclusively and modified, becomes M.
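The walkthrough above can be replayed with a toy two-core model for a single line X (a deliberate simplification: only the per-core states are tracked, and write-backs are treated as instantaneous):

```python
# Minimal two-core MESI model for one cache line X. read()/write() apply
# the rules from the walkthrough: a write invalidates the other copy; a
# read forces a dirty copy to be written back and both copies to share.
class Line:
    def __init__(self):
        self.state = {"CA": "I", "CB": "I"}

    def other(self, core):
        return "CB" if core == "CA" else "CA"

    def read(self, core):
        other = self.other(core)
        if self.state[other] == "M":
            # The other core holds dirty data: write back, then share.
            self.state[other] = "S"
            self.state[core] = "S"
        elif self.state[other] in ("S", "E"):
            self.state[other] = "S"
            self.state[core] = "S"
        elif self.state[core] == "I":
            self.state[core] = "E"   # sole clean copy after a miss
        return self.state[core]

    def write(self, core):
        # Any other copy (dirty ones written back first) becomes invalid.
        self.state[self.other(core)] = "I"
        self.state[core] = "M"
        return self.state[core]

x = Line()
print(x.read("CA"))    # E  -- first reader gets an exclusive clean copy
print(x.read("CB"))    # S  -- a second reader downgrades both to shared
print(x.write("CB"))   # M  -- CB's write invalidates CA's copy
print(x.state["CA"])   # I
```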


Origin blog.csdn.net/wgzblog/article/details/125977687