Multicore Programming: The CPU Cache

Cache Profile

A cache improves read performance. The principle: keep a copy of the most frequently accessed data in a faster storage medium, so that the common case is served quickly and overall access speed improves.

In everyday development, the "cache" we talk about might be a variable or a Redis instance. Inside a computer, however, "CPU cache" usually refers to the CPU's multiple levels of cache.

CPU Cache principle

How the cache works: when the CPU reads data, it looks in the CPU cache first; on a hit, the data is delivered to the CPU immediately. On a miss, the data is fetched from the comparatively slow main memory and delivered to the CPU, and at the same time the block containing it is transferred into the cache, so that subsequent reads of that block come from the cache without touching memory at all. This mechanism gives CPU reads a very high cache hit rate (up to 90% on most CPUs): roughly 90% of the data the CPU reads next is already in the CPU cache, and only about 10% has to come from memory. This saves the CPU the time of reading memory directly and largely removes the wait. In general, the CPU reads the cache first and memory second. (Paraphrased from Wikipedia)

Simplifying the model: when a CPU core wants to access something in memory, the lookup order is:

CPU Core1  --> L1 Cache --> L2 Cache  --> L3 Cache --> RAM

CPU Core2  --> L1 Cache --> L2 Cache  --> L3 Cache --> RAM

Note that in the simple case each CPU core has its own independent multi-level cache; three levels are common. In access speed, L1 > L2 > L3, and capacity is generally inversely related to speed. Put plainly: when you declare a variable int foo = 1; somewhere, the CPU obtains the value of foo from the L1–L3 caches if it is present there, and goes to main memory only after missing in every cache level.

On relatively recent Intel CPU models, the caches are no longer designed to be fully independent: two cores may share a cache level, Intel's "Smart Cache" shared-cache technology.


Cache Line

Cut a cache open lengthwise and you find many cache lines. The cache line is the smallest unit of a cache, usually 64 bytes. If an L1 cache were 6400 bytes, it could be divided into 100 cache lines. In C, the smallest memory units you can perceive are variables — int, long long, and so on — usually only 4 or 8 bytes. For performance, the CPU caches memory one whole cache line at a time, in a single big gulp: one cache line will hold the values of several variables, and if a cache line contains dirty data, the entire line is updated as a unit.

Cache Coherence

The computer must ensure that the data in the cache is always current. If a value in memory changes but the CPU cache is not synchronized in time, the data becomes inconsistent. Under a multi-core architecture, guaranteeing coherence is even more complex: suppose several cores' caches all hold a copy of a variable and one core modifies its cached value — how do the other cores detect this in time and refresh their caches?

The MESI protocol solves cache coherence for multi-core CPUs.

 

MESI (Modified, Exclusive, Shared, Invalid) — excerpted from https://www.cnblogs.com/shangxiaofei/p/5688296.html

MESI is also known as the Illinois protocol, because it was proposed at the University of Illinois. It is a widely used write-back cache coherence protocol, used for example in Intel's Pentium-series CPUs.

MESI protocol states

Each cache line in a CPU cache is labeled with one of four states (tracked with two extra bits):

M: Modified

The cache line is cached only in this CPU's cache and has been modified (dirty), i.e., it is inconsistent with main memory. At some future point — before any other CPU is allowed to read the corresponding main-memory location — the line must be written back (write back) to main memory.

After being written back to main memory, the cache line's state becomes Exclusive.

E: Exclusive

The cache line is cached only in this CPU's cache and is unmodified (clean), consistent with main memory. This state can change to Shared at any moment, when another CPU reads the same memory.

Likewise, when this CPU modifies the contents of the cache line, the state changes to Modified.

S: Shared

This state means the cache line may be held in multiple CPUs' caches, and the data in each cache is consistent with main memory (clean). When one CPU modifies the cache line, the copies in the other CPUs' caches are revoked (become Invalid).

I: Invalid

The cache line is invalid (another CPU may have modified it).

 A brief summary:

The CPU cores have a communication mechanism between them, used to notify other cores that a cache line has expired. Before writing to its cache, a CPU checks which state the cache line is in.

When the cache line is in the Shared state, a write by one core is broadcast to notify the other cores.

This is how cache coherence is maintained.


Cache Miss

We know the CPU cache exists to speed up data reads. If the CPU accesses a memory location and it is not in the cache, we call that a cache miss. A cache miss is painful: the CPU has to spend a long time loading the data from memory into the cache. Typically the L1 miss rate is around 10%. Turn that around and about 90% of accesses hit — an impressive figure.

 

False Sharing

False sharing is MESI's Shared/Invalid transition gone pathological. Consider this scenario:

 

struct {
    int thread1_data;   // only thread 1 reads and writes this
    int thread2_data;   // only thread 2 reads and writes this
};

  

Two threads (thread1 and thread2) each read and write only their own variable. They appear completely independent of each other — but because the two variables sit so close together in memory, they tend to land in the same cache line. Every time thread1 reads and writes thread1_data, the copy of thread2_data cached on thread2's core is marked Invalid, forcing that cache line to be reloaded. As we know, loading memory into the cache is expensive; trigger it too often and performance degrades.

With multi-threaded reads and writes to an array, pay particular attention to the false sharing problem.

The essence of false sharing: variables that look independent at the level of a high-level language happen to have addresses too close together (within one cache line) at the level of the CPU cache, which can only treat the cache line as a whole.


CPU Out-of-Order Execution and the Barrier

Because of cache misses and other time-consuming work, the CPU can do something else while data is being loaded — and as a result the order of instructions gets shuffled.

In short, the CPU executes instructions out of order for better performance and efficiency.

Out-of-order execution happens all around us. A simple example:

a = 1;
b = 2;

 

Under out-of-order execution, it is unknown which of a and b is assigned first. If multiple threads depend on that order without a lock, this causes very serious problems.

A barrier instruction solves the out-of-order problem: it tells the CPU not to reorder across certain points. This is a low-level instruction; users of high-level languages should instead reach for locks and atomic operations.


Origin blog.csdn.net/kebu12345678/article/details/103990909