Memory barriers ensure cache coherence

 In the previous article on memory system reordering, it was mentioned that "the write buffer is not flushed to memory in time, resulting in different values in the caches of different processors". This situation is bad. Fortunately, processors follow a cache coherence protocol, which guarantees sufficient visibility without sacrificing too much performance.

 The cache coherence protocol defines four states for a cache line (usually 64 bytes): exclusive, shared, modified, and invalid, which describe whether the cache line is shared or modified across processors. For this reason the cache coherence protocol is also called the MESI protocol.

  • Exclusive: only the current processor holds the cache line; it has not been modified and holds the latest value.
  • Shared: multiple processors hold the cache line; none of them has modified it and each copy holds the latest value.
  • Modified: the cache line has been modified and must be written back to main memory, and the other owners are notified that "this cache line is now invalid".
  • Invalid: the cache line has been modified by another processor; its value is stale, and the latest value must be read from main memory.
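As an illustration, the four states and the transitions described above can be sketched as a toy state machine. This is a simplification with invented class and method names, not how real hardware is implemented:

```java
// Toy model of the MESI states for one cache line on one processor.
enum State { MODIFIED, EXCLUSIVE, SHARED, INVALID }

class CacheLine {
    State state = State.INVALID;

    // This processor reads a line it does not hold: the line is fetched and
    // becomes SHARED if another cache also holds it, otherwise EXCLUSIVE.
    void localRead(boolean othersHoldLine) {
        if (state == State.INVALID)
            state = othersHoldLine ? State.SHARED : State.EXCLUSIVE;
    }

    // This processor writes: invalidation messages go out to the other
    // owners, and the local copy becomes MODIFIED.
    void localWrite() {
        state = State.MODIFIED; // the other copies must be invalidated
    }

    // Another processor wrote this line: our copy is now stale.
    void remoteWrite() {
        state = State.INVALID;
    }

    // Another processor reads a line we modified: we write it back to main
    // memory and both copies become SHARED.
    void remoteRead() {
        if (state == State.MODIFIED || state == State.EXCLUSIVE)
            state = State.SHARED; // write-back to main memory happens here
    }
}
```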

Optimization

 Moving a line to the modified state is a time-consuming operation: the processor must not only send invalidation messages to the other owners and write the line back to main memory, but also wait until every owner has processed the invalidation and sent back an acknowledgement. It would be a waste for the processor to sit idle during this time. So store buffers and invalidation queues are introduced to keep the processor from waiting.

Store buffer

 Store buffers are also known as write buffers. When the processor modifies a cache line, it places the new value into the store buffer and moves on to other work, leaving the rest to the store buffer.
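A minimal sketch of the idea, modeling the store buffer as a FIFO of (address, value) pairs. The class and method names are made up for illustration:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Toy store buffer: writes are queued instead of waiting for invalidation
// acknowledgements; the processor's own later reads check the buffer first
// (this is called store forwarding).
class StoreBufferedCpu {
    private final Map<String, Integer> mainMemory;
    private final ArrayDeque<Map.Entry<String, Integer>> storeBuffer = new ArrayDeque<>();

    StoreBufferedCpu(Map<String, Integer> mainMemory) {
        this.mainMemory = mainMemory;
    }

    void write(String addr, int value) {
        // Don't wait for the other caches to acknowledge: buffer the store.
        storeBuffer.addLast(Map.entry(addr, value));
    }

    int read(String addr) {
        // Store forwarding: the newest buffered write wins over memory.
        var it = storeBuffer.descendingIterator();
        while (it.hasNext()) {
            var e = it.next();
            if (e.getKey().equals(addr)) return e.getValue();
        }
        return mainMemory.getOrDefault(addr, 0);
    }

    // A store memory barrier would call this: drain the buffer to memory.
    void flush() {
        while (!storeBuffer.isEmpty()) {
            var e = storeBuffer.pollFirst();
            mainMemory.put(e.getKey(), e.getValue());
        }
    }
}
```

Note that until `flush()` runs, the writing processor sees its own new value while main memory (and every other processor) still sees the old one.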

Invalidation queue

 Handling invalidations is also not trivial, since it may require reading from main memory. And the store buffer is not unlimited, so when it fills up the processor still has to wait for invalidation acknowledgements. To solve these two problems, an invalidation queue (invalidate queue) is introduced.

 Invalidation messages are handled as follows:

  1. When an invalidation message is received, it is put into the invalidation queue.
  2. To keep the sender from waiting a long time for the acknowledgement, the acknowledgement is sent back immediately after the message is received.
  3. To avoid blocking the processor frequently, main memory is not read immediately and the cache line is not marked invalid right away; the invalidation queue is processed in a batch at an appropriate time.
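The three steps above can be sketched as follows (the class and method names are invented for illustration):

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Set;

// Toy invalidation queue: invalidation messages are acknowledged immediately
// and applied lazily; a load memory barrier forces them to be applied first.
class InvalidationQueueCpu {
    private final Set<String> validLines = new HashSet<>();
    private final ArrayDeque<String> invalidateQueue = new ArrayDeque<>();

    void cacheLine(String addr) {
        validLines.add(addr);
    }

    // Steps 1 and 2: enqueue the message and reply with an ack right away.
    boolean receiveInvalidate(String addr) {
        invalidateQueue.addLast(addr);
        return true; // immediate acknowledgement, the sender is unblocked
    }

    // Step 3: processed later; a load barrier forces this to happen now.
    void drainInvalidations() {
        while (!invalidateQueue.isEmpty())
            validLines.remove(invalidateQueue.pollFirst());
    }

    boolean isStale(String addr) {
        // Until the queue is drained, the stale copy still looks valid.
        return !validLines.contains(addr);
    }
}
```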

Trigger memory reordering

 The following is a timing diagram of processors A and B writing and then reading memory location a. Both A and B have a cached.

 It can be seen that even when the cache coherence protocol is followed, there is a window of time during which the caches are inconsistent (steps ①-⑥).

 If the read of a falls within this window, processor B will see a as 0. The processors executed write a before read a, but in memory the effective order is read a before write a, producing a reordering. Reordering can lead to invisibility: if threads A and B run on processors A and B respectively, then after thread A performs its write, thread B cannot see the result of thread A's execution, as if the write to shared memory a never happened. In other words, the result of the program has been changed.
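Under the assumptions of the diagram (processor A's write to a is still sitting in its store buffer when its write to finish reaches main memory), the inconsistency window can be mimicked with a trivial simulation. The names `ReorderSim`, `mainMemory`, and `storeBufferA` are invented for this sketch:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal simulation of the window in the diagram: processor A's write to a
// stays in its store buffer while its write to finish goes straight to main
// memory, so processor B observes finish == 1 but a == 0.
public class ReorderSim {
    static Map<String, Integer> mainMemory = new HashMap<>();
    static Map<String, Integer> storeBufferA = new HashMap<>();

    static void run() {
        mainMemory.put("a", 0);
        mainMemory.put("finish", 0);

        // Processor A: "a = 1" is buffered, not yet visible to other
        // processors; "finish = 1" hits memory directly (say A already owns
        // that cache line exclusively).
        storeBufferA.put("a", 1);
        mainMemory.put("finish", 1);
    }

    public static void main(String[] args) {
        run();
        // Processor B reads main memory: the writes appear in reverse order.
        System.out.println("finish=" + mainMemory.get("finish")
                + " a=" + mainMemory.get("a"));
    }
}
```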

Avoid memory reordering

 Triggering a reordering is bad: it can make writes to shared memory invisible and change the result of the program. So what can be done without giving up the MESI optimizations? We want neither to pursue performance at the cost of reordering, nor to pursue visibility (non-shared data does not need visibility) at the cost of performance.

 For this, the processor provides a weapon: memory barrier instructions (Memory Barrier):

  1. Write memory barrier (Store Memory Barrier): the processor blocks until the values in the store buffer have been written back to main memory.
  2. Read memory barrier (Load Memory Barrier): the processor blocks until the invalidation queue has been processed.

 It can be seen that memory barriers prevent the memory system from reordering and guarantee visibility. But their overhead is also large, since the processor has to block and wait; they are generally used when acquiring and releasing locks.

In the earlier example of processors A and B writing and reading memory a in turn, after adding memory barriers the operations will no longer be reordered:

boolean finish = false;
int a = 0;

// Processor A:
a = 1;
storeMemoryBarrier(); // ensure a has reached main memory and a is invalid in processor B's cache
finish = true;

// Processor B:
while (!finish);
loadMemoryBarrier(); // ensure the latest value of a is loaded; afterwards a's cache line is shared
assert a == 1;
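In real Java (9+), the barrier pseudocode above can be approximated with the fence methods on java.lang.invoke.VarHandle. This is a sketch: spinning on a plain (non-volatile) field like this is not guaranteed portable, and the fences here map to release/acquire semantics rather than the exact hardware instructions:

```java
import java.lang.invoke.VarHandle;

public class FenceDemo {
    static int a = 0;
    static boolean finish = false;

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            a = 1;
            VarHandle.releaseFence();  // store barrier: orders a = 1 before finish = true
            finish = true;
        });
        Thread reader = new Thread(() -> {
            while (!finish) {          // spin until the flag becomes visible
                VarHandle.acquireFence();
            }
            VarHandle.acquireFence();  // load barrier: later reads see writes before the releaseFence
            if (a != 1) throw new AssertionError("reordering observed");
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();
        System.out.println("a = " + a);
    }
}
```

In practice, Java code would normally declare `finish` as `volatile` (or use `AtomicBoolean`), which inserts the appropriate barriers automatically.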

Abstract Memory Barrier in JMM

 

 To better understand how the visibility of synchronization is implemented, the JMM abstracts the concept of the memory barrier (Memory Barrier).
Memory barriers
