Java Multi-threading and High Concurrency, Advanced Series (1) - An Analysis of the volatile Implementation Principle

We know that in the JVM's class-loading mechanism, Java source files (.java) are compiled into bytecode files (.class), which the JVM then loads and executes. The concurrency mechanisms we discuss here are, of course, inseparable from the JVM's implementation and the CPU's instruction set.

Anyone familiar with the JMM (Java Memory Model) knows that it revolves around atomicity, ordering, and visibility. Later I will write a dedicated article explaining the Java memory model, its relationship to the processor memory model, and its relationship to the sequentially consistent memory model.

1. The definition of volatile

In the Java Language Specification, volatile is defined as follows:

The Java programming language allows threads to access shared variables. To ensure that shared variables are read accurately and updated consistently, a thread would normally have to acquire an exclusive lock around each access. As a lighter-weight alternative, the Java language provides volatile: if a variable is declared volatile, the Java memory model guarantees that all threads (each with its own working memory, also called the thread's local memory) see a consistent value for that variable. In other words, volatile ensures the visibility of shared variables across threads.
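As a concrete illustration of this visibility guarantee, here is a minimal sketch (the class and field names are invented for the example): one thread spins on a flag until the main thread sets it. With a plain boolean the spinning thread might never observe the write; declaring the flag volatile guarantees it does.

```java
public class VisibilityDemo {
    // volatile guarantees that the write in main() becomes visible to the
    // reader thread; with a plain boolean, the JIT could hoist the read
    // out of the loop and spin forever.
    static volatile boolean stop = false;
    static volatile boolean workerFinished = false;

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (!stop) {
                // busy-wait on the shared flag
            }
            workerFinished = true;  // only reached once the write is seen
        });
        worker.start();
        Thread.sleep(100);          // let the worker start spinning
        stop = true;                // volatile write: propagated to other threads
        worker.join(2000);
        System.out.println("worker finished: " + workerFinished);
    }
}
```

If the volatile modifier is removed from stop, the worker thread may spin forever on some JVMs, because nothing forces it to re-read the flag from main memory.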


2. The implementation principle of volatile

Before digging into volatile, we need to understand a few technical terms used when talking about CPUs.

(Table of CPU terminology from the original article.)

With those terms in mind, let's look at how volatile guarantees visibility.

When a write is performed on a shared variable declared volatile, the generated assembly contains an extra lock-prefixed instruction: lock addl $0x0,(%esp)


The lock prefix instruction causes two things on multi-core processors:

① It writes the data of the current processor's cache line back to system memory (main memory, in JMM terms).

② This write-back invalidates the copies of that memory address cached by other processors (CPUs).

Let's unpack this.

To improve processing speed, the processor does not talk to main memory directly; it first reads data from main memory into its internal cache (L1, L2, or another level) and operates there. For a write to a variable declared volatile, however, the JVM sends the processor a lock-prefixed instruction (the lock addl shown above), which writes the cache line holding the variable back to main memory.

Writing back is not enough on its own: if other processors still held an old value of the shared variable in their caches, subsequent operations would go wrong. Therefore, on multiprocessor systems, a cache coherence protocol keeps the cached copies consistent. Each processor sniffs the data propagated on the bus to check whether its own cache line has become stale; if so, it marks the cache line holding the shared variable as invalid. The next time that processor needs the data, it must re-read the shared variable from main memory, and at that point it of course picks up the value the other processor just wrote.
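You can observe the lock-prefixed instruction yourself. The sketch below (the class name is mine) performs volatile writes in a loop; running it with HotSpot's diagnostic assembly printer, assuming the hsdis disassembler plugin is installed, shows a lock-prefixed instruction near each volatile store.

```java
// Run with:
//   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly VolatileWriteDemo
// (requires the hsdis disassembler plugin on the JVM's library path)
// and look for a lock-prefixed instruction such as
//   lock addl $0x0,(%rsp)
// emitted after the store to `value` on x86-64.
public class VolatileWriteDemo {
    static volatile long value;

    public static void main(String[] args) {
        for (int i = 0; i < 1_000_000; i++) {
            value = i;  // volatile store: triggers the write-back described above
        }
        System.out.println("final value = " + value);
    }
}
```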

 

3. The two effects of the lock prefix instruction in detail

In section 2 we listed the two things the lock-prefixed instruction causes on multiprocessors. Let's look at each of them in detail.

① The lock prefix instruction will cause the processor cache to be written back to memory

On older processors, executing a lock-prefixed instruction asserts a LOCK# signal, which gives the processor exclusive access to shared memory (that is, main memory) for the duration of the instruction. While one processor asserts the LOCK# signal on the bus, requests from other processors are blocked. This approach is called "bus locking".

On recent processors, however, the LOCK# signal generally no longer locks the bus, because bus locking is too expensive; the processor locks the cache instead. If the memory being accessed is already cached inside the processor, the processor locks the cache line holding that memory region, writes the cache line back to shared memory (that is, main memory), and relies on the cache coherence protocol to guarantee the atomicity of the modification. This process is called "cache locking".
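This lock prefix is also what gives the java.util.concurrent atomic classes their atomicity: on x86, AtomicInteger.incrementAndGet() compiles down to a lock-prefixed instruction. The sketch below (class and field names are mine) contrasts it with a plain volatile increment, which is a non-atomic read-modify-write and can lose updates:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LockPrefixDemo {
    static volatile int plain = 0;                           // visible, but ++ is not atomic
    static final AtomicInteger atomic = new AtomicInteger(); // uses a lock-prefixed instruction

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                plain++;                  // read-modify-write: two threads can interleave
                atomic.incrementAndGet(); // atomic thanks to the lock prefix / cache locking
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join();  t2.join();
        System.out.println("plain  = " + plain);        // often less than 200000
        System.out.println("atomic = " + atomic.get()); // always 200000
    }
}
```

The volatile counter shows that visibility alone does not make a compound operation like ++ safe; atomicity requires the locked instruction.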

One thing to note here: cache locking forbids simultaneous modification of data in a memory region that is cached by two or more processors.

② Writing one processor's cache back to main memory invalidates the caches of other processors

Processors use the MESI (Modified, Exclusive, Shared, Invalid) protocol to maintain coherence between their own internal caches and the caches of other processors. In a multi-core system, each processor can sniff the accesses that other processors make to main memory and to their internal caches; using this sniffing technique, a processor keeps its internal cache, main memory, and the other processors' caches consistent with the data on the bus.

For example, on Pentium and P6 family processors, if one processor sniffs that another processor intends to write to a memory address, and it holds that address in a cache line in the shared state, the sniffing processor invalidates that cache line in its own cache; its next access to that address must then re-read the data.
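As a rough mental model only (real hardware is considerably more involved), the MESI transitions behind the sniffing scenario above can be sketched as a pair of state functions; all names here are invented for the illustration:

```java
public class MesiSketch {
    enum State { MODIFIED, EXCLUSIVE, SHARED, INVALID }

    // When this cache sniffs a *write* to the same address by another
    // processor, any valid copy must be invalidated, as described above.
    static State onRemoteWrite(State s) {
        return State.INVALID;
    }

    // When it sniffs a *read* by another processor, a MODIFIED copy
    // (written back first) or an EXCLUSIVE copy drops to SHARED;
    // SHARED and INVALID copies are unchanged.
    static State onRemoteRead(State s) {
        switch (s) {
            case MODIFIED:
            case EXCLUSIVE:
                return State.SHARED;
            default:
                return s;
        }
    }

    public static void main(String[] args) {
        System.out.println(onRemoteWrite(State.SHARED));   // INVALID
        System.out.println(onRemoteRead(State.EXCLUSIVE)); // SHARED
    }
}
```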


To wrap things up, the original article closes with a schematic diagram of the JMM memory structure.
