[Concurrency Topic] In-depth understanding of concurrent visibility, orderliness, atomicity and JMM memory model

Pre-knowledge

You should have a basic understanding of computer organization, especially the three-level CPU cache structure; see "[Concurrent Topic] Operating System Model and L3 Cache Architecture".

Course content

1. JMM model

1. What is the JMM model

The Java Memory Model (JMM) is an abstract concept that does not physically exist. It describes a set of rules, or a specification, that defines how the various variables in a program (instance fields, static fields, and the elements of array objects) may be accessed.
We know that when the JVM runs a Java program, what it really runs are threads, and when each thread is created the JVM creates a working memory (the thread stack) for it to hold thread-private data. The Java memory model stipulates that all variables are stored in main memory, a shared region that every thread can access; however, a thread's operations on a variable (read, assign, and so on) must take place in its working memory. The thread first copies the variable from main memory into its own working memory, operates on the copy there, and then writes the variable back to main memory; it may not manipulate the variable in main memory directly. The working memory therefore holds copies of main-memory variables. Since working memory is each thread's private data area, threads cannot access one another's working memory, and communication between threads (passing values) must go through main memory.

2. JMM memory area model

The JMM memory area model is different from the JVM runtime memory area model; the two divisions belong to different conceptual levels. JMM is organized around atomicity, ordering, and visibility. The only similarity between the two is the logical notion of a shared data area and a private data area: in JMM, main memory is the shared data area (corresponding to the heap and method area in the JVM), while working memory is the thread-private data area (corresponding to the thread stack in the JVM).
The working interaction diagram of threads, working memory and main memory is as follows (based on the JMM specification):
(Figure: interaction between threads, working memory, and main memory)

  • Main memory: mainly stores Java instance objects. Instance objects created by any thread live in main memory, regardless of whether they are assigned to a member variable or to a local variable of a method; main memory also holds shared class information, constants, and static variables. Because it is a shared data area, multiple threads accessing the same variable there may run into thread-safety issues.
  • Working memory: mainly stores the local-variable information of the currently executing method (copies of main-memory variables live here), and each thread can access only its own working memory. In other words, a thread's local variables are invisible to other threads; even if two threads execute the same piece of code, each creates its own local variables in its own working memory. Working memory also holds the bytecode line-number indicator and related native-method information. Because working memory is private to each thread, threads cannot access each other's working memory, so the data stored there has no thread-safety issues.
    According to the JVM specification's rules for how main memory and working memory store and operate on data: for a member method of an instance object, a local variable of a primitive type (boolean, byte, short, char, int, long, float, double) is stored directly in the stack-frame structure of the working memory, whereas if the local variable is of a reference type, the reference itself is stored in the stack frame of the working memory while the object instance lives in main memory (the shared data area, that is, the heap). Member variables of an instance object, whether of a primitive type, a wrapper type (Integer, Double, etc.), or a reference type, are stored in the heap. Static variables and information about the class itself are also kept in main memory. Note that an instance object in main memory can be shared by multiple threads: if two threads call the same method of the same object at the same time, each copies the data it operates on into its own working memory and flushes it back to main memory only after the operation completes. The sketch after this list illustrates this mapping.
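A minimal sketch of the storage mapping just described (the comments state the assumed locations; the exact layout is a JVM implementation detail):

public class StorageExample {

    static int counter = 0;          // static variable: main memory (method area/heap)
    int member = 1;                  // primitive member variable: heap (main memory), shared
    Integer boxedMember = 2;         // wrapper-typed member: the Integer object is also on the heap

    void work() {
        int local = 3;               // primitive local: this thread's stack frame (working memory)
        Object ref = new Object();   // 'ref' lives in the stack frame; the Object lives on the heap
    }
}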

The model is shown in the figure below:
(Figure: the JMM main memory / per-thread working memory model)

3. The relationship between the JMM memory model and the hardware memory architecture

From the earlier discussion of the hardware memory architecture, the Java memory model, and how Java implements multithreading, we should have realized that multithreaded execution is ultimately mapped onto the hardware processor, but the Java memory model and the hardware memory architecture do not correspond exactly.
Hardware memory knows only registers, caches, and main memory; it makes no distinction between working memory and main memory. In other words, the Java memory model's division of memory has no substantive effect on hardware memory, because JMM is merely an abstract concept, a set of rules that does not physically exist. Whether data is "in working memory" or "in main memory" at the JMM level, from the hardware's point of view it all resides in the computer's main memory and, as the code executes, may flow into CPU caches or registers. In general, then, the Java memory model and the computer hardware memory architecture form a cross-cutting relationship: an abstract conceptual division laid over real physical hardware:
(Figure: overlap between the JMM abstraction and the hardware memory architecture)

4. The necessity of the existence of JMM

Having clarified the relationships among the Java memory-area division, the hardware memory architecture, Java's multithreading implementation, and the Java memory model, let's talk about why the Java memory model needs to exist.
Since what actually runs a Java program is threads, and the JVM creates a working memory (thread stack) for each thread it creates to hold thread-private data, a thread's operations on variables must go through its working memory indirectly: the variable is copied from main memory into the thread's working memory, operated on there, and written back to main memory when the operation completes. If two threads operate on the same instance-object variable in main memory at the same time, thread-safety problems can arise.

For example:
Suppose there is a shared variable x = 1 in main memory, and two threads A and B each operate on x, each holding a copy of x in its own working memory.
Now thread A wants to change x to 2, while thread B wants to read x. Will B read the value 2 written by A, or the value 1 from before the update? The answer: it is not determined.
Thread B may read either 1 or 2. Because working memory is private to each thread, when thread A operates on x it first copies the variable from main memory into its own working memory, modifies the copy, and only then writes x back to main memory; thread B does the same when reading. This can create a consistency problem between main memory and working memory. If thread A is still in the middle of writing the modified value back while thread B reads from main memory, B copies x = 1 into its working memory and reads 1; if B starts reading only after A has written x = 2 back to main memory, B reads 2. Which happens first cannot be determined.
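A minimal sketch of this scenario in code (the class and thread names are illustrative; the output genuinely depends on scheduling):

public class SharedVariableRace {

    static int x = 1; // shared variable in main memory; deliberately not volatile

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(() -> x = 2, "threadA"); // writer
        Thread b = new Thread(() -> System.out.println("threadB read x = " + x), "threadB"); // reader
        a.start();
        b.start();
        a.join();
        b.join();
        // Output is nondeterministic: "threadB read x = 1" or "threadB read x = 2"
    }
}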
As shown in the example diagram below:
(Figure: threads A and B exchanging the value of x through main memory)
The specific interaction protocol between main memory and working memory, that is, the implementation details of how a variable is copied from main memory into working memory and how it is synchronized from working memory back to main memory, is defined by the Java memory model through the following eight operations.

5. Eight atomic operations for data synchronization

  1. lock: acts on a main-memory variable; marks the variable as exclusively owned by one thread.
  2. unlock: acts on a main-memory variable; releases a variable that is in the locked state so that other threads can lock it.
  3. read: acts on a main-memory variable; transfers the variable's value from main memory to the thread's working memory for the subsequent load action.
  4. load: acts on a working-memory variable; puts the value obtained by the read operation into the working-memory copy of the variable.
  5. use: acts on a working-memory variable; passes the value of a working-memory variable to the execution engine.
  6. assign: acts on a working-memory variable; assigns a value received from the execution engine to a working-memory variable.
  7. store: acts on a working-memory variable; transfers the value of a working-memory variable to main memory for the subsequent write operation.
  8. write: acts on a main-memory variable; puts the value transferred by the store operation into the main-memory variable.

To copy a variable from main memory into working memory, read and load must be performed in that order; to synchronize a variable from working memory back to main memory, store and write must be performed in that order. However, the Java memory model only requires that these operations be performed in order; it does not require that they be performed consecutively (other instructions may be interleaved between them).
(Figure: the eight atomic operations between main memory and working memory)
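As an illustrative sketch (the mapping in the comments is conceptual; the JVM chooses the actual sequence), here is how a simple increment decomposes into these operations:

public class EightOpsTrace {

    static int x = 5; // shared variable, lives in main memory

    public static void main(String[] args) {
        // Conceptually, the statement below makes the executing thread perform:
        //   read(x)   - transfer x's value from main memory toward this thread
        //   load(x)   - place that value into this thread's working-memory copy
        //   use(x)    - hand the copy's value to the execution engine (to compute x + 1)
        //   assign(x) - receive the computed value back into the working-memory copy
        //   store(x)  - transfer the copy's value toward main memory
        //   write(x)  - put the transferred value into x in main memory
        x = x + 1;
        System.out.println(x); // prints 6
    }
}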

6. Instruction reordering, and the visibility, atomicity, and ordering of concurrent programming

Instruction reordering

To understand the ordering aspect of concurrent programming, you must first understand instruction reordering. What is it, and why does it exist? In short, to optimize the execution process and improve efficiency: a Java program's instructions may be reordered during execution, and even the CPU itself reorders instructions. For the deeper hardware details, please search elsewhere; the author's knowledge of computer architecture is only general.
Here is a classic reordering scenario. For new Object(), we would all expect the semantics to be:

memory = allocate();   // 1. allocate memory for the object
instance(memory);      // 2. initialize the object
instance = memory;     // 3. point instance at the allocated memory; from here on, instance != null

But in fact, the JIT may reorder it into the following steps:

memory = allocate();   // 1. allocate memory for the object
instance = memory;     // 3. point instance at the allocated memory; from here on, instance != null
instance(memory);      // 2. initialize the object

This is exactly why the double-checked-locking way of writing a singleton is not thread-safe without volatile: reordering!

Visibility, atomicity and ordering

  1. Atomicity: an operation is uninterruptible; even in a multithreaded environment, once it starts it cannot be interfered with by other threads.
    In Java, reads and assignments of primitive-type variables are atomic operations (byte, short, int, float, boolean, char). Note, however, that on 32-bit systems reads and writes of long and double are not atomic: each atomic access on a 32-bit VM is 32 bits wide, while long and double occupy 64-bit storage units, so a write can be split in two, and a reading thread may observe only half of the value, a "half variable" that is neither the original value nor the newly written one. Don't worry too much, though: reading "half a variable" is rare, and in today's commercial virtual machines reads and writes of 64-bit data are almost always performed as atomic operations; it is enough to know what is going on.
  2. Ordering: for single-threaded code we always assume statements execute in program order, and within a single thread that view is correct. In a multithreaded environment, however, out-of-order effects can appear, because after the program is compiled into machine instructions those instructions may be reordered, and the reordered sequence need not match the original. The rule of thumb: within a thread, all operations appear ordered; when one thread observes another, all of that thread's operations appear unordered. The first half refers to the consistency of serial semantics within a single thread; the second half refers to instruction reordering plus the synchronization delay between working memory and main memory (see the litmus-test sketch after this list).
  3. Visibility: once instruction reordering is understood, visibility is straightforward. Visibility means: when one thread modifies the value of a shared variable, can other threads immediately see the new value? For a serial program the question does not arise: whenever we modify a variable, any subsequent read sees the newly modified value.
    In a multithreaded environment this is no longer guaranteed. As analyzed before, threads operate on copies of shared variables in their own working memory and write them back to main memory afterward, so thread A may have modified shared variable x but not yet written it back when thread B operates on x in main memory; A's working-memory value of x is simply not visible to B. This synchronization delay between working memory and main memory causes visibility problems. In addition, instruction reordering and compiler optimizations can also cause visibility problems: as we saw, both compiler and processor reordering can, in a multithreaded environment, lead to code appearing to execute out of order, and hence to visibility issues.
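A minimal sketch of the classic litmus test for these effects (the class name is illustrative; reproducing the result may take many iterations and depends on the JVM and hardware):

public class ReorderingLitmus {

    static int a, b, x, y;

    public static void main(String[] args) throws InterruptedException {
        for (long i = 1; ; i++) {
            a = 0; b = 0; x = 0; y = 0;
            Thread t1 = new Thread(() -> { a = 1; x = b; });
            Thread t2 = new Thread(() -> { b = 1; y = a; });
            t1.start(); t2.start();
            t1.join();  t2.join();
            // Under strictly sequential execution at least one of x, y must be 1;
            // observing x == 0 && y == 0 means a write was reordered after a read
            // (or was not yet visible to the other thread).
            if (x == 0 && y == 0) {
                System.out.println("x == 0 && y == 0 at iteration " + i);
                return;
            }
        }
    }
}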

7. How does JMM solve the problems of atomicity, visibility, and ordering?

Atomicity

In addition to the atomicity of reading and writing operations on basic data types provided by the JVM itself, atomicity can be achieved through synchronized and Lock. Because synchronized and Lock can guarantee that only one thread accesses the code block at any one time.
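A minimal sketch of both approaches (class and field names are illustrative):

import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class AtomicCounter {

    private int count1;
    private int count2;
    private final Lock lock = new ReentrantLock();

    // synchronized: only one thread at a time may run this method on a given instance
    public synchronized void incrementWithSynchronized() {
        count1++;
    }

    // Lock: the same mutual exclusion, with explicit lock/unlock
    public void incrementWithLock() {
        lock.lock();
        try {
            count2++;
        } finally {
            lock.unlock(); // always release in finally
        }
    }
}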

Visibility

The volatile keyword guarantees visibility. When a shared variable is modified with volatile, a write to it is flushed to main memory immediately, and when other threads need to read the variable they fetch the new value from main memory, so the modified value is immediately visible to them. synchronized and Lock also guarantee visibility, because they ensure that only one thread accesses the shared resource at a time and that modified variables are flushed to main memory before the lock is released.

As an aside, there are in fact two underlying mechanisms that ensure visibility:

  1. Memory barriers (AQS-based locks, volatile, and synchronized are all implemented on top of memory barriers; memory barriers are introduced in detail below)
  2. Context switches

Ordering

In Java, the volatile keyword can be used to guarantee a certain degree of ordering (the underlying principle is described in the section on the volatile keyword below). In addition, synchronized and Lock can guarantee ordering: they ensure that only one thread executes the synchronized code at any moment, which amounts to making threads execute that code serially, and serial execution is naturally ordered.
Java memory model: each thread has its own working memory (similar to the caches discussed earlier). All of a thread's operations on variables must take place in working memory; a thread cannot operate on main memory directly, nor access the working memory of other threads. The Java memory model also has some innate "ordering", that is, ordering that holds without any explicit means, usually called the happens-before principle. If the execution order of two operations cannot be deduced from the happens-before principle, their order is not guaranteed, and the virtual machine may reorder them at will.
Instruction reordering: the Java language specification requires that JVM threads preserve sequential semantics within a thread. That is, as long as the program's final result equals the result of strictly sequential execution, the instructions may execute in an order different from the code order; this process is called instruction reordering. What is it good for? The JVM can reorder machine instructions to suit the characteristics of the processor (multi-level CPU caches, multiple cores, and so on), letting the machine code better match the CPU's execution characteristics and squeezing the most out of the hardware.
The following figure is a schematic diagram of the sequence of instructions from source code to final execution:
(Figure: instruction sequence from source code to final execution)

8. as-if-serial semantics

The as-if-serial semantics say: no matter how much reordering is done (by compilers and processors, to improve parallelism), the execution result of a single-threaded program must not change. The compiler, the runtime, and the processor must all obey as-if-serial semantics.
To comply with as-if-serial semantics, the compiler and the processor do not reorder operations that have data dependencies between them, because such reordering would change the execution result. Operations with no data dependency between them, however, may be reordered by compilers and processors.
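A minimal sketch (a commonly used textbook example): statements A and B are independent and may be reordered with each other, while C depends on both and must stay last:

public class AsIfSerial {

    public static void main(String[] args) {
        double pi = 3.14;         // A
        double r = 1.0;           // B: no dependency on A, so A and B may be reordered
        double area = pi * r * r; // C: depends on A and B, so it cannot move before them
        System.out.println(area); // always prints 3.14, regardless of any A/B reordering
    }
}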

9. happens-before principle

If we relied only on the synchronized and volatile keywords to ensure atomicity, visibility, and ordering, writing concurrent programs could seem very troublesome. Fortunately, starting with JDK 5 Java adopted the new JSR-133 memory model, which provides the happens-before principle to help guarantee the atomicity, visibility, and ordering of program execution; it is the basis for judging whether data races exist and whether threads are safe. The happens-before rules are as follows:

  1. Program order rule: within a single thread, semantic seriality must be preserved; that is, code executes in program order.
  2. Lock rule: an unlock operation happens-before every subsequent lock operation on the same lock; that is, if a lock is unlocked and then locked again, the lock must come after the unlock (for the same lock).
  3. Volatile rule: a write to a volatile variable happens-before every subsequent read of it, which guarantees the visibility of volatile variables. Put simply, every time a thread reads a volatile variable it is forced to fetch the latest value from main memory, and every write forces the latest value to be flushed to main memory, so at any moment different threads always see the variable's latest value.
  4. Thread start rule: a thread's start() method happens-before every action of the started thread; that is, if thread A modifies a shared variable before starting thread B, that modification is visible to thread B once B runs (see the sketch after this list).
  5. Transitivity: if A happens-before B and B happens-before C, then A happens-before C.
  6. Thread termination rule: all operations of a thread happen-before the detection of that thread's termination. Thread.join() waits for the target thread to terminate; if thread B modifies a shared variable before terminating, then after thread A returns from B's join() method, B's modification of the shared variable is visible to A.
  7. Thread interruption rule: a call to a thread's interrupt() method happens-before the interrupted thread's code detects the interrupt (which can be checked with Thread.interrupted()).
  8. Object finalization rule: the completion of an object's constructor happens-before the start of its finalize() method.
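A minimal sketch of rules 4 and 6 (the class name is illustrative): the write to shared happens-before t.start(), so the started thread is guaranteed to see it even without volatile or locks; symmetrically, t.join() makes t's writes visible afterward.

public class StartRuleDemo {

    static int shared = 0; // deliberately not volatile: visibility comes from start()/join()

    public static void main(String[] args) throws InterruptedException {
        shared = 42; // happens-before t.start()
        Thread t = new Thread(() -> System.out.println("saw " + shared)); // always prints "saw 42"
        t.start();
        t.join(); // thread termination rule: t's writes are visible from here on
    }
}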

Two, volatile memory semantics

Volatile is a lightweight synchronization mechanism provided by the Java virtual machine. The volatile keyword has the following two functions:

  • It guarantees that a shared variable modified with volatile is visible to all threads; that is, when one thread modifies the value of such a variable, the new value is always immediately visible to other threads.
  • Disable instruction reordering optimizations.

1. The visibility of volatile

Regarding the visibility of volatile, the key point is that a variable modified with volatile is immediately visible to all threads: every write to a volatile variable is always reflected in other threads right away.
Code example (after thread A changes the initFlag field, thread B perceives it immediately):

public class VolatileVisibilitySample {

    volatile boolean initFlag = false;

    public void save() {
        this.initFlag = true;
        String threadname = Thread.currentThread().getName();
        System.out.println("Thread " + threadname + ": modified shared variable initFlag");
    }

    public void load() {
        String threadname = Thread.currentThread().getName();
        while (!initFlag) {
            // spin here until initFlag changes
        }
        System.out.println("Thread " + threadname + ": detected the change of initFlag");
    }

    public static void main(String[] args) {
        VolatileVisibilitySample sample = new VolatileVisibilitySample();
        Thread threadA = new Thread(() -> sample.save(), "threadA");
        Thread threadB = new Thread(() -> sample.load(), "threadB");
        threadB.start();
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        threadA.start();
    }
}

// Output:
// Thread threadA: modified shared variable initFlag
// Thread threadB: detected the change of initFlag

2. volatile cannot guarantee atomicity

public class VolatileVisibility {

    public static volatile int i = 0;

    public static void increase() {
        i++; // not atomic: read, add, write back
    }

    public static void main(String[] args) throws InterruptedException {
        Thread thread1 = new Thread(() -> {
            for (int j = 0; j < 1000; j++) {
                increase();
            }
        });

        Thread thread2 = new Thread(() -> {
            for (int j = 0; j < 1000; j++) {
                increase();
            }
        });

        thread1.start();
        thread2.start();

        Thread.sleep(3000);
        System.out.println("After 2000 increments, i = " + i);
    }
    // Output from one run (updates were lost):
    // After 2000 increments, i = 1974
}

In a concurrent scenario, any change to i is immediately visible to other threads, but if multiple threads call increase() at the same time there is still a thread-safety problem, because i++ is not atomic. It first reads the value and then writes back a new value (the original value plus 1), in two steps. If a second thread reads the old value while the first thread is between its read and its write-back, both threads see the same value and both perform the same plus-1 operation, and one increment is lost. So increase() must be made synchronized to be thread-safe. Note that once the method is synchronized, volatile on the variable becomes redundant in this case, because synchronized already provides the same visibility guarantee.
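A minimal sketch of the fix described above (class name illustrative):

public class SynchronizedCounter {

    public static int i = 0; // volatile dropped: synchronized already guarantees visibility

    // The whole read-modify-write of i++ now happens under one lock.
    public static synchronized void increase() {
        i++;
    }
}

For a plain counter, java.util.concurrent.atomic.AtomicInteger (with incrementAndGet()) is the other common fix: it performs the read-modify-write as a single atomic hardware operation without blocking.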

3. volatile prohibits reordering optimization

The other function of the volatile keyword is to prohibit instruction-reordering optimization, avoiding out-of-order execution in multithreaded environments. Instruction reordering was analyzed in detail above; here we briefly explain how volatile manages to prohibit it. First, a concept: the memory barrier (Memory Barrier).

4. Memory barriers at the hardware level

Intel hardware provides a series of memory barriers ("fence" as in a fence or railing), mainly:

  1. lfence, a Load Barrier (read barrier)
  2. sfence, a Store Barrier (write barrier)
  3. mfence, a full barrier with the capabilities of both lfence and sfence
  4. the Lock prefix: Lock is not a memory barrier, but it can perform a similar function. Lock locks the CPU bus and cache and can be understood as a lock at the CPU-instruction level. It can prefix instructions such as ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCH8B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG.

Different hardware implements memory barriers in different ways. The Java memory model shields the differences of the underlying hardware platforms, and the JVM generates corresponding machine codes for different platforms. Four types of memory barrier instructions are provided in the JVM:
(Table: the four JMM memory-barrier types: LoadLoad, StoreStore, LoadStore, StoreLoad)
A memory barrier, also called a memory fence, is a CPU instruction with two functions: it enforces the execution order of particular operations, and it guarantees the memory visibility of certain variables (volatile's memory visibility is implemented with this property). Since both the compiler and the processor can reorder instructions, inserting a Memory Barrier between instructions tells them that no instruction may be reordered across it; that is, the barrier prohibits reordering optimizations between the instructions before and after it. A Memory Barrier also forces the cached data of the various CPUs to be flushed, so a thread on any CPU can read the latest version of the data. In short, volatile variables realize their memory semantics, visibility and the prohibition of reordering, through memory barriers. Let's look at a very typical example where prohibiting reordering matters, double-checked locking (DCL):

public class DoubleCheckLock {

    private volatile static DoubleCheckLock instance;

    private DoubleCheckLock() {
    }

    public static DoubleCheckLock getInstance() {
        // first check
        if (instance == null) {
            // synchronize
            synchronized (DoubleCheckLock.class) {
                // second check
                if (instance == null) {
                    // without volatile, this is where multithreaded trouble can occur
                    instance = new DoubleCheckLock();
                }
            }
        }
        return instance;
    }
}

The code above is the classic double-checked singleton. It is fine in a single-threaded environment, but in a multithreaded environment it can have a thread-safety problem: when a thread performs the first check and reads a non-null instance, the object that instance refers to may not have been initialized yet. The intended order of instance = new DoubleCheckLock() is:

memory = allocate();   // 1. allocate memory for the object
instance(memory);      // 2. initialize the object
instance = memory;     // 3. point instance at the allocated memory; from here on, instance != null

But in fact, the JIT may reorder it into the following steps:

memory = allocate();   // 1. allocate memory for the object
instance = memory;     // 3. point instance at the allocated memory; from here on, instance != null
instance(memory);      // 2. initialize the object

Because there is no data dependency between step 2 and step 3, and in a single thread the program's result is the same before and after the swap, this reordering is permitted. But instruction reordering only guarantees consistency with serial (single-threaded) semantics; it does not care about semantic consistency across threads. So when another thread sees a non-null instance, the instance may not yet be initialized, and a thread-safety problem results. The fix is simple: declare the instance field volatile to prohibit reordering optimization on it.

private volatile static DoubleCheckLock instance;

Three, Implementation of volatile memory semantics

As mentioned earlier, reordering is divided into compiler reordering and processor reordering. To implement volatile's memory semantics, JMM restricts these two kinds of reordering separately.
The table below shows the volatile reordering rules JMM specifies for the compiler.
(Table: JMM volatile reordering rules for the compiler; rows: first operation, columns: second operation)
For example, the last cell in the second row means: when the first operation is a read or write of an ordinary variable and the second operation is a volatile write, the compiler cannot reorder these two operations. From the table we can see:

  • When the second operation is a volatile write, no reordering is possible regardless of the first operation. This rule ensures that operations before volatile writes are not reordered by the compiler to follow volatile writes.
  • When the first operation is a volatile read, no reordering is possible regardless of the second operation. This rule ensures that operations following a volatile read are not reordered by the compiler to precede a volatile read.
  • When the first operation is a volatile write and the second operation is a volatile read or write, no reordering is possible.
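Taken together, the first two rules above yield the familiar flag-publication pattern. A minimal sketch (class and field names are illustrative):

class VolatileWriteReadExample {

    int data = 0;
    volatile boolean ready = false;

    void writer() {
        data = 42;    // ordinary write: cannot be reordered after the volatile write below
        ready = true; // volatile write
    }

    void reader() {
        if (ready) {                  // volatile read: reads below cannot move above it
            System.out.println(data); // guaranteed to see 42 once ready is true
        }
    }
}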

To implement volatile's memory semantics, the compiler inserts memory barriers into the instruction sequence when generating bytecode, prohibiting particular kinds of processor reordering. It is nearly impossible for the compiler to find an optimal placement that minimizes the total number of barriers, so JMM adopts a conservative strategy. The conservative-strategy barrier-insertion rules are:

  • Insert a StoreStore barrier in front of every volatile write operation.
  • Insert a StoreLoad barrier after each volatile write operation.
  • Insert a LoadLoad barrier after each volatile read operation.
  • Insert a LoadStore barrier after each volatile read operation.

The following is a schematic of the instruction sequence generated for a volatile write under the conservative strategy:
(Figure: barriers around a volatile write: StoreStore before, StoreLoad after)
The StoreStore barrier in the figure guarantees that, before the volatile write, all preceding ordinary writes are already visible to any processor, because it forces them to be flushed to main memory before the volatile write.

What is more interesting is the StoreLoad barrier after the volatile write. Its purpose is to prevent the volatile write from being reordered with volatile reads or writes that may follow. The compiler often cannot determine precisely whether a StoreLoad barrier is needed after a volatile write (for example, when the method returns immediately after the write), so to guarantee correct volatile memory semantics JMM takes the conservative approach: insert a StoreLoad barrier either after every volatile write or before every volatile read. From the standpoint of overall execution efficiency, JMM chose to insert it after every volatile write, because the common usage pattern of volatile write-read memory semantics is one writer thread writing a volatile variable and multiple reader threads reading it; when readers greatly outnumber writers, putting the StoreLoad barrier on the write side yields a considerable efficiency gain. This illustrates a characteristic of the JMM implementation: first guarantee correctness, then pursue efficiency.
The following figure is a schematic of the instruction sequence generated for a volatile read under the conservative strategy:
(Figure: barriers around a volatile read: LoadLoad and LoadStore after)
The LoadLoad barrier prevents the processor from reordering the volatile read above it with ordinary reads below it; the LoadStore barrier prevents the processor from reordering the volatile read above it with ordinary writes below it.
The above memory barrier insertion strategy for volatile writes and volatile reads is very conservative. In actual execution, as long as the memory semantics of volatile write-read are not changed, the compiler can omit unnecessary barriers according to specific circumstances. The following is illustrated by specific sample code:

class VolatileBarrierExample {

    int a;
    volatile int v1 = 1;
    volatile int v2 = 2;

    void readAndWrite() {
        int i = v1;    // first volatile read
        int j = v2;    // second volatile read
        a = i + j;     // ordinary write
        v1 = i + 1;    // first volatile write
        v2 = j * 2;    // second volatile write
    }
}

For the readAndWrite() method, the compiler can perform the following optimizations when generating bytecode.
(Figure: optimized barrier placement for readAndWrite())
Note that the final StoreLoad barrier cannot be omitted: the method returns immediately after the second volatile write, and the compiler cannot determine whether a volatile read or write will follow, so for safety it inserts a StoreLoad barrier there.
The optimization above applies to any processor platform. Because different processors have memory models of different "tightness", barrier insertion can be optimized further for a specific processor memory model. Taking the x86 processor as an example, all the barriers in the figure above except the final StoreLoad barrier are omitted.
Under the earlier conservative strategy, volatile reads and writes can be optimized on the x86 platform as shown in the figure below. As mentioned before, x86 only reorders write-read operations; it does not reorder read-read, read-write, or write-write, so the barriers corresponding to those three cases are omitted on x86. On x86, JMM only needs to insert one StoreLoad barrier after each volatile write to correctly implement volatile write-read memory semantics. This also means that on x86 a volatile write is much more expensive than a volatile read (because of the cost of executing the StoreLoad barrier).
(Figure: volatile read/write barrier placement on the x86 platform)

Summary

Acknowledgements

Thanks to the Zhihu author [Blog Viewpoint Broadview], whose article "Under what circumstances does Java instruction reordering occur?" provided answers.

Origin blog.csdn.net/qq_32681589/article/details/132012411