2. Detailed explanation of volatile in concurrent programming

Atomicity, Visibility and Ordering in Concurrent Programming

Atomicity
Atomicity means that an operation is uninterruptible. Even in a multi-threaded environment, once an operation starts, it will not be affected by other threads.

In Java, reads and writes of variables of the basic data types byte, short, int, float, boolean and char are atomic operations. The exception is 64-bit data on a 32-bit system: reads and writes of long and double are not guaranteed to be atomic there. On a 32-bit virtual machine each atomic access is 32 bits wide, while long and double occupy 64-bit storage units, so an access may be split into two 32-bit halves. If thread A has written only the first 32 bits when thread B reads the variable, B may observe a value that is neither the original value nor A's new value, a "half variable": the 64-bit datum has effectively been split across two accesses by two threads. In practice there is no need to worry too much, because reading "half a variable" is rare; on current commercial virtual machines, reads and writes of 64-bit data are almost always performed as atomic operations, so it is enough to know that this can happen.
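As a side note, the JLS (§17.7) also guarantees that reads and writes of long and double become atomic once the field is declared volatile. A minimal hypothetical sketch (class and field names are illustrative, not from the original):

public class SharedLong {

    // Without volatile, a 32-bit JVM may split reads/writes of this 64-bit
    // field into two 32-bit halves, allowing a reader to see a "half variable".
    // With volatile, JLS §17.7 guarantees every read and write is atomic.
    private volatile long value;

    void writer(long v) {
        value = v; // one indivisible 64-bit write
    }

    long reader() {
        return value; // one indivisible 64-bit read
    }
}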

x = 10;     // atomic: a simple read / assigning a constant to a variable
y = x;      // assignment between variables, not atomic (read x, then write y)
x++;        // a computation on the variable, not atomic (read, add, write back)
x = x + 1;  // the same as x++, not atomic

Visibility
With the phenomenon of instruction reordering understood, visibility is easy to grasp. Visibility refers to whether, when one thread modifies the value of a shared variable, other threads can immediately see the modified value. For a serial program, visibility problems do not exist: whenever we modify a variable's value in one operation, any subsequent operation that reads the variable gets the newly modified value.

This is not necessarily the case in a multi-threaded environment. As analyzed before, a thread operates on a shared variable by copying it into its own working memory and later writing it back to main memory. Suppose thread A modifies the shared variable x but has not yet written it back to main memory when thread B operates on the same shared variable x in main memory: A's modification, still sitting in A's working memory, is not visible to B. This synchronization delay between working memory and main memory is one cause of visibility problems. In addition, instruction reordering and compiler optimization can also cause visibility problems: as the earlier analysis showed, both compiler and processor reordering can, in a multi-threaded environment, make code appear to execute out of order, which in turn leads to visibility problems.

Ordering
Ordering means that, for code executed by a single thread, we always consider execution to proceed sequentially. Within a single thread that understanding is correct, but in a multi-threaded environment the code may appear out of order: instruction reordering can take place when the program is compiled into machine instructions, and the reordered instruction sequence may not match the original one. The rule to keep in mind for Java programs: within a single thread, all operations appear ordered; when one thread observes another, all operations appear unordered. The first half of that sentence refers to the consistency of serial semantics within a single thread; the second half refers to instruction reordering and the synchronization delay between working memory and main memory.
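A hypothetical sketch of this out-of-order observation (the classic illustration; names are illustrative): thread 1 runs writer() and thread 2 runs reader(). Without volatile or locks, reader() may observe flag == true while a is still 0.

public class ReorderingExample {

    int a = 0;
    boolean flag = false;

    void writer() {
        a = 1;        // (1)
        flag = true;  // (2) may be reordered before (1), or become visible first
    }

    void reader() {
        if (flag) {                    // observes (2)
            System.out.println(a);     // may still print 0 instead of 1
        }
    }
}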

volatile

Atomicity issues
Besides the atomicity that the JVM itself provides for reads and writes of basic data types, atomicity can be achieved through synchronized and Lock, because both guarantee that only one thread accesses the code block at any given moment; a sketch of both follows below. volatile does not guarantee atomicity.
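A minimal sketch of both approaches (class and field names are illustrative): each makes the read-modify-write of count atomic by letting only one thread into the critical section at a time.

import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class AtomicCounter {

    private int count = 0;
    private final Lock lock = new ReentrantLock();

    public synchronized void incrementWithSynchronized() {
        count++; // guarded by this object's monitor
    }

    public void incrementWithLock() {
        lock.lock();
        try {
            count++; // guarded by the explicit ReentrantLock
        } finally {
            lock.unlock(); // always release, even if an exception is thrown
        }
    }
}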

Visibility issues
The volatile keyword guarantees visibility. When a shared variable is modified with volatile, any write to it is immediately flushed to main memory, and whenever another thread needs to read it, it fetches the fresh value from main memory, so a modification is immediately visible to other threads. synchronized and Lock also guarantee visibility, because they ensure that only one thread accesses the shared resource at a time and flush modified variables back to main memory before releasing the lock.

Ordering issues
In Java, the volatile keyword guarantees a certain degree of "ordering" (the underlying mechanism is described in the volatile sections below). Ordering can also be guaranteed with synchronized and Lock: they ensure that only one thread executes the synchronized code at any moment, which amounts to letting threads execute that code sequentially and naturally guarantees ordering.
Java Memory Model: each thread has its own working memory (analogous to a CPU cache). All of a thread's operations on variables must be performed in its working memory; it cannot operate on main memory directly, and no thread can access the working memory of another thread. The Java memory model also has some innate "ordering", that is, ordering that is guaranteed without any synchronization, usually called the happens-before principle. If the execution order of two operations cannot be derived from the happens-before principle, their order is not guaranteed, and the virtual machine is free to reorder them.
Instruction reordering: the Java language specification requires that the JVM maintain sequential semantics within a thread. That is, as long as the final result of the program equals the result of executing it serially, the execution order of the instructions may differ from the order of the code; this transformation is called instruction reordering. What is it for? The JVM can reorder machine instructions to suit processor characteristics (multi-level CPU caches, multi-core processors, and so on), so that the instructions better match how the CPU executes and the machine's performance is maximized.

The following is a schematic of the instruction sequence from source code to final execution (figure omitted).
as-if-serial
as-if-serial semantics means that no matter how the compiler and processor reorder instructions to improve parallelism, the execution result of a (single-threaded) program must not change. The compiler, the runtime and the processor must all obey as-if-serial semantics.

In order to comply with as-if-serial semantics, the compiler and processor will not reorder operations that have data dependencies between them, because such reordering would change the execution result. Operations with no data dependency between them, however, may be reordered by the compiler and processor, as the sketch below shows.
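An illustrative sketch (not from the original): (2) depends on (1), so they must not be reordered, while (3) depends on neither and may legally move before or between them without changing the single-threaded result.

public class DataDependencyExample {

    void example() {
        int a = 1;      // (1) write a
        int b = a + 1;  // (2) read a, write b: data-dependent on (1)
        int c = 10;     // (3) independent: free to be reordered around (1) and (2)
    }
}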

The happens-before principle
Relying only on the synchronized and volatile keywords to ensure atomicity, visibility and ordering makes writing concurrent programs seem very troublesome. Fortunately, starting with JDK 5, Java's new JSR-133 memory model provides the happens-before principle to help reason about the atomicity, visibility and ordering of program execution. It is the basis for judging whether a data race exists and whether code is thread-safe. The happens-before rules are as follows:

  1. Program order rule: within a single thread, semantic seriality must be guaranteed; in other words, code appears to execute in program order.
  2. Volatile rule: a write to a volatile variable happens-before any subsequent read of it, which guarantees the visibility of volatile variables. Put simply, every time a thread accesses a volatile variable it is forced to read the variable's value from main memory, and whenever the variable changes, the latest value is forced back to main memory, so at any moment different threads always see the variable's latest value (rules 1, 2 and 4 are combined in the sketch after this list).
  3. Thread start rule: a thread's start() method happens-before every action of the started thread. That is, if thread A modifies a shared variable before starting thread B, the modification is visible to thread B once B runs.
  4. Transitivity: if A happens-before B and B happens-before C, then A happens-before C.
  5. Thread termination rule: all operations of a thread happen-before its termination. Thread.join() waits for the joined thread to terminate, so if thread B modifies a shared variable before terminating, the modification is visible to thread A after A returns from B's join() method.
  6. Thread interruption rule: a call to a thread's interrupt() method happens-before the interrupted thread's code detects the interrupt (Thread.interrupted() can be used to check whether the thread has been interrupted).
  7. Object finalization rule: the completion of an object's constructor happens-before the start of its finalize() method.
  8. Lock rule: an unlock operation happens-before every subsequent lock operation on the same lock; that is, if a lock is unlocked and then locked again, the lock must come after the unlock (for the same lock).
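A sketch of how rules 1, 2 and 4 compose (names are illustrative): (1) happens-before (2) by program order, (2) happens-before (3) by the volatile rule, and (3) happens-before (4) by program order, so by transitivity a reader that sees ready == true is guaranteed to see data == 42.

public class HappensBeforeExample {

    int data = 0;
    volatile boolean ready = false;

    void writerThread() {
        data = 42;      // (1) normal write
        ready = true;   // (2) volatile write
    }

    void readerThread() {
        if (ready) {                   // (3) volatile read
            System.out.println(data);  // (4) guaranteed to print 42
        }
    }
}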

volatile memory semantics
volatile is a lightweight synchronization mechanism provided by the Java virtual machine. The volatile keyword has the following two functions:

  • Guarantee that a shared variable modified by volatile is visible to all threads: when one thread modifies
    the value of such a variable, the new value is always immediately visible to other threads.
  • Forbid instruction reordering optimizations.

Visibility of volatile
Regarding the visibility of volatile, we must realize that a variable modified by volatile is immediately visible to all threads: every write to a volatile variable is always immediately reflected in other threads. For example:

public class VolatileVisibilitySample {

    private volatile boolean initFlag = false;
    static Object object = new Object();

    public void refresh() {
        this.initFlag = true; // an ordinary assignment, but a volatile write (initFlag is volatile)
        String threadname = Thread.currentThread().getName();
        System.out.println("Thread " + threadname + ": modified shared variable initFlag");
    }

    public void load() {
        String threadname = Thread.currentThread().getName();
        int i = 0;
        while (!initFlag) {
            synchronized (object) {
                i++;
            }
            //i++;
        }
        System.out.println("Thread " + threadname + ": detected the change of initFlag, i = " + i);
    }

    public static void main(String[] args) {
        VolatileVisibilitySample sample = new VolatileVisibilitySample();
        Thread threadA = new Thread(() -> {
            sample.refresh();
        }, "threadA");

        Thread threadB = new Thread(() -> {
            sample.load();
        }, "threadB");

        threadB.start();
        try {
            Thread.sleep(2000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        threadA.start();
    }
}

Printed result (screenshot omitted): threadA changes the value of initFlag, and threadB immediately perceives the change and exits its loop.

volatile cannot guarantee atomicity

public class VolatileAtomicSample {

    private volatile static int counter = 0;

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            Thread thread = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    counter++; // not atomic: it decomposes into
                    // 1. load counter into working memory
                    // 2. add 1 to the loaded value
                    // 3. write the result back to main memory
                    // an increment based on a stale value is simply lost
                }
            });
            thread.start();
        }

        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

        System.out.println(counter);
    }
}

Printed result (screenshot omitted): a value less than 10000 is typically printed. If volatile guaranteed atomicity, the result would be 10000, but that is not what happens, which shows that volatile cannot guarantee atomicity. This involves losing contended updates under the MESI cache coherence protocol, which will be explained in detail later. A common fix is sketched below.
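One common fix, not covered in the original text, is to replace the volatile int with java.util.concurrent.atomic.AtomicInteger, whose incrementAndGet() performs the whole read-modify-write as one atomic operation (internally via CAS). A sketch:

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicIntegerSample {

    private static final AtomicInteger counter = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 10; i++) {
            new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    counter.incrementAndGet(); // atomic ++
                }
            }).start();
        }
        Thread.sleep(1000);                // crude wait, as in the sample above
        System.out.println(counter.get()); // reliably prints 10000
    }
}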

volatile forbids reordering optimizations
The other function of the volatile keyword is to forbid instruction reordering optimizations, which avoids out-of-order execution of programs in a multi-threaded environment. Instruction reordering was analyzed in detail earlier, so here we only briefly explain how volatile forbids it. First, a concept: the memory barrier (Memory Barrier).

A memory barrier, also called a memory fence, is a CPU instruction with two functions: one is to ensure the execution order of particular operations, the other is to ensure the memory visibility of certain data (volatile's memory visibility is implemented with this property). Since both the compiler and the processor can perform instruction reordering optimization, inserting a Memory Barrier between instructions tells the compiler and the CPU that no instruction may be reordered across this barrier; in other words, inserting a memory barrier forbids reordering optimization between the instructions before and after it. A Memory Barrier also forces the cached data of the various CPUs to be flushed, so a thread on any CPU can read the latest version of the data. In short, volatile variables implement their memory semantics, namely visibility and the prohibition of reordering optimization, through memory barriers. Let's look at a very typical example where forbidding reordering matters, double-checked locking (DCL):

public class DoubleCheckLock {

    private static DoubleCheckLock instance;

    public static DoubleCheckLock getInstance() {
        if (instance == null) {
            synchronized (DoubleCheckLock.class) {
                if (instance == null) {
                    // the spot where things can go wrong in a multi-threaded environment
                    instance = new DoubleCheckLock();
                }
            }
        }
        return instance;
    }
}

The code above is the classic double-checked singleton. In a single-threaded environment it is fine, but in a multi-threaded environment it can have thread-safety problems: when some thread performs the first check and reads a non-null instance, the object that instance references may not have been initialized yet.

Because instance = new DoubleCheckLock(); can be decomposed into the following three steps (pseudocode):

memory = allocate();   // 1. allocate memory for the object
instance(memory);      // 2. initialize the object
instance = memory;     // 3. point instance at the allocated memory; from here on, instance != null

Steps 2 and 3 may be reordered, giving:

memory = allocate();   // 1. allocate memory for the object
instance = memory;     // 3. point instance at the allocated memory; instance != null now, but the object is not yet initialized!
instance(memory);      // 2. initialize the object

There is no data dependency between steps 2 and 3, and in a single thread the program's execution result is unchanged whether or not they are reordered, so this reordering optimization is allowed. However, instruction reordering only guarantees consistency with serial semantics within a single thread; it does not care about semantic consistency across threads. So when another thread sees a non-null instance whose object may not yet be initialized, a thread-safety problem arises. The solution is simple: declare instance volatile, which forbids this reordering optimization on it.

// forbid instruction reordering optimization
private volatile static DoubleCheckLock instance;
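For completeness, a sketch of the corrected class with the volatile fix applied:

public class DoubleCheckLock {

    // volatile forbids reordering steps 2 and 3 of the construction,
    // so no thread can observe a non-null but uninitialized instance
    private volatile static DoubleCheckLock instance;

    public static DoubleCheckLock getInstance() {
        if (instance == null) {                      // first check, without locking
            synchronized (DoubleCheckLock.class) {
                if (instance == null) {              // second check, under the lock
                    instance = new DoubleCheckLock();
                }
            }
        }
        return instance;
    }
}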

Implementation of volatile memory semantics
As mentioned earlier, reordering divides into compiler reordering and processor reordering. To implement volatile's memory semantics, the JMM restricts each of these two kinds of reordering separately.
The table below gives the volatile reordering rules the JMM specifies for compilers (reconstructed from the original figure; "NO" means the compiler must not reorder the two operations):

first operation \ second operation    normal read/write    volatile read    volatile write
normal read/write                     allowed              allowed          NO
volatile read                         NO                   NO               NO
volatile write                        allowed              NO               NO

For example, the last cell of the "normal read/write" row means: when the first operation is a read or write of an ordinary variable and the second operation is a volatile write, the compiler cannot reorder the two operations.
From the table above:
∙ When the second operation is a volatile write, no reordering is allowed, whatever the first operation is. This rule ensures that operations before a volatile write are not reordered by the compiler to after it.
∙ When the first operation is a volatile read, no reordering is allowed, whatever the second operation is. This rule ensures that operations after a volatile read are not reordered by the compiler to before it.
∙ When the first operation is a volatile write and the second operation is a volatile read, no reordering is allowed.
To implement the memory semantics of volatile, when generating bytecode the compiler inserts memory barriers into the instruction sequence to forbid particular kinds of processor reordering. It is practically impossible for the compiler to find an optimal placement that minimizes the total number of barriers, so the JMM adopts a conservative strategy:
• Insert a StoreStore barrier before each volatile write.
• Insert a StoreLoad barrier after each volatile write.
• Insert a LoadLoad barrier after each volatile read.
• Insert a LoadStore barrier after each volatile read.
This insertion strategy is very conservative, but it guarantees correct volatile memory semantics in any program on any processor platform.
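Where these barriers sit can be mimicked with the fence methods that JDK 9+ exposes on java.lang.invoke.VarHandle. The sketch below is only an analogy for the conservative placement, not how the JIT actually compiles volatile accesses:

import java.lang.invoke.VarHandle;

public class FenceSketch {

    private int value; // deliberately not volatile

    void volatileStyleWrite(int v) {
        VarHandle.storeStoreFence(); // StoreStore: earlier normal writes cannot sink below
        value = v;                   // the "volatile" write
        VarHandle.fullFence();       // StoreLoad: the write cannot reorder with later reads/writes
    }

    int volatileStyleRead() {
        int v = value;            // the "volatile" read
        VarHandle.acquireFence(); // LoadLoad + LoadStore: later reads/writes cannot float above
        return v;
    }
}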

The following describes the instruction sequence generated for a volatile write under the conservative strategy (figure omitted). The StoreStore barrier ensures that, before the volatile write, all preceding normal writes are already visible to any processor, because it guarantees that all ordinary writes above it are flushed to main memory before the volatile write.

More interesting is the StoreLoad barrier after the volatile write. Its purpose is to prevent the volatile write from being reordered with volatile read/write operations that may follow. The compiler often cannot determine precisely whether a StoreLoad barrier is needed after a volatile write (for example, when the method returns immediately after the write). To guarantee that volatile's memory semantics are implemented correctly, the JMM takes a conservative position: insert a StoreLoad barrier either after every volatile write or before every volatile read. For overall execution efficiency, the JMM chose to insert it after every volatile write, because a common usage pattern of volatile is one writer thread writing a volatile variable while multiple reader threads read the same variable; when readers greatly outnumber writers, placing the StoreLoad barrier after the write brings a considerable gain in execution efficiency. Here we can see a characteristic of the JMM implementation: first ensure correctness, then pursue execution efficiency.

For a volatile read under the conservative strategy, the generated instruction sequence is as follows (figure omitted): a LoadLoad barrier prevents the processor from reordering the volatile read with ordinary reads below it, and a LoadStore barrier prevents the processor from reordering the volatile read with ordinary writes below it.
The barrier insertion strategy above for volatile writes and reads is very conservative. In actual execution, as long as the write-read memory semantics of volatile are not changed, the compiler may omit unnecessary barriers case by case. Consider the following example:

public class VolatileBarrierExample {

    int a;
    volatile int m1 = 1;
    volatile int m2 = 2;

    void readAndWrite() {
        int i = m1;   // first volatile read
        int j = m2;   // second volatile read

        a = i + j;    // normal write

        m1 = i + 1;   // first volatile write
        m2 = j * 2;   // second volatile write
    }
}

For the readAndWrite() method, the compiler can optimize the barrier placement when generating bytecode (figure omitted). Note that the final StoreLoad barrier cannot be omitted: the method returns immediately after the second volatile write, and the compiler cannot be sure whether a volatile read or write will follow, so for safety it usually inserts a StoreLoad barrier there.

The optimization above targets any processor platform. Because different processors have memory models of different "tightness", barrier insertion can be optimized further for the specific processor's memory model. Taking the X86 processor as an example, every barrier in the figure except the final StoreLoad barrier would be omitted.

Under the earlier conservative strategy, volatile reads and writes can be optimized on the X86 platform as follows (figure omitted). As mentioned before, x86 only reorders write-read operations; it does not reorder read-read, read-write or write-write operations, so the memory barriers corresponding to those three kinds of operations are omitted on X86. On X86, the JMM only needs to insert a StoreLoad barrier after each volatile write to correctly implement volatile write-read memory semantics. This means that on an x86 processor, volatile writes are much more expensive than volatile reads (because executing the StoreLoad barrier is costly).

Origin blog.csdn.net/qq_39513430/article/details/109429689