Detailed explanation of volatile principle

The volatile keyword is a lightweight synchronization mechanism provided by the Java Virtual Machine.

The volatile keyword serves the following two functions:

  • It guarantees that a shared variable modified by volatile is visible to all threads: when one thread modifies the value of such a variable, the new value is immediately visible to every other thread.

  • It prohibits instruction reordering optimization.

Visibility of volatile

public class VolatileVisibilitySample {

    private volatile boolean initFlag = false;

    public void refresh() {
        this.initFlag = true; // volatile write: immediately visible to other threads
        String threadName = Thread.currentThread().getName();
        System.out.println("Thread " + threadName + ": modified shared variable initFlag");
    }

    public void load() {
        String threadName = Thread.currentThread().getName();
        int i = 0;
        while (!initFlag) { // volatile read: always observes the latest value
            i++;
            // Without volatile, this plain-read loop might never see the update;
            // the original workaround was to wrap i++ in synchronized(object){...},
            // which also forces the read to be refreshed.
        }
        System.out.println("Thread " + threadName + ": detected the change of initFlag after " + i + " spins");
    }

    public static void main(String[] args) {
        VolatileVisibilitySample sample = new VolatileVisibilitySample();
        Thread threadA = new Thread(() -> sample.refresh(), "threadA");
        Thread threadB = new Thread(() -> sample.load(), "threadB");

        threadB.start();
        try {
            Thread.sleep(2000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        threadA.start();
    }
}

After thread A modifies initFlag, thread B perceives the change immediately.

Volatile cannot guarantee atomicity

public class VolatileAtomicSample {

    private static volatile int counter = 0;

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            Thread thread = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    counter++; // not atomic: this is a read-modify-write sequence
                    // 1. load counter into working memory
                    // 2. add 1 to the loaded value
                    // 3. store the result back; an increment performed by another
                    //    thread in between is silently overwritten
                }
            });
            thread.start();
        }

        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

        System.out.println(counter);
    }
}

Run it and you will see that the final result is almost always less than 10000: volatile makes the latest value of counter visible, but counter++ is still not atomic, so concurrent increments are lost.
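The usual fix is java.util.concurrent.atomic.AtomicInteger, whose incrementAndGet performs the read-modify-write atomically with a CAS loop. A minimal sketch (the class and method names here are mine, not from the original post); joining the threads also replaces the fixed-length sleep:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class VolatileAtomicFixSample {

    private static final AtomicInteger counter = new AtomicInteger(0);

    static int run() {
        Thread[] threads = new Thread[10];
        for (int i = 0; i < 10; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    counter.incrementAndGet(); // atomic read-modify-write (CAS)
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // wait for completion instead of sleeping a fixed time
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 10000
    }
}
```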

Volatile prohibits reordering optimization

Instruction reordering

Reordering is a technique by which the compiler and the processor rearrange the instruction sequence to improve program performance. The Java language specification requires that sequential semantics be maintained within a thread: as long as the final result of the program equals the result of executing it sequentially, the actual execution order of instructions may differ from the code order. This is instruction reordering. What is it for? The JVM can reorder machine instructions to suit the characteristics of the processor (multi-level CPU caches, multiple cores, and so on), so that the instructions better match the CPU's execution characteristics and the hardware is used to its fullest.

Instruction reordering happens mainly at two stages:

1. Compiler stage: the compiler may reorder instructions when the class file is compiled into machine code.

2. CPU execution stage: the CPU may reorder instructions while executing them.

public class VolatileReOrderSample {

    private static int x = 0, y = 0;
    private static int a = 0, b = 0;

    public static void main(String[] args) throws InterruptedException {
        int i = 0;

        for (;;) {
            i++;
            x = 0; y = 0;
            a = 0; b = 0;
            Thread t1 = new Thread(new Runnable() {
                public void run() {
                    // t1 starts first, so let it wait briefly for t2.
                    // Readers can adjust the wait to their machine's performance.
                    shortWait(10000);
                    a = 1; // plain write (store)
                    // A manual memory barrier here would forbid reordering the
                    // write to a with the following read of b:
                    // UnsafeInstance.reflectGetUnsafe().storeFence();
                    x = b; // read b first, then write x: two steps
                }
            });
            Thread t2 = new Thread(new Runnable() {
                public void run() {
                    b = 1;
                    // Manual memory barrier (commented out):
                    // UnsafeInstance.reflectGetUnsafe().storeFence();
                    y = a;
                }
            });
            t1.start();
            t2.start();
            t1.join();
            t2.join();
            String result = "Round " + i + ": (" + x + "," + y + ")";
            if (x == 0 && y == 0) {
                // Both reads saw 0: the writes were reordered with the reads
                System.err.println(result);
                break;
            } else {
                System.out.println(result);
            }
        }
    }

    public static void shortWait(long intervalNanos) {
        long start = System.nanoTime();
        long end;
        do {
            end = System.nanoTime();
        } while (start + intervalNanos >= end);
    }
}

Sooner or later the result (0,0) appears, which is caused by reordering: within a single thread, reordering cannot change the thread's own result (as-if-serial), but across threads no such guarantee holds.

as-if-serial

    public static void main(String[] args) {
        /*
         * as-if-serial semantics: no matter how the compiler and processor
         * reorder instructions (to improve parallelism), the result of a
         * single-threaded program must not change. The compiler, the runtime
         * and the processor must all obey as-if-serial semantics.
         *
         * In the example below, steps 1 and 2 may be reordered with each other,
         * but neither may be reordered with step 3: step 3 cannot execute
         * before steps 1 and 2, or the program's result would change.
         */
        double p = 3.14;         // 1
        double r = 1.0;          // 2
        double area = p * r * r; // 3: compute the area
    }

public class DoubleCheckLock {

    private static DoubleCheckLock instance;

    private DoubleCheckLock() {
    }

    public static DoubleCheckLock getInstance() {
        // first check
        if (instance == null) {
            // synchronize
            synchronized (DoubleCheckLock.class) {
                if (instance == null) {
                    // this is where things can go wrong in a multithreaded environment
                    instance = new DoubleCheckLock();
                }
            }
        }
        return instance;
    }
}

The code above is the classic double-checked singleton. It is fine in a single-threaded environment, but in a multithreaded environment it has a thread-safety problem: when a thread performs the first check and reads a non-null instance, the object that instance refers to may not yet be initialized. This is because instance = new DoubleCheckLock(); breaks down into the following three steps (pseudocode):

memory = allocate();   // 1. allocate memory for the object
instance(memory);      // 2. initialize the object
instance = memory;     // 3. point instance at the allocated memory;
                       //    from here on, instance != null

Since steps 2 and 3 may be reordered, the actual order can be:

memory = allocate();   // 1. allocate memory for the object
instance = memory;     // 3. point instance at the allocated memory;
                       //    instance != null, but the object is not initialized yet!
instance(memory);      // 2. initialize the object

Steps 2 and 3 have no data dependency, and in a single thread the program's result is the same before and after the reordering, so the optimization is allowed. But instruction reordering only guarantees consistency with serial semantics within a single thread; it does not care about semantic consistency across threads. So when a thread observes a non-null instance, the object may not yet be initialized, and a thread-safety problem results. The fix is simple: declare the instance variable volatile to prohibit this reordering.

// prohibit instruction reordering optimization
private volatile static DoubleCheckLock instance;

Memory barrier

A memory barrier (Memory Barrier, also called a memory fence) is a CPU instruction with two effects: it enforces the execution order of particular operations, and it guarantees the memory visibility of certain variables (volatile's visibility is implemented with this feature). Because both the compiler and the processor may reorder instructions, inserting a memory barrier between two instructions tells the compiler and the CPU that no instruction may be reordered across it; that is, instructions before and after the barrier are prohibited from being reordered past it. A memory barrier also forces the CPUs' cached data to be flushed, so a thread on any CPU can read the latest version of that data. In short, volatile variables implement their memory semantics, visibility and the prohibition of reordering, through memory barriers (the lock instruction).

The figure below is a table of volatile reordering rules formulated by JMM for the compiler.

For example, the last cell in the third row means: when the first operation is a read or write of an ordinary variable and the second operation is a volatile write, the compiler cannot reorder the two operations.
As can be seen from the table:

  • When the second operation is a volatile write, it cannot be reordered no matter what the first operation is. This rule ensures that operations before a volatile write are not reordered by the compiler to after it.
  • When the first operation is a volatile read, it cannot be reordered no matter what the second operation is. This rule ensures that operations after a volatile read are not reordered by the compiler to before it.
  • When the first operation is a volatile write and the second operation is a volatile read, they cannot be reordered.

To realize the memory semantics of volatile, the compiler inserts memory barriers into the instruction sequence when generating bytecode, prohibiting specific types of processor reordering. It is almost impossible for a compiler to find an optimal arrangement that minimizes the total number of barriers, so the JMM adopts a conservative strategy. The conservative barrier insertion strategy is:

  • Insert a StoreStore barrier before each volatile write operation.
  • Insert a StoreLoad barrier after each volatile write operation.
  • Insert a LoadLoad barrier after each volatile read operation.
  • Insert a LoadStore barrier after each volatile read operation.

Under the conservative strategy, the instruction sequence generated for a volatile write with its memory barriers is shown in the first figure (omitted), and the sequence for a volatile read in the second figure (omitted).
Code example

public class VolatileBarrierExample {

    int a;
    volatile int m1 = 1;
    volatile int m2 = 2;

    void readAndWrite() {
        int i = m1;   // first volatile read
        int j = m2;   // second volatile read

        a = i + j;    // plain write

        m1 = i + 1;   // first volatile write
        m2 = j * 2;   // second volatile write
    }
}

Note that the final StoreLoad barrier cannot be omitted: after the second volatile write, the method returns immediately, and the compiler cannot determine whether a volatile read or write will follow later. To be safe, the compiler usually inserts a StoreLoad barrier here.

The strategy above works on any processor platform. Since different processors have memory models of differing "tightness", barrier insertion can be further optimized for a specific processor's memory model. Take the X86 processor as an example: every barrier except the final StoreLoad barrier (Figure 3-21) is omitted, so the conservative-strategy volatile read and write shown earlier can be simplified on X86. As mentioned before, X86 only reorders write-read operations; it does not reorder read-read, read-write or write-write operations, so the barriers corresponding to those three categories are omitted. On X86 the JMM only needs to insert a StoreLoad barrier after each volatile write to correctly implement volatile write-read memory semantics. This means that on X86 a volatile write is considerably more expensive than a volatile read (because of the StoreLoad barrier).

The underlying principle of volatile

A variable modified by the volatile keyword guarantees visibility and ordering, but not atomicity. Consider the double-checked locking singleton again: its global variable must be volatile. Let us print the assembly instructions and see what the volatile keyword actually does.

How to print out assembly instructions

  • -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -Xcomp
  • Hsdis plugin
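Combined, the two items above give a launch command along these lines (a sketch, not verified on every JDK; it assumes the hsdis disassembler library, e.g. hsdis-amd64.so or .dll, has been placed where the JVM can load it, and that exact flag syntax may vary between JDK versions):

```shell
# Print JIT-generated assembly for the whole run (requires the hsdis plugin):
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -Xcomp Singleton

# Often more practical: restrict the output to a single method:
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly \
     -XX:CompileCommand=print,Singleton.getInstance Singleton
```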
public class Singleton {

    private volatile static Singleton myinstance;

    public static Singleton getInstance() {
        if (myinstance == null) {
            synchronized (Singleton.class) {
                if (myinstance == null) {
                    myinstance = new Singleton(); // object creation is essentially three steps
                }
            }
        }
        return myinstance;
    }

    public static void main(String[] args) {
        Singleton.getInstance();
    }
}
; without the volatile modifier:
0x00000000038064dd: mov    %r10d,0x68(%rsi)
0x00000000038064e1: shr    $0x9,%rsi
0x00000000038064e5: movabs $0xf1d8000,%rax
0x00000000038064ef: movb   $0x0,(%rsi,%rax,1)  ;*putstatic myinstance
                                                ; - com.it.edu.jmm.Singleton::getInstance@24 (line 22)

; with the volatile modifier:
0x0000000003cd6edd: mov    %r10d,0x68(%rsi)
0x0000000003cd6ee1: shr    $0x9,%rsi
0x0000000003cd6ee5: movabs $0xf698000,%rax
0x0000000003cd6eef: movb   $0x0,(%rsi,%rax,1)
0x0000000003cd6ef3: lock addl $0x0,(%rsp)     ;*putstatic myinstance
                                                ; - com.it.edu.jmm.Singleton::getInstance@24 (line 22)

Comparing the two listings, the key difference is in the volatile-modified variable: after the assignment (movb $0x0,(%rsi,%rax,1) is the assignment operation), there is one extra lock addl $0x0,(%rsp) instruction, which acts as a memory barrier.

The key here is the lock prefix. It causes this processor's cache to be written back to main memory, and this write-back invalidates the corresponding cache lines in other processors or cores (the I state of the MESI protocol). This is equivalent to performing, for the variable in the cache, the "store" and "write" operations described earlier in the introduction to the Java memory model. Through this mechanism, the modification of the volatile variable becomes immediately visible to other processors. At the hardware level, the lock prefix uses a cache lock (MESI) if cache locking is supported; otherwise it falls back to a bus lock.

Manually add memory barrier

import java.lang.reflect.Field;

import sun.misc.Unsafe;

public class UnsafeInstance {

    public static Unsafe reflectGetUnsafe() {
        try {
            // Unsafe's constructor is private; obtain the singleton via reflection
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }
}
UnsafeInstance.reflectGetUnsafe().loadFence();  // load (read) barrier

UnsafeInstance.reflectGetUnsafe().storeFence(); // store (write) barrier

UnsafeInstance.reflectGetUnsafe().fullFence();  // full (read + write) barrier
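Since Java 9, the same three barriers are available without Unsafe, through the static fence methods on java.lang.invoke.VarHandle (acquireFence, releaseFence, fullFence). A minimal sketch of the classic publish pattern; the class and field names here are mine:

```java
import java.lang.invoke.VarHandle;

public class VarHandleFenceSample {

    static int payload = 0;       // plain, non-volatile fields
    static boolean published = false;

    static void writer() {
        payload = 42;
        // Release (store) fence: the write to payload cannot be reordered
        // after the write to published.
        VarHandle.releaseFence();
        published = true;
    }

    static int reader() {
        boolean ok = published;
        // Acquire (load) fence: the read of published cannot be reordered
        // after the read of payload.
        VarHandle.acquireFence();
        return ok ? payload : -1;
    }

    public static void main(String[] args) {
        writer();
        System.out.println(reader()); // prints 42
    }
}
```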

Origin blog.csdn.net/qq_37904966/article/details/112548739