Five examples and flow charts to help you understand the volatile keyword from 0 to 1

volatile

As hardware has improved, machines have gone from single-core to multi-core. To make better use of them, concurrent programming has become increasingly important, both at work and in interviews. To understand and use concurrency well, you should build your own Java concurrent programming knowledge system.

This article focuses on the volatile keyword in Java, describing atomicity, visibility, ordering, what volatile does, how it is implemented, its usage scenarios, and related topics such as the JMM and false sharing, in a simple and easy-to-understand way.

To better describe volatile, let's first cover its prerequisite knowledge: ordering, visibility, and atomicity.

Ordering

What is ordering?

Although we program in high-level languages with simple syntax, the code is ultimately translated into instructions the CPU understands.

Because it is the CPU that executes those instructions, it may reorder them to improve utilization.

In the Java Memory Model, instruction reordering must respect the happens-before rules. For example, starting a thread happens-before every action in that thread, so the instruction that starts the thread cannot be reordered after the tasks that thread performs.

In other words, under the Java Memory Model, instruction reordering does not change the result of the single-threaded execution we wrote, but with multiple threads the interleaving of each thread's operations cannot be predicted.

To make this concrete, look at the following piece of code:

    static int a, b, x, y;

    public static void main(String[] args){
        long count = 0;
        while (true) {
            count++;
            a = 0;b = 0;x = 0;y = 0;
            Thread thread1 = new Thread(() -> {
                a = 1;
                x = b;
            });
            Thread thread2 = new Thread(() -> {
                b = 1;
                y = a;
            });
            thread1.start();
            thread2.start();

            try {
                thread1.join();
                thread2.join();
            } catch (Exception e) {}

            if (x == 0 && y == 0) {
                break;
            }
        }
        //count=118960,x=0,y=0
        System.out.println("count=" + count + ",x=" + x + ",y=" + y);
    }

The four variables a, b, x, and y are initialized to 0.

Intuitively, each thread executes in the order it was written:

//Thread 1
a = 1;
x = b;

//Thread 2
b = 1;
y = a;

However, after the instructions are reordered, four situations may occur:

//Thread 1
//1           2           3           4     
a = 1;      a = 1;      x = b;      x = b;        
x = b;      x = b;      a = 1;      a = 1;  

//Thread 2
//1           2           3           4 
b = 1;      y = a;      b = 1;      y = a;
y = a;      b = 1;      y = a;      b = 1;

When the fourth situation occurs, x and y can both end up as 0.

So how can we ensure ordering?

Marking the variables volatile guarantees ordering.

To improve CPU utilization, instructions are reordered, and reordering only preserves a program's logic within a single thread.

Under multiple threads the execution order cannot be predicted, so ordering is not guaranteed. To guarantee ordering across threads you can use volatile: it inserts memory barriers that forbid instruction reordering.
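As a concrete check, the earlier a/b/x/y experiment can be rerun with the variables declared volatile. A minimal sketch (the class name is mine): because volatile stores and loads may not be reordered with each other, at least one of x and y must read 1 every round.

```java
public class VolatileOrdering {
    // volatile forbids reordering `a = 1` after `x = b` in thread 1
    // (and symmetrically in thread 2), so x == 0 && y == 0 is impossible
    static volatile int a, b, x, y;

    public static void main(String[] args) throws InterruptedException {
        boolean violated = false;
        for (int i = 0; i < 5_000; i++) {
            a = 0; b = 0; x = 0; y = 0;
            Thread t1 = new Thread(() -> { a = 1; x = b; });
            Thread t2 = new Thread(() -> { b = 1; y = a; });
            t1.start(); t2.start();
            t1.join(); t2.join();
            if (x == 0 && y == 0) { violated = true; break; }
        }
        System.out.println("violated=" + violated);
    }
}
```

Unlike the non-volatile version, this loop never finds a round with x == 0 && y == 0, however long it runs.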

You can also use a lock directly to force synchronous execution.

Alternatively, you can use the memory-barrier methods of the Unsafe class (used throughout the concurrent package) to forbid reordering:

//Thread 1
a = 1;
unsafe.fullFence();
x = b;

//Thread 2
b = 1;   
unsafe.fullFence();
y = a;

Visibility

What is visibility?

In the Java Memory Model, each thread has its own working memory in addition to the shared main memory. Reads copy data from main memory into working memory, and writes initially modify only the thread's own working memory. If multiple threads operate on the same data and a modification has not yet been written back to main memory, the other threads cannot see the change.

(Figure: each thread's working memory and the shared main memory)

For example, in the following piece of code, the created thread loops forever because it cannot see the other thread's modification of the variable.

    //nonVolatileNumber is NOT declared volatile
    new Thread(() -> {
        while (nonVolatileNumber == 0) {
    
        }
    }).start();
    
    TimeUnit.SECONDS.sleep(1);
    nonVolatileNumber = 100;

So how do you make this variable visible?

You can declare the variable volatile to guarantee visibility, or synchronize on a lock: entering a synchronized block forces the thread to re-read the data from main memory.
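A hedged sketch of the volatile fix (the names volatileNumber and VolatileVisibility are mine): the reader thread now exits as soon as the main thread's write becomes visible.

```java
import java.util.concurrent.TimeUnit;

public class VolatileVisibility {
    // volatile: every read observes the latest write from any thread
    private static volatile int volatileNumber = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (volatileNumber == 0) {
                // busy-wait; the volatile read picks up the change below
            }
            System.out.println("saw " + volatileNumber);
        });
        reader.start();

        TimeUnit.MILLISECONDS.sleep(200);
        volatileNumber = 100; // volatile write: visible to the reader
        reader.join();        // returns promptly; without volatile it could hang
    }
}
```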

Atomicity

What is atomicity?

Atomicity means that an operation, or a group of operations, either completes entirely or fails entirely; it never partially succeeds.

In the Java Memory Model, the atomicity of basic operations such as read and load is guaranteed by the virtual machine.

Incrementing a variable, however, is not a single operation: the value must first be read from main memory, then modified, and finally written back to main memory.

So can volatile guarantee atomicity?

We let two threads each increment the same volatile variable ten thousand times.

        private volatile int num = 0;
    
        public static void main(String[] args) throws InterruptedException {
            C_VolatileAndAtomic test = new C_VolatileAndAtomic();
            Thread t1 = new Thread(() -> {
                forAdd(test);
            });
    
            Thread t2 = new Thread(() -> {
                forAdd(test);
            });
    
            t1.start();
            t2.start();
    
            t1.join();
            t2.join();
    
            //e.g. 13710 (varies per run)
            System.out.println(test.num);
        }
    
        /**
         * Increment in a loop ten thousand times
         *
         * @param test
         */
        private static void forAdd(C_VolatileAndAtomic test) {
            for (int i = 0; i < 10000; i++) {
                test.num++;
            }
        }

Unfortunately, the result is not 20000, showing that a volatile variable does not get atomicity.

So what method can ensure atomicity?

Locking with synchronized guarantees atomicity, because only one thread can hold the lock at a time.

Atomic classes, which use CAS under the hood, can also guarantee atomicity. What is CAS? We'll cover it in a later article.
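As a sketch, replacing the plain volatile int with java.util.concurrent.atomic.AtomicInteger makes the same two-thread experiment always print 20000, because incrementAndGet performs the whole read-modify-write atomically (the class name AtomicCounterDemo is mine):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounterDemo {
    private static final AtomicInteger num = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        Runnable addTenThousand = () -> {
            for (int i = 0; i < 10_000; i++) {
                num.incrementAndGet(); // atomic read-modify-write (CAS loop)
            }
        };
        Thread t1 = new Thread(addTenThousand);
        Thread t2 = new Thread(addTenThousand);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(num.get()); // always 20000
    }
}
```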

volatile principle

After describing and testing ordering, visibility, and atomicity, we now know that volatile guarantees ordering and visibility, but not atomicity.

So how does volatile implement ordering and visibility under the hood?

The JVM marks volatile variables with the ACC_VOLATILE access flag, and when the bytecode runs it uses the processor's memory barriers to forbid instruction reordering.

The common memory barriers are LoadLoad, StoreStore, LoadStore, and StoreLoad. StoreLoad (as in Store1; StoreLoad; Load2) is the "universal" barrier: it forbids the preceding store from being reordered below the barrier and the following load from being reordered above it. That is, the data written back by the store becomes visible to other processors before the subsequent load, and that load reads the fresh value from memory.

At the assembly level (on x86), volatile writes are implemented with a lock-prefixed instruction.

The lock prefix has no visible effect on a single-core machine, since a single core already guarantees ordering, visibility, and atomicity.

On multi-core machines, the lock-prefixed instruction writes the modified data back to memory, and that write-back must ensure only one processor operates at a time. This could be done by locking the bus, but then no other processor could access memory during the operation.

To improve granularity, processors support cache locking (locking only the affected cache line) and use the cache-coherence protocol to ensure the same cache line cannot be modified by two processors at once.

After the write-back, bus snooping lets other processors detect the change and re-read memory before their next use of the data.

False sharing problem

Since each read operates on a whole cache line, if multiple threads frequently modify two different variables in the same cache line, won't the other processors using that cache line constantly need to re-read the data?

This is the so-called false sharing problem.

For example, suppose two variables i1 and i2 sit in the same cache line, both declared volatile. Processor 1 frequently writes i1 and processor 2 frequently writes i2. Whenever i1 is modified, processor 2 sees the cache line as dirty and must re-read it from memory to get the latest line, yet that overhead is pointless for processor 2, which only writes i2.

(Figure: i1 and i2 sharing one cache line across two processors' caches)

A common fix for false sharing is to pad enough fields between the two variables so that they no longer share a cache line, at the cost of some wasted space.
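A manual-padding sketch (the field names are mine; it assumes a 64-byte cache line, which is common but hardware-dependent):

```java
public class PaddedCounters {
    public volatile long i1 = 0;
    // seven unused longs = 56 bytes of padding, pushing i2 onto a
    // different cache line; note the JIT may eliminate fields it
    // considers dead, which is one reason @sun.misc.Contended exists
    long p1, p2, p3, p4, p5, p6, p7;
    public volatile long i2 = 0;

    public static void main(String[] args) {
        PaddedCounters c = new PaddedCounters();
        c.i1++;
        c.i2++;
        System.out.println(c.i1 + "," + c.i2);
    }
}
```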

To solve false sharing, the JDK also provides the @sun.misc.Contended annotation, which pads the fields for us.

The following code has two threads each increment a variable one billion times. With false sharing it takes over 30 seconds; without it, only a few seconds.

Note that @sun.misc.Contended only takes effect with the JVM flag -XX:-RestrictContended.

        @sun.misc.Contended
        private volatile int i1 = 0;
        @sun.misc.Contended
        private volatile int i2 = 0;
        
        public static void main(String[] args) throws InterruptedException {
            D_VolatileAndFalseSharding test = new D_VolatileAndFalseSharding();
            int count = 1_000_000_000;
            Thread t1 = new Thread(() -> {
                for (int i = 0; i < count; i++) {
                    test.i1++;
                }
            });
    
            Thread t2 = new Thread(() -> {
                for (int i = 0; i < count; i++) {
                    test.i2++;
                }
            });
    
            long start = System.currentTimeMillis();
    
            t1.start();
            t2.start();
    
            t1.join();
            t2.join();
    
            //31910 i1:1000000000 i2:1000000000
    
            //Using @sun.misc.Contended to fix false sharing; requires JVM flag -XX:-RestrictContended
            //5961 i1:1000000000 i2:1000000000
            System.out.println((System.currentTimeMillis() - start) + " i1:"+ test.i1 + " i2:"+ test.i2);
        }

Usage scenarios for volatile

Volatile prohibits instruction reordering through memory barriers to ensure visibility and orderliness.

Thanks to visibility, volatile is well suited to read-heavy scenarios in concurrent programming: reads need no lock, so the overhead is very small.

For example, the synchronization state (state) of AQS in the concurrent package is declared volatile.

Write operations, by contrast, usually still need lock-based synchronization.

Thanks to ordering, volatile forbids the reordering of object-creation instructions in double-checked locking, preventing other threads from obtaining a not-yet-initialized object.

Creating an object can be divided into three steps:

//1. Allocate memory
//2. Initialize the object
//3. Point the reference at the allocated memory

Since steps 2 and 3 both depend on step 1, step 1 cannot be reordered after them; but steps 2 and 3 do not depend on each other, so reordering may point the reference at the memory before the object is initialized.

If, in double-checked locking, another thread happens to see a non-null reference at that moment and uses the object, it may obtain an object that has not been initialized yet.

Therefore, a correct double-checked lock declares the field volatile to forbid this reordering.

        private static volatile Singleton singleton;
    
        public static Singleton getSingleton(){
            if (Objects.isNull(singleton)){
                //Many threads may have blocked waiting for the lock; check again after acquiring it
                synchronized (Singleton.class){
                    if (Objects.isNull(singleton)){
                        singleton = new Singleton();
                    }
                }
            }
    
            return singleton;
        }

Summary

This article used the volatile keyword as a thread to describe ordering, visibility, atomicity, the JMM, how volatile works, its usage scenarios, and false sharing.

To improve CPU utilization, instructions are reordered. Reordering does not change the result of execution under a single thread, but the execution order under multiple threads cannot be predicted.

In the Java Memory Model, each thread has its own working memory: reads must fetch from main memory, and modifications must be written back to it. In concurrent programs, if other threads cannot see that a variable has been modified and keep using the stale value, errors can occur.

volatile uses memory barriers to forbid instruction reordering, providing ordering and visibility, but it cannot provide atomicity.

At the bottom, volatile is implemented with the lock-prefixed assembly instruction. On multi-core machines a modification must be written back to memory; since locking the whole bus for this is expensive, modern processors lock only the affected cache line and rely on the cache-coherence protocol so that only one processor can modify a given cache line at a time, while bus snooping lets other processors holding that line notice it is dirty and re-read it later.

If variables frequently written by different threads sit in the same cache line, false sharing occurs; the fix is to pad the fields so they end up in different cache lines.

Thanks to visibility, volatile is often used for lock-free read operations in concurrent programming; thanks to ordering, it ensures double-checked locking never hands out a partially initialized instance.

Finally (creation is not easy; please support with a like, follow, and favorite~)

This article is included in the column "From Point to Line, From Line to Surface: Building a Java Concurrent Programming Knowledge System in Simple Terms". Interested readers can continue to follow it.

The notes and cases for this article are included in gitee-StudyJava and github-StudyJava. Interested readers can follow and star~

Case address:

Gitee-JavaConcurrentProgramming/src/main/java/A_volatile

Github-JavaConcurrentProgramming/src/main/java/A_volatile

If you have any questions, feel free to discuss them in the comments. If you think Caicai's writing is good, please like, follow, and favorite to show support~


Origin my.oschina.net/u/6903207/blog/10112325