How much do you know about volatile, the question interviews always ask?

Foreword

The volatile keyword in Java comes up constantly in interviews and in technical discussion groups, yet the discussions never seem to reach a satisfying conclusion. This article reorganizes the topic from the perspective of the compiled code.

volatile has two major characteristics: it prohibits reordering and it guarantees memory visibility. Readers who are unclear on these two concepts can first read this article -> java volatile keyword for confusion

Even once the concepts are understood, they can still feel confusing: how exactly are they implemented?

This article involves some assembly-level content; it may take a couple of read-throughs to fully digest.

Reordering

To understand reordering, let's look at a simple piece of code:

public class VolatileTest {

    int a = 0;
    int b = 0;

    public void set() {
        a = 1;
        b = 1;
    }

    public void loop() {
        while (b == 0) continue;
        if (a == 1) {
            System.out.println("i'm here");
        } else {
            System.out.println("what's wrong");
        }
    }
}

The VolatileTest class has two methods: set() and loop(). Suppose thread B executes loop() and thread A executes set(). What will the result be?

The answer is: there is no guarantee, because both compiler reordering and CPU instruction reordering are involved.

Compiler reordering
The compiler may reorder bytecode instructions to speed the program up, as long as single-threaded semantics are preserved. So after compilation, the assignment order of a and b may well become: set b first, then set a.

From thread A's point of view, the order of the two assignments does not affect its own result.
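To see how volatile fixes the opening example, here is a minimal sketch (the class name VolatileFix and the String-returning loop() are my own adaptation, not code from the article): declaring b volatile forbids reordering the two writes in set(), so a reader that observes b == 1 is guaranteed to also observe a == 1.

```java
// Hypothetical variation of the article's example (names are mine):
// declaring b volatile forbids reordering the two writes in set(), so once
// loop() observes b == 1 it is guaranteed to also observe a == 1.
public class VolatileFix {

    int a = 0;
    volatile int b = 0;

    public void set() {
        a = 1;  // plain write
        b = 1;  // volatile write: the write to a cannot be reordered after it
    }

    public String loop() {
        while (b == 0) Thread.onSpinWait();  // volatile read on each iteration
        return a == 1 ? "i'm here" : "what's wrong";
    }

    public static void main(String[] args) throws InterruptedException {
        VolatileFix v = new VolatileFix();
        Thread reader = new Thread(() -> System.out.println(v.loop()));
        reader.start();
        new Thread(v::set).start();
        reader.join();  // always prints "i'm here"
    }
}
```

Note that only b needs to be volatile: the Java memory model's happens-before rule for volatile makes the earlier plain write to a visible along with it.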

CPU instruction reordering
What about reordering at the CPU level? Before going deeper, let's look at the x86 CPU cache structure.
(figure: x86 CPU cache hierarchy)
1. Registers hold local variables and function parameters; an access takes 1 cycle, under 1 ns.
2. L1 Cache: per-core, split into a 32K data cache (L1d) and a 32K instruction cache (L1i); an access takes about 3 cycles, roughly 1 ns.
3. L2 Cache: per-core, designed as a buffer between the L1 cache and the shared L3 cache, 256K in size; an access takes about 12 cycles, roughly 3 ns.
4. L3 Cache: shared by all cores in the same socket, divided into multiple 2M segments; an access takes about 38 cycles, roughly 12 ns.

And then there is the familiar DRAM: a main-memory access generally takes about 65 ns, so compared with the caches, going to memory is very slow for the CPU.

L1 and L2 are private to each core, so their data is not shared; cache consistency across cores is generally guaranteed by the MESI protocol, but at a cost.

In the MESI protocol, each cache line has 4 states:

1. M (Modified): the line is valid but has been modified, so it is inconsistent with memory; the data exists only in this cache.

2. E (Exclusive): the line is valid and consistent with memory; the data exists only in this cache.

3. S (Shared): the line is valid and consistent with memory; the data is present in many caches.

4. I (Invalid): the line is invalid.

Each core's cache controller not only knows about its own reads and writes but also snoops the reads and writes of the other caches. Suppose there are 4 cores:
1. Core1 loads variable X (value 10) from memory; the cache line holding X in Core1 is in state E.
2. Core2 also loads X from memory; the cache lines for X in Core1 and Core2 both move to state S.
3. Core3 also loads X from memory and then sets X to 20; the cache line for X in Core3 moves to state M, and the corresponding cache lines in the other cores become I (invalid).
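The three steps above can be sketched as a toy state machine (an illustrative model only, not HotSpot or hardware code; all names are mine):

```java
import java.util.Arrays;

// Toy model of the MESI transitions described above: four cores, each holding
// one cache-line state for variable X. Hypothetical names, simplified rules.
public class MesiModel {

    enum State { M, E, S, I }

    // I = line not present / invalid in that core's cache
    static State[] lines = { State.I, State.I, State.I, State.I };

    // Core `id` loads X from memory.
    static void load(int id) {
        boolean othersHaveIt = false;
        for (int i = 0; i < lines.length; i++) {
            if (i != id && lines[i] != State.I) {
                othersHaveIt = true;
                lines[i] = State.S;  // existing holders downgrade to Shared
            }
        }
        lines[id] = othersHaveIt ? State.S : State.E;
    }

    // Core `id` writes X.
    static void store(int id) {
        for (int i = 0; i < lines.length; i++) {
            if (i != id) lines[i] = State.I;  // invalidate all other copies
        }
        lines[id] = State.M;
    }

    public static void main(String[] args) {
        load(0);            // step 1: Core1 loads X
        System.out.println(Arrays.toString(lines));  // [E, I, I, I]
        load(1);            // step 2: Core2 loads X
        System.out.println(Arrays.toString(lines));  // [S, S, I, I]
        load(2); store(2);  // step 3: Core3 loads, then writes X
        System.out.println(Arrays.toString(lines));  // [I, I, M, I]
    }
}
```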

Of course, internal details differ between processors. Intel's Core i7, for example, uses MESIF, which evolved from MESI; F (Forward) is derived from S, and a cache line in the F state can pass its data directly to other cores. We won't dwell on that here.

The CPU stalls while cache-line states are in transition. After long-running optimization efforts, a LoadBuffer and a StoreBuffer were added between the registers and the L1 cache to reduce this stall time; together they are called the Memory Ordering Buffers (MOB). The load buffer has 64 entries and the store buffer 36, and while the buffers exchange data with L1, the CPU does not need to wait.

1. When the CPU executes a load, it puts the read request into the LoadBuffer; it does not have to wait for other CPUs to respond, can carry on with subsequent operations, and processes the result of the read request later.
2. When the CPU executes a store, it writes the data into the StoreBuffer; at a suitable point, the StoreBuffer's contents are flushed to main memory.

Because of the StoreBuffer, when a CPU writes data, the value does not immediately appear in memory and is therefore invisible to other CPUs; for the same reason, a request sitting in the LoadBuffer cannot see the latest data just set by another CPU.

Since the StoreBuffer and LoadBuffer operate asynchronously, from the outside there is no strict, fixed order between writes and reads.
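This asynchrony can actually be observed. The following is a hypothetical litmus-test harness (all class and method names are mine, not from the article): each thread stores 1 into one variable and then loads the other. Because a store may still be sitting in the StoreBuffer when the subsequent load executes, both threads can read 0, an outcome impossible under any interleaving of program order.

```java
import java.util.concurrent.CountDownLatch;

// Store-buffer litmus test (illustrative harness, names are mine): whether
// the (0, 0) outcome actually shows up depends on the hardware and the JIT.
public class StoreBufferLitmus {

    static int x, y, r1, r2;

    // Runs the two-thread experiment once; returns true if (0, 0) was seen.
    static boolean trial() throws InterruptedException {
        x = 0; y = 0; r1 = 0; r2 = 0;
        CountDownLatch start = new CountDownLatch(1);
        Thread t1 = new Thread(() -> {
            try { start.await(); } catch (InterruptedException ignored) { }
            x = 1;   // store: goes into the StoreBuffer first
            r1 = y;  // load: may complete before the store is visible
        });
        Thread t2 = new Thread(() -> {
            try { start.await(); } catch (InterruptedException ignored) { }
            y = 1;
            r2 = x;
        });
        t1.start(); t2.start();
        start.countDown();
        t1.join(); t2.join();
        return r1 == 0 && r2 == 0;
    }

    public static void main(String[] args) throws InterruptedException {
        int observed = 0, trials = 10_000;
        for (int i = 0; i < trials; i++) {
            if (trial()) observed++;
        }
        System.out.println("(0,0) observed " + observed + " / " + trials + " times");
    }
}
```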

How memory visibility is implemented

From the analysis above, it is really the asynchrony of the CPU's load and store paths that makes memory invisible across CPUs. So how can a CPU be made to see the latest data when it loads?

Assigning a volatile variable
Let's write a simple piece of Java that declares a volatile variable and assigns to it:

public class VolatileTest {

    static volatile int i;

    public static void main(String[] args){
        i = 10;
    }
}

This code is meaningless by itself; the point is to see what difference volatile makes in the compiled bytecode. Running javap -verbose VolatileTest gives the following:
(figure: javap output for VolatileTest)
Somewhat disappointingly, there is no special instruction to be found. No monitorenter/monitorexit pair appears, and the assignment instruction putstatic looks no different from the non-volatile case; the only difference is that the flags of field i now include ACC_VOLATILE.

Still, this flag gives us somewhere to start. First, search globally for ACC_VOLATILE (when you have no idea where to begin, look at where a keyword is used). Sure enough, a similarly named entry turns up in accessFlags.hpp:
(figure: ACC_VOLATILE in accessFlags.hpp)
is_volatile() tells us whether a field is volatile-modified. Searching globally for where is_volatile is used, we finally find, in bytecodeInterpreter.cpp, the interpreter implementation of the putstatic bytecode instruction, which calls is_volatile():
(figure: putstatic implementation in bytecodeInterpreter.cpp)
Of course, during normal execution this interpreter path is not taken; the machine code generated for the bytecode runs directly. This path is exercised while debugging, but the final logic is the same.

Here, cache is the constant-pool cache entry for field i. Because i is declared volatile, cache->is_volatile() is true, and the assignment to i is carried out by the release_int_field_put method.

Let's look inside:
(figure: release_int_field_put implementation)
Internally it is just a thin wrapper; OrderAccess::release_store is where the magic happens that lets other threads read the latest value of i:
(figure: OrderAccess::release_store implementation)
Strangely, in the implementation of OrderAccess::release_store, the first parameter carries an extra volatile qualifier. Clearly, this is the C/C++ keyword.

In C/C++, the volatile keyword modifies variables and is usually used as a language-level memory barrier. The C++ Programming Language describes volatile as follows:

A volatile specifier is a hint to a compiler that an object may change its value in ways not specified by the language so that aggressive optimizations must be avoided.

volatile is a type qualifier. A variable declared volatile may change at any time, so every use must read from the variable's memory address, and the compiler will no longer optimize the code that operates on it. Let's write two small C/C++ programs to verify this:

#include <iostream>

int foo = 10;
int a = 1;
int main(int argc, const char * argv[]) {
    // insert code here...
    a = 2;
    a = foo + 10;
    int b = a + 20;
    return b;
}

The assignment a = 2 in this code is effectively a dead store. Compiling with g++ -S -O2 main.cpp yields the following assembly:
(figure: optimized assembly, dead store removed)
In the generated assembly, the useless operations on variable a have been optimized away. Now add volatile to the declaration of a:

#include <iostream>

int foo = 10;
volatile int a = 1;
int main(int argc, const char * argv[]) {
    // insert code here...
    a = 2;
    a = foo + 10;
    int b = a + 20;
    return b;
}

Generating the assembly again gives:
(figure: assembly with volatile a)
Compared with the first time, there are the following differences:

1. The statement assigning 2 to a is retained even though it is a useless action, so the volatile keyword prohibits this instruction optimization; in effect, it acts as a compiler barrier.

A compiler barrier avoids out-of-order memory access caused by compiler optimization. You can also insert compiler barriers into the code by hand; for example, the following code has the same effect as adding the volatile keyword:

#include <iostream>

int foo = 10;
int a = 1;
int main(int argc, const char * argv[]) {
    // insert code here...
    a = 2;
    __asm__ volatile ("" : : : "memory"); // compiler barrier
    a = foo + 10;
    __asm__ volatile ("" : : : "memory");
    int b = a + 20;
    return b;
}

The compiled result is similar to the above:
(figure: assembly with explicit compiler barriers)
2. _a(%rip) is the address of variable a; movl $2, _a(%rip) writes 2 into a's memory each time. For details on RIP, see New addressing for PIC under x64: RIP-relative addressing.

Therefore, every assignment to a is written to memory, and every read of a is reloaded from memory.

That was a bit of a detour; let's get back to the JVM code.
(figure: release_int_field_put followed by OrderAccess::storeload)
Immediately after the assignment, OrderAccess::storeload() is executed. What is that?

This, in fact, is the memory barrier people keep talking about; before, I only knew the term without knowing how it is implemented. From the analysis of the CPU cache structure we know: a load enters the LoadBuffer before going to memory, and a store enters the StoreBuffer before being written to the cache. These two operations are asynchronous, which can lead to incorrect instruction reordering, so the JVM defines a series of memory barriers to pin down the execution order of instructions.

The memory barriers defined in the JVM (JDK 1.7 implementation) are as follows:
(figure: JVM memory barrier definitions)
1. loadload barrier (load1, loadload, load2)
2. loadstore barrier (load, loadstore, store)

Both of these barriers are implemented by the acquire() method:
(figure: acquire() implementation)
Here __asm__ marks the start of inline assembly; volatile, as analyzed before, prevents the compiler from optimizing the code; and the final "memory" clobber plays the role of the compiler barrier.

Inserting the barrier into the LoadBuffer drains the load operations ahead of the barrier before running the operations after it, which ensures the data from those loads is ready before the next store instruction.

3. storestore barrier (store1, storestore, store2)
Implemented by the release() method:
(figure: release() implementation)
Inserting the barrier into the StoreBuffer drains the store operations ahead of the barrier before executing the stores after it, which ensures the data written by store1 is visible to other CPUs when store2 executes.

4. storeload barrier (store, storeload, load)
This is the barrier inserted after assigning a volatile variable in Java; it is implemented by the fence() method:
(figure: fence() implementation)
Getting excited yet?

os::is_MP() checks whether the machine is multi-core; with only one CPU, none of these problems arise.

The storeload barrier ultimately comes down to the following instruction:

__asm__ volatile ("lock; addl $0,0(%%rsp)" : : : "cc", "memory");
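As an aside for Java readers: since JDK 9, these barriers are exposed directly as static fences on java.lang.invoke.VarHandle. VarHandle.fullFence() corresponds roughly to the storeload barrier discussed here, acquireFence() to loadload/loadstore, and releaseFence() to storestore. A minimal sketch (class and field names are mine):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Illustrative use of explicit fences (JDK 9+); names are hypothetical.
public class FenceDemo {

    static int data;
    static int ready;
    static final VarHandle READY;

    static {
        try {
            READY = MethodHandles.lookup()
                    .findStaticVarHandle(FenceDemo.class, "ready", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static void writer() {
        data = 42;
        VarHandle.releaseFence();  // like storestore: data ordered before ready
        READY.setOpaque(1);
    }

    static int reader() {
        while ((int) READY.getOpaque() == 0) {
            Thread.onSpinWait();
        }
        VarHandle.acquireFence();  // like loadload: data not read early
        return data;               // observes 42 under this fence ordering
    }

    public static void main(String[] args) throws InterruptedException {
        Thread r = new Thread(() -> System.out.println(reader()));
        r.start();
        new Thread(FenceDemo::writer).start();
        r.join();
    }
}
```

A plain volatile field gives you this ordering automatically; the explicit fences are only worthwhile in low-level lock-free code.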

To see what this instruction does, let's write a bit of C++ and compile it:

#include <iostream>

int foo = 10;

int main(int argc, const char * argv[]) {
    // insert code here...
    volatile int a = foo + 10;
    // __asm__ volatile ("lock; addl $0,0(%%rsp)" : : : "cc", "memory");
    volatile int b = foo + 20;

    return 0;
}

So that variables a and b are not optimized away by the compiler, they are declared volatile here. The compiled assembly is as follows:
(figure: assembly without the lock instruction)
From the compiled code we can see that the second time foo is used, it is not reloaded from memory; the value kept in a register is used instead.

Now enable the commented-out __asm__ volatile line and recompile:
(figure: assembly with lock; addl)
Compared with before, two more instructions appear: lock and addl.
The lock prefix works like this: while the locked instruction executes, the processor asserts the LOCK# signal (which locks the bus, preventing other CPUs from accessing memory through it until the instruction finishes), making the instruction an atomic operation; in addition, earlier read and write requests cannot be reordered past the lock instruction, so it is equivalent to a memory barrier.

There is one more difference: the second time foo is used, it is reloaded from memory, guaranteeing that the latest value of foo is obtained. This is achieved by the clobber part of the instruction:

__asm__ volatile ("" : : : "cc", "memory");

This, too, is a compiler barrier: it tells the compiler to regenerate the load instruction (the value cached in a register must not be reused).

Reading volatile variables
Also in bytecodeInterpreter.cpp, we find the interpreter implementation of the getstatic bytecode instruction:
(figure: getstatic implementation in bytecodeInterpreter.cpp)
The field value is obtained via obj->obj_field_acquire(field_offset):
(figure: obj_field_acquire implementation)
which is ultimately implemented by OrderAccess::load_acquire:

inline jint OrderAccess::load_acquire(volatile jint* p) { return *p; }

The bottom layer rests on C++'s volatile: because volatile carries the compiler-barrier effect with it, the read can always obtain the latest value in memory.
