A detailed explanation of the volatile keyword in Java concurrency

What is the volatile keyword

The volatile keyword modifies variables. A variable declared volatile is guaranteed visibility and ordering,
but not atomicity.
You can think of it as a weakened, lightweight form of the synchronized keyword,
used for synchronization.

Let's start with the three characteristics just mentioned.

Three characteristics

Visibility, atomicity, and ordering are the foundation of all of Java concurrency.

  • Visibility: when one thread modifies the value of a shared variable, other threads that subsequently read that variable see the new value rather than the stale one.
  • Atomicity: an operation is indivisible and cannot be interrupted; it either executes completely or not at all, never stopping halfway through.
  • Ordering: our code appears to execute in order: the line above runs first, the line below runs later. In a single-threaded environment that view holds, but not necessarily across threads. For efficiency, the compiler and CPU may reorder code or instructions, as long as the single-threaded result stays correct. This is harmless for one thread, but for multiple threads it is a real problem.

These three characteristics are the problems Java concurrency sets out to solve.
Accordingly, the JMM (Java Memory Model) is built around them.

Next, we introduce the JMM.

The Java memory model

At the hardware level, the CPU has to interact with main memory.
CPU registers are fast, but main memory is far slower by comparison.
If the CPU talked to memory directly it would waste cycles, and registers are too small and too expensive to hold everything.
So hardware inserts caches between the registers (CPU) and main memory: their speed sits between the two, their price is acceptable, and they serve as a buffer for both.
Virtually all modern hardware adopts this model, but different CPUs differ in the details. Exposing those details directly to programmers would be far too troublesome: there would be too many scenarios to consider.
Java therefore builds its own memory model to encapsulate them, defining one fixed, default set of semantics for programmers. This is the Java Memory Model (JMM).

In other words, once we understand the JMM, we only need to reason about it; we no longer have to care which CPU or cache hierarchy sits underneath.
(Figure: schematic of the JMM, showing each thread's private working memory connected to the shared main memory.)
The model here is a logical concept, not necessarily a physical one; as programmers we need not care whether it literally exists.

Each thread has its own private working memory, and that working memory is connected to main memory.
Main memory is shared by all threads and holds the shared variables (roughly: instance variables, static variables, Class objects, and so on).

  • Thread read: first refresh the shared variable from main memory into the thread's own working memory, then read it from working memory
  • Thread write: first write the value into working memory, then flush it to main memory

Note that neither a read nor a write is atomic: each is the combination of two separate steps.
All of a thread's reads and writes must go through its working memory; it can never manipulate main memory directly.

Problems with the JMM

This memory model uses caching to keep CPU throughput as high as possible instead of waiting on slow memory. The advantage comes with a disadvantage:
splitting reads and writes into separate steps creates thread-safety problems.
For example:

	static int i = 0;
	// Two threads run concurrently. What is the result?
	// Thread A executes i = 2;
	// Thread B reads the value of i;

Thread A runs first and thread B runs afterwards. Does thread B read i as 2?
Not necessarily; it may read 0.
Suppose thread A has written i = 2 into its working memory when thread B refreshes i from main memory into its own working memory. At that moment i in main memory is still 0; only afterwards does thread A flush i to main memory.
So the final state is i = 2 in main memory, yet thread B read 0. Logically A ran first and B ran later, so the result "should" be 2, but it is not. This is a problem the JMM exposes. (Thread A's and thread B's caches hold different values of i here, which is exactly the cache-coherence problem.)
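To make the two-step read/write concrete, here is a minimal, deterministic, single-threaded simulation of the scenario above. Everything in it (the names JmmSketch, WorkingMemory, demo) is ours for illustration, not a real JMM API; it merely models working memory as an explicit local map that must refresh from and flush to main memory.

```java
import java.util.HashMap;
import java.util.Map;

public class JmmSketch {
    // shared main memory, visible to all "threads"
    static final Map<String, Integer> mainMemory = new HashMap<>();

    static class WorkingMemory {
        final Map<String, Integer> local = new HashMap<>();

        // read = step 1: refresh from main memory; step 2: read locally
        int read(String name) {
            local.put(name, mainMemory.getOrDefault(name, 0));
            return local.get(name);
        }
    }

    // Replays the stale-read scenario: A writes locally but flushes late.
    static int[] demo() {
        WorkingMemory a = new WorkingMemory();
        WorkingMemory b = new WorkingMemory();
        a.local.put("i", 2);                   // A: local write, no flush yet
        int staleRead = b.read("i");           // B refreshes: main memory still 0
        mainMemory.put("i", a.local.get("i")); // A: the flush finally happens
        int freshRead = b.read("i");           // B refreshes again: now sees 2
        return new int[]{staleRead, freshRead};
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " then " + r[1]); // 0 then 2
    }
}
```

Because the two halves of A's write are pulled apart, B's first read lands in the gap and observes the stale 0, exactly as in the walkthrough above.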

Instruction reordering

Besides the problem above, we also face instruction reordering (of code, or nearly equivalently, of bytecode instructions).

First of all, why reorder at all?
For the JVM and the compiler, the written order of the code is not necessarily the most efficient, so in pursuit of efficiency the order may be shuffled.
In a concurrent environment, reordering matters even more.

Consider an example.
Modern CPUs use pipelining.
Many instructions must be executed, and each instruction decomposes into several steps that use different hardware resources. If only one step of one instruction ran at a time, every resource except the one in use would sit idle and be wasted.
Pipelining fixes this: at a given moment the CPU can execute step a of instruction 1, step b of instruction 2, and step d of instruction 3 simultaneously. With multiple instructions in flight at once, throughput is much higher.
Likewise, if two steps can safely be swapped, the CPU need not block waiting for step 2 before starting step 3; it can pick whichever is ready first. Reordering instructions this way makes execution more efficient.

Code reordering works much like this example: it exists for efficiency.

For example:

int i = 0;//1
int j = 1;//2
int a = i+j;//3

For the snippet above, must it execute in the order 1, 2, 3?
Not necessarily: if it is faster, the order 2, 1, 3 works too, and the result does not change.

You may then ask: why can't the order be 3, 1, 2?
Very simple: 3 depends on 1 and 2, so both must execute before 3, while the relative order of 1 and 2 does not matter.
We can see this easily, but how does the JVM decide?

It is determined by the defined happens-before rules.

Happens-before rules

These are predefined rules that the JVM must not violate when it optimizes (reorders).

  1. Program order rule: within a thread, an operation written earlier happens-before an operation written later, following program code order.
  2. Volatile rule: a write to a volatile variable happens-before subsequent reads of it, which guarantees the visibility of volatile variables.
  3. Locking rule: an unlock must happen-before the subsequent lock of the same monitor.
  4. Transitivity: if A happens-before B and B happens-before C, then A happens-before C.
  5. Thread start rule: a thread's start() method happens-before every action the thread takes.
  6. Thread termination rule: all operations of a thread happen-before the detection of that thread's termination.
  7. Thread interruption rule: the call to interrupt() happens-before the interrupted thread detects the interruption.
  8. Finalizer rule: the end of an object's constructor happens-before the start of its finalize() method.

The first rule, program order, says that within a thread operations appear to happen in order; but under the JMM, reordering is still allowed as long as the execution result is unchanged. What happens-before stresses here is the correctness of the single-threaded result; it does not by itself extend that guarantee to multithreading. The volatile rule is exactly the case this article discusses: if one thread writes a volatile variable and another thread then reads it, the write must happen-before the read. The locking (monitor) rule is also easy to understand: before a lock can be acquired again, the previous holder must have released it. Transitivity lets us chain these rules together. The remaining rules are self-explanatory and will not be covered one by one.
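As a concrete illustration of the locking (monitor) rule, here is a small sketch (the class name MonitorRuleDemo is ours). Because each unlock happens-before the next lock of the same monitor, two threads incrementing a counter under that monitor never lose an update:

```java
public class MonitorRuleDemo {
    static int count = 0;
    static final Object lock = new Object();

    static int run() {
        count = 0;
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                synchronized (lock) { // lock: sees everything the previous unlock published
                    count++;
                }                     // unlock: happens-before the next lock
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start(); b.start();
        try {
            a.join(); b.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(run()); // 20000, guaranteed by the locking rule
    }
}
```

Without the synchronized block, the result could be anywhere below 20000, for exactly the compound-operation reason discussed later in this article.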

Within a single thread, reordering is harmless because the observable result stays the same; with multiple threads, however, it is a serious problem.

Consider the following example:

int a = 0;
boolean flag = false;

public void write() {
   a = 2;              //1
   flag = true;        //2
}

public void multiply() {
   if (flag) {         //3
       int ret = a * a;//4
   }
}

Thread A executes the write() method first, and thread B executes the multiply() method afterwards.
Is ret 4? Not necessarily. If reordering occurs, as shown below:

	Thread A	Thread B
	2
				3
				4
	1

Here 1 and 2 are related only by the program order rule, so they may be reordered with respect to each other.
3 and 4 are interdependent, so 3 happens-before 4 and they cannot be swapped.
In the interleaving above, ret comes out as 0, not what we expected.
So there is a problem: the code clearly intends the result to be 4.

Enter the volatile keyword

In order to solve the above problem, the protagonist appears.

First, the first problem:
declare the static variable i as volatile.

	static volatile int i = 0;
	// Two threads run concurrently. What is the result?
	// Thread A executes i = 2;
	// Thread B reads the value of i;

Now each read or write of i behaves atomically. When thread A writes, the two steps (write to working memory, flush working memory to main memory) are fused into one: the value is flushed immediately after being written to working memory.
When thread B reads, the same holds: it refreshes from main memory and reads in a single step.
The final result is 2.
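Here is a runnable sketch of this behavior (the class name VolatileVisibilityDemo is ours): the reader thread spins on the volatile variable until thread A's write becomes visible, which the volatile visibility guarantee ensures will happen.

```java
public class VolatileVisibilityDemo {
    static volatile int i = 0;

    static int run() {
        i = 0;
        final int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (i == 0) { }   // spin: each volatile read refreshes from main memory
            seen[0] = i;         // guaranteed to observe the fresh value
        });
        reader.start();
        i = 2;                   // volatile write: flushed to main memory at once
        try {
            reader.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(run()); // 2
    }
}
```

If i were not volatile, the JMM would permit the reader to spin forever on a stale cached 0; declaring it volatile rules that out.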


For the second question:

int a = 0;
volatile boolean flag = false;

public void write() {
   a = 2;              //1
   flag = true;        //2
}

public void multiply() {
   if (flag) {         //3
       int ret = a * a;//4
   }
}

Only flag is declared volatile here.
Now look at the happens-before ordering.
In write(), an explanation is needed: the volatile keyword forbids reordering across itself. Ordinary operations that come before it in the code must stay before it and cannot be moved after it; likewise, operations that come after it cannot move in front. The volatile access acts as a barrier separating the region above it from the region below: reordering within either region is its own business, but nothing may cross the barrier. (Memory barriers are indeed what is used under the hood; for example, all writes before the barrier are flushed to main memory.)
This pins the order of 1 before 2.
At the same time, because the volatile write happens-before the volatile read, 2 comes before 3; and since 3 and 4 are interdependent, 3 comes before 4.

Therefore the order 1, 2, 3, 4 is finally guaranteed, just as we wished.

This is how visibility and ordering are achieved.
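The 1-2-3-4 chain above can be sketched as runnable code (the class name VolatileOrderingDemo is ours). Because a = 2 cannot be reordered past the volatile write of flag, and that write happens-before the volatile read, ret is always 4:

```java
public class VolatileOrderingDemo {
    static int a = 0;
    static volatile boolean flag = false;

    static int run() {
        a = 0;
        flag = false;
        final int[] ret = new int[1];
        Thread reader = new Thread(() -> {
            while (!flag) { }     // 3: spin on the volatile read
            ret[0] = a * a;       // 4: by 1 hb 2 hb 3 hb 4, a is already 2 here
        });
        reader.start();
        a = 2;                    // 1: ordinary write, barred from moving below 2
        flag = true;              // 2: volatile write acts as the barrier
        try {
            reader.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return ret[0];
    }

    public static void main(String[] args) {
        System.out.println(run()); // always 4
    }
}
```

With a non-volatile flag, the JMM would allow the interleaving from the earlier table, and ret could come out as 0.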


What about atomicity?
Didn't we just say volatile makes single reads and writes atomic? Then why is volatile not atomic?

The atomicity in question concerns operations like i++, which modify a value based on its previous value.
That is not a simple read or a simple write.
The logic is:

	read first
	then modify
	then write

It is a compound operation, and volatile cannot make a compound operation atomic, so there is no way around it.

Imagine i = 0;
thread A executes i++ and thread B also executes i++.
What might happen?

	Thread A	Thread B
	read
				read
	modify
				modify
	write
				write

With the sequence above, i ends up as 1, not 2: thread A reads i = 0, and thread B also reads 0 before A writes. By the time thread A finishes writing, thread B has long since finished reading, so B also writes 1. The result differs from what we expected.
Volatile cannot solve this problem.
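The interleaving table above can be replayed deterministically in a single thread by pulling the three steps of each i++ apart by hand (the class name LostUpdateSketch is ours):

```java
public class LostUpdateSketch {
    static int run() {
        int i = 0;
        int readA = i;        // A reads 0
        int readB = i;        // B reads 0, before A has written
        int newA = readA + 1; // A modifies
        int newB = readB + 1; // B modifies
        i = newA;             // A writes 1
        i = newB;             // B writes 1: A's increment is lost
        return i;
    }

    public static void main(String[] args) {
        System.out.println(run()); // 1, not 2
    }
}
```

This is exactly the lost update that real threads can produce; the simulation just makes the unlucky interleaving certain.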
(PS: this can be solved with CAS, or with a lock.)
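A minimal sketch of the CAS option using java.util.concurrent.atomic.AtomicInteger (the class name CasCounterDemo is ours): incrementAndGet performs the whole read-modify-write as one atomic step, so no update is lost:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasCounterDemo {
    static int run() {
        AtomicInteger i = new AtomicInteger(0);
        Runnable task = () -> {
            for (int n = 0; n < 10_000; n++) {
                i.incrementAndGet(); // one atomic read-modify-write via CAS
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start(); b.start();
        try {
            a.join(); b.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return i.get();
    }

    public static void main(String[] args) {
        System.out.println(run()); // always 20000
    }
}
```

Internally, incrementAndGet retries a compare-and-swap until it wins, which is why the compound operation cannot be torn apart the way a plain volatile i++ can.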

Summary

What volatile achieves:

  • Visibility: the write to working memory and the flush from working memory to main memory are fused into one step; likewise, the refresh from main memory and the CPU's read from working memory are fused into one. A single read or write of a volatile variable is therefore atomic, which is what guarantees visibility.
    This fusion is implemented with the lock prefix in assembly, which flushes the current CPU's cache to memory and at the same time invalidates the corresponding lines in other CPUs' caches, forcing them to re-fetch. (In other words: flush immediately after writing, refresh immediately before reading.)
  • Ordering: the volatile keyword forbids instruction reordering across it, using memory barriers. Code may still be rearranged within the region before the barrier and within the region after it, but nothing from the front may move behind it and nothing from the back may move in front. Semantically, writes before the barrier must be flushed to memory so that reads after the barrier observe their results. (Memory barriers therefore cost some performance, since they block certain code optimizations.)

What volatile does not achieve:

  • Atomicity: only single reads and writes are atomic. A compound operation such as i++ (read first, then modify, then write) is not, so its atomicity is not guaranteed.


Origin blog.csdn.net/qq_34687559/article/details/114329619