Multithreading from Entry to Advanced (9) -- Java Memory Model and volatile

1. Java memory model

1.1 Computer memory model

When the CPU executes an instruction, it first fetches the instruction from main memory and then executes it; in the process, data may be read and written. However, the CPU executes instructions far faster than it can read and write main memory, which wastes CPU resources. To improve CPU efficiency, a high-speed cache (Cache) is placed between the CPU and main memory. The relationship between the CPU, the Cache, and main memory is then as follows:

[Figure: relationship between the CPU, the Cache, and main memory]

Then, the execution process of the program is as follows:

  • First, the data is copied from main memory into the CPU's cache
  • When the CPU performs calculations, it reads and writes data directly in the Cache
  • When the operation finishes, the data in the Cache is written back to main memory
1.1.1 Cache inconsistency problem

The hierarchical structure of CPU, Cache, and main memory solves the mismatch between the speed of the CPU and that of main memory, but it also introduces new problems:

  • Single-core CPU, single thread: there is no problem. The core's cache is accessed by only one thread; the cache is exclusive to it, so there are no access conflicts.
  • Single-core CPU, multiple threads: after the CPU loads a block of memory into the cache, different threads accessing the same physical address map to the same cache location, so the cache is not invalidated even when threads are switched. And since only one thread executes at any moment, there are no cache access conflicts.
  • Multi-core CPU, multiple threads: each core has at least one level-one cache (there can be multiple cache levels). When multiple threads in a process access the same shared memory and run on different cores, each core keeps a copy of that shared memory in its own cache. Since the cores run in parallel, multiple threads may write to their respective caches at the same time, and the data in those caches may then differ.

Modern computers use multi-core CPUs, so the cache-inconsistency problem exists. See the following example:

Suppose two threads each perform i++ on a shared variable i whose initial value is 0. Normally the final result of the two increments should be 2, but because of cache inconsistency two situations can occur:

[Figure: two threads performing i++ through separate caches]

  • Thread 1 first reads i=0 from main memory into its cache, the CPU completes the increment and writes i=1 back to main memory; then thread 2 reads i=1 from main memory into its cache, the CPU completes the increment and writes i=2 back through its cache. This is the desired result.
  • Thread 1 first reads i=0 from main memory into its cache. Because the CPU is multi-core, thread 2 also reads i=0 into its own cache. After both threads finish their computations, each writes i=1 back to memory, so the final result is i=1: the caches were inconsistent.

There are two main ways to solve the cache-inconsistency problem in the computer memory model:

  • By adding a LOCK# lock on the bus: since communication between the CPU and other components (such as memory) goes through the bus, asserting a LOCK# signal on the bus blocks other CPUs from accessing those components, so only the CPU holding the lock can use the memory containing the variable. Only after that code has fully executed can other CPUs read the variable from memory and perform their own operations.

  • Through a cache coherence protocol (Cache Coherence Protocol): the best known is Intel's MESI protocol, which guarantees that the copies of a shared variable held in each cache are consistent.

    The core idea of MESI is: when a CPU writes data and finds that the variable is shared (i.e., a copy of the variable exists in other CPUs' caches), it sends a signal notifying the other CPUs to invalidate their cache line for that variable. When another CPU later needs to read the variable and finds that the cache line holding it is invalid, it re-reads the value from memory.
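The write-invalidate idea can be sketched in a few lines of Java (a toy model of my own, not of real hardware; actual MESI is implemented by bus snooping in the cache controller):

import java.util.List;

// Illustrative sketch only: models the state transitions described above
enum MesiState { MODIFIED, EXCLUSIVE, SHARED, INVALID }

class CacheLine {
    MesiState state = MesiState.INVALID;
    int value;
}

class Core {
    final CacheLine line = new CacheLine();

    // Read: on a miss (INVALID line), reload from main memory and mark SHARED
    int read(int mainMemoryValue) {
        if (line.state == MesiState.INVALID) {
            line.value = mainMemoryValue;
            line.state = MesiState.SHARED;
        }
        return line.value;
    }

    // Write: tell every other core to invalidate its copy, then own the dirty line
    void write(List<Core> otherCores, int newValue) {
        for (Core other : otherCores) {
            other.line.state = MesiState.INVALID;
        }
        line.value = newValue;
        line.state = MesiState.MODIFIED;
    }
}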

1.1.2 Processor reordering problem

Multithreaded scenarios also face a hardware-level problem: to make full use of its internal execution units, the processor may execute the input code out of order. This is processor optimization.

[Figure: processors A and B writing to their own write buffers and reading from memory]

Here processors A and B can simultaneously write a shared variable into their own write buffers (A1, B1), then read another shared variable from memory (A2, B2), and finally flush the dirty data saved in their write buffers to memory (A3, B3). When executed in this sequence, the program can produce the result x = y = 0.

[Figure: the order in which the memory operations actually take effect]

Judging from the order in which the memory operations actually take effect, write A1 does not really happen until processor A executes A3 and flushes its write buffer. So although processor A issues its memory operations in the order A1 → A2, the order in which they actually take effect is A2 → A1: processor A's memory operations have been reordered (processor B's situation is the same, so it is not repeated here).
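This litmus test can be written as a plain Java program (a minimal sketch; the class and field names are mine). Without volatile or locks, the Java memory model permits the outcome x == y == 0, although observing it may take many iterations and depends on the JVM and CPU:

public class StoreBufferDemo {
    static int a, b, x, y;   // plain (non-volatile) shared fields

    public static void main(String[] args) throws InterruptedException {
        for (long run = 0; ; run++) {
            a = 0; b = 0; x = 0; y = 0;
            Thread t1 = new Thread(() -> { a = 1; x = b; });  // A1, then A2
            Thread t2 = new Thread(() -> { b = 1; y = a; });  // B1, then B2
            t1.start(); t2.start();
            t1.join(); t2.join();
            if (x == 0 && y == 0) {   // both reads saw the value from before the writes
                System.out.println("reordering observed on run " + run);
                break;
            }
        }
    }
}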

1.2 Java memory model

The Java Memory Model (JMM) is the Java language specification's abstraction of the computer memory model. Following the Java memory model makes it possible to guarantee atomicity, visibility, and ordering in multithreaded scenarios.

The JMM stipulates that all shared variables are stored in main memory (local variables are not shared between threads and are not affected by the JMM). Each thread has its own working memory; all of a thread's operations on shared variables must be performed in its working memory, which holds a copy of the shared variables from main memory. Different threads cannot directly access each other's working memory; message passing between threads is also done through main memory.

[Figure: threads, working memory, and main memory in the JMM]

The JMM's rules on synchronization:

  • Before a thread releases a lock, it must flush the values of shared variables back to main memory
  • Before a thread acquires a lock, it must read the latest values from main memory into its own working memory
  • Locking and unlocking must use the same lock
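A minimal sketch of these rules in code (the class is hypothetical): because both methods synchronize on the same lock (this), every write is flushed back to main memory on unlock and every read starts from the latest value in main memory after lock:

class SharedCounter {
    private int count = 0;            // shared variable, guarded by the lock on "this"

    public synchronized void increment() {
        count++;                      // releasing the lock flushes count to main memory
    }

    public synchronized int get() {
        return count;                 // acquiring the lock re-reads the latest value
    }
}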

The JMM must guarantee atomicity, visibility, and ordering.

2. The volatile keyword

volatile is a lightweight synchronization mechanism provided by the Java Virtual Machine. It has the following three characteristics:

  • Guarantees visibility
  • Does not guarantee atomicity
  • Disables instruction reordering

2.1 Guaranteeing visibility

Let's take a look at an example without visibility:

import java.util.concurrent.TimeUnit;

// Resource class
class MyData {
    int num = 0;

    public void addTo60() {
        num = 60;
    }
}

public class VolatileDemo {
    public static void main(String[] args) {
        MyData myData = new MyData();

        new Thread(() -> {
            System.out.println(Thread.currentThread().getName() + "\t come in");
            // Note: the sleep here is required (see the explanation below)
            try {
                TimeUnit.SECONDS.sleep(3);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            myData.addTo60();
            System.out.println(Thread.currentThread().getName() + "\t update number value");
        }, "A").start();

        while (myData.num == 0) {
            // The main thread spins here; without volatile it may never see the update
        }
        System.out.println(Thread.currentThread().getName() + "\t main thread finished");
    }
}

[Figure: console output; the program never terminates]

The program keeps running, which means the main thread is stuck in the while loop, as shown below:

[Figure: thread A's update to num is not visible to the main thread]

Why does thread A sleep before writing?

If thread A did not sleep, then since it only executes the single line num = 60, it would finish very quickly, and the main thread would read the modified value from main memory before entering the loop, so the visibility problem would not be observable.

As the code above shows, without the volatile keyword visibility is not guaranteed. Now run the same program with volatile added:

import java.util.concurrent.TimeUnit;

class MyData {
    // volatile keyword added
    volatile int num = 0;

    public void addTo60() {
        num = 60;
    }
}

public class VolatileDemo {
    public static void main(String[] args) {
        MyData myData = new MyData();

        new Thread(() -> {
            System.out.println(Thread.currentThread().getName() + "\t come in");
            try {
                TimeUnit.SECONDS.sleep(3);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            myData.addTo60();
            System.out.println(Thread.currentThread().getName() + "\t update number value");
        }, "A").start();

        while (myData.num == 0) {
            // With volatile, the main thread eventually sees num = 60 and exits
        }
        System.out.println(Thread.currentThread().getName() + "\t main thread finished");
    }
}

[Figure: console output; the main thread now terminates]

After adding volatile to num, the execution flow is as follows:

[Figure: execution flow after adding volatile to num]

2.2 Atomicity is not guaranteed

Atomicity means indivisible and complete: while a thread is executing an operation, it cannot be interrupted partway through; the operation either completes entirely or fails entirely.

Volatile does not guarantee atomicity, and can be verified with the following code:

class MyData2 {
    volatile int num = 0;

    public void add() {
        num++;
    }
}

public class VolatileDemo2 {
    public static void main(String[] args) {
        MyData2 myData = new MyData2();

        for (int i = 1; i <= 20; i++) {
            new Thread(() -> {
                for (int j = 0; j < 2000; j++) {
                    myData.add();
                }
            }, String.valueOf(i)).start();
        }

        // Wait for the 20 worker threads to finish; the count of 2 allows for
        // the main thread plus the IDE's monitor thread
        while (Thread.activeCount() > 2) {
            Thread.yield();
        }

        System.out.println(myData.num);
    }
}

The code above creates 20 threads, each of which calls the add method 2000 times. If volatile guaranteed atomicity, the final result would be 40000.

[Figure: console output; the result is less than 40000]

The result is wrong, which verifies that volatile does not guarantee atomicity. Analyzing the code above:

[Figure: two threads' num++ write-backs overwriting each other]

num++ is not a single operation: a thread first reads num into its working memory, increments the copy, and then writes it back. When two threads try to write back to main memory at the same time, scheduling allows only one to write first; when the other thread wakes up and writes, its value is based on stale data, so one update overwrites the other. These lost updates are why the final result falls short of 40000.
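To see why, it helps to expand num++ into its equivalent steps (a sketch; the compiled bytecode performs the corresponding getfield / iadd / putfield sequence):

class MyData2 {
    volatile int num = 0;

    // num++ expands to three separate steps; a thread can be preempted
    // between any two of them, which is why volatile cannot help here
    public void add() {
        int tmp = num;   // 1. read the current value   (bytecode: getfield)
        tmp = tmp + 1;   // 2. increment the local copy (bytecode: iadd)
        num = tmp;       // 3. write the result back    (bytecode: putfield)
    }
}

volatile makes step 1 read from and step 3 write to main memory directly, but it cannot make the three steps execute as one atomic unit.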

How can this be fixed?

  1. Locking: a synchronized lock is sure to solve the atomicity problem (see the sketch after the AtomicInteger demo below)
  2. Use the AtomicInteger atomic class from JUC (java.util.concurrent):
import java.util.concurrent.atomic.AtomicInteger;

class MyData2 {
    volatile int num = 0;
    AtomicInteger atomicInteger = new AtomicInteger();

    public void add() {
        atomicInteger.getAndIncrement();   // atomic read-modify-write
    }
}

public class VolatileDemo2 {
    public static void main(String[] args) {
        MyData2 myData = new MyData2();

        for (int i = 1; i <= 20; i++) {
            new Thread(() -> {
                for (int j = 0; j < 2000; j++) {
                    myData.add();
                }
            }, String.valueOf(i)).start();
        }

        while (Thread.activeCount() > 2) {
            Thread.yield();
        }

        System.out.println(myData.atomicInteger);
    }
}

[Figure: console output; the result is exactly 40000]
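For option 1, a minimal sketch of the synchronized variant (heavier than AtomicInteger, but it also yields exactly 40000):

class MyData2 {
    int num = 0;

    // synchronized makes the read-increment-write of num++ one atomic unit
    public synchronized void add() {
        num++;
    }
}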

2.3 Disabling instruction reordering

To improve performance, compilers and processors often reorder instructions during program execution. There are three types of reordering:

  • Compiler-optimized reordering. The compiler can rearrange the execution order of statements without changing the semantics of a single-threaded program.

  • Instruction-level parallel reordering. Modern processors use instruction-level parallelism (ILP) to overlap the execution of multiple instructions. If there is no data dependence, the processor can change the order in which the machine instructions corresponding to statements are executed.

  • Memory-system reordering. Because the processor uses caches and read/write buffers, load and store operations can appear to be performed out of order.

[Figure: from source code to the final instruction sequence through the three kinds of reordering]

When reordering, the compiler and processor must respect data dependences between instructions. For example:

int x = 11;     //①
int y = 12;     //②
x = x + 5;      //③
y = x * x;      //④

When the compiler optimizes, statement ③ must stay after statement ① because ③ depends on the value of x written in ① (and ④ must stay after ③ for the same reason).

volatile disables instruction-reordering optimization, thereby avoiding out-of-order execution of the program in multithreaded environments. The underlying mechanism is as follows:

A memory barrier (Memory Barrier, also called a memory fence) is a CPU instruction with two functions:

  • It guarantees the execution order of specific operations

  • It guarantees the memory visibility of certain variables (volatile uses this to achieve its visibility guarantee)

Because both the compiler and the processor can reorder instructions, inserting a Memory Barrier between two instructions tells the compiler and the CPU that no instruction may be reordered across it; in other words, inserting a memory barrier forbids reordering optimizations between the instructions before and after it. Another function of the memory barrier is to force the various CPUs' cached copies of the data to be flushed, so a thread on any CPU can read the latest version of the data.
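A classic pattern that relies on exactly these barrier guarantees (the class below is an illustrative sketch of mine): the volatile write to ready forbids the earlier write to data from being reordered past it, so a reader that sees ready == true is also guaranteed to see data == 42:

class Publication {
    int data = 0;                    // plain field
    volatile boolean ready = false;  // volatile flag

    void writer() {                  // run on thread A
        data = 42;                   // 1. must not be reordered below the flag write
        ready = true;                // 2. volatile write: its barrier publishes data too
    }

    void reader() {                  // run on thread B
        if (ready) {                 // volatile read: reloads the latest values
            System.out.println(data);  // guaranteed to print 42, never 0
        }
    }
}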

2.4 volatile in the singleton pattern

The DCL (double-checked locking) version of the singleton pattern is written as follows:

public class SingletonDemo {
    private static SingletonDemo instance = null;

    private SingletonDemo() {
    }

    public static SingletonDemo getInstance() {
        if (instance == null) {                      // first check, without locking
            synchronized (SingletonDemo.class) {
                if (instance == null) {              // second check, under the lock
                    instance = new SingletonDemo();
                }
            }
        }
        return instance;
    }
}

The code above might run a million times and fail only once. The cause is instruction reordering by the compiler and processor. When an object is created with new, there are roughly four steps:

1. Check whether the class object has been loaded; if not, load it first;

2. Allocate memory space for the instance;

3. Call the constructor to initialize the instance;

4. Assign the memory address to the reference

Because steps 3 and 4 have no data dependence between them from the processor's point of view, the instructions may be reordered so that the reference is assigned before the constructor runs. For example, with two threads A and B: thread A performs the new operation, but because of reordering the address is assigned to the reference first, and at that exact moment thread A is suspended. Thread B then gets the CPU, finds that instance is not null, and directly returns a reference to an object whose constructor has not yet run, so thread B ends up using an uninitialized object.


Therefore, volatile must be added to instance to forbid this instruction reordering.
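With that fix, the DCL singleton looks as follows; only the declaration of instance changes:

public class SingletonDemo {
    // volatile forbids reordering the constructor call and the reference
    // assignment, so the reference is never published before initialization
    private static volatile SingletonDemo instance = null;

    private SingletonDemo() {
    }

    public static SingletonDemo getInstance() {
        if (instance == null) {
            synchronized (SingletonDemo.class) {
                if (instance == null) {
                    instance = new SingletonDemo();
                }
            }
        }
        return instance;
    }
}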


Origin blog.csdn.net/weixin_44706647/article/details/114945378