Concurrent Programming Topic 02: Implementation Principles of Concurrent Programming

Preface

In this section we take an in-depth look at how multithreading is implemented in Java. This topic is divided into several sub-sections.

Highlights of this section:

  1. Java Memory Model (Java Memory Model, JMM)

  2. How JMM solves the problems of atomicity, visibility, and order

  3. synchronized and volatile

Java memory model

The memory model is a specification that defines how read and write operations on shared memory behave in a multithreaded program. It shields the memory-access differences between hardware platforms and operating systems, so that Java programs achieve consistent memory-access behavior on every platform. The main goal of the Java memory model is to define the access rules for each variable in a program, that is, how the virtual machine stores variables into main memory and reads them back out. The variables here are shared variables: instance fields, static fields, array elements, and anything else stored on the heap. Local variables are thread-private and are never shared. These rules regulate reads and writes to memory and ensure that instructions execute correctly.

The memory model is related to the processor, the cache, concurrency, and the compiler. It solves the memory-access problems caused by multi-level CPU caches, processor optimizations, and instruction reordering, and it guarantees visibility, atomicity, and ordering in concurrent scenarios. It does so mainly in two ways: by limiting processor optimizations and by using memory barriers.

The Java memory model defines the interaction between threads and memory. The JMM's abstract model is divided into main memory and working memory. Main memory is shared by all threads; working memory is private to each thread. All of a thread's operations on variables (reads, assignments) must happen in its working memory; a thread cannot read or write variables in main memory directly, and different threads cannot access each other's working memory. Passing a variable's value from one thread to another must go through main memory. The interaction between the three looks like this:

(Figure: interaction between thread working memory and main memory)

In short, the JMM is a specification. Its purpose is to deal with the problems that arise when multiple threads communicate through shared memory: local working memory becoming inconsistent, the compiler reordering instructions, and the processor executing code out of order. In other words, it guarantees atomicity, visibility, and ordering in concurrent programs.

Recommended reading: What is the Java memory model?

Extended reading: In-depth understanding of JVM memory model (JMM) and garbage collector (GC)

How JMM solves the problems of atomicity, visibility, and order

In fact, thread-safety issues boil down to three problems: visibility, atomicity, and ordering. Once we understand these problems and know how to solve them, multithreaded safety is no longer a mystery.

The HotSpot source of Java 1.8 gives a clue: the atomicity, visibility, and ordering problems of shared variables between the CPU cache and main memory are solved in a similar fashion. In the Java world, it is the JMM (Java Memory Model) that takes care of these multithreading-safety issues.

Java provides a series of constructs for concurrency, such as the keywords volatile, synchronized, and final, and the java.util.concurrent (j.u.c) package. These are what the Java memory model exposes to developers after encapsulating the underlying implementation. When writing multithreaded code we can simply use keywords such as synchronized to control concurrency, without having to care about compiler optimizations or cache-coherence issues underneath. So besides defining a set of rules, the Java memory model also wraps the underlying instructions and offers them to developers.

  • Atomicity guarantee

Java provides two bytecode instructions, monitorenter and monitorexit. The synchronized keyword in Java maps to this pair of instructions, which ensures that the operations inside a synchronized block are atomic.
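A minimal sketch (class and field names are illustrative): compile it and run javap -c SyncDemo, and the synchronized block shows up bracketed by a monitorenter and two monitorexit instructions (the second monitorexit covers the exception path).

public class SyncDemo {

    private final Object lock = new Object();
    private int count;

    public void incr() {
        synchronized (lock) { // compiles to monitorenter ... monitorexit
            count++;
        }
    }
}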

  • Visibility guarantee

The volatile keyword in Java provides the following guarantee: a volatile variable is flushed to main memory immediately after it is modified, and it is re-read from main memory before every use. volatile can therefore be used to guarantee the visibility of a variable across threads.

Besides volatile, the keywords synchronized and final can also provide visibility.
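A classic illustration (a sketch with illustrative names, not from the original article): with a plain boolean the reader thread may spin forever because it never re-reads flag from main memory; declaring the field volatile makes the loop terminate. Whether the plain version actually hangs depends on the JIT, but under -server it commonly does.

public class VisibilityDemo {

    private static volatile boolean flag = false; // remove volatile to try to reproduce the hang

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!flag) {
                // busy-wait: with a non-volatile flag the JIT may hoist the
                // read out of the loop, and this loop then never exits
            }
            System.out.println("flag change observed");
        });
        reader.start();
        Thread.sleep(1000);
        flag = true; // volatile write: flushed to main memory immediately
    }
}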

  • Ordering guarantee

In Java, both synchronized and volatile can guarantee the ordering of operations between threads, but they achieve it differently:

The volatile keyword forbids instruction reordering (see the sketch after this list).
The synchronized keyword guarantees that only one thread may perform the protected operation at a time.
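A minimal sketch (illustrative names) of the ordering a volatile write provides: the plain write to data cannot be reordered past the volatile write to ready, so a reader that observes ready == true is guaranteed to also observe data == 42.

public class OrderingDemo {

    private int data = 0;
    private volatile boolean ready = false;

    public void writer() {
        data = 42;    // plain write
        ready = true; // volatile write: the write to data cannot move below this line
    }

    public void reader() {
        if (ready) {                  // volatile read
            System.out.println(data); // if ready is true, this prints 42, never 0
        }
    }
}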

How volatile guarantees visibility

Download the hsdis tool: https://sourceforge.net/projects/fcml/files/fcml-1.1.1/hsdis-1.1.1-win32-amd64.zip/download

After unzipping, place it in the server directory under your jre (for this Windows build, typically jre\bin\server).

Then run the main function. Before running it, add the following VM options:

-server -Xcomp -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:CompileCommand=compileonly,*App.getInstance (replace App.getInstance with the code you actually run)
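The compileonly filter above assumes a class named App with a getInstance() method. A minimal double-checked-locking (DCL) singleton one might use for this experiment is sketched below; the volatile write to instance is what produces the lock-prefixed instruction discussed next.

public class App {

    private static volatile App instance;

    private App() {
    }

    public static App getInstance() {
        if (instance == null) {             // first check, without the lock
            synchronized (App.class) {
                if (instance == null) {     // second check, under the lock
                    instance = new App();   // volatile write: emits a lock-prefixed instruction
                }
            }
        }
        return instance;
    }

    public static void main(String[] args) {
        App.getInstance();
    }
}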

A write to a shared variable modified by volatile is compiled with an extra assembly instruction carrying the lock prefix. We mentioned this instruction when discussing the CPU cache: it triggers a bus lock or a cache lock, and the cache-coherence protocol then solves the visibility problem.

Concretely, for a write to a volatile variable the JVM emits a lock-prefixed instruction to the processor, which writes the cache line containing the variable back to system memory; the MESI cache-coherence protocol mentioned earlier then keeps that data consistent across the caches of multiple CPUs.

How volatile prevents instruction reordering

The purpose of instruction reordering is to maximize CPU utilization and performance. In the single-core era the CPU's out-of-order optimizations never affected correctness, but in the multi-core era threads can run truly in parallel on different cores, and once data is shared between threads, unexpected problems can appear.

The rule instruction reordering must follow is that it may not change the final result of the code's execution: the compiler and the processor never reorder two operations that have a data dependency. Note that this applies only to instructions executed within a single processor and to operations within a single thread.

This is the as-if-serial semantics: no matter how instructions are reordered, the execution result of a single-threaded program must not change, and both the compiler and the processor must comply.

The impact of instruction rearrangement under multi-core and multi-threading

public class ThreadTest {

    private static int x = 0, y = 0;
    private static int a = 0, b = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> {
            a = 1; // no data dependency between these two lines,
            x = b; // so the processor may reorder them
        });
        Thread t2 = new Thread(() -> {
            b = 1;
            y = a;
        });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("x=" + x + "->y=" + y);
    }
}

Ignoring compiler reordering and cache visibility, the possible results of the code above are x=0,y=1; x=1,y=0; and x=1,y=1, because t1 may run before t2, t2 before t1, or the two may interleave. But the code can also print x=0,y=0. That is a result of out-of-order execution: the two lines inside t1 have no data dependency, so x=b may be moved before a=1; likewise, y=a in t2 may execute before a=1 in t1. One possible execution order is:

t1: x = b
t2: b = 1
t2: y = a
t1: a = 1

So the example above shows that reordering causes visibility problems. But reordering can be far more serious than a stale read, because not every instruction is a simple read or write; the partially constructed object that double-checked locking (DCL) can expose is the classic case. Solving visibility alone is therefore not enough: processor reordering has to be addressed as well.

Memory barrier

Memory barriers have to solve the two problems mentioned earlier: reordering from compiler optimization and out-of-order execution by the CPU. Two mechanisms address them: optimization barriers (for the compiler) and memory barriers (for the CPU).

Understanding memory barriers from the CPU level

The root of the CPU's out-of-order effects is that on a multi-CPU machine each CPU has its own cache. When a piece of data is accessed by a CPU for the first time it is not yet in that CPU's cache, so it is fetched from memory; once loaded into the cache it can be accessed quickly. When a CPU performs a write, it must make sure that the other CPUs have removed the data from their caches so it can safely modify it. Clearly, with multiple caches, a cache-coherence protocol is required to avoid inconsistent data, and this communication process can itself cause memory accesses to appear out of order at run time.

Modern CPU architectures provide memory-barrier instructions. x86 CPUs implement the corresponding barriers: the store (write) barrier, the load (read) barrier, and the full barrier. Their main functions are:

  • Prevent reordering between instructions

  • Ensure data visibility

  1. Store barrier

The store barrier, also called a write barrier, is equivalent to a StoreStore barrier. It forces everything before the barrier to execute before the barrier and sends a cache-invalidate signal; every store instruction after the barrier may execute only after the instructions before the barrier have completed. In other words, reordering stores across the write barrier is forbidden, and all memory updates that happened before the store barrier are visible (visible here meaning that the modified value and the result of the operation can be observed).

(Figure: store barrier)

  2. Load barrier

The load barrier, also called a read barrier, is equivalent to a LoadLoad barrier. It forces every load instruction after the barrier to execute only after the barrier; that is, reordering loads across the read barrier is forbidden. Combined with the store barrier, it makes all memory updates that happened before the store barrier visible to the load operations that follow the load barrier.

(Figure: load barrier)

  3. Full barrier

The full barrier, equivalent to a StoreLoad barrier, is an all-round barrier because it has the effects of the previous two. All store/load instructions before the barrier are forced to complete before it, and all store/load instructions after it may execute only after it; reordering any instruction across a StoreLoad barrier is forbidden.

(Figure: full barrier)

Summary: memory barriers solve only the ordering (sequential-consistency) problem, not the cache-consistency problem. Cache consistency is handled by the CPU's cache lock and the MESI protocol; the cache-coherence protocol in turn cares only about coherence, not ordering. They are two separate problems.

How to solve the problem of instruction reordering at the compiler level

At the compiler level, the volatile keyword disables compile-time caching and reordering: instructions that appear before the optimization barrier are guaranteed not to be moved after it during compilation, so compile-time optimization cannot change the actual logical order of the code.

If the hardware architecture itself already guarantees memory visibility, or if it does not reorder instructions and thus offers stronger ordering guarantees than the JMM requires, then volatile acts as an empty tag and no memory barrier with the corresponding semantics is inserted.

In the JMM, memory-barrier instructions are divided into four categories. By inserting different barriers under different semantics, specific kinds of processor reordering are forbidden and memory visibility is guaranteed:

LoadLoad barriers (load1; LoadLoad; load2): guarantee that load1 loads its data before load2 and all subsequent load instructions.

StoreStore barriers (store1; StoreStore; store2): guarantee that store1's data is visible to other processors before store2 and all subsequent store instructions execute.

LoadStore barriers (load1; LoadStore; store2): guarantee that load1 loads its data before store2 and all subsequent store instructions are flushed to memory.

StoreLoad barriers (store1; StoreLoad; load2): guarantee that store1's data becomes visible to other processors before load2 and all subsequent load instructions execute. This memory-barrier instruction is the all-purpose barrier mentioned above in the CPU-level discussion; it has the effects of the other three barriers.
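How these map onto volatile accesses: the JSR-133 cookbook's conservative strategy, sketched below with comments marking where the JVM would conceptually insert barriers, puts a StoreStore barrier before a volatile write and a StoreLoad barrier after it, and LoadLoad plus LoadStore barriers after a volatile read.

public class BarrierSketch {

    int a;              // plain field
    volatile boolean v; // volatile field

    void write() {
        a = 1;
        // StoreStore barrier: the write to a is visible before the volatile write
        v = true;
        // StoreLoad barrier: the volatile write is visible before any later load
    }

    void read() {
        boolean r = v;
        // LoadLoad + LoadStore barriers: the volatile read completes
        // before any later load or store
        int b = a; // if r is true, this is guaranteed to see a == 1
    }
}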

Why volatile cannot guarantee atomicity

public class Demo {

    volatile int i;

    public void incr() {
        i++; // read-modify-write: three separate bytecode steps, not atomic
    }

    public static void main(String[] args) {
        new Demo().incr();
    }
}

After compiling, use javap -c Demo.class to view the bytecode:

(Figure: javap -c output; i++ compiles to getfield, iadd, putfield)
What looks like an atomic increment actually takes three steps (a demonstration follows the list):

  1. Read the value of volatile variable to local;
  2. Increase the value of the variable;
  3. Write the local value back to make it visible to other threads.
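Because another thread can interleave between steps 1 and 3, increments can be lost even though the field is volatile. A runnable sketch (illustrative names; java.util.concurrent.atomic.AtomicInteger is one standard fix):

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicityDemo {

    private static volatile int counter = 0;
    private static final AtomicInteger atomic = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int j = 0; j < 10_000; j++) {
                counter++;                // read-modify-write: updates can be lost
                atomic.incrementAndGet(); // atomic CAS: never loses an update
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("volatile counter = " + counter);      // usually less than 20000
        System.out.println("atomic counter   = " + atomic.get()); // always 20000
    }
}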

Write at the end

Demonstration code address of this section:

https://github.com/harrypottry/ThreadDemo

For more architecture knowledge, follow this series of articles: The growth path of Java architects


Origin blog.csdn.net/qq_34361283/article/details/109542894