[Java from the beginning to the end] No6. Multi-threading and high concurrency review

Basic review→Multi-threading and high concurrency
——————————————————————————————————————
The main content comes from three sources:
One: Tencent Classroom teacher Ma Bingbing's open class
Two: "Coding to Efficiency: Java Development Manual"
Three: my own summary of experience

1. What is a thread?

After you install QQ on your computer, its installation files are just a pile of static entities: static data and script files. Together they make up a program, which is nothing more than an ordered collection of instructions; by itself it does not run anything. There is a startup program, an app → QQ.exe.
When you click to start this program, QQ runs → a process is started. A process is a dynamic entity with its own life cycle: it comes into being through creation, runs when scheduled, is placed into a waiting state while waiting for resources or events, and is destroyed when its task completes. It reflects the entire dynamic course of a program running on a certain data set.
Processes and threads are the basic units that the operating system schedules for program execution; the system uses these basic units to achieve concurrency among applications. A thread is the smallest execution unit within a process, and a thread is also called a lightweight process.
A program must be loaded into memory before it can interact with the CPU. Since the CPU is much faster than memory, the CPU can allocate multiple time slices per unit of time to execute different programs. Although only one task runs in any given time slice, the computer gives us the illusion that multiple programs run in parallel; this is the result of the CPU rapidly switching among them. It is how the CPU distributes execution power across multiple processes to keep utilization high.
A process is to its threads what the CPU is to processes: the process distributes execution and processing resources across multiple threads to keep the program running efficiently.
I find it easiest to remember like this: once a program is started there is at least one process, and once a process is started it contains at least one thread. The point of this one-to-many arrangement is that the "many" side can be scaled as needed, so the CPU can execute in a relatively full state and efficiency is maximized.
————————————————————————————————
Attached is a more detailed explanation: the differences and connections among programs, processes, and threads

2. How to create and start threads

There are many ways to start a thread:

  1. Inherit the Thread class and override the run method. To start this thread, just new it and call start(), as in the sketch below.
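A minimal sketch of this approach (class name and printed text are my own illustration):

public class MyThread extends Thread {
    @Override
    public void run() {
        System.out.println("running in " + Thread.currentThread().getName());
    }

    public static void main(String[] args) {
        new MyThread().start();  // start() schedules the thread; calling run() directly would not
    }
}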

2. Implement the Runnable interface and, again, override the run method.
To start this thread, pass the Runnable as a parameter into the Thread constructor and then call the start method. Compared with the first approach, this one is recommended: it is easier to extend, more flexible, and exposes fewer details to the outside world, so you can focus on implementing the run() method. (The first approach often does not comply with the Liskov substitution principle.)
For this interface-based creation style there is a convenient shorthand since JDK 1.8: the lambda expression (Runnable is a functional interface).
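A sketch of both forms, assuming a simple print task:

public class RunnableDemo {
    public static void main(String[] args) {
        // classic form: a Runnable implementation passed into Thread
        Runnable task = new Runnable() {
            @Override
            public void run() {
                System.out.println("runnable task");
            }
        };
        new Thread(task).start();

        // JDK 1.8+ shorthand: Runnable is a functional interface, so a lambda works
        new Thread(() -> System.out.println("lambda task")).start();
    }
}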

  3. Create a thread using Callable and Future: implement the Callable interface, override its call() method, and build the thread with FutureTask, an implementation class of the Future interface (it implements both Future and Runnable, so it can serve as the target of the Thread class).

FutureTask<Integer> future = new FutureTask<Integer>(
    (Callable<Integer>) () -> {
        return 5;
    }
);
new Thread(future, "return value = 5").start();
// future.get() blocks until call() finishes, then returns 5

There are two differences between Callable and Runnable:
1. The call() method has a return value. Whether you inherit from Thread or implement the Runnable interface, one drawback is that after the task thread finishes you cannot obtain its result directly; you have to go through shared variables. Callable and Future solve this problem nicely.
2. The call() method can throw checked exceptions. With Runnable, the main thread can only catch a child thread's exceptions via setDefaultUncaughtExceptionHandler().
————————————————————————————————————
The last thing I want to emphasize: some people also count obtaining a thread from a thread pool as a way of creating a thread, and there is nothing wrong with that.

3. Basic methods of threads

  • sleep(): the current thread pauses for a period of time (the parameter) to let other threads run.
  • yield(): the current thread gives up the CPU and returns to the ready queue, so that another waiting thread may obtain execution rights. Of course, it is also possible that the thread that just called yield() immediately regains the CPU.
  • join(): joins another thread into the execution of the current thread. For example, if t1.join() is called inside thread t2, then t2 stops at that call and t1 runs; when t1 finishes, execution comes back and t2 continues. This is equivalent to splicing t1 into t2. join() is often used to wait for another thread to finish.
  • getState(): gets the current state of the thread. (See the sketch below.)
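A small sketch combining these methods (thread name and sleep duration are my own choice):

public class BasicMethodsDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> {
            try {
                Thread.sleep(500);               // pause t1 for 500 ms
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            System.out.println("t1 done");
        }, "t1");

        t1.start();
        t1.join();                               // main waits here until t1 finishes
        Thread.yield();                          // hint to the scheduler: give up the CPU
        System.out.println("t1 state = " + t1.getState());  // TERMINATED
    }
}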

4. Thread life cycle

A thread has its own stack, program counter, local variable table, and other resources, and shares all of the process's resources with the other threads of the same process. A thread passes through several states during its life cycle. As shown in the figure, there are five states: NEW (newly created), RUNNABLE (ready), RUNNING (running), BLOCKED (blocked), and DEAD (terminated).
1. NEW, the newly created state: the thread has been created but not yet started.
2. RUNNABLE, the ready state: the state after start() has been called and before the thread actually runs; the thread waits in the ready queue. Note that start() cannot be called more than once on the same thread, otherwise an IllegalThreadStateException is thrown.
3. RUNNING, the running state: the state while run() is executing. A thread may leave RUNNING for various reasons: time slice expiry, exceptions, locks, scheduling, and so on.
4. BLOCKED, the blocked state. A thread enters this state in the following situations:

  • Synchronous blocking: the lock is held by another thread.
  • Active blocking: calling certain Thread methods that actively give up the CPU, such as sleep() or join().
  • Waiting blocking: wait() was executed.

5. DEAD, the terminated state: the state after run() has finished or the thread has exited due to an exception. This state is irreversible.
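Note that the five-state model above is the classical operating-system description; the Thread.State enum actually returned by getState() has six values (NEW, RUNNABLE, BLOCKED, WAITING, TIMED_WAITING, TERMINATED) and no separate RUNNING or DEAD. A small sketch:

public class StateDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(() -> {});
        System.out.println(t.getState());  // NEW
        t.start();
        System.out.println(t.getState());  // usually RUNNABLE
        t.join();
        System.out.println(t.getState());  // TERMINATED
    }
}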

————————————————————————————————

5. synchronized

① synchronized basics
The synchronized keyword is used when multiple threads access the same resource and the resource needs to be locked. The locked block of code is called the critical section (also the synchronization region). One point worth emphasizing: the target of a synchronized lock is an object, not a block of code; a thread must first obtain the object's lock before the code inside synchronized can execute. For example, to execute the block wrapped in synchronized below, a thread must first obtain the lock of the object o. (The threads must lock the same object, otherwise there is no mutual exclusion.)
So, what is an object lock? The synchronized keyword is a JVM specification, not a concrete sequence of implementation steps: different JVMs may implement it differently as long as they honor synchronized's semantics, which say only that the keyword synchronizes code. In HotSpot, for example, the implementation reserves a few fixed bits in the object header (64 bits) to identify the object's current lock state; one pattern means locked by a certain kind of lock and another means unlocked. (The exact encodings are covered in the advanced part below.) In practice one rarely creates a dedicated lock object; a more convenient way is to lock the current object directly.
A synchronized instance method means the same thing as synchronized(this): it locks the current object.
When you lock a static method, what gets locked is the class object of that class; every instance of the class must acquire that lock to call the static method.
One more note: the lock object should not be a String constant or a boxed primitive such as Integer or Long. String constants may be shared with other class libraries, and when the value of an Integer or similar changes, the underlying object changes too; using them as locks is not recommended.
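A sketch of the three forms discussed above (the count field is my own illustration):

public class SyncForms {
    private final Object o = new Object();
    private int count = 0;

    public void m1() {
        synchronized (o) {                      // lock a dedicated object
            count++;
        }
    }

    public synchronized void m2() {             // equivalent to synchronized (this)
        count++;
    }

    public static synchronized void m3() {      // equivalent to synchronized (SyncForms.class)
        System.out.println("class-level lock");
    }
}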

Regarding whether synchronized locks objects or code, and the difference between [synchronized + instance method] and [synchronized + static method], see the blog below for detailed worked examples, which I won't repeat here: Does synchronized lock code or objects
——————————————————————————————
Another important property of a synchronized lock is that it is reentrant; it is also called a reentrant lock. Within the same thread there may be several locked methods that all lock the same object; when one of them calls another, execution is allowed to proceed. In the example below, the m1 method calls the m2 method and both lock this. If the lock were not reentrant, m2 would find that m1 already holds the lock and wait for it to be released, while m1 waits for m2 to return: a deadlock. Instead, when m2 is entered, the JVM sees that the current thread already holds the lock on this object and allows it to continue. That is a reentrant lock. Remember the premise of this experiment: the locked methods call each other on the same thread and lock the same object.
The same holds across a parent class and a child class (with every lock taken on the subclass's this object), as in the second experiment.
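A minimal sketch of reentrancy (method bodies are my own illustration):

public class ReentrantDemo {
    public synchronized void m1() {    // locks this
        System.out.println("m1");
        m2();                          // same thread re-acquires the same lock: allowed
    }

    public synchronized void m2() {    // also locks this
        System.out.println("m2");
    }

    public static void main(String[] args) {
        new ReentrantDemo().m1();      // prints m1 then m2, no deadlock
    }
}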
The guiding principle for applying synchronized is to keep the lock scope as small and the locking time as short as possible: if locking an object suffices, don't lock the class; if locking a code block suffices, don't lock the whole method.
One more point: if an exception occurs in the critical-section code, the lock is released and may be taken by another thread waiting for it, which can leave the final data inconsistent.

② synchronized advanced
Let's begin formally. First, why is there such a thing as CAS, what is the relationship between CAS and synchronized, and why must you understand CAS before studying synchronized?
First, what is CAS? It is the English abbreviation of compare and swap. (If you have ever installed a Linux virtual machine yourself, you will remember the swap area created while partitioning the disk; same word.)
What is CAS used for? It completes concurrent operations without locking (the lock here meaning an operating-system, kernel-level, heavyweight lock), because such locks are expensive. CAS was born to make operations on mutually exclusive resources more lightweight. When contention is low, or the concurrent operations finish very quickly, CAS is more efficient than a heavyweight synchronized lock.
——————————————————————————————
CAS is also described as lock-free, or as a spin lock. Its working principle is simple: suppose a shared variable currently holds the value A. The first thread, S1, reads A into its own working space, computes a result D, and then needs to write D back. Before writing back, S1 reads the shared variable again and checks whether its latest value is still A. If it is, no one else touched the variable while S1 was computing, so S1 directly overwrites it with D. That is compare-and-swap.
If a thread S2 changes the value from A to B while S1 is computing, then S1 detects the change at write-back time and starts over: it reads B, computes again, and finally writes back.
There is an obvious problem here: the ABA problem. While S1 is working, the shared variable passes through two other threads: S2 changes it from A to B, and S3 changes it from B back to A. When S1 comes back, it finds the value is still A and overwrites it directly. Depending on your business, the intermediate values may matter and this can cause bugs. How to solve it if they do? The simplest way is to add a version number, which is essentially optimistic locking.
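In the JDK, java.util.concurrent.atomic.AtomicStampedReference pairs the value with exactly such a version stamp; a minimal sketch:

import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaDemo {
    public static void main(String[] args) {
        // value 100 with version stamp 0
        AtomicStampedReference<Integer> ref = new AtomicStampedReference<>(100, 0);

        int stamp = ref.getStamp();
        // an A -> B -> A sequence elsewhere would have bumped the stamp,
        // so this CAS succeeds only if neither value nor version changed:
        boolean ok = ref.compareAndSet(100, 101, stamp, stamp + 1);
        System.out.println(ok + ", value=" + ref.getReference() + ", stamp=" + ref.getStamp());
    }
}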
That gives us a first, rough understanding of CAS. Here is an example of CAS already in use inside Java:

// The java.util.concurrent (JUC) package contains classes that already use CAS,
// for example AtomicInteger's incrementAndGet() method (as implemented in older JDKs):
public final int incrementAndGet() {
    for (;;) {
        int current = get();
        int next = current + 1;
        if (compareAndSet(current, next))
            return next;
    }
}

// compareAndSet(current, next) ultimately calls the Unsafe class's
// compareAndSwapInt method -- does the name look familiar?
public final boolean compareAndSet(int expect, int update) {
    return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
}

// Now look at the Unsafe class itself:
// the native keyword means this is a native function, implemented in C/C++
// and compiled into a DLL (or .so on Linux) that Java calls into.
// To see how compareAndSwapInt is implemented you must read the JVM's source code.
public final native boolean compareAndSwapInt(Object var1, long var2, int var4, int var5);

Now let's look at the relationship between CAS and the synchronized keyword. After continuous optimization across JDK versions, synchronized is no longer the heavyweight lock it used to be. It now has multiple lock implementations, including the concept of lock upgrading:
biased lock → lightweight lock → heavyweight lock
——————————————————————————————
Before talking about the biased lock, you need one more small piece of knowledge: the memory layout of an object, and the details of the MarkWord.
Look at the picture: we have a small object of class T. After new, memory is allocated for it on the heap, divided into four parts (you can use the JOL tool provided by OpenJDK to view an object's memory layout; see the sketch below):
The first part is the MarkWord, header bits used for marks and flags.
The second part is the Klass pointer, which points to T.class; by default it is four bytes (with compressed class pointers).
The first and second parts together are called the object header.
The third part holds the instance fields, such as m, an int occupying four bytes.
The fourth part is padding, used to fill out alignment: a 64-bit VM requires the object's size in heap memory to be divisible by 8, so if the first three parts don't add up to a multiple of 8, padding fills the gap.
Next, let's use JOL to look at the memory layout of a T instance, then add a synchronized lock on the object and see what changes in the layout. The first conclusion we get: what we call "locking an object" is really marking its header — the MarkWord area stores the lock information (the lock state). That is the MarkWord's first job. What else lives in the MarkWord besides the lock-state information? First, GC marking information: the object's generational age used by the garbage collector. Second, the hashCode value, once it has been called for.
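A sketch of the JOL experiment (assuming the org.openjdk.jol:jol-core dependency is on the classpath):

import org.openjdk.jol.info.ClassLayout;

public class JolDemo {
    static class T {
        int m = 8;
    }

    public static void main(String[] args) {
        T t = new T();
        // prints MarkWord, Klass pointer, field m, and padding
        System.out.println(ClassLayout.parseInstance(t).toPrintable());

        synchronized (t) {
            // the MarkWord's low bits now reflect the lock state
            System.out.println(ClassLayout.parseInstance(t).toPrintable());
        }
    }
}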

Specifically, how is the lock information marked? In the MarkWord, which is 8 bytes (64 bits) long, the lowest two or three bits mark the lock state: the unlocked state is 001, the biased lock is 101, the lightweight lock is 00, and the heavyweight lock is 10.
Coming back to the upgrade picture: it shows the path from the lock-free state to the spin lock (why the biased lock is skipped there will be explained later).
To summarize here: what is the MarkWord? It is the header information of the object in its heap memory space, and it mainly stores the three kinds of information above. What we usually call locking is really modifying the MarkWord bits that encode the lock-state information.
——————————————————————————————————
Having understood the object memory layout and the MarkWord, return to the concept of synchronized lock upgrading:
biased lock → lightweight lock → heavyweight lock
First, briefly: what are biased, lightweight, and heavyweight locks? Broadly, biased and lightweight locks are user-space locks, while a heavyweight lock is an operating-system, kernel-space lock. synchronized was very slow back in JDK 1.2 because there was no lock upgrading: it went straight to a heavyweight lock that acquired kernel resources to control access, which is overkill for most cases. The concepts of biased and lightweight locks were added gradually as optimizations. So what is a biased lock? The book explains it in detail (the ThreadID it mentions is a small field inside the MarkWord).
After biased locks come lightweight locks, which are roughly divided into spin locks and adaptive spin locks; the latter is a refinement of the former for certain scenarios rather than a complete redesign, and the underlying idea is the same, so we will explain only the spin lock. When the first thread A accesses the shared resource, assume the ThreadID field defaults to 0000; the lock biases toward A by writing A's ID (say 1111) into it. When a second and a third thread, B and C, arrive to compete for the resource, the lock evolves from a biased lock to a spin lock. The bias toward A must first be undone — this is the concept of lock revocation, which resets the ThreadID to its default. Then A, B, and C all use CAS to try to write their own ID into the ThreadID field; whoever succeeds gets to execute. The two threads that lost start something like an infinite loop: the CAS expects ThreadID == 0000, but if A won, the field holds 1111, so B and C keep reading a non-default value, their CAS keeps failing, and they keep looping. They spin until the winner releases the resource and resets the ThreadID to 0000, at which point one of them succeeds and gains execution rights. During all of this the waiting threads never hang or block, no execution context needs to be saved, thread-scheduling costs drop, and no heavyweight lock is requested from the kernel, yet multi-threaded operation is accomplished. That is a spin lock: the waiting threads spin in place beside the resource. Some people also call this kind of lock "lock-free", but that name is not recommended.
Spin locks have an obvious downside too: although they avoid kernel-level locks, they burn CPU. If you have 10,000 concurrent threads with one executing and the remaining 9,999 spinning, that is unbearable; if the holder runs for ten minutes, the CPU is pegged the whole time. So spin locks suit scenarios with low contention where operations on the shared resource finish quickly (the lock is released quickly). A heavyweight lock, by contrast, does not consume CPU while waiting: its threads sit suspended in a wait queue.
————————————————————————————————
Now, how does a biased lock upgrade to a spin lock, and how does that upgrade to a heavyweight lock? Biased → spin is simple: the moment a second thread arrives to use the current shared resource, the lock upgrades. As the book says, the biased lock exists to eliminate unnecessary locking overhead in the uncontended case; it is not a mutex, and once real competition appears, biased-lock management is abandoned.
So how does a lightweight lock upgrade to a heavyweight lock?
One old rule, in HotSpot before JDK 1.6, looked at the relationship between the spinning threads and the number of CPU cores: for example, upgrade when the spinning threads exceed half the core count, or when a thread spins more than 10 times (tunable via -XX:PreBlockSpin). These are outdated heuristics, but the new behavior is built on them; you have to understand the past before you can read the present. The current implementation is the adaptive spinning mentioned above: how many times to spin, and at how many threads to inflate to a heavyweight lock, is computed and scheduled by the JVM itself, which upgrades when appropriate. For example, if the previous thread succeeded within 10 spins, the JVM assumes the next spin will likely succeed too and may relax the spin count.
In short: the decision is based on the spin time observed on the same lock and the state of the lock's owner.
If competition intensifies (a thread spins past the threshold, or the spinning threads exceed half the CPU cores; after 1.6, Adaptive Self Spinning lets the JVM control the upgrade to a heavyweight lock on its own), the lock upgrades: resources are requested from the operating system (a Linux mutex), the CPU makes a system call crossing from ring 3 to ring 0, the thread is suspended and enters a wait queue, waits for the operating system's scheduling, and is then mapped back to user space.
————————————————————————————————
Now come back and look at one line of the diagram: the general lock-upgrade path we described above.
There is one situation where a biased lock upgrades directly to a heavyweight lock: calling the hashCode() or wait() methods does it immediately (an identity hash code has nowhere to live once the MarkWord holds a thread ID, and wait() requires a full monitor).
Next, why are there two possible states after a new object comes out? First, biased locking is enabled by default; you can disable it with a startup parameter. Since it is on by default, careful readers will have noticed that the earlier picture goes straight from the unlocked "new" state to the spin-lock state. Why? Because biased locking, by default, has a delayed start.
If the object is accessed concurrently before biased locking's delayed start kicks in, it goes directly to a spin lock.
HotSpot implements this delay as roughly 4 seconds. To verify it, sleep the current thread for five seconds before creating the object, then look at the lock-state information: the MarkWord shows 101, the biased state.
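A sketch of that experiment (again assuming jol-core; the ~4-second figure is HotSpot's default BiasedLockingStartupDelay):

import org.openjdk.jol.info.ClassLayout;

public class BiasedDelayDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread.sleep(5000);                 // wait out the ~4s biased-locking startup delay
        Object o = new Object();
        // the MarkWord's low bits should now show 101: anonymously biased
        System.out.println(ClassLayout.parseInstance(o).toPrintable());
    }
}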
So why does this delayed-start concept exist (our earlier experiment went straight from the lock-free state to the spin-lock state)? The reasons:
Is using a biased lock always more efficient than a spin lock? No. When we know in advance that a resource will be contended by multiple threads, why bias it first? Revoking the bias costs extra steps, so letting the resource enter the spin-lock state directly is much better than going through [biased lock → revoke → spin lock]. That preserves efficiency.
During those first four seconds, the JVM itself starts a number of internal threads containing plenty of synchronized code that is known to be contended at startup. If biased locking were active, those locks would keep going through revocation and upgrade, which is less efficient.

As an interview-question aside: we can also set the delay to 0 seconds by modifying a virtual-machine parameter:

-XX:BiasedLockingStartupDelay=0

If you set the parameter above, then new Object() → 101 biased, with a thread ID of 0 → an anonymously biased object. That is, with biased locking on, a new object is by default an anonymously biased object (101).

Which raises a question: the newly created object carries a biased lock, but biased toward whom? Nothing has accessed it yet. That is the state shown in our picture: the anonymously biased state, meaning biased toward no one. When the first thread accesses it, the object truly becomes biased toward that thread (the ThreadID in the MarkWord is filled in).

You can inspect the biased-locking flags among Java's non-standard VM parameters (for example via -XX:+PrintFlagsFinal): UseBiasedLocking defaults to true, meaning biased locking is used by default, and BiasedLockingStartupDelay is the startup delay in milliseconds; a value of 4000 means a four-second delayed start.
——————————————————————————————————
Finally, a few words about the underlying implementation of synchronized. First, the JVM level: the compiled bytecode implements it with a monitor lock (monitorenter/monitorexit, as shown in the book).
Some readers may wonder why there are two monitorexits for a single monitorenter: one is the exit on the normal path and the other the exit on the exception path, hence two exit instructions in total.
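A sketch of what javap -c typically shows for a method with a synchronized (o) block (offsets and constant-pool indexes are illustrative):

// public void m() { synchronized (o) { System.out.println("in"); } }
 0: aload_0
 1: getfield      #3   // Field o:Ljava/lang/Object;
 4: dup
 5: astore_1
 6: monitorenter         // acquire o's monitor
 7: getstatic     #4    // Field java/lang/System.out:Ljava/io/PrintStream;
10: ldc           #5    // String in
12: invokevirtual #6    // Method java/io/PrintStream.println:(Ljava/lang/String;)V
15: aload_1
16: monitorexit          // exit #1: the normal path
17: goto          25
20: astore_2             // implicit exception handler begins here
21: aload_1
22: monitorexit          // exit #2: the exception path
23: aload_2
24: athrow
25: return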
That was the bytecode, the underlying content of the .class file. Going one level lower, we can use hsdis to observe the assembly that JIT compilation of the bytecode produces for synchronized: lock cmpxchg. The word cmpxchg means compare-and-exchange, which is CAS. What does the locked CAS accomplish here? In essence there is a flag value, and whichever thread modifies it first wins the right to execute the locked code block; so the bottom-level implementation is lock cmpxchg (CAS-modify a value). (There are reportedly more optimized encodings that HotSpot does not bother to use; the lock prefix is supported on virtually every platform.)
At the hardware level: while the locked instruction executes, the processor asserts a lock signal toward the northbridge (rather than locking the entire bus).
——————————————————————————————
Here, synchronized comes to an end.

6. volatile

First of all, volatile has two properties. Let's start from them:
1: visibility between threads
2: prevention of instruction reordering

——————————————————————————————
First: thread visibility.
Visibility is tied to the JMM, Java's memory model (Java Memory Model). Suppose there is a shared variable in main memory. Under the JMM, a thread copies that variable from main memory into its own working memory, operates freely on the copy, and only later writes it back to main memory. By default this process is invisible to other threads, a black box; only by adding the volatile keyword do other threads learn of changes to the shared variable in real time. Consider the small program sketched below: even after the main thread sets flag to false, the child thread cannot stop, because the flag it reads is the copy in its local cache, which is still true.
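A minimal sketch of that experiment (class and field names are my own; with volatile on flag, the worker stops within about a second):

public class VisibilityDemo {
    private static boolean flag = true;   // try adding volatile here

    public static void main(String[] args) throws InterruptedException {
        new Thread(() -> {
            while (flag) {
                // busy loop: without volatile, this thread may never
                // see the main thread's write, and so may never stop
            }
            System.out.println("worker stopped");
        }).start();

        Thread.sleep(1000);
        flag = false;   // written by main; invisible to the worker without volatile
    }
}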
You probably have plenty of questions at this point. First, why does a thread keep a local copy at all? The book explains this (it comes from hardware caches; more on caches at the end of this section). Second, how does the volatile keyword achieve thread visibility? We will return to that in detail below.
——————————————————————————————————
Next: preventing instruction reordering.
Before understanding how reordering is prevented, first understand instruction reordering itself, also called out-of-order execution by the CPU. It looks like out-of-order execution, but it is really an instruction-level optimization; the book describes it. The related terms happens-before and as-if-serial are not explained here; you can read this blog:
Java concurrent programming: happens-before and as-if-serial semantics

To sum up: instruction optimization means the CPU may merge and reorder some instructions on its own to improve efficiency, while guaranteeing that the final single-threaded result stays consistent. That starting point is fine, but in some multi-threaded cases it affects our code.
For an example that confirms the existence of instruction reordering, look at the code:

public class ReorderTest {
    private static int x = 0, y = 0;
    private static int a = 0, b = 0;

    public static void main(String[] args) throws InterruptedException {
        int i = 0;
        for (;;) {
            i++;
            x = 0; y = 0;
            a = 0; b = 0;
            Thread one = new Thread(new Runnable() {
                public void run() {
                    a = 1;// ①
                    x = b;// ②
                }
            });

            Thread two = new Thread(new Runnable() {
                public void run() {
                    b = 1;// ③
                    y = a;// ④
                }
            });
            one.start(); two.start();
            one.join(); two.join();
            String result = "iteration " + i + ": (" + x + "," + y + ")";
            if (x == 0 && y == 0) {
                System.err.println(result);
                break;
            } else {
                //System.out.println(result);
            }
        }
    }
}

We repeatedly create two threads, one and two, start them, and join them in the main thread. Assuming there is no instruction reordering (the local order of ①/② and ③/④ within each thread never changes; ① always runs before ②, and ③ before ④), the possible results are:
1. one finishes first, then two: (x=0, y=1) → ①②③④
2. two finishes first, then one: (x=1, y=0) → ③④①②
3. one and two interleave: (x=1, y=1) → ①③④② or ③①②④, etc.
As long as ① precedes ② and ③ precedes ④ within each thread, the outcome (x=0, y=0) is impossible. But run the program long enough and it does occur.
This simple example proves that instruction reordering exists. In general, only instructions with no data dependence on each other are reordered; a dependent pair such as a=1 followed by x=a will not be reordered.

So what bad consequences can instruction reordering have? Let's illustrate with a very classic Meituan interview question, whose solution is also given in the Alibaba Java Development Manual (I walked my department through it at a daily stand-up not long ago). There are many blogs about it; here I'll explain briefly with code and a picture. First, the question:
In the DCL singleton pattern, does the singleton field need to be modified with volatile?
Let's go step by step. The lazy singleton pattern itself needs no introduction; here is the code.

// Case 1
public class LazySingleton {
    private static T t = null;

    public static T getInstance() {
        if (t == null) {
            t = new T();
        }
        return t;
    }
}

This code is fine single-threaded, but broken under multi-threading: if multiple threads reach if (t == null) at the same time, multiple T objects are created, which violates the design idea of the singleton pattern. How to fix it? The simplest way is the synchronized we just covered. Look at the code.

// Case 2
public class LazySingleton {
    private static T t = null;

    public synchronized static T getInstance() {
        if (t == null) {
            t = new T();
        }
        return t;
    }
}

Is the problem solved? Yes, but inefficiently: the locking is too coarse. If you can lock a code block, don't lock the whole method; the method may contain other code that needs no lock at all, such as logging. So we start shrinking the lock's scope. An impatient reader might write this:

// Case 3
public class LazySingleton {
    private static T t = null;

    public static T getInstance() {
        if (t == null) {
            synchronized (LazySingleton.class) {   // static context: lock the Class object
                t = new T();
            }
        }
        return t;
    }
}

Is this okay? Of course not; it has the same problem as Case 1, except that the lock now serializes the order of the new calls. If multiple threads have already passed if (t == null), multiple T objects are still created and returned in turn. The if (t == null) check has to be wrapped inside the lock, as follows:

// Case 4
public class LazySingleton {
    private static T t = null;

    public static T getInstance() {
        synchronized (LazySingleton.class) {
            if (t == null) {
                t = new T();
            }
        }
        return t;
    }
}

Now correctness is restored and the lock scope is smaller, so it is certainly faster, but still a bit slow. Where? Every thread that wants the singleton must first acquire the lock and queue for it, even long after the object exists, which is very inefficient. The earlier problems were duplicate creation and locking too much code; the remaining problem is that no lock is needed on the read path at all. We only need the lock during the contended creation, to guarantee there is exactly one T instance; afterwards, reads should not lock. Change it like this:

// Case 5
public class LazySingleton {
    private static T t = null;   // spoiler: this field will also need volatile (explained below)

    public static T getInstance() {
        if (t == null) {                           // first check: skip the lock once built
            synchronized (LazySingleton.class) {
                if (t == null) {                   // second check: only one thread creates
                    t = new T();
                }
            }
        }
        return t;
    }
}

Now, once the t object has been built, subsequent threads that fetch the singleton never need to lock or queue, so efficiency is naturally high. Because it performs the if (t == null) check twice and uses a lock to complete the singleton's creation, this creation style is called DCL → double-checked locking.
This seems to solve concurrent creation while guaranteeing read efficiency; very nearly perfect, and the singletons in many frameworks' source code are indeed written this way. But it can only be called 99.99% perfect. Why? Recall the CPU instruction optimization we studied above, and look at this line together:

T t = new T();

When executed, this single line splits into several steps. Let me state up front: it looks like one statement, but the new operation is not atomic.

class T {
    int m;
    public T() {
        m = 8;
    }
}

First, let's see how many instructions this new statement compiles into in bytecode: five in total, three of which relate to the object's creation. The construction of the t object breaks into three steps: ① allocate memory, with fields set to their defaults (m is 0); ② invoke the constructor (m becomes 8); ③ assign the reference to t.
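A sketch of the compiled bytecode (javap -c output; offsets and constant-pool indexes will vary):

// T t = new T(); compiles to roughly:
0: new           #2   // class T      -- ① allocate memory, fields zeroed (m = 0)
3: dup
4: invokespecial #3   // T."<init>"   -- ② run the constructor (m = 8)
7: astore_1            //              -- ③ assign the reference to t
8: return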
Some readers will already see where the problem arises. After CPU instruction optimization, the order of the three steps can become ①③②. When the first thread has completed ① and ③ (new is only half done), t is no longer null. If another thread now evaluates if (t == null), it finds t non-null, takes the object and uses it, and reads the m attribute as 0. That is the problem. The solution is to modify the t field with volatile to prevent the instructions from being reordered, so that the steps ①②③ cannot be rearranged.
(This situation cannot occur without high concurrency; it may take hundreds of thousands or millions of concurrent calls to hit. Alibaba's Java development manual mentions the issue too, but only recommends adding volatile without giving the reason; after all, the problem is hard to reproduce with a sample program.)

So how is instruction reordering prevented?
First, at the JVM level it is a specification: for heap memory modified through a volatile field in Java code, instruction reordering is not allowed. How a specific JVM such as HotSpot implements that is its own affair; the spec only sets the rule.
So how do you guarantee that two instructions cannot swap places? The concept is called a memory barrier, which you can picture vividly as a fence added between two instructions so that neither can move across it.
The JVM defines four (logical) memory barriers; load means read and store means write, so all four are easy to understand. The first, LoadLoad, is a read barrier: with one read statement above it and one below, the two reads may not exchange places. The other three (StoreStore, LoadStore, StoreLoad) read the same way.
So what does forbidding reordering around volatile memory mean concretely? The JVM adds the corresponding read and write barriers before and after volatile reads and writes to achieve the effect of prohibiting instruction reordering. These are all logical concepts; a sketch of the placement follows.
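The placement commonly drawn for this rule (a logical sketch, not literal HotSpot output):

StoreStoreBarrier              LoadLoadBarrier
    volatile write                 volatile read
StoreLoadBarrier               LoadStoreBarrier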
How is it actually implemented? Taking HotSpot as an example, use hsdis again to view the assembly generated for a volatile access: lock addl. lock addl here adds zero to a value in place. What is the point of adding zero? None by itself: it is a lock prefix on an empty operation. So why can it achieve thread visibility and prevent instruction reordering? Here is the answer:

The LOCK prefix gives the executing processor exclusive use of shared memory while the instruction runs on a multiprocessor. Its effect is to flush the current processor's relevant cache contents to memory and to invalidate the corresponding cache lines of other processors. In addition, it acts as a memory barrier that ordered instructions cannot cross.

So the JVM (HotSpot) bottom layer uses the LOCK instruction to achieve both thread visibility and the prevention of instruction reordering.
So how, concretely, does it invalidate the caches of other processors?
Explanation: cache coherence protocols and MESI.
First, a disclaimer: these two have nothing to do with how volatile is implemented, but their semantics, the idea they express, is consistent with volatile's.
Before talking about cache invalidation, first understand the concept of a cache. Because the CPU is much faster than memory, caches sitting between the CPU and memory make reads more efficient and faster. There are usually three levels, L1, L2, and L3 (the registers act as an L0 cache, which we won't cover here); the figure compares their access latencies. When a CPU has multiple cores, L3 is shared while every core has its own copy of L1 and L2.
When we read content from main memory into the cache, how much is read at a time? If memory holds an int variable x, four bytes, do we read only those four bytes? The answer is no. Data is read from memory in blocks, and the memory around x is brought over in one go; reading one such block costs about the same as reading four bytes, with the extra bytes simply riding along. The block that is read is called a cache line (commonly 64 bytes).
For example, take an array in memory: when you access the element at index 0, you will very likely access index 1 next. If a single read can bring in more content anyway, why not load the neighboring content into the cache too, so that the next access is much faster?
Now we understand caches well enough to talk about cache invalidation. Same picture: cpu1 wants to read the data of x and cpu2 wants to read the data of y, and suppose the two variables lie on the same cache line, so each CPU caches a copy of that entire line. When cpu1 modifies the value of x, what gets invalidated and re-synchronized in cpu2's cache is not just x but the entire cache line. The protocol this relies on is called MESI, a cache coherence protocol; notifying other caches that their copies are invalid is implemented at the bottom layer of essentially all such systems.
The MESI cache coherence protocol is an inter-CPU, CPU-level protocol (different CPUs use protocols with different names). MESI stands for the first letters of the four states a cache line can be in: Modified, Exclusive, Shared, Invalid.
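A classic experiment that makes cache lines visible from Java (my own sketch, assuming 64-byte cache lines; timings vary by machine): two threads write two adjacent volatile longs a hundred million times each. With padding pushing the two values onto different cache lines, the loop usually runs noticeably faster, because the cores stop invalidating each other's line (false sharing).

public class CacheLineDemo {
    static class Padded {
        volatile long p1, p2, p3, p4, p5, p6, p7; // padding: remove these to observe false sharing
        volatile long v = 0L;
    }

    static final Padded[] arr = { new Padded(), new Padded() };

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> { for (long i = 0; i < 100_000_000L; i++) arr[0].v = i; });
        Thread t2 = new Thread(() -> { for (long i = 0; i < 100_000_000L; i++) arr[1].v = i; });
        long start = System.nanoTime();
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println((System.nanoTime() - start) / 1_000_000 + " ms");
    }
}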
At this point, the explanation of volatile's thread visibility comes to an end.
————————————————————————————————
I owe the next two for now, and I will come back to write them after I finish the book. #TODO

7. ThreadLocal

8. Thread pool


Origin blog.csdn.net/cjl836735455/article/details/106745984