3. Detailed explanation of synchronized in concurrent programming

The significance of the synchronized synchronizer

In multi-threaded programming, multiple threads may access the same shared, mutable resource — the so-called critical resource. It may be a variable, an object, a file, and so on.

  • Shared: the resource can be accessed by multiple threads at the same time
  • Mutable: the resource can be changed during its lifetime

The resulting problem:
Since thread scheduling is not under the program's control, a synchronization mechanism is needed to coordinate access to mutable shared resources.

How are such problems solved?
Essentially, all concurrency schemes solve thread-safety problems by serializing access to critical resources: at any given moment, only one thread may access the critical resource. This is also known as synchronized (mutually exclusive) access.

Java provides two kinds of mutually exclusive synchronized access: synchronized and Lock.
Both are essentially locking, and the purpose of locking is to serialize access to the critical resource, so that only one thread can access it at a time. Note, however, that local variables inside a method are stored in each thread's private stack; they are not shared, so they cannot cause thread-safety problems.
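A minimal sketch of this distinction (the class and method names here are illustrative, not from the original article): the `count` field is shared and mutable, so its read-modify-write must be serialized with synchronized, while the loop counters live on each thread's private stack and need no locking.

```java
// Hypothetical example: a counter shared by multiple threads.
// Without synchronized, count++ (read, add, write) is not atomic
// and updates can be lost.
public class SyncCounter {
    private int count = 0;

    // The lock is this SyncCounter instance: only one thread at a time
    // may run this method on the same instance.
    public synchronized void increment() {
        count++;
    }

    public synchronized int get() {
        return count;
    }

    /** Runs `threads` threads, each incrementing `perThread` times. */
    public static int countWithThreads(int threads, int perThread) {
        SyncCounter counter = new SyncCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                // j is a local variable on this thread's own stack: not shared
                for (int j = 0; j < perThread; j++) counter.increment();
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(SyncCounter.countWithThreads(4, 10000)); // always 40000 with synchronized
    }
}
```

If `increment()` were not synchronized, the final count would usually be less than 40000 because concurrent increments would overwrite each other.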

Detailed explanation of the synchronized principle

The built-in lock used by synchronized is an object lock (the lock is attached to an object, not to a reference); the granularity of the lock is the object. It is used for mutually exclusive access to shared resources and is reentrant. Forms of locking:

  • A synchronized instance method locks the instance object (this)
  • A synchronized static method locks the Class object
  • A synchronized code block locks the object in the parentheses
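The three forms above can be observed directly with `Thread.holdsLock(obj)`, which reports whether the current thread holds the monitor of a given object. This is an illustrative sketch (the class name `LockTargets` is made up for the example); the last method also demonstrates reentrancy.

```java
// Illustrates which object each synchronized form actually locks.
public class LockTargets {
    private final Object guard = new Object();

    // 1. Synchronized instance method: the lock is `this`.
    public synchronized boolean instanceMethodLocksThis() {
        return Thread.holdsLock(this);
    }

    // 2. Synchronized static method: the lock is the Class object.
    public static synchronized boolean classMethodLocksClass() {
        return Thread.holdsLock(LockTargets.class);
    }

    // 3. Synchronized block: the lock is the object in the parentheses,
    //    not `this` (this method itself is not synchronized).
    public boolean blockLocksGivenObject() {
        synchronized (guard) {
            return Thread.holdsLock(guard) && !Thread.holdsLock(this);
        }
    }

    // Reentrancy: a thread already holding the monitor may acquire it again.
    public synchronized boolean reentrant() {
        return instanceMethodLocksThis(); // re-enters the same monitor
    }

    public static void main(String[] args) {
        LockTargets t = new LockTargets();
        System.out.println(t.instanceMethodLocksThis()
                && LockTargets.classMethodLocksClass()
                && t.blockLocksGivenObject()
                && t.reentrant());
    }
}
```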

The underlying principle of synchronized
synchronized is implemented by the JVM's built-in lock, via an internal Monitor object (monitor lock): methods and code blocks are synchronized by entering and exiting the Monitor. The monitor lock in turn relies on the underlying operating system's mutex lock, which makes it a heavyweight lock with relatively poor performance. Since JDK 1.5, however, major optimizations have been made — lock coarsening, lock elimination, lightweight locks, biased locks, adaptive spinning, and other techniques — to reduce the overhead of lock operations, so the concurrency performance of the built-in lock is now roughly on a par with Lock.

When the synchronized keyword is compiled to bytecode, it is translated into two instructions, monitorenter and monitorexit, placed at the start and at the end of the synchronized block's logic respectively.
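This is easy to verify on a small class (the name `Monitors` below is illustrative). Compiling it with `javac Monitors.java` and disassembling with `javap -c Monitors` shows a monitorenter at the start of the block and monitorexit on both the normal exit path and the exception path.

```java
// Minimal class for inspecting synchronized bytecode with javap -c.
public class Monitors {
    private final Object lock = new Object();
    private int value;

    public int bump() {
        synchronized (lock) { // compiles to monitorenter on `lock`
            return ++value;
        }                     // compiles to monitorexit (plus one on the exception path)
    }

    public static void main(String[] args) {
        Monitors m = new Monitors();
        System.out.println(m.bump()); // prints 1
    }
}
```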
Each synchronization object has its own Monitor (monitor lock), which threads acquire on entry and release on exit.
This raises a question: we know that the synchronized lock is attached to an object, so how does the object record the state of the lock?
The answer is that the lock state is recorded in the object header (Mark Word). To see why, consider the memory layout of an object in the HotSpot virtual machine, which is divided into three areas: the object header, instance data, and alignment padding.

  • Object header: runtime data such as the hash code, the GC age of the object, the lock status flag, the biased-lock thread ID, the bias timestamp, the array length (for arrays), etc.
  • Instance data: the object's member fields, populated when the object is created
  • Alignment padding: the object size must be an integer multiple of 8 bytes

The object header of the HotSpot virtual machine contains two parts of information. The first part stores the runtime data of the object itself, such as the hash code (HashCode), GC generational age, lock status flag, the lock held by a thread, the biased thread ID, the bias timestamp, and so on. This part is 32 bits long on a 32-bit virtual machine and 64 bits long on a 64-bit virtual machine (ignoring compressed pointers for now), and is officially called the "Mark Word". The object needs to store more runtime data than a 32- or 64-bit structure can record, yet the object header is extra storage cost unrelated to the data the object itself defines. For the sake of the virtual machine's space efficiency, the Mark Word is therefore designed as a non-fixed data structure that packs as much information as possible into a very small space and reuses its own storage depending on the state of the object. For example, on a 32-bit HotSpot virtual machine, when the object is unlocked, 25 of the Mark Word's 32 bits store the object's hash code (HashCode), 4 bits store the object's generational age, 2 bits store the lock flag, and 1 bit is fixed at 0; the contents in the other states (lightweight lock, heavyweight lock, GC mark, biasable) differ accordingly.

If the object is an array, the object header additionally needs a third piece of information to record the array length, because while the JVM can determine the size of an ordinary Java object from its metadata, it cannot determine the size of an array from the array's metadata alone.

Since the Mark Word reuses its storage according to the state of the object, its contents change as the program runs between the unlocked, biased, lightweight-locked, heavyweight-locked, and GC-marked states (on a 32-bit virtual machine).
Lock escalation (upgrade) process
There are four lock states in total: unlocked, biased lock, lightweight lock, and heavyweight lock. As lock contention increases, a lock can be upgraded from a biased lock to a lightweight lock, and then to a heavyweight lock. The upgrade is one-way: a lock can only go from lower to higher and is never downgraded.

Biased lock
The biased lock, added in Java 6, is an optimization of locking. Research showed that in most cases a lock is not only free of multi-threaded contention but is always acquired by the same thread, so biased locks were introduced to reduce the cost for that thread of re-acquiring the lock (which otherwise involves some time-consuming CAS operations). The core idea is: once a thread acquires the lock, the lock enters biased mode and the Mark Word takes on its biased-lock layout. When the same thread requests the lock again, no synchronization operation is needed at all — the lock-acquisition step is skipped — which saves a great deal of lock-related work and improves performance. Biased locks therefore optimize well when there is no lock contention, since the same thread is quite likely to request the same lock many times in a row. Under fierce lock contention, however, biased locks are useless, because the thread applying for the lock is likely to be different each time; in that situation a biased lock does more harm than good. Note also that when a biased lock is revoked, it does not inflate directly to a heavyweight lock, but is first upgraded to a lightweight lock.

Lightweight lock
If the biased lock fails, the virtual machine does not immediately upgrade to a heavyweight lock; it first tries an optimization called the lightweight lock (added in JDK 1.6), at which point the Mark Word takes on its lightweight-lock layout. The premise under which lightweight locks improve performance is that "for most locks, there is no contention during the whole synchronization cycle" — note that this is empirical data. Lightweight locks suit scenarios where threads execute the synchronized block alternately; if threads contend for the same lock at the same time, the lightweight lock inflates into a heavyweight lock.

Spin lock
If the lightweight lock fails, the virtual machine tries one more optimization, the spin lock, to avoid actually suspending the thread at the operating-system level. This rests on the observation that in most cases threads do not hold a lock for long, so suspending a thread at the OS level may not be worth the cost: switching threads requires the operating system to cross between user mode and kernel mode, and those transitions are relatively expensive. The spin lock therefore assumes that the current thread will be able to acquire the lock in the near future, so the virtual machine lets the thread that wants the lock run a few empty loops (which is why it is called spinning) — generally not for long, perhaps 50 or 100 iterations. If the lock is obtained within those loops, the thread smoothly enters the critical section; if not, the thread is suspended at the operating-system level and, as a last resort, the lock is upgraded to a heavyweight lock.
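The JVM's spinning is internal, but the same idea can be sketched at the user level (this is an illustrative sketch, not the JVM's implementation): busy-loop on a CAS for a bounded number of attempts instead of parking the thread immediately.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// User-level sketch of bounded spinning before falling back to blocking.
public class BoundedSpinLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    /** Spin up to maxSpins times; return true if the lock was acquired. */
    public boolean tryLockSpinning(int maxSpins) {
        for (int i = 0; i < maxSpins; i++) {
            if (held.compareAndSet(false, true)) {
                return true; // acquired without any system call
            }
            Thread.onSpinWait(); // JDK 9+ hint that this is a spin loop
        }
        // Spinning failed; a real lock would now park the thread at the
        // OS level (the "heavyweight" path).
        return false;
    }

    public void unlock() {
        held.set(false);
    }

    public static void main(String[] args) {
        BoundedSpinLock lock = new BoundedSpinLock();
        System.out.println(lock.tryLockSpinning(100));  // free lock: acquired
        System.out.println(lock.tryLockSpinning(10));   // already held: spins out
    }
}
```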

Lock elimination
Lock elimination is another, more thorough, lock optimization in the virtual machine. During JIT compilation (which can be loosely understood as compiling a piece of code just before it is executed for the first time — just-in-time compilation), the Java virtual machine scans the running context and removes locks that cannot possibly face shared-resource contention, thereby saving the otherwise meaningless time spent requesting them. For example, StringBuffer's append is a synchronized method, but if the StringBuffer is a local variable inside a method and will never be used by other threads, there is no possibility of contention for it, and the JVM will automatically eliminate its lock.
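The classic lock-elimination candidate looks like this (the class and method names are illustrative): the StringBuffer never escapes `concat`, so its synchronized `append` calls can never be contended, and a JIT with escape analysis may remove the locking entirely.

```java
// A local StringBuffer whose synchronized append() locks are elidable,
// because no reference to it ever leaves this method.
public class LockElision {
    public static String concat(String a, String b, String c) {
        StringBuffer sb = new StringBuffer(); // local: visible only to this thread
        sb.append(a); // append() is synchronized, but the lock can be eliminated
        sb.append(b);
        sb.append(c);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(concat("a", "b", "c")); // prints abc
    }
}
```

Whether the lock is actually removed is a JIT decision at runtime; the program's observable behavior is the same either way.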

Escape analysis

Using escape analysis, the compiler can optimize code as follows:
1. Synchronization elision. If an object is found to be accessible from only one thread, operations on that object need not be synchronized.
2. Converting heap allocation to stack allocation. If an object is allocated inside a subroutine and no pointer to it ever escapes, the object is a candidate for stack allocation instead of heap allocation.
3. Object decomposition, or scalar replacement. Some objects can be accessed without existing as a contiguous memory structure, so part (or all) of the object need not be stored in memory at all and can live in CPU registers instead.
Do all objects and arrays allocate space in heap memory?
Not necessarily.
When Java code runs, JVM parameters control whether escape analysis is enabled: -XX:+DoEscapeAnalysis enables it, and -XX:-DoEscapeAnalysis disables it. Since JDK 1.7, escape analysis has been enabled by default; to turn it off, specify -XX:-DoEscapeAnalysis.

Code demo

public class StackAllocTest {

    /**
     * Run two tests.
     *
     * 1. Disable escape analysis, and enlarge the heap to avoid GC inside it;
     *    any GC activity will be printed.
     *    VM options: -Xmx4G -Xms4G -XX:-DoEscapeAnalysis -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError
     *
     * 2. Enable escape analysis.
     *    VM options: -Xmx4G -Xms4G -XX:+DoEscapeAnalysis -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError
     *
     * After running main:
     *   jps                 // find the process ID
     *   jmap -histo <pid>   // count Student instances on the heap
     */
    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < 500000; i++) {
            alloc();
        }
        long end = System.currentTimeMillis();
        // Print the elapsed time
        System.out.println("cost-time " + (end - start) + " ms");
        try {
            // Keep the process alive so jmap can inspect the heap
            Thread.sleep(100000);
        } catch (InterruptedException e1) {
            e1.printStackTrace();
        }
    }

    private static Student alloc() {
        // The JIT performs escape analysis when compiling this method:
        // not all objects end up on the heap — some can be allocated
        // in the thread's stack space instead.
        Student student = new Student();
        return student;
    }

    static class Student {
        private String name;
        private int age;
    }
}

1. My local Mac environment runs JDK 1.8, where escape analysis is enabled by default; -XX:-DoEscapeAnalysis disables it. First, run with escape analysis disabled and create 500,000 objects to see how many actually land on the heap: jmap shows roughly 500,000 Student instances, i.e. all of them were allocated on the heap.

2. Then run again with escape analysis enabled and check with jps and jmap: far fewer than 500,000 Student instances appear on the heap, showing that escape analysis took effect and some objects were allocated on the stack instead.

Origin blog.csdn.net/qq_39513430/article/details/109491852