Java concurrent lock optimization and lock upgrade

Preface

This article introduces how the JVM optimizes Java's synchronized locks, covering:

  1. Where the lock lives and how its state is identified
  2. How the Monitor mechanism works in Java
  3. Lock optimization
  4. Lock upgrade

1. Where is the lock

  • The layout of an object in memory is divided into three areas: the object header, instance data, and alignment padding.
    In the HotSpot VM, the object header mainly consists of two parts: the Mark Word and the Klass Pointer (type pointer). For an array, one extra word (32 bits, i.e. 4 bytes, on a 32-bit JVM) is used to store the array length.
  • The lock used by synchronized is stored in the Java object header.
    The Klass Pointer points to the object's class metadata; the VM uses it to determine which class the object is an instance of. The Mark Word stores the object's own runtime data and is the key to implementing lightweight and biased locks.

The object header takes one of two forms in the JVM (using a 32-bit JVM as an example):

// Ordinary object
|--------------------------------------------------------------|
|                     Object Header (64 bits)                  |
|------------------------------------|-------------------------|
|        Mark Word (32 bits)         |    Klass Word (32 bits) |
|------------------------------------|-------------------------|

// Array object
|---------------------------------------------------------------------------------|
|                                 Object Header (96 bits)                         |
|--------------------------------|-----------------------|------------------------|
|        Mark Word(32bits)       |    Klass Word(32bits) |  array length(32bits)  |
|--------------------------------|-----------------------|------------------------|

  • The Mark Word mainly stores the object's own runtime data, such as the hash code, GC generational age, lock state flag, the lock held by a thread, the biased thread ID, and the bias epoch.
    The Mark Word is one machine word long: 32 bits on a 32-bit JVM and 64 bits on a 64-bit JVM.
    The Mark Word is designed as a non-fixed data structure so that as much data as possible fits into a very small space; it reuses its own bits according to the object's state. To pack more information into a single word, the JVM uses the lowest two bits of the word as a tag, and the layout of the Mark Word under each tag is as follows:
|------------------------------------------------------------|---------------------|
|                     Mark Word (32 bits)                     |        State        |
|------------------------------------------------------------|---------------------|
| identity_hashcode:25 | age:4 | biased_lock:0 | lock:01      | Normal (unlocked)   |
|------------------------------------------------------------|---------------------|
| thread:23 | epoch:2 | age:4 | biased_lock:1 | lock:01       | Biased              |
|------------------------------------------------------------|---------------------|
| ptr_to_lock_record:30                        | lock:00      | Lightweight locked  |
|------------------------------------------------------------|---------------------|
| ptr_to_heavyweight_monitor:30                | lock:10      | Heavyweight locked  |
|------------------------------------------------------------|---------------------|
|                                              | lock:11      | Marked for GC       |
|------------------------------------------------------------|---------------------|

Lock state

  • lock: the 2-bit lock state flag. Because as much information as possible must be packed into as few bits as possible, the meaning of the rest of the Mark Word depends on the value of this flag.
  • biased_lock: a 1-bit flag indicating whether biased locking is enabled for the object: 1 means the object is biasable/biased, 0 means it is not.
  • age: the 4-bit GC age of the object. Each time the object survives a copy within the Survivor space during a GC, its age increases by 1; when the age reaches the configured threshold, the object is promoted to the old generation. By default the threshold is 15 for the parallel collector and 6 for the concurrent (CMS) collector. Because age has only 4 bits, its maximum value is 15, which is why the maximum value of the -XX:MaxTenuringThreshold option is 15.
  • identity_hashcode: the 25-bit identity hash code of the object, computed lazily: it is calculated and written into the object header the first time System.identityHashCode() is called. Once the object is locked, the value is displaced into the monitor.
  • thread: the ID of the thread holding the biased lock.
  • epoch: the bias epoch (timestamp).
  • ptr_to_lock_record: a pointer to the Lock Record in the thread's stack.
  • ptr_to_heavyweight_monitor: a pointer to the heavyweight monitor.
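To make the Mark Word layout above concrete, the following sketch prints an object's header at runtime. It assumes the OpenJDK JOL tool (org.openjdk.jol:jol-core) is on the classpath; the exact bits printed depend on the JVM version, word size, and flags, so treat the output as illustrative.

import org.openjdk.jol.info.ClassLayout;

public class MarkWordDemo {
    public static void main(String[] args) {
        Object o = new Object();
        // Print the object layout, including the Mark Word bits in the header.
        System.out.println(ClassLayout.parseInstance(o).toPrintable());

        // The identity hash code is computed lazily; calling it writes the value
        // into the Mark Word, which a second print makes visible.
        System.out.println("hash = " + Integer.toHexString(System.identityHashCode(o)));
        System.out.println(ClassLayout.parseInstance(o).toPrintable());
    }
}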

2. Monitor mechanism

A Monitor is essentially a synchronization tool and mechanism. Java has built-in support for the Monitor Object pattern: every Java object is a natural monitor, or at least has the potential to become one.
Only one thread can own an object's monitor at a time. A thread tries to acquire ownership of the monitor with the monitorenter instruction when it enters a synchronized region and releases ownership with monitorexit when it exits (a bytecode example follows the list below).

  • The monitorenter process is as follows:
    if the monitor's entry count is 0, the thread enters the monitor, the entry count is set to 1, and the thread becomes the owner of the monitor;
    if the thread already owns the monitor and is simply re-entering, the entry count is incremented by 1;
    if another thread already owns the monitor, the thread blocks until the entry count drops to 0, and then tries again to acquire ownership of the monitor.
  • monitorexit:
    The thread executing monitorexit must be the owner of the monitor associated with the object. Executing the instruction decrements the entry count by 1; if the count then reaches 0, the thread exits the monitor and is no longer its owner, and the threads blocked on this monitor may try to acquire ownership.
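The monitorenter/monitorexit pair can be observed directly in the bytecode of a synchronized block (for example with javap -c). Below is a minimal sketch; the bytecode details in the comments are abridged and offsets vary by compiler, so take them as an approximation.

public class MonitorDemo {
    private final Object lock = new Object();
    private int counter;

    public void increment() {
        synchronized (lock) {   // javap shows: monitorenter on the lock object
            counter++;
        }                       // javap shows: monitorexit, plus a second
                                // monitorexit in an implicit exception handler,
                                // so the monitor is released even if the body throws
    }

    public synchronized void incrementMethod() {
        // A synchronized method emits no explicit monitorenter/monitorexit;
        // the ACC_SYNCHRONIZED flag tells the JVM to acquire the monitor of
        // 'this' (or of the Class object for a static method).
        counter++;
    }
}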

Monitor records are thread-private data structures: each thread has a list of available monitor records, and there is also a global free list.
Every locked object is associated with a monitor (the lock word in the object's Mark Word points to the start address of the monitor), and the monitor's Owner field stores the unique identifier of the thread that holds the lock, indicating that the lock is occupied by that thread. The fields are listed below (and sketched as a small data structure after the list).

Monitor record fields

  • Owner: initially NULL, meaning no thread currently owns the monitor record. When a thread successfully acquires the lock, its unique identifier is stored here; it is set back to NULL when the lock is released.
  • EntryQ: associates an OS mutex (semaphore) used to block threads that fail to acquire the monitor record.
  • RcThis: the number of threads blocked on or waiting on the monitor record.
  • Nest: used to implement the reentrancy count.
  • HashCode: stores the hash code value copied from the object header (possibly together with the GC age).
  • Candidate: used to avoid unnecessary blocking and waking of threads. Because only one thread can hold the lock at a time, if every thread releasing the lock woke up all blocked or waiting threads, the result would be many unnecessary context switches (threads going from blocked to ready and then blocking again after losing the race for the lock), causing a severe drop in performance. Candidate has only two possible values: 0 means no thread needs to be woken, and 1 means one successor thread must be woken to compete for the lock.
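The fields above can be pictured as a small data structure. The class below is only an illustrative model for readability, not HotSpot's actual ObjectMonitor; the field types are assumptions.

// Illustrative model of a monitor record; not the real HotSpot ObjectMonitor.
final class MonitorRecord {
    Thread owner;      // Owner: null until some thread holds the lock
    Object entryQ;     // EntryQ: OS mutex/semaphore that blocks failed acquirers
    int rcThis;        // RcThis: number of threads blocked or waiting here
    int nest;          // Nest: reentrancy count for the owning thread
    int hashCode;      // HashCode: identity hash displaced from the Mark Word
    int candidate;     // Candidate: 0 = wake nobody, 1 = wake one successor
}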

3. Lock optimization

JDK 1.6 introduced a large number of optimizations to the lock implementation, such as spin locks, adaptive spinning, lock elimination, lock coarsening, biased locks, and lightweight locks, all aimed at reducing the cost of locking operations.
A lock has four main states, in order: unlocked, biased, lightweight, and heavyweight. The lock is gradually upgraded along this path as contention intensifies.
A lock can be upgraded but never downgraded; this policy improves the efficiency of acquiring and releasing locks.

1. Heavyweight lock

The monitor lock is essentially implemented on top of the operating system's mutex (Mutex Lock), which is why it is commonly called a heavyweight lock. Blocking and waking threads requires the OS to switch from user mode to kernel mode, and this transition is expensive and time-consuming, so synchronization efficiency is relatively low.

2. Lightweight lock

The performance gain of lightweight locks rests on the observation that, for the vast majority of locks, there is no contention during their entire lifetime. If there is no contention, a lightweight lock can use CAS operations to avoid the overhead of a mutex and thereby improve efficiency (a toy sketch of the CAS idea follows the locking and unlocking steps below).
If that assumption does not hold, however, the CAS operations are added on top of the mutex overhead, so under multithreaded contention a lightweight lock is actually slower than a heavyweight lock.

  • The locking process of a lightweight lock:

    1. When a thread enters a synchronized block, the JVM first creates a space called a Lock Record in the current thread's stack frame, used to store a copy of the lock object's current Mark Word (officially called the Displaced Mark Word); the Lock Record's owner pointer points to the object's Mark Word.
    2. The JVM then uses a CAS operation to try to update the object's Mark Word to a pointer to the Lock Record. If the update succeeds, go to step 3; if it fails, go to step 4.
    3. If the update succeeds, the thread owns the lock on the object and the Mark Word's lock bits indicate a lightweight lock (the flag becomes '00').
    4. If the update fails, the JVM first checks whether the object's Mark Word already points to the current thread's stack frame.
      If it does, the current thread already holds the lock on the object and can enter the synchronized block directly.
      If it does not, the lock has been taken by another thread, and the current thread spins a certain number of times trying to acquire it. If the CAS still fails after the spin limit, the lightweight lock is inflated to a heavyweight lock (the lock flag becomes '10'), the Mark Word is updated to store a pointer to the heavyweight monitor, and the threads waiting for the lock enter the blocked state.
  • The unlocking process of a lightweight lock:

    1. Use a CAS operation to replace the object's current Mark Word with the Displaced Mark Word copied into the thread's Lock Record.
    2. If the replacement succeeds, the whole synchronization process is complete.
    3. If the replacement fails, another thread has tried to acquire the lock (the lock has been inflated), so while releasing the lock, the suspended threads are also woken up.
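The CAS idea behind the steps above can be shown with a toy lock built on AtomicReference. This is only a sketch of the pattern: the real JVM CASes a pointer to a stack-allocated Lock Record into the object's Mark Word, not a Java field.

import java.util.concurrent.atomic.AtomicReference;

// Toy illustration of the lightweight-lock CAS pattern (not the JVM's implementation).
final class ToyThinLock {
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    boolean tryLock() {
        // Succeeds only when nobody owns the lock, analogous to the CAS on the
        // Mark Word succeeding while the object is in the unlocked state.
        return owner.compareAndSet(null, Thread.currentThread());
    }

    void unlock() {
        // Analogous to CASing the Displaced Mark Word back into the header; in
        // the JVM a failure here means the lock was inflated and a blocked
        // thread must be woken.
        owner.compareAndSet(Thread.currentThread(), null);
    }
}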

3. Biased lock

Basis: for the vast majority of locks, not only is there no contention during the whole synchronization period, but the lock is always acquired multiple times by the same thread.
When the same thread repeatedly acquires the same lock, redoing the CAS on the object header's Mark Word for every acquisition is redundant. Hence the biased lock: the JVM only needs to check that the Mark Word is in the biased state and that the thread ID stored in it is the current thread's ID; as long as it is the same thread, the object header is not modified again. The goal is to eliminate the unnecessary lightweight-lock path when there is no multithreaded contention (a small demo for observing these states is sketched after the release steps below).

  • The biased lock acquisition process:

    1. Check whether the Mark Word is in the biasable state, i.e. the biased_lock bit is 1 and the lock flag bits are 01;
    2. If it is biasable, test whether the stored thread ID is the current thread's ID; if so, go to step 5, otherwise go to step 3;
    3. If the thread ID is not the current thread's ID, compete for the lock with a CAS operation; if the CAS succeeds, the Mark Word's thread ID is replaced with the current thread's ID, otherwise go to step 4;
    4. A failed CAS means there is multithreaded contention. When the global safepoint is reached, the thread that obtained the biased lock is suspended, the biased lock is upgraded to a lightweight lock, and the thread that was blocked at the safepoint then continues executing the synchronized block;
    5. Execute the synchronized block.
  • The biased lock revocation (release) process:

    1. Revocation happens only when a thread already holds a biased lock and another thread tries to compete for it: the CAS that replaces the thread ID fails, so the bias must be revoked. Revoking a biased lock requires waiting for the thread that holds it to reach a global safepoint (a point at which no bytecode is being executed), then pausing that thread and checking its state.
    2. If the thread that held the biased lock is no longer active or has already exited the synchronized block, the lock is released and the object header is set to the unlocked state (lock flag '01', biased flag '0').
    3. If the thread that held the biased lock is still inside the synchronized block, the lock is upgraded to a lightweight lock (lock flag '00').
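The biased and lightweight states can be observed with the JOL tool mentioned earlier. This sketch assumes JDK 8, where biased locking is enabled by default but only after a startup delay, so it should be run with -XX:BiasedLockingStartupDelay=0; on recent JDKs biased locking has been deprecated and removed (JEP 374), so the output will differ.

import org.openjdk.jol.info.ClassLayout;

// Run on JDK 8 with: -XX:BiasedLockingStartupDelay=0
public class LockUpgradeDemo {
    public static void main(String[] args) {
        Object o = new Object();
        // Freshly allocated: biasable but unowned (biased_lock:1, lock:01).
        System.out.println(ClassLayout.parseInstance(o).toPrintable());

        synchronized (o) {
            // Biased towards the current thread: its ID is now in the Mark Word.
            System.out.println(ClassLayout.parseInstance(o).toPrintable());
        }
        // A second thread synchronizing on 'o' would revoke the bias and show
        // the lightweight (and, under real contention, heavyweight) state.
    }
}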

4. Summary

A lock has four main states, in order: unlocked, biased, lightweight, and heavyweight. The lock is upgraded along this path as contention increases.
(Figure: lock upgrade path)

Other optimizations

  • Spin lock:
    In mutual-exclusion synchronization, both suspending and resuming a thread require switching into kernel mode, which puts heavy pressure on concurrent performance. At the same time, in many applications the locked state of shared data only lasts for a very short time, and it is not worth suspending and resuming threads for such a short period. So when multiple threads can execute in parallel, a thread that requests the lock later can spin (busy-loop on the CPU executing empty instructions) for a short while to see whether the thread holding the lock releases it soon, instead of giving up its CPU time (a minimal user-level spin lock is sketched below).
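A hand-rolled spin lock makes the busy-wait idea concrete. This is only a user-level sketch; the spinning the JVM does lives inside the synchronized implementation and is not exposed as an API.

import java.util.concurrent.atomic.AtomicBoolean;

// Minimal user-level spin lock: waiters busy-loop instead of blocking.
final class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        // Busy-loop until the CAS succeeds; the thread never yields the CPU.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait();   // spin-loop hint to the CPU/JIT (Java 9+)
        }
    }

    void unlock() {
        locked.set(false);
    }
}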

  • Adaptive spinning:
    If the lock is only held for a short time, spin-waiting works well; if the lock is held for a long time, the spinning thread simply wastes CPU. The simplest fix is to cap the number of spins: if the lock is not acquired within the limit (for example, 10 iterations), the thread is suspended in the traditional way and enters the blocked state.
    JDK 1.6 introduced adaptive spinning: if spinning on a given lock has recently succeeded, and the thread holding the lock is currently running, the JVM assumes the spin is likely to succeed again and allows it to wait relatively longer (for example, 100 iterations). Conversely, if spinning rarely succeeds for a particular lock, the spin phase is skipped for that lock in the future to avoid wasting CPU.

  • Lock elimination:
    When the just-in-time (JIT) compiler runs, it automatically eliminates locks that, based on escape-analysis data, cannot possibly be contended. Lock elimination is therefore grounded in the data provided by escape analysis.
    If the analysis shows that, within a piece of code, data on the heap cannot escape and be accessed by other threads, that data can be treated as if it were on the stack, i.e. thread-private, so no locking is needed.
    In the example below, StringBuffer.append() contains a synchronized block whose lock is the sb object, but sb never escapes the concatString() method and no other thread can access it. The lock therefore exists in the source, but after just-in-time compilation it is safely eliminated and the synchronization is simply skipped.

public String concatString(String s1, String s2, String s3) {
    StringBuffer sb = new StringBuffer();
    sb.append(s1);          // each append() synchronizes on sb internally
    sb.append(s2);
    sb.append(s3);
    return sb.toString();   // sb never escapes this method, so the locks can be eliminated
}
  • Lock coarsening:
    Lock coarsening means that when the JVM detects a series of consecutive, fragmented operations that all lock the same object, it widens (coarsens) the scope of the lock to cover the whole sequence of operations.
    Taking the concatString() method above as an example: each internal StringBuffer.append() call acquires the lock on sb. After coarsening, the lock only needs to be acquired once, before the first append(), and released after the last one (a conceptual sketch of the coarsened form is shown below).
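Conceptually, the coarsened version of concatString() behaves as if the JIT had rewritten it like the sketch below. This is only an illustration of the effect; the JIT transforms compiled code, not the source.

public String concatStringCoarsened(String s1, String s2, String s3) {
    StringBuffer sb = new StringBuffer();
    synchronized (sb) {     // one coarse lock instead of one lock per append()
        sb.append(s1);
        sb.append(s2);
        sb.append(s3);
    }
    return sb.toString();
}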

Conclusion

This article introduced the JVM's lock optimizations for synchronized: spinning and adaptive spinning, lock elimination and lock coarsening, and the progression from the unlocked state to biased, lightweight, and heavyweight locks. All of these optimizations aim to avoid switching threads into kernel mode, each targeting a different usage pattern.

