Java concurrent programming: Synchronized underlying optimization (biased lock, lightweight lock)


 

Java Concurrent Programming Series:

1. Heavyweight lock

  In the last article, I introduced the usage of Synchronized and the principle of its implementation. We now know that Synchronized is implemented through a lock called a monitor inside the object, and that the monitor in turn relies on the underlying operating system's Mutex Lock. To suspend or wake a thread, the operating system has to switch from user mode to kernel mode; this transition is expensive and takes a relatively long time, which is why Synchronized is inefficient. A lock that depends on the operating system's Mutex Lock in this way is therefore called a "heavyweight lock". The core of all the optimizations made to Synchronized in the JDK is to reduce the use of such heavyweight locks. Starting with JDK 1.6, "lightweight locks" and "biased locks" were introduced to reduce the performance cost of acquiring and releasing locks.
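As a concrete illustration of what the monitor guarantees, here is a minimal sketch (the class and method names are my own, not from the article): two threads contend for the same monitor, and under sustained contention the JVM may fall back to exactly the heavyweight, mutex-based path described above.

```java
public class MonitorDemo {
    private static int counter = 0;
    private static final Object lock = new Object();

    // Two threads repeatedly enter the same monitor. Under contention the JVM
    // may inflate this monitor into a heavyweight (mutex-backed) lock.
    static int run() {
        counter = 0;
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                synchronized (lock) {   // compiles to monitorenter
                    counter++;
                }                       // compiles to monitorexit
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        try {
            t1.join(); t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter;                 // 200000: the monitor prevents lost updates
    }

    public static void main(String[] args) {
        System.out.println(MonitorDemo.run());
    }
}
```
Without the synchronized block, the two unsynchronized increments would race and the final count would usually fall short of 200000.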

2. Lightweight lock 

  There are four lock states in total: unlocked, biased lock, lightweight lock, and heavyweight lock. As lock contention increases, a lock can be upgraded from a biased lock to a lightweight lock, and then to a heavyweight lock (the upgrade is one-way: a lock can only go from a lower state to a higher one, and is never downgraded). Biased locks and lightweight locks are enabled by default since JDK 1.6; biased locking can be disabled with -XX:-UseBiasedLocking. The lock state is stored in the object header (Mark Word). Taking a 32-bit JDK as an example:

| Lock state       | 25 bit (23 bit / 2 bit)                 | 4 bit                 | 1 bit: biased? | 2 bit: lock flag |
|------------------|-----------------------------------------|-----------------------|----------------|------------------|
| No lock          | hashCode of the object                  | object generation age | 0              | 01               |
| Biased lock      | thread ID / Epoch                       | object generation age | 1              | 01               |
| Lightweight lock | pointer to the lock record on the stack |                       |                | 00               |
| Heavyweight lock | pointer to the mutex (heavyweight lock) |                       |                | 10               |
| GC mark          | empty                                   |                       |                | 11               |

  "Lightweight" is relative to traditional locks implemented using operating system mutexes. However, it should be emphasized first that lightweight locks are not used to replace heavyweight locks. Its original intention is to reduce the performance consumption of traditional heavyweight locks without multi-threaded competition. Before explaining the execution process of lightweight locks, let's understand that the scenario to which lightweight locks are adapted is the situation where threads execute synchronized blocks alternately. If there is a situation where the same lock is accessed at the same time, it will lead to lightweight lock expansion. For heavyweight locks.

1. The locking process of lightweight locks

  (1) When the code enters the synchronized block, if the lock state of the synchronization object is unlocked (the lock flag is "01" and the biased-lock bit is "0"), the virtual machine first creates a space named Lock Record in the current thread's stack frame, used to store a copy of the lock object's current Mark Word, officially called the Displaced Mark Word. The state of the thread stack and the object header at this point is shown in Figure 2.1.

  (2) Copy the Mark Word in the object header into the lock record.

  (3) After the copy succeeds, the virtual machine uses a CAS operation to try to update the object's Mark Word to a pointer to the Lock Record, and points the owner pointer in the Lock Record to the object's Mark Word. If the update succeeds, go to step (4); otherwise go to step (5).

  (4) If the update succeeds, the thread now owns the lock on this object, and the lock flag in the object's Mark Word is set to "00", meaning the object is in the lightweight-lock state. The state of the thread stack and the object header at this point is shown in Figure 2.2.

  (5) If the update fails, the virtual machine first checks whether the object's Mark Word points into the current thread's stack frame. If it does, the current thread already owns the lock on this object and can enter the synchronized block directly. Otherwise, multiple threads are competing for the lock, and the lightweight lock inflates into a heavyweight lock: the lock flag becomes "10", the Mark Word stores a pointer to the heavyweight lock (mutex), and threads waiting for the lock are blocked. Before blocking, the current thread tries to acquire the lock by spinning, that is, by looping on the lock so as not to block the thread.

 

                     Figure 2.1 State of the stack and the object before the lightweight lock's CAS operation

   

                      Figure 2.2 State of the stack and the object after the lightweight lock's CAS operation

2. The unlocking process of lightweight locks:

  (1) Use a CAS operation to try to replace the object's current Mark Word with the Displaced Mark Word copied into the thread's lock record.

  (2) If the replacement succeeds, the whole synchronization process is complete.

  (3) If the replacement fails, another thread has tried to acquire the lock (the lock has inflated by now), so the suspended threads must be woken up while the lock is released.
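The locking and unlocking steps above can be sketched in plain Java. This is a conceptual model only, not the JVM's real data structures: the "mark word" is modeled as an AtomicReference, locking CAS-installs a pointer to a stack-local lock record, and unlocking CAS-restores the displaced value. All names here are illustrative.

```java
import java.util.concurrent.atomic.AtomicReference;

public class LightweightLockSketch {
    // Stands in for the per-thread Lock Record holding the Displaced Mark Word.
    static class LockRecord {
        final Object displacedMarkWord;
        LockRecord(Object m) { this.displacedMarkWord = m; }
    }

    // Stands in for the object's Mark Word; null models the unlocked state.
    private final AtomicReference<LockRecord> markWord = new AtomicReference<>(null);

    /** Steps (1)-(3): CAS the mark word to point at a fresh lock record.
     *  Returns the record on success, or null if another thread holds the
     *  lock (the real JVM would spin and then inflate here). */
    LockRecord tryLock() {
        LockRecord record = new LockRecord("unlocked-mark-word"); // copy of the current mark word
        return markWord.compareAndSet(null, record) ? record : null;
    }

    /** Unlock step (1): CAS the displaced mark word back. A failure would
     *  mean the lock was inflated in the meantime. */
    boolean unlock(LockRecord record) {
        return markWord.compareAndSet(record, null);
    }
}
```
A second thread calling tryLock() while the record is installed gets null back, which is precisely the contention case that triggers inflation in step (5) above.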

3. Biased lock

  Biased locks were introduced to minimize the unnecessary lightweight-lock execution path when there is no multi-threaded contention: acquiring and releasing a lightweight lock requires multiple CAS atomic instructions, whereas a biased lock needs only one CAS, when the thread ID is installed. (Since the bias must be revoked as soon as multi-threaded contention appears, the performance cost of revoking the bias must stay below the cost of the CAS instructions saved.) As noted above, lightweight locks improve performance when threads execute synchronized blocks alternately, while biased locks further improve performance when only a single thread ever executes the synchronized block.

1. Biased lock acquisition process:

  (1) Check whether the biased-lock bit in the Mark Word is set to 1 and the lock flag is 01, i.e., confirm the object is in a biasable state.

  (2) If it is biasable, test whether the thread ID points to the current thread. If so, go to step (5); otherwise go to step (3).

  (3) If the thread ID does not point to the current thread, compete for the lock with a CAS operation. If the CAS succeeds, set the thread ID in the Mark Word to the current thread's ID and go to step (5); if it fails, go to step (4).

  (4) If the CAS to acquire the biased lock fails, there is contention. When the global safepoint is reached, the thread holding the biased lock is suspended and the biased lock is upgraded to a lightweight lock; the thread that was blocked at the safepoint then continues executing the synchronized code.

  (5) Execute the synchronized code.
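The fast path described in steps (2) and (3) can be sketched as follows (again a conceptual model with illustrative names, not the JVM's implementation): the first CAS installs the owning thread's ID, and every re-entry by that same thread is a plain read with no atomic instruction at all, which is the whole point of biasing.

```java
import java.util.concurrent.atomic.AtomicLong;

public class BiasedLockSketch {
    private static final long UNBIASED = -1L;       // models the biasable, unowned state
    private final AtomicLong biasedThreadId = new AtomicLong(UNBIASED);

    /** Returns true if the current thread may enter the synchronized block;
     *  false models contention, which would trigger bias revocation. */
    boolean tryEnter() {
        long self = Thread.currentThread().getId();
        if (biasedThreadId.get() == self) {
            return true;                             // step (2): already biased to us, no CAS needed
        }
        return biasedThreadId.compareAndSet(UNBIASED, self); // step (3): the one-time CAS
    }
}
```
Note there is deliberately no release method on the fast path: as the next subsection explains, a biased lock is only revoked when another thread competes for it.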

2. Biased lock release:

  As mentioned in step (4) above, a thread holding a biased lock releases it only when another thread tries to compete for it; a thread never proactively releases its biased lock. Revoking a biased lock requires waiting for the global safepoint (a point in time at which no bytecode is executing). The thread holding the biased lock is paused first, the virtual machine checks whether the lock object is currently locked, and after the bias is revoked the object returns either to the unlocked state (flag "01") or to the lightweight-lock state (flag "00").

3. Conversion between heavyweight, lightweight, and biased locks

 

                                        Figure 2.3 Conversion among the three lock types

  This figure mainly summarizes the content above; if you have a good grasp of that content, the figure should be easy to read.

4. Other optimizations

1. Adaptive Spinning: From the lightweight-lock acquisition flow we know that when a thread's CAS operation fails while acquiring a lightweight lock, it spins before falling back to the heavyweight lock. The problem is that spinning consumes CPU: if the lock is never acquired, the thread stays in the spin loop and wastes CPU for nothing. The simplest fix is to cap the number of spins, e.g., loop 10 times and block if the lock still has not been acquired. But the JDK takes a smarter approach, adaptive spinning: roughly, if a spin succeeds, the next spin is allowed more iterations; if it fails, the number of iterations is reduced.
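The adaptive idea can be illustrated with a small sketch (the real policy lives inside the JVM; the class name, starting budget, and doubling/halving rule here are my own choices): a successful spin raises the next spin budget, a failed spin lowers it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class AdaptiveSpin {
    private final AtomicBoolean locked = new AtomicBoolean(false);
    private volatile int spinLimit = 10;    // starting budget, an arbitrary choice here

    /** Spin up to the current budget; adapt the budget based on the outcome. */
    boolean tryAcquire() {
        for (int i = 0; i < spinLimit; i++) {
            if (locked.compareAndSet(false, true)) {
                spinLimit = Math.min(spinLimit * 2, 10_000); // spin paid off: allow more next time
                return true;
            }
        }
        spinLimit = Math.max(spinLimit / 2, 1);  // spin wasted CPU: shrink the budget
        return false;                            // caller would now block (heavyweight path)
    }

    void release() {
        locked.set(false);
    }
}
```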

2. Lock Coarsening: The concept is easy to grasp: merge a series of back-to-back lock/unlock operations on the same object into a single one, expanding several consecutive locks into one lock with a larger scope. For example:

package com.paddx.test.string;

public class StringBufferTest {
    StringBuffer stringBuffer = new StringBuffer();

    public void append(){
        stringBuffer.append("a");
        stringBuffer.append("b");
        stringBuffer.append("c");
    }
}

  Here, each call to stringBuffer.append requires locking and unlocking. If the virtual machine detects a series of consecutive lock and unlock operations on the same object, it merges them into one lock/unlock pair with a larger scope: the lock is taken at the first append call and released after the last append call finishes.
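Conceptually, the coarsened form is equivalent to writing the following by hand (this is an illustration of what the JIT achieves, not something you normally need to write; it works because StringBuffer's synchronized methods lock the StringBuffer itself and Java monitors are reentrant):

```java
public class StringBufferCoarsened {
    StringBuffer stringBuffer = new StringBuffer();

    // Hand-written equivalent of lock coarsening: one lock/unlock pair
    // around all three appends instead of three separate pairs.
    public void append() {
        synchronized (stringBuffer) {
            stringBuffer.append("a");   // reentrant: append's own lock is already held
            stringBuffer.append("b");
            stringBuffer.append("c");
        }
    }
}
```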

3. Lock Elimination: Lock elimination means removing unnecessary locking operations. Based on escape analysis, if the virtual machine determines that within a section of code the data on the heap cannot escape the current thread, that code can be treated as thread-safe and does not need locking. Consider the following program:

package com.paddx.test.concurrent;

public class SynchronizedTest02 {

    public static void main(String[] args) {
        SynchronizedTest02 test02 = new SynchronizedTest02();
        // warm-up
        for (int i = 0; i < 10000; i++) {
            i++;
        }
        long start = System.currentTimeMillis();
        for (int i = 0; i < 100000000; i++) {
            test02.append("abc", "def");
        }
        System.out.println("Time=" + (System.currentTimeMillis() - start));
    }

    public void append(String str1, String str2) {
        StringBuffer sb = new StringBuffer();
        sb.append(str1).append(str2);
    }
}

Although StringBuffer's append is a synchronized method, the StringBuffer in this program is a local variable and does not escape the method, so the whole process is actually thread-safe and the lock can be eliminated. Below are the results of my local run:

  To minimize interference from other factors, biased locking was disabled here (-XX:-UseBiasedLocking). The program above shows that performance improves considerably once the lock is eliminated.

  Note: The execution results may be different between JDK versions. The JDK version I use here is 1.6.

5. Summary

  This article focused on the JDK's optimization of Synchronized through lightweight locks and biased locks. These two kinds of locks are not entirely free of drawbacks: for example, under heavy contention they add an extra lock-upgrade step instead of improving performance, and in that case you need to disable biased locking with -XX:-UseBiasedLocking. Here is a comparison of the three locks:

| Lock             | Advantages | Disadvantages | Applicable scenarios |
|------------------|------------|---------------|----------------------|
| Biased lock      | Locking and unlocking require no extra cost; the gap versus executing a non-synchronized method is only nanoseconds. | If threads contend for the lock, it brings the extra cost of bias revocation. | Only one thread ever accesses the synchronized block. |
| Lightweight lock | Competing threads do not block, improving the program's response speed. | A thread that never wins the lock contention wastes CPU by spinning. | Response time matters; synchronized blocks execute very quickly. |
| Heavyweight lock | Thread contention does not use spinning and does not consume CPU. | Threads block, so response time is slow. | Throughput matters; synchronized blocks take longer to execute. |

 


 Author: liuxiaopeng

 Blog address: http://www.cnblogs.com/paddix/

 Disclaimer: For reprinting, please provide the original link in an obvious position on the article page. 
