An in-depth understanding of AQS and CAS principles in Java

Introduction to AQS

AQS stands for AbstractQueuedSynchronizer, usually referred to simply as the synchronizer. It is a framework for building multi-thread synchronization facilities, designed and developed by Doug Lea. AQS is used throughout the JDK source code, especially in JUC (java.util.concurrent), in classes such as ReentrantLock, Semaphore, CountDownLatch, and ThreadPoolExecutor. Understanding AQS is key to understanding the other components in JUC, and in day-to-day development a custom synchronizer built on AQS can cover a variety of requirements.

Note: Understanding AQS requires some background in data structures, especially doubly linked queues, and some familiarity with Unsafe.

The relationship between ReentrantLock and AQS

We will mainly use ReentrantLock to understand the internal working mechanism of AQS, starting with the lock() method of ReentrantLock.

The method is very simple: it just calls the lock() method of Sync. What is this Sync?

Sync is an inner class of ReentrantLock. ReentrantLock does not extend AQS directly. Instead, the inner class Sync extends AQS, and ReentrantLock keeps a reference to a Sync instance in a member field.

Sync has two implementations in ReentrantLock, NonfairSync and FairSync, which correspond to the unfair lock and the fair lock respectively. Take the unfair lock as an example.

The lock() method of the unfair lock mainly does the following:

If the variable state (the synchronization state) is successfully set through CAS, the current thread has acquired the lock, and it is recorded as the exclusive owner thread.
If setting state through CAS fails, the lock is currently held by another thread, and the acquire() method is entered for further processing.
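
The two branches above can be sketched roughly as follows. This is a minimal illustration that follows the JDK 8 layout of the unfair lock, not the JDK source itself: the class name NonfairLockSketch and the helper methods isLocked() and heldByCurrentThread() are hypothetical, and the tryAcquire here is deliberately non-reentrant to keep the example short.

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

// Hypothetical class; models the two branches of the unfair lock() described above.
class NonfairLockSketch extends AbstractQueuedSynchronizer {

    public void lock() {
        if (compareAndSetState(0, 1)) {
            // CAS succeeded: the calling thread now owns the lock.
            setExclusiveOwnerThread(Thread.currentThread());
        } else {
            // CAS failed: fall back to the acquire() template method of AQS.
            acquire(1);
        }
    }

    public void unlock() {
        release(1);
    }

    // Non-reentrant for brevity: succeeds only when the lock is free.
    @Override
    protected boolean tryAcquire(int arg) {
        if (compareAndSetState(0, 1)) {
            setExclusiveOwnerThread(Thread.currentThread());
            return true;
        }
        return false;
    }

    @Override
    protected boolean tryRelease(int arg) {
        if (Thread.currentThread() != getExclusiveOwnerThread()) {
            throw new IllegalMonitorStateException();
        }
        setExclusiveOwnerThread(null);
        setState(0);
        return true;
    }

    public boolean isLocked() {
        return getState() != 0;
    }

    public boolean heldByCurrentThread() {
        return getExclusiveOwnerThread() == Thread.currentThread();
    }

    public static void main(String[] args) {
        NonfairLockSketch lock = new NonfairLockSketch();
        lock.lock();
        System.out.println("locked: " + lock.isLocked());
        lock.unlock();
        System.out.println("locked after unlock: " + lock.isLocked());
    }
}
```

Later JDK versions organize this fast path differently, so treat the sketch as a reading aid for the description rather than a copy of any particular release.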

The acquire() method is defined in AQS. It is a relatively important method and can be broken down into 3 main steps:

tryAcquire(): tries to acquire the lock;
addWaiter(): if tryAcquire() fails to acquire the lock, adds the current thread to the wait queue;
acquireQueued(): processes the node that was added to the queue, spins to try to acquire the lock, and suspends or cancels the thread depending on the situation.

All three methods are defined in AQS, but tryAcquire() is a bit special.

By default it simply throws an exception (UnsupportedOperationException), so it must be overridden in a subclass. In other words, "the real logic of acquiring the lock is implemented by the subclass synchronizer itself."
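
A small experiment makes this division of labor visible. The sketch below uses only the public AbstractQueuedSynchronizer API; the class names TryAcquireDefaultDemo, BareSync, and CountingSync are hypothetical. BareSync inherits the default tryAcquire(), while CountingSync supplies its own and counts how often the acquire() template method calls it.

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

class TryAcquireDefaultDemo {

    // Overrides nothing, so it inherits the default tryAcquire() from AQS.
    static class BareSync extends AbstractQueuedSynchronizer { }

    // Supplies its own acquisition logic and counts the calls.
    static class CountingSync extends AbstractQueuedSynchronizer {
        int calls = 0;

        @Override
        protected boolean tryAcquire(int arg) {
            calls++;
            return compareAndSetState(0, arg);
        }
    }

    // Returns true if acquire() on the bare synchronizer throws.
    static boolean defaultThrows() {
        try {
            new BareSync().acquire(1);
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    // Uncontended acquire: the template method delegates to our tryAcquire().
    static int callsForUncontendedAcquire() {
        CountingSync sync = new CountingSync();
        sync.acquire(1);
        return sync.calls;
    }

    public static void main(String[] args) {
        System.out.println("default tryAcquire throws: " + defaultThrows());
        System.out.println("tryAcquire calls: " + callsForUncontendedAcquire());
    }
}
```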

The implementation of tryAcquire in ReentrantLock (unfair lock) works as follows:

Get the current thread and read the current lock state;
If state = 0, the lock is currently free: update the value of state through CAS and, if that succeeds, return true;
If the current thread already holds the lock (a reentry), increase the reentry count and return true;
If neither condition is met, the lock acquisition fails and false is returned.
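
The steps above might be written like this. It is a sketch that follows the structure of the JDK 8 unfair tryAcquire, with a matching tryRelease added so the example can be exercised; the class name ReentrantSyncSketch and the holdCount() helper are hypothetical.

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

// Hypothetical reentrant synchronizer modeled on the steps listed above.
class ReentrantSyncSketch extends AbstractQueuedSynchronizer {

    @Override
    protected boolean tryAcquire(int acquires) {
        final Thread current = Thread.currentThread();
        int c = getState();
        if (c == 0) {
            // Lock is free: try to take it with CAS.
            if (compareAndSetState(0, acquires)) {
                setExclusiveOwnerThread(current);
                return true;
            }
        } else if (current == getExclusiveOwnerThread()) {
            // Reentry by the owner: only the owner writes state here, so no CAS is needed.
            int next = c + acquires;
            if (next < 0) {
                throw new Error("Maximum lock count exceeded");
            }
            setState(next);
            return true;
        }
        return false;
    }

    @Override
    protected boolean tryRelease(int releases) {
        if (Thread.currentThread() != getExclusiveOwnerThread()) {
            throw new IllegalMonitorStateException();
        }
        int c = getState() - releases;
        boolean free = (c == 0);
        if (free) {
            setExclusiveOwnerThread(null);
        }
        setState(c);
        return free; // true only when the last hold is released
    }

    int holdCount() {
        return getState();
    }

    public static void main(String[] args) {
        ReentrantSyncSketch sync = new ReentrantSyncSketch();
        sync.acquire(1);
        sync.acquire(1);
        System.out.println("holds after two acquires: " + sync.holdCount());
        sync.release(1);
        sync.release(1);
        System.out.println("holds after two releases: " + sync.holdCount());
    }
}
```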

Finally, a picture is used to represent the ReentrantLock.lock() process:

From the figure, we can see that while ReentrantLock executes lock(), most of the core logic of the synchronization mechanism is already implemented in AQS. ReentrantLock itself only needs to implement the methods for a few specific steps. This design pattern is called the template method pattern. If you have done Android development, you should be very familiar with it: the execution flow of the Activity life cycle is defined in the framework, and an Activity subclass only needs to provide implementations of onCreate, onPause, and the other life cycle methods.

Note: This is not unique to ReentrantLock. Other components in the JUC package, such as CountDownLatch and Semaphore, also extend AQS through an inner class Sync and implement synchronization internally by operating on Sync. The advantage of this approach is that the thread control logic stays inside Sync, while the interface exposed to users is the custom lock itself. This aggregation relationship cleanly decouples the two concerns.

Analysis of AQS core functional principles

First, let's take a look at the key attributes in AQS.

The two important ones are Node and state.

state: the lock state

state represents the current lock status. When state = 0, the lock is free. When state > 0, a thread has obtained the lock; after the first acquisition, state = 1. If the same thread acquires the lock multiple times, state increases accordingly: after 5 acquisitions, state = 5. The lock must then also be released 5 times, until state = 0, before other threads are eligible to obtain it.
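
The counting behavior can be observed from the outside with the public ReentrantLock API, since getHoldCount() reports how many times the current thread holds the lock. The snippet below is a small illustration; the class name HoldCountDemo and its helper method are hypothetical.

```java
import java.util.concurrent.locks.ReentrantLock;

class HoldCountDemo {

    // Returns { holds after 5 lock() calls, holds after 5 unlock() calls, 1 if still locked else 0 }.
    static int[] lockFiveTimes() {
        ReentrantLock lock = new ReentrantLock();
        for (int i = 0; i < 5; i++) {
            lock.lock();            // each reentry adds 1 to the internal state
        }
        int held = lock.getHoldCount();
        for (int i = 0; i < 5; i++) {
            lock.unlock();          // each unlock subtracts 1
        }
        int afterRelease = lock.getHoldCount();
        return new int[] { held, afterRelease, lock.isLocked() ? 1 : 0 };
    }

    public static void main(String[] args) {
        int[] r = lockFiveTimes();
        System.out.println("held=" + r[0] + ", afterRelease=" + r[1] + ", locked=" + r[2]);
    }
}
```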

Another function of state is to implement the exclusive mode or the shared mode of a lock.

"Exclusive mode: Only one thread can hold the synchronization lock."

For example, in exclusive mode we can set the initial value of state to 0. When a thread requests the lock, it checks whether state is 0. If it is not 0, another thread already holds the lock, and this thread must block and wait.

"Shared mode: Multiple threads can hold synchronization locks."

The principle is similar in shared mode. Suppose we allow at most 10 threads to perform a certain operation at the same time, and any thread beyond that number must block and wait. Then, when a thread requests the lock, it only needs to check whether state is less than 10. If so, it adds 1 to state and goes on to execute the synchronized code; if state equals 10, 10 threads are already performing the operation, and this thread must block and wait.
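
As a rough sketch of this counting scheme, the synchronizer below lets state count the threads currently inside, up to a limit. It is an assumption-laden illustration built on the shared-mode hooks of AQS: the class name BoundedSharedSync and its helper methods are hypothetical, and the JDK's own Semaphore counts remaining permits downward rather than counting up as done here.

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

// Hypothetical shared-mode synchronizer: at most `limit` threads may hold it at once.
class BoundedSharedSync extends AbstractQueuedSynchronizer {
    private final int limit;

    BoundedSharedSync(int limit) {
        this.limit = limit;
    }

    @Override
    protected int tryAcquireShared(int unused) {
        for (;;) {
            int current = getState();
            if (current >= limit) {
                return -1;                       // negative: acquisition failed
            }
            if (compareAndSetState(current, current + 1)) {
                return limit - (current + 1);    // remaining capacity, 0 or positive
            }
        }
    }

    @Override
    protected boolean tryReleaseShared(int unused) {
        for (;;) {
            int current = getState();
            if (current == 0) {
                throw new IllegalStateException("nothing to release");
            }
            if (compareAndSetState(current, current - 1)) {
                return true;                     // waiting threads may now be woken
            }
        }
    }

    boolean tryEnter() { return tryAcquireShared(1) >= 0; } // non-blocking attempt
    void enter()       { acquireShared(1); }                // blocks when the limit is reached
    void leave()       { releaseShared(1); }
    int active()       { return getState(); }

    public static void main(String[] args) {
        BoundedSharedSync sync = new BoundedSharedSync(10);
        for (int i = 0; i < 10; i++) {
            sync.enter();
        }
        System.out.println("active=" + sync.active() + ", eleventh admitted=" + sync.tryEnter());
    }
}
```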

Node: node of the doubly linked queue

Node is the element of a first-in-first-out doubly linked queue, the wait queue that threads enter when they block while competing for a resource. This queue is the core of multi-thread synchronization in AQS.

As you can see from the previous ReentrantLock diagram, there are two Node pointers in AQS, pointing to the head and tail of the queue respectively.

The main structure of Node is as follows:
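
For orientation, a simplified model of such a node might look like the following. The field names and the waitStatus constants match those of AbstractQueuedSynchronizer.Node in JDK 8, but the class SimpleNode and its link() helper are hypothetical, and details such as the nextWaiter field used for condition queues and shared mode are omitted.

```java
// Simplified, hypothetical model of a wait-queue node; not the JDK class itself.
class SimpleNode {
    // waitStatus values, matching the constants in AbstractQueuedSynchronizer.Node.
    static final int CANCELLED = 1;
    static final int SIGNAL = -1;
    static final int CONDITION = -2;
    static final int PROPAGATE = -3;

    volatile int waitStatus;     // 0 until changed
    volatile SimpleNode prev;    // predecessor in the queue
    volatile SimpleNode next;    // successor in the queue
    volatile Thread thread;      // the waiting thread; null for the dummy head

    SimpleNode(Thread thread) {
        this.thread = thread;
    }

    // Links b directly behind a.
    static void link(SimpleNode a, SimpleNode b) {
        a.next = b;
        b.prev = a;
    }

    public static void main(String[] args) {
        SimpleNode head = new SimpleNode(null);
        SimpleNode waiter = new SimpleNode(Thread.currentThread());
        link(head, waiter);
        System.out.println("head.next is waiter: " + (head.next == waiter));
    }
}
```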

By default, the linked list structure in AQS is as shown below:

Analysis of the follow-up process after failure to obtain the lock

The purpose of a lock is to let only the thread that wins the competition for the lock object execute the synchronized code. When multiple threads compete for the lock, the threads that lose need to be blocked and wait to be awakened later. So how does ReentrantLock make threads wait and wake up?

We mentioned earlier that during ReentrantLock.lock(), the acquire() method calls tryAcquire, addWaiter, and acquireQueued in turn. tryAcquire is overridden in ReentrantLock. If it returns true, the lock has been acquired and the synchronized code continues to run. But if tryAcquire returns false, that is, the lock is held by another thread, how does AQS handle the current thread?

「addWaiter」

First, the thread that failed to acquire the lock is added to the end of the wait queue by addWaiter, which first attempts a quick insertion at the tail.

There are two situations in which this quick insertion does not succeed:

tail is null: the queue has never been initialized, so the enq method is called to initialize the queue with an empty head Node and then append the current node;
compareAndSetTail fails: another thread modified the queue during the insertion, so enq is called to retry appending the current node to the end of the queue.
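
The fast path and the enq fallback can be modeled outside the JDK as follows. This is a simplified sketch that uses AtomicReference in place of the Unsafe-based CAS inside AQS; the class WaitQueueSketch and its QNode type are hypothetical, and only the method names addWaiter and enq are taken from the text.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of the AQS wait queue insertion logic.
class WaitQueueSketch {

    static final class QNode {
        final Thread thread;
        volatile QNode prev;
        volatile QNode next;

        QNode(Thread thread) {
            this.thread = thread;
        }
    }

    final AtomicReference<QNode> head = new AtomicReference<>();
    final AtomicReference<QNode> tail = new AtomicReference<>();

    QNode addWaiter(Thread thread) {
        QNode node = new QNode(thread);
        QNode pred = tail.get();
        if (pred != null) {
            // Fast path: the queue exists, try one CAS on the tail.
            node.prev = pred;
            if (tail.compareAndSet(pred, node)) {
                pred.next = node;
                return node;
            }
        }
        // tail was null or the CAS lost a race: fall back to the retry loop.
        enq(node);
        return node;
    }

    private void enq(QNode node) {
        for (;;) {
            QNode t = tail.get();
            if (t == null) {
                // Lazily initialize the queue with a thread-less dummy head.
                if (head.compareAndSet(null, new QNode(null))) {
                    tail.set(head.get());
                }
            } else {
                node.prev = t;
                if (tail.compareAndSet(t, node)) {
                    t.next = node;
                    return;
                }
            }
        }
    }

    public static void main(String[] args) {
        WaitQueueSketch queue = new WaitQueueSketch();
        QNode first = queue.addWaiter(Thread.currentThread());
        System.out.println("dummy head created: " + (queue.head.get().thread == null));
        System.out.println("first node is tail: " + (queue.tail.get() == first));
    }
}
```

Note that prev is set before the CAS on the tail and next only afterwards, which is why the next links can briefly lag behind while the prev links are always complete.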

After addWaiter, the thread has been added to the end of the queue in the form of a Node, but it has not yet been blocked. The actual blocking is decided and performed in the acquireQueued method below.

「acquireQueued」

In the acquireQueued method, the thread in the node is not suspended immediately. While the node was being inserted, the thread that held the lock may already have finished and released it, so a node whose predecessor is the head spins here and tries to acquire the lock again (no optimization detail is overlooked). If the lock still cannot be acquired after spinning, the thread is suspended (blocked).

Inside acquireQueued, the shouldParkAfterFailedAcquire method decides whether the current thread should be suspended.

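First, get the waitStatus value of the predecessor node. waitStatus in Node can take 5 values, with the following meanings:

CANCELLED (1): the thread in this node has given up waiting because of a timeout or an interrupt;
SIGNAL (-1): the successor of this node is (or will soon be) blocked, so this node must wake it up when it releases the lock or is cancelled;
CONDITION (-2): the node is waiting in a condition queue;
PROPAGATE (-3): used in shared mode to indicate that a release should be propagated to other nodes;
0: the initial state, none of the above.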

Next, different operations are performed based on different values of waitStatus. The main situations are as follows:

If waitStatus is equal to SIGNAL, return true to suspend the current thread and wait for a later wake-up.
If waitStatus is greater than 0, that is, the CANCELLED state, the predecessor is removed from the queue, and the loop walks backward until it finds a node that is not CANCELLED, which becomes the new predecessor of the current node.
If waitStatus is neither SIGNAL nor CANCELLED, the waitStatus of the predecessor is set to SIGNAL. The advantage is that the next time shouldParkAfterFailedAcquire is executed, it can return true directly and the thread is suspended.
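
The three cases can be sketched as follows. This is a single-threaded model of the decision logic only: the class ParkDecisionSketch, its QNode type, and the method name shouldPark are hypothetical, and where the JDK updates waitStatus with a CAS, the sketch uses a plain write for brevity.

```java
// Hypothetical model of the shouldParkAfterFailedAcquire decision.
class ParkDecisionSketch {
    static final int CANCELLED = 1;
    static final int SIGNAL = -1;

    static final class QNode {
        volatile int waitStatus;
        volatile QNode prev;
        volatile QNode next;

        QNode(int waitStatus) {
            this.waitStatus = waitStatus;
        }
    }

    static boolean shouldPark(QNode pred, QNode node) {
        int ws = pred.waitStatus;
        if (ws == SIGNAL) {
            // The predecessor has already promised to wake us: safe to park.
            return true;
        }
        if (ws > 0) {
            // Predecessor is cancelled: walk backward past all cancelled nodes.
            do {
                node.prev = pred = pred.prev;
            } while (pred.waitStatus > 0);
            pred.next = node;
        } else {
            // Ask the predecessor to signal us; the caller retries before parking.
            pred.waitStatus = SIGNAL; // the JDK performs this update with CAS
        }
        return false;
    }

    public static void main(String[] args) {
        QNode head = new QNode(0);
        QNode node = new QNode(0);
        head.next = node;
        node.prev = head;
        System.out.println("first call: " + shouldPark(head, node));
        System.out.println("second call: " + shouldPark(head, node));
    }
}
```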

Back in acquireQueued: if shouldParkAfterFailedAcquire returns true, meaning the thread needs to be suspended, the parkAndCheckInterrupt method is called to actually block the thread.

This method is relatively simple: it just calls the park method of LockSupport. LockSupport.park() in turn calls the Unsafe API, which executes the underlying native method that suspends the thread. At this point the code has reached the operating system level, so there is no need to analyze it further.
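
LockSupport can also be used directly, which makes the suspend and wake-up pair easy to try out. In the sketch below (the class name ParkUnparkDemo and its helper are hypothetical), a thread parks until a flag is set, and the main thread sets the flag and calls unpark. The loop around park() is needed because park() is allowed to return without a matching unpark.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

class ParkUnparkDemo {

    // Returns true if the parked thread was woken up and finished.
    static boolean parkThenWake() {
        AtomicBoolean permitted = new AtomicBoolean(false);
        AtomicBoolean resumed = new AtomicBoolean(false);

        Thread waiter = new Thread(() -> {
            while (!permitted.get()) {
                LockSupport.park();      // suspends the thread; may return spuriously
            }
            resumed.set(true);
        });
        waiter.start();

        permitted.set(true);
        LockSupport.unpark(waiter);      // if the waiter has not parked yet, its next park() returns at once

        try {
            waiter.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return resumed.get() && !waiter.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("waiter resumed: " + parkThenWake());
    }
}
```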

At this point, the general process of acquiring the lock has been analyzed. The entire process is summarized as follows:

The template method acquire of AQS acquires the lock by calling tryAcquire of the subclass's custom implementation;
If acquisition of the lock fails, the thread is constructed into a Node node and inserted into the end of the synchronization queue through the addWaiter method;
In the acquireQueued method, the lock is retried by spinning. If that fails, AQS determines whether the current thread needs to be blocked and, if so, finally calls the native API behind LockSupport (Unsafe) to block the thread.
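
The whole sequence can be observed with the public ReentrantLock API. In this sketch (the class name QueuedWaiterDemo and its helper are hypothetical), the main thread holds the lock while a second thread calls lock(); hasQueuedThreads() then shows that the second thread has been placed in the wait queue, and it only acquires the lock after the main thread unlocks.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

class QueuedWaiterDemo {

    // Returns { waiter was queued, waiter acquired while lock was held, waiter acquired in the end }.
    static boolean[] run() {
        ReentrantLock lock = new ReentrantLock();
        AtomicBoolean acquired = new AtomicBoolean(false);

        lock.lock();                                 // main thread holds the lock
        Thread waiter = new Thread(() -> {
            lock.lock();                             // tryAcquire fails, so the thread is queued and parked
            try {
                acquired.set(true);
            } finally {
                lock.unlock();
            }
        });
        waiter.start();

        // Wait (up to 5 seconds) until the waiter shows up in the queue.
        long deadline = System.nanoTime() + 5_000_000_000L;
        while (!lock.hasQueuedThreads() && System.nanoTime() < deadline) {
            Thread.yield();
        }
        boolean wasQueued = lock.hasQueuedThreads();
        boolean acquiredWhileHeld = acquired.get();

        lock.unlock();                               // release wakes up the queued thread
        try {
            waiter.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new boolean[] { wasQueued, acquiredWhileHeld, acquired.get() };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("queued=" + r[0] + ", acquiredWhileHeld=" + r[1] + ", acquiredAfterUnlock=" + r[2]);
    }
}
```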

Analysis of lock release process

The thread blocked in the above locking phase needs to be awakened before it can be executed again. So when does AQS try to wake up the blocked thread in the waiting queue?

Like the locking process, releasing the lock starts with the ReentrantLock.unlock() method, which calls release() in AQS.

release() first calls the tryRelease method to try to release the lock. If that succeeds, the unparkSuccessor method in AQS is eventually called to wake up the successor thread. unparkSuccessor works as follows:

First get the status of the node passed in (in practice the head node). If the next node of the head node is null, or the status of the next node is CANCELLED, traverse backward from the tail of the wait queue to find the node closest to the head whose waitStatus is less than or equal to 0.

If the node found is not null, the LockSupport.unpark method is called, which in turn calls the underlying method to wake up the thread. At this point, we have also covered when a blocked thread is awakened.
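
The search for the node to wake up can be sketched as follows. It is a simplified, single-threaded model: the class SuccessorSearchSketch, its QNode type, and the findSuccessor and link helpers are hypothetical. In AQS the node that is found is then passed to LockSupport.unpark via its thread field, which the sketch leaves out.

```java
// Hypothetical model of how unparkSuccessor picks the node to wake up.
class SuccessorSearchSketch {
    static final int CANCELLED = 1;

    static final class QNode {
        volatile int waitStatus;
        volatile QNode prev;
        volatile QNode next;

        QNode(int waitStatus) {
            this.waitStatus = waitStatus;
        }
    }

    // Links b directly behind a.
    static void link(QNode a, QNode b) {
        a.next = b;
        b.prev = a;
    }

    static QNode findSuccessor(QNode head, QNode tail) {
        QNode s = head.next;
        if (s == null || s.waitStatus > 0) {
            // next is missing or cancelled: scan backward from the tail and keep
            // the last non-cancelled node seen, which is the one closest to the head.
            s = null;
            for (QNode t = tail; t != null && t != head; t = t.prev) {
                if (t.waitStatus <= 0) {
                    s = t;
                }
            }
        }
        return s;
    }

    public static void main(String[] args) {
        QNode head = new QNode(0);
        QNode a = new QNode(CANCELLED);
        QNode b = new QNode(0);
        link(head, a);
        link(a, b);
        System.out.println("successor is b: " + (findSuccessor(head, b) == b));
    }
}
```

The backward scan relies on the prev links, which are always set before a node becomes reachable from the tail, whereas a next link may not have been written yet.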

A few words about CAS

Whether in the locking or lock-releasing phase, a common operation is mentioned many times: compareAndSetXXX. This operation will eventually call the API in Unsafe to perform CAS operations.

CAS stands for Compare And Swap. It is a common technique that relies on hardware to achieve concurrency safety. At the lowest level it uses the CPU's CAS instruction, which locks the cache or the bus, to perform atomic operations across multiple processors.

A CAS operation has three operands: the memory value V, the expected old value E, and the new value U. If and only if the expected value E equals the memory value V is V updated to U; otherwise nothing is done.
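
The same semantics are exposed by the atomic classes in java.util.concurrent.atomic, which makes them easy to try out. In the sketch below (the class name CasDemo and its helper are hypothetical), compareAndSet(E, U) plays the role of the CAS operation, and the helper shows the typical retry loop built on top of it.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasDemo {

    // Retry loop: read V, compute U, and publish U only if V is still the expected value.
    static int incrementWithCas(AtomicInteger value) {
        for (;;) {
            int expected = value.get();
            int updated = expected + 1;
            if (value.compareAndSet(expected, updated)) {
                return updated;
            }
            // Another thread changed the value in between: read again and retry.
        }
    }

    public static void main(String[] args) {
        AtomicInteger v = new AtomicInteger(5);
        boolean first = v.compareAndSet(5, 6);   // E matches V, so V becomes 6
        boolean second = v.compareAndSet(5, 7);  // E no longer matches V, nothing happens
        System.out.println(first + " " + second + " " + v.get());
        System.out.println(incrementWithCas(v));
    }
}
```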

The underlying implementation of CAS selects the code to call according to the operating system and processor. Taking Windows on an x86 processor as an example: on a multi-processor system, the cmpxchg instruction is issued with the lock prefix, which locks the cache or the bus to make the operation atomic across processors; on a single processor, the plain cmpxchg instruction completes the atomic operation.

Custom AQS

After understanding the design idea of AQS, we can implement our own synchronization mechanism by building a custom synchronizer on AQS.

MyLock is the simplest kind of exclusive lock. With MyLock, the same mutual exclusion as synchronized and ReentrantLock can be achieved, for example when two threads update a shared count.

The final printed count value is 20000, which shows that the updates made by the two threads are properly synchronized and thread-safe.
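
A lock of this kind and the matching experiment might look like the following. This is a sketch under assumptions rather than the author's listing: only the name MyLock and the final count of 20000 come from the text, while the inner Sync class, the MyLockDemo class, and the choice of 10,000 increments per thread are assumptions made for illustration.

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

// Minimal non-reentrant exclusive lock built on AQS.
class MyLock {

    private static final class Sync extends AbstractQueuedSynchronizer {
        @Override
        protected boolean tryAcquire(int arg) {
            if (compareAndSetState(0, 1)) {
                setExclusiveOwnerThread(Thread.currentThread());
                return true;
            }
            return false;
        }

        @Override
        protected boolean tryRelease(int arg) {
            if (getState() == 0) {
                throw new IllegalMonitorStateException();
            }
            setExclusiveOwnerThread(null);
            setState(0);
            return true;
        }
    }

    private final Sync sync = new Sync();

    public void lock() {
        sync.acquire(1);
    }

    public void unlock() {
        sync.release(1);
    }
}

class MyLockDemo {

    // Two threads each add 1 to the shared count 10,000 times (an assumed workload).
    static int runTwoThreads() {
        MyLock lock = new MyLock();
        int[] count = { 0 };
        Runnable task = () -> {
            for (int i = 0; i < 10000; i++) {
                lock.lock();
                try {
                    count[0]++;          // protected by MyLock
                } finally {
                    lock.unlock();
                }
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(runTwoThreads());
    }
}
```

Because tryAcquire does not check the owner thread, this lock is not reentrant: a thread that calls lock() twice without unlocking would block itself.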

Summary

Generally speaking, AQS is a framework in which most of the logic required for synchronization is already encapsulated. AQS maintains a status indicator, state, and a wait queue of Node objects. Operations on state fall into two types, exclusive and shared, which leads to two kinds of AQS implementations: exclusive locks (ReentrantLock, etc.) and shared locks (CountDownLatch, the read lock of a read-write lock, etc.). This lesson mainly analyzes the process of locking and releasing locks in AQS from the perspective of exclusive locks.

Understanding the principles of AQS helps in understanding how the other components in the JUC package are implemented, and makes it easier to extend them. Developers can extend this framework to implement locks suited to different scenarios and functions. The methods that a subclass synchronizer may need to implement are as follows.

isHeldExclusively(): whether the current thread holds the resource exclusively. It only needs to be implemented when Condition is used.
tryAcquire(int): exclusive mode. Tries to acquire the resource, returning true on success and false on failure.
tryRelease(int): exclusive mode. Tries to release the resource, returning true on success and false on failure.
tryAcquireShared(int): shared mode. Tries to acquire the resource. A negative number indicates failure; 0 indicates success with no resources remaining; a positive number indicates success with resources remaining.
tryReleaseShared(int): shared mode. Tries to release the resource. Returns true if waiting nodes may be awakened after the release, otherwise false.