Understand AQS (AbstractQueuedSynchronizer) from the source code level in 10 minutes

Foreword

The previous article, "15,000 words, 6 code cases, and 5 schematic diagrams to fully understand synchronized", explained that synchronized is implemented by the object monitor

In the object monitor, the blocking queue is implemented by the cxq stack and the entry list, and the wait set serves as the waiting queue; together they implement the wait/notify mode of synchronized

The JUC concurrency package in the JDK implements the wait/notify mode with a similar pair of blocking queue and waiting queue

This article covers AQS (AbstractQueuedSynchronizer), the cornerstone of JUC

Prerequisites: CAS, volatile

If you don't know CAS, you can read the previous article about synchronized (link above)

If you don't understand volatile, you can read the article "5 cases and flowcharts to understand the volatile keyword from 0 to 1"

This article focuses on AQS and describes its data structure, its design, the source-level process of acquiring and releasing the synchronization state, Condition, and more

It takes about 10 minutes to read, and you can read it with a few questions in mind

What is AQS and what is it used for?
What data structure is AQS implemented using?
How does AQS acquire/release the synchronization state?
What features does AQS offer beyond basic synchronization?
How does AQS implement unfair locks and fair locks?
What is a Condition? What is its relationship with AQS?

AQS data structure

What is AQS?

AQS is a framework built around a synchronization queue (a blocking queue) and is the foundation of the concurrency package. Many synchronization components in the package are implemented on top of AQS, such as ReentrantLock, read-write locks, and semaphores

AQS has three important fields: head (the head node), tail (the tail node), and state (the synchronization state)

public abstract class AbstractQueuedSynchronizer
    extends AbstractOwnableSynchronizer
    implements java.io.Serializable {
    //Head node
    private transient volatile Node head;
    //Tail node
    private transient volatile Node tail;
    //Synchronization state
    private volatile int state;
}

The head and tail nodes are easy to understand, because the AQS queue is a doubly linked list. So what is the synchronization state?

AQS uses the synchronization state to represent a resource and uses CAS to acquire and release it. For example, suppose the resource is initialized to 1. A thread that wants the resource sees that the state is 1 and uses CAS to change it to 0. If the CAS succeeds, the thread has acquired the resource, and no other thread can acquire it (the state is 0) until the holder releases it. The meaning of the value is chosen by the implementation class; ReentrantLock, for instance, uses 0 for unlocked and counts holds upward

Acquiring and releasing a resource in this way can also be understood as acquiring and releasing a lock
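The acquire-by-CAS idea can be sketched outside of AQS with an AtomicInteger. This is only an illustration under the 1 = available, 0 = taken convention used above; the class StateCasSketch and its method names are made up for this example and are not part of the JDK.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration class, not part of the JDK or of AQS itself.
class StateCasSketch {
    // 1 = resource available, 0 = resource taken (the convention used in the text).
    private final AtomicInteger state = new AtomicInteger(1);

    // Succeeds only if the state is still 1 at the moment of the CAS.
    boolean tryTake() {
        return state.compareAndSet(1, 0);
    }

    // Succeeds only if the resource is currently taken.
    boolean giveBack() {
        return state.compareAndSet(0, 1);
    }

    public static void main(String[] args) {
        StateCasSketch s = new StateCasSketch();
        System.out.println(s.tryTake());   // true: the first caller wins
        System.out.println(s.tryTake());   // false: the resource is already taken
        System.out.println(s.giveBack());  // true: released
        System.out.println(s.tryTake());   // true: available again
    }
}
```

A failed tryTake here simply returns false; what AQS adds on top is the queue that lets the losing thread wait instead of giving up.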

All three fields are declared volatile, which guarantees memory visibility, so a thread always sees modifications made by other threads

From the description above, AQS can be pictured as a state field plus head and tail pointers into a doubly linked queue of nodes

When a thread fails to acquire the resource, it is wrapped in a node and appended to the AQS queue

Node is a nested class of AQS. Let's look at its important fields

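static final class Node {
        //Node status
        volatile int waitStatus;

        //Predecessor node
        volatile Node prev;

        //Successor node
        volatile Node next;

        //Thread represented by this node
        volatile Thread thread;

        //Pointer to the next node when the node is in a waiting (condition) queue
        Node nextWaiter;
}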

prev, next, and thread should be easy to understand

Both the AQS synchronization queue and the waiting queue use this node type, so a node that is woken up and leaves the waiting queue can be moved directly into the synchronization queue

nextWaiter points to the next node in the waiting queue

waitStatus indicates the status of the node

State	Value	Description
(initial)	0	Initial state of a synchronization-queue node; there is no named constant for it
CANCELLED	1	The thread of this node has cancelled its acquisition (timeout or interrupt)
SIGNAL	-1	The successor of this node is (or will soon be) parked, so this node must wake it up when it releases or is cancelled
CONDITION	-2	The node is in a waiting (condition) queue, waiting to be signalled and moved to the synchronization queue to compete
PROPAGATE	-3	Shared mode only: a shared release should be propagated to the following nodes

It doesn't matter if the statuses are unclear for now; they come up again later

Putting this together, a node carries a thread, a wait status, prev/next links for the synchronization queue, and a nextWaiter link for the waiting queue

AQS has another inner class, ConditionObject, which implements the waiting (condition) queue; it is covered later

AQS supports an exclusive mode and a shared mode, and both modes also come in variants that respond to interrupts and that time out with nanosecond precision

Exclusive mode means that only one thread can hold the synchronization state at a time

Shared mode means that multiple threads can hold the synchronization state at the same time; methods for this mode usually have Shared in their names

Method names use acquire for acquiring the synchronization state and release for releasing it

These methods are template methods: they fix the overall process and leave the specific steps to the implementation class (for example, how the synchronization state is actually obtained is up to the implementation class)
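To make the template-method idea concrete, here is a minimal sketch of a non-reentrant mutex in the spirit of the example in the AQS Javadoc. The class name SimpleMutex and the 0 = unlocked, 1 = locked convention are assumptions of this sketch; only tryAcquire and tryRelease are written by hand, while queuing and parking come from AQS's acquire and release.

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

// Hypothetical example class: state 0 = unlocked, 1 = locked, not reentrant.
class SimpleMutex {
    private static final class Sync extends AbstractQueuedSynchronizer {
        @Override
        protected boolean tryAcquire(int arg) {
            // CAS 0 -> 1; on success remember which thread owns the lock.
            if (compareAndSetState(0, 1)) {
                setExclusiveOwnerThread(Thread.currentThread());
                return true;
            }
            return false;
        }

        @Override
        protected boolean tryRelease(int arg) {
            if (getState() == 0)
                throw new IllegalMonitorStateException();
            setExclusiveOwnerThread(null);
            setState(0); // a plain volatile write is enough: only the owner releases
            return true;
        }

        boolean locked() {
            return getState() != 0;
        }
    }

    private final Sync sync = new Sync();

    void lock()        { sync.acquire(1); }          // template method: may queue and park
    boolean tryLock()  { return sync.tryAcquire(1); } // single attempt, never queues
    void unlock()      { sync.release(1); }          // template method: wakes a successor
    boolean isLocked() { return sync.locked(); }
}
```

Typical usage is lock() followed by a try/finally that calls unlock(), exactly as with ReentrantLock.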

Exclusive mode

Exclusive mode means that only one thread is allowed to hold the resource at any time. When multiple threads compete, only one of them acquires the synchronization state successfully

Acquiring the synchronization state

Exclusive acquisition that ignores interrupts is similar to the variants that respond to interrupts and timeouts. Let's take the source code of acquire as an example

    public final void acquire(int arg) {
        if (!tryAcquire(arg) &&
            acquireQueued(addWaiter(Node.EXCLUSIVE), arg))
            selfInterrupt();
    }

tryAcquire tries to obtain the synchronization state. The parameter arg indicates how much of the synchronization state to acquire. If it returns true, acquire returns immediately. tryAcquire itself is left to the implementation class

addWaiter

addWaiter(Node.EXCLUSIVE) builds an exclusive node and appends it to the tail of the AQS queue using CAS with retry on failure

    private Node addWaiter(Node mode) {
        //Build the node
        Node node = new Node(Thread.currentThread(), mode);
        //If the tail is not null, CAS the tail to the new node
        Node pred = tail;
        if (pred != null) {
            node.prev = pred;
            if (compareAndSetTail(pred, node)) {
                pred.next = node;
                return node;
            }
        }
        //If the tail is null or the CAS failed, fall back to enq
        enq(node);
        return node;
    }

    private Node enq(final Node node) {
        //Retry on failure
        for (;;) {
            Node t = tail;
            //No tail yet: CAS in an empty head node (head and tail are the same node); otherwise CAS the tail
            if (t == null) { // Must initialize
                if (compareAndSetHead(new Node()))
                    tail = head;
            } else {
                node.prev = t;
                if (compareAndSetTail(t, node)) {
                    t.next = node;
                    return t;
                }
            }
        }
    }

The enq method spins (it never parks along the way) and uses CAS to set the tail node. If the AQS queue has no nodes yet, it first installs an empty node that serves as both head and tail, then loops again to append the new node

Because threads compete to append to the tail, the tail must be replaced with CAS
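The append-with-CAS loop can be sketched in isolation. The following is a simplified stand-in, not the JDK code: it uses AtomicReference instead of AQS's internal CAS helpers, and the names CasTailQueue and QNode are invented for the example. Like enq, it returns the predecessor of the appended node.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical simplified queue; real AQS nodes also carry a thread and a waitStatus.
class CasTailQueue {
    static final class QNode {
        final String name;
        volatile QNode prev;
        volatile QNode next;
        QNode(String name) { this.name = name; }
    }

    private final AtomicReference<QNode> head = new AtomicReference<>();
    private final AtomicReference<QNode> tail = new AtomicReference<>();

    // Mirrors the shape of enq: loops until the node is appended,
    // then returns the node that was the tail before it.
    QNode enq(QNode node) {
        for (;;) {
            QNode t = tail.get();
            if (t == null) {
                // Queue not initialized: install a dummy head; only one thread wins.
                QNode dummy = new QNode("dummy");
                if (head.compareAndSet(null, dummy))
                    tail.set(dummy);
            } else {
                node.prev = t;                      // link backwards first
                if (tail.compareAndSet(t, node)) {  // publish as the new tail
                    t.next = node;                  // forward link is set last
                    return t;
                }
            }
        }
    }

    QNode head() { return head.get(); }
    QNode tail() { return tail.get(); }
}
```

Note the ordering inside the else branch: prev is set before the CAS and next after it, so a node reachable from tail always has a valid prev, while next may briefly lag behind.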

acquireQueued

The acquireQueued method lets a node in the AQS queue spin to acquire the synchronization state. This spin does not run continuously: the thread parks and waits between attempts

final boolean acquireQueued(final Node node, int arg) {
    //Tracks whether acquisition failed
    boolean failed = true;
    try {
        //Tracks whether the thread was interrupted while waiting
        boolean interrupted = false;
        //Retry on failure
        for (;;) {
            //p is the predecessor node
            final Node p = node.predecessor();
            //If the predecessor is the head and tryAcquire succeeds, return
            if (p == head && tryAcquire(arg)) {
                //Make this node the new head
                setHead(node);
                p.next = null; // help GC
                failed = false;
                return interrupted;
            }
            //On failure, mark the predecessor, then park and check for interrupts
            if (shouldParkAfterFailedAcquire(p, node) &&
                parkAndCheckInterrupt())
                interrupted = true;
        }
    } finally {
        //If acquisition failed, cancel the node
        if (failed)
            cancelAcquire(node);
    }
}

Before trying to acquire the synchronization state there is a precondition in p == head && tryAcquire(arg): the predecessor must be the head node

Therefore queued nodes acquire the synchronization state in FIFO order

But even if the predecessor is the head node, the acquisition may still fail, because threads that have not yet joined the AQS queue may also try to acquire the synchronization state. This barging is what makes a lock unfair

So how is a fair lock implemented?

A fair implementation adds one more check inside tryAcquire before it touches the state: if other threads are already queued ahead of the current thread, it gives up and queues as well. ReentrantLock's fair sync does this with hasQueuedPredecessors()
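The difference between the two policies can be sketched with a small synchronizer whose fairness is switchable. This is an illustrative sketch, not ReentrantLock's code: FairnessSketch and its canBarge helper are hypothetical names, the lock is not reentrant, and tryRelease skips ownership checks to keep the demo short. hasQueuedPredecessors() is the real AQS method.

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

// Hypothetical non-reentrant synchronizer with a switchable fairness policy.
class FairnessSketch extends AbstractQueuedSynchronizer {
    private final boolean fair;

    FairnessSketch(boolean fair) { this.fair = fair; }

    @Override
    protected boolean tryAcquire(int arg) {
        // Fair mode: refuse to barge if another thread is queued ahead of us.
        if (fair && hasQueuedPredecessors())
            return false;
        if (compareAndSetState(0, 1)) {
            setExclusiveOwnerThread(Thread.currentThread());
            return true;
        }
        return false;
    }

    @Override
    protected boolean tryRelease(int arg) {
        setExclusiveOwnerThread(null);
        setState(0);
        return true;
    }

    // Demo helper (hypothetical): can the caller grab a free lock while
    // another thread is already parked in the queue?
    static boolean canBarge(boolean fair) {
        FairnessSketch sync = new FairnessSketch(fair);
        sync.acquire(1);
        Thread waiter = new Thread(() -> { sync.acquire(1); sync.release(1); });
        waiter.start();
        // Wait until the waiter is parked inside acquire.
        while (waiter.getState() != Thread.State.WAITING)
            Thread.yield();
        sync.tryRelease(1);                  // free the state without waking the waiter
        boolean barged = sync.tryAcquire(1); // a fresh attempt from outside the queue
        sync.release(1);                     // free the state (if held) and wake the waiter
        try {
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return barged;
    }
}
```

In this setup canBarge(false) lets the outside thread take the lock ahead of the queued waiter, while canBarge(true) makes it step back because hasQueuedPredecessors() reports the waiter.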

Next, look at shouldParkAfterFailedAcquire, which decides whether the thread should park after failing to acquire the synchronization state

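private static boolean shouldParkAfterFailedAcquire(Node pred, Node node) {
    //Status of the predecessor
    int ws = pred.waitStatus;
    if (ws == Node.SIGNAL)
        //Predecessor is already SIGNAL: it will wake this node when it releases, so it is safe to park
        return true;
    if (ws > 0) {
        //Status > 0 means the predecessor was cancelled: walk backwards to a node that is not cancelled
        do {
            node.prev = pred = pred.prev;
        } while (pred.waitStatus > 0);
        //Link this node behind the node that is not cancelled
        pred.next = node;
    } else {
        //Predecessor is neither cancelled nor SIGNAL: CAS its status to SIGNAL so that its release wakes this node
        compareAndSetWaitStatus(pred, ws, Node.SIGNAL);
    }
    return false;
}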

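In other words, it does the preparation needed before parking. It returns true only when the predecessor is already SIGNAL; otherwise it returns false and the caller's loop tries again

Next, look at parkAndCheckInterrupt, which uses the LockSupport utility class to enter the waiting state and checks whether the thread was interrupted after it wakes up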

private final boolean parkAndCheckInterrupt() {
        //The thread enters the waiting state...
        LockSupport.park(this);
        //Check whether it was interrupted (this clears the interrupt flag)
        return Thread.interrupted();
}

In acquireQueued, the finally block runs cancelAcquire if the synchronization state was not acquired, which happens when an exception is thrown

When an interrupt was detected, acquireQueued returns true, and the outer acquire method then calls selfInterrupt so that the thread re-sets its own interrupt flag
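The flag handling described above can be observed without any lock. The sketch below relies on two behaviours of the JDK: LockSupport.park returns without blocking when the thread's interrupt flag is already set, and Thread.interrupted() reads and clears that flag. The class and method names are invented for the example.

```java
import java.util.concurrent.locks.LockSupport;

// Hypothetical demo class for the park / interrupted / selfInterrupt sequence.
class ParkInterruptSketch {
    // Returns { interrupt seen after park, flag still set after the check,
    //           flag set again after re-interrupting }.
    static boolean[] parkWithPendingInterrupt() {
        Thread.currentThread().interrupt();   // set the interrupt flag
        LockSupport.park();                   // does not block: the flag is already set
        boolean seen = Thread.interrupted();  // reads AND clears the flag
        boolean stillSet = Thread.currentThread().isInterrupted();
        Thread.currentThread().interrupt();   // what selfInterrupt does: restore the flag
        boolean restored = Thread.interrupted(); // read it back and leave the thread clean
        return new boolean[] { seen, stillSet, restored };
    }

    public static void main(String[] args) {
        boolean[] r = parkWithPendingInterrupt();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // true false true
    }
}
```

Because the check clears the flag, acquire has to call selfInterrupt afterwards; otherwise the caller would never learn that an interrupt arrived while it was waiting.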

acquire flow:

1. If the first attempt to acquire the synchronization state fails, the node is appended to the tail of the AQS queue using CAS with retry on failure

2. If the predecessor is the head node and the synchronization state is acquired successfully, the method returns; otherwise the thread parks, waits to be woken up, and tries again after waking

3. If an exception occurs during step 2, the current node is cancelled

Releasing the synchronization state

release first tries to release the synchronization state. On success, if the head node exists and its status is not 0, it wakes up the nearest successor that has not been cancelled

public final boolean release(int arg) {
    //Release the synchronization state
    if (tryRelease(arg)) {
        Node h = head;
        if (h != null && h.waitStatus != 0)
            //Wake the next node whose status is not greater than 0 (greater than 0 means cancelled)
            unparkSuccessor(h);
        return true;
    }
    return false;
}

Responding to interrupts

acquireInterruptibly acquires the synchronization state and responds to interrupts

public final void acquireInterruptibly(int arg)
        throws InterruptedException {
    //Check whether the thread is already interrupted; if so, throw
    if (Thread.interrupted())
        throw new InterruptedException();
    if (!tryAcquire(arg))
        doAcquireInterruptibly(arg);
}

doAcquireInterruptibly is similar to the original process, except that it throws InterruptedException when it detects an interrupt after being woken up

    private void doAcquireInterruptibly(int arg)
        throws InterruptedException {
        final Node node = addWaiter(Node.EXCLUSIVE);
        boolean failed = true;
        try {
            for (;;) {
                final Node p = node.predecessor();
                if (p == head && tryAcquire(arg)) {
                    setHead(node);
                    p.next = null; // help GC
                    failed = false;
                    return;
                }
                if (shouldParkAfterFailedAcquire(p, node) &&
                    parkAndCheckInterrupt())
                    //Interrupt detected after waking up: throw InterruptedException
                    throw new InterruptedException();
            }
        } finally {
            if (failed)
                cancelAcquire(node);
        }
    }

When an interruptible acquisition is interrupted, it throws InterruptedException directly; the non-interruptible version keeps waiting and only re-sets the interrupt flag itself after it has acquired the synchronization state
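From the caller's side, the interruptible path is reached through ReentrantLock.lockInterruptibly, which is built on acquireInterruptibly. The sketch below is one way to observe it; the class and helper names are made up, and the main thread simply holds the lock so the waiter has to queue.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class for interruptible acquisition.
class InterruptibleAcquireSketch {
    // Returns true if the waiting thread was thrown out of the queue by an interrupt.
    static boolean interruptWhileWaiting() {
        ReentrantLock lock = new ReentrantLock();
        boolean[] thrown = new boolean[1];
        lock.lock();                              // the caller holds the lock throughout
        try {
            Thread waiter = new Thread(() -> {
                try {
                    lock.lockInterruptibly();     // queues and parks behind the holder
                    lock.unlock();
                } catch (InterruptedException e) {
                    thrown[0] = true;             // woken by the interrupt, not by a release
                }
            });
            waiter.start();
            // Wait until the waiter is actually in the synchronization queue.
            while (!lock.hasQueuedThread(waiter))
                Thread.yield();
            waiter.interrupt();
            waiter.join();                        // join makes thrown[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            lock.unlock();
        }
        return thrown[0];
    }
}
```

With plain lock() in place of lockInterruptibly() the waiter would stay parked until the lock is released and would only find its interrupt flag set afterwards.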

Responding to timeouts

Timed acquisition of the synchronization state uses tryAcquireNanos, whose timeout is specified in nanoseconds

public final boolean tryAcquireNanos(int arg, long nanosTimeout)
        throws InterruptedException {
    if (Thread.interrupted())
        throw new InterruptedException();
    return tryAcquire(arg) ||
        doAcquireNanos(arg, nanosTimeout);
}

As the code shows, the timed version also responds to interrupts

doAcquireNanos is also similar to the original process

    private boolean doAcquireNanos(int arg, long nanosTimeout)
            throws InterruptedException {
        if (nanosTimeout <= 0L)
            return false;
        final long deadline = System.nanoTime() + nanosTimeout;
        final Node node = addWaiter(Node.EXCLUSIVE);
        boolean failed = true;
        try {
            for (;;) {
                final Node p = node.predecessor();
                if (p == head && tryAcquire(arg)) {
                    setHead(node);
                    p.next = null; // help GC
                    failed = false;
                    return true;
                }
                //Time remaining until the deadline
                nanosTimeout = deadline - System.nanoTime();
                if (nanosTimeout <= 0L)
                    //Already timed out
                    return false;
                if (shouldParkAfterFailedAcquire(p, node) &&
                    //More than spinForTimeoutThreshold (1000 ns) remains
                    nanosTimeout > spinForTimeoutThreshold)
                    //Park with a timeout
                    LockSupport.parkNanos(this, nanosTimeout);
                //Respond to interrupts
                if (Thread.interrupted())
                    throw new InterruptedException();
            }
        } finally {
            if (failed)
                cancelAcquire(node);
        }
    }

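The timed version recalculates the remaining time on every iteration of the spin. If more than spinForTimeoutThreshold remains (1000 nanoseconds in the JDK 8 source, that is 1 microsecond rather than 1 ms), the thread parks for the remaining time; otherwise it keeps spinning, because parking for such a short time would cost more than it saves. Interrupts are checked on every iteration as well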
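A timed acquisition can be observed through ReentrantLock.tryLock(timeout, unit), which is built on tryAcquireNanos. This is a small sketch with invented class and method names; the lock is held by the calling thread for the whole wait, so the timed attempt in the other thread has to give up.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class for timed acquisition.
class TimedAcquireSketch {
    // Returns the result of a timed tryLock made while another thread holds the lock.
    static boolean timedTryWhileHeld(long timeoutMillis) {
        ReentrantLock lock = new ReentrantLock();
        boolean[] acquired = new boolean[1];
        lock.lock();
        try {
            Thread t = new Thread(() -> {
                try {
                    // Parks for at most the timeout, then returns false.
                    acquired[0] = lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
                    if (acquired[0])
                        lock.unlock();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            t.join();   // the lock stays held during the whole wait
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            lock.unlock();
        }
        return acquired[0];
    }
}
```

A call such as timedTryWhileHeld(50) returns false after roughly the given timeout instead of blocking indefinitely.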

Shared mode

Shared mode allows multiple threads to acquire the resource at the same time; semaphores and read locks, for example, are implemented in shared mode

The shared and exclusive processes are in fact similar; what differs is the implementation of the attempt to acquire the synchronization state

We use one acquisition method to illustrate

Shared acquisition of the synchronization state uses acquireShared

public final void acquireShared(int arg) {
    if (tryAcquireShared(arg) < 0)
        doAcquireShared(arg);
}

tryAcquireShared tries to obtain the synchronization state. The parameter arg indicates how much of the synchronization state to acquire, and the return value indicates how much is left: a negative value means the attempt failed, zero means it succeeded but nothing is left for other threads, and a positive value means it succeeded and later shared acquisitions may succeed too

If the return value is less than 0, the acquisition failed and the thread enters doAcquireShared

    private void doAcquireShared(int arg) {
        //Add a shared-mode node
        final Node node = addWaiter(Node.SHARED);
        boolean failed = true;
        try {
            boolean interrupted = false;
            for (;;) {
                //Get the predecessor node
                final Node p = node.predecessor();
                if (p == head) {
                    int r = tryAcquireShared(arg);
                    if (r >= 0) {
                        //Predecessor is the head and acquisition succeeded: set the head and propagate
                        setHeadAndPropagate(node, r);
                        p.next = null; // help GC
                        if (interrupted)
                            selfInterrupt();
                        failed = false;
                        return;
                    }
                }
                //Acquisition failed: continue the spin, parking between attempts
                if (shouldParkAfterFailedAcquire(p, node) &&
                    parkAndCheckInterrupt())
                    interrupted = true;
            }
        } finally {
            if (failed)
                cancelAcquire(node);
        }
    }

The shared variants that respond to interrupts and timeouts are likewise similar to the exclusive ones; only some details differ, such as setHeadAndPropagate replacing setHead
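As a sketch of what an implementation class supplies in shared mode, here is a minimal permit counter in the style of a non-fair semaphore. PermitSync is a hypothetical name, the state is interpreted as the number of free permits, and overflow checks are left out for brevity.

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

// Hypothetical example class: state = number of permits currently available.
class PermitSync extends AbstractQueuedSynchronizer {
    PermitSync(int permits) {
        setState(permits);
    }

    @Override
    protected int tryAcquireShared(int acquires) {
        for (;;) {
            int available = getState();
            int remaining = available - acquires;
            // Negative: not enough permits, report failure without touching the state.
            // Otherwise CAS in the new count and report how many permits are left.
            if (remaining < 0 || compareAndSetState(available, remaining))
                return remaining;
        }
    }

    @Override
    protected boolean tryReleaseShared(int releases) {
        for (;;) {
            int current = getState();
            if (compareAndSetState(current, current + releases))
                return true; // always let AQS wake queued waiters
        }
    }

    int available() {
        return getState();
    }
}
```

Callers would use the inherited acquireShared(1) and releaseShared(1); a thread that finds no permit left is queued by doAcquireShared exactly as shown above.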

Condition

As mentioned above, AQS acts as the blocking (synchronization) queue, and Condition acts as the waiting queue

The inner class ConditionObject of AQS implements Condition. It acts as a waiting queue and uses two fields to record its head and tail nodes

public class ConditionObject implements Condition{
        //Head node
        private transient Node firstWaiter;
        //Tail node
        private transient Node lastWaiter;
}

Nodes use nextWaiter to point to the next node, forming a singly linked list

ConditionObject provides the await family of methods to make the current thread wait, and the signal family of methods to wake waiting threads up

        public final void await() throws InterruptedException {
            //Respond to interrupts
            if (Thread.interrupted())
                throw new InterruptedException();
            //Append to the tail; no atomicity needed, because a thread that can call await must hold the synchronization state
            Node node = addConditionWaiter();
            //Release the synchronization state that was held
            int savedState = fullyRelease(node);
            int interruptMode = 0;
            //Park and wait while the node is not in the synchronization queue
            while (!isOnSyncQueue(node)) {
                LockSupport.park(this);
                if ((interruptMode = checkInterruptWhileWaiting(node)) != 0)
                    break;
            }
            //After being woken up, spin to reacquire the synchronization state
            if (acquireQueued(node, savedState) && interruptMode != THROW_IE)
                interruptMode = REINTERRUPT;
            //Clean up after cancellation
            if (node.nextWaiter != null) // clean up if cancelled
                unlinkCancelledWaiters();
            if (interruptMode != 0)
                reportInterruptAfterWait(interruptMode);
        }

await appends the node to the tail of the condition queue, releases the synchronization state it holds, and parks; after being woken up it spins to reacquire the synchronization state

The main logic of signal is in transferForSignal

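    final boolean transferForSignal(Node node) {
        //CAS the node status from CONDITION to 0; failure means the node was cancelled, so return
        if (!compareAndSetWaitStatus(node, Node.CONDITION, 0))
            return false;
        //Append to the tail of the AQS queue; enq returns the predecessor
        Node p = enq(node);
        int ws = p.waitStatus;
        //If the predecessor is cancelled or cannot be CASed to SIGNAL, wake the node's thread now
        if (ws > 0 || !compareAndSetWaitStatus(p, ws, Node.SIGNAL))
            LockSupport.unpark(node.thread);
        return true;
    }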

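signal first CASes the node status from CONDITION (-2) to 0 (if that fails, the node was cancelled and transferForSignal returns false), then appends the node to the tail of the AQS queue with enq, and finally tries to CAS the predecessor's status to SIGNAL (-1). If the predecessor is cancelled or that CAS fails, the node's thread is woken up immediately so it can fix its position in the queue; otherwise it stays parked until the predecessor releases the synchronization state

Why does appending to the tail of the AQS queue still go through enq, with CAS and retry on failure, to guarantee atomicity?

Because the signalling thread is not the only one appending: threads that fail to acquire the synchronization state are adding themselves to the same queue at the same time. In addition, multiple ConditionObjects are allowed, so one AQS synchronization queue may be fed by several Condition waiting (condition) queues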
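The one-lock, several-conditions arrangement is easiest to see in a bounded buffer. The sketch below is a common textbook pattern rather than code from the article: TwoConditionBuffer and its handOff helper are hypothetical names, and both conditions come from the same ReentrantLock, so they share a single AQS synchronization queue.

```java
import java.util.ArrayDeque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example class: one lock (one sync queue), two condition queues.
class TwoConditionBuffer<T> {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();
    private final Condition notEmpty = lock.newCondition();
    private final ArrayDeque<T> items = new ArrayDeque<>();
    private final int capacity;

    TwoConditionBuffer(int capacity) { this.capacity = capacity; }

    void put(T item) throws InterruptedException {
        lock.lock();
        try {
            while (items.size() == capacity)
                notFull.await();    // releases the lock and waits in notFull's queue
            items.addLast(item);
            notEmpty.signal();      // moves one waiting consumer to the sync queue
        } finally {
            lock.unlock();
        }
    }

    T take() throws InterruptedException {
        lock.lock();
        try {
            while (items.isEmpty())
                notEmpty.await();   // releases the lock and waits in notEmpty's queue
            T item = items.removeFirst();
            notFull.signal();       // moves one waiting producer to the sync queue
            return item;
        } finally {
            lock.unlock();
        }
    }

    // Demo helper (hypothetical): a consumer thread takes what the caller puts.
    static String handOff() {
        TwoConditionBuffer<String> buf = new TwoConditionBuffer<>(1);
        String[] got = new String[1];
        try {
            Thread consumer = new Thread(() -> {
                try {
                    got[0] = buf.take();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            consumer.start();
            buf.put("hello");
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return got[0];
    }
}
```

The await calls sit inside while loops because a signalled thread still has to reacquire the lock through the synchronization queue, and the condition may have changed again by the time it gets there.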

Summary

This article takes AQS as its core and describes, in simple terms, the data structure behind AQS, its design ideas, the source-level process of acquiring and releasing the synchronization state, Condition, and more

AQS uses head and tail nodes to implement a doubly linked queue, and provides the synchronization state together with template methods for acquiring and releasing it to implement the blocking (synchronization) queue. These fields are volatile, which guarantees visibility for reads but not atomicity, so CAS is used for writes that must be atomic

AQS and Condition use the same node type. In AQS the nodes form a doubly linked list, and in Condition they form a singly linked list. Besides the links, each node also records its thread and its status

AQS has an exclusive mode and a shared mode. In exclusive mode only one thread may hold the synchronization state; in shared mode multiple threads may hold it. Both modes also provide variants that respond to interrupts and that wait with a timeout

Acquiring the synchronization state: first try to acquire it; on failure, append a node to the tail of the AQS queue using CAS with retry, and wait to be woken up by the predecessor. The method returns only when the predecessor is the head node and the acquisition succeeds; otherwise the thread parks and tries again (spins) after being woken up. If an exception occurs along the way, the node is cancelled before the exception propagates

Releasing the synchronization state: try to release it, and on success wake up the nearest successor that has not been cancelled

While acquiring, the interrupt flag is checked after each wake-up. The interruptible variant throws InterruptedException directly; the non-interruptible variant re-sets the interrupt flag itself in the outermost method

The timed variant tracks the remaining time while spinning. If less than spinForTimeoutThreshold remains (1000 nanoseconds in the JDK 8 source), it keeps spinning without parking; if more remains, it parks for the remaining time

AQS acts as the blocking queue and Condition acts as its waiting queue; together they implement the wait/notify mode. On await, ConditionObject appends a node to the tail of the condition queue, releases the synchronization state, and parks; after being woken up, the thread spins (parking again on failure) to reacquire the synchronization state. On signal, the first node of the condition queue is CASed out of the CONDITION status and appended to the tail of the AQS queue, where it is later woken up (CAS is needed for the append because other threads, and possibly other Conditions of the same AQS, add nodes to the same queue)


This article is part of the column "From point to line, from line to surface: building a Java concurrent programming knowledge system in simple terms"

The notes and examples for this article are included in gitee-StudyJava and github-StudyJava
