Do you really understand AQS?

The full name of AQS is AbstractQueuedSynchronizer, and most of the blocking locks/synchronizers used in Java are implemented based on it.

Introduction

AQS, that is, AbstractQueuedSynchronizer. The JDK official documentation describes AQS as a framework for implementing blocking locks and synchronizers based on a FIFO (first-in-first-out) wait queue. For example, the ReentrantLock we use in daily development is built on AQS, and from its source code we can see that most of ReentrantLock's core logic actually lives in AQS.

Here the author summarizes the official documentation's description of AQS, which falls into two parts:

The first part is the design and principle of AQS

  • AQS is a framework for implementing blocking locks and synchronizers based on FIFO (first-in-first-out) queues.
  • AQS relies on a single atomic int variable, state, to represent synchronization state, and defines the acquire (acquire the synchronizer/lock) and release (release the synchronizer/lock) operations on top of it; the state may only be read and updated through getState, setState, and atomic methods such as compareAndSetState.
  • AQS supports exclusive mode (EXCLUSIVE) (default) and shared mode (SHARED).
  • AQS supports a Condition wait queue and provides ConditionObject as its implementation of Condition.

In exclusive mode (EXCLUSIVE), the resource is acquired/waited for by one thread at a time, while in shared mode (SHARED) multiple threads may acquire/wait for the resource concurrently.

The second part is the implementation of AQS

An implementation class of AQS overrides the following protected methods as needed: tryAcquire, tryRelease, tryAcquireShared, tryReleaseShared, and isHeldExclusively (different behaviors are realized by checking/modifying the state). These implementations must be thread-safe, short, and non-blocking (by default, each of these methods throws UnsupportedOperationException).

These protected methods are not declared abstract but simply throw UnsupportedOperationException, so that developers only write the code they actually need: not every blocking lock/synchronizer has to implement all of them, and which ones to override generally depends on whether the lock/synchronizer works in exclusive mode (EXCLUSIVE) or shared mode (SHARED).

In addition, AQS keeps track of the thread that currently holds the resource (which can be used to implement reentrancy) by inheriting from AbstractOwnableSynchronizer.

public abstract class AbstractOwnableSynchronizer
    implements java.io.Serializable {

    /**
     * Empty constructor for use by subclasses.
     */
    protected AbstractOwnableSynchronizer() {}

    /**
     * The current owner of exclusive mode synchronization.
     */
    private transient Thread exclusiveOwnerThread;

    /**
     * Sets the thread that currently owns exclusive access.
     * A {@code null} argument indicates that no thread owns access.
     * This method does not otherwise impose any synchronization or
     * {@code volatile} field accesses.
     * @param thread the owner thread
     */
    protected final void setExclusiveOwnerThread(Thread thread) {
        exclusiveOwnerThread = thread;
    }

    /**
     * Returns the thread last set by {@code setExclusiveOwnerThread},
     * or {@code null} if never set.  This method does not otherwise
     * impose any synchronization or {@code volatile} field accesses.
     * @return the owner thread
     */
    protected final Thread getExclusiveOwnerThread() {
        return exclusiveOwnerThread;
    }
}

The code of AbstractOwnableSynchronizer is short and easy to read: at the code level it simply provides a slot in which a thread can be recorded. With it we can implement reentrancy (use the exclusiveOwnerThread variable to record the current holder and compare it with later acquiring threads), or ensure that the thread releasing a resource is the same one that acquired it, and so on.
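
For intuition, here is a hypothetical sketch of how the exclusiveOwnerThread slot makes re-entry detectable. The names ReentrantHolder, tryEnter, and exit are made up for illustration; this is not how ReentrantLock is actually written, although the idea is the same:

import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.AbstractOwnableSynchronizer;

// Hypothetical sketch: the recorded owner thread is what makes re-entry detectable.
class ReentrantHolder extends AbstractOwnableSynchronizer {

    private final AtomicInteger holdCount = new AtomicInteger(0);

    public boolean tryEnter() {
        Thread current = Thread.currentThread();
        if (current == getExclusiveOwnerThread()) {
            // Same thread already owns it: re-enter by bumping the count (only the owner gets here).
            holdCount.set(holdCount.get() + 1);
            return true;
        }
        if (holdCount.compareAndSet(0, 1)) {   // free: take ownership atomically
            setExclusiveOwnerThread(current);
            return true;
        }
        return false;                          // held by another thread
    }

    public void exit() {
        if (Thread.currentThread() != getExclusiveOwnerThread())
            throw new IllegalMonitorStateException();
        int c = holdCount.get() - 1;
        if (c == 0)
            setExclusiveOwnerThread(null);     // clear the owner before publishing count 0
        holdCount.set(c);
    }
}

Note that exit clears the owner before publishing the zero count; the AQSImpl example below uses the same ordering.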

Finally, based on the description above and his own understanding, the author wrote a simplified version of AQS; it also inherits AbstractOwnableSynchronizer so that the releasing thread must match the acquiring thread.

import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.AbstractOwnableSynchronizer;

public class AQSImpl extends AbstractOwnableSynchronizer {

    private final AtomicInteger state = new AtomicInteger(0);

    public int getState() {
        return this.state.get();
    }

    public void setState(int state) {
        this.state.set(state);
    }

    public boolean compareAndSetState(int expect, int update) {
        return state.compareAndSet(expect, update);
    }

    public void acquire(int args) {
        while (!compareAndSetState(0, args)) {
            // spin (busy-wait) until the state can be claimed
        }
        super.setExclusiveOwnerThread(Thread.currentThread());
    }

    // This is thread-safe because only the thread that acquired the lock can release it successfully
    public boolean release(int args) {
        int state = getState() - args;
        if (Thread.currentThread() != super.getExclusiveOwnerThread()) {
            throw new IllegalMonitorStateException();
        }
        if(state == 0){
            super.setExclusiveOwnerThread(null);
        }
        setState(state);
        return state == 0;
    }
}
public static void main(String[] args) throws InterruptedException {
    AQSImpl aqs = new AQSImpl();
    aqs.acquire(1);

    new Thread(() -> {
        aqs.acquire(1);
        System.out.println("线程1执行");
    }).start();

    Thread.sleep(1000);

    System.out.println("主线程执行");
    aqs.release(1);

    Thread.sleep(5000);
}
Execution result:
Main thread running
Thread 1 running

To put it simply, AQSImpl works with spin + CAS: the state is modified atomically via CAS (locking succeeds only when state == 0), and when the CAS fails the thread "blocks" by spinning. AQS also synchronizes its state with CAS, but its blocking and waiting mechanism is not as crude as simply spinning.

Let's take a look at how AQS implements resource synchronization and blocking waiting.

synchronization semantics

In AQS we find a state variable serving the same purpose. It is accessed atomically (via volatile plus CAS) and is read/updated in implementation classes through the getState, setState and compareAndSetState methods. The relevant code is below:

public abstract class AbstractQueuedSynchronizer
    extends AbstractOwnableSynchronizer
    implements java.io.Serializable {

    /**
     * The synchronization state.
     */
    private volatile int state;

    /**
     * Returns the current value of synchronization state.
     * This operation has memory semantics of a {@code volatile} read.
     * @return current state value
     */
    protected final int getState() {
        return state;
    }

    /**
     * Sets the value of synchronization state.
     * This operation has memory semantics of a {@code volatile} write.
     * @param newState the new state value
     */
    protected final void setState(int newState) {
        state = newState;
    }

    /**
     * Atomically sets synchronization state to the given updated
     * value if the current state value equals the expected value.
     * This operation has memory semantics of a {@code volatile} read
     * and write.
     *
     * @param expect the expected value
     * @param update the new value
     * @return {@code true} if successful. False return indicates that the actual
     *         value was not equal to the expected value.
     */
    protected final boolean compareAndSetState(int expect, int update) {
        // See below for intrinsics setup to support this
        return unsafe.compareAndSwapInt(this, stateOffset, expect, update);
    }
}

Here, the resource is abstracted as the state variable: modifying state is equivalent to acquiring/releasing the resource.

But unlike the example above, AQS does not hard-code the semantics of acquiring/releasing the resource; it leaves hooks for us to customize, which fits its design as a framework. How do we customize it? As mentioned above, by overriding tryAcquire, tryRelease, tryAcquireShared, tryReleaseShared and isHeldExclusively in the implementation class. The author summarizes what each of these methods does below:

  • tryAcquire: acquires the resource in exclusive (EXCLUSIVE) mode. Returning true means the acquisition succeeded; false means it failed. The implementation should check whether the resource can currently be acquired exclusively.
  • tryRelease: releases the resource in exclusive (EXCLUSIVE) mode. Returning true means the resource is now fully released; false means the release failed or was only partial.
  • tryAcquireShared: acquires the resource in shared (SHARED) mode. A negative return value means the acquisition failed; zero means it succeeded but no subsequent shared acquisition can succeed; a positive value means it succeeded and subsequent waiters may also succeed. The implementation should check whether the resource can currently be acquired in shared mode.
  • tryReleaseShared: releases the resource in shared (SHARED) mode. Returning true means the release succeeded (and may allow waiting acquirers to proceed); false means it failed.
  • isHeldExclusively: reports whether the resource is held exclusively by the current thread. Returning true means it is; false means it is not.

To sum up, tryAcquire and tryRelease need to be implemented for exclusive semantics (plus isHeldExclusively if Condition support is needed), while tryAcquireShared and tryReleaseShared need to be implemented for shared semantics. Implementing these methods is essentially all the work required to use the AQS framework.

Through this abstraction we can freely decide whether only one thread may hold the resource/lock at a time (ReentrantLock) or several threads may hold it simultaneously (Semaphore). Methods such as Lock#tryLock map directly onto these overridable hooks; see the corresponding source code for details.
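
To make this concrete, here is a minimal non-reentrant exclusive lock built on AQS. It is only a sketch in the spirit of the Mutex example from the AQS javadoc; the class SimpleMutex and its wrapper methods are illustrative names, not JDK code:

import java.util.concurrent.locks.AbstractQueuedSynchronizer;

// A minimal non-reentrant exclusive lock: state 0 = unlocked, 1 = locked.
class SimpleMutex {

    private static final class Sync extends AbstractQueuedSynchronizer {
        @Override
        protected boolean tryAcquire(int ignored) {
            // Exclusive semantics: succeed only if we atomically flip 0 -> 1.
            if (compareAndSetState(0, 1)) {
                setExclusiveOwnerThread(Thread.currentThread());
                return true;
            }
            return false;
        }

        @Override
        protected boolean tryRelease(int ignored) {
            if (getState() == 0)
                throw new IllegalMonitorStateException();
            setExclusiveOwnerThread(null);
            setState(0);                 // volatile write, visible to the next acquirer
            return true;
        }

        @Override
        protected boolean isHeldExclusively() {
            return getState() == 1;
        }
    }

    private final Sync sync = new Sync();

    public void lock()       { sync.acquire(1); }        // blocks via the AQS wait queue
    public boolean tryLock() { return sync.tryAcquire(1); }
    public void unlock()     { sync.release(1); }        // wakes the next queued thread, if any
}

Note that the subclass only defines the state transition; all of the queuing, parking and waking is inherited from AQS through acquire(1) and release(1).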

At this point, the semantics of acquiring/releasing a resource (lock) in AQS have been basically covered. Recall the description at the beginning: "AQS is a framework for implementing blocking locks/synchronizers based on a FIFO (first-in-first-out) queue." We have covered the lock/synchronizer part; next we analyze the blocking part.

data structure

For the blocking part, the author's earlier example used a spin operation, i.e. busy waiting. Blocking implemented this way has a performance problem: when many threads sit in the spinning state, a lot of CPU time is burned. AQS therefore uses queue-based waiting instead: when a thread fails to acquire the resource, the thread is suspended and added to a queue, and it is only woken up to retry the acquisition once the wake-up condition is met.
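
The "suspend" here is done with LockSupport rather than spinning. A tiny standalone demo of that primitive (independent of AQS, purely illustrative):

import java.util.concurrent.locks.LockSupport;

public class ParkDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread waiter = new Thread(() -> {
            System.out.println("waiter: parking, consuming no CPU while suspended");
            // park may also return spuriously; real code re-checks its condition in a loop, as AQS does
            LockSupport.park();
            System.out.println("waiter: unparked, can retry acquiring the resource");
        });
        waiter.start();

        Thread.sleep(1000);            // simulate the resource being held for a while
        LockSupport.unpark(waiter);    // this is what unparkSuccessor ultimately does
        waiter.join();
    }
}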

Here, let's first take a look at the key data structure Node that makes up the waiting/conditional queue in AQS:

/**
 * Wait queue node class.
 *
 * <p>The wait queue is a variant of a "CLH" (Craig, Landin, and
 * Hagersten) lock queue. CLH locks are normally used for
 * spinlocks.  We instead use them for blocking synchronizers, but
 * use the same basic tactic of holding some of the control
 * information about a thread in the predecessor of its node.  A
 * "status" field in each node keeps track of whether a thread
 * should block.  A node is signalled when its predecessor
 * releases.  Each node of the queue otherwise serves as a
 * specific-notification-style monitor holding a single waiting
 * thread. The status field does NOT control whether threads are
 * granted locks etc though.  A thread may try to acquire if it is
 * first in the queue. But being first does not guarantee success;
 * it only gives the right to contend.  So the currently released
 * contender thread may need to rewait.
 *
 * <p>To enqueue into a CLH lock, you atomically splice it in as new
 * tail. To dequeue, you just set the head field.
 * <pre>
 *      +------+  prev +-----+       +-----+
 * head |      | <---- |     | <---- |     |  tail
 *      +------+       +-----+       +-----+
 * </pre>
 *
 * <p>Insertion into a CLH queue requires only a single atomic
 * operation on "tail", so there is a simple atomic point of
 * demarcation from unqueued to queued. Similarly, dequeuing
 * involves only updating the "head". However, it takes a bit
 * more work for nodes to determine who their successors are,
 * in part to deal with possible cancellation due to timeouts
 * and interrupts.
 *
 * <p>The "prev" links (not used in original CLH locks), are mainly
 * needed to handle cancellation. If a node is cancelled, its
 * successor is (normally) relinked to a non-cancelled
 * predecessor. For explanation of similar mechanics in the case
 * of spin locks, see the papers by Scott and Scherer at
 * http://www.cs.rochester.edu/u/scott/synchronization/
 *
 * <p>We also use "next" links to implement blocking mechanics.
 * The thread id for each node is kept in its own node, so a
 * predecessor signals the next node to wake up by traversing
 * next link to determine which thread it is.  Determination of
 * successor must avoid races with newly queued nodes to set
 * the "next" fields of their predecessors.  This is solved
 * when necessary by checking backwards from the atomically
 * updated "tail" when a node's successor appears to be null.
 * (Or, said differently, the next-links are an optimization
 * so that we don't usually need a backward scan.)
 *
 * <p>Cancellation introduces some conservatism to the basic
 * algorithms.  Since we must poll for cancellation of other
 * nodes, we can miss noticing whether a cancelled node is
 * ahead or behind us. This is dealt with by always unparking
 * successors upon cancellation, allowing them to stabilize on
 * a new predecessor, unless we can identify an uncancelled
 * predecessor who will carry this responsibility.
 *
 * <p>CLH queues need a dummy header node to get started. But
 * we don't create them on construction, because it would be wasted
 * effort if there is never contention. Instead, the node
 * is constructed and head and tail pointers are set upon first
 * contention.
 *
 * <p>Threads waiting on Conditions use the same nodes, but
 * use an additional link. Conditions only need to link nodes
 * in simple (non-concurrent) linked queues because they are
 * only accessed when exclusively held.  Upon await, a node is
 * inserted into a condition queue.  Upon signal, the node is
 * transferred to the main queue.  A special value of status
 * field is used to mark which queue a node is on.
 *
 * ...
 *
 */
static final class Node {
    // ...
}

The comment above already explains the design fairly clearly; the author paraphrases it here:

The wait queue of AQS is a variant of the CLH (spin) lock queue. Each Node in the queue stores a thread variable identifying the waiting thread and a waitStatus variable marking whether (and how) the node should block, and that status controls the behavior of its successor. When a node reaches the head of the queue it may try to acquire the resource, but success is not guaranteed (being first only grants the right to contend; if the attempt fails, the node keeps waiting). Whether first-in-queue implies first-to-acquire depends on whether the semantics we implement are fair.

For CLH lock queues, see the paper "A Hierarchical CLH Queue Lock".

In the queue, dequeuing a node only requires setting the head field, while enqueuing requires an atomic operation on the tail. Nodes are connected by two links, prev and next. The prev link is mainly used to handle cancellation: when a node is cancelled, its successor is relinked to a non-cancelled predecessor. The next link is used to implement the blocking mechanism: a predecessor follows its next link to find and wake up its successor (each node stores the thread it blocks). Also, because a newly enqueued node sets prev before its predecessor's next is set, traversing forward via next may hit null even though a successor exists; in that case AQS traverses backwards from the tail to find it.

The head node is created lazily: it is set when the first blocked node enters the queue rather than pre-created in the constructor, and it acts as a dummy (virtual) node. Pre-creating it would be wasted effort if contention never occurs.

In addition, threads waiting on a Condition use the same Node data structure, but instead of the prev and next links they are chained through an extra nextWaiter link. The condition queue only needs simple (non-concurrent) linking, because it is only ever accessed while the resource is exclusively held, so there is no concurrency on it. A node is inserted into the condition queue when await is called; when signal is called, the node is transferred to the wait (sync) queue.

A special value of the waitStatus field marks which queue (wait queue or condition queue) a node currently belongs to.

With the official comment in mind, we now have a rough picture of the wait/condition queues and their Node data structure; next let's look at Node from the source-code perspective:

static final class Node {
    /** Marker to indicate a node is waiting in shared mode */
    static final Node SHARED = new Node();
    /** Marker to indicate a node is waiting in exclusive mode */
    static final Node EXCLUSIVE = null;
    /** waitStatus value to indicate thread has cancelled */
    static final int CANCELLED =  1;
    /** waitStatus value to indicate successor's thread needs unparking */
    static final int SIGNAL    = -1;
    /** waitStatus value to indicate thread is waiting on condition */
    static final int CONDITION = -2;
    /**
     * waitStatus value to indicate the next acquireShared should
     * unconditionally propagate
     */
    static final int PROPAGATE = -3;
    /**
     * Status field, taking on only the values:
     *   SIGNAL:     The successor of this node is (or will soon be)
     *               blocked (via park), so the current node must
     *               unpark its successor when it releases or
     *               cancels. To avoid races, acquire methods must
     *               first indicate they need a signal,
     *               then retry the atomic acquire, and then,
     *               on failure, block.
     *   CANCELLED:  This node is cancelled due to timeout or interrupt.
     *               Nodes never leave this state. In particular,
     *               a thread with cancelled node never again blocks.
     *   CONDITION:  This node is currently on a condition queue.
     *               It will not be used as a sync queue node
     *               until transferred, at which time the status
     *               will be set to 0. (Use of this value here has
     *               nothing to do with the other uses of the
     *               field, but simplifies mechanics.)
     *   PROPAGATE:  A releaseShared should be propagated to other
     *               nodes. This is set (for head node only) in
     *               doReleaseShared to ensure propagation
     *               continues, even if other operations have
     *               since intervened.
     *   0:          None of the above
     *
     * The values are arranged numerically to simplify use.
     * Non-negative values mean that a node doesn't need to
     * signal. So, most code doesn't need to check for particular
     * values, just for sign.
     *
     * The field is initialized to 0 for normal sync nodes, and
     * CONDITION for condition nodes.  It is modified using CAS
     * (or when possible, unconditional volatile writes).
     */
    volatile int waitStatus;
    /**
     * Link to predecessor node that current node/thread relies on
     * for checking waitStatus. Assigned during enqueuing, and nulled
     * out (for sake of GC) only upon dequeuing.  Also, upon
     * cancellation of a predecessor, we short-circuit while
     * finding a non-cancelled one, which will always exist
     * because the head node is never cancelled: A node becomes
     * head only as a result of successful acquire. A
     * cancelled thread never succeeds in acquiring, and a thread only
     * cancels itself, not any other node.
     */
    volatile Node prev;
    /**
     * Link to the successor node that the current node/thread
     * unparks upon release. Assigned during enqueuing, adjusted
     * when bypassing cancelled predecessors, and nulled out (for
     * sake of GC) when dequeued.  The enq operation does not
     * assign next field of a predecessor until after attachment,
     * so seeing a null next field does not necessarily mean that
     * node is at end of queue. However, if a next field appears
     * to be null, we can scan prev's from the tail to
     * double-check.  The next field of cancelled nodes is set to
     * point to the node itself instead of null, to make life
     * easier for isOnSyncQueue.
     */
    volatile Node next;
    /**
     * The thread that enqueued this node.  Initialized on
     * construction and nulled out after use.
     */
    volatile Thread thread;
    /**
     * Link to next node waiting on condition, or the special
     * value SHARED.  Because condition queues are accessed only
     * when holding in exclusive mode, we just need a simple
     * linked queue to hold nodes while they are waiting on
     * conditions. They are then transferred to the queue to
     * re-acquire. And because conditions can only be exclusive,
     * we save a field by using special value to indicate shared
     * mode.
     */
    Node nextWaiter;
    /**
     * Returns true if node is waiting in shared mode.
     */
    final boolean isShared() {
        return nextWaiter == SHARED;
    }
    /**
     * Returns previous node, or throws NullPointerException if null.
     * Use when predecessor cannot be null.  The null check could
     * be elided, but is present to help the VM.
     *
     * @return the predecessor of this node
     */
    final Node predecessor() throws NullPointerException {
        Node p = prev;
        if (p == null)
            throw new NullPointerException();
        else
            return p;
    }
    Node() {    // Used to establish initial head or SHARED marker
    }
    Node(Thread thread, Node mode) {     // Used by addWaiter
        this.nextWaiter = mode;
        this.thread = thread;
    }
    Node(Thread thread, int waitStatus) { // Used by Condition
        this.waitStatus = waitStatus;
        this.thread = thread;
    }
}

First, let's look at what the different values of the waitStatus field mean:

  • SIGNAL = -1: the successor of the current node needs to be woken up; a node set to SIGNAL will unpark its successor when it releases or is cancelled.
  • CANCELLED = 1: the current node has been cancelled, which happens when its thread times out or is interrupted. CANCELLED is a terminal state; a node never leaves it.
  • CONDITION = -2: the current node is waiting on a condition queue. When it is transferred from the condition queue to the wait queue, its waitStatus is set to 0.
  • PROPAGATE = -3: a releaseShared should be propagated to other nodes; this is set (on the head node) so that the next acquireShared propagates unconditionally.
  • 0: the initial status of an ordinary sync-queue node (and the value a condition node receives when it is transferred to the sync queue).

The waitStatus values are plain numbers arranged so that sign checks are enough for most of the logic; for example, when deciding whether a node needs to signal its successor, the code just checks whether the value is negative instead of comparing against each particular constant.

Next, the two links used in the wait queue, prev and next:

  • prev links to the node's predecessor. It is assigned on enqueuing and nulled out (for GC) on dequeuing. When a node's predecessor is cancelled, prev is relinked to a non-cancelled predecessor, which always exists because the head node can never be cancelled.
  • next links to the node's successor. It is assigned on enqueuing and nulled out (for GC) on dequeuing. When a node enqueues, the predecessor's next link is not set immediately; it is only set after the node has been attached through the prev link and the tail CAS, so a null next does not necessarily mean the node is the tail of the queue. In that case AQS double-checks by traversing backwards from the tail along the prev links.

The next link for the canceled node points to itself instead of null.

Then there is the execution thread variable thread stored in the node:

The thread attribute holds the thread waiting to acquire the resource. It is set when the node is constructed on enqueuing and nulled out on dequeuing.

The thread is stored in the node so that later operations (parking and unparking) know which thread to act on.

Then there is the nextWaiter attribute that links nodes in the condition queue:

In the condition queue, nodes are linked through the nextWaiter attribute. Because the condition queue can only be accessed while the resource is exclusively held, a simple (non-concurrent) linked queue is enough to hold the waiting nodes.

In addition, nextWaiter can hold the special value SHARED to indicate that the node is waiting in SHARED mode; the default value EXCLUSIVE (i.e. null) indicates EXCLUSIVE mode. This is also how isShared works: it simply checks whether nextWaiter == SHARED.

Finally, combining the comments and the code, here is the overall structure of the wait queue and the condition queue:


wait queue:

                                             +-------------------+
                                             |                   +<-----------------------------+
                                  +--------->+ SHARED/EXCLUSIVE  |                              |
                                  |          |                   +<---------+                   |
                                  |          +-------------------+          |                   |
                                  |                    ^                    |                   |
                      Node        |        Node        |         Node       |        Node       |
                 +------------+   |   +------------+   |   +------------+   |  +------------+   |
                 | thread     |   |   | thread     |   |   | thread     |   |  | thread     |   |
                 | waitStatus |   |   | waitStatus |   |   | waitStatus |   |  | waitStatus |   |
                 | nextWaiter+----+   | nextWaiter+----+   | nextWaiter+----+  | nextWaiter+----+
                 |            |       |            |       |            |      |            |
        null<-----+prev       +<-------+prev       +<-------+prev       +<------+prev       |
                 |            |       |            |       |            |      |            |
                 |       next+------->+       next+------->+       next+------>+       next+----->null
                 +------------+       +------------+       +------------+      +------------+
                       ^                                                              ^
                       |                                                              |
    head+--------------+                                                              +---------------+tail


condition queue:

                    Node                     Node                      Node                     Node
              +---------------+        +---------------+         +---------------+        +---------------+
              | thread        |        | thread        |         | thread        |        | thread        |
              | waitStatus    |        | waitStatus    |         | waitStatus    |        | waitStatus    |
              | prev          |        | prev          |         | prev          |        | prev          |
              | next          |        | next          |         | next          |        | next          |
              |               |        |               |         |               |        |               |
              |               |        |               |         |               |        |               |
              |   nextWaiter+--------->+   nextWaiter+---------->+   nextWaiter+--------->+   nextWaiter+--------->null
              +-------+-------+        +---------------+         +---------------+        +--------+------+
                      ^                                                                            ^
                      |                                                                            |
    firstWaiter+------+                                                                            +-------------+lastWaiter


waiting queue

The wait queue of AQS is based on the CLH lock. A CLH lock uses FIFO + busy spinning to reduce contention on the resource (lock): a thread that fails to acquire the resource joins the wait queue and keeps polling whether it can now acquire it; once it succeeds, its node becomes the head node (and the old head is dequeued). Because pure CLH spinning wastes CPU, AQS adds a conditional waiting mechanism (an optimization) on top of it: when the conditions are not met, the node's thread is parked, and it is only woken again after its predecessor finishes its work, avoiding pointless CPU consumption.
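
For reference, a textbook CLH spin lock (the busy-waiting original that the AQS queue is a variant of) can be sketched as follows; this is illustrative code, not part of AQS:

import java.util.concurrent.atomic.AtomicReference;

// Textbook CLH spin lock: each thread spins on its predecessor's flag (FIFO + busy wait).
class ClhSpinLock {
    private static final class QNode {
        volatile boolean locked;   // true while the owning thread holds or wants the lock
    }

    private final AtomicReference<QNode> tail = new AtomicReference<>(new QNode());
    private final ThreadLocal<QNode> myNode = ThreadLocal.withInitial(QNode::new);
    private final ThreadLocal<QNode> myPred = new ThreadLocal<>();

    public void lock() {
        QNode node = myNode.get();
        node.locked = true;
        QNode pred = tail.getAndSet(node);   // single atomic splice onto the tail
        myPred.set(pred);
        while (pred.locked) {
            // busy-spin on the predecessor's flag; AQS replaces this spin with park/unpark
        }
    }

    public void unlock() {
        QNode node = myNode.get();
        node.locked = false;                 // the successor's spin loop observes this and proceeds
        myNode.set(myPred.get());            // recycle the predecessor's node for future lock() calls
    }
}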

With some understanding of CLH locks and the related data structures, let's look at the core processing flow of AQS (for readability, some code has been modified/trimmed here):

/**
 * Exclusive mode
 */
private void doAcquire(int arg) {
    // Enqueue the node
    final Node node = addWaiter(Node.EXCLUSIVE);
    try {
        // Right after enqueuing, try to dequeue this node, and keep polling afterwards
        for (;;) {
            // Head-of-queue dequeue: on failure the thread keeps waiting, on success the
            // resource is acquired and we are done (exclusive-mode dequeue)
            final Node p = node.predecessor();
            // Only a node whose predecessor is the head may try, so within the queue this is FIFO-fair
            if (p == head) { 
                boolean r = tryAcquire(arg);
                // Resource acquired successfully
                if(r){
                    // Set the current node as head and dequeue the original head
                    setHead(node);
                    p.next = null; // help GC
                    return;
                }
            }
            // Decide whether to enter the waiting state (after a failed acquire)
            if (shouldParkAfterFailedAcquire(p, node) && ...){
                // Park the node's thread (inside an endless loop, so after waking it keeps retrying until it succeeds)
                parkThread();
            }
        }
    } finally {
        // Cancel the node on failure
        if (failed)
            cancelAcquire(node);
    }
}

/**
 * Shared mode
 */
private void doAcquireShared(int arg) {
    // Enqueue the node
    final Node node = addWaiter(Node.SHARED);
    try {
        // Right after enqueuing, try to dequeue this node, and keep polling afterwards
        for (;;) {
            // Head-of-queue dequeue: on failure the thread keeps waiting, on success the
            // resource is acquired and we are done (shared-mode dequeue)
            final Node p = node.predecessor();
            // Only a node whose predecessor is the head may try, so within the queue this is FIFO-fair
            if (p == head) { 
                int r = tryAcquireShared(arg);
                // Resource acquired successfully
                if (r >= 0) {
                    // Set the current node as head, dequeue the original head, then propagate the wake-up signal backwards.
                    setHeadAndPropagate(node, r);
                    p.next = null; // help GC
                    return;
                }
            }
            // Decide whether to enter the waiting state (after a failed acquire)
            if (shouldParkAfterFailedAcquire(p, node) && ...){
                // Park the node's thread (inside an endless loop, so after waking it keeps retrying until it succeeds)
                parkThread();
            }
        }
    } finally {
        // Cancel the node on failure
        if (failed)
            cancelAcquire(node);
    }
}

The doAcquire method shows the enqueue/dequeue flow in exclusive mode, and doAcquireShared the flow in shared mode. Although one is exclusive and the other shared, at a macro level there is no essential difference between them: after a node enters the wait queue it spins (with parking) to check whether it can acquire the resource, and once it succeeds it sets itself as the head node (dequeuing the old head). Based on the code above, the author drew the following model diagram of the process:

           +------+  spin / park +------+  spin / park +------+
head +---> | Node | <----------+ | Node | <----------+ | Node | <----+ tail
           +---+--+              +---+--+              +------+
               |                     ^
               |                     |
               +---------------------+
                       unpark

So far, we should have a general understanding of the mechanism and model of the waiting queue. Next, the author will analyze the enqueue operation and dequeue operation of the waiting queue respectively.

Node enqueuing

When a thread fails to acquire the resource, it is wrapped in a Node, linked to the tail of the wait queue, and eventually parked. Both exclusive and shared mode use the addWaiter method to link the Node to the tail of the queue; the relevant code is below:

/**
 * Creates and enqueues node for current thread and given mode.
 *
 * @param mode Node.EXCLUSIVE for exclusive, Node.SHARED for shared
 * @return the new node
 */
private Node addWaiter(Node mode) {
    Node node = new Node(Thread.currentThread(), mode);
    // Try the fast path of enq; backup to full enq on failure
    Node pred = tail;
    if (pred != null) {
        node.prev = pred;
        if (compareAndSetTail(pred, node)) {
            pred.next = node;
            return node;
        }
    }
    enq(node);
    return node;
}

/**
 * Inserts node into queue, initializing if necessary. See picture above.
 * @param node the node to insert
 * @return node's predecessor
 */
private Node enq(final Node node) {
    // The endless loop guarantees the insertion eventually succeeds even under contention
    for (;;) {
        Node t = tail;
        if (t == null) { // Must initialize
            if (compareAndSetHead(new Node()))
                tail = head;
        } else {
            node.prev = t;
            if (compareAndSetTail(t, node)) {
                t.next = node;
                return t;
            }
        }
    }
}   

As the code shows, the enqueue operations of the two modes differ only in the nextWaiter attribute of the Node; everything else is identical. Waiting nodes are inserted at the tail, and the concrete process is as follows:

As mentioned above, exclusive mode and shared mode are distinguished by the value of the nextWaiter attribute.

  1. Find the tail node through the tail link. If the tail node is null, initialize and create a virtual head node. Finally, repeat step 1. If the tail node is not null, execute step 2.
  2. Point the prev link of the new node to the tail node tail
  3. Set the new node as the tail node through CAS. If it fails, go back to step 1 and continue until the tail node is set successfully.
  4. Point the next link of the original tail node to the newly inserted node, and return the newly inserted tail node.

Reading this, you will notice that pointing the new node's prev link at the current tail does not by itself mean the insertion succeeded, because under contention several threads may do this at the same time; only one of them wins the CAS on tail and then sets the next link. So a node can be considered fully inserted only after the predecessor's next link has been set.

This also explains the earlier point that "next == null does not necessarily mean the node is the tail of the queue". Setting prev and setting next are not one atomic operation, so while one thread walks the next links to find a successor, another thread may be mid-insertion and may have only assigned prev so far. In that case AQS double-checks by traversing the prev links backwards from the tail, effectively replaying the insertion in reverse.

Node dequeuing

After a Node enters the wait queue, it spins to check whether it can acquire the resource, and parks its thread under certain conditions. Because the dequeue flow differs between exclusive mode and shared mode, they are explained separately below.

A traditional CLH lock needs no explicit wake-up: each thread simply spins until it observes that it can proceed and then dequeues. Because AQS adds a parking mechanism on top of the spin, a releasing node must additionally call unparkSuccessor to wake its successor so that it can resume its spin-and-check loop.

exclusive mode

Exclusive mode means that only one node at a time can successfully acquire the resource.

In exclusive mode, once a node joins the queue it spins to check whether it can acquire the resource and keeps retrying until it succeeds (ignoring abnormal cases). This logic is not wrapped in a separate method; it is embedded in the acquire methods themselves. Below are the acquire methods related to exclusive mode:

public final void acquire(int arg) {
    if (!tryAcquire(arg) &&
        // The key part: the newly enqueued node immediately starts the spin + conditional-wait dequeue loop
        acquireQueued(addWaiter(Node.EXCLUSIVE), arg))
        selfInterrupt();
}        

/**
 * Acquires in exclusive uninterruptible mode for thread already in
 * queue. Used by condition wait methods as well as acquire.
 *
 * @param node the node
 * @param arg the acquire argument
 * @return {@code true} if interrupted while waiting
 */
final boolean acquireQueued(final Node node, int arg) {
    boolean failed = true;
    try {
        boolean interrupted = false;
        // The endless loop guarantees an enqueued node will eventually dequeue, i.e. keep retrying the acquire
        for (;;) {
            final Node p = node.predecessor();
            // Only a node whose predecessor is the head may try, so within the queue this is FIFO-fair
            if (p == head && tryAcquire(arg)) {
                // Set the current node as head and dequeue the original head
                setHead(node);
                p.next = null; // help GC
                failed = false;
                return interrupted;
            }
            // Decide whether to enter the waiting state (after a failed acquire)
            if (shouldParkAfterFailedAcquire(p, node) &&
                // Park the node's thread (inside an endless loop, so after waking it keeps retrying until it succeeds)
                parkAndCheckInterrupt())
                // Here an interrupt is reported back to the caller as a boolean
                interrupted = true;
        }
    } finally {
        // If an error occurs (e.g. an exception is thrown), cancel the node
        if (failed)
            cancelAcquire(node);
    }
}

/**
 * Acquires in exclusive interruptible mode.
 * @param arg the acquire argument
 */
private void doAcquireInterruptibly(int arg) throws InterruptedException {
    // Add the node to the wait queue
    final Node node = addWaiter(Node.EXCLUSIVE);
    boolean failed = true;
    try {
        // The endless loop guarantees an enqueued node will eventually dequeue, i.e. keep retrying the acquire
        for (;;) {
            final Node p = node.predecessor();
            // Only a node whose predecessor is the head may try, so within the queue this is FIFO-fair
            if (p == head && tryAcquire(arg)) {
                // Set the current node as head and dequeue the original head
                setHead(node);
                p.next = null; // help GC
                failed = false;
                return;
            }
            // Decide whether to enter the waiting state (after a failed acquire)
            if (shouldParkAfterFailedAcquire(p, node) &&
                // Park the node's thread (inside an endless loop, so after waking it keeps retrying until it succeeds)
                parkAndCheckInterrupt())
                // Here an interrupt is reported by throwing InterruptedException
                throw new InterruptedException();
        }
    } finally {
        // If an error occurs (e.g. an exception is thrown), cancel the node
        if (failed)
            cancelAcquire(node);
    }
}

/**
 * The number of nanoseconds for which it is faster to spin
 * rather than to use timed park. A rough estimate suffices
 * to improve responsiveness with very short timeouts.
 */
static final long spinForTimeoutThreshold = 1000L;

/**
 * Acquires in exclusive timed mode.
 *
 * @param arg the acquire argument
 * @param nanosTimeout max wait time
 * @return {@code true} if acquired
 */
private boolean doAcquireNanos(int arg, long nanosTimeout) throws InterruptedException {
    // A non-positive max wait time means "do not wait": return false immediately
    if (nanosTimeout <= 0L)
        return false;
    // Compute the deadline
    final long deadline = System.nanoTime() + nanosTimeout;
    // Add the node to the wait queue
    final Node node = addWaiter(Node.EXCLUSIVE);
    boolean failed = true;
    try {
        // The endless loop guarantees an enqueued node will eventually dequeue, i.e. keep retrying the acquire
        for (;;) {
            final Node p = node.predecessor();
            // Only a node whose predecessor is the head may try, so within the queue this is FIFO-fair
            if (p == head && tryAcquire(arg)) {
                // Set the current node as head and dequeue the original head
                setHead(node);
                p.next = null; // help GC
                failed = false;
                return true;
            }
            // Check whether the deadline has passed
            nanosTimeout = deadline - System.nanoTime();
            if (nanosTimeout <= 0L)
                return false;
            // Decide whether to enter the waiting state (after a failed acquire)
            if (shouldParkAfterFailedAcquire(p, node) &&
                // Only park if the remaining time exceeds the spin threshold
                nanosTimeout > spinForTimeoutThreshold)
                // Park the node's thread (inside an endless loop, so after waking it keeps retrying until it succeeds)
                LockSupport.parkNanos(this, nanosTimeout);
            if (Thread.interrupted())
                throw new InterruptedException();
        }
    } finally {
        // If an error occurs (e.g. an exception is thrown), cancel the node
        if (failed)
            cancelAcquire(node);
    }
}

doAcquireNanos contains a spin optimization, namely nanosTimeout > spinForTimeoutThreshold: only when the remaining timeout exceeds this threshold does it call LockSupport.parkNanos to suspend the thread; otherwise it just keeps spinning. The official explanation is that for very short timeouts, spinning is faster than descheduling and rescheduling the thread. The doAcquireSharedNanos method shown later uses the same optimization.
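
For the non-timed variants, the suspension itself happens in parkAndCheckInterrupt, which is only a couple of lines. The version below is paraphrased from the JDK 8 source, so verify it against your own JDK:

/**
 * Park the current thread, and on wake-up report (and clear) its interrupt status.
 * The caller decides whether to just record the interrupt or to throw InterruptedException.
 */
private final boolean parkAndCheckInterrupt() {
    LockSupport.park(this);        // blocker = this synchronizer, useful for diagnostics
    return Thread.interrupted();   // true if we were woken by an interrupt
}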

The above code gives three types of acquire methods, namely acquireQueued, doAcquireInterruptibly and doAcquireNanos. Although these three methods seem to have some differences, in fact their core dequeuing logic is the same, that is, through spin + waiting queue. Here, the author extracts the key process of the node dequeuing (pseudocode):

If you want to tell the difference between acquireQueued, doAcquireInterruptibly and doAcquireNanos, there may be some differences in the processing of interrupts and timeouts, but these are not the key to understanding the core process, so you can ignore them here.

// The endless loop guarantees an enqueued node will eventually dequeue, i.e. keep retrying the acquire
for (;;) {
    final Node p = node.predecessor();
    // Only a node whose predecessor is the head may try, so within the queue this is FIFO-fair
    if (p == head && tryAcquire(arg)) {
        // Set the current node as head and dequeue the original head
        setHead(node);
        p.next = null; // help GC
        // ...
        // ... return the value once the resource is acquired
    }
    // Timeout check
    // ...
    // Decide whether to enter the waiting state (after a failed acquire)
    if (shouldParkAfterFailedAcquire(p, node) && ...){
        // Park the node's thread (inside an endless loop, so after waking it keeps retrying until it succeeds)
        parkThread();
    }
    // ...
    // ... interrupt handling
}
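The "decide whether to enter the waiting state" step in this loop is shouldParkAfterFailedAcquire. Its logic, paraphrased from the JDK 8 source with comments added (details may differ slightly across versions), is roughly:

private static boolean shouldParkAfterFailedAcquire(Node pred, Node node) {
    int ws = pred.waitStatus;
    if (ws == Node.SIGNAL)
        // The predecessor has already promised to signal us on release, so parking is safe.
        return true;
    if (ws > 0) {
        // The predecessor was cancelled: skip over cancelled nodes and let the caller retry.
        do {
            node.prev = pred = pred.prev;
        } while (pred.waitStatus > 0);
        pred.next = node;
    } else {
        // waitStatus is 0 or PROPAGATE: request a signal, but let the caller retry once before parking.
        compareAndSetWaitStatus(pred, ws, Node.SIGNAL);
    }
    return false;
}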

The logic of the setHead method that sets the current node as the head node (that is, dequeues the original head node) is as follows:

/**
 * Sets head of queue to be node, thus dequeuing. Called only by
 * acquire methods.  Also nulls out unused fields for sake of GC
 * and to suppress unnecessary signals and traversals.
 *
 * @param node the node
 */
private void setHead(Node node) {
    head = node;
    // Once the node becomes head, the thread stored in it is no longer needed
    // (it was only kept for parking and unparking)
    node.thread = null;
    // The head node has no predecessor
    node.prev = null;
}

That is, the specific dequeue execution process is as follows:

  1. Determine whether the predecessor node of the current node is the head node. If not, the acquisition of resources fails and skips to step 4
  2. If the predecessor node of the current node is the head node, try to acquire resources (defined by the tryAcquire method), if the acquisition fails, skip to step 4
  3. If the resource is acquired successfully, perform the dequeue operation: set the current node as the head node (nulling out its prev link and thread variable), set the original head node's next link to null, then return the result value and finish.
  4. Determine whether the current node enters the waiting state, if not, jump back to step 1
  5. If the current node can enter the waiting state, call the method to suspend the thread until the thread is woken up and jump back to step 1 (the interruption state will be judged during this period)

The semantics of acquiring the resource in exclusive mode are defined by tryAcquire. Within the queue, the check p == head enforces a fair, FIFO discipline: only the thread whose predecessor is the head node is eligible to attempt the acquisition.
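
Whether the synchronizer as a whole is fair is decided one level higher, in tryAcquire itself: a non-fair implementation lets a newly arriving thread barge in with a bare CAS, while a fair one first checks the queue. The sketch below is hypothetical, loosely modeled on ReentrantLock's NonfairSync/FairSync (tryAcquireFair is an illustrative name; hasQueuedPredecessors is a real AQS method):

import java.util.concurrent.locks.AbstractQueuedSynchronizer;

class FairnessSketch extends AbstractQueuedSynchronizer {

    // Non-fair flavor: a newly arriving thread may "barge" and CAS the state
    // even if other threads are already parked in the queue.
    @Override
    protected boolean tryAcquire(int acquires) {
        if (getState() == 0 && compareAndSetState(0, acquires)) {
            setExclusiveOwnerThread(Thread.currentThread());
            return true;
        }
        return false;
    }

    // Fair flavor: refuse to acquire while someone queued before us is still waiting.
    protected boolean tryAcquireFair(int acquires) {
        if (getState() == 0
                && !hasQueuedPredecessors()
                && compareAndSetState(0, acquires)) {
            setExclusiveOwnerThread(Thread.currentThread());
            return true;
        }
        return false;
    }
}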

However, because AQS parks queued threads under certain conditions, the holder must call unparkSuccessor to wake the successor node when it is done. In exclusive mode, unparkSuccessor is triggered through release:

/**
 * Releases in exclusive mode.  Implemented by unblocking one or
 * more threads if {@link #tryRelease} returns true.
 * This method can be used to implement method {@link Lock#unlock}.
 *
 * @param arg the release argument.  This value is conveyed to
 *        {@link #tryRelease} but is otherwise uninterpreted and
 *        can represent anything you like.
 * @return the value returned from {@link #tryRelease}
 */
@ReservedStackAccess
public final boolean release(int arg) {
    // Check whether the resource can be released
    if (tryRelease(arg)) {
        Node h = head;
        // Check whether the head node has a successor that needs waking
        if (h != null && h.waitStatus != 0)
            // Wake up the successor node
            unparkSuccessor(h);
        return true;
    }
    return false;
}

The release method first tries to release the resource; if that succeeds it wakes the successor node (if any) and returns true, otherwise it returns false. The unparkSuccessor method that AQS uses to wake the successor is implemented as follows:

/**
 * Wakes up node's successor, if one exists.
 *
 * @param node the node
 */
private void unparkSuccessor(Node node) {
    /*
     * If status is negative (i.e., possibly needing signal) try
     * to clear in anticipation of signalling.  It is OK if this
     * fails or if status is changed by waiting thread.
     */
    int ws = node.waitStatus;
    // Try to clear the current node's status
    if (ws < 0)
        compareAndSetWaitStatus(node, ws, 0);
        
    /*
     * Thread to unpark is held in successor, which is normally
     * just the next node.  But if cancelled or apparently null,
     * traverse backwards from tail to find the actual
     * non-cancelled successor.
     */
    // Find the successor node
    Node s = node.next;
    // Check whether the successor has been cancelled
    if (s == null || s.waitStatus > 0) {
        s = null;
        // The successor was cancelled (or is null): traverse backwards from the tail to find a valid successor
        for (Node t = tail; t != null && t != node; t = t.prev)
            if (t.waitStatus <= 0)
                s = t;
    }
    // If a successor exists
    if (s != null)
        // Wake up the successor node
        LockSupport.unpark(s.thread);
}

The main responsibility of the unparkSuccessor method is to wake up the successor node. There are three main steps:

  1. Try to clean up the state of the current node, that is, set it to 0 (regardless of whether it succeeds or not)
  2. Find the successor node of the current node (if the successor has been cancelled, search backwards from the tail of the queue)
  3. Wake up the successor node via LockSupport#unpark

In short, unparkSuccessor wakes the successor so that, having joined the queue, it resumes its spin-and-retry loop and eventually acquires the resource (considering only the success path here).

In general, in the exclusive mode, each node will enter the waiting state after failing to acquire resources, and wake up after the execution of its predecessor node is completed to continue trying to acquire resources.

shared mode

Shared mode means that multiple nodes may acquire the resource at the same time.
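
For intuition, a shared-mode synchronizer can be sketched as a simple counting permit in the spirit of Semaphore; SimplePermits and its methods are illustrative names, not JDK source:

import java.util.concurrent.locks.AbstractQueuedSynchronizer;

// A simplified counting-permit synchronizer: state = number of permits left.
class SimplePermits {

    private static final class Sync extends AbstractQueuedSynchronizer {
        Sync(int permits) { setState(permits); }

        @Override
        protected int tryAcquireShared(int acquires) {
            for (;;) {
                int available = getState();
                int remaining = available - acquires;
                // Negative return = failure; >= 0 = success, and the value tells
                // the queue whether later shared acquires may also succeed.
                if (remaining < 0 || compareAndSetState(available, remaining))
                    return remaining;
            }
        }

        @Override
        protected boolean tryReleaseShared(int releases) {
            for (;;) {
                int current = getState();
                if (compareAndSetState(current, current + releases))
                    return true;   // always propagate: a permit came back
            }
        }
    }

    private final Sync sync;

    public SimplePermits(int permits) { this.sync = new Sync(permits); }

    public void acquire() { sync.acquireShared(1); }   // may block in the AQS queue
    public void release() { sync.releaseShared(1); }   // wakes waiting acquirers
}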

Similar to exclusive mode, once a node joins the queue in shared mode it immediately spins to check whether it can acquire the resource and keeps retrying until it succeeds (ignoring abnormal cases). This logic is again embedded in the acquire methods themselves. Below are the acquire methods related to shared mode:

/**
 * Acquires in shared uninterruptible mode.
 * @param arg the acquire argument
 */
private void doAcquireShared(int arg) {
    // Add the node to the wait queue
    final Node node = addWaiter(Node.SHARED);
    boolean failed = true;
    try {
        boolean interrupted = false;
        // The endless loop guarantees an enqueued node will eventually dequeue, i.e. keep retrying the acquire
        for (;;) {
            final Node p = node.predecessor();
            // Only a node whose predecessor is the head may try, so within the queue this is FIFO-fair
            if (p == head) {
                int r = tryAcquireShared(arg);
                if (r >= 0) {
                    // Set the current node as head, dequeue the original head, then propagate the wake-up signal backwards
                    setHeadAndPropagate(node, r);
                    p.next = null; // help GC
                    if (interrupted)
                        selfInterrupt();
                    failed = false;
                    return;
                }
            }
            // Decide whether to enter the waiting state (after a failed acquire)
            if (shouldParkAfterFailedAcquire(p, node) &&
                // Park the thread (inside an endless loop, so after waking it keeps retrying until it succeeds)
                parkAndCheckInterrupt())
                interrupted = true;
        }
    } finally {
        // If an error occurs (e.g. an exception is thrown), cancel the node
        if (failed)
            cancelAcquire(node);
    }
}

/**
 * Acquires in shared interruptible mode.
 * @param arg the acquire argument
 */
private void doAcquireSharedInterruptibly(int arg)
    throws InterruptedException {
    // Add the node to the wait queue
    final Node node = addWaiter(Node.SHARED);
    boolean failed = true;
    try {
        // The endless loop guarantees an enqueued node will eventually dequeue, i.e. keep retrying the acquire
        for (;;) {
            final Node p = node.predecessor();
            // Only a node whose predecessor is the head may try, so within the queue this is FIFO-fair
            if (p == head) {
                int r = tryAcquireShared(arg);
                if (r >= 0) {
                    // Set the current node as head, dequeue the original head, then propagate the wake-up signal backwards
                    setHeadAndPropagate(node, r);
                    p.next = null; // help GC
                    failed = false;
                    return;
                }
            }
            // Decide whether to enter the waiting state (after a failed acquire)
            if (shouldParkAfterFailedAcquire(p, node) &&
                // Park the thread (inside an endless loop, so after waking it keeps retrying until it succeeds)
                parkAndCheckInterrupt())
                throw new InterruptedException();
        }
    } finally {
        // If an error occurs (e.g. an exception is thrown), cancel the node
        if (failed)
            cancelAcquire(node);
    }
}

/**
 * Acquires in shared timed mode.
 *
 * @param arg the acquire argument
 * @param nanosTimeout max wait time
 * @return {@code true} if acquired
 */
private boolean doAcquireSharedNanos(int arg, long nanosTimeout)
        throws InterruptedException {
    // A non-positive max wait time means "do not wait": return false immediately
    if (nanosTimeout <= 0L)
        return false;
    // Compute the deadline
    final long deadline = System.nanoTime() + nanosTimeout;
    // Add the node to the wait queue
    final Node node = addWaiter(Node.SHARED);
    boolean failed = true;
    try {
        // The endless loop guarantees an enqueued node will eventually dequeue, i.e. keep retrying the acquire
        for (;;) {
            final Node p = node.predecessor();
            // Only a node whose predecessor is the head may try, so within the queue this is FIFO-fair
            if (p == head) {
                int r = tryAcquireShared(arg);
                if (r >= 0) {
                    // Set the current node as head, dequeue the original head, then propagate the wake-up signal backwards
                    setHeadAndPropagate(node, r);
                    p.next = null; // help GC
                    failed = false;
                    return true;
                }
            }
            // Check whether the deadline has passed
            nanosTimeout = deadline - System.nanoTime();
            if (nanosTimeout <= 0L)
                return false;
            // Decide whether to enter the waiting state (after a failed acquire)
            if (shouldParkAfterFailedAcquire(p, node) &&
                // Only park if the remaining time exceeds the spin threshold
                nanosTimeout > spinForTimeoutThreshold)
                // Park the thread (inside an endless loop, so after waking it keeps retrying until it succeeds)
                LockSupport.parkNanos(this, nanosTimeout);
            if (Thread.interrupted())
                throw new InterruptedException();
        }
    } finally {
        // If an error occurs (e.g. an exception is thrown), cancel the node
        if (failed)
            cancelAcquire(node);
    }
}

The code above likewise shows three acquire methods: doAcquireShared, doAcquireSharedInterruptibly and doAcquireSharedNanos. Their core dequeue logic is again spin + wait queue. Here the author again extracts the key dequeuing flow (pseudocode):

If you want to tell the difference between doAcquireShared, doAcquireSharedInterruptibly and doAcquireSharedNanos, there may be some differences in interrupt processing and timeout processing, but these are not the key to understanding the core process, so you can ignore them here.

// The endless loop guarantees an enqueued node will eventually dequeue, i.e. keep retrying the acquire
for (;;) {
    final Node p = node.predecessor();
    // Only a node whose predecessor is the head may try, so within the queue this is FIFO-fair
    if (p == head) {
        int r = tryAcquireShared(arg);
        if (r >= 0) {
            // Set the current node as head, dequeue the original head, and propagate the wake-up signal backwards (if any)
            setHeadAndPropagate(node, r);
            p.next = null; // help GC
            // ...
            // ... interrupt handling
            // ...
            // ... return the value once the resource is acquired
        }
    }
    // Timeout check
    // ...
    // Decide whether to enter the waiting state (after a failed acquire)
    if (shouldParkAfterFailedAcquire(p, node) && ... ){
        // Park the thread (inside an endless loop, so after waking it keeps retrying until it succeeds)
        parkThread();
    }
    // ...
    // ... interrupt handling
}

The logic of the setHeadAndPropagate method that sets the current node as the head node (that is, dequeues the original head node) and propagates the wake-up signal (if any) backward is as follows:

/**
 * Sets head of queue, and checks if successor may be waiting
 * in shared mode, if so propagating if either propagate > 0 or
 * PROPAGATE status was set.
 *
 * @param node the node
 * @param propagate the return value from a tryAcquireShared
 */
private void setHeadAndPropagate(Node node, int propagate) {
    Node h = head; // Record old head for check below
    // Set the current node as the head node (dequeuing the original head node); see above for details
    setHead(node);
    // Decide whether to propagate the wake-up signal backwards
    if (propagate > 0 || h == null || h.waitStatus < 0 ||
        (h = head) == null || h.waitStatus < 0) {
        Node s = node.next;
        // Check whether the successor node is in shared mode
        if (s == null || s.isShared())
            // Propagate the wake-up signal backwards
            doReleaseShared();
    }
}

That is, the specific dequeue execution process is as follows:

  1. Check whether the predecessor of the current node is the head node; if not, the acquire fails, skip to step 4
  2. If the predecessor is the head node, try to acquire the resource (as defined by tryAcquireShared); if the acquire fails, skip to step 4
  3. If the resource is acquired successfully, perform the dequeue operation: set the current node as the head node (dequeuing the original head node) and clear its prev link and thread field, wake the successor node so it can keep trying to acquire (backward propagation) when the conditions are met, clear the original head's next link, return the result value, and finish
  4. Check whether the current node may enter the waiting state; if not, jump back to step 1
  5. If the current node may enter the waiting state, park the thread until it is woken up, then jump back to step 1 (the interrupt status is checked during this process)

The semantics of acquiring a resource in shared mode are defined by tryAcquireShared. Judging eligibility via p == head necessarily yields a fair policy inside the queue, because a thread may attempt the acquire only when its predecessor is the head node.

However, because AQS parks queued nodes (threads) under certain conditions, the corresponding unparkSuccessor method must be called to wake up the successor node once a node has finished executing. In shared mode, unparkSuccessor is triggered by calling releaseShared.

/**
 * Releases in shared mode.  Implemented by unblocking one or more
 * threads if {@link #tryReleaseShared} returns true.
 *
 * @param arg the release argument.  This value is conveyed to
 *        {@link #tryReleaseShared} but is otherwise uninterpreted
 *        and can represent anything you like.
 * @return the value returned from {@link #tryReleaseShared}
 */
@ReservedStackAccess
public final boolean releaseShared(int arg) {
    // Check whether the resource can be released
    if (tryReleaseShared(arg)) {
        // Wake up the successor node(s)
        doReleaseShared();
        return true;
    }
    return false;
}

The releaseShared method first tries to release the resource; if that succeeds, it wakes up the successor node (if any) and returns true, otherwise it returns false immediately. Waking up the successor differs slightly between shared and exclusive mode: shared mode adds wake-up propagation logic on top of unparkSuccessor, namely doReleaseShared:

/**
 * Release action for shared mode -- signals successor and ensures
 * propagation. (Note: For exclusive mode, release just amounts
 * to calling unparkSuccessor of head if it needs signal.)
 */
private void doReleaseShared() {
    /*
     * Ensure that a release propagates, even if there are other
     * in-progress acquires/releases.  This proceeds in the usual
     * way of trying to unparkSuccessor of head if it needs
     * signal. But if it does not, status is set to PROPAGATE to
     * ensure that upon release, propagation continues.
     * Additionally, we must loop in case a new node is added
     * while we are doing this. Also, unlike other uses of
     * unparkSuccessor, we need to know if CAS to reset status
     * fails, if so rechecking.
     */
    for (;;) {
        Node h = head;
        // Check whether a successor node exists
        if (h != null && h != tail) {
            int ws = h.waitStatus;
            // A (wakeable) successor node exists
            if (ws == Node.SIGNAL) {
                // Clear the head node's status
                if (!compareAndSetWaitStatus(h, Node.SIGNAL, 0))
                    continue;            // loop to recheck cases
                // Wake up the successor node
                unparkSuccessor(h);
            }
            // No successor node needs waking up
            else if (ws == 0 &&
                        !compareAndSetWaitStatus(h, 0, Node.PROPAGATE))
                continue;                // loop on failed CAS
        }
        if (h == head)                   // loop if head changed
            break;
    }
}

For why doReleaseShared sets waitStatus to PROPAGATE, see the section "Node Wakeup Propagation" below; the reason is not expanded here.

In short, doReleaseShared keeps checking whether there is a successor node to be woken (judged by the SIGNAL state) and, when appropriate, wakes it via unparkSuccessor. If the awakened successor acquires the resource successfully, the wake-up signal keeps propagating backwards until an awakened successor fails to acquire (detected by checking whether the head node has changed).

In general, in shared mode each node enters the waiting state after failing to acquire the resource, is woken up when its predecessor finishes so it can retry the acquire, and keeps propagating the wake-up signal backwards under certain conditions, until an awakened successor fails to acquire the resource.

In essence, the block-and-wait mechanism of both exclusive and shared mode is based on the CLH lock, with AQS adding its own conditional parking strategy on top. A minimal shared-mode synchronizer sketch follows below.
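To make the shared-mode contract concrete, here is a minimal sketch, not JDK source: the class name OneShotLatch and its methods are made up for illustration. It is a one-shot latch built on AQS shared mode, where state == 1 means "closed" and state == 0 means "open"; any number of threads can park in acquireShared until a single releaseShared opens the gate, with the wake-up propagating through the queue as described above.

import java.util.concurrent.locks.AbstractQueuedSynchronizer;

public class OneShotLatch {
    private static final class Sync extends AbstractQueuedSynchronizer {
        Sync() {
            setState(1);                 // start "closed": the resource is missing
        }

        @Override
        protected int tryAcquireShared(int ignored) {
            // >= 0 means success; a negative value means "no resource, go park in the waiting queue"
            return getState() == 0 ? 1 : -1;
        }

        @Override
        protected boolean tryReleaseShared(int ignored) {
            setState(0);                 // open the latch
            return true;                 // always let doReleaseShared propagate the wake-up
        }
    }

    private final Sync sync = new Sync();

    /** Blocks until the latch is opened. */
    public void await() throws InterruptedException {
        sync.acquireSharedInterruptibly(1);
    }

    /** Opens the latch and wakes up all waiting threads. */
    public void open() {
        sync.releaseShared(1);
    }
}

Because tryReleaseShared always returns true here, releaseShared always falls through to doReleaseShared, which starts the backward propagation described above.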

Extensions: Implementation Details

Node blocking judgment

As described above, after a node joins the queue it keeps spinning to check whether it can acquire the resource, and enters the thread-waiting state under certain conditions. In AQS, whether the thread should enter the waiting state is decided uniformly (in both exclusive and shared mode) by shouldParkAfterFailedAcquire. The relevant code is posted first:

/**
 * Checks and updates status for a node that failed to acquire.
 * Returns true if thread should block. This is the main signal
 * control in all acquire loops.  Requires that pred == node.prev.
 *
 * @param pred node's predecessor holding status
 * @param node the node
 * @return {@code true} if thread should block
 */
private static boolean shouldParkAfterFailedAcquire(Node pred, Node node) {
    int ws = pred.waitStatus;
    if (ws == Node.SIGNAL)
        /*
         * This node has already set status asking a release
         * to signal it, so it can safely park.
         */
        return true;
    if (ws > 0) {
        /*
         * Predecessor was cancelled. Skip over predecessors and
         * indicate retry.
         */
        do {
            node.prev = pred = pred.prev;
        } while (pred.waitStatus > 0);
        pred.next = node;
    } else {
        /*
         * waitStatus must be 0 or PROPAGATE.  Indicate that we
         * need a signal, but don't park yet.  Caller will need to
         * retry to make sure it cannot acquire before parking.
         */
        compareAndSetWaitStatus(pred, ws, Node.SIGNAL);
    }
    return false;
}

The shouldParkAfterFailedAcquire method checks and updates the node status after a failed acquire, and finally decides whether the node may enter the waiting state. Returning true means the node may park; returning false means it may not yet, in which case the caller keeps invoking this method from the outer loop until it returns true. The method distinguishes three cases:

  1. If the predecessor node's waitStatus is SIGNAL, the node may enter the waiting state; return true
  2. If the predecessor node's waitStatus is greater than 0 (the predecessor has been cancelled), relink the current node to a valid predecessor (searching forward along the prev links) and return false, meaning it cannot park yet
  3. If the predecessor node's waitStatus is 0 or PROPAGATE, CAS it to SIGNAL and return false, meaning it cannot park yet

In the third case the predecessor's status is updated to SIGNAL via CAS; since the CAS is not guaranteed to succeed, the method still returns false to indicate that the current node cannot park yet. AQS then keeps calling this method from the outer loop (after the next failed acquire) until it returns true.

In other words, the waiting state can only be entered when, on entry to shouldParkAfterFailedAcquire, the predecessor's status is already SIGNAL; in the other cases the method performs its repair/update and returns false, meaning the node cannot park yet.

Combined with the semantics of the SIGNAL state described above and the conditions under which unparkSuccessor is called, it is not hard to see why parking is only allowed when the predecessor is in the SIGNAL state: a head node wakes its successor only if its own status is SIGNAL, so a node must make sure its predecessor's status has been set to SIGNAL before it parks.

Node Wakeup Propagation

In shared mode, the backward-propagating wake-up signal is implemented mainly by doReleaseShared, but it is never called directly; it is triggered indirectly via the releaseShared method and the setHeadAndPropagate method (both mentioned above). To understand the propagation more deeply, the author analyses it starting from these two methods.

  • For the releaseShared method, it is mainly used to wake up nodes that are still waiting for resources after the held resources have been released successfully:

/**
 * Releases in shared mode.  Implemented by unblocking one or more
 * threads if {@link #tryReleaseShared} returns true.
 *
 * @param arg the release argument.  This value is conveyed to
 *        {@link #tryReleaseShared} but is otherwise uninterpreted
 *        and can represent anything you like.
 * @return the value returned from {@link #tryReleaseShared}
 */
@ReservedStackAccess
public final boolean releaseShared(int arg) {
    // Check whether the resource can be released
    if (tryReleaseShared(arg)) {
        // Propagate the wake-up signal backwards
        doReleaseShared();
        return true;
    }
    return false;
}

  • For the setHeadAndPropagate method, it is mainly used to propagate the wake-up signal backwards when resources may still remain after the head node has successfully acquired:

/**
 * Sets head of queue, and checks if successor may be waiting
 * in shared mode, if so propagating if either propagate > 0 or
 * PROPAGATE status was set.
 *
 * @param node the node
 * @param propagate the return value from a tryAcquireShared
 */
private void setHeadAndPropagate(Node node, int propagate) {
    Node h = head; // Record old head for check below
    // Set the current node as the head node (dequeuing the original head node); see above for details
    setHead(node);
    /*
     * Try to signal next queued node if:
     *   Propagation was indicated by caller,
     *     or was recorded (as h.waitStatus either before
     *     or after setHead) by a previous operation
     *     (note: this uses sign-check of waitStatus because
     *      PROPAGATE status may transition to SIGNAL.)
     * and
     *   The next node is waiting in shared mode,
     *     or we don't know, because it appears null
     *
     * The conservatism in both of these checks may cause
     * unnecessary wake-ups, but only when there are multiple
     * racing acquires/releases, so most need signals now or soon
     * anyway.
     */
    // Decide whether to propagate the wake-up signal backwards
    if (propagate > 0 || h == null || h.waitStatus < 0 ||
        (h = head) == null || h.waitStatus < 0) {
        Node s = node.next;
        // Check whether the successor node is in shared mode
        if (s == null || s.isShared())
            // Propagate the wake-up signal backwards
            doReleaseShared();
    }
}

By the design of shared mode, as long as there may be remaining resources the wake-up signal is propagated backwards, so that more nodes (threads) can acquire resources and run.

For releaseShared, propagating the wake-up signal backwards after a successful release, so that the successor can grab the just-released resource, is easy to understand. By contrast, the condition under which setHeadAndPropagate propagates the wake-up signal is harder to follow, so the author extracts that part of the code separately:

/*
 * Try to signal next queued node if:
 *   Propagation was indicated by caller,
 *     or was recorded (as h.waitStatus either before
 *     or after setHead) by a previous operation
 *     (note: this uses sign-check of waitStatus because
 *      PROPAGATE status may transition to SIGNAL.)
 * and
 *   The next node is waiting in shared mode,
 *     or we don't know, because it appears null
 *
 * The conservatism in both of these checks may cause
 * unnecessary wake-ups, but only when there are multiple
 * racing acquires/releases, so most need signals now or soon
 * anyway.
 */
 if (propagate > 0 || h == null || h.waitStatus < 0 || (h = head) == null || h.waitStatus < 0) 

From the code above it follows that if any one of the conditions holds, doReleaseShared is executed to propagate the wake-up signal backwards. The author separates the five conditions and analyses them one by one:

  1. propagate > 0, allow backward propagation if propagate > 0
  2. h == null, allowing backward propagation when the original head node is null
  3. h.waitStatus < 0, allowing backward propagation when the waitStatus of the original head node < 0
  4. (h = head) == null, allowing backward propagation if the new head node is null
  5. h.waitStatus < 0, allowing backward propagation when the waitStatus of the new head node < 0

Note that, due to short-circuit evaluation, each condition is only reached when all the conditions before it are false.

The first condition is easy to understand: by the definition of shared mode, the signal can be propagated backwards whenever resources remain (propagate > 0). Conditions 2 to 5 are less intuitive, because when they are reached propagate <= 0, which on its own does not match the definition of shared mode. The key is that under concurrency the propagate value may already be stale and the head node may be changing at that very moment; if only propagate were checked, propagation could stop while resources are actually available, leaving waiting nodes never woken up.

That is, when a thread acquires the resource in doAcquireShared and passes propagate = 0 into setHeadAndPropagate, and the node still appears to have a successor (indicated by waitStatus < 0), a propagation is performed conservatively anyway, because another thread may be releasing resources through the old head node at that very moment.

In addition, once propagation is allowed, the method further checks whether the successor node is in shared mode. If the successor is null, doReleaseShared is still triggered, because a node may be in the middle of enqueueing when next == null (next == null does not necessarily mean the end of the queue; see above for the reason). Again, propagation is performed under a conservative strategy.

To sum up, these conservative boundary checks under concurrency keep the queue moving correctly and promptly; as the JDK comment puts it, the checks are conservative and may cause unnecessary wake-ups.

After the above checks pass, doReleaseShared is executed; next, let's analyse how doReleaseShared actually propagates the wake-up signal backwards.

/**
 * Release action for shared mode -- signals successor and ensures
 * propagation. (Note: For exclusive mode, release just amounts
 * to calling unparkSuccessor of head if it needs signal.)
 */
private void doReleaseShared() {
    /*
     * Ensure that a release propagates, even if there are other
     * in-progress acquires/releases.  This proceeds in the usual
     * way of trying to unparkSuccessor of head if it needs
     * signal. But if it does not, status is set to PROPAGATE to
     * ensure that upon release, propagation continues.
     * Additionally, we must loop in case a new node is added
     * while we are doing this. Also, unlike other uses of
     * unparkSuccessor, we need to know if CAS to reset status
     * fails, if so rechecking.
     */
    for (;;) {
        Node h = head;
        if (h != null && h != tail) {
            int ws = h.waitStatus;
            // If the head node's status is SIGNAL(-1) (a successor node exists), reset it from SIGNAL(-1) to INIT(0)
            if (ws == Node.SIGNAL) {
                // Clear the head node's status
                if (!compareAndSetWaitStatus(h, Node.SIGNAL, 0))
                    continue;            // loop to recheck cases
                // Wake up the successor node
                unparkSuccessor(h);
            }
            // If the head node's status is INIT(0) (no wakeable successor node), set it from INIT(0) to PROPAGATE(-3)
            else if (ws == 0 &&
                     !compareAndSetWaitStatus(h, 0, Node.PROPAGATE))
                continue;                // loop on failed CAS
        }
        // If the old head equals the new head, the awakened successor failed to acquire (or there was no successor to wake); exit the loop
        if (h == head)                   // loop if head changed
            break;
    }
}

Here the author summarizes the corresponding execution steps of the doReleaseShared method:

  1. Check whether the current head node has a successor node; if not, skip to step 3
  2. Check whether the current head node's status meets the propagation conditions; if not, skip to step 3. If the head's status is SIGNAL(-1) (a wakeable successor exists), reset it from SIGNAL(-1) to INIT(0) and wake up its successor; if the head's status is INIT(0) (no wakeable successor), set it from INIT(0) to PROPAGATE(-3)
  3. Check whether the head node has changed (by evaluating h == head, where h is the snapshot taken at the start of the iteration and head is the latest head node); if it has changed, jump back to step 1, otherwise the method ends

When an awakened successor acquires the resource successfully, the head node changes (h != head) and the loop continues; when there is no successor or the awakened successor fails to acquire, the head does not change (h == head) and the loop exits. This captures the essence of shared mode: as long as there is a successor, the wake-up signal keeps propagating until an awakened successor fails to acquire the resource (one call can therefore wake multiple waiting nodes).

A question arises here: why does shared mode need to change status 0 to PROPAGATE(-3)? Put differently, why does the PROPAGATE(-3) state need to exist at all?

Let's look at the description of PROPAGATE(-3): "A releaseShared should be propagated to other nodes. This is set (for head node only) in doReleaseShared to ensure propagation continues, even if other operations have since intervened." Simply put, a releaseShared call should propagate the wake-up signal backwards, and setting the status to PROPAGATE(-3) guarantees that the propagation is not lost. Conversely, without the PROPAGATE(-3) state the propagation could break off, but the description does not explain exactly how that failure happens.

Digging further, the author found the answer: a certain build of JDK 6 (before the fix) had the problem that "concurrent execution of the releaseShared method may leave some waiting nodes (threads) never woken up". Doug Lea submitted a fix for it, "6801020: Concurrent Semaphore release may cause some require thread not signaled", and the commit describes the main solution as "Introduce PROPAGATE waitStatus", i.e. the PROPAGATE state was introduced.

Semaphore here is the JDK's semaphore utility class, implemented on top of the AQS shared mode; a small usage sketch follows.
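As a quick usage illustration before diving into the bug analysis (the SemaphoreDemo class below is made up for illustration; the Semaphore API is the JDK's): acquire() maps to AQS#acquireSharedInterruptibly and release() to AQS#releaseShared, so with 3 permits at most three threads hold the "resource" at the same time while the rest park in the waiting queue.

import java.util.concurrent.Semaphore;

public class SemaphoreDemo {
    public static void main(String[] args) {
        // 3 permits: at most three threads may hold the "resource" at the same time
        Semaphore permits = new Semaphore(3);

        Runnable task = () -> {
            try {
                permits.acquire();                 // AQS#acquireSharedInterruptibly: parks when no permit is left
                try {
                    Thread.sleep(100);             // simulate work while holding a permit
                } finally {
                    permits.release();             // AQS#releaseShared: wakes a waiting thread, if any
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        for (int i = 0; i < 10; i++)
            new Thread(task).start();
    }
}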

Now that we know why PROPAGATE was introduced, let's follow Doug Lea's idea to reproduce the bug, and then see how introducing the PROPAGATE state fixes it.

First, the unit test that triggers the bug is posted here (the author removed the non-core parts for readability):

// Unit test (fair is the fairness flag iterated over in the original JDK regression test)
Semaphore sem = new Semaphore(0, fair);

Runnable blocker = () -> {
    try {
        sem.acquire();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
};
Runnable signaller = () -> sem.release();

Thread b1 = new Thread(blocker);
Thread b2 = new Thread(blocker);
Thread s1 = new Thread(signaller);
Thread s2 = new Thread(signaller);

Thread[] threads = { b1, b2, s1, s2 };
for (Thread thread : threads)
    thread.start();
for (Thread thread : threads) {
    thread.join(60 * 1000);
}
if (sem.availablePermits() != 0)
    throw new Error(String.valueOf(sem.availablePermits()));
if (sem.hasQueuedThreads())
    throw new Error(String.valueOf(sem.hasQueuedThreads()));
if (sem.getQueueLength() != 0)
    throw new Error(String.valueOf(sem.getQueueLength()));

Next, let's take a look at the code that caused the BUG before the fix, which are the setHeadAndPropagate method and the releaseShared method:

// Before the fix
private void setHeadAndPropagate(Node node, int propagate) {
    setHead(node);
    if (propagate > 0 && node.waitStatus != 0) {
        Node s = node.next;
        if (s == null || s.isShared())
            unparkSuccessor(node);
    }
} 

public final boolean releaseShared(int arg) {
    if (tryReleaseShared(arg)) {
        Node h = head;
        if (h != null && h.waitStatus != 0)
            unparkSuccessor(h);
        return true;
    }
    return false;
}

In the old versions of setHeadAndPropagate and releaseShared there is no call to doReleaseShared; instead, unparkSuccessor is invoked directly after checking propagate and waitStatus (and there is no PROPAGATE status at all).

Based on this old version, the author now walks through the following scenario:

  1. First, threads b1 and b2 each execute the Semaphore#acquire method (that is, AQS#doAcquireShared). Because the semaphore is initialized with 0 permits (no resources), both acquires fail and both threads are enqueued as nodes in the waiting queue.
     Step 1: state = 0; queue: head -> Node1 (virtual) <-> Node2(b1) <-> Node3(b2)
  2. Then thread s1 executes the Semaphore#release method (that is, AQS#releaseShared): AQS#tryReleaseShared releases 1 permit (state + 1), and because the head node Node1 has waitStatus == SIGNAL (not 0), the successor node Node2(b1) is woken up and the head's waitStatus is reset to 0.
     Step 2: state = 1; queue unchanged; Node2(b1) is now running
  3. Then thread b1 executes AQS#tryAcquireShared, takes 1 permit (state - 1), and reaches the entry of AQS#setHeadAndPropagate.
     Step 3: state = 0; at the critical point in the pre-fix setHeadAndPropagate: head = Node1, node = Node2(b1), propagate = 0
  4. Then thread s2 executes the Semaphore#release method (AQS#releaseShared): AQS#tryReleaseShared releases 1 permit (state + 1), but because the head is still Node1 and its waitStatus == 0, the condition for waking the successor is not met, so the method ends without waking anyone.
     Step 4: state = 1; no successor woken
  5. Finally, thread b1 sets itself as the head node, but because the propagate snapshot passed in is 0, the propagate > 0 && node.waitStatus != 0 condition fails and no successor is woken; execution simply ends.
     Step 5: state = 1; head = Node2(b1); Node3(b2) is never woken

As a result, propagation of the wake-up signal stops even though 1 permit is still available, and node Node3(b2) stays in the queue forever (interruption aside).

To solve this, Doug Lea introduced the PROPAGATE(-3) state and modified the setHeadAndPropagate and releaseShared methods accordingly. The fixed code is as follows:

// After the fix
private void setHeadAndPropagate(Node node, int propagate) {
    Node h = head; 
    setHead(node);
    if (propagate > 0 || h == null || h.waitStatus < 0) {
        Node s = node.next;
        if (s == null || s.isShared())
            doReleaseShared();
    }
}

public final boolean releaseShared(int arg) {
    if (tryReleaseShared(arg)) {
        doReleaseShared();
        return true;
    }
    return false;
}

private void doReleaseShared() {
    for (;;) {
        Node h = head;
        if (h != null && h != tail) {
            int ws = h.waitStatus;
            if (ws == Node.SIGNAL) {
                if (!compareAndSetWaitStatus(h, Node.SIGNAL, 0))
                    continue;            // loop to recheck cases
                unparkSuccessor(h);
            }
            else if (ws == 0 &&
                        !compareAndSetWaitStatus(h, 0, Node.PROPAGATE))
                continue;                // loop on failed CAS
        }
        if (h == head)                   // loop if head changed
            break;
    }
}

Next, let's re-run the same scenario with the fixed code:

  1. First, threads b1 and b2 each execute the Semaphore#acquire method (that is, AQS#doAcquireShared). Because the semaphore is initialized with 0 permits, both acquires fail and both threads are enqueued in the waiting queue.
     Step 1: state = 0; queue: head -> Node1 (virtual) <-> Node2(b1) <-> Node3(b2)
  2. Then thread s1 executes the Semaphore#release method (AQS#releaseShared): AQS#tryReleaseShared releases 1 permit, so doReleaseShared runs. Since the head node Node1 has waitStatus == SIGNAL, its waitStatus is reset to 0 and the successor node Node2(b1) is woken up.
     Step 2: state = 1; Node2(b1) is now running
  3. Then thread b1 executes AQS#tryAcquireShared, takes 1 permit (state - 1), and reaches the entry of AQS#setHeadAndPropagate.
     Step 3: state = 0; at the critical point in the fixed setHeadAndPropagate: head = Node1, node = Node2(b1), propagate = 0
  4. Then thread s2 executes the Semaphore#release method (AQS#releaseShared): AQS#tryReleaseShared releases 1 permit, so doReleaseShared runs again. Because the head is still Node1 and its waitStatus == 0, the second branch is taken and the head's status is set from 0 to PROPAGATE(-3); the method then ends because the head has not changed.
     Step 4: state = 1; head Node1's waitStatus = PROPAGATE(-3)
  5. Then thread b1 sets itself as the head node. Since the old head (variable h) has waitStatus = PROPAGATE(-3), the h.waitStatus < 0 condition holds and doReleaseShared is entered. There, the new head Node2 has waitStatus == SIGNAL, so it is reset to 0 and the successor node Node3(b2) is woken up.
     Step 5: state = 1; head = Node2(b1); Node3(b2) woken
  6. Finally, thread b2 executes AQS#tryAcquireShared, takes 1 permit (state - 1), enters AQS#setHeadAndPropagate and sets Node3(b2) as the head node. Since there is no successor to wake (propagate == 0 && waitStatus == 0), execution simply ends.
     Step 6: state = 0; head = Node3(b2)

Thus, after introducing the PROPAGATE(-3) state, AQS no longer suffers from the broken-propagation problem.

Thinking: while analysing PROPAGATE, one notices that it is only ever set in doReleaseShared and is never compared against explicitly; it is only picked up via waitStatus < 0 checks, essentially existing for the condition in setHeadAndPropagate. This raises a question: is the PROPAGATE state strictly necessary, or could it be replaced by SIGNAL if we ignore the difference in semantics?

blocking principle

When using AQS, the low-level waiting-queue methods (such as acquireQueued, doAcquireShared, etc.) are generally not called directly but indirectly through the upper-layer wrapper methods. Next, let's analyse the blocking principle from that upper layer.

exclusive blocking

In AQS, blocking acquisition in exclusive mode is done through the acquire or acquireInterruptibly methods: when the resource is available the call acquires it and the caller proceeds with its logic, and when the resource is unavailable the call blocks.

/**
 * Acquires in exclusive mode, ignoring interrupts.  Implemented
 * by invoking at least once {@link #tryAcquire},
 * returning on success.  Otherwise the thread is queued, possibly
 * repeatedly blocking and unblocking, invoking {@link
 * #tryAcquire} until success.  This method can be used
 * to implement method {@link Lock#lock}.
 *
 * @param arg the acquire argument.  This value is conveyed to
 *        {@link #tryAcquire} but is otherwise uninterpreted and
 *        can represent anything you like.
 */
public final void acquire(int arg) {
    if (!tryAcquire(arg) &&
        acquireQueued(addWaiter(Node.EXCLUSIVE), arg))
        selfInterrupt();
}

/**
 * Acquires in exclusive mode, aborting if interrupted.
 * Implemented by first checking interrupt status, then invoking
 * at least once {@link #tryAcquire}, returning on
 * success.  Otherwise the thread is queued, possibly repeatedly
 * blocking and unblocking, invoking {@link #tryAcquire}
 * until success or the thread is interrupted.  This method can be
 * used to implement method {@link Lock#lockInterruptibly}.
 *
 * @param arg the acquire argument.  This value is conveyed to
 *        {@link #tryAcquire} but is otherwise uninterpreted and
 *        can represent anything you like.
 * @throws InterruptedException if the current thread is interrupted
 */
public final void acquireInterruptibly(int arg)
        throws InterruptedException {
    if (Thread.interrupted())
        throw new InterruptedException();
    if (!tryAcquire(arg))
        doAcquireInterruptibly(arg);
}

The core flow of acquire and acquireInterruptibly is the same: first try to acquire the resource via tryAcquire (whose exact semantics are defined by the implementor); if that succeeds, return immediately; if it fails, wrap the thread in a Node, insert it into the waiting queue, and block there via acquireQueued or doAcquireInterruptibly.

After the thread that holds the resource finishes, it must call the corresponding release method to release it; in exclusive mode this wakes up one subsequent waiting thread, which then retries the acquire.

/**
 * Releases in exclusive mode.  Implemented by unblocking one or
 * more threads if {@link #tryRelease} returns true.
 * This method can be used to implement method {@link Lock#unlock}.
 *
 * @param arg the release argument.  This value is conveyed to
 *        {@link #tryRelease} but is otherwise uninterpreted and
 *        can represent anything you like.
 * @return the value returned from {@link #tryRelease}
 */
public final boolean release(int arg) {
    if (tryRelease(arg)) {
        Node h = head;
        if (h != null && h.waitStatus != 0)
            unparkSuccessor(h);
        return true;
    }
    return false;
}

The release method first tries to release the resource via tryRelease (whose semantics are defined by the implementor); if the release succeeds it wakes up the successor node (when the condition is met) and returns true, otherwise it returns false and exits.

shared blocking

Blocking acquisition in shared mode is done through the acquireShared method: when the resource is available, it is acquired and the wake-up signal keeps propagating so that more waiting threads can acquire resources and run their logic; when the resource is unavailable, the call falls through to doAcquireShared and blocks.

/**
 * Acquires in shared mode, ignoring interrupts.  Implemented by
 * first invoking at least once {@link #tryAcquireShared},
 * returning on success.  Otherwise the thread is queued, possibly
 * repeatedly blocking and unblocking, invoking {@link
 * #tryAcquireShared} until success.
 *
 * @param arg the acquire argument.  This value is conveyed to
 *        {@link #tryAcquireShared} but is otherwise uninterpreted
 *        and can represent anything you like.
 */
public final void acquireShared(int arg) {
    if (tryAcquireShared(arg) < 0)
        doAcquireShared(arg);
}

The acquireShared method first tries to acquire the resource via tryAcquireShared (whose semantics are defined by the implementor) and returns immediately on success; on failure, doAcquireShared wraps the thread in a Node, inserts it into the waiting queue, and blocks.

After the thread that holds the resource finishes, it must call the corresponding releaseShared method; in shared mode this wakes up a subsequent waiting thread to retry the acquire and, when the conditions are met, keeps propagating the wake-up signal so that more waiting threads can acquire resources and run.

/**
 * Releases in shared mode.  Implemented by unblocking one or more
 * threads if {@link #tryReleaseShared} returns true.
 *
 * @param arg the release argument.  This value is conveyed to
 *        {@link #tryReleaseShared} but is otherwise uninterpreted
 *        and can represent anything you like.
 * @return the value returned from {@link #tryReleaseShared}
 */
public final boolean releaseShared(int arg) {
    if (tryReleaseShared(arg)) {
        doReleaseShared();
        return true;
    }
    return false;
}

The releaseShared method first tries to release the resource via tryReleaseShared (whose semantics are defined by the implementor); if the release succeeds it wakes up the successor node (propagating the wake-up signal backwards when the conditions are met) and returns true, otherwise it returns false and exits.

Note that although the queueing mechanism of acquireQueued, doAcquireInterruptibly, doAcquireShared and friends is fair, a carelessly implemented synchronizer/lock can still end up unfair. This is because every call to acquire/acquireShared executes tryAcquire/tryAcquireShared first; if that succeeds, the resource is taken immediately without ever entering the queue, effectively barging in ahead of the longest-waiting thread. Therefore, to implement a fair synchronizer/lock, the fairness has to be enforced inside tryAcquire/tryAcquireShared, as the sketch below shows.
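Here is a minimal sketch of that idea (a toy, not the JDK's ReentrantLock; the class name FairMutex is made up): a fair, non-reentrant exclusive lock whose fairness comes entirely from the hasQueuedPredecessors() check inside tryAcquire, which makes a newly arriving thread give way to threads already queued instead of barging in.

import java.util.concurrent.locks.AbstractQueuedSynchronizer;

public class FairMutex {
    private static final class Sync extends AbstractQueuedSynchronizer {
        @Override
        protected boolean tryAcquire(int acquires) {
            // Give up immediately if another thread is already queued ahead of us.
            if (hasQueuedPredecessors())
                return false;
            // state 0 -> 1 means "lock acquired"
            if (compareAndSetState(0, 1)) {
                setExclusiveOwnerThread(Thread.currentThread());
                return true;
            }
            return false;
        }

        @Override
        protected boolean tryRelease(int releases) {
            if (getExclusiveOwnerThread() != Thread.currentThread())
                throw new IllegalMonitorStateException();
            setExclusiveOwnerThread(null);
            setState(0);          // publish the release before successors are unparked
            return true;
        }

        @Override
        protected boolean isHeldExclusively() {
            return getState() == 1 && getExclusiveOwnerThread() == Thread.currentThread();
        }
    }

    private final Sync sync = new Sync();

    public void lock()   { sync.acquire(1); }
    public void unlock() { sync.release(1); }
}

Dropping the hasQueuedPredecessors() check and CASing the state straight away is what makes a lock unfair; this mirrors the difference between ReentrantLock's FairSync and NonfairSync.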

condition queue

AQS supports not only the synchronous blocking mechanism but also a conditional waiting mechanism. The latter is realized through condition variables (also known as condition queues); to stay consistent with the terminology above, they are uniformly called condition queues below.

The concept of a condition variable comes from the monitor: a queue of nodes (threads) associated with a mutex. Each node (thread) in the queue waits without holding the associated mutex (thereby allowing other nodes/threads to acquire it) until a certain condition is met.

Condition interface

The condition queues of AQS are implemented based on the Condition interface. The following author sorts out some key points of the Condition interface from the official documents:

  1. A Condition instance must be bound to a Lock; the binding is usually established by creating the Condition via Lock#newCondition.
  2. Condition gives us a way to release the associated Lock and suspend the current thread until another thread signals that the condition is met: this is the condition queue.

In Java, Object#monitor (that is, Object#wait() and Object#notify()) has a similar function to Condition, except that Object#monitor needs to be used in conjunction with synchronized, while Condition needs to be used in conjunction with Lock. In addition, the final implemented Condition may not be exactly the same as Object#monitor in terms of behavior and semantics (such as order guarantee, lock holding guarantee, etc.), which depends on the specific implementation.

In Condition, two types of methods are mainly defined, namely, the await type method used to let the thread enter the waiting state and the signal type method used to notify the waiting thread. Let's take a look at how Condition defines them:

  • await class method
  • void await() throws InterruptedException;
  • The await variants (interruptible, uninterruptible, timed) share essentially the same core flow, so they are not expanded individually here.
  • The definition of await is officially described as follows: executing await will automatically release the Lock associated with the Condition and let the current thread enter the waiting state until the thread is notified or awakened by an interrupt. Among them, there are the following methods to wake up the waiting thread:
    • Other threads execute Condition#signal(), and the current thread is just selected to wake up
    • Other threads execute Condition#signalAll() to wake up all waiting threads
    • Another thread executes Thread#interrupt() to interrupt the current thread (this rule does not apply to awaitUninterruptibly)
    • The current thread waits for a timeout to be woken up (if the method has a timeout setting, such as awaitNanos, awaitUntil)
    • The current thread is spuriously woken up
  • Note that after a waiting thread is woken up, it must successfully re-acquire the Lock associated with the Condition before it can return from the await method that put it to sleep; otherwise it keeps waiting (await guarantees the Lock is held when it returns)
  • signal class method
  • void signal(); void signalAll();
  • The functions of signal and signalAll were already touched on in the await definition: signal wakes one waiting thread and signalAll wakes them all, but whichever of the two wakes a thread, that thread must re-acquire the Lock before it can return from the await call that put it to sleep. A minimal usage sketch follows below.
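To tie the await/signal contract together, here is a minimal usage sketch (the Gate class and its method names awaitOpen/open are made up for illustration, not JDK API): a one-shot gate built on ReentrantLock and its Condition, where waiting threads release the lock while parked on the condition queue and are all woken by signalAll().

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class Gate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition opened = lock.newCondition();   // bound to this lock
    private boolean open = false;

    public void awaitOpen() throws InterruptedException {
        lock.lock();
        try {
            // Always wait in a loop: this guards against spurious wake-ups.
            while (!open)
                opened.await();          // releases the lock while parked
        } finally {
            lock.unlock();
        }
    }

    public void open() {
        lock.lock();
        try {
            open = true;
            opened.signalAll();          // wakes every waiter; each re-acquires the lock before returning
        } finally {
            lock.unlock();
        }
    }
}

The while loop around await() is the standard guard against the spurious wake-ups mentioned above.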

ConditionObject implementation

After understanding the definition of condition queue by AQS, let's take a look at its specific implementation ConditionObject.

public abstract class AbstractQueuedSynchronizer
    extends AbstractOwnableSynchronizer
    implements java.io.Serializable {

    public class ConditionObject implements Condition, java.io.Serializable {

        /** First node of condition queue. */
        private transient Node firstWaiter;
        /** Last node of condition queue. */
        private transient Node lastWaiter;

        public ConditionObject() {}
    }
}

ConditionObject reuses the waiting queue's Node data structure to implement the condition queue: the nodes are linked through the extra nextWaiter field, and ConditionObject points at the head and tail of this chain via firstWaiter and lastWaiter, forming the condition queue. The structure is shown below:

The conditional queue is a one-way linked list implemented through the nextWaiter link, while the waiting queue is a doubly linked list implemented through the next link and the prev link.

condition queue:

                 Node                     Node                      Node                     Node
           +---------------+        +---------------+         +---------------+        +---------------+
           | thread        |        | thread        |         | thread        |        | thread        |
           | waitStatus    |        | waitStatus    |         | waitStatus    |        | waitStatus    |
           | prev          |        | prev          |         | prev          |        | prev          |
           | next          |        | next          |         | next          |        | next          |
           |               |        |               |         |               |        |               |
           |               |        |               |         |               |        |               |
           |   nextWaiter+--------->+   nextWaiter+---------->+   nextWaiter+--------->+   nextWaiter+--------->null
           +-------+-------+        +---------------+         +---------------+        +--------+------+
                   ^                                                                            ^
                   |                                                                            |
 firstWaiter+------+                                                                            +-------------+lastWaiter

After understanding the data structure of ConditionObject, let's take a look at how it implements the await class method and signal class method defined on Condition.

await wait

While holding the Lock, executing the await method puts the thread into the conditional waiting state and releases the held Lock. The code related to await is posted below:

The prerequisite for using ConditionObject is to hold the Lock, that is to say, the await method can only be called when the Lock is held.

public class ConditionObject implements Condition, java.io.Serializable {

    public final void await() throws InterruptedException {
        if (Thread.interrupted())
            throw new InterruptedException();
        // 1. Link the current node (thread) into the condition queue
        Node node = addConditionWaiter();
        // 2. Release the resources held by the current node (thread) and wake up its successor (if any)
        int savedState = fullyRelease(node);
        int interruptMode = 0;
        // 3. Spin to check whether the current node (thread) has been transferred to the waiting (sync) queue
        while (!isOnSyncQueue(node)) {
            LockSupport.park(this);
            // If an interrupt occurred, break out of the loop
            if ((interruptMode = checkInterruptWhileWaiting(node)) != 0)
                break;
        }
        // 4. Re-acquire on the waiting queue, i.e. perform the dequeue operation for the current node (thread)
        if (acquireQueued(node, savedState) && interruptMode != THROW_IE)
            interruptMode = REINTERRUPT;
        // 5. Remove cancelled nodes from the condition queue (cleanup)
        if (node.nextWaiter != null) // clean up if cancelled
            unlinkCancelledWaiters();
        // 6. Handle any interrupt that occurred while waiting (re-interrupt / throw InterruptedException)
        if (interruptMode != 0)
            reportInterruptAfterWait(interruptMode);
    }
}

The execution of the await method can be mainly divided into 6 steps:

  1. Link the current node (thread) to the conditional queue
  2. Release the resources held by the current node (thread) and wake up its successor node (if any)
  3. Spin to check whether the current node (thread) has been transferred to the waiting queue. If so, the conditional wait ends and step 4 runs (an interrupt also ends the conditional wait); if not, keep waiting on the condition and repeat step 3 when woken
  4. Perform the dequeue operation for the current node (thread) on the waiting queue. If the dequeue succeeds, the resource has been re-acquired; continue with step 5. If it fails, block and wait (spin + park, see above for details) and repeat step 4 when woken
  5. Remove the cancel node from the condition queue (perform cleanup operation)
  6. Handle interrupts that occur while waiting (re-mark interrupts/throw interrupt exceptions)

On a ConditionObject-based condition queue, await may only be performed while holding the resource exclusively (which is what allows the node to be transferred from the waiting queue to the condition queue); that is why the await method contains hardly any CAS operations.

Although the await method is divided into 6 steps, the core of conditional blocking is only 3: link the current node into the condition queue, release (remove) the current node from the waiting queue, and block the current node while it sits in the condition queue.

  1. Link the current node (thread) to the conditional queue
  2. /** * Adds a new waiter to wait queue. * @return its new wait node */ private Node addConditionWaiter() { Node t = lastWaiter; // If lastWaiter is canceled, clean out. if (t != null && t .waitStatus != Node.CONDITION) { unlinkCancelledWaiters(); t = lastWaiter; } // Construct and add the node to the condition queue Node node = new Node(Thread.currentThread(), Node.CONDITION); if (t == null) firstWaiter = node; else t.nextWaiter = node; lastWaiter = node; return node; }
  3. In the addConditionWaiter method, the normal queue tail node will be obtained first (if the queue tail node is canceled, the cleaning operation will be performed), and then the current thread will be constructed as a Node node linked to the queue tail of the condition queue (here, the node status will be set It is in the CONDITION state, indicating that it is in the condition queue).
  4. It should be noted here that because the operations of the conditional queue are all carried out under the condition that the resource is exclusively held, the nodes in the queue do not have a critical state. That is to say, the waitStatus of the node in the condition queue has only two situations. One is the CONDITION state, which means staying in the condition queue; the other is the CANCELLED state, which means it has been cancelled. Therefore, when a node in a non-CONDITION state is encountered in the addConditionWaiter method, it is directly cleaned up as a cancel node.
  5. Among them, when it is found that the node at the end of the queue is in the CANCELLED state (that is, the node is cancelled), execute the unlinkCancelledWaiters method to clean up the canceled node:
  6. /** * Unlinks cancelled waiter nodes from condition queue. * Called only while holding lock. This is called when * cancellation occurred during condition wait, and upon * insertion of a new waiter when lastWaiter is seen to have * been cancelled. This method is needed to avoid garbage * retention in the absence of signals. So even though it may * require a full traversal, it comes into play only when * timeouts or cancellations occur in the absence of * signals. It traverses all nodes rather than stopping at a * particular target to unlink all pointers to garbage nodes * without requiring many re-traversals during cancellation * storms. */ private void unlinkCancelledWaiters() { Node t = firstWaiter; // trail用于记录被取消节点的前驱节点 Node trail = null; while (t != null) { // 获取当前节点的后继节点 Node next = t.nextWaiter; // 如果当前节点被取消了 if (t.waitStatus != Node.CONDITION) { // Clear the link with the successor node t.nextWaiter = null; // If the current node is the head node, directly use firstWaiter to point to the successor node of the canceled node (equivalent to throwing away the current canceled node) if (trail == null ) firstWaiter = next; // Otherwise, point the predecessor node to the successor node (equivalent to losing the current cancellation node) else trail.nextWaiter = next; // If the successor node is null, use lastWaiter to point to the predecessor node directly (equivalent to losing the current cancellation node ) if (next == null) lastWaiter = trail; } else // record the predecessor node of the canceled node trail = t; // continue to traverse the next node t = next; } } copy code
  7. The general idea of ​​the unlinkCancelledWaiters method is no different from deleting nodes from the singly linked list, that is, it traverses from the head node to the tail node one by one, and cleans up (removes) the canceled nodes during the period. Specifically, the author has marked comments at key positions in the code, so I won't expand here.
  8. Release the resources held by the current node (thread) and wake up its successor node
  9. After the thread is constructed as a Node node linked to the conditional queue, the next step is to perform a release operation on the nodes holding resources in the waiting queue. Note that the node holding the resource in exclusive mode is the head (of the waiting queue).
  10. /** * Invokes release with current state value; returns saved state. * Cancels node and throws exception on failure. * @param node the condition node for this wait * @return previous sync state */ final int fullyRelease(Node node) { boolean failed = true; try { // Get the state value int savedState = getState(); // Release the state if (release(savedState)) { failed = false; // Return if successful, the outer layer is saved, and then wake up to get } else { throw new IllegalMonitorStateException(); } } finally { if (failed) node.waitStatus = Node.CANCELLED; } }
  11. In the fullyRelease method, the resource will be released by calling the release method, and the successor node will be awakened to kick the current head node out of the queue. (The execution logic of the specific release method can be read above)
  12. By the conventional definition of Condition, the resources a thread holds are released when it blocks on the condition queue (so that other threads can acquire them). Of course, this ultimately depends on our implementation; if an implementation removes this behaviour, that must be clearly stated in its usage documentation.
  13. Block the current node (thread) until it is woken up
  14. After the node has been transferred from the waiting queue to the condition queue, the next step is to put the current node (thread) into a blocked state. This is driven by the check in the isOnSyncQueue method: as long as that check is not satisfied (i.e. the node is not yet back on the waiting queue), LockSupport#park is executed to block the current node (thread); a short LockSupport sketch follows after this list.
/**
 * Returns true if a node, always one that was initially placed on
 * a condition queue, is now waiting to reacquire on sync queue.
 * @param node the node
 * @return true if is reacquiring
 */
final boolean isOnSyncQueue(Node node) {
    // 1. If the node's status is still CONDITION, it is still on the condition queue, so return false
    if (node.waitStatus == Node.CONDITION)
        return false;
    // 2. If the node's predecessor (prev link) is null, it cannot be on the waiting queue, so return false
    //    (at this point waitStatus != CONDITION)
    if (node.prev == null)
        return false;
    // 3. If the node has a successor, it must already be on the waiting queue, so return true
    //    (at this point waitStatus != CONDITION && node.prev != null)
    if (node.next != null) // If has successor, it must be on queue
        return true;
    /*
     * node.prev can be non-null, but not yet on queue because
     * the CAS to place it on queue can fail. So we have to
     * traverse from tail to make sure it actually made it. It
     * will always be near the tail in calls to this method, and
     * unless the CAS failed (which is unlikely), it will be
     * there, so we hardly ever traverse much.
     */
    // 4. Otherwise, search the waiting queue from the tail towards the head to see whether the node is on it
    return findNodeFromTail(node);
}

/**
 * Returns true if node is on sync queue by searching backwards from tail.
 * Called only when needed by isOnSyncQueue.
 * @return true if present
 */
private boolean findNodeFromTail(Node node) {
    Node t = tail;
    for (;;) {
        if (t == node)
            return true;
        if (t == null)
            return false;
        t = t.prev;
    }
}
  16. In the isOnSyncQueue method, various boundary conditions are used to judge whether the current node exists in the waiting queue, and the judgment conditions can be divided into 4 points:
    1. Determine whether the current node status is CONDITION, if so, prove that it exists in the condition queue, and return false
    2. Because the waitStatus of the nodes in the condition queue is CONDITION, this condition can prove that the node does not exist in the waiting queue
    3. Determine whether the predecessor node (prev link) of the current node is null, if it is, it proves that it does not exist in the waiting queue, and returns false
    4. Because a thread waiting on the condition can be woken up spuriously, and the signal (wake-up) method may happen to be running concurrently and have executed just up to the critical line marked below:
boolean transferForSignal(Node node) {
    if (!compareAndSetWaitStatus(node, Node.CONDITION, 0))
        return false;
    // o <== the critical point: execution has reached exactly here
    Node p = enq(node);
    int ws = p.waitStatus;
    if (ws > 0 || !compareAndSetWaitStatus(p, ws, Node.SIGNAL))
        LockSupport.unpark(node.thread);
    return true;
}
    6. It is not hard to see that at this moment the node's waitStatus is no longer CONDITION, yet it has not actually been placed on the waiting queue. For this case, AQS additionally checks whether the node's predecessor (prev link) is null to decide whether it is on the waiting queue. (On the waiting queue, every node except the head has a predecessor; even when the queue is empty, a dummy head node is created to act as the predecessor. For details, please read the relevant section above.)
    7. Check whether the current node's successor (next link) is non-null; if it is non-null, the node must already be on the waiting queue, so return true
    8. This is because when inserting into the waiting queue (for example via the enq method), the prev link is set first, and the next link is only set after the CAS succeeds (to keep the insertion atomic). In other words, a non-null prev link does not yet prove that the node has been inserted into the waiting queue (the CAS may have failed), whereas a non-null next link does prove it. For details, please read the relevant section above
    9. Check whether the current node is on the waiting queue by traversing from the tail node towards the head; return true if it is found, otherwise return false
    10. If none of the conditions above settles the question, it cannot be determined directly whether the node is on the waiting queue, so the waiting queue is traversed node by node from the tail towards the head to check whether the current node is present.
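To make the blocking step above concrete, here is a minimal, self-contained sketch of the LockSupport#park / LockSupport#unpark pairing that the await method relies on (the class name ParkDemo and the timing are only for illustration, not AQS source): one thread parks itself, and another thread unparks it later, just as transferForSignal does.

import java.util.concurrent.locks.LockSupport;

public class ParkDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread waiter = new Thread(() -> {
            System.out.println("waiter: parking");
            // Blocks the current thread until another thread unparks it
            // (or it is interrupted, or a spurious wakeup occurs, which is why AQS re-checks in a loop)
            LockSupport.park();
            System.out.println("waiter: unparked, continuing");
        });
        waiter.start();

        Thread.sleep(500);          // give the waiter a moment to park
        LockSupport.unpark(waiter); // wake it up, as transferForSignal does
        waiter.join();
    }
}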

signal wake up

In the case of holding the Lock, executing the signal method will wake up the node with the longest (conditional) waiting time, and executing the signalAll method will wake up all nodes that are waiting for the condition.

The prerequisite for using ConditionObject is to hold the Lock, that is to say, the signal method or signalAll method can only be called when the Lock is held.

public class ConditionObject implements Condition, java.io.Serializable {
    /**
     * Moves the longest-waiting thread, if one exists, from the
     * wait queue for this condition to the wait queue for the
     * owning lock.
     *
     * @throws IllegalMonitorStateException if {@link #isHeldExclusively}
     *         returns {@code false}
     */
    public final void signal() {
        if (!isHeldExclusively())
            throw new IllegalMonitorStateException();
        // Get the first node in the condition queue
        Node first = firstWaiter;
        if (first != null)
            doSignal(first);
    }

    /**
     * Moves all threads from the wait queue for this condition to
     * the wait queue for the owning lock.
     *
     * @throws IllegalMonitorStateException if {@link #isHeldExclusively}
     *         returns {@code false}
     */
    public final void signalAll() {
        if (!isHeldExclusively())
            throw new IllegalMonitorStateException();
        // Get the first node in the condition queue
        Node first = firstWaiter;
        if (first != null)
            doSignalAll(first);
    }
}

Here, the isHeldExclusively method is used to determine whether the current thread holds the resource exclusively. If not, an exception IllegalMonitorStateException is thrown.
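As a quick illustration of this prerequisite, here is a minimal sketch using ReentrantLock, whose newCondition returns an AQS ConditionObject (the class name SignalDemo is only for illustration). Calling signal without holding the lock fails with IllegalMonitorStateException, while calling it while holding the lock is fine:

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class SignalDemo {
    public static void main(String[] args) {
        Lock lock = new ReentrantLock();
        Condition condition = lock.newCondition();

        try {
            condition.signal(); // not holding the lock: isHeldExclusively() is false
        } catch (IllegalMonitorStateException e) {
            System.out.println("signal without the lock: " + e);
        }

        lock.lock();
        try {
            condition.signal(); // holding the lock: no waiter exists, so this is simply a no-op
        } finally {
            lock.unlock();
        }
    }
}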

The signal and signalAll methods delegate the actual wake-up logic to the doSignal and doSignalAll methods (provided that the resource is held exclusively and the condition queue is not empty).

/**
 * Removes and transfers nodes until hit non-cancelled one or
 * null. Split out from signal in part to encourage compilers
 * to inline the case of no waiters.
 * @param first (non-null) the first node on condition queue
 */
private void doSignal(Node first) {
    // Traverse backwards one by one, starting from the head node
    do {
        // Point the firstWaiter link at the node's successor
        if ( (firstWaiter = first.nextWaiter) == null)
            // If the condition queue held only this one node, also set lastWaiter to null
            // (firstWaiter has already been set to null)
            lastWaiter = null;
        // Clear the node's nextWaiter link, removing it from the condition queue
        first.nextWaiter = null;
    } while (!transferForSignal(first) // Wake up the waiting node; if the transfer succeeds, exit the loop (the longest-waiting node has been woken), otherwise keep looping
                && (first = firstWaiter) != null); // Point first at the next node and keep looping (until the end of the queue)
}

/**
 * Removes and transfers all nodes.
 * @param first (non-null) the first node on condition queue
 */
private void doSignalAll(Node first) {
    // Set both the firstWaiter and lastWaiter links to null
    lastWaiter = firstWaiter = null;
    // Traverse backwards one by one, starting from the head node
    do {
        // Get the current node's successor
        Node next = first.nextWaiter;
        // Clear the node's nextWaiter link, removing it from the condition queue
        first.nextWaiter = null;
        // Wake up the waiting node (transfer it to the waiting queue)
        transferForSignal(first);
        // Point first at the next node and keep looping (until the end of the queue)
        first = next;
    } while (first != null);
}

The doSignal and doSignalAll methods traverse backwards one by one from the first node passed in (the head of the condition queue) and wake up waiting nodes by calling the transferForSignal method. Whether only the longest-waiting node is woken up or all waiting nodes are woken up depends on the semantics of the two methods. Let's now look at how the transferForSignal method, which actually performs the wake-up, is implemented:

/**
 * Transfers a node from a condition queue onto sync queue.
 * Returns true if successful.
 * @param node the node
 * @return true if successfully transferred (else the node was
 * cancelled before signal)
 */
final boolean transferForSignal(Node node) {
    /*
     * If cannot change waitStatus, the node has been cancelled.
     */
    // First change the node's status from CONDITION to 0; if the node has been cancelled, this CAS fails and false is returned
    if (!compareAndSetWaitStatus(node, Node.CONDITION, 0))
        return false;

    /*
     * Splice onto queue and try to set waitStatus of predecessor to
     * indicate that thread is (probably) waiting. If cancelled or
     * attempt to set waitStatus fails, wake up to resync (in which
     * case the waitStatus can be transiently and harmlessly wrong).
     */
    // Insert the node into the waiting queue
    Node p = enq(node);
    int ws = p.waitStatus;
    // If the predecessor node has been cancelled, or its status cannot be changed to SIGNAL, wake up the node (thread) directly
    if (ws > 0 || !compareAndSetWaitStatus(p, ws, Node.SIGNAL))
        LockSupport.unpark(node.thread);
    return true;
}

The transferForSignal method is mainly divided into 3 steps:

  1. Change the status of the current node from CONDITION to 0, if it fails, return false directly (indicating that the current node is cancelled)
  2. Insert the current node into the waiting queue via the enq method (the caller has already unlinked the node from the condition queue, so at this point the transfer is considered successful)
  3. If the predecessor node has been cancelled, or its status cannot be changed to SIGNAL, wake up the current node (thread) directly via the LockSupport#unpark method

For insertion into the waiting queue, the node's predecessor must be set to the SIGNAL state before the insertion is truly complete (indicating that there is a successor that needs to be woken up; for details, please read the relevant section above). So in step 3, if the predecessor is successfully set to SIGNAL, the node has been successfully transferred from the condition queue to the waiting queue; if the predecessor has been cancelled or setting SIGNAL fails, the current node (thread) is woken up via LockSupport#unpark so that it can correct the anomaly itself (skip the cancelled node or reset its predecessor's status). Finally, execution returns to step 3 of the await method and continues:

public final void await() throws InterruptedException {
    if (Thread.interrupted())
        throw new InterruptedException();
    
    //...

    // 3. Spin, checking whether the current node (thread) is on the waiting queue
    while (!isOnSyncQueue(node)) {
        LockSupport.park(this);
        // ...
    }
    // 4. Perform the dequeue/acquire operation for the current node (thread) in the waiting queue
    if (acquireQueued(node, savedState) && interruptMode != THROW_IE)
        interruptMode = REINTERRUPT;

    // ...
}

Besides the case where calling the signal or signalAll method transfers the node back to the waiting queue and an anomaly occurs (which unparks the thread so that it re-runs the step-3 check), the thread is also woken up when the node holding the resource finishes its work and, on release, wakes up its successor; this likewise makes step 3 of the await method continue. In both cases the step-3 loop is then exited (because the node has already been transferred back to the waiting queue), and the acquireQueued method is executed to correct, wait and re-acquire (the detailed logic can be read in the relevant section above).

In general, executing the await method essentially transfers the node from the waiting queue to the condition queue, while executing the signal method essentially transfers the node from the condition queue back to the waiting queue; which queue a node currently belongs to is judged by the Node's waitStatus.
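To tie the two transfers together, here is a minimal usage sketch on top of ReentrantLock (the class OneSlotMailbox and its fields are only for illustration): take() calls await in a loop while its predicate is false, releasing the lock and parking on the condition queue, while put() calls signal, transferring one waiter back to the waiting queue so that it can re-acquire the lock and re-check the predicate.

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// A one-slot mailbox: take() waits on the condition queue until put() signals.
class OneSlotMailbox<T> {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();
    private T value; // null means empty

    public void put(T v) {
        lock.lock();
        try {
            value = v;
            notEmpty.signal(); // transfer one waiter from the condition queue back to the waiting queue
        } finally {
            lock.unlock();
        }
    }

    public T take() throws InterruptedException {
        lock.lock();
        try {
            while (value == null)  // the loop guards against spurious wakeups
                notEmpty.await();  // release the lock and park on the condition queue
            T v = value;
            value = null;
            return v;
        } finally {
            lock.unlock();
        }
    }
}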

Summary

This article has worked through AQS from the ground up, covering its data structures, the waiting queue and the condition queue. Although AQS is intricate internally, there is not much that a user of this synchronization/blocking framework actually has to do. Below, the author lists how AQS is used from the user's point of view.

Semantic implementation

When implementing a synchronizer on top of AQS, our first task is to implement the resource acquisition/release semantics. The methods to implement are shown in the following table:

| Method | Description |
| --- | --- |
| tryAcquire | Acquires the resource in exclusive mode (EXCLUSIVE). Returning true means the acquisition succeeded, false means it failed. In the implementation we should check whether the resource can currently be acquired in exclusive mode. |
| tryRelease | Releases the resource in exclusive mode (EXCLUSIVE). Returning true means the resource is now fully released; otherwise the release failed or was only partial. |
| tryAcquireShared | Acquires the resource in shared mode (SHARED). A positive return value means the acquisition succeeded and subsequent shared acquisitions may also succeed; zero means it succeeded but no subsequent shared acquisition can succeed; a negative value means it failed. In the implementation we should check whether the resource can currently be acquired in shared mode. |
| tryReleaseShared | Releases the resource in shared mode (SHARED). Returning true means the release succeeded, false means it failed. |
| isHeldExclusively | Reports whether the resource is held exclusively by the current thread. Returning true means it is, false means it is not. |

To use the AQS condition-wait mechanism, we must implement the isHeldExclusively method, because methods such as signal and signalAll rely on it. Conversely, if we do not use Condition (and do not need it elsewhere), isHeldExclusively can be left unimplemented.

The above is all the work we need to complete when using AQS. We don't need to care about the implementation details related to waiting queues or conditional queues, because this is the work of AQS itself.

Method usage

When implementing the semantics above, or when we need to monitor an AQS-based synchronizer, we can use the methods AQS already provides to get the job done faster. The author lists them below, grouped by purpose:

  • Methods for manipulating state
| Method | Description |
| --- | --- |
| getState | Gets the current state value |
| setState | Sets the state value |
| compareAndSetState | Updates the state value via CAS |
  • These methods are protected and final, used mainly (and only) inside AQS implementation classes, and cannot be overridden. They are generally used to implement the acquire/release semantics for the resource.
  • Methods for manipulating execution threads
| Method | Description |
| --- | --- |
| getExclusiveOwnerThread | Gets the thread that currently owns exclusive access |
| setExclusiveOwnerThread | Sets the thread that currently owns exclusive access |
  • These methods are mainly used to implement the exclusive mode (EXCLUSIVE); reentrancy can be implemented by comparing the current thread with the saved owner thread (a combined sketch follows after this list).
  • Methods for acquiring/releasing resources
| Method | Description |
| --- | --- |
| tryAcquire | Acquires the resource in exclusive mode (EXCLUSIVE); true means success, false means failure. |
| tryRelease | Releases the resource in exclusive mode (EXCLUSIVE); true means the resource is now fully released, otherwise the release failed or was only partial. |
| tryAcquireShared | Acquires the resource in shared mode (SHARED); a positive return value means success and subsequent shared acquisitions may also succeed, zero means success but no subsequent shared acquisition can succeed, a negative value means failure. |
| tryReleaseShared | Releases the resource in shared mode (SHARED); true means success, false means failure. |
| isHeldExclusively | Reports whether the resource is held exclusively by the current thread. |
| acquire | Acquires the resource in exclusive mode (EXCLUSIVE); blocks (enters the waiting queue) on failure until the acquisition succeeds. |
| acquireInterruptibly | Acquires the resource in exclusive mode (EXCLUSIVE); blocks (enters the waiting queue) on failure until the acquisition succeeds or the thread is interrupted and an exception is thrown. |
| tryAcquireNanos | Acquires the resource in exclusive mode (EXCLUSIVE) within the given time; blocks (enters the waiting queue) on failure until the acquisition succeeds or the thread is interrupted and an exception is thrown. Returns true if acquired within the given time, false on timeout. |
| release | Releases the resource in exclusive mode (EXCLUSIVE); returns true on success, false otherwise. |
| acquireShared | Acquires the resource in shared mode (SHARED); blocks (enters the waiting queue) on failure until the acquisition succeeds (compared with exclusive mode, multiple threads may hold the resource at the same time). |
| acquireSharedInterruptibly | Acquires the resource in shared mode (SHARED); blocks (enters the waiting queue) on failure until the acquisition succeeds or the thread is interrupted and an exception is thrown (compared with exclusive mode, multiple threads may hold the resource at the same time). |
| tryAcquireSharedNanos | Acquires the resource in shared mode (SHARED) within the given time; blocks (enters the waiting queue) on failure until the acquisition succeeds or the thread is interrupted and an exception is thrown. Returns true if acquired within the given time, false on timeout (compared with exclusive mode, multiple threads may hold the resource at the same time). |
| releaseShared | Releases the resource in shared mode (SHARED); returns true on success, false otherwise. |
  • Of these, the try*/isHeldExclusively hooks are the protected methods described in the previous section, while the acquire*/release* methods are public and are what AQS implementation classes (such as ReentrantLock) actually call to acquire/release resources.
  • Methods for viewing the waiting queue
| Method | Description |
| --- | --- |
| hasQueuedThreads | Checks whether there are currently waiting nodes (threads) in the waiting queue. Note that the head node is not a waiting node, so it is not counted; and because interrupts and timeouts can cancel nodes at any time, the result is not guaranteed to be accurate. |
| hasContended | Checks whether there has ever been a waiting node (thread) in the waiting queue, i.e. whether there has ever been contention. |
| getFirstQueuedThread | Returns the node (thread) that has been waiting the longest in the waiting queue, or null if there is none. Note that the head node is not a waiting node, so it is not counted. |
| isQueued | Checks whether the given node (thread) is currently in the waiting queue. |
| hasQueuedPredecessors | Checks whether any node (thread) in the waiting queue has been waiting longer than the current one, i.e. whether there is a waiting node (thread) ahead of the current node (thread). Note that the head node is not a waiting node, so it is not counted; and because interrupts and timeouts can cancel nodes at any time, the result is not guaranteed to be accurate. |
  • These methods are public, and are mainly used in the AQS implementation class to decide whether resources can be acquired/released (depending on the semantics), or to monitor the AQS waiting queue.
  • In addition, AQS contains a non-public method apparentlyFirstQueuedIsExclusive (package-private), which checks whether the first waiting node (thread) in the waiting queue is waiting in exclusive mode. It is currently used only by ReentrantReadWriteLock, which is also why it is package-private (ReentrantReadWriteLock and AQS live in the same package).
  • Methods for monitoring the waiting queue
| Method | Description |
| --- | --- |
| getQueueLength | Gets the length of the waiting queue (an estimate). |
| getQueuedThreads | Gets the list of nodes (threads) in the waiting queue (an estimate). |
| getExclusiveQueuedThreads | Gets the list of nodes (threads) waiting in exclusive mode (EXCLUSIVE) (an estimate). |
| getSharedQueuedThreads | Gets the list of nodes (threads) waiting in shared mode (SHARED) (an estimate). |
  • These methods are public and are generally used to monitor the state of the AQS waiting queue. Their return values are estimates: the waiting queue is a linked list, so these methods have to traverse it node by node, and the queue can change while the traversal is in progress, so what we finally get is only an estimate.
  • Methods for monitoring conditional queues
| Method | Description |
| --- | --- |
| owns | Checks whether the given Condition (condition queue) belongs to the current AQS instance, i.e. whether it was created by it. |
| hasWaiters | Checks whether the given Condition (condition queue) currently has waiting nodes (threads) (an estimate); throws an exception if the Condition does not belong to the current AQS instance. |
| getWaitQueueLength | Gets the number of nodes (threads) currently waiting on the given Condition (condition queue) (an estimate); throws an exception if the Condition does not belong to the current AQS instance. |
| getWaitingThreads | Gets the collection of nodes (threads) currently waiting on the given Condition (condition queue) (an estimate); throws an exception if the Condition does not belong to the current AQS instance. |
  • These methods are public and are generally used to monitor the state of AQS condition queues. Their return values are estimates for the same reason as above: the condition queue is a linked list that must be traversed node by node, and it can change while the traversal is in progress.
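As promised above, here is a minimal sketch of how the state methods and the owner-thread methods combine to give a reentrant exclusive synchronizer (the class name ReentrantSync is only for illustration; this is a simplified sketch, not the real ReentrantLock source): the state counts the number of holds, and the saved owner thread decides whether a repeated acquire by the same thread is allowed.

import java.util.concurrent.locks.AbstractQueuedSynchronizer;

// A reentrant, exclusive-mode synchronizer sketch.
class ReentrantSync extends AbstractQueuedSynchronizer {

    @Override
    protected boolean tryAcquire(int acquires) {
        Thread current = Thread.currentThread();
        int c = getState();
        if (c == 0) {
            // Free: try to take it atomically
            if (compareAndSetState(0, acquires)) {
                setExclusiveOwnerThread(current);
                return true;
            }
        } else if (current == getExclusiveOwnerThread()) {
            // Reentrant acquire: only the owner reaches here, so a plain setState is safe
            setState(c + acquires);
            return true;
        }
        return false;
    }

    @Override
    protected boolean tryRelease(int releases) {
        if (Thread.currentThread() != getExclusiveOwnerThread())
            throw new IllegalMonitorStateException();
        int c = getState() - releases;
        boolean free = (c == 0);
        if (free)
            setExclusiveOwnerThread(null);
        setState(c);
        return free; // true only when the last hold has been released
    }

    @Override
    protected boolean isHeldExclusively() {
        return getExclusiveOwnerThread() == Thread.currentThread();
    }
}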

Implementation example

Finally, the author reproduces the AQS implementation examples given in the official documentation; we can read them against the sections above to deepen our understanding of AQS.

/*
 * <p>Here is a non-reentrant mutual exclusion lock class that uses
 * the value zero to represent the unlocked state, and one to
 * represent the locked state. While a non-reentrant lock
 * does not strictly require recording of the current owner
 * thread, this class does so anyway to make usage easier to monitor.
 * It also supports conditions and exposes
 * one of the instrumentation methods:
 */
class Mutex implements Lock, java.io.Serializable {

    // Our internal helper class
    private static class Sync extends AbstractQueuedSynchronizer {
        // Reports whether in locked state
        protected boolean isHeldExclusively() {
            return getState() == 1;
        }


        // Acquires the lock if state is zero
        public boolean tryAcquire(int acquires) {
            assert acquires == 1; // Otherwise unused
            if (compareAndSetState(0, 1)) {
                setExclusiveOwnerThread(Thread.currentThread());
                return true;
            }
            return false;
        }


        // Releases the lock by setting state to zero
        protected boolean tryRelease(int releases) {
            assert releases == 1; // Otherwise unused
            if (getState() == 0) throw new IllegalMonitorStateException();
            setExclusiveOwnerThread(null);
            setState(0);
            return true;
        }


        // Provides a Condition
        Condition newCondition() { return new ConditionObject(); }


        // Deserializes properly
        private void readObject(ObjectInputStream s)
            throws IOException, ClassNotFoundException {
            s.defaultReadObject();
            setState(0); // reset to unlocked state
        }
    }


    // The sync object does all the hard work. We just forward to it.
    private final Sync sync = new Sync();


    public void lock()                { sync.acquire(1); }
    public boolean tryLock()          { return sync.tryAcquire(1); }
    public void unlock()              { sync.release(1); }
    public Condition newCondition()   { return sync.newCondition(); }
    public boolean isLocked()         { return sync.isHeldExclusively(); }
    public boolean hasQueuedThreads() { return sync.hasQueuedThreads(); }
    public void lockInterruptibly() throws InterruptedException {
        sync.acquireInterruptibly(1);
    }
    public boolean tryLock(long timeout, TimeUnit unit)
        throws InterruptedException {
        return sync.tryAcquireNanos(1, unit.toNanos(timeout));
    }
}
/* 
 * <p>Here is a latch class that is like a
 * {@link java.util.concurrent.CountDownLatch CountDownLatch}
 * except that it only requires a single {@code signal} to
 * fire. Because a latch is non-exclusive, it uses the {@code shared}
 * acquire and release methods.
 */
class BooleanLatch {

    private static class Sync extends AbstractQueuedSynchronizer {
        boolean isSignalled() { return getState() != 0; }

        protected int tryAcquireShared(int ignore) {
            return isSignalled() ? 1 : -1;
        }

        protected boolean tryReleaseShared(int ignore) {
            setState(1);
            return true;
        }
    }

    private final Sync sync = new Sync();
    public boolean isSignalled() { return sync.isSignalled(); }
    public void signal()         { sync.releaseShared(1); }
    public void await() throws InterruptedException {
        sync.acquireSharedInterruptibly(1);
    }
}

The two examples implement exclusive mode (EXCLUSIVE) and shared mode (SHARED) respectively; the implementation details map onto the sections above, so they will not be repeated here. You may notice that in both examples the AQS subclass is a private internal class, and the enclosing class delegates to it. This point is also mentioned in the official documentation; the relevant note is quoted below:

/*
 * <p>Subclasses should be defined as non-public internal helper
 * classes that are used to implement the synchronization properties
 * of their enclosing class.  Class
 * {@code AbstractQueuedSynchronizer} does not implement any
 * synchronization interface.  Instead it defines methods such as
 * {@link #acquireInterruptibly} that can be invoked as
 * appropriate by concrete locks and related synchronizers to
 * implement their public methods.
 */

In other words, because AQS itself does not implement any lock/synchronizer interface, we should not expose it directly; instead, we implement the public methods of our lock/synchronizer by delegating to an internal AQS subclass.

Interfaces provide encapsulation: most callers only know the interface of the lock/synchronizer they use, so exposing AQS directly would be very unfriendly to them.
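To close, here is a brief usage sketch of the two example classes above (the class name AqsExamplesDemo, the timing and the printed messages are only for illustration): Mutex is used like any other Lock, and BooleanLatch blocks await() callers until signal() fires once.

import java.util.concurrent.TimeUnit;

public class AqsExamplesDemo {
    public static void main(String[] args) throws InterruptedException {
        // Exclusive mode: the Mutex defined above
        Mutex mutex = new Mutex();
        mutex.lock();
        try {
            System.out.println("holding the mutex, isLocked = " + mutex.isLocked());
        } finally {
            mutex.unlock();
        }

        // Shared mode: the BooleanLatch defined above
        BooleanLatch latch = new BooleanLatch();
        Thread waiter = new Thread(() -> {
            try {
                latch.await(); // blocks until someone calls signal()
                System.out.println("latch released");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.start();

        TimeUnit.MILLISECONDS.sleep(200);
        latch.signal(); // releaseShared(1): all current and future awaiters pass
        waiter.join();
    }
}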


Origin blog.csdn.net/m0_48922996/article/details/125818738