In-depth Understanding of the Java Memory Model (5): Locks

The happens-before relationship established by lock release and acquisition

Locks are the most important synchronization mechanism in Java concurrent programming. In addition to making a critical section execute mutually exclusively, a lock also lets the thread that releases it send a message to the thread that subsequently acquires the same lock.

The following is sample code for lock release and acquisition:

class MonitorExample {

    int a = 0;

    public synchronized void writer() {  //1
        a++;                             //2
    }                                    //3

    public synchronized void reader() {  //4

        int i = a;                       //5
        // ...
    }                                    //6
}

Suppose thread A executes the writer() method and thread B then executes the reader() method. According to the happens-before rules, the happens-before relationships in this process are as follows:

  1. By the program order rule: 1 happens-before 2, and 2 happens-before 3; 4 happens-before 5, and 5 happens-before 6.
  2. By the monitor lock rule: 3 happens-before 4.
  3. By the transitivity of happens-before: 2 happens-before 5.

The graphical representation of the above happens before relationship is as follows:

In the figure above, each arrow links two nodes that form a happens-before relationship: black arrows represent the program order rule, orange arrows the monitor lock rule, and blue arrows the happens-before guarantee obtained by combining these rules.

The figure shows that after thread A releases the lock and thread B subsequently acquires the same lock, 2 happens-before 5. Therefore, all shared variables that were visible to thread A before it released the lock become visible to thread B immediately after B acquires the same lock.


Memory semantics of lock release and acquisition

When a thread releases a lock, the JMM flushes the shared variables in that thread's local memory to main memory. Taking the MonitorExample program above as an example, after thread A releases the lock, the state of the shared data is as shown in the following diagram:

When a thread acquires a lock, the JMM invalidates that thread's local memory, so the critical-section code protected by the monitor must read shared variables from main memory. The following diagram shows the state after lock acquisition:

Comparing the memory semantics of lock release and acquisition with those of volatile writes and reads, we can see that a lock release has the same memory semantics as a volatile write, and a lock acquisition has the same memory semantics as a volatile read.
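To make this correspondence concrete, here is a minimal volatile-based sketch of the same write/read pattern as MonitorExample (an illustration of the analogy, not a listing from this series):

```java
class VolatileExample {
    int a = 0;
    volatile boolean flag = false;

    public void writer() {
        a = 1;            // ordinary write to a shared variable
        flag = true;      // volatile write: same memory semantics as releasing a lock
    }

    public int reader() {
        if (flag) {       // volatile read: same memory semantics as acquiring a lock
            return a;     // if flag was seen as true, a == 1 is guaranteed visible
        }
        return -1;        // writer() has not published yet
    }
}
```

If thread B's read of flag observes thread A's write, the happens-before guarantee makes the write to a visible, exactly as the monitor lock rule does for the synchronized methods above.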

The following is a summary of the memory semantics of lock release and lock acquisition:

  • When thread A releases a lock, it is in essence sending a message (the modifications it made to shared variables) to the next thread that will acquire the lock.
  • When thread B acquires a lock, it is in essence receiving a message (the modifications made to shared variables before the lock was released) sent by a previous thread.
  • When thread A releases the lock and thread B then acquires it, the process is in essence thread A sending a message to thread B through main memory.


Implementation of lock memory semantics

This article uses the source code of ReentrantLock to analyze the concrete implementation of lock memory semantics.

Please see the sample code below:

class ReentrantLockExample {

    int a = 0;
    ReentrantLock lock = new ReentrantLock();

    public void writer() {
        lock.lock();        // acquire the lock
        try {
            a++;
        } finally {
            lock.unlock();  // release the lock
        }
    }

    public void reader() {
        lock.lock();        // acquire the lock
        try {
            int i = a;
            // ...
        } finally {
            lock.unlock();  // release the lock
        }
    }
}

With ReentrantLock, calling the lock() method acquires the lock and calling the unlock() method releases it.

The implementation of ReentrantLock relies on the Java synchronizer framework AbstractQueuedSynchronizer (referred to as AQS in this article). AQS uses an integer volatile variable named state to maintain the synchronization state. As we will see shortly, this volatile variable is the key to ReentrantLock's memory semantics. The following is the class diagram of ReentrantLock (showing only the parts relevant to this article):

ReentrantLock comes in a fair and an unfair variant. We analyze the fair lock first.

With a fair lock, the call trace of the locking method lock() is as follows:

  1. ReentrantLock : lock()
  2. FairSync : lock()
  3. AbstractQueuedSynchronizer : acquire(int arg)
  4. ReentrantLock : tryAcquire(int acquires)

Locking actually begins in step 4. Here is the source code of that method:

protected final boolean tryAcquire(int acquires) {

    final Thread current = Thread.currentThread();
    int c = getState();   // start of lock acquisition: first read the volatile variable state

    if (c == 0) {
        if (isFirst(current) && compareAndSetState(0, acquires)) {
            setExclusiveOwnerThread(current);
            return true;
        }
    }
    else if (current == getExclusiveOwnerThread()) {
        int nextc = c + acquires;
        if (nextc < 0) { 
            throw new Error("Maximum lock count exceeded");
        }

        setState(nextc);
        return true;
    }

    return false;
}

From the source code above, we can see that the locking method reads the volatile variable state first.

With a fair lock, the call trace of the unlocking method unlock() is as follows:

  1. ReentrantLock : unlock()
  2. AbstractQueuedSynchronizer : release(int arg)
  3. Sync : tryRelease(int releases)

The lock is actually released in step 3. Here is the source code of that method:

protected final boolean tryRelease(int releases) {

    int c = getState() - releases;
    if (Thread.currentThread() != getExclusiveOwnerThread()){
        throw new IllegalMonitorStateException();
    }
    boolean free = false;
    if (c == 0) {
        free = true;
        setExclusiveOwnerThread(null);
    }
    setState(c);           // end of lock release: write the volatile variable state
    return free;
}


From the source code above, we can see that the volatile variable state is written at the end of the lock release.

So a fair lock writes the volatile variable state at the end of its release and reads the same volatile variable first during its acquisition. By the happens-before rule for volatile, the shared variables that were visible to the releasing thread before it wrote the volatile variable become visible to the acquiring thread immediately after that thread reads the same volatile variable.

Now we analyze the implementation of the memory semantics of unfair locks.

The release of an unfair lock is exactly the same as that of a fair lock, so we analyze only its acquisition here.

With an unfair lock, the call trace of the locking method lock() is as follows:

  1. ReentrantLock : lock()
  2. NonfairSync : lock()
  3. AbstractQueuedSynchronizer : compareAndSetState(int expect, int update)

Locking actually begins in step 3. Here is the source code of that method:

protected final boolean compareAndSetState(int expect, int update) {
    return unsafe.compareAndSwapInt(this, stateOffset, expect, update);
}

This method updates the state variable atomically. This article abbreviates this compareAndSet() operation as CAS. The JDK documentation describes the method as follows: if the current state value equals the expected value, atomically set the synchronization state to the given update value. This operation has the memory semantics of both a volatile read and a volatile write.
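The same CAS semantics are exposed publicly by the atomic classes. A minimal sketch using java.util.concurrent.atomic.AtomicInteger as a stand-in for AQS's internal state field (tryLock here is a hypothetical helper for illustration, not AQS API):

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasSketch {
    // plays the role of AQS's volatile int state: 0 = unlocked, 1 = locked
    static final AtomicInteger state = new AtomicInteger(0);

    // hypothetical helper mirroring the unfair lock's fast path
    static boolean tryLock() {
        // atomically set state from 0 to 1; succeeds only if state is still 0.
        // The CAS itself has volatile read + volatile write memory semantics.
        return state.compareAndSet(0, 1);
    }
}
```

The first call to tryLock() succeeds (state goes 0 to 1); any further call fails until state is reset, which is exactly the conditional update the JDK documentation describes.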

Here we analyze, from the perspectives of the compiler and the processor, how CAS can have the memory semantics of both a volatile read and a volatile write.

As mentioned earlier, the compiler does not reorder a volatile read with any memory operation that follows it, and it does not reorder a volatile write with any memory operation that precedes it. Combining these two rules, it follows that in order to have the memory semantics of both a volatile read and a volatile write, the compiler cannot reorder CAS with any memory operation before or after it.

Now let's analyze how CAS gets both sets of memory semantics on common Intel x86 processors.

Here is the source code of the compareAndSwapInt() method of the sun.misc.Unsafe class:

public final native boolean compareAndSwapInt(Object o, 
                                              long offset,
                                              int expected,
                                              int x);

You can see that this is a native method call. In OpenJDK, this native method is routed through unsafe.cpp and atomic.cpp, and its final implementation for the Windows/x86 platform is located at openjdk-7-fcs-src-b147-27_jun_2011/openjdk/hotspot/src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp. The following is the snippet of that source code corresponding to Intel x86 processors:

// Adding a lock prefix to an instruction on MP machine
// VC++ doesn't like the lock prefix to be on a single line
// so we can't insert a label after the lock prefix.
// By emitting a lock prefix, we can define a label after it.
#define LOCK_IF_MP(mp) __asm cmp mp, 0  \
                       __asm je L0      \
                       __asm _emit 0xF0 \
                       __asm L0:

inline jint Atomic::cmpxchg(jint exchange_value, volatile jint* dest, jint compare_value) {
  // alternative for InterlockedCompareExchange
  int mp = os::is_MP();
  __asm {
    mov edx, dest
    mov ecx, exchange_value
    mov eax, compare_value
    LOCK_IF_MP(mp)
    cmpxchg dword ptr [edx], ecx
  }
}

As the source code above shows, the program decides whether to add the lock prefix to the cmpxchg instruction according to the processor configuration. If the program runs on a multiprocessor machine, the lock prefix is added (lock cmpxchg); conversely, on a single processor the lock prefix is omitted, since a single processor maintains sequential consistency within itself and does not need the memory barrier effect the lock prefix provides.

Intel's manual describes the lock prefix as follows:

  1. It ensures that a read-modify-write operation on memory executes atomically. On Pentium and earlier processors, an instruction with the lock prefix locks the bus for the duration of its execution, temporarily preventing other processors from accessing memory through the bus, which is obviously expensive. Starting with the Pentium 4, Intel Xeon, and P6 processors, Intel made an important optimization over bus locking: if the memory region accessed by a lock-prefixed instruction is already cached inside the processor (that is, the cache line containing the region is in the exclusive or modified state) and the region is entirely contained in a single cache line, the processor executes the instruction directly and simply keeps that cache line locked while the instruction executes. Other processors then cannot read or write the region, so atomicity is still guaranteed. This is called cache locking, and it greatly reduces the execution cost of lock-prefixed instructions. However, when contention among processors is high, or the memory address accessed by the instruction is not aligned, the bus is still locked.
  2. It forbids reordering this instruction with preceding and following read and write instructions.
  3. It flushes all data in the write buffer to memory.

The memory barrier effects of points 2 and 3 are sufficient to realize the memory semantics of a volatile read and a volatile write at the same time.

After the above analysis, we can finally understand why the JDK document says that CAS has both volatile read and volatile write memory semantics.

Now let's summarize the memory semantics of fair and unfair locks:

  • When either a fair or an unfair lock is released, the volatile variable state is written at the end.
  • When a fair lock is acquired, the volatile variable is read first.
  • When an unfair lock is acquired, the volatile variable is first updated with CAS, which has the memory semantics of both a volatile read and a volatile write.

From this article's analysis of ReentrantLock, we can see that there are at least two ways to implement the memory semantics of lock release and acquisition:

  1. Use the memory semantics of volatile variable writes and reads.
  2. Use the volatile read and volatile write memory semantics attached to CAS.
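The second approach can be sketched as a toy spin lock (an illustration only, not how ReentrantLock is actually implemented): the successful CAS gives acquisition its volatile read-plus-write semantics, and a plain volatile write releases the lock.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        // CAS loop: the successful CAS has volatile read and write semantics,
        // so the acquiring thread sees everything published before unlock().
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait(); // busy-wait hint (Java 9+)
        }
    }

    public void unlock() {
        // volatile write: publishes all writes made in the critical section
        locked.set(false);
    }
}
```

Two threads incrementing a shared counter inside lock()/unlock() will never lose an update, because each acquisition observes the previous release's write to the volatile flag.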


Implementation of the concurrent package

Since Java's CAS has the memory semantics of both a volatile read and a volatile write, communication between Java threads can now take the following four forms:

  1. Thread A writes the volatile variable, and then thread B reads the volatile variable.
  2. Thread A writes a volatile variable, and then thread B uses CAS to update the volatile variable.
  3. Thread A uses CAS to update a volatile variable, and then thread B uses CAS to update the volatile variable.
  4. Thread A uses CAS to update a volatile variable, and then thread B reads the volatile variable.

Java's CAS uses the efficient machine-level atomic instructions provided by modern processors, which perform read-modify-write operations on memory atomically. This is the key to achieving synchronization on multiprocessors (in essence, a computing machine that supports atomic read-modify-write instructions is an asynchronous equivalent of a sequential Turing machine, so any modern multiprocessor supports some atomic instruction that performs a read-modify-write operation on memory). At the same time, volatile reads/writes and CAS can realize communication between threads. Putting these features together forms the cornerstone on which the entire concurrent package is built. If we carefully analyze the source code of the concurrent package, we find a generalized implementation pattern:

  1. First, declare the shared state as volatile;
  2. then use CAS's atomic conditional update to achieve synchronization between threads;
  3. and at the same time, use volatile reads/writes together with the volatile read and write memory semantics of CAS to realize communication between threads.
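The three steps above can be sketched as a minimal CAS-retry counter (an illustration of the pattern, not code from the concurrent package; AtomicInteger wraps a volatile int internally):

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasCounter {
    // step 1: the shared state is a volatile int (held inside AtomicInteger)
    private final AtomicInteger value = new AtomicInteger(0);

    public int increment() {
        for (;;) {                       // classic CAS retry loop
            int current = value.get();   // volatile read
            int next = current + 1;
            // step 2: CAS performs the atomic conditional update;
            // step 3: its volatile write semantics publish the new value
            if (value.compareAndSet(current, next)) {
                return next;
            }
            // another thread won the race; re-read and retry
        }
    }

    public int get() {
        return value.get();              // volatile read
    }
}
```

If the CAS fails because another thread updated the value concurrently, the loop simply re-reads and retries, so no increment is ever lost and no lock is ever taken.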

The basic classes of the concurrent package, namely AQS, the non-blocking data structures, and the atomic variable classes (the classes in the java.util.concurrent.atomic package), are all implemented using this pattern, and the high-level classes in the package are in turn built on these basic classes. Overall, the implementation of the concurrent package is structured as shown in the following diagram:



Thanks to the author for his contribution to this article

Cheng Xiaoming, Java software engineer, nationally certified systems analyst and information systems project manager. He focuses on concurrent programming and works at Fujitsu Nanda. Personal email: [email protected].
---------------------
Author: World coding
Source: CSDN
Original: https://blog.csdn.net/dgxin_605/article/details/86183651
