"The Art of Java Concurrent Programming" Notes (Part 1)

How to reduce context switching

Methods to reduce context switching include lock-free concurrent programming, CAS algorithms, using as few threads as possible, and using coroutines.

  • Lock-free concurrent programming: when multiple threads compete for a lock, context switching occurs, so shared data can often be processed without locks. For example, the IDs of the data can be partitioned by a hash so that different threads process different segments of the data (see the sketch after this list).
  • CAS algorithm: Java's Atomic package (java.util.concurrent.atomic) uses the CAS algorithm to update data without locking.
  • Use as few threads as possible: avoid creating unnecessary threads. If there are few tasks but many threads are created to process them, a large number of threads will sit in the waiting state.
  • Coroutines: implement multi-task scheduling within a single thread, and maintain the switching between multiple tasks inside that single thread.
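
A minimal sketch of the hash-segmentation idea, assuming each record is identified by a Long id and that handle() touches no shared mutable state (all names here are illustrative):

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.IntStream;

public class SegmentedProcessing {

    static final int SEGMENTS = 4;

    /** Each worker handles only the IDs whose hash maps to its segment,
     *  so no two threads ever touch the same record and no lock is needed. */
    static void process(List<Long> ids) {
        CompletableFuture<?>[] workers = IntStream.range(0, SEGMENTS)
            .mapToObj(segment -> CompletableFuture.runAsync(() ->
                ids.stream()
                   .filter(id -> Math.floorMod(id.hashCode(), SEGMENTS) == segment)
                   .forEach(SegmentedProcessing::handle)))
            .toArray(CompletableFuture[]::new);
        CompletableFuture.allOf(workers).join(); // wait for all segments to finish
    }

    static void handle(Long id) {
        // process one record; only one thread ever sees this id
    }
}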

How does volatile ensure visibility?

Let's use a tool to look at the assembly instructions that the JIT compiler generates for a write to a volatile variable on an x86 processor.

The Java code is as follows:

instance = new Singleton(); // instance is a volatile variable

The corresponding assembly code is as follows:

0x01a3de1d: movb $0x0,0x1104800(%esi);
0x01a3de24: lock addl $0x0,(%esp);

When writing to a shared variable modified by volatile, the second line of assembly code, with the lock prefix, appears.

Instructions with the Lock prefix cause two things to happen on multi-core processors:

  • 1) Write the data of the current processor's cache line back to system memory.
  • 2) This write-back invalidates the data cached at that memory address in other CPUs.

To improve processing speed, the processor does not communicate with memory directly. Instead, it first reads data from system memory into its internal cache (L1, L2, or other) before operating on it, and it is not known when the result will be written back to memory. If a write is performed on a variable declared volatile, the JVM sends a Lock-prefixed instruction to the processor, which writes the cache line containing the variable back to system memory. However, even after the write-back, the values cached by other processors may still be stale, which would cause problems in later computations. Therefore, on multiprocessors, a cache coherence protocol is implemented to keep each processor's cache consistent: each processor sniffs the data transmitted on the bus to check whether its cached values have expired. When a processor finds that the memory address corresponding to one of its cache lines has been modified, it marks that cache line invalid; the next time the processor wants to modify that data, it is forced to re-read it from system memory into its cache.

The following is a more detailed explanation of the two implementation principles of volatile.

  • 1) Lock-prefixed instructions cause the processor's cache to be written back to memory. A Lock-prefixed instruction causes the processor's LOCK# signal to be asserted while the instruction executes. In a multiprocessor environment, the LOCK# signal ensures that the processor occupies any shared memory exclusively while the signal is asserted. In recent processors, however, the LOCK# signal generally does not lock the bus but locks the cache, since locking the bus is relatively expensive. On older processors, the LOCK# signal is always asserted on the bus during a lock operation. But on P6 and current processors, if the memory area being accessed is already cached inside the processor, the LOCK# signal is not asserted; instead the processor locks the cache line for that memory area and writes it back to memory, relying on the cache coherence mechanism to ensure the atomicity of the modification. This is known as "cache locking"; the cache coherence mechanism prevents data in a memory region cached by two or more processors from being modified simultaneously.
  • 2) Writing one processor's cache back to memory invalidates the caches of other processors. IA-32 and Intel 64 processors use the MESI (Modified, Exclusive, Shared, Invalid) protocol to maintain consistency between their internal caches and the caches of other processors. On a multi-core processor system, IA-32 and Intel 64 processors can sniff other processors' accesses to system memory and to their internal caches, and they use this sniffing technique to keep their internal cache, system memory, and the caches of other processors consistent on the bus. For example, on Pentium and P6 family processors, if a processor sniffs that another processor intends to write to a memory address it currently holds in the shared state, the sniffing processor invalidates its cache line, forcing a cache line fill the next time it accesses the same memory address.

How does synchronized achieve synchronization?

Let's first look at the basis of using synchronized for synchronization: every object in Java can be used as a lock. Concretely, this takes the following three forms (sketched in the code below).

  • For an ordinary synchronized method, the lock is the current instance object; the thread must acquire the current instance's lock before entering the synchronized code.
  • For a static synchronized method, the lock is the Class object of the current class; the thread must acquire that Class object's lock before entering the synchronized code.
  • For a synchronized block, the lock is the object specified in the synchronized(...) parentheses; the thread must acquire that object's lock before entering the synchronized code.
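
A minimal sketch of the three forms (the class and field names here are illustrative):

public class Counter {
    private static int total;
    private int count;

    // 1) ordinary synchronized method: the lock is this Counter instance
    public synchronized void increment() {
        count++;
    }

    // 2) static synchronized method: the lock is Counter.class
    public static synchronized void incrementTotal() {
        total++;
    }

    // 3) synchronized block: the lock is the object in the parentheses
    public void incrementBoth() {
        synchronized (this) {
            count++;
        }
        synchronized (Counter.class) {
            total++;
        }
    }
}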

When a thread attempts to access a synchronized code block, it must first acquire the lock, and it must release the lock when it exits the block or throws an exception. So where does the lock live? What information is stored in it?

From the JVM specification you can see the implementation principle of synchronized in the JVM: the JVM implements both method synchronization and code-block synchronization based on entering and exiting a monitor object, but the implementation details of the two differ. Code-block synchronization is implemented with the monitorenter and monitorexit instructions, while method synchronization is implemented another way, whose details the JVM specification does not spell out. Method synchronization could, however, also be implemented with these two instructions.

The monitorenter instruction is inserted at the start of the synchronized code block after compilation, while monitorexit is inserted at the end of the block and at the exception exit points. The JVM must ensure that every monitorenter has a matching monitorexit paired with it. Any object has a monitor associated with it, and when that monitor is held, the object is in a locked state. When a thread executes the monitorenter instruction, it tries to obtain ownership of the monitor corresponding to the object, that is, it tries to acquire the object's lock.
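
This is easy to observe with javap. A minimal sketch (the class name is illustrative); compiling it and running javap -c on the class shows one monitorenter and two monitorexit instructions for increment(): one for the normal exit and one on the exception-handler path, which is how the JVM guarantees the lock is released even if the block throws:

public class SyncBlock {
    private int count;

    public void increment() {
        synchronized (this) { // compiles to a monitorenter instruction
            count++;
        }                     // compiles to monitorexit, plus a second
                              // monitorexit on the exception-handler path
    }
}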

The lock used by synchronized is stored in the Java object header. If the object is an array type, the virtual machine uses 3 words to store the object header; if the object is a non-array type, it uses 2 words. In a 32-bit virtual machine, 1 word equals 4 bytes, that is, 32 bits, as shown in Table 2-2.

[Table 2-2: Java object header layout]

The Mark Word in the Java object header stores the object's HashCode, generation age, and lock flag bits by default. The default storage structure of the Mark Word for a 32-bit JVM is shown below.

[Table: default storage structure of the Mark Word in a 32-bit JVM]

At run time, the data stored in the Mark Word changes as the lock flag changes. The Mark Word may change to store four kinds of data, as shown in Table 2-4.

[Table 2-4: states of the Mark Word under different lock flags]

Under a 64-bit virtual machine, the Mark Word is 64 bits in size, and its storage structure is shown in Figure 2-5.

[Figure 2-5: storage structure of the Mark Word in a 64-bit JVM]


How processors implement atomic operations

The 32-bit IA-32 processor implements atomic operations between multiple processors based on cache locking or bus locking. First, the processor automatically guarantees the atomicity of basic memory operations: reading or writing a single byte from system memory is atomic, meaning that while one processor reads a byte, other processors cannot access that byte's memory address. Pentium 6 and the latest processors can also automatically guarantee that single-processor 16/32/64-bit operations within the same cache line are atomic. But the processor cannot automatically guarantee atomicity for complex memory operations, such as accesses that cross the bus width, cross multiple cache lines, or cross page tables. Instead, the processor provides two mechanisms, bus locking and cache locking, to guarantee the atomicity of complex memory operations.

  • (1) Use bus locking to guarantee atomicity

    The first mechanism guarantees atomicity through a bus lock. The so-called bus lock uses a LOCK# signal provided by the processor: when one processor asserts this signal on the bus, the requests of other processors are blocked, so that processor can occupy the shared memory exclusively.

  • (2) Use cache locking to guarantee atomicity

    The second mechanism guarantees atomicity through cache locking. Often we only need the operation on one particular memory address to be atomic, but a bus lock locks down all communication between the CPUs and memory, preventing other processors from touching data at any other memory address while the lock is held, so the overhead of bus locking is relatively large. For this reason, current processors use cache locking instead of bus locking for optimization in certain situations.

Frequently used memory is cached in the processor's L1, L2, and L3 caches, so atomic operations can be performed directly in the processor's internal cache without asserting a bus lock. On Pentium 6 and more recent processors, "cache locking" can be used to achieve complex atomic operations. A "cache lock" means that if a memory area is cached in the processor's cache line and locked during the Lock operation, then when the processor writes the locked data back to memory it does not assert the LOCK# signal on the bus; instead it modifies the memory address internally and relies on the cache coherence mechanism to guarantee the atomicity of the operation, because the cache coherence mechanism prevents data in a memory area cached by two or more processors from being modified simultaneously. When another processor writes back data in a locked cache line, that cache line is invalidated.

But there are two situations where the processor will not use cache locking.

  • The first situation: when the data being operated on cannot be cached inside the processor, or the operation spans multiple cache lines, the processor falls back to bus locking.
  • The second situation: some processors do not support cache locking. On Intel 486 and Pentium processors, bus locking is invoked even if the locked memory region is in the processor's cache line.

How to implement atomic operations in Java

In Java, atomic operations can be implemented with locks and with looping CAS.

(1) Use looping CAS to implement atomic operations

CAS operations in the JVM are implemented using the CMPXCHG instruction provided by the processor. The basic idea of spin CAS is to loop the CAS operation until it succeeds. The following code implements a thread-safe counter method safeCount based on CAS and a non-thread-safe counter count.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CASTest {

    private AtomicInteger atomicI = new AtomicInteger(0);
    private int i = 0;

    public static void main(String[] args) throws InterruptedException {
        final CASTest cas = new CASTest();

        List<Thread> ts = new ArrayList<>(600);
        long start = System.currentTimeMillis();

        for (int j = 0; j < 100; j++) {
            Thread t = new Thread(() -> {
                for (int i = 0; i < 10000; i++) {
                    cas.count();
                    cas.safeCount();
                }
            });
            ts.add(t);
        }

        for (Thread t : ts) {
            t.start();
        }

        // wait for all threads to finish
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }

        TimeUnit.SECONDS.sleep(1);

        System.out.println(cas.i);             // unsafe counter: usually less than 1,000,000
        System.out.println(cas.atomicI.get()); // safe counter: always 1,000,000
        System.out.println(System.currentTimeMillis() - start);
    }

    /** Thread-safe counter implemented with looping CAS. */
    private void safeCount() {
        for (;;) {
            int i = atomicI.get();
            boolean suc = atomicI.compareAndSet(i, ++i);
            if (suc) {
                break;
            }
        }
    }

    /** Non-thread-safe counter. */
    private void count() {
        i++;
    }
}

(2) Three major problems with CAS-based atomic operations

  • ABA problem: if a value was originally A, changed to B, and then changed back to A, a CAS check will find its value unchanged. Solution: the JDK's Atomic package provides the class AtomicStampedReference to solve the ABA problem (see the sketch after this list).
  • Long spin time, high overhead: if spin CAS keeps failing for a long time, it imposes a very large execution overhead on the CPU.
  • Only guarantees atomicity for one shared variable: looping CAS can guarantee the atomicity of operations on a single shared variable, but it cannot guarantee the atomicity of operations spanning multiple shared variables; for those, locks can be used.
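
A minimal sketch of AtomicStampedReference defeating an A → B → A update (the class name is illustrative; note that small autoboxed Integers share cached instances, so the reference-equality comparison inside compareAndSet behaves as expected here):

import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaDemo {

    public static void main(String[] args) {
        // value A (= 1) with initial stamp 0
        AtomicStampedReference<Integer> ref = new AtomicStampedReference<>(1, 0);

        int[] stampHolder = new int[1];
        Integer seen = ref.get(stampHolder); // read value and stamp together
        int seenStamp = stampHolder[0];

        // another thread performs A -> B -> A, bumping the stamp each time
        ref.compareAndSet(1, 2, ref.getStamp(), ref.getStamp() + 1); // A -> B
        ref.compareAndSet(2, 1, ref.getStamp(), ref.getStamp() + 1); // B -> A

        // the value is A again, so a plain CAS on the value alone would succeed,
        // but the stamped CAS fails because the stamp has moved from 0 to 2
        boolean success = ref.compareAndSet(seen, 3, seenStamp, seenStamp + 1);
        System.out.println(success); // prints false
    }
}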

(3) Use lock mechanism to implement atomic operations

The lock mechanism ensures that only the thread holding the lock can operate on the locked memory region. The JVM implements many kinds of locks internally, including biased locks, lightweight locks, and mutex locks. Interestingly, except for biased locks, the JVM implements locks with looping CAS: when a thread wants to enter a synchronized block it uses looping CAS to acquire the lock, and when it exits the block it uses looping CAS to release the lock.

Main memory and local memory structures

Communication between Java threads is controlled by the Java Memory Model (JMM); the JMM determines when one thread's write to a shared variable becomes visible to another thread. From an abstract point of view, the JMM defines an abstract relationship between threads and main memory: shared variables between threads are stored in main memory, and each thread has a private local memory that stores a copy of the shared variables the thread reads/writes. Local memory is an abstraction of the JMM and does not really exist; it covers caches, write buffers, registers, and other hardware and compiler optimizations.

[Figure: abstract relationship between threads, local memory, and main memory]

From the above figure, if thread A and thread B want to communicate, they must go through the following two steps:

  • First, thread A flushes the updated shared variables from local memory A to main memory.
  • Then, thread B reads from main memory the shared variables that thread A has updated.

The following is a schematic diagram to illustrate these two steps:

[Figure: thread A and thread B communicating through main memory]

Reordering from source code to instruction sequence

To improve performance when executing programs, compilers and processors often reorder instructions. There are 3 types of reordering:

  • Compiler-optimized reordering. The compiler can rearrange the execution order of statements without changing the semantics of a single-threaded program.
  • Instruction-level parallel reordering. Modern processors use instruction-level parallelism (ILP) to overlap the execution of multiple instructions. If there are no data dependencies, the processor can change the execution order of the machine instructions corresponding to the statements.
  • Memory system reordering. Because the processor uses cache and read/write buffers, this can make load and store operations appear to be executed out of order.

From the Java source code to the final actually executed instruction sequence, the following three reorderings will be experienced:

[Figure: source code → 1: compiler-optimized reordering → 2: instruction-level parallel reordering → 3: memory system reordering → final instruction sequence]

The 1 above belongs to compiler reordering, and 2 and 3 belong to processor reordering. These reorderings can cause memory visibility problems in multi-threaded programs. For the compiler, the JMM's compiler reordering rules prohibit certain types of compiler reordering (not all compiler reordering is prohibited). For processor reordering, the JMM's rules require the Java compiler to insert memory barrier instructions of specific types (Intel calls them memory fences) when generating instruction sequences, using these barrier instructions to prohibit specific types of processor reordering (not all processor reordering must be prohibited).

JMM is a language-level memory model that ensures consistent memory visibility guarantees for programmers across different compilers and different processor platforms by prohibiting specific types of compiler reordering and processor reordering.

Definition of happens-before

Starting from JDK 5, Java uses the new JSR-133 memory model. JSR-133 uses the concept of happens-before to illustrate memory visibility between operations. In JMM, if the results of one operation need to be visible to another operation, then there must be a happens-before relationship between the two operations. The two operations mentioned here can be within one thread or between different threads.

JSR-133 defines the happens-before relationship as follows

  • 1) If one operation happens-before another, then the execution result of the first operation is visible to the second operation, and the first operation is ordered before the second.
  • 2)The existence of a happens-before relationship between two operations does not mean that the specific implementation of the Java platform must be executed in the order specified by the happens-before relationship. If the execution result after the reordering is consistent with the execution result according to the happens-before relationship, then this reordering is not illegal (that is, JMM allows this reordering).

The happens-before rules that are closely relevant to programmers are as follows:

  • Program order rule: every operation in a thread happens-before any subsequent operation in that thread. (Code with dependencies.)
  • Monitor lock rule: unlocking a lock happens-before a subsequent locking of that same lock. (Flush to main memory in time + make a group of instructions atomic.)
  • volatile variable rule: a write to a volatile field happens-before any subsequent read of that volatile field. (Flush to main memory in time + prohibit reordering with preceding instructions.)
  • Transitivity: if A happens-before B, and B happens-before C, then A happens-before C.
  • Thread start rule: a call to a Thread object's start() method happens-before every action in the started thread.
  • Thread interruption rule: a call to a thread's interrupt() method happens-before the point where the interrupted thread's code detects the interrupt.
  • Thread termination rule: all operations in a thread happen-before the detection that the thread has terminated; we can detect termination through the return of Thread.join() or a Thread.isAlive() return value of false.
  • Object finalization rule: the completion of an object's initialization happens-before the start of its finalize() method.

A happens-before rule corresponds to one or more compiler and processor reordering rules.

The 1) above is JMM's promise to programmers. From a programmer's point of view, the happens-before relationship can be understood like this: if A happens-before B, then the Java memory model guarantees to the programmer that the result of operation A is visible to B, and that A is ordered before B. Note that this is only a guarantee the Java memory model makes to programmers!

The 2) above is JMM's constraint on compiler and processor reordering. As mentioned earlier, JMM actually follows one basic principle: as long as the execution result of the program is not changed (for single-threaded programs and correctly synchronized multi-threaded programs), the compiler and processor may optimize however they like. JMM does this because programmers do not care whether two operations are really reordered; they care that the semantics of the program during execution cannot change (i.e., the execution result cannot change). Therefore, the happens-before relationship is essentially the same kind of semantics as as-if-serial.

as-if-serial semantics

as-if-serial semantics means: no matter how the compiler and processor reorder (to improve parallelism), the execution result of a (single-threaded) program must not change. The compiler, runtime, and processor must all adhere to as-if-serial semantics.

To comply with as-if-serial semantics, the compiler and processor will not reorder operations that have data dependencies, because such reordering would change the execution result. However, if no data dependency exists between operations, they may be reordered by the compiler and processor.

  • The as-if-serial semantics ensure that the execution results of a single-threaded program are not changed, and the happens-before relationship ensures that the execution results of correctly synchronized multi-threaded programs are not changed.
  • as-if-serial semantics creates an illusion for programmers who write single-threaded programs: single-threaded programs are executed in program order. The happens-before relationship creates an illusion for programmers who write correctly synchronized multi-threaded programs: correctly synchronized multi-threaded programs are executed in the order specified by happens-before.

The purpose of as-if-serial semantics and happens-before is to increase the parallelism of program execution as much as possible without changing the program execution results.

Program order rules

double pi = 3.14;         // A 
double r = 1.0;           // B 
double area = pi * r * r; // C 

According to the program order rule of happens-before, the example code above for computing the area of a circle has three happens-before relationships:

1) A happens-before B.
2) B happens-before C.
3) A happens-before C.

The third happens-before relationship here is derived based on the transitivity of happens-before.

Here A happens-before B, but B can actually execute before A. If A happens-before B, JMM does not require that A execute before B; JMM only requires that the earlier operation's result be visible to the later operation, and that the earlier operation is ordered before the later one. Here the execution result of operation A does not need to be visible to operation B, and the execution result after reordering A and B is consistent with executing them in happens-before order. In this case, JMM considers the reordering not illegal, and JMM allows it.

Sequentially consistent memory model

Characteristics

  • All operations in a thread must be performed in the order of execution of the program
  • All threads can only see a single order of operation execution (regardless of whether it is correctly synchronized), and each operation must be executed atomically and visible to all threads at once.

[Figure: the sequential consistency memory model — a single global memory connected to one thread at a time by a switch]

Conceptually, the sequential consistency model has a single global memory, which can be connected to any one thread through a switch that swings left and right, and each thread must perform its memory read/write operations in program order. As the figure above shows, at most one thread can be connected to memory at any moment. When multiple threads execute concurrently, the switch device serializes all memory read/write operations (that is, in the sequential consistency model there is a total order over all operations).

Assume that thread A and thread B use monitor locks to synchronize correctly. Thread A releases the monitor lock after executing three operations, and then thread B acquires the same monitor lock. Then the execution effect of the program in the sequential consistency model is as follows:

[Figure: execution of a correctly synchronized program in the sequential consistency model]

Assuming that thread A and thread B are not synchronized, another possible effect of this unsynchronized program in the sequential consistency model is as follows:

[Figure: one possible execution of an unsynchronized program in the sequential consistency model]

Although the overall execution order of an unsynchronized program is disordered in the sequential consistency model, all threads still see only one consistent overall execution order. Taking the figure above as an example, threads A and B both see the order A1 → B1 → A2 → B2 → A3 → B3. This guarantee holds because every operation in the sequentially consistent memory model must be immediately visible to every thread.

However, JMM gives no such guarantee. Not only is the overall execution order of an unsynchronized program out of order in JMM, the order of operations seen by different threads may also be inconsistent. For example, while the current thread keeps written data cached in its local memory, that write is visible only to the current thread until it is flushed to main memory; from the perspective of other threads, the write looks as if it never happened. Only after the current thread flushes the written data from local memory to main memory does the write become visible to other threads. In this situation, all sorts of execution results may occur.

The basic policy of JMM in its concrete implementation is to open the door as wide as possible for compiler and processor optimization without changing the execution results of (correctly synchronized) programs.

[Figure: execution of a correctly synchronized program in JMM]

In the sequential consistency model, all operations execute serially in program order. In JMM, code inside a critical section can be reordered (but JMM does not allow code in the critical section to "escape" outside it, because that would destroy the semantics of the monitor lock). JMM does special processing at the key points of entering and exiting the critical section, so that at those two points the thread has the same memory view as in the sequential consistency model. Although thread A reorders inside the critical section, thread B cannot "observe" that reordering because of the mutually exclusive execution guaranteed by the monitor lock. This reordering improves execution efficiency without changing the program's execution result.

Operations on long and double types

On some 32-bit processors, requiring writes of 64-bit data to be atomic would carry a very large synchronization overhead. To accommodate such processors, the Java Language Specification encourages but does not require the JVM to make writes of 64-bit long and double variables atomic. When the JVM runs on such a processor, it may split one write of a 64-bit long/double variable into two 32-bit writes, and those two 32-bit writes may be assigned to different bus transactions; in that case the write to the 64-bit variable is not atomic.

When individual memory operations are not atomic, there can be unintended consequences.

[Figure: a 64-bit write split into two 32-bit bus transactions while another processor reads]

As shown in the figure above, assume that processor A writes a variable of type long, and processor B wants to read this variable of type long. A 64-bit write operation in processor A is split into two 32-bit write operations, and the two 32-bit write operations are assigned to different transactions for execution. At the same time, 64-bit read operations in processor B are assigned to a single read transaction for execution. When processors A and B execute according to the above sequence, processor B will see an invalid value that was only "half-written" by processor A.

Note: in the old memory model before JSR-133, both reads and writes of a 64-bit long/double variable could be split into two 32-bit operations. Starting with the JSR-133 memory model (i.e., JDK 5), only the write of a 64-bit long/double variable may be split into two 32-bit writes; any read must be atomic under JSR-133.

volatile write-read memory semantics

  • Memory semantics of a volatile write: when writing a volatile variable, JMM flushes the shared variable values in the thread's local memory to main memory.
  • Memory semantics of a volatile read: when reading a volatile variable, JMM invalidates the thread's local memory, so the thread next reads the shared variables from main memory.

The following is a summary of the memory semantics of volatile writes and volatile reads.

  • Thread A writing a volatile variable is, in essence, thread A sending a message (about its modifications to shared variables) to whichever thread will next read that volatile variable.
  • Thread B reading a volatile variable is, in essence, thread B receiving the message sent by some previous thread (its modifications to shared variables before the volatile write).
  • Thread A writes a volatile variable and thread B then reads it: this process is essentially thread A sending a message to thread B through main memory (illustrated in the sketch below).
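
To make these semantics concrete, here is a sketch in the style of the book's VolatileExample, assuming thread A calls writer() and thread B later calls reader():

class VolatileExample {
    int a = 0;
    volatile boolean flag = false;

    public void writer() {        // executed by thread A
        a = 1;                    // 1: write the ordinary shared variable
        flag = true;              // 2: volatile write — flushes local memory to main memory
    }

    public void reader() {        // executed by thread B
        if (flag) {               // 3: volatile read — invalidates local memory
            int i = a;            // 4: guaranteed to see a == 1 once flag reads true
        }
    }
}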

To implement volatile's memory semantics, the compiler inserts memory barriers into the instruction sequence when generating bytecode, to prohibit specific types of processor reordering. It is virtually impossible for the compiler to find an optimal arrangement that minimizes the total number of inserted barriers, so JMM adopts a conservative strategy. The following is the JMM memory barrier insertion strategy under that conservative strategy.

  • Insert a StoreStore barrier before each volatile write.
  • Insert a StoreLoad barrier after each volatile write.
  • Insert a LoadLoad barrier after each volatile read.
  • Insert a LoadStore barrier after each volatile read.

The above memory barrier insertion strategy is very conservative, but it can ensure that correct volatile memory semantics can be obtained in any program on any processor platform.

Memory semantics of lock release and acquisition

  • When a thread releases the lock, JMM will refresh the shared variables in the local memory corresponding to the thread to the main memory.
  • When a thread acquires a lock, JMM will invalidate the local memory corresponding to the thread. As a result, the critical section code contained in the monitor must read the shared variables from main memory.

As shown below

[Figure: memory semantics of lock release and lock acquisition]

Comparing the memory semantics of lock release-acquisition with those of volatile write-read, we can see that lock release has the same memory semantics as a volatile write, and lock acquisition has the same memory semantics as a volatile read.

The following is a summary of the memory semantics of lock release and lock acquisition:

  • Thread A releases a lock. In essence, thread A sends a message (about the modifications made by thread A to the shared variable) to a thread that will acquire the lock next.
  • Thread B acquires a lock. In essence, thread B receives a message sent by a previous thread (which modified the shared variable before releasing the lock).
  • Thread A releases the lock, and then thread B acquires the lock. This process is essentially thread A sending a message to thread B through main memory.

ReentrantLock

class ReentrantLockExample {

    int a = 0;

    ReentrantLock lock = new ReentrantLock();

    public void writer() {
        lock.lock(); // acquire the lock
        try {
            a++;
        } finally {
            lock.unlock(); // release the lock
        }
    }

    public void reader() {
        lock.lock(); // acquire the lock
        try {
            int i = a;
            // ...
        } finally {
            lock.unlock(); // release the lock
        }
    }
}

In ReentrantLock, calling the lock() method acquires the lock and calling the unlock() method releases it.

The implementation of ReentrantLock relies on the Java synchronizer framework AbstractQueuedSynchronizer (AQS). AQS uses an integer volatile variable named state to maintain synchronization state; this volatile variable is the key to ReentrantLock's memory semantics.

[Figure: ReentrantLock implemented on top of AQS and its volatile state variable]

ReentrantLock is divided into a fair lock and an unfair lock. We first analyze the fair lock. When using the fair lock, the calling trace of the locking method lock() is as follows:

1) ReentrantLock : lock()
2) FairSync : lock()
3) AbstractQueuedSynchronizer : acquire(int arg)
4) ReentrantLock.FairSync : tryAcquire(int acquires)

The fourth step is to actually start locking. The following is the source code of this method.

/**
* Fair version of tryAcquire. Don't grant access unless
* recursive call or no waiters or is first.
*/
protected final boolean tryAcquire(int acquires) {
    final Thread current = Thread.currentThread();
    int c = getState(); // acquiring the lock starts by reading the volatile variable state
    if (c == 0) {
        if (!hasQueuedPredecessors() && compareAndSetState(0, acquires)) {
            setExclusiveOwnerThread(current);
            return true;
        }
    } else if (current == getExclusiveOwnerThread()) {
        int nextc = c + acquires;
        if (nextc < 0)
            throw new Error("Maximum lock count exceeded");
        setState(nextc);
        return true;
    }
    return false;
}

As can be seen from the source code above, the locking method first reads the volatile variable state.

When using the fair lock, the calling trace of the unlocking method unlock() is as follows:

1) ReentrantLock : unlock()
2) AbstractQueuedSynchronizer : release(int arg)
3) Sync : tryRelease(int releases)

The third step actually begins to release the lock. The following is the source code of this method.

protected final boolean tryRelease(int releases) {
    int c = getState() - releases;
    if (Thread.currentThread() != getExclusiveOwnerThread())
        throw new IllegalMonitorStateException();
    boolean free = false;
    if (c == 0) {
        free = true;
        setExclusiveOwnerThread(null);
    }
    setState(c); // releasing the lock ends by writing the volatile variable state
    return free;
}

As can be seen from the source code above, releasing the lock writes the volatile variable state at the end.

The fair lock writes the volatile variable state at the end of releasing the lock, and reads this volatile variable first when acquiring the lock. According to volatile's happens-before rule, the shared variables that are visible to the releasing thread before it writes the volatile variable become visible to the acquiring thread immediately after it reads the same volatile variable.

Now let's analyze how the unfair lock implements its memory semantics. Release of the unfair lock is exactly the same as the fair lock, so only acquisition of the unfair lock is analyzed here. When using the unfair lock, the calling trace of the locking method lock() is as follows:

1) ReentrantLock : lock()
2) NonfairSync : lock()
3) AbstractQueuedSynchronizer : compareAndSetState(int expect, int update)

The third step actually starts locking. The following is the source code of this method.

/**
* Updates the state variable atomically. The JDK documents this method as:
* if the current state value equals the expected value, atomically set the
* synchronization state to the given updated value. This operation has the
* memory semantics of a volatile read and write.
*/
protected final boolean compareAndSetState(int expect, int update) {
    // See below for intrinsics setup to support this
    return unsafe.compareAndSwapInt(this, stateOffset, expect, update);
}

This method updates the state variable as one atomic operation; this article refers to Java's compareAndSet() method calls as CAS. The JDK documentation explains the method as follows: if the current state value equals the expected value, atomically set the synchronization state to the given updated value. This operation has the memory semantics of both a volatile read and a volatile write.

Now let’s summarize the memory semantics of fair locks and unfair locks.

  • When either the fair lock or the unfair lock is released, a volatile variable (state) is written at the end.
  • When the fair lock is acquired, that volatile variable is read first.
  • When the unfair lock is acquired, CAS is first used to update the volatile variable, an operation that has the memory semantics of both a volatile read and a volatile write.

From this analysis of ReentrantLock, we can see that there are at least two ways to implement the memory semantics of lock release and acquisition:

  • 1) Use the write-read memory semantics of volatile variables.
  • 2) Use the volatile read and volatile write memory semantics that come with CAS.

Implementation of the concurrent package

Since Java's CAS has the memory semantics of both a volatile read and a volatile write, communication between Java threads now has the following four ways:

1) Thread A writes the volatile variable, and then thread B reads the volatile variable.
2) Thread A writes the volatile variable, and then thread B updates the volatile variable with CAS.
3) Thread A uses CAS to update a volatile variable, and then thread B uses CAS to update the volatile variable.
4) Thread A uses CAS to update a volatile variable, and then thread B reads the volatile variable.

Atomically performing read-modify-write operations on memory is the key to achieving synchronization across multiple processors. At the same time, volatile reads/writes and CAS can realize communication between threads. Put together, these form the cornerstone of the entire concurrent package's implementation. If we look carefully at the source code of the concurrent package, we find a general implementation pattern:

First, declare the shared variable as volatile;
then, use CAS's atomic conditional update to achieve synchronization between threads;
at the same time, rely on the memory semantics of volatile reads/writes and of CAS to achieve communication between threads.

AQS, the non-blocking data structures, and the atomic variable classes, the basic classes in the concurrent package, are all implemented with this pattern, while the high-level classes in the concurrent package are in turn implemented on top of these basic classes.

[Figure: implementation layering of the concurrent package]

Memory semantics of final fields

Reordering rules for final fields

For final fields, both the compiler and the processor obey two reordering rules:

  • 1) A write to a final field within a constructor and the subsequent assignment of a reference to the constructed object to a reference variable cannot be reordered.
  • 2) The first read of a reference to an object containing a final field and the subsequent first read of that final field cannot be reordered.

Write final field reordering rules

The write reordering rule for final fields prohibits reordering a write to a final field outside the constructor. The implementation of this rule involves two aspects:

  • 1) JMM prohibits the compiler from reordering a write to a final field outside the constructor.
  • 2) The compiler inserts a StoreStore barrier after the write to the final field and before the constructor returns. This barrier prevents the processor from reordering the write to the final field outside the constructor.

The write reordering rule for final fields ensures that the object's final fields have already been correctly initialized before the object reference becomes visible to any thread, a guarantee that ordinary fields do not have.

Read final field reordering rules

The reordering rule for reading final fields is: within a thread, JMM prohibits reordering the first read of an object reference with the first read of a final field contained in that object. (This rule targets only the processor.) The processor inserts a LoadLoad barrier before the read of the final field. In fact, there is an indirect dependency between reading an object's reference and reading that object's final field, and most processors will not reorder these two operations. But a few processors will, so this prohibition is aimed at those processors.

The read reordering rule for final fields ensures that before reading an object's final field, the reference to the object containing that final field is always read first.
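
Both rules can be seen in a sketch in the style of the book's FinalExample (assume thread A calls writer() and thread B calls reader()):

public class FinalExample {

    int i;                      // ordinary field
    final int j;                // final field
    static FinalExample obj;

    public FinalExample() {
        i = 1;                  // write the ordinary field
        j = 2;                  // write the final field
    }

    public static void writer() {        // executed by thread A
        obj = new FinalExample();        // the reference assignment cannot float above the final write
    }

    public static void reader() {        // executed by thread B
        FinalExample object = obj;       // first read of the reference
        if (object == null) return;      // writer may not have run yet
        int a = object.i;                // may see 0: no guarantee for ordinary fields
        int b = object.j;                // guaranteed to see 2 once the reference is visible
    }
}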

The final field is a reference type

The final field above was a primitive type. What happens if the final field is a reference type? See the sample code below.

public class FinalReferenceExample {

    final int[] intArray;                     // the final field is a reference type
    static FinalReferenceExample obj;

    public FinalReferenceExample() {          // constructor
        intArray = new int[1];                // 1
        intArray[0] = 1;                      // 2
    }

    public static void writerOne() {          // executed by writer thread A
        obj = new FinalReferenceExample();    // 3
    }

    public static void writerTwo() {          // executed by writer thread B
        obj.intArray[0] = 2;                  // 4
    }

    public static void reader() {             // executed by reader thread C
        if (obj != null) {                    // 5
            int temp1 = obj.intArray[0];      // 6
        }
    }
}

For reference types, the write reordering rule for final fields adds the following constraint on the compiler and processor: the writes to the member fields of a final-referenced object inside the constructor and the subsequent assignment of a reference to the constructed object to a reference variable outside the constructor cannot be reordered.

For the example program above, assume thread A first executes the writerOne() method; after it finishes, thread B executes the writerTwo() method; and after that, thread C executes the reader() method. The figure below shows one possible execution timing.

[Figure: a possible execution timing of threads A, B, and C]

In the figure above, 1 is the write to the final field, 2 is the write to a member field of the object referenced by the final field, and 3 is the assignment of the constructed object's reference to a reference variable. In addition to 1 and 3 being non-reorderable, as noted earlier, 2 and 3 cannot be reordered either.

JMM can ensure that reader thread C sees at least writer thread A's writes in the constructor to the member fields of the final-referenced object; that is, C at least sees that element 0 of the array is 1. Writer thread B's write to the array element may or may not be visible to reader thread C. JMM does not guarantee that B's write is visible to C, because there is a data race between writer thread B and reader thread C, and the execution result is then unpredictable.

If you want to ensure that reader thread C sees writer thread B's writes to the array elements, a synchronization primitive (a lock or volatile) must be used between threads B and C to ensure memory visibility.

Why final references cannot "escape" from constructors

As mentioned earlier, the write reordering rule for final fields ensures that before a reference variable becomes visible to any thread, the final fields of the object it points to have already been correctly initialized in the constructor. To actually get this effect, one more guarantee is needed: inside the constructor, the reference to the object under construction must not become visible to other threads, that is, the object reference must not "escape" from the constructor. To illustrate, look at the following sample code:

public class FinalReferenceEscapeExample {

    final int i;
    static FinalReferenceEscapeExample obj;

    public FinalReferenceEscapeExample() {
        i = 1;                              // 1: write the final field
        obj = this;                         // 2: the this reference "escapes" here
    }

    public static void writer() {
        new FinalReferenceEscapeExample();
    }

    public static void reader() {
        if (obj != null) {                  // 3
            int temp = obj.i;               // 4
        }
    }
}

[Figure 3-32: a possible execution timing when the this reference escapes the constructor]

Suppose thread A executes the writer() method and another thread B executes the reader() method. Operation 2 here makes the object visible to thread B before construction has finished. Even though operation 2 is the last step of the constructor and appears after operation 1 in the program, the thread executing reader() may still fail to see the initialized value of the final field, because operations 1 and 2 may be reordered. The actual execution timing may be as shown in Figure 3-32.

As Figure 3-32 shows: before the constructor returns, the reference to the object under construction must not be visible to other threads, because the final field may not yet be initialized; after the constructor returns, any thread is guaranteed to see the correctly initialized value of the final field.

Why JSR-133 enhanced final semantics

In the old Java memory model, one of the most serious flaws was that a thread might see the value of a final field change. For example, a thread might currently see an integer final field's value as 0 (the default value before initialization), and some time later read it again and find it has changed to 1 (the value set by some thread during initialization). The most common example was that in the old Java memory model, the value of a String could appear to change.

To fix this flaw, the JSR-133 expert group enhanced the semantics of final. By adding the write and read reordering rules for final fields, Java programmers are given an initialization safety guarantee: as long as the object is correctly constructed (the reference to the object under construction does not "escape" from the constructor), no synchronization (meaning locks or volatile) is needed for any thread to see the value the final field was initialized to in the constructor.

The origin of double-checked locking

public class UnsafeLazyInitialization {

    private static Instance instance;

    public static Instance getInstance() {
        if (instance == null) {          // 1: executed by thread A
            instance = new Instance();   // 2: executed by thread B
        }
        return instance;
    }
}

In UnsafeLazyInitialization, suppose thread A executes code 1 while thread B executes code 2. Thread A may then see that the object referenced by instance has not yet been initialized (see "The root of the problem" below for why this can happen).

For UnsafeLazyInitialization, we can declare getInstance() synchronized to achieve thread-safe lazy initialization. The sample code is as follows:

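The original figure is missing here; the following is a reconstruction of the synchronized version described in the text (the class name SafeLazyInitialization follows the book's convention, and Instance is the same placeholder class as above):

public class SafeLazyInitialization {

    private static Instance instance;

    public static synchronized Instance getInstance() {
        if (instance == null) {
            instance = new Instance();
        }
        return instance;
    }
}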

Because getInstance() is synchronized, it incurs the performance overhead of synchronized. If getInstance() is called frequently by many threads, execution performance drops. On the other hand, if getInstance() is not called frequently by multiple threads, this lazy initialization scheme delivers satisfactory performance.

In early JVMs, synchronized (even contention-free synchronized) had a huge performance overhead, so people came up with a "clever" trick: double-checked locking, hoping to reduce the cost of synchronization. Here is sample code that uses double-checked locking to implement lazy initialization:

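The original figure is missing here as well; this reconstruction follows the book's DoubleCheckedLocking example, with the line numbers that the text below refers to marked in comments:

public class DoubleCheckedLocking {                     // 1
    private static Instance instance;                   // 2

    public static Instance getInstance() {              // 3
        if (instance == null) {                         // 4: first check
            synchronized (DoubleCheckedLocking.class) { // 5: lock
                if (instance == null)                   // 6: second check
                    instance = new Instance();          // 7: root of the problem
            }
        }
        return instance;                                // 8
    }
}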
As the code above shows, if the first check finds that instance is not null, the locking and initialization below are skipped entirely, which greatly reduces the performance overhead of synchronized. On the surface, the code seems to have the best of both worlds:

  • When multiple threads try to create the object at the same time, locking ensures that only one thread creates it.
  • After the object has been created, calling getInstance() returns the created object directly without acquiring the lock.

Double-checked locking may look perfect, but it is a wrong optimization! When a thread executes line 4 and reads that instance is not null, the object referenced by instance may not yet have been initialized.

The root of the problem

Line 7 of the preceding DoubleCheckedLocking example (instance = new Instance();) creates an object. This line can be decomposed into the following three lines of pseudocode:

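Reconstructed from the book's description (the missing figure showed these same three lines):

memory = allocate();   // 1: allocate the object's memory space
ctorInstance(memory);  // 2: initialize the object
instance = memory;     // 3: point instance at the memory just allocated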

Lines 2 and 3 of the pseudocode above may be reordered. The execution timing after reordering 2 and 3 looks like this:

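Reconstructed from the description; note that instance now points at the memory before the object is initialized:

memory = allocate();   // 1: allocate the object's memory space
instance = memory;     // 3: point instance at the memory — the object is NOT yet initialized!
ctorInstance(memory);  // 2: initialize the object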

According to the Java Language Specification, all threads must comply with intra-thread semantics when executing Java programs. Intra-thread semantics ensures that reordering does not change the results of program execution within a single thread. In other words, intra-thread semantics allows reordering within a single thread that does not change the execution results of a single-threaded program. Although the above three lines of pseudocode are reordered between 2 and 3, this reordering does not violate intra-thread semantics. This reordering can improve the execution performance of the program without changing the execution results of the single-threaded program.

To better understand intra-thread semantics, look at the following diagram (assuming thread A accesses the object immediately after constructing it):

[Figure: execution timing within thread A — as long as 2 is ordered before 4, intra-thread semantics are preserved]

As shown in the figure above, as long as 2 is guaranteed to be ordered before 4, intra-thread semantics are not violated even if 2 and 3 are reordered.

Next, let us look at the situation when multiple threads are executed concurrently. Please see the diagram below:

[Figure: threads A and B executing concurrently — B sees the object before A initializes it]

Since intra-thread semantics must be adhered to in a single thread, it is guaranteed that the program execution result of thread A will not be changed. But when threads A and B execute according to the timing sequence in the figure above, thread B will see an object that has not been initialized.

Back to the topic of this article: if line 7 of the DoubleCheckedLocking example (instance = new Instance();) is reordered, another concurrently executing thread B may judge at line 4 that instance is not null. Thread B will then access the object referenced by instance, but that object may not yet have been initialized by thread A! The concrete execution timing of this scenario is as follows:

[Figure: execution timing — A2 and A3 reordered, so B1 sees instance != null before A3 runs]

Although A2 and A3 are reordered here, the intra-thread semantics of the Java memory model ensure that A2 executes before A4, so thread A's intra-thread semantics are unchanged. But the reordering of A2 and A3 lets thread B conclude at B1 that instance is not null, and thread B will then access the object referenced by instance; at this point, thread B accesses an object that has not yet been initialized.

After knowing the root cause of the problem, we can think of two ways to achieve thread-safe delayed initialization:

    1. Disallow the reordering of 2 and 3;
    2. Allow 2 and 3 to be reordered, but do not allow other threads to "see" this reordering.

The two solutions introduced later correspond to the above two points respectively.

Volatile-based double-checked locking solution

For the previous lazy initialization scheme based on double-checked locking (the DoubleCheckedLocking sample code), only a small modification is needed (declare instance as volatile) to achieve thread-safe lazy initialization. See the sample code below:

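The original figure is missing; this reconstruction follows the book's SafeDoubleCheckedLocking example — the only change from DoubleCheckedLocking is the volatile on instance:

public class SafeDoubleCheckedLocking {
    private volatile static Instance instance;

    public static Instance getInstance() {
        if (instance == null) {
            synchronized (SafeDoubleCheckedLocking.class) {
                if (instance == null)
                    instance = new Instance(); // instance is volatile — the problem is gone
            }
        }
        return instance;
    }
}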

The "source of the problem" is the reordering between 2 and 3 in the three lines of pseudocode after declaring the reference to the object as volatile, Will be disabled in a multi-threaded environment. The above sample code will be executed in the following timing sequence:

[Figure: execution timing under volatile-based double-checked locking — 2 and 3 are not reordered]

Solution based on class initialization

The JVM executes class initialization during the initialization phase of a class (that is, after the class has been loaded and before it is used by a thread). While executing a class's initialization, the JVM acquires a lock, and this lock synchronizes multiple threads' initialization of the same class.

Based on this feature, another thread-safe lazy initialization scheme can be implemented (known as the Initialization On Demand Holder idiom):

public class InstanceFactory {

    private static class InstanceHolder {
        public static Instance instance = new Instance();
    }

    public static Instance getInstance() {
        return InstanceHolder.instance; // this causes the InstanceHolder class to be initialized
    }
}

Assume two threads execute getInstance() concurrently; the following is a schematic of the execution:

[Figure: two threads concurrently triggering the initialization of the same class]

The essence of this scheme is: reordering of 2 and 3 in the three-line pseudocode "root of the problem" is allowed, but the non-constructing thread (thread B here) is not allowed to "see" that reordering.

Initializing a class includes executing the class's static initializers and initializing the static fields declared in the class. According to the Java Language Specification, a class or interface type T will be initialized immediately when any of the following occurs for the first time:

  • 1) T is a class, and an instance of type T is created;
  • 2) T is a class, and a static method declared in T is called;
  • 3) A static field declared in T is assigned;
  • 4) A static field declared in T is used, and this field is not a constant field;
  • 5) T is a top level class (see §7.6 of the Java Language Specification), and an assertion statement nested inside T is executed.

In the InstanceFactory sample code, the first thread to execute getInstance() causes the InstanceHolder class to be initialized (matching case 4).

Since the Java language is multi-threaded, multiple threads may try to initialize the same class or interface at the same time (for example, multiple threads may call getInstance() simultaneously, each triggering initialization of the InstanceHolder class). Therefore, initializing a class or interface in Java requires careful synchronization.

The Java language specification stipulates that for each class or interface C, there is a unique initialization lock LC corresponding to it. The mapping from C to LC is freely implemented by the specific implementation of the JVM.

The JVM will acquire this initialization lock during class initialization, and each thread will acquire the lock at least once to ensure that the class has been initialized.

[Figure: phases of class initialization and the initialization lock LC]

Comparing the volatile-based double-checked locking scheme with the class-initialization-based scheme, we find that the class-initialization-based scheme's implementation code is simpler.

But the volatile-based double-checked locking scheme has an additional advantage: besides lazy initialization of static fields, it can also lazily initialize instance fields.

Lazy initialization reduces the cost of initializing a class or creating an instance, but increases the cost of accessing lazily initialized fields. Most of the time, normal initialization is better than lazy initialization.

  • If you really need thread-safe lazy initialization for an instance field, use the volatile-based lazy initialization scheme introduced above;
  • If you really need thread-safe lazy initialization for a static field, use the class-initialization-based scheme introduced above.


Source: blog.csdn.net/lyabc123456/article/details/134834235