The underlying implementation principles of Java's concurrency mechanisms

Preface

Java code is ultimately compiled down to assembly instructions executed on the CPU, so the concurrency mechanisms Java uses depend on the JVM implementation and on the CPU's instructions.
This article goes down to that lower layer to explain the underlying implementation principles of the concurrency mechanisms. It covers:

  1. The implementation principle of volatile
  2. The implementation principle of final
  3. The implementation principle of synchronized
  4. The implementation principle of atomic operations

1. The definition and implementation principle of volatile

  • Definition of volatile
    The Java language allows threads to access shared variables. To ensure that a shared variable is updated accurately and consistently, a thread would ordinarily have to obtain an exclusive lock on it.
    If a variable is declared volatile, the Java memory model ensures that all threads see a consistent value for it. In multiprocessor development, this is achieved by inserting memory barriers, which guarantee the visibility of the shared variable and prohibit instruction reordering.

  • CPU term definitions and memory barrier types
    [Two tables in the original article: CPU term definitions, and memory barrier types.]

  • How does volatile ensure visibility?
    Inspecting the generated assembly shows that a write to a volatile variable produces a Lock-prefixed instruction, which has two effects:

    1. The Lock prefix instruction writes the data of the current processor's cache line back to system memory
      • To improve processing speed, the processor does not communicate with memory directly; it first reads the data from system memory into its internal cache (the thread's working memory) before operating on it, and it does not write the result back to main memory immediately afterward.
      • On a write to a volatile variable, the JVM sends a Lock-prefixed instruction to the processor, which writes the cache line containing the variable back to system memory. But other processors' caches would still hold the stale value, which would cause problems in later computations.
    2. The write-back invalidates the data cached at that memory address in other CPUs
      • To keep each processor's cache consistent, a cache coherency protocol is implemented.
        Each processor sniffs the data propagated on the bus to check whether its cached value has expired. When a processor finds that the memory address backing one of its cache lines has been modified, it marks that cache line invalid; the next time it operates on that data, it re-reads it from main memory into its cache.
  • The implementation principle of volatile:
    insert a StoreStore barrier before each volatile write;
    insert a StoreLoad barrier after each volatile write;
    insert a LoadLoad barrier after each volatile read;
    insert a LoadStore barrier after each volatile read.
    This is the JMM's conservative barrier-insertion strategy; in a concrete implementation, as long as the write/read memory semantics of volatile are preserved, the compiler may omit unnecessary barriers as appropriate.

  • Volatile only guarantees visibility and ordering under concurrency; it does not guarantee atomicity. Use it with care.
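As a minimal sketch of the visibility guarantee described above (class and field names are my own), the following publishes an ordinary write through a volatile flag; the volatile write/read pair establishes a happens-before edge, so the reader is guaranteed to observe 42:

```java
public class VolatileFlagDemo {
    private static volatile boolean ready = false;  // volatile flag
    private static int payload = 0;                 // ordinary field
    private static int observed = -1;

    public static int run() throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!ready) { }       // spin until the volatile write becomes visible
            observed = payload;      // guaranteed to see 42: the volatile write of
        });                          // ready happens-before the read that sees true
        reader.start();
        payload = 42;   // ordinary write, published by the volatile write below
        ready = true;   // volatile write: emits the Lock-prefixed instruction
        reader.join();
        return observed;
    }
}
```

If `ready` were not volatile, the reader thread could spin forever or read a stale `payload`; volatile makes both the flag and the preceding write visible.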

2. Memory semantics of final fields

  • For final fields, the compiler and processor must obey two reordering rules:

    1. Write rule for final fields: the write to a final field inside the constructor and the subsequent assignment of the constructed object's reference to a reference variable cannot be reordered.
    2. Read rule for final fields: the first read of a reference to an object containing a final field and the subsequent first read of that final field cannot be reordered.
    3. For final fields of reference type: the write to a member field of the final-referenced object inside the constructor and the subsequent assignment of the constructed object's reference to a reference variable outside the constructor cannot be reordered (that is, the referenced object's member fields must be initialized first).
  • The write rule for final fields ensures that an object's final fields have been correctly initialized before the object reference becomes visible to any thread; ordinary fields have no such guarantee.

  • The read rule for final fields ensures that before reading an object's final field, the reference to the object containing it must be read first.

  • As long as the object is correctly constructed (that is, the reference to the object under construction does not escape from the constructor), no synchronization is needed for any thread to see the values of the final fields initialized in the constructor.

  • The implementation principle of final:
    the write rule requires the compiler to insert a StoreStore barrier after the write to the final field and before the constructor returns;
    the read rule requires the compiler to insert a LoadLoad barrier before the operation that reads the final field.
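These rules can be sketched with the classic JSR-133 style example (names are illustrative). Because `x` is final, any thread that sees the published reference is guaranteed to see `x == 3`; the ordinary field `y` carries no such guarantee under racy publication:

```java
public class FinalFieldExample {
    final int x;  // final: guaranteed initialized before the reference is published
    int y;        // ordinary field: no such guarantee under racy publication

    public FinalFieldExample() {
        x = 3;    // write to the final field; conceptually a StoreStore barrier
        y = 4;    // sits between this write and the constructor's return
    }

    static FinalFieldExample instance;

    static void writer() {
        instance = new FinalFieldExample();  // publish the reference
    }

    static int reader() {
        FinalFieldExample f = instance;      // read the reference first,
        return f == null ? -1 : f.x;         // then the final field: 3 if f != null
    }
}
```

In a single-threaded demonstration the guarantee is trivially met; the point is that even under racy cross-thread publication, a reader that sees a non-null `instance` must see `x == 3`.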

3. The implementation principle of synchronized

This section introduces the optimizations of synchronized, including the biased locks and lightweight locks introduced in Java 6 to reduce the performance cost of acquiring and releasing locks, as well as the storage structure of locks and the lock upgrade process.

  • When a thread tries to enter a synchronized code block, it must first obtain the lock, and it must release the lock when it exits normally or throws an exception. Every object in Java can serve as a lock, in one of three forms:
    1. For an ordinary synchronized method, the lock is the current instance object
    2. For a static synchronized method, the lock is the Class object of the current class
    3. For a synchronized block, the lock is the object written in the synchronized parentheses
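The three lock forms can be sketched as follows (the class name is illustrative); `Thread.holdsLock` is used only to make the identity of each lock visible:

```java
public class LockForms {
    // 1. Ordinary synchronized method: the lock is the current instance (this)
    public synchronized boolean instanceMethod() {
        return Thread.holdsLock(this);
    }

    // 2. Static synchronized method: the lock is the Class object
    public static synchronized boolean staticMethod() {
        return Thread.holdsLock(LockForms.class);
    }

    // 3. Synchronized block: the lock is the object in the parentheses
    private final Object monitor = new Object();
    public boolean blockMethod() {
        synchronized (monitor) {
            return Thread.holdsLock(monitor);
        }
    }
}
```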

As the JVM specification shows, the JVM implements both method synchronization and code-block synchronization by entering and exiting a Monitor object, but the implementation details of the two differ.

  1. Code block synchronization
    After compilation, a monitorenter instruction is inserted at the start of the synchronized block, and monitorexit instructions are inserted at the end of the block and at exception exit points.
    Every object has a monitor associated with it; when a thread executes the monitorenter instruction, it tries to obtain ownership of the object's monitor, that is, to acquire the object's lock.
    While the monitor is held, the object is in a locked state.

  2. Method synchronization
    A synchronized method's method_info structure carries the ACC_SYNCHRONIZED flag; the thread recognizes this flag and acquires the corresponding lock, achieving method synchronization.

The implementation details of the two differ, but both are essentially the acquisition of an object's monitor (Monitor).
When a thread reaches a synchronized code block or a synchronized method, it must first obtain the object's monitor before executing. A thread that fails to obtain the monitor blocks, enters the synchronization queue, and its state changes to BLOCKED. When the thread that acquired the monitor releases the lock, the threads blocked in the synchronization queue are woken up so that they can retry acquiring the monitor.
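A minimal sketch of this monitor-based mutual exclusion, using a simple counter class of my own devising: every increment must acquire the object's monitor first, so even with several contending threads the final count is exact:

```java
public class SyncCounter {
    private int count = 0;

    // Each call acquires this object's monitor before mutating count;
    // contending threads block and enter the synchronization queue.
    public synchronized void increment() { count++; }
    public synchronized int get() { return count; }

    public static int runThreads() throws InterruptedException {
        SyncCounter c = new SyncCounter();
        Thread[] ts = new Thread[4];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) c.increment();
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        return c.get();   // always 40_000: every increment ran under the monitor
    }
}
```

Without `synchronized`, the unsynchronized `count++` (a read-modify-write) would lose updates under contention.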

The types and upgrades of locks will be introduced in the next article.

4. The implementation principle of atomic operations

[Table in the original article: CPU term definitions.]

First, the processor automatically guarantees the atomicity of basic memory operations. For complex operations (those that cross the bus width, span multiple cache lines, or cross page-table accesses), the processor provides two mechanisms, bus locking and cache locking, to guarantee atomicity.

  1. Use a bus lock to guarantee atomicity.
    The processor provides a LOCK# signal; when one processor asserts this signal on the bus, requests from other processors are blocked, and that processor can use the shared memory exclusively.
    A bus lock locks the communication between the CPU and memory; while it is held, other processors cannot operate on data at any memory address, so the overhead of a bus lock is relatively high.
    For this reason, the processor uses a cache lock instead of a bus lock in some situations.

  2. Use a cache lock to guarantee atomicity.
    Cache locking means that if the memory area is cached in the processor's cache line and is locked for the duration of the Lock operation, then when the processor performs the locked operation and writes back to memory, it does not assert the LOCK# signal on the bus. Instead, it modifies the memory address internally and relies on the cache coherency mechanism to guarantee atomicity: the cache coherency mechanism prevents a memory area cached by two or more processors from being modified at the same time, and when another processor writes back data belonging to a locked cache line, that cache line is invalidated.

  3. Two situations where cache locking cannot be used

    1. The processor does not support cache locking
    2. The data being operated on cannot be cached inside the processor, or spans multiple cache lines; in these cases the processor falls back to a bus lock

5. How Java implements atomic operations

Java guarantees atomic operations by means of locking and spin (loop) CAS.

1. Use CAS to achieve atomic operations
CAS operations in Java are implemented using the processor's CMPXCHG instruction. The basic idea of spin CAS is to retry the CAS operation in a loop until it succeeds.
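A minimal spin-CAS sketch using `AtomicInteger.compareAndSet` (the class name is illustrative): the loop retries until the CAS succeeds:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SpinCas {
    private final AtomicInteger value = new AtomicInteger(0);

    // Spin-CAS increment: loop until compareAndSet succeeds.
    public int increment() {
        for (;;) {
            int current = value.get();        // read the current value
            int next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;                  // CAS succeeded
            }
            // CAS failed: another thread changed the value in between; retry
        }
    }

    public int get() { return value.get(); }
}
```

This is the same pattern the JDK's atomic classes use internally, ultimately backed by the CMPXCHG instruction on x86.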

  • Disadvantages of implementing atomic operations with CAS
    1. The ABA problem
      If a value was originally A, changed to B, and then changed back to A, CAS will find the value unchanged when it checks, even though it has actually changed.
      The solution to the ABA problem is to use a version number: prepend a version number to the variable and increment it by 1 on every update.
      The JDK provides AtomicStampedReference to solve this kind of problem.
    2. Long spin times carry a high cost
      If spin CAS fails for a long time, it imposes a large execution cost on the CPU.
    3. Atomicity can only be guaranteed for a single shared variable
      For multiple shared variables, spin CAS cannot guarantee atomicity; a lock can be used in that case.
      A trick is to combine multiple shared variables into one shared variable to operate on, for example merging i and j into ij.
      The JDK provides the AtomicReference class to guarantee atomicity for a referenced object: multiple variables can be placed inside one object and CAS performed on the reference.
  2. Use the lock mechanism to achieve atomic operations
    The lock mechanism ensures that only the thread that obtains the lock can operate on the locked memory area.
    Apart from biased locking, the JVM implements locks using spin CAS: when a thread enters a synchronized block, it uses CAS to acquire the lock, and when it exits the block, it uses CAS to release the lock.
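The AtomicStampedReference remedy for the ABA problem mentioned above can be sketched like this (names and values are illustrative): a stamped CAS fails after an A -> B -> A sequence because the stamp has advanced, even though the value itself looks unchanged:

```java
import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaDemo {
    public static boolean demo() {
        // Value "A" with initial stamp (version number) 0
        AtomicStampedReference<String> ref =
                new AtomicStampedReference<>("A", 0);

        int[] stampHolder = new int[1];
        String snapshot = ref.get(stampHolder);  // read value and stamp atomically
        int stamp = stampHolder[0];              // stamp == 0

        // Simulate A -> B -> A performed elsewhere; each update bumps the stamp.
        ref.compareAndSet("A", "B", 0, 1);
        ref.compareAndSet("B", "A", 1, 2);

        // A plain CAS on the value alone would succeed here, but the stamped
        // CAS fails because the stamp has moved from 0 to 2.
        return ref.compareAndSet(snapshot, "C", stamp, stamp + 1);
    }
}
```

The returned value is false: the stale stamp exposes the intervening updates that a plain value comparison would miss.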

Conclusion

This article introduced the implementation principles of volatile, final, synchronized, and atomic operations. Understanding them will help with the later study of concurrency frameworks and containers.



Origin blog.csdn.net/u014099894/article/details/102792177