[Learn Java from Scratch | Article 41] In-depth Java Lock Mechanism

Table of contents

Foreword:

Introduce:

Lock mechanism:

CAS algorithm:

Optimistic lock and pessimistic lock:

Summarize:


Foreword:

In multithreaded programming, cooperation and resource sharing among threads is an important topic. When multiple threads operate on shared data at the same time, problems such as data inconsistency or race conditions may occur. In order to solve these problems, Java provides a powerful locking mechanism, which enables multi-threaded programs to safely share resources and achieve synchronization between threads.

The Java lock mechanism allows us to control the access of multiple threads to shared resources, ensuring that only one thread can access public data or execute a specific code block at any time. This mechanism can be used not only to protect the consistency of shared variables, but also to implement mutually exclusive access to critical sections.

Introduce:

Before the lock mechanism appeared, multithreading often had the following two problems:

1. Data inconsistency: When multiple threads read and write shared data at the same time, the data may become inconsistent. For example, if one thread is modifying a variable while another thread reads it at the same time, then without the protection of a lock the reader may see the stale value from before the modification, leaving the data in an inconsistent state.

        Suppose two threads read and modify shared variables at the same time:

int sharedVariable = 0;

// Thread 1
sharedVariable = 10;

// Thread 2
int value = sharedVariable;
System.out.println(value);

In the code above, thread 1 sets the shared variable sharedVariable to 10. Meanwhile, thread 2 reads sharedVariable and prints it. Without a proper synchronization mechanism, thread 2 may read the old value 0 from before the modification, producing inconsistent data.
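One common fix for this visibility problem can be sketched as follows (the class and method names are my own, chosen for illustration): declare the shared variable volatile so that a write by one thread becomes visible to later reads by other threads, and use join to order the read after the write:

```java
public class VisibilityDemo {
    // volatile guarantees that a write by one thread is visible
    // to subsequent reads by other threads
    private static volatile int sharedVariable = 0;

    public static int writeThenRead() throws InterruptedException {
        Thread writer = new Thread(() -> sharedVariable = 10);
        writer.start();
        writer.join(); // wait until the write has completed

        // join() establishes a happens-before edge, so this read sees 10
        return sharedVariable;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(writeThenRead()); // prints 10
    }
}
```

Note that volatile only solves visibility, not atomicity; the race-condition problem below still needs a lock or an atomic operation.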

2. Race condition: When multiple threads modify a shared resource at the same time, the result may depend on the relative order in which the threads execute, and that order is not deterministic. This uncertainty can produce race conditions that make the program behave incorrectly. For example, if multiple threads increment the same counter concurrently without proper synchronization, the counter may end up with the wrong value.

        Suppose there are two threads simultaneously incrementing a counter:

int counter = 0;

// Thread 1
counter++;

// Thread 2
counter++;

In the code above, counter++ is not atomic: it is a read-modify-write sequence, and without proper synchronization a thread switch can occur in the middle of it. If thread 1 and thread 2 both read the current value 0 before either writes back, both will write 1, and the final counter is 1 instead of the expected 2. This lost update is a typical race condition and causes incorrect program behavior.

We therefore need a way to protect shared data while another thread is accessing it, and this is why the lock mechanism was created. A lock ensures that while one thread is modifying shared data, no other thread can read or modify it at the same time, which avoids both data inconsistency and race conditions.
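As a quick preview of what a lock buys us (a minimal sketch; the class and method names are my own), the lost update above disappears once the increment is guarded by the synchronized keyword, which is introduced in detail below:

```java
public class SafeCounter {
    private int counter = 0;
    private final Object lock = new Object();

    public void increment() {
        synchronized (lock) { // only one thread may run this block at a time
            counter++;        // the read-modify-write is now atomic
        }
    }

    public int get() {
        synchronized (lock) {
            return counter;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SafeCounter c = new SafeCounter();
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) c.increment();
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(c.get()); // always 20000 with the lock in place
    }
}
```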

Before formally introducing the lock mechanism, let's first look at the memory structure of the JVM at runtime (the diagram is omitted here). The key things to know are:

  • the heap and the method area are shared by all threads
  • the virtual machine stack, the native method stack, and the program counter belong to each thread individually

With that, let's start the formal introduction to locks:

Lock mechanism: 

In Java, every object has a lock, which is stored in the object header; the lock records which thread currently holds the object.

        Components of an object:

  1. Object Header: The object header stores the object's metadata, such as its hash code, lock information, and GC (garbage collection) marks. The size of the object header varies across Java virtual machine implementations.

  2. Instance Data: The instance data is the actual storage for the object's member variables, i.e. the fields we define in the class; they form part of the object's state. Its size depends on the number and types of the member variables.

  3. Alignment Padding: To improve the efficiency of memory access, the Java virtual machine requires the starting address of an object to be a multiple of a specific value (typically 8 bytes). If the object's size is not a multiple of that value, padding bytes are added to align it.

OK, let's look at the first lock we encounter when learning Java:

synchronized

In Java, when the synchronized keyword is applied to a code block, compilation produces two bytecode instructions for it: monitorenter and monitorexit. These two instructions acquire and release the lock, achieving thread synchronization.

  1. The monitorenter instruction: this instruction acquires the object's lock (the built-in lock, or monitor). When a thread reaches a synchronized code block, it first tries to acquire the object's lock. If the lock is not held by another thread, the thread acquires it and continues executing the following instructions. If the lock is held by another thread, the current thread blocks until the lock is released.

  2. The monitorexit instruction: this instruction releases the object's lock. When the thread finishes executing the synchronized code block, or exits it abnormally, it releases the object's lock. This ensures that other threads can then acquire the lock and execute the associated code.

Example:

public class MyClass {
    private final Object lock = new Object();

    public void synchronizedMethod() {
        synchronized (lock) {
            // code block protected by synchronized
        }
    }
}

The corresponding compiled bytecode is roughly as follows (simplified; javac also emits an exception handler that executes monitorexit if the block throws):

0: aload_0           ; load `this` onto the operand stack
1: getfield #1       ; push the object's field (the lock object)
4: dup               ; duplicate the top of the stack (the lock object)
5: astore_1          ; store the lock object into a local variable
6: monitorenter      ; enter the synchronized block, acquiring the lock
7: /* block body */  ; execute the code of the synchronized block
8: aload_1           ; load the local variable (the lock object)
9: monitorexit       ; exit the synchronized block, releasing the lock

These bytecode instructions ensure that only one thread at a time executes inside the synchronized block, guaranteeing mutual exclusion and correct memory synchronization between threads. This lets multiple threads access shared resources safely, avoiding concurrency issues.

And this is the operating mechanism of synchronized: it implements thread synchronization through these two bytecode instructions.
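One detail worth noting: the explicit monitorenter/monitorexit pair is emitted only for synchronized blocks. A synchronized method is instead compiled with the ACC_SYNCHRONIZED flag, and the JVM acquires and releases the monitor implicitly on entry and exit. The two forms below are therefore equivalent in effect (a minimal sketch; the class name is mine):

```java
public class Equivalent {
    private int count = 0;

    // compiled with the ACC_SYNCHRONIZED flag: the JVM implicitly
    // acquires the monitor of `this` on entry and releases it on exit
    public synchronized void methodForm() {
        count++;
    }

    // compiled to explicit monitorenter/monitorexit on `this`
    public void blockForm() {
        synchronized (this) {
            count++;
        }
    }

    public int get() {
        return count;
    }
}
```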

Unfortunately, synchronized in this original form has a performance problem: monitorenter and monitorexit ultimately rely on the operating system's mutex, and Java threads are essentially mapped onto operating system threads. Every time a thread is blocked or woken up, the operating system must switch into kernel mode, and this switch is expensive; in some cases the time spent switching even exceeds the time spent in the application code itself.

Since Java 6, synchronized has been optimized, and biased locks and lightweight locks were introduced.

At this point, there are four types of locks:

no lock, biased lock, lightweight lock, heavyweight lock

  1. Lock-Free : Lock-free is a concurrency control mechanism that allows multiple threads to modify shared resources at the same time without explicitly using locks. Lock-free algorithms usually use atomic operations (such as CAS, Compare and Swap) to ensure the atomicity and thread safety of multi-threaded operations. The goal of lock-free is to achieve maximum concurrency performance in a contention-free manner.

  2. Biased Locking: Biased locking is a lock optimization implemented by the JVM for scenarios without contention; its goal is to reduce the overhead of lock operations when no other thread competes. In the biased state, when a thread first acquires the lock, the JVM records that thread's ID in the lock object's header; subsequent acquisitions by the same thread then skip the synchronization operation entirely, improving performance.

  3. Lightweight Locking: Lightweight locking is a lock optimization for situations where contention is mild. It acquires and releases the lock with CAS operations, avoiding mutually exclusive kernel-mode operations. When a thread tries to acquire a lightweight lock, it uses a CAS operation to update the mark word in the object header so that it points to the lock record in the thread's own stack frame. If the CAS succeeds, the thread continues into the critical section; if it fails, there is contention and the lock must be upgraded to a heavyweight lock.

  4. Heavyweight Locking : Heavyweight Locking is a traditional locking mechanism and the default lock implementation. When multiple threads compete for a lock, the JVM will upgrade the lock from a lightweight lock to a heavyweight lock. Heavyweight locks perform mutually exclusive kernel-mode operations at the operating system level, such as using mutexes. It ensures mutual exclusion between multiple threads, but also brings more overhead.

These four states escalate in order: no lock -> biased lock -> lightweight lock -> heavyweight lock. In HotSpot this escalation is essentially one-way: a lock can be upgraded as contention grows, but it is not downgraded back.

Now that we have seen the underlying mechanism of the mutex and its four states, let's move on to the CAS algorithm.

CAS algorithm:

CAS (Compare and Swap) is a synchronization primitive used to implement lock-free algorithms. It is mainly used for atomic operations on shared data in a multi-threaded environment, providing a thread-safe way to update data.

The CAS algorithm involves three operands: the memory address (V), the old expected value (A), and the new value (B) . The execution process of the CAS algorithm is as follows:

  1. First, the thread reads the value at memory address V; call it the current value currentV.

  2. The thread then checks whether currentV equals the expected value A. If they are equal, no other thread has modified the value, and the update may proceed.

  3. If currentV does not equal A, another thread has modified the value, so this thread does not perform the update; it can retry or fall back to another strategy.

  4. If currentV equals A, the thread writes the new value B to memory address V.

  5. Finally, the thread checks whether the write succeeded. If it did, the update is complete; if not, another thread performed an update first, and the whole CAS sequence must be executed again.

The core idea of the CAS algorithm is to decide whether shared data has been modified by comparing the current value with the expected value. If it has not been modified, the update is performed; if it has, another thread got there first and a retry is needed. CAS therefore avoids the thread blocking and context switching caused by traditional locks, improving concurrency performance.
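The compare-and-update steps above map directly onto the atomic classes in java.util.concurrent.atomic. In this sketch, compareAndSet(expected, newValue) succeeds only when the current value equals the expected value:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasDemo {
    public static void main(String[] args) {
        AtomicInteger value = new AtomicInteger(0);

        // the expected value matches the current value (0),
        // so the swap succeeds and the value becomes 10
        boolean first = value.compareAndSet(0, 10);

        // the expected value (0) no longer matches the current value (10),
        // so the swap fails and the value is left unchanged
        boolean second = value.compareAndSet(0, 20);

        System.out.println(first + " " + second + " " + value.get());
        // prints: true false 10
    }
}
```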

However, the CAS algorithm also has problems, such as the ABA problem (two reads see the same value, but the value changed in between) and the cost of long spin loops under heavy contention. To address these, Java provides atomic classes in the java.util.concurrent.atomic package, such as AtomicReference and AtomicStampedReference; the latter solves the ABA problem in CAS and offers higher-level wrappers and functionality.
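A brief sketch of how AtomicStampedReference defends against ABA: every update carries a stamp (a version number), so even if the value returns to its old reference, a stale stamp makes the CAS fail:

```java
import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaDemo {
    public static void main(String[] args) {
        String a = "A", b = "B";
        AtomicStampedReference<String> ref = new AtomicStampedReference<>(a, 0);

        int stampSeenByReader = ref.getStamp(); // a reader observes stamp 0

        // meanwhile, another thread performs A -> B -> A,
        // bumping the stamp on every successful update
        ref.compareAndSet(a, b, 0, 1);
        ref.compareAndSet(b, a, 1, 2);

        // the value is "A" again, but the stale stamp (0) exposes the change
        boolean updated = ref.compareAndSet(a, "C", stampSeenByReader, 3);
        System.out.println(updated + " " + ref.getReference());
        // prints: false A
    }
}
```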

Let's walk through the CAS algorithm with an example (illustration omitted):

A and B represent two threads, and C represents the resource they are competing for. Suppose thread A wins the race for resource C: it compares its expected old value with C's current value, and since they match, it changes C's state to 1 and obtains the right to operate on C.

Thread B then compares its expected value with C: 0 != 1, so B's swap fails. In practice, however, B does not simply give up; it spins. Spinning means repeatedly retrying the CAS operation: as soon as C's state returns to 0, B performs the compare-and-swap again.
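The spin described above is just a retry loop around CAS. A minimal sketch (the class and method names are my own) that increments a counter by spinning until the compare-and-swap succeeds:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SpinIncrement {
    private final AtomicInteger value = new AtomicInteger(0);

    public int increment() {
        while (true) {                          // spin: retry until the CAS wins
            int current = value.get();          // read the current value
            int next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;                    // CAS succeeded; no lock was taken
            }
            // CAS failed: another thread changed the value; loop and retry
        }
    }

    public int get() {
        return value.get();
    }

    public static void main(String[] args) throws InterruptedException {
        SpinIncrement counter = new SpinIncrement();
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) counter.increment();
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(counter.get()); // always 20000, without any lock
    }
}
```

This is essentially what AtomicInteger.incrementAndGet() does internally; the loop never blocks, it only retries.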

Let's use a piece of C-style pseudocode to show what the CAS function does:

int cas(long *addr, long oldvalue, long newvalue)
{
    if (*addr != oldvalue)
        return 0;
    *addr = newvalue;
    return 1;
}

In fact, this code still has a problem. CAS consists of two parts, compare and swap, and the function above performs no synchronization of its own. If thread A passes the comparison but is preempted before performing the swap, and thread B then runs the same code on C, haven't both A and B acquired the right to operate on the resource at the same time?

Fortunately, real CAS is implemented at the hardware level as a single atomic instruction (for example, lock cmpxchg on x86), so the compare and the swap cannot be interleaved.

Finally, let's introduce optimistic locking and pessimistic locking.

Optimistic lock and pessimistic lock:

Optimistic locking and pessimistic locking are two ideas of concurrency control, which are mainly used in access control of shared data in a multi-threaded environment. The main difference between them is the strategy and mechanism for handling concurrency conflicts.

  1. Pessimistic lock:
    A pessimistic lock assumes that other threads may modify the data at any point during the operation, so it treats conflict as the default case and blocks and waits. Pessimistic locking relies on blocking threads and locking shared resources to ensure that only one thread can access the shared data at a time.

Common pessimistic locking implementations include:

  • Mutex locks (such as the synchronized keyword in Java, ReentrantLock): Use mutex locks to ensure exclusive access to shared resources, and other threads need to wait for the lock to be released before they can access.
  • Read-write lock (such as ReentrantReadWriteLock in Java): By distinguishing between read and write operations, multiple threads are allowed to read shared resources at the same time, but only a single thread is allowed to write.
  2. Optimistic lock:
    An optimistic lock assumes that no conflict will occur during the operation, so instead of blocking and waiting, it checks for conflicts at update time. If a conflict is found, it applies an appropriate strategy (such as retrying or aborting the update). Optimistic locking is usually implemented with a lock-free algorithm such as CAS.

In most cases optimistic locking is implemented with the lock-free CAS algorithm, so don't assume it involves an actual lock just because the word "lock" appears in the name!

Common optimistic locking implementations include:

  • Versioning: Add a version number field to the data record, and check whether there is a conflict by comparing the version numbers each time it is updated.
  • Timestamp (Timestamp): Add a timestamp field to the data record, and judge whether a conflict occurs by comparing the timestamp each time it is updated.
  • CAS (Compare and Swap): Use atomic operations to update data, and determine whether a conflict occurs by comparing whether the current value is consistent with the expected value.
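The version-number scheme above can be sketched in plain Java (the class and method names are hypothetical): an update is accepted only if the version the caller read is still current, otherwise the caller retries. The check-and-write is one short atomic step, while any expensive work happens outside it; that is what makes the scheme optimistic.

```java
public class VersionedValue {
    private int value;
    private int version;

    // snapshot of the current value and its version: [value, version]
    public synchronized int[] read() {
        return new int[] { value, version };
    }

    // succeeds only if no one has updated since `expectedVersion` was read
    public synchronized boolean tryUpdate(int expectedVersion, int newValue) {
        if (version != expectedVersion) {
            return false;  // conflict detected: another writer got there first
        }
        value = newValue;
        version++;         // bump the version on every successful write
        return true;
    }
}
```

A caller that receives false simply re-reads the snapshot and tries again, just like a CAS spin.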

Optimistic locking suits scenarios where reads are very frequent and writes relatively rare, where it can improve concurrency performance. However, it relies on the assumption that concurrent modification is uncommon; if conflicts are frequent, optimistic locking causes a large number of retries and performance drops.

In practical applications, the choice of pessimistic locking or optimistic locking depends on specific scenarios, considering factors such as the frequency of concurrency conflicts, data consistency requirements, and performance requirements. Sometimes it is also possible to combine the advantages of the two and use an appropriate locking mechanism to meet the requirements.
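As a closing example, here is a minimal sketch of the read-write lock mentioned above, a pessimistic design (the class name is mine): many threads may hold the read lock at once, while the write lock is exclusive:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RwCache {
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    private int data = 0;

    public int read() {
        rw.readLock().lock();        // shared: many readers at once
        try {
            return data;
        } finally {
            rw.readLock().unlock();  // always release in finally
        }
    }

    public void write(int newData) {
        rw.writeLock().lock();       // exclusive: blocks readers and writers
        try {
            data = newData;
        } finally {
            rw.writeLock().unlock();
        }
    }
}
```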

Summarize:

        The underlying implementation of locks is genuinely complicated, and one or two articles cannot explain it fully, so this article is mainly meant to spark your interest. If you are interested, go and dig deeper into the various kinds of locks.

If my content is helpful to you, please like, comment and bookmark . Creation is not easy, everyone's support is my motivation to persevere!

 

Origin blog.csdn.net/fckbb/article/details/132154367