14 - Multi-threaded lock optimization (Part 2): Optimizing parallel operations using optimistic locks

The previous two lectures discussed the synchronization lock mechanisms implemented with synchronized and Lock. Both are pessimistic locks, and they are the most intuitive way to protect thread safety.

We know that in high-concurrency scenarios under pessimistic locking, intense lock contention causes threads to block, and a large number of blocked threads triggers context switching and increases the system's performance overhead. So is it possible to implement a non-blocking lock mechanism that still guarantees thread safety? The answer is yes. Today I will walk you through the optimization methods for optimistic locking and show how to squeeze the most value out of it.

1. What is optimistic locking?

Before starting to optimize, let's briefly review the definition of optimistic locking.

Optimistic locking, as the name suggests, means always operating on shared resources with an optimistic attitude: the thread believes it can complete the operation successfully. In reality, when multiple threads operate on a shared resource at the same time, only one of them succeeds. What about the failed threads? Unlike with pessimistic locks, they are not suspended by the operating system; they simply return from the operation, and the system allows them either to retry or to abandon the operation and exit.

Therefore, compared with pessimistic locks, optimistic locks do not cause liveness failures such as deadlock and starvation, and the mutual interference between threads is far smaller. More importantly, optimistic locking has none of the system overhead caused by lock contention, so it performs better.

2. Implementation principle of optimistic locking

Now that you have a basic understanding of the above, let's look at how optimistic locking is implemented; this will help us derive its optimization methods at the root.

CAS (Compare-And-Swap) is the core algorithm behind optimistic locking. It involves three parameters: V (the variable to be updated), E (the expected value), and N (the new value).

Only when V equals the expected value E is V set to the new value N. If V differs from E, it means another thread has already updated the variable; in that case the current thread does nothing and simply returns the actual value of V.
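A minimal sketch of these semantics using AtomicInteger.compareAndSet (my own illustration, not code from the original article):

    import java.util.concurrent.atomic.AtomicInteger;

    public class CasDemo {
        public static void main(String[] args) {
            AtomicInteger v = new AtomicInteger(0);  // V starts at 0
            boolean first = v.compareAndSet(0, 1);   // E = 0 matches V, so V becomes N = 1
            boolean second = v.compareAndSet(0, 2);  // E = 0 no longer matches V (now 1), nothing happens
            System.out.println(first + " " + second + " " + v.get()); // true false 1
        }
    }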

2.1. How CAS implements atomic operations

In the JDK's concurrent package, the classes under the atomic package are all implemented based on CAS. AtomicInteger is a thread-safe integer class built on CAS. Let's learn from its source code how CAS implements atomic operations.

We can see that AtomicInteger's increment method getAndIncrement uses Unsafe's getAndAddInt method. Clearly, AtomicInteger relies on the Unsafe class and its native methods, which invoke the CPU's underlying instructions to implement atomic operations.

    // Update the value with a CAS operation
    public final boolean compareAndSet(int expect, int update) {
        return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
    }

    // Increment by 1 with a CAS operation
    public final int getAndIncrement() {
        return unsafe.getAndAddInt(this, valueOffset, 1);
    }

    // Decrement by 1 with a CAS operation
    public final int getAndDecrement() {
        return unsafe.getAndAddInt(this, valueOffset, -1);
    }
2.2. How does the processor implement atomic operations?

CAS relies on the processor's underlying instructions to implement atomic operations. So how does the processor itself achieve atomicity at the hardware level?

Communication between the processor and physical memory is far slower than the processor's own processing speed, so processors have their own internal caches. When performing operations, frequently used memory data is cached in the processor's L1, L2, and L3 caches to speed up repeated reads.

Normally, a single-core processor can guarantee that basic memory operations are atomic: when a thread reads a byte, all processes and threads see the byte from the same cache, and other threads cannot access that byte's memory address at the same time.

But today's servers are usually multi-processor, and each processor has multiple cores. Each processor maintains its own region of memory, and each core maintains its own cache. In this setting, concurrent multi-threaded access can make the caches inconsistent, which in turn leads to data inconsistency.

To deal with this, the processor provides two mechanisms, bus locking and cache locking, to guarantee the atomicity of complex memory operations.

With bus locking, when a processor wants to operate on a shared variable, it asserts a Lock signal on the bus; other processors then cannot operate on that variable, and the issuing processor has exclusive access to it in shared memory. However, while the bus lock blocks other processors' requests for the shared variable, it can also cause heavy blocking, which increases the system's performance overhead.

Therefore, later processors provide a cache locking mechanism: when a processor operates on a shared variable in its cache, it notifies the other processors to discard their cached copies of the shared resource or re-read it. Current mainstream processors support cache locking.

3. Optimize CAS optimistic locking

Although optimistic locking outperforms pessimistic locking in terms of concurrency, in write-heavy scenarios (more writes than reads) the probability of CAS failure rises. If the thread does not give up on the operation, it must retry CAS in a loop, which undoubtedly occupies the CPU for long stretches.

In Java 7, as the following code shows, the getAndSet method of AtomicInteger uses a for loop to retry the CAS operation continuously; if it keeps failing for a long time, it imposes a very large execution overhead on the CPU. In Java 8 the for loop has been removed from AtomicInteger, but if we decompile the Unsafe class we find that the loop has simply been encapsulated inside Unsafe, so the CPU execution overhead still exists.

    // Java 7 AtomicInteger.getAndSet: spin in a for loop until the CAS succeeds
    public final int getAndSet(int newValue) {
        for (;;) {
            int current = get();
            if (compareAndSet(current, newValue))
                return current;
        }
    }
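For reference, decompiling JDK 8 shows the retry loop has simply moved into Unsafe; its getAndAddInt method looks roughly like this (a sketch of the decompiled source):

    // JDK 8 Unsafe.getAndAddInt: the CAS retry loop now lives here
    public final int getAndAddInt(Object o, long offset, int delta) {
        int v;
        do {
            // read the current value with volatile semantics
            v = this.getIntVolatile(o, offset);
            // keep retrying until the CAS from v to v + delta succeeds
        } while (!this.compareAndSwapInt(o, offset, v, v + delta));
        return v;
    }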

In JDK 1.8, Java provides a new atomic class, LongAdder. In high-concurrency scenarios LongAdder performs better than AtomicInteger and AtomicLong, at the cost of consuming more memory space.

The principle behind LongAdder is to reduce the number of threads operating concurrently on a single shared variable: the write pressure on one shared variable is spread across multiple variables. The values written by competing threads are distributed into an array, different threads hit different slots of that array, and each thread performs CAS operations only on the value in its own slot. Finally, when the value is read, the shared variable that is operated on atomically is added to all the values scattered across the array, and an approximately accurate sum is returned.

LongAdder internally consists of a base variable and a cell[] array. When there is only one write thread and no contention, LongAdder uses the base variable directly as the atomic variable and modifies it through CAS operations. When multiple write threads compete, besides the one write thread that occupies the base variable, each of the other threads writes its modification into the slot of the cell[] array assigned to it, and the final result can be calculated by the following formula:
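
    value = base + Σ cell[i]   (the sum of base and the values in every slot of the cell[] array)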

Note that the value LongAdder returns while writes are still in flight is only approximately accurate; it returns an exact value only once all updates have completed. So in scenarios with high real-time accuracy requirements, LongAdder cannot replace AtomicInteger or AtomicLong.
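To show the API concretely, here is a minimal usage sketch of LongAdder (single-threaded, purely illustrative):

    import java.util.concurrent.atomic.LongAdder;

    public class LongAdderDemo {
        public static void main(String[] args) {
            LongAdder counter = new LongAdder();
            counter.increment();  // contended increments are spread across base and the cells
            counter.add(10);
            // sum() adds base and every cell value; it is only approximate
            // while other threads are still writing concurrently
            System.out.println(counter.sum()); // 11 in this single-threaded example
        }
    }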

4. Summary

In daily development, the most common scenario for optimistic locking is database update operations. To ensure the atomicity of database updates, we often define a version number for each row and read it before updating; when we actually update the row, we check whether the version number obtained earlier has changed, and only if it has not do we perform the update.
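A minimal JDBC sketch of this version-number pattern (the account table and its columns are illustrative assumptions, not from the original article):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class OptimisticUpdate {
        // Returns true only if no other transaction updated the row in between
        static boolean updateBalance(Connection conn, long id, long newBalance,
                                     int expectedVersion) throws SQLException {
            String sql = "UPDATE account SET balance = ?, version = version + 1 "
                       + "WHERE id = ? AND version = ?";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setLong(1, newBalance);
                ps.setLong(2, id);
                ps.setInt(3, expectedVersion);
                // 0 affected rows means the version changed: retry or give up
                return ps.executeUpdate() == 1;
            }
        }
    }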

CAS optimistic locking is fairly limited in everyday use: it can only guarantee the atomicity of operations on a single variable. When multiple variables are involved, CAS is powerless, whereas the pessimistic locks discussed in the previous two lectures can handle this simply by locking the entire code block.

Under CAS optimistic locking, in scenarios where concurrent writes outnumber reads, most threads' atomic operations will fail, and the failed threads keep retrying their CAS operations; as a result, large numbers of threads occupy CPU resources for long periods, bringing substantial performance overhead to the system. In JDK 1.8, Java added the new atomic class LongAdder, which trades space for time to solve this problem.

In lectures 11 to 13, I explained in detail the synchronized lock implemented by the JVM, the Lock implemented on top of AQS, and the optimistic lock implemented with CAS. You are probably curious which of these three locks performs best, so now let's compare the performance of locks built in these three different ways.

Since performance comparisons divorced from actual business scenarios are meaningless, we can test in three scenarios: "more reads than writes", "more writes than reads", and "roughly equal reads and writes". And because lock performance also depends on the intensity of contention, we will additionally run performance tests of the three kinds of locks under different contention levels.

Based on the above conditions, I will stress-test five locks across the four test modes: Synchronized, ReentrantLock, ReentrantReadWriteLock, StampedLock, and the optimistic lock LongAdder.

A brief explanation: I composed four groups of tests with different numbers of read and write threads under different contention levels. The test code implements a concurrent counter: read threads read the counter value, while write threads operate on and change it. The running environment is a 4-core i7 processor. The results have been given, and the specific test code can be viewed and downloaded on Github.
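The author's actual test code is on Github; purely to illustrate the kind of counter stress test described above, here is a minimal sketch (the thread counts and iteration counts are my own assumptions, not the article's parameters):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.LongAdder;

    public class CounterStressSketch {
        public static void main(String[] args) throws InterruptedException {
            final LongAdder counter = new LongAdder();
            final int writers = 4, readers = 4, opsPerThread = 1_000_000;
            ExecutorService pool = Executors.newFixedThreadPool(writers + readers);
            long start = System.nanoTime();
            for (int i = 0; i < writers; i++) {       // write threads change the counter
                pool.execute(() -> {
                    for (int j = 0; j < opsPerThread; j++) counter.increment();
                });
            }
            for (int i = 0; i < readers; i++) {       // read threads read the counter
                pool.execute(() -> {
                    long sink = 0;
                    for (int j = 0; j < opsPerThread; j++) sink += counter.sum();
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
            System.out.printf("count=%d, elapsed=%dms%n",
                    counter.sum(), (System.nanoTime() - start) / 1_000_000);
        }
    }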

From the above results we can see that in read-heavy scenarios, the read-write locks ReentrantReadWriteLock and StampedLock together with the optimistic lock offer the best read/write performance; in write-heavy scenarios, the optimistic lock performs best while the other four locks perform similarly; and in scenarios with roughly equal reads and writes, the two read-write locks and the optimistic lock outperform Synchronized and ReentrantLock.

5. Thinking questions

What is the ABA problem that we need to watch out for when using CAS operations?
