Principles of CAS, AtomicInteger and LongAdder

Table of contents

1. CAS

1. Introduction

2. CAS and volatile

3. Why lock-free is efficient

4. Summary

2. Atomic integers

3. Atomic references

1. Introduction

2. ABA problem

3. AtomicStampedReference

4. AtomicMarkableReference

4. Atomic Accumulator

1. Introduction

2. LongAdder important key fields

CAS lock

Principle of false sharing

3. LongAdder source code

add method

longAccumulate method

5. Unsafe


1. CAS

1. Introduction

Consider an update built on AtomicInteger, which we use to achieve thread safety. To update the value, we first read the old value, compute the new value from it, and then call compareAndSet to publish the result. If the call succeeds, the method returns; if it fails, the loop retries.

The key is compareAndSet. It atomically checks whether the current value still equals the old value we read and, only if it does, replaces it with the new value. If another thread changed the value in between, the call fails. This operation is called compare-and-swap, abbreviated CAS.

The flow is roughly as follows: thread 1 reads the value and attempts a CAS; if another thread modifies the data between thread 1's read and its CAS, the CAS fails and thread 1 retries.
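The read, compute, compareAndSet loop described above can be sketched as follows. This is a minimal illustration, not the code from the original post: the CasAccount class, its withdraw method, and the demo helper are hypothetical names chosen for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical account class, used only for illustration.
class CasAccount {
    private final AtomicInteger balance;

    CasAccount(int initial) {
        balance = new AtomicInteger(initial);
    }

    // Withdraws amount with a CAS retry loop instead of a lock.
    void withdraw(int amount) {
        while (true) {
            int prev = balance.get();   // read the old value
            int next = prev - amount;   // compute the new value
            if (balance.compareAndSet(prev, next)) {
                return;                 // nobody interfered: done
            }
            // another thread changed balance in between: retry
        }
    }

    // Starts `threads` threads that each withdraw 10 once; returns the final balance.
    static int demo(int initial, int threads) {
        CasAccount account = new CasAccount(initial);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> account.withdraw(10));
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return account.balance.get();
    }

    public static void main(String[] args) {
        System.out.println(demo(1_000, 100)); // prints 0
    }
}
```

With 100 threads each withdrawing 10 from 1000, no update is lost, so the final balance is 0.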

Note: under the hood, CAS relies on the lock cmpxchg instruction (on x86), which guarantees atomicity on both single-core and multi-core CPUs.

(On a multi-core CPU, when a core executes an instruction with the lock prefix, the CPU locks the bus, or on modern processors usually just the affected cache line, and releases it once the instruction completes. The instruction cannot be interrupted by thread scheduling, and other cores cannot touch that memory in the meantime, so the memory operation is atomic.)

2. CAS and volatile

Opening the AtomicInteger class, we find that its value field is declared volatile, which guarantees visibility of the variable across threads. CAS must always compare against the latest value: if a thread worked from a stale value, its comparison would be based on outdated data. CAS therefore has to be combined with volatile to work correctly.
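To make the pairing of volatile and CAS concrete, here is a small sketch that mimics the shape of AtomicInteger with a volatile field and an AtomicIntegerFieldUpdater. The VolatileCounter class is a hypothetical name, and the real AtomicInteger is implemented differently inside the JDK; the sketch only shows the two ingredients working together.

```java
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

// Hypothetical class that mimics the shape of AtomicInteger.
class VolatileCounter {
    // volatile: a read always sees the most recent write by any thread
    private volatile int value;

    // Performs CAS on the volatile field above
    private static final AtomicIntegerFieldUpdater<VolatileCounter> UPDATER =
            AtomicIntegerFieldUpdater.newUpdater(VolatileCounter.class, "value");

    int incrementAndGet() {
        while (true) {
            int prev = value;       // volatile read: the latest value
            int next = prev + 1;
            if (UPDATER.compareAndSet(this, prev, next)) {
                return next;
            }
            // value changed since the read: retry
        }
    }

    int get() {
        return value;
    }

    public static void main(String[] args) {
        VolatileCounter c = new VolatileCounter();
        c.incrementAndGet();
        c.incrementAndGet();
        System.out.println(c.get()); // prints 2
    }
}
```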

3. Why lock-free is efficient

Without locks, a thread keeps running at full speed even when a CAS fails and it has to retry. With synchronized, a thread that fails to acquire the lock undergoes a context switch: its state changes from running to blocked, and the switch is expensive because the thread's state must be saved and then restored when it wakes up.

Lock-free code does need CPU support, though, because the threads have to keep running. The CPU cores are like lanes on a running track: without a spare lane, a thread cannot run. Although such a thread is never blocked, when it gets no time slice it still drops back to the runnable state, which also causes a context switch.

4. Summary

Combining CAS and volatile gives lock-free concurrency, which suits scenarios with few threads and a multi-core CPU.

CAS is based on the idea of optimistic locking: assume the best, do not worry about other threads modifying the shared variable, and simply retry if one did.

synchronized is based on the idea of pessimistic locking: assume the worst, and lock before operating so that other threads cannot modify the shared variable.

CAS is lock-free, non-blocking concurrency. Because synchronized is not used, threads are never blocked, which is one reason for the efficiency gain. Under fierce contention, however, retries become frequent and efficiency suffers.
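The contrast between the two styles can be sketched side by side. The LockStyles class and its method names are hypothetical; the example only shows that both approaches reach the same count, one by locking first and the other by updating atomically without a lock.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

// Hypothetical class contrasting the pessimistic and optimistic styles.
class LockStyles {
    private int lockedCount;                                     // guarded by synchronized
    private final AtomicInteger casCount = new AtomicInteger();  // updated without a lock

    // Pessimistic: take the lock first, then modify
    synchronized void incrementLocked() {
        lockedCount++;
    }

    // Optimistic: no lock, the update is a single atomic operation
    void incrementCas() {
        casCount.incrementAndGet();
    }

    synchronized int lockedCount() {
        return lockedCount;
    }

    // Performs n increments of each kind, spread over the common fork-join pool.
    static int[] demo(int n) {
        LockStyles s = new LockStyles();
        IntStream.range(0, n).parallel().forEach(i -> {
            s.incrementLocked();
            s.incrementCas();
        });
        return new int[] {s.lockedCount(), s.casCount.get()};
    }

    public static void main(String[] args) {
        int[] r = demo(40_000);
        System.out.println(r[0] + " " + r[1]); // prints 40000 40000
    }
}
```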

2. Atomic integers

AtomicInteger serves as the example here; AtomicBoolean and AtomicLong work on similar principles.

As mentioned earlier, the implementation uses a volatile value field to guarantee visibility and CAS to guarantee atomic updates.

(volatile also guarantees ordering, for the same reason it is needed in the singleton pattern: if the CPU reorders instructions, a reader may observe a partially assigned value, and the write barrier that volatile adds prevents that.)

AtomicInteger has an incrementAndGet method, which corresponds to ++i, and a getAndIncrement method, which corresponds to i++.
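A short example makes the difference between the two methods visible. The IncrementDemo class name is hypothetical; the AtomicInteger calls are the standard JDK methods.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

class IncrementDemo {
    static int[] demo() {
        AtomicInteger i = new AtomicInteger(0);
        int a = i.incrementAndGet(); // like ++i: increments, then returns the new value
        int b = i.getAndIncrement(); // like i++: returns the old value, then increments
        int c = i.get();             // current value after both calls
        return new int[] {a, b, c};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // prints [1, 1, 2]
    }
}
```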

To implement a thread-safe computation by hand:

(IntUnaryOperator is a functional interface with a single method, so it can be written as a lambda expression. This is the strategy pattern: the caller decides which operation to perform by passing in an implementation.)

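import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntUnaryOperator;

// Applies the operator atomically and returns the updated value.
public static int updateAndGet(AtomicInteger i, IntUnaryOperator operator) {
    while (true) {
        int prev = i.get();                     // read the current value
        int next = operator.applyAsInt(prev);   // compute the new value
        if (i.compareAndSet(prev, next)) {      // publish only if nobody changed it
            return next;
        }
        // CAS failed: another thread updated the value, so retry
    }
}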
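Since Java 8, AtomicInteger itself offers methods built on the same retry loop, so the hand-written helper is mostly of teaching value. A brief sketch of those built-in methods follows; the BuiltInUpdate class name is hypothetical.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

class BuiltInUpdate {
    static int[] demo() {
        AtomicInteger i = new AtomicInteger(5);
        int a = i.updateAndGet(x -> x * 2);        // 10: returns the new value
        int b = i.getAndUpdate(x -> x + 1);        // 10: returns the old value, i is now 11
        int c = i.accumulateAndGet(4, Math::max);  // 11: max(11, 4)
        return new int[] {a, b, c};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // prints [10, 10, 11]
    }
}
```

The lambda passed in should be free of side effects, because it may run more than once when the CAS has to be retried.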

3. Atomic references

1. Introduction

AtomicReference covers the case where the value to protect is not a primitive type. For example, to protect a decimal type such as BigDecimal, use an atomic reference to ensure thread safety.
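A minimal sketch of protecting a BigDecimal with AtomicReference is shown here. The DecimalAccount class and its methods are hypothetical names for illustration.

```java
import java.math.BigDecimal;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical account whose balance is a BigDecimal.
class DecimalAccount {
    private final AtomicReference<BigDecimal> balance;

    DecimalAccount(BigDecimal initial) {
        balance = new AtomicReference<>(initial);
    }

    void withdraw(BigDecimal amount) {
        while (true) {
            BigDecimal prev = balance.get();
            BigDecimal next = prev.subtract(amount); // BigDecimal is immutable: a new object
            if (balance.compareAndSet(prev, next)) { // compares references, not equals()
                return;
            }
        }
    }

    BigDecimal getBalance() {
        return balance.get();
    }

    public static void main(String[] args) {
        DecimalAccount account = new DecimalAccount(new BigDecimal("100.00"));
        account.withdraw(new BigDecimal("30.50"));
        System.out.println(account.getBalance()); // prints 69.50
    }
}
```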

2. ABA problem

CAS only checks whether the current value equals the expected value and succeeds if they match. If another thread changes the value and then changes it back in the meantime, the CAS still succeeds. This is the ABA problem: the variable has actually been modified, but the thread does not notice. In most scenarios this does not affect the business logic.
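The ABA effect can be reproduced deterministically. In this sketch the "other thread" is simulated inline so the outcome does not depend on timing; the AbaDemo class name is hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

class AbaDemo {
    static boolean demo() {
        AtomicReference<String> ref = new AtomicReference<>("A");
        String prev = ref.get();        // this thread reads A

        // Meanwhile another thread changes A -> B -> A (simulated inline here)
        ref.compareAndSet("A", "B");
        ref.compareAndSet("B", "A");

        // The CAS still succeeds although the value was modified twice
        return ref.compareAndSet(prev, "C");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints true
    }
}
```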

3. AtomicStampedReference

If a thread needs to detect whether the value was modified in between, even if it was changed back, use AtomicStampedReference.

It adds a version number (stamp) on top of AtomicReference. The caller passes the expected and the new stamp to every compareAndSet, typically incrementing the stamp on each modification.
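The same A -> B -> A sequence can be replayed with a stamp to show how the version number exposes the change. As before, the other thread is simulated inline, and the StampedDemo class name is hypothetical.

```java
import java.util.concurrent.atomic.AtomicStampedReference;

class StampedDemo {
    static boolean demo() {
        AtomicStampedReference<String> ref = new AtomicStampedReference<>("A", 0);
        String prev = ref.getReference();
        int stamp = ref.getStamp();     // this thread sees stamp 0

        // Another thread changes A -> B -> A, bumping the stamp each time (simulated inline)
        ref.compareAndSet("A", "B", ref.getStamp(), ref.getStamp() + 1); // stamp 0 -> 1
        ref.compareAndSet("B", "A", ref.getStamp(), ref.getStamp() + 1); // stamp 1 -> 2

        // The value is A again, but the stamp is 2, not 0, so this CAS fails
        return ref.compareAndSet(prev, "C", stamp, stamp + 1);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints false
    }
}
```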

4. AtomicMarkableReference

The version number in AtomicStampedReference tells us how many times the value was modified in between. Often we do not need the count; we only want to know whether it was modified at all. AtomicMarkableReference covers that case.

It uses a boolean mark to record whether the value has been changed. For example, the mark starts as true and any modification sets it to false; a compareAndSet that expects true fails once the mark is false, and a successful compareAndSet also sets the mark to false.
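The boolean-mark variant described above corresponds to the JDK class AtomicMarkableReference. A small sketch follows, with the other thread simulated inline; the MarkableDemo class name and the string values are hypothetical.

```java
import java.util.concurrent.atomic.AtomicMarkableReference;

class MarkableDemo {
    static boolean demo() {
        // The mark starts as true, meaning "not modified yet"
        AtomicMarkableReference<String> ref = new AtomicMarkableReference<>("bag", true);
        String prev = ref.getReference();

        // Another thread touches the value and flips the mark to false (simulated inline)
        ref.compareAndSet(prev, prev, true, false);

        // This thread expects the mark to still be true, so its CAS fails
        return ref.compareAndSet(prev, "new bag", true, false);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints false
    }
}
```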

4. Atomic Accumulator

1. Introduction

Java 8 introduced LongAdder, a class designed specifically to make incrementing faster than with atomic integers. It is the work of concurrency expert Doug Lea, and the design is very refined.

How it improves performance: when every thread updates a single variable, CAS keeps retrying under contention, which hurts efficiency. LongAdder instead sets up multiple accumulation cells: thread 1 updates one cell, thread 2 updates another, and so on, and the cells are summed at the end. This reduces the number of CAS retries. The number of cells does not grow beyond the number of CPU cores, because more cells would bring no benefit.
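From the caller's side, using LongAdder looks like this. The LongAdderDemo class and its demo helper are hypothetical; increment and sum are the standard LongAdder methods.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

class LongAdderDemo {
    // Performs n increments spread over the common fork-join pool and returns the total.
    static long demo(int n) {
        LongAdder adder = new LongAdder();
        IntStream.range(0, n).parallel().forEach(i -> adder.increment());
        return adder.sum(); // base plus all cells
    }

    public static void main(String[] args) {
        System.out.println(demo(1_000_000)); // prints 1000000
    }
}
```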

2. LongAdder important key fields

LongAdder has several key pieces of design.

CAS lock

A CAS lock can be built on an AtomicInteger: if the value is 0, use CAS to change it to 1; if the CAS succeeds, the lock is acquired. To release the lock, set the value back to 0; no CAS is needed for that, because only the thread holding the lock releases it. Do not write this kind of lock in ordinary projects: threads that fail to acquire it keep retrying and burn CPU.

The cellsBusy field in the LongAdder source works like this CAS lock. It serves as a lock flag that guarantees thread safety in certain situations, namely when the Cell[] array is created or expanded.
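The CAS lock described above can be sketched in a few lines. The CasLock class is hypothetical and meant for study only, for the reason already given: waiting threads spin instead of sleeping. The Thread.yield() call is an addition of this sketch to make the spinning slightly less wasteful.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

// Hypothetical spin lock for illustration; not for production use.
class CasLock {
    private final AtomicInteger state = new AtomicInteger(0); // 0 = free, 1 = held

    void lock() {
        while (!state.compareAndSet(0, 1)) {
            Thread.yield(); // lock is held by someone else: spin and retry
        }
    }

    void unlock() {
        state.set(0); // only the holder calls this, so a plain set is enough
    }

    // Guards a plain int with the lock and performs n increments in parallel.
    static int demo(int n) {
        CasLock lock = new CasLock();
        int[] counter = {0};
        IntStream.range(0, n).parallel().forEach(i -> {
            lock.lock();
            try {
                counter[0]++;
            } finally {
                lock.unlock();
            }
        });
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(demo(100_000)); // prints 100000
    }
}
```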

Principle of false sharing

Cell is the accumulation unit.

The Cell class has a value field that holds the accumulated count, a constructor that sets its initial value, and a cas method that performs the update. The class also carries the @sun.misc.Contended annotation, which prevents false sharing of cache lines.

What is a cache line?

A CPU has several levels of cache, and the closer a cache is to the core, the faster it is; the L1 cache is dozens of times faster than main memory. Caches work in units of cache lines. Each cache line corresponds to a block of memory, typically 64 bytes (8 longs).

Caching improves efficiency but creates copies of data: the same data may be cached in the cache lines of different cores. The CPU must keep the data consistent, so when one core modifies the data, the entire cache line holding it on every other core must be invalidated, which can hurt efficiency.

For example:

The Cell array is stored contiguously in memory, and a Cell takes 24 bytes, so one 64-byte cache line can hold 2 Cell objects. Here is the problem: core 0 needs to update Cell[0] and core 1 needs to update Cell[1]. Whichever one succeeds invalidates the other core's cache line, because both cells sit in the same line, and the other core then has to reload it from memory.

The @sun.misc.Contended annotation solves this problem. It adds 128 bytes of padding before and after the annotated object or field, so that the objects land in different cache lines when the CPU loads them into cache, and updating one no longer invalidates the other core's cache line.
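Application code normally cannot use @Contended without special JVM flags, so the idea is sometimes approximated by hand. The sketch below is an assumption-laden illustration, not what LongAdder does: it assumes 64-byte cache lines and relies on HotSpot generally laying out superclass fields before subclass fields, which the JVM specification does not guarantee. The class names are hypothetical.

```java
// Hypothetical hand-rolled padding, assuming 64-byte cache lines.
class LeftPadding { long p1, p2, p3, p4, p5, p6, p7; }        // 56 bytes before value
class CellValue extends LeftPadding { volatile long value; }  // the hot field
class PaddedCell extends CellValue {
    long q1, q2, q3, q4, q5, q6, q7;                          // 56 bytes after value

    // Two threads each update their own cell; returns the combined total.
    static long demo(int perThread) {
        PaddedCell[] cells = {new PaddedCell(), new PaddedCell()};
        Thread[] workers = new Thread[2];
        for (int i = 0; i < 2; i++) {
            PaddedCell mine = cells[i];
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    mine.value++; // safe here: each cell has a single writer
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return cells[0].value + cells[1].value;
    }

    public static void main(String[] args) {
        System.out.println(demo(1_000_000)); // prints 2000000
    }
}
```

The padding fields are never read; their only purpose is to keep the value fields of neighbouring objects out of the same cache line.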

Why 128?

According to GPT, in JDK 8 the @Contended annotation is implemented by adding padding bytes before and after the annotated variable. The padding separates the annotated variable from other variables, so that different threads do not end up accessing different variables in the same cache line. The padding length is a power of 2, because cache line lengths are powers of 2: usually 64 bytes on current mainstream processors, and 128 bytes on some. The default padding in JDK 8 is 128 bytes. Note that this is twice the common 64-byte line; the usual explanation is that it also covers processors that prefetch adjacent cache lines in pairs, as well as processors with 128-byte lines.

3. LongAdder source code

add method

add first checks whether the cells array is null. The array is created lazily: it is null while there is no contention, and only when contention occurs does LongAdder try to create the array and the accumulation cells.

If cells is null, there has been no contention so far, so add accumulates directly into the base field with CAS. If that succeeds, it returns; if not, it enters the longAccumulate method to create the cells array and a Cell.

If cells is not null, add checks whether the current thread already has a Cell. If so, it accumulates into that Cell with CAS. If the CAS fails, or the thread has no Cell yet, it enters longAccumulate.

longAccumulate method

This method is entered when accumulating into base fails, when accumulating into the current thread's Cell fails, or when the thread's Cell has not been created.

Creating the cells array

If the cells array has not been created, longAccumulate creates it.

This requires that the cellsBusy flag is 0 (the CAS lock flag that makes array creation safe), that cells == as still holds (no other thread has created the array in the meantime), and that the CAS on cellsBusy succeeds. Only then is the array created and a Cell initialized. The array starts with size 2, and only one Cell is created, for the current thread, at index 0 or 1 as chosen by the thread's hash value & 1. The other slot is filled lazily when another thread needs it.

If acquiring the lock fails, the thread accumulates into base with CAS. If that succeeds, it returns; if it fails, it goes back to the loop and retries.

Creating a Cell

Creating the array only creates a Cell for the thread that created it. When another thread finds that the array exists but its own slot is empty, it creates a Cell.

It first checks that cellsBusy is 0, creates the Cell object, and then acquires the CAS lock on cellsBusy. Under the lock it checks again that the array is not null and that the slot is still empty. If everything checks out, it assigns the object to the empty slot and is done. If any check fails along the way, it loops and retries.

CAS accumulation into a Cell

If both the array and the thread's Cell exist, the thread accumulates into the Cell with CAS and returns on success. On failure, it checks whether the array length has reached the number of CPU cores. If it has, the array is not expanded any further; instead the thread is rehashed to a different Cell and loops to see whether accumulation succeeds there. If the length is still below the number of CPU cores and the CAS lock is acquired, the array is expanded.

Expansion creates a new array of the original length << 1 (twice the size), copies the contents of the old array into it, and replaces the old array. After a successful expansion the thread loops again; on that pass it may create a new Cell object to accumulate into.

sum method

The sum method produces the final total over all the accumulation cells. It starts from base, iterates over the cells array, adds the value of every non-null Cell, and returns the result.
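The add, longAccumulate, and sum flow described above can be condensed into a simplified re-creation. This is a sketch for study, not the JDK source: the MiniAdder class is hypothetical, it uses AtomicLong objects as cells instead of a padded Cell class, keeps the per-thread hash in a ThreadLocal, and leaves out some of the real implementation's refinements.

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Simplified, hypothetical re-creation of the LongAdder flow; not the JDK source.
class MiniAdder {
    private static final int NCPU = Runtime.getRuntime().availableProcessors();
    // Per-thread hash that selects a cell; replaced by a new random value on contention
    private static final ThreadLocal<int[]> PROBE =
            ThreadLocal.withInitial(() -> new int[] {ThreadLocalRandom.current().nextInt()});

    private final AtomicLong base = new AtomicLong();            // used while uncontended
    private final AtomicInteger cellsBusy = new AtomicInteger(); // CAS lock: 0 free, 1 held
    private volatile AtomicLong[] cells;                         // created lazily

    void add(long x) {
        AtomicLong[] as = cells;
        if (as == null) {
            long b = base.get();
            if (base.compareAndSet(b, b + x)) return;            // no contention: done
        } else {
            AtomicLong a = as[PROBE.get()[0] & (as.length - 1)];
            if (a != null) {
                long v = a.get();
                if (a.compareAndSet(v, v + x)) return;           // cell CAS succeeded
            }
        }
        longAccumulate(x);                                       // slow path
    }

    private void longAccumulate(long x) {
        int[] probe = PROBE.get();
        while (true) {
            AtomicLong[] as = cells;
            if (as == null) {
                if (cellsBusy.compareAndSet(0, 1)) {             // lock, then create the array
                    try {
                        if (cells == as) {                       // nobody created it meanwhile
                            AtomicLong[] rs = new AtomicLong[2];
                            rs[probe[0] & 1] = new AtomicLong(x);
                            cells = rs;
                            return;
                        }
                    } finally {
                        cellsBusy.set(0);
                    }
                } else {
                    long b = base.get();
                    if (base.compareAndSet(b, b + x)) return;    // lock busy: try base
                }
                continue;
            }
            int idx = probe[0] & (as.length - 1);
            AtomicLong a = as[idx];
            if (a == null) {
                if (cellsBusy.compareAndSet(0, 1)) {             // lock, then fill the slot
                    try {
                        if (cells == as && as[idx] == null) {    // recheck under the lock
                            as[idx] = new AtomicLong(x);
                            return;
                        }
                    } finally {
                        cellsBusy.set(0);
                    }
                }
            } else {
                long v = a.get();
                if (a.compareAndSet(v, v + x)) return;
                if (as.length < NCPU && cellsBusy.compareAndSet(0, 1)) {
                    try {                                        // expand to twice the size
                        if (cells == as) cells = Arrays.copyOf(as, as.length << 1);
                    } finally {
                        cellsBusy.set(0);
                    }
                    continue;
                }
            }
            probe[0] = ThreadLocalRandom.current().nextInt();    // rehash to another cell
        }
    }

    long sum() {
        long sum = base.get();
        AtomicLong[] as = cells;
        if (as != null) {
            for (AtomicLong a : as) {
                if (a != null) sum += a.get();
            }
        }
        return sum;
    }
}
```

Calling add(1) from many threads and then sum() after they finish yields the number of calls. As with the real class, sum() taken while updates are still running is only a moving snapshot.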

5. Unsafe

The Unsafe object provides very low-level methods for manipulating memory and threads. It cannot be obtained directly by application code, only through reflection.

It is a final class in the sun.misc package, so it cannot be subclassed. It holds its singleton instance in a private static final field, which is why it has to be obtained through reflection. Because it is so low-level, programmers are advised not to use it, hence the name Unsafe.

AtomicInteger's incrementAndGet (++i) uses the getAndAddInt method of this Unsafe object.

Use Unsafe's objectFieldOffset method to get a field's offset within its object; with that offset you can operate on memory directly (pass the object, the offset, and the expected and new values to a CAS method such as compareAndSwapInt).
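Obtaining Unsafe through reflection and doing a CAS by field offset can be sketched as below. The UnsafeCounter class is hypothetical. The sketch assumes a JDK where sun.misc.Unsafe still exposes objectFieldOffset and compareAndSwapInt (JDK 8 through the early 20s); recent releases deprecate these methods and print warnings, and later ones may refuse to run them.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical counter that performs CAS directly through Unsafe.
class UnsafeCounter {
    private static final Unsafe UNSAFE;
    private static final long VALUE_OFFSET;

    static {
        try {
            // The singleton lives in the private static field "theUnsafe"
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
            // Offset of the value field inside an UnsafeCounter object
            VALUE_OFFSET = UNSAFE.objectFieldOffset(
                    UnsafeCounter.class.getDeclaredField("value"));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private volatile int value;

    int incrementAndGet() {
        while (true) {
            int prev = value;
            // CAS on (object, offset, expected value, new value)
            if (UNSAFE.compareAndSwapInt(this, VALUE_OFFSET, prev, prev + 1)) {
                return prev + 1;
            }
        }
    }

    public static void main(String[] args) {
        UnsafeCounter c = new UnsafeCounter();
        c.incrementAndGet();
        System.out.println(c.incrementAndGet()); // prints 2
    }
}
```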

Origin blog.csdn.net/weixin_54232666/article/details/131277341