ConcurrentHashMap source code article: LongAdder principle analysis

Preface

Recently, while studying the source code of ConcurrentHashMap, I found that it counts the number of elements in the map in a rather unusual way. It is worth studying the principle and ideas behind it, which also helps in understanding ConcurrentHashMap itself.

This article is divided into the following 5 parts

1. The effect of counting

2. An intuitive illustration of the principle

3. Detailed analysis of the source code

4. Comparison with AtomicInteger

5. The abstraction of thought

The natural entry point is the map's put method

public V put(K key, V value) {
    return putVal(key, value, false);
}

Look at the putVal method

I will not discuss how ConcurrentHashMap itself works here, so we jump directly to the counting part

final V putVal(K key, V value, boolean onlyIfAbsent) {
    ...
    addCount(1L, binCount);
    return null;
}

Whenever an element is successfully added, the addCount method is called to increase the count by 1. This method is the subject of our study.
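The effect of this counting can be observed from the outside through mappingCount() (or size()). The following is only a minimal sketch: the class name MapCountDemo, the helper fill and the key layout are made up for illustration, and it assumes every thread inserts its own disjoint range of keys.

```java
import java.util.concurrent.ConcurrentHashMap;

class MapCountDemo {
    // Inserts threads * perThread distinct keys concurrently and returns the final count.
    static long fill(int threads, int perThread) {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int offset = t * perThread; // each thread owns a disjoint key range
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put(offset + i, i); // a new key, so the map's counter grows by 1
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return map.mappingCount();
    }

    public static void main(String[] args) {
        System.out.println(fill(4, 1000)); // prints 4000
    }
}
```

No update is lost even though 4 threads insert at the same time, which is exactly what the counting logic has to guarantee.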

Because ConcurrentHashMap is designed for map operations under multi-threaded concurrency, the count must also be updated in a thread-safe way.

Of course, accumulating a value from multiple threads is usually the first lesson in concurrent programming and is not very complicated. You can solve it with AtomicInteger or a lock.
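For reference, the two textbook solutions mentioned above can be sketched as follows. The class SafeCounters and its method names are hypothetical and exist only for this illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SafeCounters {
    private final AtomicInteger atomic = new AtomicInteger();
    private int locked; // guarded by the monitor of "this"

    void incrementAtomic() { atomic.incrementAndGet(); }   // lock-free, CAS based
    synchronized void incrementLocked() { locked++; }      // mutual exclusion
    int atomicValue() { return atomic.get(); }
    synchronized int lockedValue() { return locked; }

    // Runs the given number of threads, each incrementing both counters perThread times.
    static SafeCounters run(int threads, int perThread) {
        SafeCounters counters = new SafeCounters();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counters.incrementAtomic();
                    counters.incrementLocked();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counters;
    }

    public static void main(String[] args) {
        SafeCounters c = run(10, 10000);
        System.out.println(c.atomicValue()); // prints 100000
        System.out.println(c.lockedValue()); // prints 100000
    }
}
```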

However, if we look at the method, we find that an accumulation that should be relatively simple has quite complicated logic

Here I only post the core part of the accumulation algorithm

private final void addCount(long x, int check) {
    CounterCell[] as; long b, s;
    if ((as = counterCells) != null ||
            !U.compareAndSwapLong(this, BASECOUNT, b = baseCount, s = b + x)) {
        CounterCell a; long v; int m;
        boolean uncontended = true;
        if (as == null || (m = as.length - 1) < 0 ||
                (a = as[ThreadLocalRandom.getProbe() & m]) == null ||
                !(uncontended =
                        U.compareAndSwapLong(a, CELLVALUE, v = a.value, v + x))) {
            fullAddCount(x, uncontended);
            return;
        }
        if (check <= 1)
            return;
        s = sumCount();
    }
    ...
}

Let's study the idea behind this logic. It is actually adapted from the LongAdder class, so we go straight to the class where the algorithm originated

1. Use of LongAdder class

Let's first look at the effect of LongAdder

LongAdder adder = new LongAdder();
int num = 0;

@Test
public void test5() throws InterruptedException {
    Thread[] threads = new Thread[10];
    for (int i = 0; i < 10; i++) {
        threads[i] = new Thread(() -> {
            for (int j = 0; j < 10000; j++) {
                adder.add(1);
                num += 1;
            }
        });
        threads[i].start();
    }
    for (int i = 0; i < 10; i++) {
        threads[i].join();
    }
    System.out.println("adder:" + adder);
    System.out.println("num:" + num);
}

Output result

adder:100000
num:40982

As the output shows, adder accumulates correctly under concurrent updates, while the plain int field num loses updates.
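Besides add, LongAdder offers a few other methods for updating and reading the count. The sketch below exercises them from a single thread; the class LongAdderApiDemo and the helper readings are made up for this illustration.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.LongAdder;

class LongAdderApiDemo {
    // Returns the values read at each step so they can be inspected.
    static long[] readings() {
        LongAdder adder = new LongAdder();
        adder.increment();                       // same effect as add(1)
        adder.add(41);
        long afterAdds = adder.sum();            // 42
        long drained = adder.sumThenReset();     // returns 42 and resets the adder to 0
        long afterReset = adder.sum();           // 0
        adder.decrement();                       // same effect as add(-1)
        long afterDecrement = adder.longValue(); // -1, longValue() is equivalent to sum()
        return new long[] {afterAdds, drained, afterReset, afterDecrement};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(readings())); // prints [42, 42, 0, -1]
    }
}
```

Keep in mind that sum() is not an atomic snapshot: if other threads are still adding while it runs, some of their updates may not be included.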

2. Intuitive understanding of LongAdder principle

To analyze the source code effectively, we first need an intuitive understanding of the principle; otherwise reading the code directly is confusing.

LongAdder keeps its count in 2 places

A field of type long: base

An array of Cell objects; each Cell maintains a long field named value for counting

/**
 * Table of cells. When non-null, size is a power of 2.
 */
transient volatile Cell[] cells;

/**
 * Base value, used mainly when there is no contention, but also as
 * a fallback during table initialization races. Updated via CAS.
 */
transient volatile long base;

Concurrent programming-a better solution for multi-thread counting: LongAdder principle analysis

 

When there is no thread contention, accumulation happens on the base field (in the illustration, a single thread accumulates twice). Even so, each update of base is a CAS operation

When thread contention occurs, some thread must fail its CAS on base. That thread first checks whether the Cell array has been initialized. If not, it initializes an array of length 2, finds the array index from the thread's hash value, and accumulates into the value field of the Cell at that index (this accumulation is also a CAS operation)

Suppose 3 threads compete in total. The first thread succeeds in its CAS on base, and the remaining 2 threads have to accumulate into elements of the Cell array. Because the accumulation into a Cell's value is also a CAS operation, contention also occurs if the hash values of the second and third threads map to the same array index. If the second thread succeeds, the third thread rehashes its own hash value. If the new hash value maps to another array index whose element is null, a new Cell object holding the value is created there

If thread 4 joins the competition at the same time, its CAS may still fail against thread 3 even after rehashing. In that case, if the current array length is less than the number of CPUs available to the system, the array is expanded. The thread then rehashes again and keeps trying to accumulate into some element of the Cell array

That is the overall intuition, but the code has many details worth learning, so we now move on to the source code analysis
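Before reading the real code, the base-plus-cells idea can be captured in a deliberately simplified sketch. This is not the JDK implementation: the class StripedCounterSketch is hypothetical, its cell array has a fixed size that never grows, it picks a random cell on every contended add instead of keeping a per-thread hash, and it has no padding against false sharing.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

class StripedCounterSketch {
    private final AtomicLong base = new AtomicLong();
    private final AtomicLongArray cells; // simplification: fixed size, allocated up front

    StripedCounterSketch(int cellCount) {
        if (Integer.bitCount(cellCount) != 1) {
            throw new IllegalArgumentException("cellCount must be a power of two");
        }
        cells = new AtomicLongArray(cellCount);
    }

    void add(long x) {
        long b = base.get();
        if (base.compareAndSet(b, b + x)) {
            return; // no contention: the update lands on base
        }
        // Lost the race on base: spread the update over the cells instead.
        int index = ThreadLocalRandom.current().nextInt() & (cells.length() - 1);
        cells.addAndGet(index, x);
    }

    long sum() {
        long total = base.get();
        for (int i = 0; i < cells.length(); i++) {
            total += cells.get(i);
        }
        return total;
    }

    // Runs the given number of threads, each adding 1 perThread times, and returns the sum.
    static long run(int threads, int perThread) {
        StripedCounterSketch counter = new StripedCounterSketch(4);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.add(1);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.sum();
    }

    public static void main(String[] args) {
        System.out.println(run(8, 10000)); // prints 80000
    }
}
```

The total is always base plus all cells, so it does not matter where an individual update ended up.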

3. Source code analysis

The entry method is add

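public void add(long x) {
    Cell[] as; long b, v; int m; Cell a;
    /**
     * The cells array is checked for null first, and only then is the CAS on base attempted.
     * This means that if threads never contend, cells stays null and every add goes to base.
     * Once contention has happened and cells is non-null, every add targets the cells first.
     */
    if ((as = cells) != null || !casBase(b = base, b + x)) {
        /**
         * This flag records whether contention occurred while accumulating into a cell.
         * If it did, longAccumulate spends one extra spin on a rehash.
         * This is explained in detail in the later method; just keep it in mind for now.
         * true means no contention occurred.
         */
        boolean uncontended = true;
        /**
         * If the cells array is null or has length 0, go straight to the main logic method
         */
        if (as == null || (m = as.length - 1) < 0 ||
                /**
                 * getProbe() can be thought of as returning the thread's hash value.
                 * hash & (array length - 1) gives the array index.
                 * If the element at that index is not null, a CAS accumulation is attempted;
                 * otherwise go to the main logic method.
                 */
                (a = as[getProbe() & m]) == null ||
                /**
                 * CAS-accumulate into the element at that index. On success, return directly;
                 * otherwise go to the main logic method.
                 */
                !(uncontended = a.cas(v = a.value, v + x)))
            longAccumulate(x, null, uncontended);
    }
}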

When there is no thread contention, the accumulation is handled by casBase in the first if, corresponding to case 1 in the earlier illustration.

When thread contention occurs, the accumulation is handled by the cell array, corresponding to case 2 in the earlier illustration (the array is initialized in the longAccumulate method)
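The index computation getProbe() & m works because the array length is always a power of 2, so m = length - 1 is a mask of low bits. A small sketch, with a made-up class MaskIndexDemo and helper slot, shows the effect, including for negative hash values:

```java
class MaskIndexDemo {
    // Maps an arbitrary int hash to a slot of a table whose length is a power of two.
    static int slot(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    public static void main(String[] args) {
        System.out.println(slot(13, 8)); // 13 = 0b1101, mask 0b0111, prints 5
        System.out.println(slot(-1, 4)); // all bits set, mask 0b11, prints 3
        System.out.println(slot(6, 2));  // even hash, mask 0b1, prints 0
    }
}
```

For power-of-two lengths the mask gives the same result as Math.floorMod(hash, tableLength), without a division and without ever producing a negative index.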

Next we look at the main logic method. Because it is relatively long, I will analyze it section by section

longAccumulate method

Parameters in the signature

x represents the value to be accumulated

fn specifies how to accumulate; LongAdder passes null, so it is not important here

wasUncontended indicates whether the CAS on a cell in the outer method went through without contention. Because the outer condition is a chain of "or" checks (as == null || (m = as.length - 1) < 0 || (a = as[getProbe() & m]) == null), this flag stays true if the array is null or the element at the index has not been initialized. It is false only when the CAS on an existing cell failed

final void longAccumulate(long x, LongBinaryOperator fn,
                          boolean wasUncontended) {
  ...
}

First check whether the thread's hash value is 0. If it is 0, the hash has not been initialized yet, so it is initialized here, which amounts to a rehash

After that, wasUncontended is set to true, because even if there was a conflict before, after the rehash the thread first assumes it can find an array index without a conflicting element

int h; // the thread's hash value, used in the logic below
if ((h = getProbe()) == 0) {
    ThreadLocalRandom.current(); // force initialization
    h = getProbe();
    wasUncontended = true;
}
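The real getProbe and advanceProbe read and write a field of the current Thread and are not accessible from application code. The sketch below imitates the idea with a ThreadLocal: a probe of 0 means "not initialized", and a rehash is a cheap xorshift step. The class ProbeSketch and all of its methods are hypothetical, and the shift constants 13, 17, 5 are the common 32-bit xorshift choice; treat their match with the JDK as an assumption.

```java
import java.util.concurrent.ThreadLocalRandom;

class ProbeSketch {
    // Hypothetical per-thread probe slot; 0 means "not initialized yet".
    private static final ThreadLocal<int[]> PROBE = ThreadLocal.withInitial(() -> new int[1]);

    static int getProbe() {
        return PROBE.get()[0];
    }

    // Gives the current thread a non-zero probe, imitating the "force initialization" step.
    static int initProbe() {
        int seed = ThreadLocalRandom.current().nextInt();
        int probe = (seed == 0) ? 1 : seed; // never store 0, it marks "uninitialized"
        PROBE.get()[0] = probe;
        return probe;
    }

    // One xorshift step: a cheap pseudo-random successor of the given value.
    static int xorShift(int probe) {
        probe ^= probe << 13;
        probe ^= probe >>> 17;
        probe ^= probe << 5;
        return probe;
    }

    // Rehash: replaces the current thread's probe with its xorshift successor.
    static int advanceProbe(int probe) {
        int next = xorShift(probe);
        PROBE.get()[0] = next;
        return next;
    }

    public static void main(String[] args) {
        int h = getProbe();
        if (h == 0) {
            h = initProbe();
        }
        System.out.println(h != 0);      // prints true
        System.out.println(xorShift(1)); // prints 270369
    }
}
```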

Then comes an infinite loop containing 3 large if branches. The first branch is the main logic, which runs once the array has been initialized; the other 2 branches act while the array is not yet initialized. I extract the main logic and present it separately afterwards, so that the outer branches do not get in the way

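/**
 * Marks whether the array index this thread found in the previous iteration already held a Cell.
 * true means the slot was non-empty.
 * Used in the loop of the main logic.
 */
boolean collide = false;
/**
 * Infinite loop that provides the spin
 */
for (; ; ) {
    Cell[] as;
    Cell a;
    int n; // length of the cells array
    long v; // current value read just before the CAS
    /**
     * If the cells array is non-null and has already been initialized by some thread,
     * enter the main logic (explained in detail later)
     */
    if ((as = cells) != null && (n = as.length) > 0) {
        ...
        /**
         * If the array is null, a Cell array has to be initialized.
         * cellsBusy marks whether the cells array may be modified; it acts as a lock.
         * cells == as checks that no other thread initialized an array before this thread got here.
         * casCellsBusy uses a CAS to set cellsBusy to 1; on success the thread holds the lock on cells.
         */
    } else if (cellsBusy == 0 && cells == as && casCellsBusy()) {
        /**
         * Initialize the array; nothing more to explain
         */
        boolean init = false;
        try {
            if (cells == as) {
                Cell[] rs = new Cell[2];
                rs[h & 1] = new Cell(x);
                cells = rs;
                init = true;
            }
        } finally {
            cellsBusy = 0;
        }
        if (init)
            break;
        /**
         * If the array is still null and this thread lost the race to initialize it,
         * try once more to CAS the value onto base.
         * If that fails as well (a little pitiful), spin again.
         * The LongBinaryOperator from the method signature is used here; it does not affect the logic.
         */
    } else if (casBase(v = base, ((fn == null) ? v + x :
            fn.applyAsLong(v, x))))
        break;                          // Fall back on using base
}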
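The cellsBusy pattern (cheap read, CAS to take the lock, re-check under the lock, release in finally) can be tried in isolation. In the sketch below, AtomicInteger stands in for the cellsBusy field and AtomicLong for Cell; the class LazyTableSketch and its methods are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

class LazyTableSketch {
    private final AtomicInteger busy = new AtomicInteger(); // 0 = free, 1 = held
    private volatile AtomicLong[] cells;                     // null until first initialized

    // Tries once to create the table; returns true only if this call created it.
    boolean tryInit(int hash, long x) {
        AtomicLong[] seen = cells;
        if (seen != null) {
            return false; // already initialized by someone else
        }
        if (busy.get() == 0 && busy.compareAndSet(0, 1)) {
            try {
                if (cells == seen) { // re-check under the lock
                    AtomicLong[] rs = new AtomicLong[2];
                    rs[hash & 1] = new AtomicLong(x);
                    cells = rs;
                    return true;
                }
            } finally {
                busy.set(0); // always release the lock
            }
        }
        return false;
    }

    // Lets several threads race to initialize one table and counts how many succeeded.
    static int countCreators(int threads) {
        LazyTableSketch table = new LazyTableSketch();
        AtomicInteger creators = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int hash = t;
            workers[t] = new Thread(() -> {
                if (table.tryInit(hash, 1L)) {
                    creators.incrementAndGet();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return creators.get();
    }

    public static void main(String[] args) {
        System.out.println(countCreators(16)); // prints 1
    }
}
```

Unlike the real code, a thread that loses here simply gives up instead of falling back to base and spinning; the sketch only demonstrates that the table is created exactly once.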

Next, look at the main logic that accumulates into the elements of the cell array

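/**
 * If the cells array is non-null and has been initialized by some thread, enter the main logic
 */
if ((as = cells) != null && (n = as.length) > 0) {
    /**
     * If the array element for the current thread's hash value is null
     */
    if ((a = as[(n - 1) & h]) == null) {
        /**
         * The Cell array is not being modified by another thread
         */
        if (cellsBusy == 0) {
            /**
             * I did not understand at first why the author creates a single Cell here;
             * the author's own comment is "Optimistically create".
             * A plausible reason: allocating the Cell before taking the lock keeps
             * the locked section as short as possible.
             */
            Cell r = new Cell(x);
            /**
             * Check the state of the cell lock again and try to acquire it
             */
            if (cellsBusy == 0 && casCellsBusy()) {
                boolean created = false;
                try {
                    /**
                     * Re-validate the state under the lock (array non-null and so on).
                     * If the checks pass, store the Cell created above at this index.
                     */
                    Cell[] rs;
                    int m, j;
                    if ((rs = cells) != null &&
                            (m = rs.length) > 0 &&
                            rs[j = (m - 1) & h] == null) {
                        rs[j] = r;
                        created = true;
                    }
                } finally {
                    cellsBusy = 0;
                }
                /**
                 * If the Cell was stored, the accumulation succeeded; exit the loop
                 */
                if (created)
                    break;
                /**
                 * Reaching here means another thread created a Cell at this index
                 * between the null check and acquiring the lock.
                 * So continue without rehashing; the next iteration will not enter this branch.
                 */
                continue;
            }
        }
        /**
         * Execution reaches here inside the if ((a = as[(n - 1) & h]) == null) branch,
         * which means the slot was empty at the first check, so collide is set to false.
         * Meaning of collide: whether the index found in the previous iteration already held a Cell.
         * True if last slot nonempty
         */
        collide = false;
    /**
     * If this flag is false, the thread has already contended with another thread.
     * Even though a CAS could be attempted right away, under high concurrency
     * the 2 threads may well contend again, and spinning on every conflict wastes CPU.
     * So this iteration is skipped, and the rehash at the end of the for loop
     * lets the thread find its own exclusive index as soon as possible.
     */
    } else if (!wasUncontended)
        wasUncontended = true;
    /**
     * Try to accumulate into the Cell for this hash; on success, return.
     * If this still fails, overall contention is very high
     * and expanding the array may have to be considered
     * (the array starts with capacity 2; with 10 threads running concurrently, contention is hard to avoid)
     */
    else if (a.cas(v = a.value, ((fn == null) ? v + x :
            fn.applyAsLong(v, x))))
        break;
    /**
     * Check the number of CPU cores: even with 100 threads,
     * the number of threads that can truly run in parallel is at most the number of CPUs.
     * So once the array length has reached the CPU count, it should not be expanded further.
     * (cells != as means another thread has already replaced the array.)
     */
    else if (n >= NCPU || cells != as)
        collide = false;
    /**
     * Reaching here means the index found from the thread's hash in this iteration already holds an element.
     * If collide is false, no collision was recorded in the previous iteration,
     * so spin once and rehash.
     * If execution gets here again with collide true, contention is very high and the array should be expanded.
     */
    else if (!collide)
        collide = true;
    /**
     * Reaching here means the array needs to be expanded.
     * Check the lock state and try to acquire the lock.
     */
    else if (cellsBusy == 0 && casCellsBusy()) {
        /**
         * Array expansion logic; it is simple, so no explanation.
         * The size doubles.
         */
        try {
            if (cells == as) {
                Cell[] rs = new Cell[n << 1];
                for (int i = 0; i < n; ++i)
                    rs[i] = as[i];
                cells = rs;
            }
        } finally {
            cellsBusy = 0;
        }
        collide = false;
        /**
         * continue directly: the array was just expanded, so do not rehash yet
         */
        continue;
    }
    /**
     * Rehash so that the thread may find its own exclusive index in the next iteration
     */
    h = advanceProbe(h);
}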
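The expansion step is worth a closer look: the new array receives references to the existing Cell objects, so no count is copied or lost while other threads keep adding to those cells. A minimal sketch of this, with the hypothetical class GrowTableSketch, AtomicLong standing in for Cell, and the CPU limit replaced by a constructor parameter so the behavior does not depend on the machine:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

class GrowTableSketch {
    private final int maxLength;                              // stands in for the NCPU limit
    private final AtomicInteger busy = new AtomicInteger();  // 0 = free, 1 = held
    private volatile AtomicLong[] cells;

    GrowTableSketch(int maxLength) {
        this.maxLength = maxLength;
        AtomicLong[] initial = new AtomicLong[2];
        initial[0] = new AtomicLong(7); // a cell that already holds part of the count
        cells = initial;
    }

    AtomicLong[] table() {
        return cells;
    }

    // Doubles the table if "seen" is still the current table and the limit allows it.
    boolean tryGrow(AtomicLong[] seen) {
        if (seen.length >= maxLength || cells != seen) {
            return false; // at the limit, or the caller holds a stale table
        }
        if (busy.get() == 0 && busy.compareAndSet(0, 1)) {
            try {
                if (cells == seen) { // re-check under the lock
                    AtomicLong[] rs = new AtomicLong[seen.length << 1];
                    for (int i = 0; i < seen.length; i++) {
                        rs[i] = seen[i]; // share the same cell objects, counts stay in place
                    }
                    cells = rs;
                    return true;
                }
            } finally {
                busy.set(0);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        GrowTableSketch sketch = new GrowTableSketch(4);
        AtomicLong[] before = sketch.table();
        System.out.println(sketch.tryGrow(before));              // prints true
        System.out.println(sketch.table().length);               // prints 4
        System.out.println(sketch.table()[0] == before[0]);      // prints true
        System.out.println(sketch.tryGrow(sketch.table()));      // prints false (limit reached)
    }
}
```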

That concludes the source code of LongAdder. There is not much code, but the ideas in it are worth learning.

4. Comparison with AtomicInteger

Analyzing the source code alone is not quite enough. We still do not understand why the author designed such a complicated class when AtomicInteger already exists.

So first, let's analyze how AtomicInteger ensures thread safety

Look at the most basic method, getAndIncrement

public final int getAndIncrement() {
    return unsafe.getAndAddInt(this, valueOffset, 1);
}

It calls the getAndAddInt method of the Unsafe class, so let's keep going

public final int getAndAddInt(Object var1, long var2, int var4) {
    int var5;
    do {
        var5 = this.getIntVolatile(var1, var2);
    } while(!this.compareAndSwapInt(var1, var2, var5, var5 + var4));

    return var5;
}

We will not dig into the implementation of getIntVolatile and compareAndSwapInt, because they are native methods

As we can see, AtomicInteger relies on CAS plus spinning to guarantee atomicity: if an update fails, it spins until the update succeeds.
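The same CAS-plus-spin loop can be written against the public AtomicInteger API, which also makes it possible to count how often a CAS fails. This is only an illustrative sketch: the class CasSpinDemo is made up, and the extra failure counter adds contention of its own, so the reported number is a rough indication at best.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

class CasSpinDemo {
    private final AtomicInteger value = new AtomicInteger();
    private final AtomicLong failedAttempts = new AtomicLong(); // bookkeeping for the demo only

    // Hand-written equivalent of the loop in getAndAddInt, with a delta of 1.
    int getAndIncrement() {
        for (;;) {
            int current = value.get();
            if (value.compareAndSet(current, current + 1)) {
                return current;
            }
            failedAttempts.incrementAndGet(); // another thread won the race, retry
        }
    }

    int get() { return value.get(); }
    long failedAttempts() { return failedAttempts.get(); }

    // Runs the given number of threads, each incrementing perThread times.
    static CasSpinDemo run(int threads, int perThread) {
        CasSpinDemo demo = new CasSpinDemo();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    demo.getAndIncrement();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return demo;
    }

    public static void main(String[] args) {
        CasSpinDemo demo = run(8, 100000);
        System.out.println("value: " + demo.get()); // prints value: 800000
        System.out.println("failed CAS attempts: " + demo.failedAttempts()); // varies per run
    }
}
```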

It follows that when a large number of threads run concurrently and contention is fierce, some threads using AtomicInteger may keep failing the CAS and keep spinning, which hurts throughput.

To solve the spinning problem under high concurrency, the author of LongAdder added an array so that threads compete over multiple values instead of one. This reduces the frequency of contention and thereby alleviates the spinning problem. The cost, of course, is additional storage space.

Finally, I ran a simple test to compare the time taken by the two counting methods

From the principle we know that LongAdder's advantage only becomes obvious when thread contention is fierce, so I used 100 threads, each incrementing the same counter 1,000,000 times. The results are as follows; the gap is huge, about 15 times!

LongAdder elapsed time: 104292242 ns

AtomicInteger elapsed time: 1583294474 ns

Of course, this is just a simple test with a lot of randomness in it. Interested readers can run more tests with different levels of contention.
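For readers who want to repeat the comparison, a rough timing harness might look like the following. It is not the code used for the numbers above: the class CounterTimingSketch and the helper timeNanos are made up, the measurement includes thread start-up, and there is no warm-up, so results will differ from machine to machine and from run to run.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;

class CounterTimingSketch {
    // Starts the given number of threads that all run "body" and returns the elapsed nanoseconds.
    static long timeNanos(int threads, Runnable body) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(body);
        }
        long start = System.nanoTime();
        for (Thread w : workers) {
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        final int threads = 100;
        final int perThread = 1_000_000;
        LongAdder adder = new LongAdder();
        AtomicInteger atomic = new AtomicInteger();

        long adderNanos = timeNanos(threads, () -> {
            for (int i = 0; i < perThread; i++) {
                adder.increment();
            }
        });
        long atomicNanos = timeNanos(threads, () -> {
            for (int i = 0; i < perThread; i++) {
                atomic.incrementAndGet();
            }
        });

        System.out.println("LongAdder elapsed time: " + adderNanos + " ns, sum=" + adder.sum());
        System.out.println("AtomicInteger elapsed time: " + atomicNanos + " ns, value=" + atomic.get());
    }
}
```

For measurements that are meant to be quoted, a dedicated benchmarking harness with warm-up iterations would be a better choice than a hand-rolled timer like this one.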

5. The abstraction of thought

Finally, we abstract away from the author's specific code and implementation logic to make the thinking process clear

1) The problem with AtomicInteger: contention on a single resource leads to spinning

2) The idea of the solution: spread the contention on a single object across multiple objects (a form of divide and conquer)

3) Controlled expansion: spreading contention costs extra storage space, so the array cannot be expanded blindly (in the extreme case every thread gets its own counting object, which is clearly unreasonable)

4) Layering the problem: because the scenarios in which the class is used cannot be controlled, the extra storage has to grow dynamically with the intensity of concurrency (similar to the lock escalation of synchronized)

5) 3 layered strategies: with no contention, accumulate into a single value; once some contention occurs, create an array of capacity 2, which raises the number of contended resources to 3; when contention becomes more intense, keep expanding the array (this corresponds to the progression from 1 thread to 4 threads in the illustrations)

6) Strategy details: add a rehash while spinning. Although some computing time is spent on calculating hashes, comparing array objects and so on, this lets concurrent threads find their own objects as quickly as possible, so that they are unlikely to contend again afterwards (sharpening the axe does not delay cutting the wood; pay special attention to how the wasUncontended field is handled)

Origin blog.csdn.net/weixin_48182198/article/details/109332883