ConcurrentHashMap core source code analysis


One should only forget oneself and love others so that one can be quiet, happy and noble. -Tolstoy, Anna Karenina

0 Preface

ConcurrentHashMap is the thread-safe Map. Let us look at how it differs from HashMap and how it guarantees thread safety.

1 Inheritance system


Its structure is similar to HashMap's: an array of bins, each holding a linked list (or, for long bins, a red-black tree). Both classes implement the Map interface (ConcurrentHashMap through ConcurrentMap) and extend the AbstractMap abstract class, and ConcurrentHashMap provides almost all of HashMap's methods.

2 Properties

  • table: the bin array. It is lazily initialized on the first insertion, its size is always a power of 2, and iterators access it directly.
  • nextTable: the next table to use; non-null only while resizing.
  • baseCount: the base counter value, used mainly when there is no contention, and also as a fallback during table initialization races. Updated via CAS.
  • sizeCtl: controls table initialization and resizing. When negative, the table is being initialized or resized: -1 means initialization, otherwise the value reflects the number of active resizing threads (the Javadoc describes it as -(1 + the number of active resizing threads)). When non-negative and table is null, it holds the initial table size to use on creation, or 0 for the default. After initialization, it holds the element count at which the table will next be resized.
  • transferIndex: the next table index (plus one) to split while resizing.
  • cellsBusy: a spinlock (locked via CAS) used when resizing and/or creating CounterCells.
  • counterCells: the table of counter cells. When non-null, its size is a power of 2.
  • Node: the data structure holding a key, a value and the key's hash. Both val and next are declared volatile to ensure visibility across threads.
  • ForwardingNode: a special Node inserted at the head of a bin during transfer. Its hash is MOVED (-1) and it holds a reference to nextTable. It serves only as a placeholder while the table is being resized, marking that the bin is empty or has already been moved.
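To make the last two entries concrete, here is a minimal sketch of a bin node and a forwarding node. The class names SketchNode and SketchForwardingNode and the find helper are hypothetical simplifications, not the JDK classes; the sketch only shows the idea that a reader recognizes a moved bin by its MOVED hash and continues the lookup in nextTable.

```java
import java.util.Objects;

public class NodeSketch {
    // Hash value marking a forwarding node, as in the article (MOVED = -1).
    static final int MOVED = -1;

    // Simplified bin node: hash and key are final; val and next are volatile
    // so that writes made by one thread are visible to readers without locking.
    static class SketchNode<K, V> {
        final int hash;
        final K key;
        volatile V val;
        volatile SketchNode<K, V> next;

        SketchNode(int hash, K key, V val, SketchNode<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.val = val;
            this.next = next;
        }
    }

    // Placeholder left in an old bin once it has been moved: no key or value,
    // only a MOVED hash and a reference to the new table.
    static final class SketchForwardingNode<K, V> extends SketchNode<K, V> {
        final SketchNode<K, V>[] nextTable;

        SketchForwardingNode(SketchNode<K, V>[] nextTable) {
            super(MOVED, null, null, null);
            this.nextTable = nextTable;
        }
    }

    // Looks up key in tab; if the bin has been moved, continues in nextTable.
    static <K, V> V find(SketchNode<K, V>[] tab, int hash, K key) {
        while (true) {
            SketchNode<K, V> e = tab[(tab.length - 1) & hash];
            if (e instanceof SketchForwardingNode<?, ?>) {
                tab = ((SketchForwardingNode<K, V>) e).nextTable;
                continue;
            }
            for (; e != null; e = e.next) {
                if (e.hash == hash && Objects.equals(e.key, key)) {
                    return e.val;
                }
            }
            return null;
        }
    }
}
```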

3 Construction method

3.1 No-argument constructor

  • Creates a new, empty map with the default initial table size (16).

3.2 Constructors with parameters

  • ConcurrentHashMap(int initialCapacity): creates a new, empty map with an initial table size that accommodates the specified number of elements without needing to resize dynamically.
  • ConcurrentHashMap(Map<? extends K, ? extends V> m): creates a new map with the same mappings as the given map.

Note that, until the table is created, sizeCtl temporarily holds this power-of-two capacity.

When ConcurrentHashMap is created with a capacity argument, the table size is derived from it. In JDK 8 the constructor pads the request to initialCapacity + initialCapacity / 2 + 1 before rounding up to a power of 2, so an argument of 100 becomes tableSizeFor(151) = 256 (not 128, as it would be for HashMap). This keeps the table size a power of 2.

tableSizeFor

  • Returns a power-of-two table size for a given desired capacity.
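The rounding can be sketched as follows. This assumes the bit-smearing approach of JDK 8's tableSizeFor and the JDK 8 sizing rule of the ConcurrentHashMap(int) constructor described above; the class TableSizing and the helper initialTableSize are hypothetical names used only for illustration.

```java
public class TableSizing {
    // Largest power-of-two table size (1 << 30, as in the JDK).
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Rounds c up to the next power of two: copy the highest set bit of c - 1
    // into every lower bit, then add one.
    static int tableSizeFor(int c) {
        int n = c - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    // Hypothetical helper mirroring the JDK 8 ConcurrentHashMap(int) constructor:
    // pad the request to 1.5x + 1 before rounding up.
    static int initialTableSize(int initialCapacity) {
        if (initialCapacity < 0) throw new IllegalArgumentException();
        return (initialCapacity >= (MAXIMUM_CAPACITY >>> 1))
                ? MAXIMUM_CAPACITY
                : tableSizeFor(initialCapacity + (initialCapacity >>> 1) + 1);
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(100));      // 128
        System.out.println(initialTableSize(100));  // 256
        System.out.println(initialTableSize(16));   // 32
    }
}
```

The padding means a requested capacity of 16 already yields a 32-bin table, so the map can hold 16 elements without crossing the 0.75 resize threshold.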

Lazy initialization of table

The constructor only sets sizeCtl; it does not create the table, which is deferred until the first put. Since put can run concurrently, how does ConcurrentHashMap make sure the table is initialized only once?

private final Node<K,V>[] initTable() {
    Node<K,V>[] tab; int sc;
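    // spin until the table has been initialized
    while ((tab = table) == null || tab.length == 0) {
        // sizeCtl < 0 means another thread is initializing: give up the CPU time slice
        if ((sc = sizeCtl) < 0)
            Thread.yield(); // lost initialization race; just spin
        else if (U.compareAndSwapInt(this, SIZECTL, sc, -1)) {
            try {
                // the table may already be non-null by now, so check again (double-check)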
                if ((tab = table) == null || tab.length == 0) {
                    int n = (sc > 0) ? sc : DEFAULT_CAPACITY;
                    @SuppressWarnings("unchecked")
                    Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n];
                    table = tab = nt;
                    sc = n - (n >>> 2);
                }
            } finally {
                sizeCtl = sc;
            }
            break;
        }
    }
    return tab;
}

The first thread to run put calls Unsafe.compareAndSwapInt to change sizeCtl to -1. Only one thread can win this CAS; the others call Thread.yield() to give up their CPU time slice and keep spinning until initialization completes. The winner then sets sizeCtl to n - (n >>> 2), that is 0.75 n, which becomes the next resize threshold (12 for the default size 16).
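The initialization protocol can be reproduced outside the JDK in a minimal sketch. Here an AtomicInteger stands in for the CAS that initTable performs on sizeCtl through Unsafe, and a plain Object[] stands in for the Node array; the class LazyTable and its allocations counter are hypothetical, added only to observe that a single array is created.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LazyTable {
    static final int DEFAULT_CAPACITY = 16;

    // The table: volatile so other threads see it once it is published.
    private volatile Object[] table;
    // Stand-in for sizeCtl: > 0 initial size, 0 default, -1 "initializing".
    private final AtomicInteger sizeCtl;
    // Counts how many arrays were actually allocated (for demonstration only).
    final AtomicInteger allocations = new AtomicInteger();

    LazyTable(int initialSize) {
        this.sizeCtl = new AtomicInteger(initialSize);
    }

    Object[] initTable() {
        Object[] tab;
        while ((tab = table) == null || tab.length == 0) {
            int sc = sizeCtl.get();
            if (sc < 0) {
                Thread.yield();                       // another thread is initializing
            } else if (sizeCtl.compareAndSet(sc, -1)) {
                try {
                    if ((tab = table) == null || tab.length == 0) { // double-check
                        int n = (sc > 0) ? sc : DEFAULT_CAPACITY;
                        tab = new Object[n];
                        allocations.incrementAndGet();
                        table = tab;
                        sc = n - (n >>> 2);           // next resize threshold: 0.75 * n
                    }
                } finally {
                    sizeCtl.set(sc);                  // release: publish threshold
                }
                break;
            }
        }
        return tab;
    }

    int sizeCtl() {
        return sizeCtl.get();
    }
}
```

Starting several threads that all call initTable on the same LazyTable should still allocate exactly one 16-slot array, after which sizeCtl holds the threshold 16 - 16 / 4 = 12.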

4 put

Once the table is initialized, put combines CAS and synchronized to perform concurrent inserts and updates.

final V putVal(K key, V value, boolean onlyIfAbsent) {
    if (key == null || value == null) throw new NullPointerException();
    // compute the spread hash
    int hash = spread(key.hashCode());
    int binCount = 0;
    // spin until the insert or update succeeds
    for (Node<K,V>[] tab = table;;) {
        Node<K,V> f; int n, i, fh;
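        // step 1. initialize the table if it is null or empty
        if (tab == null || (n = tab.length) == 0)
            tab = initTable();
        // step 2. the target bin is empty: create the node directly
        else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
            // CAS a new node into slot i; it succeeds only if slot i is still null (then the loop ends), otherwise keep spinning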
            if (casTabAt(tab, i, null,
                         new Node<K,V>(hash, key, value, null)))
                break;                   // no lock when adding to empty bin
        }
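        // step 3. the bin head is a forwarding node: a resize is in progress,
        // so help move bins (helpTransfer), then retry against the new table
        else if ((fh = f.hash) == MOVED)
            tab = helpTransfer(tab, f);
        // step 4. the bin already holds nodes
        else {
            V oldVal = null;
            // lock the bin head so that only one thread can modify this bin
            synchronized (f) {
                // re-check that slot i still holds f (it may have changed before we got the lock);
                // binCount is set only if we actually go on to modify the bin
                if (tabAt(tab, i) == f) {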
                    // linked list (ordinary nodes have hash >= 0)
                    if (fh >= 0) {
                        binCount = 1;
                        for (Node<K,V> e = f;; ++binCount) {
                            K ek;
                            // key already present: remember the old value and overwrite it unless onlyIfAbsent
                            if (e.hash == hash &&
                                ((ek = e.key) == key ||
                                 (ek != null && key.equals(ek)))) {
                                oldVal = e.val;
                                if (!onlyIfAbsent)
                                    e.val = value;
                                break;
                            }
                            Node<K,V> pred = e;
                            // reached the tail: append the new node and leave the loop
                            if ((e = e.next) == null) {
                                pred.next = new Node<K,V>(hash, key,
                                                          value, null);
                                break;
                            }
                        }
                    }
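                    // red-black tree: the bin head is a TreeBin, not a TreeNode. A TreeNode is only one node of the tree;
                    // the TreeBin holds the tree's root and handles locking so that tree operations stay thread-safe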
                    else if (f instanceof TreeBin) {
                        Node<K,V> p;
                        binCount = 2;
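                        // if the key already exists, putTreeVal returns its node and the old value goes into oldVal;
                        // putTreeVal locks the tree root while it recolors and rotates during rebalancing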
                        if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                       value)) != null) {
                            oldVal = p.val;
                            if (!onlyIfAbsent)
                                p.val = value;
                        }
                    }
                }
            }
            // binCount != 0 means the bin was modified under the lock
            if (binCount != 0) {
                // convert the list into a red-black tree if it has grown too long
                if (binCount >= TREEIFY_THRESHOLD)
                    treeifyBin(tab, i);
                if (oldVal != null)
                    return oldVal;
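                // no previous value: a new node was added, so leave the loop and go on to addCount
                // (if binCount is still 0, the bin head changed before we locked it and the loop retries)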
                break;
            }
        }
    }
    // step 5. increment the element count and check whether the table needs resizing;
    // if so, call transfer (or help a resize that is already under way)
    addCount(1L, binCount);
    return null;
}
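putVal first spreads the key's hashCode and then selects the bin with (n - 1) & hash. The sketch below assumes the JDK 8 form of spread, which XORs the high 16 bits into the low 16 bits and clears the sign bit (negative hashes such as MOVED are reserved for special nodes); SpreadDemo and indexFor are hypothetical names.

```java
public class SpreadDemo {
    // Usable bits of a normal node hash; negative hashes are reserved
    // for special nodes such as forwarding nodes (MOVED = -1).
    static final int HASH_BITS = 0x7fffffff;

    // XOR the high 16 bits into the low 16 bits, then clear the sign bit.
    static int spread(int h) {
        return (h ^ (h >>> 16)) & HASH_BITS;
    }

    // Bin index for a table of length n (n must be a power of two).
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int h = 0x00010001;   // differs from g only above bit 15
        int g = 0x00020001;
        System.out.println(indexFor(h, 16) + " " + indexFor(g, 16));                 // 1 1
        System.out.println(indexFor(spread(h), 16) + " " + indexFor(spread(g), 16)); // 0 3
        System.out.println(spread(-1) >= 0);                                           // true
    }
}
```

Without spreading, two hashes that differ only in their high bits always collide in a small table; after spreading they land in different bins, and the result is never negative.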

4.2 Execution process

  1. If the table is empty, initialize it; then go to 2.
  2. Check whether the target bucket already holds a value:
    • No: create the node with CAS; if the CAS fails, keep spinning until it succeeds.
    • Yes: go to 3.
  3. Check whether the bucket head is a forwarding (transfer) node, which means a resize is in progress:
    • Yes: help with the resize (helpTransfer), then retry the insert against the new table.
    • No: go to 4.
  4. The bucket holds a value: lock the bucket head with synchronized, then
    • for a linked list, update the matching node or append a new node to the tail;
    • for a red-black tree, insert through the tree version (putTreeVal).
  5. After the insertion, convert a long list into a tree if needed, then check whether a resize is required (addCount).
  5. After the addition is complete, check whether expansion is required

Combining spin, CAS and synchronized in this way is clever, and a good model for designing concurrent code.
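The spin + CAS + synchronized pattern can be isolated in a much-simplified sketch. It assumes a fixed-size table with no resizing, tree bins or size counting, and uses java.util.concurrent.atomic.AtomicReferenceArray in place of the JDK's Unsafe-based tabAt and casTabAt; MiniConcurrentMap is a hypothetical class, not part of any library.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReferenceArray;

public class MiniConcurrentMap<K, V> {
    // Singly linked bin node; val and next are volatile so lock-free readers see updates.
    static final class Node<K, V> {
        final int hash;
        final K key;
        volatile V val;
        volatile Node<K, V> next;

        Node(int hash, K key, V val) {
            this.hash = hash;
            this.key = key;
            this.val = val;
        }
    }

    // Fixed-size table: this sketch never resizes or treeifies.
    private final AtomicReferenceArray<Node<K, V>> tab;

    MiniConcurrentMap(int powerOfTwoSize) {
        tab = new AtomicReferenceArray<>(powerOfTwoSize);
    }

    static int spread(int h) {
        return (h ^ (h >>> 16)) & 0x7fffffff;
    }

    // Returns the previous value, or null if the key was newly inserted.
    V put(K key, V value) {
        if (key == null || value == null) throw new NullPointerException();
        int hash = spread(key.hashCode());
        int i = (tab.length() - 1) & hash;
        while (true) {                                    // spin until we succeed
            Node<K, V> f = tab.get(i);
            if (f == null) {
                // Empty bin: publish the first node with CAS, no lock needed.
                if (tab.compareAndSet(i, null, new Node<>(hash, key, value))) {
                    return null;
                }
                continue;                                 // lost the race: retry
            }
            synchronized (f) {                            // lock only this bin's head
                if (tab.get(i) != f) continue;            // head changed: retry
                for (Node<K, V> e = f; ; e = e.next) {
                    if (e.hash == hash && Objects.equals(e.key, key)) {
                        V old = e.val;
                        e.val = value;
                        return old;
                    }
                    if (e.next == null) {
                        e.next = new Node<>(hash, key, value);
                        return null;
                    }
                }
            }
        }
    }

    // Lock-free read relying on the volatile fields.
    V get(K key) {
        int hash = spread(key.hashCode());
        for (Node<K, V> e = tab.get((tab.length() - 1) & hash); e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) return e.val;
        }
        return null;
    }
}
```

The design choice mirrors putVal: the common case of an empty bin costs a single CAS, and locking is limited to one bin head, so threads writing to different bins never block each other.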

5 transfer (resizing)

At the end of putVal, addCount updates the element count and checks whether a resize is needed; if it is, addCount calls transfer.

In essence, transfer creates a new array twice the size of the old one and moves every bin of the old array into it.

private final void transfer(Node<K,V>[] tab, Node<K,V>[] nextTab) {
    // length of the old table
    int n = tab.length, stride;
    if ((stride = (NCPU > 1) ? (n >>> 3) / NCPU : n) < MIN_TRANSFER_STRIDE)
        stride = MIN_TRANSFER_STRIDE; // subdivide range
    // if the new table has not been created yet, allocate it at twice the old size (n << 1)
    if (nextTab == null) {            // initiating
        try {
            @SuppressWarnings("unchecked")
            Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n << 1];
            nextTab = nt;
        } catch (Throwable ex) {      // try to cope with OOME
            sizeCtl = Integer.MAX_VALUE;
            return;
        }
        nextTable = nextTab;
        transferIndex = n;
    }
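    // length of the new table
    int nextn = nextTab.length;
    // forwarding node placed into each old bin once it has been moved; threads that see it know the bin is done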
    ForwardingNode<K,V> fwd = new ForwardingNode<K,V>(nextTab);
    boolean advance = true;
    boolean finishing = false; // to ensure sweep before committing nextTab
    // spin: each thread claims a range of bins and walks i downward from the top of that range
    for (int i = 0, bound = 0;;) {
        Node<K,V> f; int fh;
        while (advance) {
            int nextIndex, nextBound;
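            // still inside the claimed range (or finishing): process index i next
            if (--i >= bound || finishing)
                advance = false;
            // no bins left to claim: this thread is done
            else if ((nextIndex = transferIndex) <= 0) {
                i = -1;
                advance = false;
            }
            // claim the next range [nextBound, nextIndex) by CASing transferIndex downward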
            else if (U.compareAndSwapInt
                     (this, TRANSFERINDEX, nextIndex,
                      nextBound = (nextIndex > stride ?
                                   nextIndex - stride : 0))) {
                bound = nextBound;
                i = nextIndex - 1;
                advance = false;
            }
        }
        // any of these conditions means this thread has no more bins to move
        if (i < 0 || i >= n || i + n >= nextn) {
            int sc;
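            // All bins have been moved, so publish the new table with a plain write. Every moved bin in the old
            // table holds a forwarding node, and threads that find one do not modify that bin (they help or retry),
            // so a bin marked as forwarded never changes again. Hence this direct assignment is thread-safe.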
            if (finishing) {
                nextTable = null;
                table = nextTab;
                sizeCtl = (n << 1) - (n >>> 1);
                return;
            }
            if (U.compareAndSwapInt(this, SIZECTL, sc = sizeCtl, sc - 1)) {
                if ((sc - 2) != resizeStamp(n) << RESIZE_STAMP_SHIFT)
                    return;
                finishing = advance = true;
                i = n; // recheck before commit
            }
        }
        else if ((f = tabAt(tab, i)) == null)
            advance = casTabAt(tab, i, null, fwd);
        else if ((fh = f.hash) == MOVED)
            advance = true; // already processed
        else {
            synchronized (f) {
                // copy the bin
                if (tabAt(tab, i) == f) {
                    Node<K,V> ln, hn;
                    if (fh >= 0) {
                        int runBit = fh & n;
                        Node<K,V> lastRun = f;
                        for (Node<K,V> p = f.next; p != null; p = p.next) {
                            int b = p.hash & n;
                            if (b != runBit) {
                                runBit = b;
                                lastRun = p;
                            }
                        }
                        if (runBit == 0) {
                            ln = lastRun;
                            hn = null;
                        }
                        else {
                            hn = lastRun;
                            ln = null;
                        }
                        // nodes from lastRun onward all go to the same new bin and are reused as-is; earlier nodes are copied into ln or hn
                        for (Node<K,V> p = f; p != lastRun; p = p.next) {
                            int ph = p.hash; K pk = p.key; V pv = p.val;
                            if ((ph & n) == 0)
                                ln = new Node<K,V>(ph, pk, pv, ln);
                            else
                                hn = new Node<K,V>(ph, pk, pv, hn);
                        }
                        // place the low list at index i and the high list at index i + n of the new table
                        setTabAt(nextTab, i, ln);
                        setTabAt(nextTab, i + n, hn);
                        // put a ForwardingNode into the old bin;
                        // a put that finds it will not touch this bin's data and helps with the transfer instead
                        setTabAt(tab, i, fwd);
                        advance = true;
                    }
                    // tree bin
                    else if (f instanceof TreeBin) {
                        // splitting the red-black tree works like HashMap's; code omitted here
                        ...
                        // put a ForwardingNode into the old bin
                        setTabAt(tab, i, fwd);
                        advance = true;
                    }
                }
            }
        }
    }
}
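The loop at the top of transfer lets several threads share one resize: each thread claims a range of stride bins by CASing transferIndex downward. Below is a minimal sketch of that claiming scheme, assuming an AtomicInteger in place of the Unsafe CAS on transferIndex and passing the CPU count explicitly instead of NCPU; StrideClaimDemo, runWorkers and the processed list are hypothetical and only record which indexes each claim covers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class StrideClaimDemo {
    static final int MIN_TRANSFER_STRIDE = 16;

    // Same formula as transfer(): split n bins into per-CPU chunks, at least 16.
    static int stride(int n, int ncpu) {
        int s = (ncpu > 1) ? (n >>> 3) / ncpu : n;
        return Math.max(s, MIN_TRANSFER_STRIDE);
    }

    // Each worker repeatedly claims [nextBound, nextIndex) by CASing transferIndex
    // downward, then "moves" the bins from the top of its range to the bottom.
    static List<Integer> runWorkers(int n, int stride, int workers) throws InterruptedException {
        AtomicInteger transferIndex = new AtomicInteger(n);
        List<Integer> processed = Collections.synchronizedList(new ArrayList<>());
        Thread[] ts = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            ts[w] = new Thread(() -> {
                while (true) {
                    int nextIndex = transferIndex.get();
                    if (nextIndex <= 0) return;                 // nothing left to claim
                    int nextBound = (nextIndex > stride) ? nextIndex - stride : 0;
                    if (!transferIndex.compareAndSet(nextIndex, nextBound)) continue;
                    for (int i = nextIndex - 1; i >= nextBound; i--) {
                        processed.add(i);                       // stand-in for moving bin i
                    }
                }
            });
            ts[w].start();
        }
        for (Thread t : ts) t.join();
        return processed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(stride(64, 4));    // 16 (64 / 8 / 4 = 2, raised to the minimum)
        System.out.println(stride(4096, 4));  // 128
        List<Integer> done = runWorkers(1024, stride(1024, 2), 4);
        System.out.println(done.size() + " " + done.stream().distinct().count()); // 1024 1024
    }
}
```

Every index from 0 to 1023 is processed exactly once however the four threads interleave, because each range belongs to whichever thread wins the CAS.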

Implementation process

  1. Copy every bin of the original array into the new, expanded array.
  2. Before copying a bin, lock the bin head in the original array. Once the bin has been copied to the new array, replace it in the original array with the forwarding (transfer) node.
  3. If a put targets that bin meanwhile, it finds the forwarding node and helps with the resize instead of modifying the bin, so the bin's data does not change until the resize completes.
  4. Bins are processed from the end of the array toward the head, and each successfully copied bin in the original array is replaced by the forwarding node. Once all bins have been copied, the new array is assigned to table and the resize is complete.
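Because the table length n is a power of 2 and the new table has length 2n, each node of bin i goes either to bin i (if hash & n is 0) or to bin i + n, which is why transfer builds exactly two lists, ln and hn. A sketch of that split, reusing the lastRun technique from the code above; the simplified Node class and the helpers chain, hashes and split are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitDemo {
    // Minimal immutable chain node: only the hash matters for the split.
    static final class Node {
        final int hash;
        final Node next;

        Node(int hash, Node next) {
            this.hash = hash;
            this.next = next;
        }
    }

    // Builds a chain in the given order.
    static Node chain(int... hashes) {
        Node head = null;
        for (int k = hashes.length - 1; k >= 0; k--) head = new Node(hashes[k], head);
        return head;
    }

    static List<Integer> hashes(Node head) {
        List<Integer> out = new ArrayList<>();
        for (Node p = head; p != null; p = p.next) out.add(p.hash);
        return out;
    }

    // Splits the bin headed by f (old table length n) into {low, high}: the final run
    // of nodes that all go to the same side starts at lastRun and is reused as-is;
    // earlier nodes are copied and prepended to ln or hn.
    static Node[] split(Node f, int n) {
        int runBit = f.hash & n;
        Node lastRun = f;
        for (Node p = f.next; p != null; p = p.next) {
            int b = p.hash & n;
            if (b != runBit) {
                runBit = b;
                lastRun = p;
            }
        }
        Node ln = (runBit == 0) ? lastRun : null;
        Node hn = (runBit == 0) ? null : lastRun;
        for (Node p = f; p != lastRun; p = p.next) {
            if ((p.hash & n) == 0) ln = new Node(p.hash, ln);
            else hn = new Node(p.hash, hn);
        }
        return new Node[] {ln, hn};
    }

    public static void main(String[] args) {
        // All five hashes map to bin 5 when n = 16 (hash & 15 == 5).
        Node[] parts = split(chain(53, 5, 21, 37, 69), 16);
        System.out.println("new bin 5:  " + hashes(parts[0]));  // [5, 37, 69]
        System.out.println("new bin 21: " + hashes(parts[1]));  // [21, 53]
    }
}
```

Nodes from lastRun onward (37 and 69 here) are reused without copying; the earlier nodes are copied and prepended, which is why 21 and 53 end up in reverse order. Order within a bin carries no meaning, so this is harmless.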

6 Summary

As the standard thread-safe map, ConcurrentHashMap is a frequent interview topic and a concurrent container worth mastering for everyday work.


Source: juejin.im/post/5e934e215188256bdf72b691