ConcurrentHashMap in the Java Source Code


Like HashMap, ConcurrentHashMap uses different data structures in JDK 1.7 and JDK 1.8.

The difference between 1.7 and 1.8

1.7

In JDK 1.7, ConcurrentHashMap is composed of a Segment array and HashEntry arrays: the hash bucket array is divided into small arrays (Segments), each of which holds a number of HashEntry nodes.

As the figure below shows, the data is divided into segments, and each segment is assigned its own lock. While one thread holds the lock to access a segment, the other segments can still be accessed concurrently by other threads.

(Figure: the JDK 1.7 Segment structure)

Segment extends ReentrantLock, so a Segment is a reentrant lock and plays the role of the lock. There are 16 Segments by default, i.e. the default concurrency level is 16.
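The Segment idea (a small fixed array of locks, each guarding its own slice of the buckets) can be sketched as follows. This is a simplified illustration, not the actual JDK 1.7 code; the StripedMap class and its methods are invented for this article:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Toy illustration of JDK 1.7-style segment locking: 16 locks, each
// guarding its own slice of the data, so writes that land in different
// segments never block each other.
class StripedMap<K, V> {
    private static final int SEGMENTS = 16; // default concurrency level in 1.7

    @SuppressWarnings("unchecked")
    private final Map<K, V>[] tables = new Map[SEGMENTS];
    private final ReentrantLock[] locks = new ReentrantLock[SEGMENTS];

    StripedMap() {
        for (int i = 0; i < SEGMENTS; i++) {
            tables[i] = new HashMap<>();
            locks[i] = new ReentrantLock();
        }
    }

    private int segmentFor(Object key) {
        return (key.hashCode() & 0x7fffffff) % SEGMENTS;
    }

    V put(K key, V value) {
        int s = segmentFor(key);
        locks[s].lock();           // only this segment is locked
        try {
            return tables[s].put(key, value);
        } finally {
            locks[s].unlock();
        }
    }

    V get(K key) {
        int s = segmentFor(key);
        locks[s].lock();
        try {
            return tables[s].get(key);
        } finally {
            locks[s].unlock();
        }
    }
}
```

With 16 segments, up to 16 writers can proceed in parallel as long as their keys hash to different segments; two keys in the same segment still serialize on that segment's lock.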

1.8

In terms of data structure, ConcurrentHashMap in JDK 1.8 adopts the same Node array + linked list + red-black tree structure as HashMap. In terms of locking, the Segment lock is abandoned in favor of CAS + synchronized, which achieves finer-grained locking.

The lock is now scoped to a single element of the hash bucket array: only the head node of a linked list (or the root node of a red-black tree) needs to be locked, so reads and writes to other buckets are unaffected, which greatly improves concurrency.

(Figure: the JDK 1.8 Node array + linked list + red-black tree structure)

Why does JDK 1.8 use the built-in lock synchronized instead of the reentrant lock ReentrantLock?

  1. Since JDK 1.6, many optimizations have been applied to the implementation of synchronized: it now has multiple lock states and upgrades step by step from no lock -> biased lock -> lightweight lock -> heavyweight lock.
  2. It reduces memory overhead. If ReentrantLock were used for synchronization, every node would have to extend AQS to get synchronization support. But only the head node of each linked list (or the root node of each red-black tree) actually needs synchronization; giving that support to every node would be a huge waste of memory.

Basic notes

Constant definitions

// Maximum table capacity
private static final int MAXIMUM_CAPACITY = 1 << 30;
// Default capacity
private static final int DEFAULT_CAPACITY = 16;
// Largest possible array size; needed by the toArray methods
static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
// Load factor
private static final float LOAD_FACTOR = 0.75f;
// Threshold for converting a linked list into a tree
static final int TREEIFY_THRESHOLD = 8;
// Threshold for converting a red-black tree back into a linked list
static final int UNTREEIFY_THRESHOLD = 6;
// Minimum table capacity required before a linked list may be treeified
static final int MIN_TREEIFY_CAPACITY = 64;
// Minimum number of buckets a single thread handles per step during a resize
private static final int MIN_TRANSFER_STRIDE = 16;
// Number of bits used in sizeCtl for the generation stamp. Must be at least 6 for 32-bit arrays.
private static int RESIZE_STAMP_BITS = 16;
// Maximum number of threads that can help resize. Must fit in 32 - RESIZE_STAMP_BITS bits.
private static final int MAX_RESIZERS = (1 << (32 - RESIZE_STAMP_BITS)) - 1;
// Bit shift for recording the size stamp in sizeCtl.
private static final int RESIZE_STAMP_SHIFT = 32 - RESIZE_STAMP_BITS;
// Node states, encoded in the node's hash value
static final int MOVED     = -1; // hash for forwarding nodes
static final int TREEBIN   = -2; // hash for roots of trees
static final int RESERVED  = -3; // hash for transient reservations
static final int HASH_BITS = 0x7fffffff; // usable bits of normal node hash

// Base counter value (used to count the elements in the table), used mainly when there is no contention, but also as a fallback during table-initialization races. Updated via CAS.
private transient volatile long baseCount;
// Table initialization and resizing control. When negative, the table is being initialized or resized: -1 for initialization, otherwise -(1 + the number of active resizing threads). Otherwise, when table is null, holds the initial table size to use on creation, or 0 for the default. After initialization, holds the element count at which the table should be resized next.
private transient volatile int sizeCtl;
// The next table index (plus one) to split while resizing.
private transient volatile int transferIndex;
// Spinlock (locked via CAS) used when resizing and/or creating CounterCells.
private transient volatile int cellsBusy;
// Table of counter cells. When non-null, its size is a power of two. Together with baseCount it records the number of elements in the table.
private transient volatile CounterCell[] counterCells;
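To see how the RESIZE_STAMP_* constants above fit together, here is a small demonstration. The resizeStamp method is reproduced from the JDK source; the surrounding demo class is made up:

```java
// Shows how the resize stamp is built and how it makes sizeCtl negative
// during a resize, with room in the low bits to count helper threads.
public class ResizeStampDemo {
    static final int RESIZE_STAMP_BITS = 16;
    static final int MAX_RESIZERS = (1 << (32 - RESIZE_STAMP_BITS)) - 1;
    static final int RESIZE_STAMP_SHIFT = 32 - RESIZE_STAMP_BITS;

    // Reproduced from the JDK source: a stamp unique to each table length n.
    // Bit (RESIZE_STAMP_BITS - 1) is always set, so after the shift below
    // the sign bit of sizeCtl is set, making it negative.
    static int resizeStamp(int n) {
        return Integer.numberOfLeadingZeros(n) | (1 << (RESIZE_STAMP_BITS - 1));
    }

    public static void main(String[] args) {
        int n = 16;
        int rs = resizeStamp(n);
        // When a resize starts, sizeCtl is set to (rs << RESIZE_STAMP_SHIFT) + 2,
        // and each additional helper thread adds 1 to it.
        int sizeCtl = (rs << RESIZE_STAMP_SHIFT) + 2;
        System.out.println("stamp=" + rs + " sizeCtl=" + sizeCtl
                + " max helpers=" + MAX_RESIZERS);
    }
}
```

The stamp in the high 16 bits identifies which resize generation is in progress, while the low 16 bits hold the helper-thread count, which is why MAX_RESIZERS must fit in 32 - RESIZE_STAMP_BITS bits.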

spread

/**
 * Spreads (XORs) higher bits of hash to lower and also forces the top bit to 0.
 * Because the table uses power-of-two masking, sets of hashes that vary only in
 * bits above the current mask will always collide. (Among known examples are sets
 * of Float keys holding consecutive whole numbers in small tables.) So we apply a
 * transform that spreads the impact of higher bits downward. There is a tradeoff
 * between speed, utility, and quality of bit-spreading. Because many common sets
 * of hashes are already reasonably distributed (so don't benefit from spreading),
 * and because we use trees to handle large sets of collisions in bins, we just
 * XOR some shifted bits in the cheapest possible way to reduce systematic lossage,
 * as well as to incorporate the impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */
static final int spread(int h) {
    return (h ^ (h >>> 16)) & HASH_BITS;
}

After spread, the hash used for every key is greater than or equal to 0. This frees up negative values, so ConcurrentHashMap uses a Node's hash to record the node's state; see the constants MOVED, TREEBIN, and RESERVED defined above.
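A quick demonstration of both effects: hashes that differ only in their high bits stop colliding under a small power-of-two mask, and the result is always non-negative. spread itself is copied from the JDK source; the demo class is made up:

```java
public class SpreadDemo {
    static final int HASH_BITS = 0x7fffffff;

    // Reproduced from the JDK source.
    static int spread(int h) {
        return (h ^ (h >>> 16)) & HASH_BITS;
    }

    public static void main(String[] args) {
        // Hashes that differ only above the mask collide without spreading...
        int mask = 15; // table of length 16
        int a = 0x10000, b = 0x20000;
        System.out.println((a & mask) == (b & mask));                  // true
        System.out.println((spread(a) & mask) == (spread(b) & mask));  // false

        // ...and spread's result is always >= 0, leaving the negative
        // range free for MOVED, TREEBIN, and RESERVED.
        System.out.println(spread(-1) >= 0); // true
    }
}
```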

put operation

(Figure: flowchart of the put operation)
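The flow boils down to: initialize the table if needed; CAS a new node into an empty bucket; help with the transfer if the bucket is being moved; otherwise synchronize on the head node and insert. The two ordinary cases, lock-free CAS insertion and head-node locking, can be sketched as a toy map. This is a hypothetical simplification, not the real putVal; there is no treeification, resizing, or counting here:

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Toy illustration of the JDK 1.8 put flow: an empty bucket is filled with a
// lock-free CAS (mirroring casTabAt), and a non-empty bucket is updated while
// holding the monitor of its head node only.
class BucketLockedMap<K, V> {
    static final class Node<K, V> {
        final K key; volatile V val; volatile Node<K, V> next;
        Node(K key, V val, Node<K, V> next) { this.key = key; this.val = val; this.next = next; }
    }

    private final AtomicReferenceArray<Node<K, V>> table = new AtomicReferenceArray<>(16);

    V put(K key, V value) {
        int i = (key.hashCode() & 0x7fffffff) & (table.length() - 1);
        for (;;) {
            Node<K, V> head = table.get(i);
            if (head == null) {
                // Empty bucket: lock-free CAS insert
                if (table.compareAndSet(i, null, new Node<>(key, value, null)))
                    return null;
            } else {
                synchronized (head) { // lock only this bucket's head node
                    if (table.get(i) == head) { // recheck: head may have changed
                        for (Node<K, V> e = head; ; e = e.next) {
                            if (e.key.equals(key)) { V old = e.val; e.val = value; return old; }
                            if (e.next == null) { e.next = new Node<>(key, value, null); return null; }
                        }
                    }
                }
            }
        }
    }

    V get(K key) {
        int i = (key.hashCode() & 0x7fffffff) & (table.length() - 1);
        for (Node<K, V> e = table.get(i); e != null; e = e.next)
            if (e.key.equals(key)) return e.val;
        return null;
    }
}
```

Note the recheck after taking the lock: between reading the head and acquiring its monitor, another thread may have replaced it, in which case the loop simply retries, just as the real putVal does.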

initTable

Initialization is straightforward; go straight to the source:

private final Node<K,V>[] initTable() {
    Node<K,V>[] tab; int sc;
    // Loop until the table has been initialized
    while ((tab = table) == null || tab.length == 0) {
        // A negative value means another thread is already initializing or
        // resizing the table, so the current thread signals the scheduler
        // that it is willing to give up the CPU
        if ((sc = sizeCtl) < 0)
            Thread.yield(); // lost initialization race; just spin
        // CAS "lock": if the value at offset SIZECTL still equals sc, set it to -1.
        // If this thread wins, any other thread's CAS will see that the value is
        // no longer the original sc, fail, and back off.
        // Also note: sc > 0 is the requested initial (or resize target) capacity
        else if (U.compareAndSwapInt(this, SIZECTL, sc, -1)) {
            try {
                // After winning the CAS, recheck that the table is still
                // uninitialized, in case another thread finished initialization
                // just before our CAS
                if ((tab = table) == null || tab.length == 0) {
                    // Use the requested capacity if one was given,
                    // otherwise fall back to the default capacity
                    int n = (sc > 0) ? sc : DEFAULT_CAPACITY;
                    @SuppressWarnings("unchecked")
                    Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n];
                    table = tab = nt;
                    // >>> is an unsigned right shift, so sc = n - n/4 = 0.75 * n:
                    // this sets sc to the threshold for the next resize
                    sc = n - (n >>> 2);
                }
            } finally {
                // Publish sc back into sizeCtl; if the try succeeded,
                // sizeCtl now records the next resize threshold
                sizeCtl = sc;
            }
            // Initialization finished, exit the loop
            break;
        }
    }
    return tab;
}
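The line sc = n - (n >>> 2) is an integer-only way of computing 0.75 * n, the next resize threshold:

```java
// Verifies that n - (n >>> 2) equals 0.75 * n for power-of-two capacities:
// n >>> 2 is n/4, so n - n/4 = 3n/4.
public class ThresholdDemo {
    public static void main(String[] args) {
        int n = 16;                 // freshly initialized table length
        int sc = n - (n >>> 2);     // 16 - 4 = 12: resize once 12 elements are exceeded
        System.out.println(sc);
        for (int cap = 16; cap <= 1 << 20; cap <<= 1) {
            if (cap - (cap >>> 2) != (int) (cap * 0.75f))
                throw new AssertionError("mismatch at " + cap);
        }
        System.out.println("n - (n >>> 2) == 0.75 * n for all tested capacities");
    }
}
```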

addCount

The transfer happens during a resize, and a resize is triggered after the number of elements grows; addCount is where that growth is recorded and the threshold is checked.

Before reading the addCount source, look at ConcurrentHashMap's counting design; it makes the source much easier to follow.

Counting Design Overview

(Figure: overview of the counting design)

counting logic

(Figure: the counting logic)
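The counting design in brief: an update first tries a CAS on the shared baseCount; under contention it falls back to a CounterCell, and the total size is baseCount plus the sum of all cells. A toy version of the idea follows. StripedCounter is invented for illustration; the real code uses CounterCell with @Contended padding and a per-thread probe to pick the cell index:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

// Toy version of the baseCount + counterCells scheme: try the shared base
// first, and on contention spread updates across cells so threads stop
// fighting over a single variable.
class StripedCounter {
    private final AtomicLong baseCount = new AtomicLong();
    private final AtomicLong[] cells = new AtomicLong[8];

    StripedCounter() {
        for (int i = 0; i < cells.length; i++) cells[i] = new AtomicLong();
    }

    void add(long x) {
        long b = baseCount.get();
        // Uncontended fast path: CAS on the shared base
        if (!baseCount.compareAndSet(b, b + x)) {
            // Contended: push the update into one of the cells instead
            int i = ThreadLocalRandom.current().nextInt(cells.length);
            cells[i].addAndGet(x);
        }
    }

    // Like sumCount(): the total is baseCount plus every cell
    long sum() {
        long s = baseCount.get();
        for (AtomicLong c : cells) s += c.get();
        return s;
    }
}
```

This is the same trade-off LongAdder makes: the sum is cheap to update under contention but is only a moment-in-time snapshot, which is why size() on a busy ConcurrentHashMap is an estimate.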

fullAddCount procedure

TODO to be added

helpTransfer

TODO to be added

Resizing, and helping with a resize, is a cooperative multi-threaded job. First, nextTable is pointed at a new Node array of the expanded length; then every participating thread transfers data from table to nextTable, claiming work from the old table one stride at a time. Once the last thread finishes its transfer task, table is pointed at nextTable and the resize is complete.
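The stride-claiming scheme described above can be simulated with an AtomicInteger standing in for transferIndex. TransferDemo is a toy simulation invented here, not the real transfer code:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy simulation of cooperative resizing: each thread repeatedly CASes a
// shared transferIndex down by one stride and migrates the buckets in the
// range it claimed, so the old table is carved up with no bucket handled twice.
public class TransferDemo {
    static final int STRIDE = 16; // cf. MIN_TRANSFER_STRIDE

    static boolean[] migrate(int oldLen, int nThreads) {
        AtomicInteger transferIndex = new AtomicInteger(oldLen);
        boolean[] moved = new boolean[oldLen];
        Runnable worker = () -> {
            for (;;) {
                int next = transferIndex.get();
                if (next <= 0)
                    return;                            // nothing left to claim
                int bound = Math.max(next - STRIDE, 0);
                if (transferIndex.compareAndSet(next, bound)) {
                    for (int i = next - 1; i >= bound; i--)
                        moved[i] = true;               // "migrate" bucket i
                }
            }
        };
        Thread[] ts = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++)
            (ts[i] = new Thread(worker)).start();
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        boolean[] moved = migrate(128, 4);
        int count = 0;
        for (boolean b : moved)
            if (b) count++;
        System.out.println(count + " of 128 buckets migrated");
    }
}
```

Because every CAS on transferIndex hands out a disjoint range, no two threads ever touch the same bucket, which is exactly why the real transfer needs no per-bucket coordination between helpers beyond the claim itself.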


Origin juejin.im/post/7084931212871434271