Extra chapter on the growth path of programmers -- An introduction to ConcurrentHashMap

The last extra article covered how HashMap is implemented; this time I will walk through how ConcurrentHashMap is implemented.

What is ConcurrentHashMap?

As the name implies, "concurrent" means happening at the same time, so ConcurrentHashMap can roughly be read as "a hash map for concurrency", and it is indeed built to handle highly concurrent scenarios. If HashMap is a public locker with no password, then ConcurrentHashMap is a locker with a lock on it. ConcurrentHashMap's defining feature, and its biggest difference from HashMap, is lock segmentation: the table is split into segments, each guarded by its own lock. In addition, ConcurrentHashMap reworks the basic storage unit, HashEntry, marking some of its fields volatile to guarantee visibility. (This article targets JDK 7.)
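Before diving into the internals, here is a minimal usage sketch (a hypothetical Demo class, written against the ConcurrentMap API available since JDK 5) showing the kind of atomic read-modify-write that a plain HashMap cannot provide:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class Demo {
    public static void main(String[] args) throws InterruptedException {
        final ConcurrentMap<String, Integer> map = new ConcurrentHashMap<String, Integer>();
        Runnable increment = new Runnable() {
            public void run() {
                for (int i = 0; i < 1000; i++) {
                    for (;;) { // retry loop built on the atomic ConcurrentMap primitives
                        Integer old = map.get("counter");
                        if (old == null) {
                            if (map.putIfAbsent("counter", 1) == null) break;
                        } else if (map.replace("counter", old, old + 1)) {
                            break;
                        }
                    }
                }
            }
        };
        Thread t1 = new Thread(increment), t2 = new Thread(increment);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(map.get("counter")); // always 2000
    }
}

Two threads each perform 1,000 atomic increments, so the printed result is always 2000; with a plain HashMap the count would be unreliable and, as we will see, the map itself could be corrupted.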

Why do we use ConcurrentHashMap?

In the last article on HashMap, we briefly touched on its problem: under concurrent access, a circular linked list can form and cause an infinite loop. After a few more days of study, I now understand more deeply how that circular list comes about. (Again, the version is JDK 7.)

First of all, we know that HashMap is an array of linked lists, resolving hash collisions by chaining, as shown in the figure:

[Figure: HashMap's array-of-linked-lists structure]

Under normal circumstances this storage structure causes no problems. A loop can only arise when both of the following conditions hold:

  1. Concurrent access
  2. A resize (expansion) is triggered
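To make these two conditions concrete, here is a minimal stress sketch (a hypothetical HashMapRaceDemo class; the outcome is nondeterministic by nature). Run on JDK 7 it can leave a thread spinning at 100% CPU inside transfer(); on later JDKs the infinite loop is gone, but updates can still be lost:

import java.util.HashMap;
import java.util.Map;

public class HashMapRaceDemo {
    public static void main(String[] args) {
        // A tiny initial capacity forces frequent resizes (condition 2)
        final Map<Integer, Integer> map = new HashMap<Integer, Integer>(2);
        for (int t = 0; t < 8; t++) { // several writers (condition 1)
            final int base = t * 100000;
            new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < 100000; i++)
                        map.put(base + i, i); // unsynchronized put during resize
                }
            }).start();
        }
        // Expected on JDK 7: occasionally a thread hangs forever inside transfer().
        // On all JDKs: the final size is often less than 800000 (lost updates).
    }
}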

Let's take a look at the source code:

public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    // Locate the bucket index
    int i = indexFor(hash, table.length);
    // Walk the bucket's list; if the key already exists, replace the value
    // and return the old one, otherwise add a new entry node
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(hash, key, value, i); // note: this is where a new node is added
    return null;
}

The above is the source code for putting an element.

void addEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    // Create a new node and make it the new head of the bucket;
    // the fourth constructor argument becomes the node's next pointer
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    if (size++ >= threshold) // note: the resize happens here, after the insert
        resize(2 * table.length);
}

The above is the source code for adding a new entry node. Notice the order: the node is added first, and the resize happens afterwards. ConcurrentHashMap does the opposite: it first checks whether a resize is needed and, if so, resizes before adding the entry node. The benefit is avoiding a wasted resize, that is, growing the table without any node actually being added afterwards.

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    Entry[] newTable = new Entry[newCapacity]; // 1.0
    boolean oldAltHashing = useAltHashing;
    useAltHashing |= sun.misc.VM.isBooted() &&
            (newCapacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
    boolean rehash = oldAltHashing ^ useAltHashing;
    transfer(newTable, rehash); // 1.1
    table = newTable;
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1); // 1.2
}

This is the JDK 7 resize code, which breaks down into three steps:
1.0 -- create a new array
1.1 -- transfer the data from the old array
1.2 -- recompute the threshold

Next, look at the transfer code:

void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while (null != e) {
            Entry<K,V> next = e.next; // step 1.0
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity); // find the new bucket index
            e.next = newTable[i]; // note: point the node at the current head of the new bucket
            newTable[i] = e;      // note: head insertion into the new array
            e = next;             // note: move on to the next node
        }
    }
}

The code here is quite interesting: it moves the nodes from the old array into the new one by head insertion (头插法), which reverses the order of each bucket's list.
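Single-threaded, the reversal is harmless. Here is a tiny self-contained trace (with a hypothetical Node class standing in for HashMap.Entry) of the loop above acting on one bucket:

public class TransferTraceDemo {
    static class Node { // hypothetical stand-in for HashMap.Entry
        final String key;
        Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    public static void main(String[] args) {
        // Old bucket: A -> B
        Node head = new Node("A", new Node("B", null));

        // The transfer loop from above, run on a single bucket:
        Node newHead = null;
        for (Node e = head; e != null; ) {
            Node next = e.next;   // step 1.0
            e.next = newHead;     // point at the current head of the new bucket
            newHead = e;          // head insertion
            e = next;
        }

        // New bucket: B -> A (order reversed)
        for (Node e = newHead; e != null; e = e.next)
            System.out.print(e.key + " "); // prints: B A
    }
}

The reversal itself is benign; the danger appears only when two threads run this loop over the same nodes at once.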
Next, consider the following picture:

[Figure: thread 1 suspended at step 1.0 while thread 2 begins the same resize]

Thread 1 is ready to resize but has not actually started: it has just executed step 1.0 (capturing e and next as local variables) when it is suspended, and thread 2 barges in. The table read inside transfer is shared state: each thread allocates its own new array, but the Entry objects reachable through table are the very same objects for every thread. With that premise, look at the following picture.
[Figure: thread 2 has completed its resize; head insertion has reversed the bucket, so B.next now points to A]

Thread 2 completes its resize, and only now does thread 1 resume (this interleaving is accidental; the technical term is a race condition). Thread 1 still holds e = A and next = B from before, but thread 2's head insertion has already made B.next point to A. As thread 1 replays its loop over its own new array, it re-inserts A, then B (setting B.next -> A again), then follows B.next back to A and sets A.next -> B. Now A.next -> B and B.next -> A: the bucket contains a ring.
[Figure: the resulting ring: A.next -> B and B.next -> A]
Once this circular linked list exists, any subsequent read (get) that walks the bucket spins in an infinite loop. In addition, after multi-threaded puts of non-null elements into a HashMap, a get may return null, and multi-threaded puts can lose elements outright. Interested readers can verify these for themselves.

Compared with HashMap and its problems, ConcurrentHashMap holds several advantages. First, it supports concurrent access, which is why it is also called a concurrent container. Second, it checks the threshold before inserting, avoiding wasted resizes (growing the table without inserting a node). Third, it reworks the entry structure, marking value and next as volatile to preserve visibility. Most powerful of all is the lock segmentation technique: each segment carries its own lock, so different segments can be accessed concurrently. (JDK 7)

ConcurrentHashMap source code analysis

[Figure: JDK 7 ConcurrentHashMap class diagram -- a Segment array, each Segment holding a HashEntry array]

The class diagram of ConcurrentHashMap in JDK 7 is roughly as shown above, and the source code analysis starts below.
First, look at the code of HashEntry:

static final class HashEntry<K,V> {
    final K key;
    final int hash;
    volatile V value;
    final HashEntry<K,V> next;
}

Note that value is marked volatile to guarantee visibility, while the remaining fields are final so the list structure cannot be broken once published. (Strictly speaking, a final next field is the older JDK 6 layout; in the JDK 7 source, as a later snippet shows, next is volatile and is updated through Unsafe.)
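As a quick illustration of what visibility buys (a generic volatile demo in a hypothetical VisibilityDemo class, not CHM code): remove the volatile keyword below and the reader thread is allowed to spin forever on a stale cached value.

public class VisibilityDemo {
    static volatile boolean ready = false;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(new Runnable() {
            public void run() {
                while (!ready) { /* spin until the write becomes visible */ }
                System.out.println("saw the write");
            }
        });
        reader.start();
        Thread.sleep(100);
        ready = true; // a volatile write is guaranteed to become visible to the reader
        reader.join();
    }
}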

static final class Segment<K,V> extends ReentrantLock implements Serializable {
    transient volatile int count;              // number of elements in this Segment
    transient int modCount;                    // number of modifications
    transient int threshold;                   // resize threshold
    transient volatile HashEntry<K,V>[] table; // array of HashEntry buckets
    final float loadFactor;                    // load factor
}

Now that the node structure is clear, let's look at how ConcurrentHashMap is initialized.

Initializing ConcurrentHashMap


(One caveat before the code: the constants below are actually from the JDK 8 source of ConcurrentHashMap, which drops segments in favor of CAS plus per-bucket locks and adds red-black trees; the constructor that follows them is the JDK 7 one. They are kept here for reference.)

    /* ---------------- Constants -------------- */

    /**
     * The maximum table capacity, a power of two (Java array indexing and
     * allocation top out around 1<<30, and the top two bits of the 32-bit
     * hash are used for control purposes).
     */
    private static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * The default table capacity, a power of two:
     * 1 <= DEFAULT_CAPACITY <= MAXIMUM_CAPACITY
     */
    private static final int DEFAULT_CAPACITY = 16;

    /**
     * The largest possible array size (needed by toArray and related
     * array methods).
     */
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    /**
     * The default concurrency level. Requested levels of 12, 13, 14, 15,
     * or 16 all round up to a segment array of size 16.
     */
    private static final int DEFAULT_CONCURRENCY_LEVEL = 16;

    /**
     * The load factor. Given the average lookup times of linked lists and
     * red-black trees, 0.75 is a good choice, keeping access close to O(1).
     */
    private static final float LOAD_FACTOR = 0.75f;

    /**
     * Treeification threshold: a bucket whose list reaches 8 or more
     * nodes is converted to a red-black tree.
     */
    static final int TREEIFY_THRESHOLD = 8;

    /**
     * Untreeification threshold: during a resize, a tree bin that shrinks
     * to 6 or fewer nodes reverts to a linked list.
     */
    static final int UNTREEIFY_THRESHOLD = 6;

    /**
     * The smallest table capacity for which buckets may be treeified
     * (at least 4 * TREEIFY_THRESHOLD), to avoid conflicts between the
     * resize and treeification thresholds.
     */
    static final int MIN_TREEIFY_CAPACITY = 64;

    /**
     * Minimum number of rebinnings per transfer step. Ranges are
     * subdivided to allow multiple resizer threads.  This value
     * serves as a lower bound to avoid resizers encountering
     * excessive memory contention.  The value should be at least
     * DEFAULT_CAPACITY.
     */
    private static final int MIN_TRANSFER_STRIDE = 16;

    /**
     * The number of bits used for the generation stamp in sizeCtl
     * (see the resizeStamp function). Must be at least 6 for 32-bit arrays.
     */
    private static int RESIZE_STAMP_BITS = 16;

    /**
     * The maximum number of threads that can help resize. During a resize
     * the high RESIZE_STAMP_BITS bits of sizeCtl hold the resize stamp and
     * the low 32 - RESIZE_STAMP_BITS bits count the resizing threads.
     */
    private static final int MAX_RESIZERS = (1 << (32 - RESIZE_STAMP_BITS)) - 1;

    /**
     * The bit shift for recording the size stamp in sizeCtl.
     */
    private static final int RESIZE_STAMP_SHIFT = 32 - RESIZE_STAMP_BITS;
    // ...

public ConcurrentHashMap(int initialCapacity,
                         float loadFactor, int concurrencyLevel) {
    if (!(loadFactor > 0) || initialCapacity < 0 || concurrencyLevel <= 0)
        throw new IllegalArgumentException();
    if (concurrencyLevel > MAX_SEGMENTS)
        concurrencyLevel = MAX_SEGMENTS;

    // Find power-of-two sizes best matching arguments
    int sshift = 0;
    int ssize = 1;
    while (ssize < concurrencyLevel) {
        ++sshift;
        ssize <<= 1;
    }
    segmentShift = 32 - sshift;
    segmentMask = ssize - 1;

    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    int c = initialCapacity / ssize;
    if (c * ssize < initialCapacity)
        ++c;
    // The minimum number of elements a Segment's table can hold is 2
    int cap = MIN_SEGMENT_TABLE_CAPACITY;
    while (cap < c)
        cap <<= 1;
    // Create the segments array and initialize the first Segment;
    // the remaining Segments are initialized lazily
    Segment<K,V> s0 =
        new Segment<K,V>(loadFactor, (int)(cap * loadFactor),
                         (HashEntry<K,V>[])new HashEntry[cap]);
    Segment<K,V>[] ss = (Segment<K,V>[])new Segment[ssize];
    UNSAFE.putOrderedObject(ss, SBASE, s0);
    this.segments = ss;
}

The initialization involves these parameters:

  1. loadFactor: the load factor.
  2. initialCapacity: the initial capacity, i.e. the number of segments times each segment's capacity.
  3. concurrencyLevel: the concurrency level, which determines the number of segments; for example, a concurrencyLevel of 13, 14, 15, or 16 yields a segment array of size 16.
  4. sshift: the number of bits needed to represent the segment count; it determines the segment offset. The segment offset 32 - sshift is how far the hash is shifted right during the later re-hash (more on this below).
  5. ssize: the number of segments, the smallest power of two not less than concurrencyLevel.
  6. segmentShift: the segment offset, mentioned again later, used to re-hash into a segment.
  7. segmentMask: the segment mask, mentioned again later, used during the re-hash to keep the high n bits exposed by the shift.
  8. MAX_SEGMENTS: the maximum number of segments (while MAXIMUM_CAPACITY caps the table capacity).
  9. c and cap: used to determine each segment's capacity, also a power of two; loadFactor likewise applies within each segment.

The initialization process:

  • Validate the arguments.
  • If the requested concurrency level exceeds the maximum, clamp it to the maximum.
  • Derive ssize (the segment count) and sshift from the concurrency level.
  • Compute segmentShift (the segment offset) = 32 - sshift, the right-shift that later moves the hash's high bits down for segment selection.
  • Compute segmentMask (the segment mask) = ssize - 1, the mask applied after the shift; the original high bits of the hash thus decide the segment.
  • Compute each segment's HashEntry capacity, cap. With the defaults (initialCapacity = 16, loadFactor = 0.75, ssize = 16), the calculation gives c = 1, cap = 2, and threshold = (int)(2 * 0.75) = 1; the arithmetic is replayed in the sketch below.
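To make the arithmetic concrete, here is the constructor's math replayed with the default arguments (a standalone trace in a hypothetical CtorMathDemo class, not the JDK source):

public class CtorMathDemo {
    public static void main(String[] args) {
        int concurrencyLevel = 16, initialCapacity = 16;
        float loadFactor = 0.75f;

        int sshift = 0, ssize = 1;
        while (ssize < concurrencyLevel) { ++sshift; ssize <<= 1; }

        int segmentShift = 32 - sshift; // 28: exposes the hash's top 4 bits
        int segmentMask = ssize - 1;    // 15 (0b1111)

        int c = initialCapacity / ssize;          // 1
        if (c * ssize < initialCapacity) ++c;
        int cap = 2;                              // MIN_SEGMENT_TABLE_CAPACITY
        while (cap < c) cap <<= 1;                // stays 2
        int threshold = (int) (cap * loadFactor); // (int)(2 * 0.75) = 1

        System.out.println("ssize=" + ssize + " sshift=" + sshift
            + " segmentShift=" + segmentShift + " segmentMask=" + segmentMask
            + " cap=" + cap + " threshold=" + threshold);
        // prints: ssize=16 sshift=4 segmentShift=28 segmentMask=15 cap=2 threshold=1
    }
}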

Inserting elements into ConcurrentHashMap

First, let's look at the structure of Segment in more detail.
Image source:
https://blog.csdn.net/m0_37135421/article/details/80551884

[Figure: a Segment guards its own HashEntry array; the map is an array of such Segments]

static final class Segment<K, V> extends ReentrantLock implements Serializable {

    /**
     * Maximum number of times to spin in scanAndLockForPut while
     * trying to acquire the lock.
     */
    static final int MAX_SCAN_RETRIES = Runtime.getRuntime().availableProcessors() > 1 ? 64 : 1;

    /**
     * The bucket array; each element is the head of a linked list.
     */
    transient volatile HashEntry<K, V>[] table;

    /**
     * The number of key-value pairs in this Segment.
     */
    transient int count;

    /**
     * The number of modifications made to the table.
     */
    transient int modCount;

    /**
     * The threshold: once the number of elements in the Segment
     * exceeds this value, the Segment is resized.
     */
    transient int threshold;

    /**
     * The load factor, used to compute the threshold (0.75 by default).
     */
    final float loadFactor;
}

static final class HashEntry<K, V> {
    final int hash;
    final K key;
    volatile V value;              // volatile for visibility
    volatile HashEntry<K, V> next; // no longer final; updated via Unsafe to stay safe under concurrency
}

Segment extends ReentrantLock, which makes every mutating operation on a segment atomic: a thread first acquires the segment's lock and only then operates. Operations on different segments never interfere with each other, because each segment has its own lock.
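The resulting pattern is just ReentrantLock's canonical lock/try/finally, inherited by each segment. A minimal sketch (a hypothetical SegmentSketch class, not the JDK source):

import java.util.concurrent.locks.ReentrantLock;

// Sketch: because Segment extends ReentrantLock, the segment object IS the lock.
class SegmentSketch<K, V> extends ReentrantLock {
    // ... this segment's table, count, threshold ...

    V putSketch(K key, V value) {
        lock();          // only writers targeting THIS segment contend here
        try {
            // ... locate the bucket, link the node, update count ...
            return null; // the old value, if any
        } finally {
            unlock();    // always released, even if an exception is thrown
        }
    }
}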

Now let's take a look at the put method:

// ConcurrentHashMap's put() method
public V put(K key, V value) {
    Segment<K,V> s;
    // ConcurrentHashMap does not allow null keys or values
    if (value == null)
        throw new NullPointerException();
    // hash() re-hashes the key's hashCode to defend against poor
    // hash codes and keep the distribution uniform
    int hash = hash(key);
    // Unsigned-right-shift the hash by segmentShift, then AND with the
    // segment mask to locate the segment (the re-hash step)
    int j = (hash >>> segmentShift) & segmentMask;
    if ((s = (Segment<K,V>)UNSAFE.getObject
         (segments, (j << SSHIFT) + SBASE)) == null)
        s = ensureSegment(j);
    // Delegate to the Segment's put method
    return s.put(key, hash, value, false);
}
 
// Segment's put() method
final V put(K key, int hash, V value, boolean onlyIfAbsent) {
    // Note: this is where the lock is taken
    HashEntry<K,V> node = tryLock() ? null :
        scanAndLockForPut(key, hash, value); // called when tryLock fails
    V oldValue;
    try {
        HashEntry<K,V>[] tab = table;
        // Compute the position in table[] from the hash
        int index = (tab.length - 1) & hash;
        HashEntry<K,V> first = entryAt(tab, index);
        for (HashEntry<K,V> e = first;;) {
            if (e != null) {
                // Keep searching until a node with the same key and
                // hash is found, then update its value
                K k;
                if ((k = e.key) == key ||
                    (e.hash == hash && key.equals(k))) {
                    oldValue = e.value;
                    if (!onlyIfAbsent) {
                        e.value = value;
                        ++modCount;
                    }
                    break;
                }
                e = e.next;
            }
            else {
                // No matching node was found in the list
                if (node != null) // scanAndLockForPut already built a node: link it in ahead of first
                    node.setNext(first);
                else // otherwise, allocate a new HashEntry
                    node = new HashEntry<K,V>(hash, key, value, first);
                int c = count + 1;
                // Check whether table[] needs to grow; rehash() performs the resize
                if (c > threshold && tab.length < MAXIMUM_CAPACITY)
                    rehash(node);
                else // place node at index in the hash table
                    setEntryAt(tab, index, node);
                ++modCount;
                count = c;
                oldValue = null;
                break;
            }
        }
    } finally {
        unlock();
    }
    return oldValue;
}

The put operation in steps:

  1. Check that the value is not null.
  2. Re-hash the key.
  3. Use the re-hashed value to locate the segment that stores the data.
  4. Insert the key-value pair into the segment's HashEntry table: if the key exists, return the old value; otherwise create a new node. Note that the insert happens while holding the segment's lock.
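One behavioral consequence is worth pinning down with a quick check (a hypothetical NullCheckDemo class): ConcurrentHashMap rejects null keys and values, precisely so that a null returned from get() unambiguously means "absent" under concurrency:

import java.util.concurrent.ConcurrentHashMap;

public class NullCheckDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<String, String>();
        map.put("k", "v");                      // fine
        System.out.println(map.get("missing")); // null == definitely absent
        try {
            map.put("k", null);                 // throws NullPointerException
        } catch (NullPointerException expected) {
            System.out.println("null values are rejected");
        }
    }
}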

Getting elements from ConcurrentHashMap

public V get(Object key) {
    Segment<K,V> s;
    HashEntry<K,V>[] tab;
    int h = hash(key);
    long u = (((h >>> segmentShift) & segmentMask) << SSHIFT) + SBASE;
    // First locate the Segment, then locate the HashEntry
    if ((s = (Segment<K,V>)UNSAFE.getObjectVolatile(segments, u)) != null &&
        (tab = s.table) != null) {
        for (HashEntry<K,V> e = (HashEntry<K,V>) UNSAFE.getObjectVolatile
                 (tab, ((long)(((tab.length - 1) & h)) << TSHIFT) + TBASE);
             e != null; e = e.next) {
            K k;
            if ((k = e.key) == key || (e.hash == h && key.equals(k)))
                return e.value;
        }
    }
    return null;
}

The get operation is comparatively simple: locate the segment from the re-hashed value, then locate the HashEntry within that segment's table by the key. Note that get acquires no lock at all; it relies on volatile reads (UNSAFE.getObjectVolatile plus the volatile value field) to observe the latest data.

How does ConcurrentHashMap resize?

  1. It first checks whether the segment's HashEntry array has reached the threshold; if it has, the segment is resized first and the element inserted afterwards (a simplified sketch follows this list).
  2. The resize doubles the capacity: the elements of the old array are re-hashed and inserted into the new one. For efficiency, ConcurrentHashMap only resizes the single affected segment, never the whole container.
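A simplified sketch of a per-segment doubling (hypothetical names; the real JDK 7 Segment.rehash additionally reuses the trailing run of nodes that already land in the correct new bucket, an optimization omitted here). Nodes are cloned rather than relinked so that lock-free readers still traversing the old chains never observe a mutated next pointer:

public class RehashSketch {
    static class Entry { // hypothetical stand-in for HashEntry
        final int hash;
        final Entry next;
        Entry(int hash, Entry next) { this.hash = hash; this.next = next; }
    }

    // Runs while holding the segment's lock; readers stay lock-free.
    static Entry[] rehash(Entry[] oldTable) {
        Entry[] newTable = new Entry[oldTable.length << 1]; // always double
        int sizeMask = newTable.length - 1;
        for (Entry e : oldTable) {
            for (Entry p = e; p != null; p = p.next) {
                int idx = p.hash & sizeMask; // re-hash into the doubled table
                // clone into the new bucket (head insertion on the copy)
                newTable[idx] = new Entry(p.hash, newTable[idx]);
            }
        }
        return newTable;
    }
}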
-------------------------- To be continued --------------------------

Origin: blog.csdn.net/qq_31236027/article/details/124504165