Chapter IV Concurrent Containers: ConcurrentHashMap

Using HashMap from multiple threads can cause its Entry chain to form a circular linked list. Once the structure contains a cycle, an Entry's next node is never null, so a get that traverses the chain spins in an infinite loop.

Hashtable uses synchronized to guarantee thread safety, but it is very inefficient under heavy contention: while one thread is inside a synchronized method of the Hashtable, any other thread calling a synchronized method blocks or polls. If thread 1 is adding an element with put, thread 2 can neither add an element with put nor read one with get, so the fiercer the contention, the lower the efficiency.

putIfAbsent(): inserts the value only if the key is not already present in the map; if the key already has a mapping, the existing value is returned.
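
A quick usage sketch using only the standard java.util.concurrent API:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PutIfAbsentDemo {
    public static void main(String[] args) {
        ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();
        // Key absent: the value is stored and null is returned.
        Integer prev1 = map.putIfAbsent("count", 1);   // prev1 == null
        // Key present: the map is unchanged and the existing value is returned.
        Integer prev2 = map.putIfAbsent("count", 99);  // prev2 == 1
        System.out.println(prev1 + " " + prev2 + " " + map.get("count")); // null 1 1
    }
}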

First, prerequisite knowledge

1, Hash algorithms:

Hash: an algorithm (a hash function) converts input of arbitrary length into output of fixed length; the output value is the hash value. A hash is a compressing mapping, so hash collisions are inevitable. Common ways to construct a hash function include the direct addressing method and the division-remainder method, among others.

Common strategies for resolving hash collisions:

  • Open addressing
  • Rehashing (using additional hash functions)
  • Chaining (elements with the same hash value are strung together in a linked list)

ConcurrentHashMap uses chaining to resolve hash collisions. MD4, MD5, and the SHA family are also hash algorithms; they are known as digest algorithms.
Common digest algorithms:
(1) MD4; (2) MD5, which processes its input in 512-bit blocks and outputs the concatenation of four 32-bit words; (3) SHA-1, among others.

2, bit operations:

A Java int is 32 bits. Bit 0 has weight 2^0 = 1, bit 1 has weight 2^1 = 2, and so on; for example, a number with bits 3 and 5 set equals 2^3 + 2^5 = 40.
As an int value grows, its bits fill in from the low end toward the high end.
Bit 31 is the sign bit in Java's int representation: 0 for a positive number, 1 for a negative one.

Common bit operations are:

  • Bitwise AND & (1&1=1, 1&0=0, 0&0=0)
  • Bitwise OR | (1|1=1, 1|0=1, 0|0=0)
  • Bitwise NOT ~ (~1=0, ~0=1)
  • Bitwise XOR ^ (1^1=0, 1^0=1, 0^0=0)
  • Shifts: << signed left shift, >> signed right shift, >>> unsigned right shift. Example: 8 << 2 = 32, 8 >> 2 = 2

An interesting modulo property: a % 2^n is equivalent to a & (2^n - 1). This is why the array length in a map must be a power of 2: when computing the bucket index for a key, the modulo can be replaced by a fast bitwise AND for quick positioning.
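
A minimal check of this identity in Java (plain arithmetic, valid for non-negative a):

public class ModuloVsMask {
    public static void main(String[] args) {
        int n = 4;            // exponent
        int pow = 1 << n;     // 2^4 = 16
        for (int a = 0; a < 100; a++) {
            // For non-negative a, a % 2^n and a & (2^n - 1) are identical.
            if (a % pow != (a & (pow - 1))) {
                throw new AssertionError("mismatch at " + a);
            }
        }
        System.out.println("a % 16 == (a & 15) for all tested a");
    }
}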

3, Use cases for bit operations

  • Modifiers of Java classes, member variables, and methods, e.g. in the Class class
  • The implementation of Java containers such as HashMap and ConcurrentHashMap
  • Permission or attribute access control
  • Simple reversible encryption, e.g. via XOR (1^1=0; 0^1=1)
Example: access control, com.chj.thread.capt05.bitconvert.Permission (a sketch of the idea follows this list).
Pros and cons of bit operations: they save a lot of code, are highly efficient, and keep changes local, but they are not intuitive to read.
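
The author's Permission class is not reproduced here; the following is a hypothetical sketch of the same idea, one bit per permission flag (all names are invented for illustration):

public class Permission {
    // Each permission occupies one bit, so flags can be combined with | and tested with &.
    public static final int ALLOW_SELECT = 1 << 0; // 0001
    public static final int ALLOW_INSERT = 1 << 1; // 0010
    public static final int ALLOW_UPDATE = 1 << 2; // 0100
    public static final int ALLOW_DELETE = 1 << 3; // 1000

    private int flags;

    public void grant(int permission)  { flags |= permission; }   // set the bit
    public void revoke(int permission) { flags &= ~permission; }  // clear the bit
    public boolean can(int permission) { return (flags & permission) != 0; }

    public static void main(String[] args) {
        Permission p = new Permission();
        p.grant(ALLOW_SELECT | ALLOW_INSERT);
        System.out.println(p.can(ALLOW_SELECT)); // true
        p.revoke(ALLOW_SELECT);
        System.out.println(p.can(ALLOW_SELECT)); // false
    }
}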

Second, why HashMap is not thread-safe

A HashMap is an array in which each element is the head of a singly linked list. In a multithreaded environment, concurrent put operations can (during a resize) produce an infinite loop, driving CPU utilization to nearly 100%, so HashMap must not be used concurrently.
The structure of HashMap in Java 1.7:
(figure: an array of buckets, each the head of a singly linked list of Entry nodes)
In the figure, each green entity is an instance of the nested class Entry, which has four fields: key, value, the hash value, and next, which chains entries into a linked list.

  • capacity: the current array capacity, always kept at a power of 2; the array can grow, and after a resize it is double its previous size.
  • loadFactor: the load factor, 0.75 by default.
  • threshold: the resize threshold, equal to capacity * loadFactor.

1. Definitions

Hash algorithm:
A method that converts input of arbitrary length into output of a fixed length; the output is called the hash value.
Hash table:
A structure that, using a hash function H(key) together with a collision-handling method, maps a set of keys onto a finite range of addresses and uses the key's image in that range as its storage location in the table. Such a structure is called a hash table, and the computed location is called the hash address.

2. put process analysis:

2.1 Source code analysis:

public V put(K key, V value) {
    // When the first element is inserted, the array must be initialized first
    if (table == EMPTY_TABLE) {
        inflateTable(threshold); // capacity * loadFactor
    }
    // If key is null, the entry ultimately goes into table[0]
    if (key == null)
        return putForNullKey(value);
    // 1. Compute the hash of the key
    int hash = hash(key);
    // 2. Find the corresponding array index
    int i = indexFor(hash, table.length);
    // 3. Traverse the linked list at that index to see whether the key already exists;
    //    if it does, overwrite the value and return the old value
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    // 4. No duplicate key: add this entry to the list (details below)
    addEntry(hash, key, value, i);
    return null;
}

2.2 Array initialization:

When the first element is inserted, the HashMap initializes its array: first the initial array size is determined, then the resize threshold is computed.

private void inflateTable(int toSize) {
    // Guarantee the array size is a power of 2.
    // E.g. new HashMap(20) results in an initial array size of 32
    int capacity = roundUpToPowerOf2(toSize);
    // Compute the resize threshold: capacity * loadFactor
    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    // Initialize the array
    table = new Entry[capacity];
    initHashSeedAsNeeded(capacity); // ignore
}

2.3 Adding a node to the list:

Once the array index is found, the key is checked for duplicates; if there is no duplicate, the new value is inserted at the head of the linked list at that position.

void addEntry(int hash, K key, V value, int bucketIndex) {
    // If the map has reached the threshold AND the target bucket already holds an element, resize
    if ((size >= threshold) && (null != table[bucketIndex])) {
        // Resize to double the current length (covered later)
        resize(2 * table.length);
        // After resizing, recompute the hash value
        hash = (null != key) ? hash(key) : 0;
        // Recompute the index in the resized array
        bucketIndex = indexFor(hash, table.length);
    }
    // See below
    createEntry(hash, key, value, bucketIndex);
}
// Simple: put the new value at the head of the list, then size++
void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}

3. get process analysis

Compared with put, the get process is very simple:
1) Compute the hash of the key.
2) Find the corresponding array index: hash & (length - 1).
3) Traverse the linked list at that array position until a key that is equal (by == or equals) is found.

public V get(Object key) {
    // As noted earlier, a null key lives in table[0], so only that list needs scanning
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);
    return null == entry ? null : entry.getValue();
}
final Entry<K,V> getEntry(Object key) {
    if (size == 0) { return null; }
    int hash = (key == null) ? 0 : hash(key);
    // Determine the array index, then traverse the list from the head until found
    for (Entry<K,V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}

4. Why the Hashtable container is inefficient

The Hashtable container uses synchronized to guarantee thread safety, but under fierce thread contention it is very inefficient: while one thread is in a synchronized method of the Hashtable, other threads calling its synchronized methods block or poll. For example, while thread 1 uses put to add an element, thread 2 can neither add an element with put nor fetch one with get; the fiercer the contention, the lower the efficiency.

Third, principle and implementation in JDK 1.7

1, The segmented locking mechanism

Hashtable is inefficient mainly because its implementation guards put and the other operations with the synchronized keyword, and synchronized locks the entire object: every put or other modification locks the whole hash table, so performance suffers. Therefore, from JDK 1.5 through 1.7, Java implemented ConcurrentHashMap using a segmented locking mechanism.

Briefly, ConcurrentHashMap stores its data in an array of Segment objects, i.e. the whole hash table is divided into several segments, and each Segment element is itself similar to a Hashtable. A put first uses the hash algorithm to locate the Segment the element belongs to and then locks only that Segment. Thus, in multithreaded programs, put operations that hit different Segments can run truly in parallel. The rest of this article analyzes the JDK 1.7 implementation of ConcurrentHashMap in detail.

2, The data structure of ConcurrentHashMap

(figure: the Segment array, each Segment holding a HashEntry table)
ConcurrentHashMap<K,V> is composed of a Segment[] array and HashEntry arrays. Segment extends ReentrantLock and plays the role of the lock inside ConcurrentHashMap; HashEntry stores the key-value pairs. A ConcurrentHashMap holds one Segment array; each Segment holds one HashEntry array, called table; and each HashEntry is a node in a linked list.
A frequent interview question:
How does ConcurrentHashMap work? Or: how does ConcurrentHashMap deliver high performance under heavy concurrency while staying thread-safe?
A: ConcurrentHashMap allows multiple modifications to proceed concurrently, the key being its lock-striping technique. It uses multiple locks to control modifications to different parts of the hash table. These parts are represented internally by segments (the Segment class), each of which is a small hash table; as long as concurrent modifications land on different segments, they can proceed in parallel.

3, ConcurrentHashMap initialization

In JDK 1.7, ConcurrentHashMap initialization has two parts: first, initializing the ConcurrentHashMap itself, i.e. the segments array together with the segment shift segmentShift and the segment mask segmentMask; second, initializing each Segment.

ConcurrentHashMap has several constructors, all of which eventually invoke the following one:

public ConcurrentHashMap(int initialCapacity, float loadFactor, int concurrencyLevel) {
    if (!(loadFactor > 0) || initialCapacity < 0 || concurrencyLevel <= 0)
        throw new IllegalArgumentException();
    if (concurrencyLevel > MAX_SEGMENTS)
        concurrencyLevel = MAX_SEGMENTS;
    int sshift = 0;
    int ssize = 1;
    // Compute the parallelism level ssize; it must be kept a power of 2
    while (ssize < concurrencyLevel) {
        ++sshift;
        ssize <<= 1;
    }
    // With the defaults, concurrencyLevel is 16 and sshift is 4,
    // so segmentShift is 28 and segmentMask is 15; both are used later
    this.segmentShift = 32 - sshift;
    this.segmentMask = ssize - 1;
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    // initialCapacity is the initial size of the whole map; here we compute how
    // much of it each position in the Segment array gets.
    // E.g. with initialCapacity 64, each Segment (or "slot") gets 4
    int c = initialCapacity / ssize;
    if (c * ssize < initialCapacity)
        ++c;
    // MIN_SEGMENT_TABLE_CAPACITY defaults to 2; this value matters because then,
    // for a given slot, inserting the first element does not trigger a resize;
    // only inserting the second one does
    int cap = MIN_SEGMENT_TABLE_CAPACITY;
    while (cap < c)
        cap <<= 1;
    // Create the Segment array and its first element, segment[0]
    Segment<K,V> s0 = new Segment<K,V>(loadFactor, (int)(cap * loadFactor),
                         (HashEntry<K,V>[])new HashEntry[cap]);
    Segment<K,V>[] ss = (Segment<K,V>[])new Segment[ssize];
    // Write segment[0] into the array
    UNSAFE.putOrderedObject(ss, SBASE, s0); // ordered write of segments[0]
    this.segments = ss;
}

As the code shows, the constructor takes three parameters: initialCapacity, loadFactor, and concurrencyLevel. concurrencyLevel is used mainly to initialize segments, segmentShift, and segmentMask, while initialCapacity and loadFactor are used mainly to initialize each Segment.

Terminology:

  • initialCapacity: the initial capacity, 16 by default.
  • loadFactor: the load factor, 0.75 by default; when the number of elements stored in a Segment exceeds initialCapacity * loadFactor, that Segment is resized.
  • concurrencyLevel: the concurrency level, 16 by default.

The concurrency level can be understood as the maximum number of threads that can update the ConcurrentHashMap simultaneously without contending for a lock; it is in fact the number of segment locks, i.e. the length of the Segment[] array. If it is set too low, lock contention becomes severe; if it is set too high, accesses that would otherwise fall into the same Segment spread across different Segments, the CPU cache hit rate drops, and program performance degrades.

3.1 Initializing the ConcurrentHashMap

From the constructor we can see that initialization computes two intermediate variables, ssize and sshift, both derived from concurrencyLevel. ssize is the length of the segments array: to locate an index into segments with a bitwise-AND hash, the array length must be a power of 2, so the loop computes the smallest power of 2 greater than or equal to concurrencyLevel and uses it as the length. sshift is the number of shifts performed while computing ssize.

segmentShift determines how many bits take part in locating the segment; it equals 32 minus sshift, 32 because ConcurrentHashMap's hash() method produces a 32-bit result. segmentMask is the mask for that hash, equal to ssize - 1, so all of its binary digits are 1. Since ssize is at most 65536, sshift is at most 16, so segmentShift is at least 16 and segmentMask at most 65535. Because segmentShift and segmentMask are involved in the hashing, they are analyzed further below.

Clearing up parts of the constructor:
The Segment array size is guaranteed to be a power of 2: for example, if the user sets the concurrency level to 17, the actual Segment array size is 32.
The size of the table array inside each Segment is likewise guaranteed to be a power of 2; with the three constructor parameters at their defaults, the table size is 2.
The Segment array is created, but only its element 0 is actually populated.
segmentShift and segmentMask are used to locate the segment an element belongs to. segmentShift is the number of bits to shift; as described earlier for the bits of an int, the low bits fill up before the high bits as a number grows, so to spread elements as evenly as possible across segments, the high bits of the hash are used to compute the segment. segmentMask exploits the bit trick for taking a modulus: a % 2^n is equivalent to a & (2^n - 1).
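
With the defaults mentioned above (concurrencyLevel = 16, so segmentShift = 28 and segmentMask = 15), locating a segment amounts to the following (standalone arithmetic, not JDK source):

public class SegmentIndexDemo {
    public static void main(String[] args) {
        int segmentShift = 28; // 32 - sshift, with sshift = 4 for 16 segments
        int segmentMask = 15;  // ssize - 1
        int hash = 0xA1B2C3D4; // some rehashed key hash
        // Shift the high 4 bits down, then mask: the segment index comes from the high bits.
        int j = (hash >>> segmentShift) & segmentMask;
        System.out.println("segment index = " + j); // high nibble of the hash: 0xA = 10
    }
}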

3.2 Initializing the Segments

ConcurrentHashMap initializes each Segment from initialCapacity and loadFactor. Segment initialization uses an intermediate variable cap: first c is initialCapacity divided by ssize (rounded up), and cap is the smallest power of 2 greater than or equal to c; cap is the length of the HashEntry array inside the Segment. loadFactor is the Segment's load factor, and each Segment's threshold is cap * loadFactor.
By default, initialCapacity is 16, loadFactor is 0.75, and concurrencyLevel is 16.

3.3 Locating a Segment

Since segmented locking is what makes the synchronization efficient, the first step of any operation is to compute the key's hash in order to locate its Segment, so we first need to understand the hash() function of ConcurrentHashMap.
(figure: the hash() method source)
From hash() we can see that a random hashSeed is first mixed in to reduce hash collisions for String keys, and the key's hash is then rehashed with the Wang/Jenkins algorithm. Both measures reduce collisions and thereby improve efficiency: if the hash quality were poor and elements were distributed unevenly, locking by Segment would be pointless.

4, Operations on ConcurrentHashMap

Before covering the operations, the Unsafe class needs an introduction, because in JDK 1.7 the lock-related reads and writes are implemented with Unsafe methods. Unsafe is a restricted class rarely used by application code but common inside frameworks such as the JDK itself, Netty, and Spring. It provides hardware-level atomic operations and is used by ConcurrentHashMap in both JDK 1.7 and JDK 1.8, though in different ways; only the methods used in JDK 1.7 are introduced here:

  • arrayBaseOffset(Class clazz): returns the offset of the first element of an array.
  • arrayIndexScale(Class clazz): returns the increment between elements of an array.
  • getObjectVolatile(Object obj, long offset): returns the Object field of obj at the given offset, with volatile read semantics.

Any element lives in some table slot of some Segment, so locating it has two steps. Locating the Segment: take the key's hashCode, rehash it (with the Wang/Jenkins algorithm), and use the high bits of the rehashed value, modulo the number of segments, to pick the Segment. Locating the table slot: take the same rehashed value and use all of its bits, modulo the table length, to pick the slot within the Segment's table.

4.1 The get() method:

In JDK 1.7, ConcurrentHashMap's get takes no lock. The HashEntry array inside each Segment, and the value and next fields inside each HashEntry, are all volatile, which guarantees their visibility across threads, so multiple threads can read concurrently without locking. The steps of get are simple: locate the Segment -> locate the HashEntry -> read the HashEntry at the computed offset with getObjectVolatile() -> walk the linked list to find the value.

  • Locating the Segment: (((h >>> segmentShift) & segmentMask) << SSHIFT) + SBASE
  • Locating the HashEntry: (((tab.length - 1) & h) << TSHIFT) + TBASE

After locating the segment and the table slot, the list at that slot is scanned; either the element is found or null is returned.

How is the value read guaranteed to be up to date under high concurrency?
A: HashEntry, which stores the key-value data, is designed so that its member fields such as value are volatile; this guarantees that as soon as another thread modifies value, get can see the change.
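
For reference, the shape of HashEntry in the JDK 1.7 source is roughly as follows (abridged; helper methods such as setNext() are omitted):

// In the JDK 1.7 source this is a static nested class of ConcurrentHashMap.
// value and next are volatile, so a racing reader sees the latest write
// without taking the segment lock.
final class HashEntry<K,V> {
    final int hash;
    final K key;
    volatile V value;
    volatile HashEntry<K,V> next;

    HashEntry(int hash, K key, V value, HashEntry<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
}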

4.2 The put() method

ConcurrentHashMap's put method is considerably more complex than get. Its implementation:

public V put(K key, V value) {
    Segment<K,V> s;
    if (value == null)  throw new NullPointerException();
    // 1. Compute the hash of the key
    int hash = hash(key);
    // 2. Use the hash to find position j in the Segment array.
    //    The hash is 32 bits; shifting it right (unsigned) by segmentShift (28)
    //    leaves the top 4 bits, which are then ANDed with segmentMask (15).
    //    So j is taken from the highest 4 bits of the hash and is the slot index
    int j = (hash >>> segmentShift) & segmentMask;
    // As noted, the constructor initializes only segment[0]; the other slots are
    // still null, and ensureSegment(j) initializes segment[j]
    if ((s = (Segment<K,V>)UNSAFE.getObject(
            segments, (j << SSHIFT) + SBASE)) == null) // in ensureSegment
        s = ensureSegment(j);
    // 3. Insert the new value into slot s
    return s.put(key, hash, value, false);
}

This outer layer is simple: the hash quickly locates the right Segment, and the rest is the put inside the Segment.
Internally, a Segment consists of an array plus linked lists:

final V put(K key, int hash, V value, boolean onlyIfAbsent) {
    // Before writing into this segment, its exclusive lock must be acquired.
    // Follow the main flow first; this part is covered in more detail later
    HashEntry<K,V> node = tryLock() ? null : scanAndLockForPut(key, hash, value);
    V oldValue;
    try {
        HashEntry<K,V>[] tab = table;  // the array inside this segment
        int index = (tab.length - 1) & hash; // use the hash again to find the array index
        HashEntry<K,V> first = entryAt(tab, index); // head of the list at that position
        // The for loop below is long but easy to follow: think of the two cases,
        // the position holds no element yet, or it already holds a list
        for (HashEntry<K,V> e = first;;) {
            if (e != null) {
                K k;
                if ((k = e.key) == key || (e.hash == hash && key.equals(k))) {
                    oldValue = e.value;
                    if (!onlyIfAbsent) {
                        e.value = value;  // overwrite the old value
                        ++modCount;
                    }
                    break;
                }
                e = e.next;  // keep following the list
            }
            else {
                // Whether node is null depends on how the lock was acquired; it does
                // not matter here. If node is not null, set it as the list head directly;
                // if it is null, create it and set it as the list head.
                if (node != null)
                    node.setNext(first);
                else
                    node = new HashEntry<K,V>(hash, key, value, first);
                int c = count + 1;
                // If this segment exceeds its threshold, it must be resized
                if (c > threshold && tab.length < MAXIMUM_CAPACITY)
                    rehash(node); // resizing is analyzed in detail later
                else
                    // Threshold not reached: put node at index in tab, i.e. make
                    // the new node the head of the existing list
                    setEntryAt(tab, index, node);
                ++modCount;
                count = c;
                oldValue = null;
                break;
            }
        }
    } finally {
        unlock();
    }
    return oldValue;
}

Like the outer put, the Segment is located via the hash. If the Segment read here is null, ensureSegment() is called; otherwise put is invoked directly on the Segment that was found. Note that this first read does not use getObjectVolatile(); the volatile read is performed inside ensureSegment() instead, which avoids the cost of a volatile read whenever the segment is already non-null. Inside ensureSegment(), the slot is first read with getObjectVolatile(); if it is still null, a Segment is created using segments[0] as the prototype, installed into the slot, and returned.

private Segment<K,V> ensureSegment(int k) {
    final Segment<K,V>[] ss = this.segments;
    long u = (k << SSHIFT) + SBASE; // raw offset
    Segment<K,V> seg;
    if ((seg = (Segment<K,V>)UNSAFE.getObjectVolatile(ss, u)) == null) {
        // This is why segment[0] was initialized up front: segment[k] is created
        // with the current array length and load factor of segment[0].
        // "Current" matters because segment[0] may have been resized already
        Segment<K,V> proto = ss[0];
        int cap = proto.table.length;
        float lf = proto.loadFactor;
        int threshold = (int)(cap * lf);
        // Initialize the array inside segment[k]
        HashEntry<K,V>[] tab = (HashEntry<K,V>[])new HashEntry[cap];
        // Check again whether another thread initialized this slot in the meantime
        if ((seg = (Segment<K,V>)UNSAFE.getObjectVolatile(ss, u)) == null) {
            Segment<K,V> s = new Segment<K,V>(lf, threshold, tab);
            // CAS in a while loop: exit once this thread or another succeeds in setting the value
            while ((seg = (Segment<K,V>)UNSAFE.getObjectVolatile(ss, u)) == null) {
                if (UNSAFE.compareAndSwapObject(ss, u, null, seg = s))
                    break;
            }
        }
    }
    return seg;
}

In Segment.put, the lock is acquired first via tryLock() (falling back to scanAndLockForPut() when the lock is contended); then the hash locates the HashEntry slot and the whole list at that slot is traversed. If an equal key is found, its value is simply replaced. If not, a new node is inserted at the head of the list, and if the insertion would push the Segment past its threshold, rehash() first doubles the Segment's table. After a key/value pair is inserted, the Segment's element counter count is incremented. Finally, unlock() releases the lock. These three steps (locking the Segment; locating the table slot and scanning its list when the key is found; and when it is not) are all visible in the Segment.put source shown above.

4.3 The resize operation

The Segment array itself never grows; what grows is the table array beneath each Segment, and each resize doubles it.
The benefit: suppose the old table length is 4, so that the elements are distributed like this:

index 0: 56 | index 1: 77 | index 2: 34 | index 3: 15, 23

After the resize the table length becomes 8, and the distribution becomes:

index 0: 56 | index 2: 34 | index 5: 77 | index 7: 15, 23

Note that the indices of hash values 34 and 56 stay the same, while 15, 23, and 77 each move to their old index + 4. This allows fast relocation and reduces the number of moves.
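
A quick check of this redistribution in plain Java, using the hash values from the example above:

public class ResizeIndexDemo {
    public static void main(String[] args) {
        int[] hashes = {34, 56, 15, 23, 77};
        int oldLen = 4, newLen = 8;
        for (int h : hashes) {
            int oldIdx = h & (oldLen - 1); // index in the old table
            int newIdx = h & (newLen - 1); // index after doubling
            // The new index is either the old one, or the old one + oldLen,
            // depending on the extra hash bit exposed by the doubled mask.
            System.out.println(h + ": " + oldIdx + " -> " + newIdx);
        }
    }
}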

4.4 The size() method:

The implementation of ConcurrentHashMap's size operation is also clever: at first no Segment is locked; instead the count fields of all Segments are simply summed. This is done twice, and the two results are compared; if they are equal, that sum is returned directly. If they differ, all Segments are locked and the count is recomputed under the locks to obtain the size.
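
A simplified sketch of this retry-then-lock strategy (with a minimal Segment stand-in; the real JDK 1.7 code also compares the sums of the segments' modCount fields between passes, which is omitted here):

import java.util.concurrent.locks.ReentrantLock;

public class SizeSketch {
    // Minimal stand-in for JDK 1.7's Segment: a lock plus an element count.
    static class Segment extends ReentrantLock {
        volatile int count;
    }

    static long size(Segment[] segments) {
        long last = -1;
        // Try a few lock-free passes; if two consecutive sums agree, trust them.
        for (int attempt = 0; attempt < 2; attempt++) {
            long sum = 0;
            for (Segment s : segments) sum += s.count;
            if (sum == last) return sum;
            last = sum;
        }
        // Still unstable: lock all segments and count under mutual exclusion.
        for (Segment s : segments) s.lock();
        try {
            long sum = 0;
            for (Segment s : segments) sum += s.count;
            return sum;
        } finally {
            for (Segment s : segments) s.unlock();
        }
    }

    public static void main(String[] args) {
        Segment[] segs = {new Segment(), new Segment()};
        segs[0].count = 3; segs[1].count = 4;
        System.out.println(size(segs)); // 7
    }
}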

Weak consistency:

Both get and containsKey traverse the linked list to check whether a node with an equal key exists and to fetch its value. Since other threads may restructure the list during the traversal, get and containsKey may return stale data; this is where ConcurrentHashMap exhibits weak consistency.
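
A small demonstration of this weak consistency using only the standard API: mutating a ConcurrentHashMap while iterating never throws ConcurrentModificationException, and the iterator may or may not observe the concurrent changes.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WeakConsistencyDemo {
    public static void main(String[] args) {
        Map<Integer, String> map = new ConcurrentHashMap<>();
        for (int i = 0; i < 8; i++) map.put(i, "v" + i);
        // A HashMap would throw ConcurrentModificationException here;
        // ConcurrentHashMap's iterators are weakly consistent instead.
        for (Integer k : map.keySet()) {
            if (k % 2 == 0) map.remove(k); // mutating during iteration is allowed
        }
        System.out.println(map); // the odd keys remain
    }
}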

Fourth, principle and implementation in JDK 1.8

Up to and including JDK 1.7, ConcurrentHashMap was implemented with the segmented locking mechanism, so its maximum concurrency was bounded by the number of Segments. JDK 1.8 abandoned that design in favor of the same array + linked list + red-black tree structure as HashMap, with locking done via CAS and synchronized.

1, The data structure of ConcurrentHashMap
(figure: the table array with linked-list and red-black-tree bins)
The JDK 1.8 data structure is much simpler than the pre-1.8 one: it is the same structure as HashMap, array + linked list + red-black tree. ConcurrentHashMap contains a table array whose element type is Node; Node implements Map.Entry<K, V> and forms a linked list, and when a bin's list holds more than 8 nodes, the bin is upgraded to a red-black tree held in a TreeBin.

JDK 1.8's ConcurrentHashMap also has an important field, sizeCtl, a control flag whose values mean different things. 0 means the hash table has not been initialized yet. A positive value is the size used for initialization or for the next resize, i.e. a threshold: when the actual size of the table reaches sizeCtl, a resize is triggered; by default the threshold is 0.75 times the current capacity. -1 means initialization is in progress, and -N means N-1 threads are currently resizing.

2, ConcurrentHashMap initialization

The initialization of JDK 1.8's ConcurrentHashMap is also fairly simple; all constructors eventually call the one below.

  public ConcurrentHashMap(int initialCapacity,float loadFactor, int concurrencyLevel) {
        if (!(loadFactor > 0.0f) || initialCapacity < 0 || concurrencyLevel <= 0)
            throw new IllegalArgumentException();
        if (initialCapacity < concurrencyLevel)   // Use at least as many bins
            initialCapacity = concurrencyLevel;   // as estimated threads
        long size = (long)(1.0 + (long)initialCapacity / loadFactor);
        int cap = (size >= (long)MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : tableSizeFor((int)size);
        this.sizeCtl = cap;
    }

This constructor uses the given initial capacity initialCapacity, load factor loadFactor, and estimated concurrency concurrencyLevel to compute the initial table size, stored in sizeCtl.
Note that constructing a ConcurrentHashMap does not initialize the hash table (Node<K, V>[] table); that happens when the first element is inserted. During put, if table is found to be null or of length 0, initTable() is called to initialize it.

private final Node<K,V>[] initTable() {
        Node<K,V>[] tab; int sc;
        while ((tab = table) == null || tab.length == 0) {
            if ((sc = sizeCtl) < 0)
                Thread.yield(); // lost initialization race; just spin
            else if (U.compareAndSwapInt(this, SIZECTL, sc, -1)) {
                try {
                    if ((tab = table) == null || tab.length == 0) {
                        int n = (sc > 0) ? sc : DEFAULT_CAPACITY;
                        @SuppressWarnings("unchecked")
                        Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n];
                        table = tab = nt;
                        sc = n - (n >>> 2);
                    }
                } finally {
                    sizeCtl = sc;
                }
                break;
            }
        }
        return tab;
    }

As the code shows, the table is initialized inside a loop. The loop first checks sizeCtl: if it is negative, another thread is already initializing or resizing, so the current thread does nothing except call Thread.yield() and retry. If sizeCtl is non-negative, the thread tries to CAS it to -1 and, on success, performs the initialization: if sizeCtl was 0, a table of the default capacity is created, otherwise a table of size sizeCtl. sizeCtl is then set to 0.75n, i.e. 0.75 times the table capacity, and the new table is returned; at that point the hash table is initialized.

3, Conversion between Node linked lists and red-black trees

As mentioned above, a table bin switches between a linked list and a red-black tree depending on how many Node entries it holds, so this section first looks at how the structure conversion is implemented.
When adding an element to the table makes a bin's list exceed 8 nodes, the conversion from list to red-black tree is triggered. The implementation:

    /**
     * Replaces all linked nodes in bin at given index unless table is
     * too small, in which case resizes instead.
     */
    private final void treeifyBin(Node<K,V>[] tab, int index) {
        Node<K,V> b; int n, sc;
        if (tab != null) {
            if ((n = tab.length) < MIN_TREEIFY_CAPACITY)
                tryPresize(n << 1);
            else if ((b = tabAt(tab, index)) != null && b.hash >= 0) {
                synchronized (b) {
                    if (tabAt(tab, index) == b) {
                        TreeNode<K,V> hd = null, tl = null;
                        for (Node<K,V> e = b; e != null; e = e.next) {
                            TreeNode<K,V> p =
                                new TreeNode<K,V>(e.hash, e.key, e.val,
                                                  null, null);
                            if ((p.prev = tl) == null)
                                hd = p;
                            else
                                tl.next = p;
                            tl = p;
                        }
                        setTabAt(tab, index, new TreeBin<K,V>(hd));
                    }
                }
            }
        }
    }

This method first checks whether the hash table size has reached MIN_TREEIFY_CAPACITY (64 by default). If it is smaller, no tree conversion is needed; the table is simply resized (doubled) instead.

If the current table length is at least 64, the head Node of the bin is fetched with a volatile read (tabAt()) and then locked with synchronized. Because only that single bin head is locked, operations on other bins are unaffected, which greatly improves ConcurrentHashMap's concurrency. Under the lock, the bin's linked list is converted into a red-black tree held by a TreeBin.

Conversely, when deleting elements from the table leaves a red-black-tree bin with fewer than 6 nodes, the tree is converted back into a linked list. The implementation:

   /**
     * Returns a list on non-TreeNodes replacing those in given list.
     */
    static <K,V> Node<K,V> untreeify(Node<K,V> b) {
        Node<K,V> hd = null, tl = null;
        for (Node<K,V> q = b; q != null; q = q.next) {
            Node<K,V> p = new Node<K,V>(q.hash, q.key, q.val, null);
            if (tl == null)
                hd = p;
            else
                tl.next = p;
            tl = p;
        }
        return hd;
    }

4, Operations on ConcurrentHashMap

4.1 The get() method

To read a value from the hash table with get, the key's hash must be computed first; in JDK 1.8's ConcurrentHashMap this is done by the spread() method.

static final int spread(int h) {
    return (h ^ (h >>> 16)) & HASH_BITS;
}

spread() rehashes the key's hashCode so that the high bits also take part in the bucket selection, reducing hash collisions. The corresponding value is then looked up.

/**
 * Returns the value to which the specified key is mapped,
 * or {@code null} if this map contains no mapping for the key.
 */
public V get(Object key) {
    Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
    int h = spread(key.hashCode());
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (e = tabAt(tab, (n - 1) & h)) != null) {
        if ((eh = e.hash) == h) {
            if ((ek = e.key) == key || (ek != null && key.equals(ek)))
                return e.val;
        }
        else if (eh < 0)
            return (p = e.find(h, key)) != null ? p.val : null;
        while ((e = e.next) != null) {
            if (e.hash == h &&
                ((ek = e.key) == key || (ek != null && key.equals(ek))))
                return e.val;
        }
    }
    return null;
}

During a lookup, tabAt() locates the Node list or red-black tree for the key, and traversing that structure yields the key's value. tabAt() reads the bin via Unsafe's getObjectVolatile(); the volatile read guarantees the value's visibility, so the value observed is the most recent one.
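
For reference, tabAt() in the JDK 1.8 source is essentially a volatile read of table[i] (abridged from the JDK source; U is the Unsafe instance, and ABASE and ASHIFT are the array base offset and element shift computed when the class loads):

// Abridged from JDK 1.8 ConcurrentHashMap: a volatile read of tab[i].
static final <K,V> Node<K,V> tabAt(Node<K,V>[] tab, int i) {
    return (Node<K,V>) U.getObjectVolatile(tab, ((long) i << ASHIFT) + ABASE);
}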

4.2 The put() method

The put operation of JDK 1.8's ConcurrentHashMap is implemented mainly in putVal(K key, V value, boolean onlyIfAbsent):

 public V put(K key, V value) {
     return putVal(key, value, false);
 }
    
    /** Implementation for put and putIfAbsent */
    final V putVal(K key, V value, boolean onlyIfAbsent) {
        if (key == null || value == null) throw new NullPointerException();
        int hash = spread(key.hashCode());
        int binCount = 0;
        for (Node<K,V>[] tab = table;;) {
            Node<K,V> f; int n, i, fh;
            if (tab == null || (n = tab.length) == 0)
                tab = initTable();
            else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
                if (casTabAt(tab, i, null,
                             new Node<K,V>(hash, key, value, null)))
                    break;                   // no lock when adding to empty bin
            }
            else if ((fh = f.hash) == MOVED)
                tab = helpTransfer(tab, f);
            else {
                V oldVal = null;
                synchronized (f) {
                    if (tabAt(tab, i) == f) {
                        if (fh >= 0) {
                            binCount = 1;
                            for (Node<K,V> e = f;; ++binCount) {
                                K ek;
                                if (e.hash == hash &&
                                    ((ek = e.key) == key ||
                                     (ek != null && key.equals(ek)))) {
                                    oldVal = e.val;
                                    if (!onlyIfAbsent)
                                        e.val = value;
                                    break;
                                }
                                Node<K,V> pred = e;
                                if ((e = e.next) == null) {
                                    pred.next = new Node<K,V>(hash, key,
                                                              value, null);
                                    break;
                                }
                            }
                        }
                        else if (f instanceof TreeBin) {
                            Node<K,V> p;
                            binCount = 2;
                            if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                           value)) != null) {
                                oldVal = p.val;
                                if (!onlyIfAbsent)
                                    p.val = value;
                            }
                        }
                    }
                }
                if (binCount != 0) {
                    if (binCount >= TREEIFY_THRESHOLD)
                        treeifyBin(tab, i);
                    if (oldVal != null)
                        return oldVal;
                    break;
                }
            }
        }
        addCount(1L, binCount);
        return null;
    }

The put operation can be broken down into roughly the following steps:

Compute the key's hash by calling spread(); then, inside a loop, locate the Node position for that hash, distinguishing the following cases:

  • If the table is empty, initialize it first, then go around the loop again to locate the Node position.
  • If the table is non-empty but there is no Node at the key's bin, insert a new node directly with casTabAt(); no locking is needed.
  • If the table is non-empty and the bin's head Node exists but its hash is MOVED (-1), a resize is in progress, so helpTransfer() is called to help complete it.
  • Otherwise, insert a new Node into the bin's list or red-black tree, locking the head node with synchronized.
  • After the insertion, check whether the bin needs a structure change; if so, treeifyBin() upgrades the list to a red-black tree.
  • Finally, addCount() updates the recorded number of elements in the table.

4.3 The size() method

JDK 1.8's ConcurrentHashMap records the number of elements differently as well. When elements are added or removed, the baseCount field is updated via CAS to count them. But a CAS can fail, so ConcurrentHashMap also maintains a CounterCell array that records the counts from failed CAS attempts. The total number of elements is therefore:
Number of elements = baseCount + sum(CounterCell)

final long sumCount() {
    CounterCell[] as = counterCells; CounterCell a;
    long sum = baseCount;
    if (as != null) {
        for (int i = 0; i < as.length; ++i) {
            if ((a = as[i]) != null)
                sum += a.value;
        }
    }
    return sum;
}

JDK 1.8 provides two methods for obtaining the number of elements in a ConcurrentHashMap:

public long mappingCount() {
    long n = sumCount();
    return (n < 0L) ? 0L : n; // ignore transient negative values
}

public int size() {
    long n = sumCount();
    return ((n < 0L) ? 0 :
            (n > (long)Integer.MAX_VALUE) ? Integer.MAX_VALUE :
            (int)n);
}

As the code shows, size() can only report the element count within the int range; if the hash table may hold more entries than an int can represent, mappingCount() is the recommended way to obtain the count.
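
A brief usage sketch of the two methods:

import java.util.concurrent.ConcurrentHashMap;

public class CountDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<Long, Long> map = new ConcurrentHashMap<>();
        for (long i = 0; i < 1_000; i++) map.put(i, i);
        int  s = map.size();         // capped at Integer.MAX_VALUE
        long m = map.mappingCount(); // preferred when the map may exceed int range
        System.out.println(s + " " + m); // 1000 1000
    }
}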

5, Significant changes in JDK 1.8 compared with 1.7

  • The Segment array is gone: data is stored directly in the table, the lock granularity is smaller, and the probability of concurrent conflicts is reduced.
  • Data is stored as linked lists plus red-black trees. A pure linked list has O(n) lookup time, versus O(log n) for a red-black tree, a big performance gain. When does a list become a red-black tree? When the number of elements in a bucket exceeds 8 (and the table is large enough).
  • Key classes and fields: the Node class holds the actual key and value.
  • sizeCtl: negative means initialization or resizing is in progress (-1 for initialization; -N means N-1 threads are resizing); 0 means not yet initialized; a positive value is the size for initialization or the threshold for the next resize.
  • TreeNode is the node type used in the red-black tree; TreeBin is what is actually placed in the table array and represents the root of a red-black tree.
  • Resizing: the transfer() method performs the actual resize, doubling the table, with a mechanism that lets multiple threads resize concurrently.
  • The size() method returns an estimate, not an exact count.
  • Consistency: weakly consistent.

Origin blog.csdn.net/m0_37661458/article/details/90700314