Concurrent programming and high concurrency solutions: HashMap and ConcurrentHashMap

Reposted from: imooc practical course · High Concurrency Exploration (10): HashMap and ConcurrentHashMap

HashMap allows at most one entry with a null key, and any number of entries with null values. HashMap is not thread-safe. If thread safety is required, you can wrap a HashMap with the synchronizedMap method of Collections, or use ConcurrentHashMap.
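A minimal sketch of these null rules and the two thread-safe alternatives (the class name is invented for illustration):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put(null, "v1");   // at most one null key is allowed
        map.put("k1", null);   // null values are allowed
        map.put("k2", null);
        System.out.println(map.size()); // 3

        // Thread-safe alternatives:
        Map<String, String> syncMap = Collections.synchronizedMap(new HashMap<>());
        syncMap.put("k", "v");

        Map<String, String> chm = new ConcurrentHashMap<>();
        try {
            chm.put(null, "v"); // ConcurrentHashMap rejects null keys
        } catch (NullPointerException e) {
            System.out.println("ConcurrentHashMap does not allow null keys");
        }
    }
}
```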

HashMap

(1) Initialization method

HashMap is implemented as:

  • JDK1.7: array + linked list
  • JDK1.8: array + linked list + red-black tree

Initial capacity: the number of buckets in the hash table.
Load factor: a measure of how full the hash table may become before its capacity is automatically increased.

HashMap defines these two parameters in the class:

// Initial capacity, default 16
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
// Load factor, default 0.75
static final float DEFAULT_LOAD_FACTOR = 0.75f;

When the number of entries in the hash table exceeds the product of the load factor and the current capacity, resize() is called and the capacity is doubled.

Both parameters can be set when constructing a HashMap: the initial capacity can be specified on its own, or the initial capacity and the load factor can be specified together.
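For example, using the two public constructors (class name invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class CtorDemo {
    public static void main(String[] args) {
        // Specify only the initial capacity (load factor defaults to 0.75):
        Map<String, Integer> a = new HashMap<>(32);

        // Specify both the initial capacity and the load factor:
        Map<String, Integer> b = new HashMap<>(32, 0.5f);

        // With capacity 32 and load factor 0.5, the resize threshold is 32 * 0.5:
        System.out.println(32 * 0.5f); // 16.0
    }
}
```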

(2) Addressing mode

For a newly inserted entry or a key being looked up, HashMap computes the hash value of the key according to certain rules, and takes it modulo the array length to obtain the array index. Because a modulo operation is far more expensive on a computer than a bit operation, HashMap requires the array length to be a power of two (2 to the Nth power); it can then AND the key's hash value with 2^n - 1, which is equivalent to the modulo operation. HashMap does not require the user to supply a power-of-two initial size: it computes a reasonable power-of-two size itself (the tableSizeFor method).

static final int tableSizeFor(int cap) {
    // Smear the highest set bit of (cap - 1) into all lower bits,
    // then add 1 to get the next power of two >= cap.
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
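Calling this logic with a few capacities shows the rounding behavior (this is a standalone copy of the same method, packaged in an invented class so it can run on its own):

```java
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same bit-smearing logic as HashMap.tableSizeFor in JDK 1.8:
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16 (already a power of two)
        System.out.println(tableSizeFor(17)); // 32
    }
}
```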

The hash algorithm is essentially three steps: take the key's hashCode value, mix in the high bits, and compute the index.
The storage position of an entry is obtained with h & (table.length - 1). The length of HashMap's underlying array is always 2 to the nth power, which is a speed optimization: when length is a power of two, h & (length - 1) is equivalent to the modulo h % length, but & is more efficient than %.
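The equivalence is easy to check for non-negative hash values (class name invented for illustration):

```java
public class IndexDemo {
    public static void main(String[] args) {
        int length = 16; // a power of two, as HashMap requires
        int[] hashes = {5, 9, 11, 37};
        for (int h : hashes) {
            // For non-negative h and power-of-two length, these are identical:
            System.out.println((h & (length - 1)) + " == " + (h % length));
        }
    }
}
```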

In the JDK1.8 implementation, the high-bit mixing step is optimized: it XORs the high 16 bits of hashCode() with the low 16 bits, (h = key.hashCode()) ^ (h >>> 16). This is mainly a trade-off among speed, efficiency, and hash quality: even when the table array is small, both the high and low bits take part in the index calculation, without adding much overhead.
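A small demonstration of why the mixing matters (class and method names invented for illustration): two hash codes that differ only in their high 16 bits collide in a small table unless the high bits are folded in.

```java
public class SpreadDemo {
    // Mirrors the JDK 1.8 HashMap.hash() mixing step:
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int h1 = 0x00010000, h2 = 0x00020000; // differ only in the high bits
        int mask = 16 - 1;                    // small table: index uses the low 4 bits

        // Without mixing, both land in bucket 0:
        System.out.println((h1 & mask) + " vs " + (h2 & mask));             // 0 vs 0

        // With mixing, they are spread into different buckets:
        System.out.println((spread(h1) & mask) + " vs " + (spread(h2) & mask)); // 1 vs 2
    }
}
```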

(3) Why HashMap is not thread-safe, reason one: infinite loops

HashMap is prone to an infinite loop when resize() expands the table in a multi-threaded situation.
Expansion creates a new array twice the original size, so that the new capacity is still a power of two and the addressing scheme above still applies. The entries of the original array are then re-inserted into the new array; this process is called reHash.

ReHash under a single thread (safe)

Suppose the HashMap's initial capacity is 2 and the load factor is 1, and three keys need to be stored: 5, 9, and 11. Placing the third element, 11, triggers expansion.

  • First, an array of twice the size is created and the elements of the original array are reHashed into it. 5 is inserted into the new array without a problem.
  • 9 is inserted into the new array; after the hash calculation, it is linked after 5.
  • 11 hashes to index 3 of the new array.
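The bucket indices in this walkthrough can be verified with the addressing formula. (Integer.hashCode() returns the value itself, and for such small values the high/low-bit XOR leaves it unchanged, so index = key & (length - 1). Class and variable names here are illustrative.)

```java
public class ResizeIndexDemo {
    public static void main(String[] args) {
        int[] keys = {5, 9, 11};
        int oldLen = 2, newLen = 4; // capacity 2, doubled to 4 on resize
        for (int k : keys) {
            System.out.println(k + ": old index " + (k & (oldLen - 1))
                    + ", new index " + (k & (newLen - 1)));
        }
        // All three keys collide at index 1 in the old array;
        // after expansion, 5 and 9 stay at index 1 while 11 moves to index 3.
    }
}
```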

ReHash under multiple threads

Assume two threads perform a put at the same moment and both trigger the reHash operation. Call the upper one thread 1 and the lower one thread 2.

  • At some moment thread 1 has allocated its new array and is about to point the next pointer of the element with key 5 to the element with key 9. It stops at this step, because the time slice allocated by the thread scheduler is used up.
  • Thread 2 then runs its reHash and completes the entire data migration.

Thread 1 is then awakened and continues.

  • It executes the rest of its interrupted round. Processing the element with key 5, it places that key at the head of the linked list at index 1 of thread 1's new array. The intended state is (thread 1, index 1) → (key=5) → null.
  • It then processes the element with key 9 and inserts it between the bucket head and key=5; the intended state: (thread 1, index 1) → (key=9) → (key=5) → null.
  • Processing should now be finished, but thread 2 has already migrated the elements with key=9 and key=5, so the real situation is (thread 2, index 1) → (key=9) → (key=5) → null and (thread 1, index 1) → (key=9) → (key=5) → null. Thread 1 mistakenly treats key=9 and key=5 as elements not yet migrated from the old array, processes key=5 again, and tries to put key=5 in front of key=9. The result is a cycle between key=9 and key=5: the two nodes point at each other and keep swapping order.
  • The element with key=11 can never be inserted into the new array, and any subsequent get on that bucket of the new array falls into an infinite loop.

(4) Why HashMap is not thread-safe, reason two: fail-fast

If another thread modifies the map while it is being traversed with an iterator, a ConcurrentModificationException is thrown; this is called fail-fast.
Every modification of a HashMap changes the value of the modCount field of the class. Source code:

abstract class HashIterator {
    ...
    int expectedModCount;  // for fast-fail
    int index;             // current slot

    HashIterator() {
        expectedModCount = modCount;
        Node<K,V>[] t = table;
        current = next = null;
        index = 0;
        if (t != null && size > 0) { // advance to first entry
            do {} while (index < t.length && (next = t[index++]) == null);
        }
    }
    ...
}

During each iteration, modCount is compared with expectedModCount; if they are not equal, someone has modified the HashMap. Source code:

final Node<K,V> nextNode() {
    Node<K,V>[] t;
    Node<K,V> e = next;
    if (modCount != expectedModCount)
        throw new ConcurrentModificationException();
    if (e == null)
        throw new NoSuchElementException();
    if ((next = (current = e).next) == null && (t = table) != null) {
        do {} while (index < t.length && (next = t[index++]) == null);
    }
    return e;
}

Solution: use the synchronizedMap method of Collections to construct a synchronized map, or use the thread-safe ConcurrentHashMap directly, to avoid triggering the fail-fast strategy.
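Fail-fast is easy to trigger even from a single thread, since any structural modification during iteration bumps modCount (class name invented for illustration):

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "a");
        map.put(2, "b");
        try {
            for (Integer key : map.keySet()) {
                map.put(3, "c"); // structural modification during iteration
            }
        } catch (ConcurrentModificationException e) {
            System.out.println("fail-fast triggered");
        }
    }
}
```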

ConcurrentHashMap

Reference: ConcurrentHashMap principle analysis (1.7 and 1.8), a source-code analysis

Java7

  • In Java7, the underlying structure of ConcurrentHashMap is still an array plus linked lists. The biggest difference from HashMap and Hashtable is that put and get hash twice to reach a given HashEntry: the first hash locates the Segment, the second locates the entry within the Segment, after which the entry list is traversed.
  • When reading a key, the hash value of the key is taken first, and the high sshift bits of the hash value, modulo the number of Segments, determine which Segment the key belongs to. The Segment is then operated on like a HashMap.
  • To ensure that different hash values are distributed across different Segments, ConcurrentHashMap additionally optimizes (re-hashes) the hash value.
  • Segment inherits from ReentrantLock in JUC, so it is easy to lock a single Segment. This is segment locking (分段锁).
// Source code
// Segment's default concurrency level is 16; a HashEntry array's minimum capacity is 2
private static final int DEFAULT_CONCURRENCY_LEVEL = 16;

private void writeObject(java.io.ObjectOutputStream s)
    throws java.io.IOException {
    // For serialization compatibility
    // Emulate segment calculation from previous version of this class
    int sshift = 0;
    int ssize = 1;
    while (ssize < DEFAULT_CONCURRENCY_LEVEL) {
        ++sshift;
        ssize <<= 1; // ssize is a power of two (at most 16 binary bits)
    }
    int segmentShift = 32 - sshift;
    int segmentMask = ssize - 1;
    // ... omitted
}

The put operation: Segment inherits ReentrantLock, so it also acts as a lock. On a put, the first hash of the key locates the Segment; if that Segment has not yet been initialized, it is created with a CAS operation. A second hash then finds the slot of the corresponding HashEntry. Here the inherited lock comes into play: to insert data at the located HashEntry slot (the end of the linked list), the thread first calls the inherited ReentrantLock tryLock() method to acquire the Segment's lock. If it succeeds, it inserts directly at the corresponding position; if another thread already holds the Segment's lock, the current thread spins, repeatedly calling tryLock(). After a specified number of attempts, it blocks and waits to be woken up. (A Meituan interview question: how does ConcurrentHashMap behave when multiple threads put at the same time?)

The size operation: computing the element count of a ConcurrentHashMap is an interesting problem. Because operations are concurrent, data may still be inserted while you are computing the size, so the computed size can differ from the actual size (more data may be inserted while size is returning). JDK1.7 uses a two-step solution:

1. First, it tries to compute the size of the ConcurrentHashMap several times without locking, at most three times, comparing the results of consecutive calculations. If two results agree, it assumes no elements were added in between and the computed result is accurate (compare values across up to three unlocked reads).

2. If the first scheme fails, it locks every Segment, then computes the size of the ConcurrentHashMap and returns it (the Meituan interviewer's question: how to determine the size under multiple threads — lock all Segments).
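The two-step strategy can be sketched as follows. This is a simplified illustration, not the actual JDK 1.7 source: the Segment interface, the retry constant, and all names are invented for the sketch.

```java
import java.util.concurrent.locks.ReentrantLock;

public class SizeSketch {
    // Try unlocked sums a few times; if two consecutive sums agree, trust them.
    static final int RETRIES_BEFORE_LOCK = 2;

    interface Segment {
        int count();
        ReentrantLock lock();
    }

    static int size(Segment[] segments) {
        int last = -1;
        for (int retries = 0; retries <= RETRIES_BEFORE_LOCK; retries++) {
            int sum = 0;
            for (Segment s : segments) sum += s.count();
            if (sum == last) return sum; // two consistent unlocked reads
            last = sum;
        }
        // Fall back: lock all segments, sum under the locks, then unlock.
        for (Segment s : segments) s.lock().lock();
        try {
            int sum = 0;
            for (Segment s : segments) sum += s.count();
            return sum;
        } finally {
            for (Segment s : segments) s.lock().unlock();
        }
    }

    public static void main(String[] args) {
        final ReentrantLock l = new ReentrantLock();
        Segment seg = new Segment() {
            public int count() { return 3; }
            public ReentrantLock lock() { return l; }
        };
        System.out.println(size(new Segment[]{seg, seg})); // 6
    }
}
```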

Java8

The structure of ConcurrentHashMap in Java8 is basically the same as that of HashMap in Java8, but it guarantees thread safety.

Java8 abandons the segment-lock scheme that ConcurrentHashMap used in Java7: there are no Segments; a single large Node array is used instead. To improve addressing under hash collisions, performance optimizations were made (a large number of CAS operations are used internally).
Node: the data structure that stores the key, the value, and the key's hash value. Both val and next are declared volatile to guarantee visibility under concurrency.

class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    volatile V val;
    volatile Node<K,V> next;
    // ... remaining code omitted
}
  • When the length of a bucket's linked list exceeds a threshold (default 8), Java8 converts the list into a red-black tree, reducing the lookup time complexity in that bucket from O(n) to O(log n).

Java7 vs Java8

As the above shows, the data structure of ConcurrentHashMap in JDK1.8 is close to that of HashMap. Relatively speaking, ConcurrentHashMap only adds synchronization operations to control concurrency, moving from ReentrantLock + Segment + HashEntry in JDK1.7 to synchronized + CAS + Node + red-black tree in JDK1.8.

1. Data structure: the segment-lock data structure is gone, replaced by the array + linked list + red-black tree structure.
2. Thread-safety mechanism: JDK1.7 uses the segment-lock mechanism, in which Segment inherits from ReentrantLock; JDK1.8 uses CAS + synchronized to guarantee thread safety.
3. Lock granularity: originally the Segment containing the data being operated on was locked; now each array element is locked (a Node; what was HashEntry is called Node in 1.8).
4. Linked list converted to red-black tree: simplifying the hash algorithm for locating nodes has a drawback: hash conflicts intensify. Therefore, when a bucket's list holds more than 8 nodes, the list is converted into a red-black tree for storage.
5. Query time complexity: from traversing a linked list, O(n), to traversing a red-black tree, O(log N).

Comparison of HashMap and ConcurrentHashMap

  • HashMap is not thread-safe; ConcurrentHashMap is thread-safe
  • HashMap allows null keys and null values; ConcurrentHashMap allows neither
  • HashMap must not be modified while being traversed through an iterator; ConcurrentHashMap may be, and the update is visible
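The third point follows from ConcurrentHashMap's weakly consistent iterators: modifying the map during traversal never throws ConcurrentModificationException (class name invented for illustration):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WeaklyConsistentDemo {
    public static void main(String[] args) {
        Map<Integer, String> map = new ConcurrentHashMap<>();
        map.put(1, "a");
        map.put(2, "b");
        for (Integer key : map.keySet()) {
            map.put(999, "new"); // modifying while iterating: no exception
        }
        System.out.println("no ConcurrentModificationException; contains 999: "
                + map.containsKey(999));
    }
}
```

Note that the iterator may or may not reflect entries added after it was created; it only guarantees not to fail.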

References: High Concurrency Programming Series: Implementation Principles of ConcurrentHashMap (JDK1.7 and JDK1.8); Alibaba P8 Architect Talk: an in-depth discussion of the underlying structure, principle, and expansion mechanism of HashMap

Why does JDK1.8 use synchronized instead of the reentrant lock ReentrantLock?

  • Because the lock granularity has been reduced, synchronized is no worse than ReentrantLock for relatively fine-grained locking. With coarse-grained locking, ReentrantLock can control each fine-grained boundary through Condition, which is more flexible; at fine granularity, that advantage of Condition disappears.
  • JVM support: the JVM development team never gave up on synchronized, and JVM-based synchronization has more room for optimization; using a built-in keyword is also more natural than using an API.
  • Lower memory overhead: under heavy data operations, the API-based ReentrantLock puts more memory pressure on the JVM. Although this is not a bottleneck, it is also a basis for the choice.


Origin blog.csdn.net/eluanshi12/article/details/86680623