In-depth understanding of the underlying implementation principles of HashMap

Table of contents

General introduction

Storage of HashMap elements

Add elements to HashMap

The expansion mechanism of HashMap

Thread Safety of HashMap

1. There is insecurity when adding and removing elements

2. There is insecurity in the expansion operation

3. Hash collision is insecure

4. Invisibility between threads leads to security issues


General introduction

HashMap is the most frequently used data structure for key-value mapping. It extends the AbstractMap class and allows at most one null key and any number of null values. HashMap is not thread-safe. In a multi-threaded environment, we can make it thread-safe by wrapping it with Collections.synchronizedMap, or use ConcurrentHashMap directly. As the JDK has evolved, the underlying data structure of HashMap has, since JDK 1.8, become an array + linked list + red-black tree.
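As a small illustration of the null handling and the thread-safe alternatives mentioned above, here is a minimal sketch. The class name NullKeyDemo, its helper methods, and the sample keys are made up for this example. The null-key check on ConcurrentHashMap is an extra observation that the text above does not cover.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class NullKeyDemo {
    // Builds a HashMap that holds one null key and two null values.
    static Map<String, String> buildWithNulls() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "first");   // a null key is allowed
        map.put(null, "second");  // same null key again: the value is overwritten
        map.put("a", null);       // null values are allowed
        map.put("b", null);
        return map;
    }

    // Wraps a map so that every call on the wrapper is synchronized.
    static Map<String, String> synchronizedView(Map<String, String> map) {
        return Collections.synchronizedMap(map);
    }

    // Returns true if ConcurrentHashMap refuses a null key.
    static boolean concurrentMapRejectsNullKey() {
        try {
            new ConcurrentHashMap<String, String>().put(null, "x");
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, String> map = buildWithNulls();
        System.out.println(map.size());                    // 3 entries: null, "a", "b"
        System.out.println(map.get(null));                 // second
        System.out.println(concurrentMapRejectsNullKey()); // true
    }
}
```

Note that ConcurrentHashMap, unlike HashMap, accepts neither null keys nor null values, so the two are not drop-in replacements for code that relies on nulls.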

Storage of HashMap elements

The bottom layer of HashMap uses Node<K,V> to store elements. Let's check its source code:

static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;    // used to locate the array index position
        final K key;
        V value;
        Node<K,V> next;   // next node in the linked list

        Node(int hash, K key, V value, Node<K,V> next) { ... }
        public final K getKey(){ ... }
        public final V getValue() { ... }
        public final String toString() { ... }
        public final int hashCode() { ... }
        public final V setValue(V newValue) { ... }
        public final boolean equals(Object o) { ... }
}

In the source code of HashMap, a particularly important field is Node<K,V>[] table, the array of hash buckets. Each element is placed at an index of this array determined by the hash algorithm. As elements are added, two different elements may end up at the same index, which is a hash collision. HashMap resolves collisions with the chain address method (separate chaining): each array slot holds a linked list of the elements that map to it. As a bucket keeps growing, when an element is added to a linked list that already holds at least 8 nodes and the table length is at least 64, the linked list is converted into a red-black tree; if the table is still shorter than 64, the table is expanded instead.
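To make the index calculation and the idea of a collision concrete, here is a minimal sketch. The class BucketIndexDemo and its method names are made up for illustration. The spread step mirrors what JDK 8's HashMap.hash() does (XOR the high 16 bits into the low 16), and the index formula is the same (n - 1) & hash expression used in putVal.

```java
class BucketIndexDemo {
    // Bit-spreading step in the style of JDK 8's HashMap.hash().
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table whose length n is a power of two.
    static int indexFor(Object key, int n) {
        return (n - 1) & spread(key);
    }

    public static void main(String[] args) {
        int n = 16;
        // "Aa" and "BB" have the same hashCode (2112), so they always collide.
        System.out.println("Aa -> " + indexFor("Aa", n));
        System.out.println("BB -> " + indexFor("BB", n));
        // A null key always hashes to 0 and therefore lands in bucket 0.
        System.out.println("null -> " + indexFor(null, n));
    }
}
```

Because n is a power of two, (n - 1) & hash keeps only the low bits of the hash, which is why the high bits are mixed in first.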

Add elements to HashMap

The logic of adding elements in HashMap is as follows:

1. Check whether the array table is null or has length 0. If so, call resize() to initialize it.

2. Compute the hash value from the key and derive the array index i at which to insert. If table[i] == null, create a new node, place it in the array, and go to step 6; if table[i] is not empty, go to step 3.

3. Check whether the first element of table[i] has the same key as the element being inserted (compared by hash value and equals()). If it does, overwrite the value; otherwise go to step 4.

4. Check whether table[i] is a TreeNode, that is, whether the bucket is a red-black tree. If it is, insert the key-value pair into the red-black tree; otherwise go to step 5.

5. Traverse the linked list at table[i]. If a node with the same key is found during the traversal, overwrite its value. Otherwise append the new node to the tail of the list; if the list already held at least 8 nodes, call treeifyBin(), which converts the list into a red-black tree (or expands the table if it is still shorter than 64).

6. After a successful insertion, check whether the actual number of key-value pairs, size, exceeds threshold. If it does, expand the capacity.

The source code in HashMap is as follows:

public V put(K key, V value) {
    // hash the key's hashCode()
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // Step ①: if the table is null or empty, create it via resize()
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Step ②: compute the index; if the bucket is empty, insert directly
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // Step ③: the first node has the same key, so its value will be overwritten
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // Step ④: the bucket holds a red-black tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // Step ⑤: the bucket holds a linked list
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // the list is now longer than 8: convert it to a red-black tree
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // the key already exists: stop here and overwrite the value below
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }

        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }

    ++modCount;
    // Step ⑥: expand the capacity if size exceeds the threshold
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
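From the caller's side, the two return paths of putVal (null for a new key, the old value for an existing key) and the onlyIfAbsent flag can be observed through the public API. The following is a minimal sketch; the class name PutSemanticsDemo and the sample keys and values are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class PutSemanticsDemo {
    // Returns the three return values plus the final stored value.
    static String[] run() {
        Map<String, String> map = new HashMap<>();
        String first = map.put("k", "v1");          // new key: returns null
        String second = map.put("k", "v2");         // existing key: overwrites, returns "v1"
        String third = map.putIfAbsent("k", "v3");  // onlyIfAbsent path: keeps "v2", returns "v2"
        String current = map.get("k");
        return new String[] { first, second, third, current };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run())); // [null, v1, v2, v2]
    }
}
```

In HashMap, put passes onlyIfAbsent = false and putIfAbsent passes true, which is why the third call leaves the stored value untouched.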

The expansion mechanism of HashMap

Compared with JDK 1.7, JDK 1.8 adds red-black trees on top of the original capacity expansion. The process and conditions are as follows. Expansion is performed when the expansion condition is met, that is, when the number of key-value pairs in the hash table exceeds the threshold (capacity × load factor). Treeification is tied to expansion: when an element is added to a linked list that already holds at least 8 nodes, the list is converted into a red-black tree only if the table length is at least 64; otherwise the table is expanded instead.
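The threshold arithmetic can be sketched as follows, assuming a map created with the default initial capacity (16) and load factor (0.75). The class ThresholdDemo and its methods are hypothetical helpers, not part of HashMap; the sketch only models the size-versus-threshold check and ignores expansions triggered by treeifyBin and the MAXIMUM_CAPACITY limit.

```java
class ThresholdDemo {
    static final int INITIAL_CAPACITY = 16;  // same value as HashMap's DEFAULT_INITIAL_CAPACITY
    static final float LOAD_FACTOR = 0.75f;  // same value as HashMap's DEFAULT_LOAD_FACTOR

    // threshold = capacity * load factor
    static int threshold(int capacity) {
        return (int) (capacity * LOAD_FACTOR);
    }

    // Capacity after inserting `entries` distinct keys, doubling whenever size exceeds the threshold.
    static int capacityAfter(int entries) {
        int capacity = INITIAL_CAPACITY;
        for (int size = 1; size <= entries; size++) {
            if (size > threshold(capacity)) {
                capacity <<= 1;
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        for (int entries : new int[] {12, 13, 24, 25}) {
            int capacity = capacityAfter(entries);
            System.out.println(entries + " entries -> capacity " + capacity
                    + ", threshold " + threshold(capacity));
        }
    }
}
```

With these defaults, the 13th insertion pushes size past 12 and doubles the table to 32, and the 25th doubles it again to 64.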

The specific source code and analysis are as follows:

final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab; // already at the maximum capacity: do not expand, just let collisions happen
            }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY) // array capacity is doubled
                newThr = oldThr << 1; // double the threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY; // default initial capacity
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY); // default expansion threshold
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            // new index = original index
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                // new index = original index + oldCap
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        // put the low list into the bucket at the original index
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        // put the high list into the bucket at original index + oldCap
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

In addition, the source code of resize() shows that JDK 1.8 also optimizes how element indexes change after expansion. Because the table always grows by a power of 2 (its length doubles), each element either stays at its original index or moves to the original index plus the old capacity. The figure below illustrates this, where n is the length of the table. Figure (a) shows how the index positions of key1 and key2 are determined before expansion, and figure (b) shows how they are determined after expansion. Here hash1 is the result of the hash and high-bit operation for key1.

When the index of an element is recomputed, because n is doubled, the mask n - 1 covers one more high-order bit (shown in red), so the new index changes as follows:

Therefore, when expanding a HashMap, we do not need to recalculate the hash as the JDK 1.7 implementation does. We only need to check whether the newly added bit of the original hash value is 1 or 0. If it is 0, the index is unchanged; if it is 1, the index becomes "original index + oldCap". The figure below shows a schematic of a resize from 16 to 32:

This design is quite ingenious. It avoids recalculating the hash after expansion, as JDK 1.7 had to. In addition, since the newly examined bit can be regarded as equally likely to be 0 or 1, the nodes that previously collided are spread fairly evenly across the new buckets.
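The split rule can be checked with a small worked example. This is a minimal sketch: the class ResizeSplitDemo and its method names are made up, and it compares the index obtained by masking against the doubled table with the index obtained the way resize() does it, by testing the single bit hash & oldCap.

```java
class ResizeSplitDemo {
    // Index computed directly against the doubled table.
    static int recomputedIndex(int hash, int oldCap) {
        return hash & (2 * oldCap - 1);
    }

    // Index derived as in resize(): test only the newly exposed bit.
    static int splitIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        // All three hashes share old index 5 in a table of length 16.
        for (int hash : new int[] {5, 21, 37}) {
            System.out.println(hash + " -> " + splitIndex(hash, oldCap));
        }
        // prints: 5 -> 5, 21 -> 21, 37 -> 5
    }
}
```

Hash 21 has the bit for 16 set, so it moves to 5 + 16 = 21, while 5 and 37 keep index 5. Both methods agree for every hash, which is why resize() can split a bucket into a "lo" list and a "hi" list without rehashing.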

Thread Safety of HashMap

The thread insecurity of HashMap is mainly reflected in the following four aspects:

1. There is insecurity when adding and removing elements

In a concurrent environment, if multiple threads modify the HashMap at the same time, such as adding, deleting or modifying elements, it may cause conflicts in the data structure, which may eventually lead to data inconsistency or loss.

This is because multiple threads may try to modify data in the same bucket at the same time, and during the modification process, some data may be lost or data modified by other threads may be overwritten, resulting in data inconsistency.

2. There is insecurity in the expansion operation

When the number of elements in a HashMap exceeds its threshold, the table must be expanded. During expansion, a new array is allocated and every element is moved to its new storage location. If other threads insert or delete elements at the same time, the data structure may be corrupted. In JDK 1.7, where the transfer used head insertion, a concurrent expansion could even produce a circular linked list and an infinite loop.

3. Hash collision is insecure

When inserting or deleting elements, hash collisions may occur, that is, different keys may be mapped to the same position in the array, which requires a linked list or a red-black tree to resolve.

However, when multiple threads perform insertion or deletion operations at the same time, the structure of the linked list or red-black tree may be destroyed, resulting in data loss or abnormality.

4. Invisibility between threads leads to security issues

Since HashMap is not thread-safe, multiple threads may access the same HashMap instance concurrently.

When a thread modifies the HashMap, other threads may not be able to see these modifications immediately, which may lead to data inconsistency.

To solve the thread-safety problem of HashMap, you can use ConcurrentHashMap. In JDK 1.7 it relied on segment locking; since JDK 1.8 it uses CAS operations, synchronized locking on individual buckets, and volatile variables to ensure thread safety and data visibility.

However, these techniques also bring some additional overhead, so when a map is only ever accessed by a single thread, HashMap may be faster than ConcurrentHashMap.
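As a minimal sketch of the safe alternative, the example below has several threads increment a shared counter stored in a ConcurrentHashMap. The class ConcurrentCountDemo, the key "hits", and the thread counts are made up for illustration. The same loop against a plain HashMap would be unsafe and is deliberately not shown running.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentCountDemo {
    // Starts `threads` workers that each increment the same key `incrementsPerThread` times.
    static int countWithThreads(int threads, int incrementsPerThread) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    // merge performs the read-modify-write atomically for this key
                    counts.merge("hits", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join(); // wait for every worker to finish
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return counts.getOrDefault("hits", 0);
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 10_000)); // 40000
    }
}
```

The atomic merge call is what makes the total reliable. Separate get and put calls, even on a ConcurrentHashMap, could still lose updates because another thread may write between them.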

Origin blog.csdn.net/m0_65431718/article/details/130823772