The underlying principle of HashMap: data structure + put() process + 2 to the nth power + infinite loop + data coverage problem

Table of contents

1. The bottom layer

1.1 HashMap data structure

1.2 Expansion mechanism

1.3 put() process

1.4 How does HashMap calculate key?

1.5 Why is the capacity of HashMap 2 to the nth power?

1.5.1 Reasons

1.5.2 Expansion uniform hash demonstration: expansion from 2^4 to 2^5

2. Thread safety issues

2.1 Is HashMap thread-safe? 

2.2 Thread-safe solutions

2.3 Infinite loop problem during JDK7 expansion

2.3.1 Demonstration of the infinite loop problem 

2.3.2 How does JDK8 solve the infinite loop problem?

2.4 Data coverage problem during JDK8 put

2.5 modCount non-atomic self-increment problem


1. The bottom layer

1.1 HashMap data structure

In JDK1.7 and earlier versions, HashMap is implemented as "array + singly linked list".

In JDK8, HashMap is implemented as "array + singly linked list + red-black tree". The red-black tree is there mainly to improve lookup performance in buckets with many collisions. The array provides hash-based lookup, the linked list handles collisions by separate chaining, and a bucket's linked list is converted to a red-black tree once it reaches length 8 (provided the array capacity is at least 64, see 1.3).

1.2 Expansion mechanism

In HashMap, the default initial capacity of the array is 16, and the capacity doubles on every expansion, so it is always a power of 2. HashMap expands when the number of stored elements exceeds capacity × load factor (the threshold). The default load factor is 0.75, so a 16-slot table expands when the 13th element is added.

Automatic expansion means HashMap does not need to occupy much memory at the beginning, yet still has enough space as it grows. Keeping the capacity a power of 2 lets the index calculation and the redistribution of elements during expansion use bit operations instead of division.

Each element of the array stores the head node of a linked list, and collisions are handled by separate chaining. If a linked list reaches length 8 (and the array capacity is at least 64), it is replaced by a red-black tree.
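The threshold arithmetic described in this section can be sketched as follows. This is a minimal illustration, not JDK code: the class name ResizeThresholdDemo and the helper threshold() are made up for the example, and only the default values 16 and 0.75 come from the text above.

```java
// Illustrative sketch only: names are hypothetical, not taken from the JDK source.
class ResizeThresholdDemo {
    static final float LOAD_FACTOR = 0.75f; // default load factor mentioned above

    // Number of entries a table of the given capacity can hold before it is expanded.
    static int threshold(int capacity) {
        return (int) (capacity * LOAD_FACTOR);
    }

    public static void main(String[] args) {
        // The capacity doubles on each expansion: 16, 32, 64, 128.
        for (int capacity = 16; capacity <= 128; capacity <<= 1) {
            System.out.println("capacity=" + capacity + " threshold=" + threshold(capacity));
        }
    }
}
```

With the defaults, a table of capacity 16 has a threshold of 12, so the insertion that brings the size to 13 triggers the expansion to 32.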

1.3 put() process

The put() method has five main steps:

  1. Compute the bucket index with hash & (2^n - 1). This is equivalent to taking the hash modulo 2^n, but the bit operation is faster.
  2. Check the array. If it is empty, allocate it with the initial capacity of 16 (the first expansion).
  3. Check the head node at the bucket index. If it is empty, create a new linked list node and store it in the array.
  4. If the head node is not empty, then depending on the case, overwrite the existing value or insert the element into the linked list (head insertion in JDK7, tail insertion in JDK8) or the red-black tree.
  5. After inserting an element, check the element count and double the capacity if it exceeds the threshold.

The fourth step can be subdivided into the following three cases:

1. If the key of the element equals the key of the head node, the value of the head node is overwritten directly.

2. If the head node is a tree node, insert the element into the tree.

3. If the head node is a linked list node, walk the list. If a node with the same key is found, overwrite its value; otherwise append the element at the tail. After appending, check the length of the linked list to decide whether to convert it into a red-black tree. If the length reaches 8 and the array capacity is below 64, expand the array instead. If the length reaches 8 and the array capacity is at least 64, convert the list into a red-black tree.

Ways a hash table can handle collisions: open addressing (linear probing, quadratic probing, rehashing) and separate chaining. HashMap uses separate chaining.
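The put() steps above can be sketched with a small separate-chaining map. This is a simplified illustration under stated assumptions, not the JDK implementation: the class SimpleChainedMap and all of its members are hypothetical names, red-black trees are left out entirely, and resize() simply relinks every node into a table twice as large.

```java
import java.util.Objects;

// Simplified sketch of the put() flow: separate chaining only, no red-black trees.
// All names are illustrative and not taken from the JDK source.
class SimpleChainedMap<K, V> {
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;
        Node(int hash, K key, V value) { this.hash = hash; this.key = key; this.value = value; }
    }

    private Node<K, V>[] table; // allocated lazily on the first put
    private int size;

    int size() { return size; }
    int capacity() { return table == null ? 0 : table.length; }

    // Mix the high bits of hashCode() into the low bits, as JDK8 does.
    private static int spread(Object key) {
        int h;
        return key == null ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    V put(K key, V value) {
        int hash = spread(key);
        if (table == null) table = newTable(16);            // step 2: first allocation
        int i = hash & (table.length - 1);                   // step 1: bucket index
        Node<K, V> p = table[i];
        if (p == null) {
            table[i] = new Node<>(hash, key, value);         // step 3: empty bucket
        } else {
            while (true) {                                   // step 4: walk the chain
                if (p.hash == hash && Objects.equals(p.key, key)) {
                    V old = p.value;
                    p.value = value;                         // same key: overwrite
                    return old;
                }
                if (p.next == null) {
                    p.next = new Node<>(hash, key, value);   // tail insertion
                    break;
                }
                p = p.next;
            }
        }
        if (++size > table.length * 3 / 4) resize();         // step 5: load factor 0.75
        return null;
    }

    V get(K key) {
        if (table == null) return null;
        int hash = spread(key);
        for (Node<K, V> e = table[hash & (table.length - 1)]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) return e.value;
        }
        return null;
    }

    // Double the table and append every node to the tail of its new bucket.
    private void resize() {
        Node<K, V>[] old = table;
        Node<K, V>[] next = newTable(old.length * 2);
        Node<K, V>[] tails = newTable(next.length);
        for (Node<K, V> head : old) {
            for (Node<K, V> e = head; e != null; ) {
                Node<K, V> following = e.next;
                e.next = null;
                int i = e.hash & (next.length - 1);
                if (tails[i] == null) next[i] = e; else tails[i].next = e;
                tails[i] = e;
                e = following;
            }
        }
        table = next;
    }

    @SuppressWarnings("unchecked")
    private static <K, V> Node<K, V>[] newTable(int capacity) {
        return (Node<K, V>[]) new Node[capacity];
    }
}
```

The sketch keeps the same order of checks as the list above (index, lazy allocation, empty bucket, chain walk, threshold check), which makes it easier to map each step onto the real putVal() source shown later in this article.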

1.4 How does HashMap calculate key?

index = hash & (2^n - 1)   # same result as hash % (2^n) for a non-negative hash; the bit operation is used mainly for speed.

For example, if the current array capacity is 16 and the hash is 18, the index is 18 & 15 == 2, which is equivalent to 18 % 16 == 2.

The part of the put() source code that calculates the index:

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        // ... code omitted ...
        // The bucket index is i = (n - 1) & hash, where n is the array length.
        if ((p = tab[i = (n - 1) & hash]) == null)
            // Empty bucket: store the new node directly.
            tab[i] = newNode(hash, key, value, null);
        else {
            // ... code omitted ...
        }
    }
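The index calculation can be tried out in isolation. The sketch below is illustrative only: BucketIndexDemo and indexFor() are hypothetical names, and the capacity is assumed to be a power of 2. It also shows one detail worth knowing: for a negative hash, the mask agrees with Math.floorMod rather than with the % operator.

```java
// Illustrative sketch: names are hypothetical; capacity is assumed to be a power of 2.
class BucketIndexDemo {
    static int indexFor(int hash, int capacity) {
        return hash & (capacity - 1);
    }

    public static void main(String[] args) {
        System.out.println(indexFor(18, 16));       // 2
        System.out.println(18 % 16);                // 2
        // For a negative hash the mask still yields a valid index,
        // matching Math.floorMod, while % would return a negative number.
        System.out.println(indexFor(-7, 16));       // 9
        System.out.println(Math.floorMod(-7, 16));  // 9
        System.out.println(-7 % 16);                // -7
    }
}
```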

1.5 Why is the capacity of HashMap 2 to the nth power?

1.5.1 Reasons

The index for a key is computed from its hash:

index = hash & (2^n - 1)   # same result as hash % (2^n); for example, 18 & 15 and 18 % 16 are equal

The binary form of 2^n - 1 is n consecutive 1 bits, and 2^(n+1) - 1 differs from it only by one extra leading 1. The mask therefore keeps every low bit of the hash, so the added elements can be spread evenly over every position of the array, which reduces hash collisions. If the capacity were not a power of 2, the mask would contain 0 bits and some positions could never be used.

For example, the binary value of 15 (that is, 2^4 - 1) is 1111, the binary value of 31 is 11111, the binary value of 63 is 111111, and the binary value of 127 is 1111111.
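To see why the all-ones mask matters, the following sketch counts which indexes are reachable with hash & (capacity - 1) for a power-of-2 capacity and for a capacity of 10. The class MaskCoverageDemo and the method reachable() are hypothetical names introduced only for this illustration.

```java
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch: names are hypothetical, not taken from the JDK source.
class MaskCoverageDemo {
    // Distinct indexes produced by hash & (capacity - 1) for a range of hashes.
    static Set<Integer> reachable(int capacity) {
        Set<Integer> indexes = new TreeSet<>();
        for (int hash = 0; hash < capacity * 4; hash++) {
            indexes.add(hash & (capacity - 1));
        }
        return indexes;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(15)); // 1111
        System.out.println(reachable(16).size());       // 16: every slot is usable
        System.out.println(reachable(10));              // [0, 1, 8, 9]: mask 1001 skips slots
    }
}
```

With capacity 10 the mask is 9 (binary 1001), so only four of the ten slots can ever be hit, which is exactly the uneven distribution a power-of-2 capacity avoids.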

1.5.2 Expansion uniform hash demonstration: expansion from 2^4 to 2^5

0 & (2^4 - 1) = 0; 0 & (2^5 - 1) = 0

16 & (2^4 - 1) = 0; 16 & (2^5 - 1) = 16. Therefore, after expansion, some of the values at index 0 stay where they are, and the others migrate to a new position (index 16).

1 & (2^4 - 1) = 1; 1 & (2^5 - 1) = 1

17 & (2^4 - 1) = 1; 17 & (2^5 - 1) = 17. Therefore, after expansion, some of the values at index 1 stay where they are, and the others migrate to a new position (index 17).

Demonstrating expansion with the remainder:

If the AND operation is hard to follow, the remainder shows the same thing:

Assuming expansion from 16 to 32: 1 % 16 = 1, 17 % 16 = 1; 1 % 32 = 1, 17 % 32 = 17.

Before expansion, 1 and 17 both map to index 1. After expansion, 1 still maps to index 1, and 17 maps to index 17. In this way, the values that were at index 1 are spread evenly over the expanded hash table (some stay where they are, and some move to a new position).
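The demonstration above generalizes to a simple rule: after doubling, an element either stays at its old index or moves to old index + old capacity, and the single bit hash & oldCap decides which. JDK8's resize() uses this same bit test. The sketch below is only an illustration; ResizeSplitDemo and newIndex() are hypothetical names.

```java
// Illustrative sketch: names are hypothetical, not taken from the JDK source.
class ResizeSplitDemo {
    // New index after the table doubles from oldCap to 2 * oldCap (oldCap is a power of 2).
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        // The bit at position oldCap decides: 0 = stay, 1 = move by oldCap.
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        for (int hash : new int[] {0, 16, 1, 17}) {
            System.out.println(hash + ": " + (hash & 15) + " -> " + newIndex(hash, 16));
        }
    }
}
```

Because only one extra bit of the hash is examined, the new position can be found without recomputing the full index for every element.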

2. Thread safety issues

2.1 Is HashMap thread-safe? 

HashMap is not thread-safe. In a multi-threaded environment it can suffer from infinite loops (during JDK7 expansion) and from data being overwritten.

Under multi-threading, use Collections.synchronizedMap from the Collections utility class or, preferably, ConcurrentHashMap from the JUC package (java.util.concurrent).

2.2 Thread-safe solutions

  • Use Hashtable (a legacy API, not recommended).
  • Use the Collections utility class to wrap the HashMap in a synchronized map:
    Collections.synchronizedMap(map);
  • Use ConcurrentHashMap (recommended). In JDK8, ConcurrentHashMap locks only a single slot (the head node of the linked list), so it provides thread safety at a lower performance cost.
  • Guard every HashMap operation with synchronized or a Lock. The operations then execute one at a time, as if the threads were queued (more cumbersome and not recommended).
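A minimal sketch of the two recommended options in use is shown below. The class ThreadSafeMapDemo and the helper fill() are hypothetical names made up for this example; the thread count and the number of keys are arbitrary. Each thread writes its own disjoint key range, so a thread-safe map must end up with exactly threads × perThread entries.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: names and sizes are assumptions for the demo.
class ThreadSafeMapDemo {
    // Several threads insert disjoint keys; returns the final size of the map.
    static int fill(Map<Integer, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put(base + i, i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(fill(new ConcurrentHashMap<>(), 4, 10_000));
        System.out.println(fill(Collections.synchronizedMap(new HashMap<>()), 4, 10_000));
    }
}
```

Passing a plain new HashMap<>() to the same helper may lose entries or behave unpredictably, which is why it is not included in the demo.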

2.3 Infinite loop problem during JDK7 expansion

2.3.1 Demonstration of the infinite loop problem 

Single-threaded expansion process:

In JDK7, HashMap uses head insertion when chaining colliding entries, and it uses head insertion again when moving nodes during expansion, so the order of the nodes in a linked list is reversed after expansion.

Suppose two threads T1 and T2 expand the same linked list A->B at the same time. Both record the current node (A) and its successor (B), and then T2 is suspended. T1 completes the expansion, which reverses the list to B->A. When T2 resumes with its stale references and re-inserts the nodes at the head, the result is a circular linked list, that is, B.next=A and A.next=B. Any later traversal of this bucket loops forever.
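The interleaving described above can be replayed deterministically in a single thread. The sketch below is a simulation under stated assumptions, not JDK7 code: Jdk7ResizeCycleDemo, Entry, transfer(), and simulateRace() are hypothetical names, the transfer loop only imitates the head-insertion style of JDK7, and both nodes are assumed to land in the same bucket of the new table.

```java
// Single-threaded replay of the race; all names are illustrative.
class Jdk7ResizeCycleDemo {
    static final class Entry {
        final String key;
        Entry next;
        Entry(String key, Entry next) { this.key = key; this.next = next; }
    }

    // Head-insertion transfer of one bucket, in the style of JDK7. Returns the new head.
    static Entry transfer(Entry e) {
        Entry newHead = null;
        while (e != null) {
            Entry next = e.next;
            e.next = newHead; // insert at the head of the new bucket
            newHead = e;
            e = next;
        }
        return newHead;
    }

    // Floyd's cycle detection.
    static boolean hasCycle(Entry head) {
        Entry slow = head, fast = head;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
            if (slow == fast) return true;
        }
        return false;
    }

    static Entry simulateRace() {
        Entry b = new Entry("B", null);
        Entry a = new Entry("A", b);   // original bucket: A -> B

        // T2 reads e = A and next = B, then is suspended.
        Entry e2 = a;
        Entry next2 = e2.next;

        // T1 runs the whole transfer: its bucket becomes B -> A.
        transfer(a);

        // T2 resumes with stale locals and finishes the interrupted iteration.
        Entry newHead2 = null;
        e2.next = newHead2;
        newHead2 = e2;
        e2 = next2;
        // T2 continues its loop on the list that T1 already reversed.
        while (e2 != null) {
            Entry next = e2.next;
            e2.next = newHead2;
            newHead2 = e2;
            e2 = next;
        }
        return newHead2; // A.next == B and B.next == A
    }

    public static void main(String[] args) {
        System.out.println(hasCycle(simulateRace())); // true
    }
}
```

In a real program the suspension point depends on thread scheduling, so the cycle appears only occasionally, which makes this bug hard to reproduce.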

2.3.2 How does JDK8 solve the infinite loop problem?

JDK8 switched to tail insertion, which solves the infinite loop problem.

In JDK8, HashMap uses tail insertion, so the order of the linked list nodes is not reversed during expansion, and concurrent expansion can no longer build a circular list. HashMap is still not thread-safe, however. Tail insertion in put() requires walking the linked list to its end, but put() walks the list anyway to check for an existing key, so the extra cost is small.

For example, suppose A->B->C needs to be migrated. The head node A is moved first, then B is appended after A, and then C is appended at the tail, so the result is still A->B->C. The order has not changed, so a concurrently expanding thread does not create a cycle.
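The order-preserving migration can be sketched as follows. This is a simplified illustration in the spirit of the JDK8 resize loop, not a copy of it: TailInsertSplitDemo, Node, build(), hashes(), and split() are hypothetical names. One bucket is split into a "stay" list and a "move" list, and each node is appended at the tail of its list.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: names are hypothetical, not taken from the JDK source.
class TailInsertSplitDemo {
    static final class Node {
        final int hash;
        Node next;
        Node(int hash) { this.hash = hash; }
    }

    // Builds a linked list whose nodes carry the given hashes, in order.
    static Node build(int... hashes) {
        Node head = null, tail = null;
        for (int h : hashes) {
            Node n = new Node(h);
            if (tail == null) head = n; else tail.next = n;
            tail = n;
        }
        return head;
    }

    static List<Integer> hashes(Node head) {
        List<Integer> out = new ArrayList<>();
        for (Node e = head; e != null; e = e.next) out.add(e.hash);
        return out;
    }

    // Splits one bucket into {stayHead, moveHead}, appending at the tail to keep the order.
    static Node[] split(Node head, int oldCap) {
        Node loHead = null, loTail = null, hiHead = null, hiTail = null;
        Node e = head;
        while (e != null) {
            Node next = e.next;
            if ((e.hash & oldCap) == 0) {          // stays at the old index
                if (loTail == null) loHead = e; else loTail.next = e;
                loTail = e;
            } else {                               // moves to old index + oldCap
                if (hiTail == null) hiHead = e; else hiTail.next = e;
                hiTail = e;
            }
            e = next;
        }
        if (loTail != null) loTail.next = null;
        if (hiTail != null) hiTail.next = null;
        return new Node[] { loHead, hiHead };
    }

    public static void main(String[] args) {
        Node[] parts = split(build(1, 17, 33, 49), 16);
        System.out.println(hashes(parts[0]) + " " + hashes(parts[1])); // [1, 33] [17, 49]
    }
}
```

Because the tail pointers are kept while splitting, no extra traversal is needed to find the end of each list during expansion.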

2.4 Data coverage problem during JDK8 put

HashMap is not thread-safe. If two threads concurrently insert entries whose hashes map to the same index, one insert can overwrite the other.

Thread A finds the slot empty (null) and is suspended just before inserting. Thread B then finds the same slot empty and inserts successfully. When thread A resumes, it has already passed the null check, so it writes directly to the slot and overwrites the data inserted by thread B.

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)     // no hash collision: insert directly (the check and the write are not atomic)
            tab[i] = newNode(hash, key, value, null);
        // ... rest of the method omitted ...
    }
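The check-then-write race can be replayed step by step in a single thread. The sketch below is a simulation, not HashMap code: LostPutDemo, plainArrayRace(), and casRace() are hypothetical names, and strings stand in for nodes. The second method shows the same interleaving with a compare-and-set on an AtomicReferenceArray, which is the general idea ConcurrentHashMap relies on when inserting into an empty slot.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Single-threaded replay of the interleaving; all names are illustrative.
class LostPutDemo {
    // Plain array: both "threads" pass the null check before either one writes.
    static String plainArrayRace() {
        String[] table = new String[16];
        int i = 5;
        boolean aSawEmpty = table[i] == null; // thread A checks, then is suspended
        boolean bSawEmpty = table[i] == null; // thread B checks
        if (bSawEmpty) table[i] = "B";        // thread B inserts
        if (aSawEmpty) table[i] = "A";        // thread A resumes and overwrites B
        return table[i];
    }

    // Same interleaving with compare-and-set: thread A's stale insert is rejected.
    static boolean casRace() {
        AtomicReferenceArray<String> table = new AtomicReferenceArray<>(16);
        int i = 5;
        table.compareAndSet(i, null, "B");        // thread B inserts into the empty slot
        return table.compareAndSet(i, null, "A"); // thread A fails: the slot is no longer null
    }

    public static void main(String[] args) {
        System.out.println(plainArrayRace()); // A (B's entry is lost)
        System.out.println(casRace());        // false (A must retry instead of overwriting)
    }
}
```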

2.5 modCount non-atomic self-increment problem

modCount is a member variable of HashMap that records the number of times the HashMap has been structurally modified. Iterators use it to detect concurrent modification (fail-fast).

put() executes ++modCount, which consists of three steps: read, add, and write back. It is not an atomic operation, so concurrent puts can lose increments, which is another thread safety issue.

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        // put() executes ++modCount here. It consists of a read, an add, and a write, so it is not atomic and is not thread-safe.
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
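The effect of a non-atomic increment can be observed with a plain counter next to an AtomicInteger. This is a sketch under stated assumptions: ModCountRaceDemo and its fields are hypothetical names, and plainCount merely stands in for modCount. The plain counter may come out lower than expected, but how many increments are lost varies from run to run, so no specific value is shown for it.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: plainCount stands in for modCount; names are hypothetical.
class ModCountRaceDemo {
    static int plainCount;
    static final AtomicInteger atomicCount = new AtomicInteger();

    // Returns {plainCount, atomicCount} after all threads have finished.
    static int[] run(int threads, int perThread) {
        plainCount = 0;
        atomicCount.set(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    ++plainCount;                  // read, add, write: increments can be lost
                    atomicCount.incrementAndGet(); // one atomic step
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { plainCount, atomicCount.get() };
    }

    public static void main(String[] args) {
        int[] result = run(4, 100_000);
        System.out.println("plain=" + result[0] + " atomic=" + result[1]);
    }
}
```

The atomic counter always reaches threads × perThread, while the plain counter can only be equal to or lower than that total.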


Origin blog.csdn.net/qq_40991313/article/details/131620721