Principle and analysis of HashMap in Java

The underlying data structure of HashMap

HashMap is a data structure that stores data in the form of Key-Value.
JDK1.7 uses an array + linked list, and uses the Entry class to store keys and values.
JDK1.8 uses an array + linked list/red-black tree, and uses the Node class to store keys and values.
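
For reference, the Node class in the JDK1.8 source looks like this (abridged): each node caches the key's hash and links to the next node in its bucket.

// Abridged from the JDK1.8 HashMap source
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;      // cached hash of the key
    final K key;
    V value;
    Node<K,V> next;      // next node in the same bucket

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
    // getKey(), getValue(), setValue(), hashCode(), equals() omitted
}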

Why is HashMap expansion always a power of 2?

HashMap expansion is really array expansion, because an array's length is fixed once allocated while a linked list can grow freely. In the HashMap source you can see that bit operations are used for this. When an element is added to the map, its position is determined by the calculation (n - 1) & hash. A bitwise AND produces a 1 in a given bit only when both operands have a 1 there. When the capacity of HashMap is a power of 2, n in (n - 1) is that capacity, and the binary form of (n - 1) is a run of ones (1111...1111), so ANDing it with the hash of the added element preserves all of the hash's low bits. The hash is used in full, added elements are distributed evenly across the HashMap's slots, and hash collisions are reduced.
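
A minimal sketch of that index computation (the class name and key are illustrative, not from the HashMap source):

public class IndexDemo {
    public static void main(String[] args) {
        int n = 16;                       // power-of-two capacity
        int hash = "example".hashCode();  // any key's hash
        int index = (n - 1) & hash;       // equivalent to hash % n for power-of-two n, but faster
        System.out.println("bucket index = " + index); // always in [0, 15]
    }
}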

HashMap expansion infinite loop problem

HashMap in JDK1.7 builds each bucket's linked list with head insertion. So-called head insertion means that newly inserted data is placed at the head node of the linked list.

Normal HashMap expansion works like this: the nodes of the old HashMap are transferred to the new HashMap in sequence. If the elements of an old bucket's list are in the order A, B, C, then, because the new HashMap inserts with head insertion, the final order of that list in the new HashMap after expansion is C, B, A.
Causes of infinite loop:

Step 1: Threads start. Threads T1 and T2 are both about to expand the HashMap. Both are positioned at the list's head node A, and their next references, T1.next and T2.next, both point to node B.
Step 2: Expansion begins. Assume that thread T2's time slice runs out and it is suspended, while thread T1 performs the expansion. Thread T2 is not awakened until T1's expansion has completed.
After T1 completes the expansion: because HashMap expansion uses head insertion, the order of the nodes in the list has been reversed by the time thread T1 finishes. Thread T2, however, knows nothing of what happened, so the node references it holds are unchanged: T2 still points to node A, and T2.next still points to node B.
When thread T1 completes execution and thread T2 resumes execution, an infinite loop occurs.
After T1's expansion, node B's next node is A, but the first node the T2 thread points to is A and its second is B, exactly the opposite of the order T1 left behind. T1 produced the order B to A, while T2 proceeds from A to B; as a result, node A and node B end up referencing each other and form an infinite loop.
Solutions
There are three common solutions to avoid infinite loops in HashMap:
1) Use the thread-safe ConcurrentHashMap instead of HashMap (a minimal usage sketch follows this list). I personally recommend this solution.

2) Use the thread-safe container Hashtable instead; however, its performance is poor, so it is not recommended.

3) Lock with synchronized or Lock before performing operations, which is equivalent to making the threads queue up; this also hurts performance and is not recommended.
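
A minimal sketch of option 1 (thread count and key range are arbitrary choices for illustration):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeMapDemo {
    public static void main(String[] args) throws InterruptedException {
        // ConcurrentHashMap tolerates concurrent puts; no external locking needed
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        Runnable writer = () -> {
            for (int i = 0; i < 10_000; i++) map.put(i, i);
        };
        Thread t1 = new Thread(writer), t2 = new Thread(writer);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("size = " + map.size()); // always 10000
    }
}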

Conversion from linked list to red-black tree

JDK1.8 switched expansion from head insertion to tail insertion, which fixes the JDK1.7 infinite loop; it also introduced red-black trees so that long bucket chains can be searched in O(log n) instead of O(n).
When the length of a bucket's linked list reaches the threshold (TREEIFY_THRESHOLD, default 8), the treeifyBin() method is called first.

// Fragment of putVal(): walk the bucket's list and append the new node at
// the tail (JDK1.8 uses tail insertion, not head insertion)
for (int binCount = 0; ; ++binCount) {
    if ((e = p.next) == null) {
        p.next = newNode(hash, key, value, null);
        // the list length has reached TREEIFY_THRESHOLD (8): try to treeify
        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
            treeifyBin(tab, hash);
        break;
    }
    // an equal key already exists in the bucket: stop, its value is updated
    if (e.hash == hash &&
        ((k = e.key) == key || (key != null && key.equals(k))))
        break;
    p = e;
}

This method decides whether to convert to a red-black tree based on the size of the HashMap array: only when the array length is greater than or equal to MIN_TREEIFY_CAPACITY (64) is the red-black tree conversion performed, to reduce search time. Otherwise, it just executes the resize() method to expand the array.

    /**
     * Replaces all linked nodes in bin at index for given hash unless
     * table is too small, in which case resizes instead.
     */
    final void treeifyBin(Node<K,V>[] tab, int hash) {
        int n, index; Node<K,V> e;
        // If the table is still smaller than MIN_TREEIFY_CAPACITY (64),
        // expand instead of treeifying
        if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
            resize();
        else if ((e = tab[index = (n - 1) & hash]) != null) {
            // First pass: replace each Node with a TreeNode, keeping them
            // connected as a doubly-linked list
            TreeNode<K,V> hd = null, tl = null;
            do {
                TreeNode<K,V> p = replacementTreeNode(e, null);
                if (tl == null)
                    hd = p;
                else {
                    p.prev = tl;
                    tl.next = p;
                }
                tl = p;
            } while ((e = e.next) != null);
            // Second pass: build the actual red-black tree from that list
            if ((tab[index] = hd) != null)
                hd.treeify(tab);
        }
    }

HashMap’s put method and get method

In the put method, when storing a key-value pair, a hash method is first called to compute the key's hash value. ANDing that number with (array length - 1), which for a power-of-two length is equivalent to taking it modulo the length, yields the array subscript. The singly linked list stored at that subscript is then walked, comparing each node's key with the key to be inserted via equals(); if one is equal, the value is simply updated, i.e. overwritten. If none is equal, the new key-value pair is appended to the list. During put, once the number of stored key-value pairs exceeds the array length times the load factor, the array is expanded to twice its capacity. Also, if an insertion pushes a list's length past the default threshold of 8, the node structure is converted from a linked list into a red-black tree.
In the get() method, the hash method is likewise called on the key, and the result ANDed with (array length - 1) gives the array subscript. The list at that subscript is then traversed, comparing keys with equals(); when the keys match, that element is retrieved and returned to the caller.
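
For reference, the hash method mentioned above looks like this in the JDK1.8 source; it XORs the high 16 bits of hashCode() into the low 16 bits so that high-order bits still influence the index produced by (n - 1) & hash:

// From the JDK1.8 HashMap source
static final int hash(Object key) {
    int h;
    // spread the high bits downward so they survive the (n - 1) & hash mask
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}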

HashMap expansion mechanism

Whether in JDK1.7 or JDK1.8, if the table is empty when the put method executes, the resize() method runs first to create it; the default initial capacity is 16.
Threshold: threshold = capacity * load factor
By default, the load factor of HashMap is 0.75. Therefore, the value of threshold is equal to 75% of capacity.

For example, if the capacity of HashMap is 16, then the threshold calculation result is:
threshold = 16 * 0.75 = 12
JDK1.7 expands first and then inserts. Specifically, expansion during put requires two conditions to hold at the same time:

  1. When storing the new value, the number of existing elements is greater than or equal to the threshold.
  2. When storing the new value, a hash collision occurs with the currently stored data (the array slot at the subscript computed from the current key's hash already holds a value).

The expansion logic is in the addEntry method:

void addEntry(int hash, K key, V value, int bucketIndex) {
    // 1. Is the current size greater than or equal to the threshold?
    // 2. Does the target slot already hold an entry (hash collision)?
    // Expand only if both conditions hold
    if ((size >= threshold) && (null != table[bucketIndex])) {
        // Double the table and move the existing entries into the new array
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }

    createEntry(hash, key, value, bucketIndex);
}

JDK1.7's expansion uses head insertion. Adjusting the table after expansion: the table capacity is doubled, and every element's subscript must be recomputed as newIndex = hash & (newLength - 1).
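
The head insertion happens in JDK1.7's transfer method, reproduced below in lightly abridged form; the assignment e.next = newTable[i] is what places each moved node at the head of its new bucket and thus reverses list order:

// Lightly abridged from the JDK1.7 HashMap source
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while (null != e) {
            Entry<K,V> next = e.next;
            if (rehash)
                e.hash = null == e.key ? 0 : hash(e.key);
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];   // head insertion: moved node becomes the head
            newTable[i] = e;
            e = next;
        }
    }
}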

Two expansion situations of JDK1.8 HashMap.

  1. When the actual number of entries in the map reaches the threshold (capacity * load factor), the capacity is doubled.

  2. When the linked list of some bucket in the map grows past the tree threshold TREEIFY_THRESHOLD = 8,
    but the array length is still less than the minimum treeify capacity MIN_TREEIFY_CAPACITY = 64, the capacity is likewise doubled.

Otherwise, when a bucket's list reaches the tree threshold of 8 and the array length has reached 64, the linked list is converted to a red-black tree.

Summary of expansion:
In JDK1.7, HashMap checks whether expansion is needed before inserting the data. Two conditions must both hold when storing a new element: the number of existing elements is greater than or equal to the threshold, and the new element collides with an existing one (the array slot at the subscript computed from the new key's hash is already occupied). If both conditions are met, the table is expanded; afterwards every entry's index is recalculated from its hash and the entry is stored at its new location.
In JDK1.8, the data is inserted first and the expansion check comes afterwards. When the number of entries reaches the threshold, or when a bucket's list length reaches 8 while the array capacity is still below 64, the table is doubled. When a bucket's list reaches 8 and the array capacity has reached 64, that linked list is converted into a red-black tree. Conversely, when deletions leave a tree with 6 or fewer nodes, the red-black tree is converted back into a linked list.

Why does HashMap obtain the subscript through bit operations (e.hash & oldCap) when expanding its capacity?

For example, suppose the old table length is 16 (so the index mask used in the source is length - 1 = 15, binary 01111) and the length after expansion is 32 (mask 11111). How is a hash value's table subscript computed before and after expansion?
Write the low binary bits of the hash value as abcde. The last 4 bits of the bitwise AND against the old and new masks are obviously identical; the only bit that can differ is the fifth, where b sits. If b is 0, the AND against the new mask gives the same result as against the old one; if b is 1, the new result is larger than the old one by exactly 10000 in binary, and that binary number is the old table length.
In other words, to decide whether a hash value's new subscript needs the old table length added to it, you only need to check whether the fifth bit of the hash is 1. The check is a bitwise AND between the hash and 10000 (that is, the old table length), whose result can only be 10000 or 00000.
Therefore e.hash & oldCap is used to determine whether bit b is 0 or 1. Whenever the result is 0, the new subscript equals the old subscript; otherwise the new subscript is the old subscript plus the old table length.
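
A small self-contained sketch (not JDK code) verifying this split rule for every hash in a sample range:

public class ResizeIndexDemo {
    public static void main(String[] args) {
        int oldCap = 16;                       // old table length
        int newCap = oldCap << 1;              // 32 after doubling
        for (int hash = 0; hash < 64; hash++) {
            int oldIndex = hash & (oldCap - 1);
            // test the single bit that distinguishes the old and new masks
            int newIndex = ((hash & oldCap) == 0)
                    ? oldIndex                 // bit is 0: index unchanged
                    : oldIndex + oldCap;       // bit is 1: shifted by oldCap
            if (newIndex != (hash & (newCap - 1)))
                throw new AssertionError("mismatch at hash " + hash);
        }
        System.out.println("e.hash & oldCap correctly splits every bucket");
    }
}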

Is HashMap thread-safe and why?

In JDK1.7, HashMap builds bucket lists with head insertion, so newly inserted data goes at the head of the list. Under multi-threaded expansion this easily produces the infinite loop problem described earlier.
In JDK1.8, concurrent put operations can still lose data. Suppose threads A and B both perform a put and the hash function gives them the same insertion subscript. Thread A is suspended when its time slice runs out; thread B then gets the CPU and inserts its element at that subscript. When thread A regains the CPU, it does not re-check for a hash collision, since it already did so before being suspended, and inserts directly. Thread B's data is thereby overwritten by thread A, so HashMap is not thread-safe.
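
A minimal sketch of the lost-update race (class name and counts are illustrative; the final size varies from run to run and is often below 10000):

import java.util.HashMap;
import java.util.Map;

public class LostUpdateDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> map = new HashMap<>();
        Runnable writer = () -> {
            for (int i = 0; i < 10_000; i++) map.put(i, i);
        };
        Thread a = new Thread(writer), b = new Thread(writer);
        a.start(); b.start();
        a.join(); b.join();
        // Racing inserts into the same bucket can overwrite each other,
        // so this often prints less than 10000
        System.out.println("size = " + map.size());
    }
}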

The difference between HashMap and Hashtable

  1. Hashtable is thread-safe; neither its keys nor its values may be null, otherwise a NullPointerException is thrown.
  2. HashMap is not thread-safe; both its keys and its values may be null.
  3. The initial capacity of HashMap is 16, and expansion simply doubles the capacity.
  4. The initial capacity of Hashtable is 11, and expansion doubles the capacity plus one (2n + 1).
  5. Hashtable uses the object's hashCode directly, while HashMap re-computes the hash (spreading the high bits) before indexing.

What are the methods to solve Hash collision?

Common solutions include:

  1. Open addressing: when a hash collision occurs, some probing rule is used to find a free position elsewhere in the hash table, and the element is inserted there (see the sketch after this list).
  2. Separate chaining: each position of the hash table stores a linked list; when a collision occurs, the element is inserted into the corresponding list. This is the approach HashMap takes.
  3. Rehashing: if the hash value produced by one hash function collides, another hash function is used to compute a new hash value.
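
A tiny sketch of open addressing with linear probing; the Integer[] table, with null marking a free slot, is an illustrative choice, not any library API:

public class LinearProbingDemo {
    static int insert(Integer[] table, int value) {
        int n = table.length;
        int start = Math.floorMod(Integer.hashCode(value), n);
        for (int i = 0; i < n; i++) {
            int probe = (start + i) % n;   // step forward on collision, wrapping around
            if (table[probe] == null) {
                table[probe] = value;
                return probe;              // slot where the value landed
            }
        }
        throw new IllegalStateException("hash table is full");
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[8];
        // 3 and 11 hash to the same slot (mod 8); 11 is probed to the next one
        System.out.println(insert(table, 3));   // 3
        System.out.println(insert(table, 11));  // 4
    }
}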

Data structure of ConcurrentHashMap

In JDK1.7, the data structure of ConcurrentHashMap is a Segment array + HashEntry array + linked lists.
In JDK1.8, it adopts the same Node array + linked list + red-black tree structure as HashMap, and uses CAS + synchronized to ensure concurrency safety.

  1. How does ConcurrentHashMap ensure thread safety?
    In JDK1.7, ConcurrentHashMap achieves thread safety with segment locks: the large hash table is divided into multiple small hash tables (segments), each with its own lock, so different threads can access different segments at the same time without competing for a single lock, which improves concurrent performance. In JDK1.8 the segments were dropped; an insert into an empty bucket is done lock-free with CAS, and otherwise synchronized is taken on the bucket's head node.
  2. What is the expansion mechanism of ConcurrentHashMap?
    The expansion mechanism of ConcurrentHashMap is similar to HashMap's: it expands when the number of stored entries reaches the threshold. During expansion, the entries of the original table are copied to the new, larger table bucket by bucket, and thread safety is preserved throughout the process. After expansion, ConcurrentHashMap continues to guard the new table with the same locking scheme.
  3. Does the get() method of ConcurrentHashMap need to be locked?
    The get() method of ConcurrentHashMap does not need to lock, yet remains thread-safe: the fields it reads are volatile, and together with CAS this guarantees the consistency and visibility of the data under concurrent access.
  4. What is the difference between ConcurrentHashMap and Hashtable?
    Both are thread-safe hash tables, but they differ greatly. ConcurrentHashMap uses fine-grained locking to improve concurrency, while Hashtable protects everything with one global lock, so its concurrent performance is far worse than ConcurrentHashMap's. Note that neither of them permits null keys or values (unlike HashMap). In addition, ConcurrentHashMap supports richer operations, such as the atomic compound operations sketched below, which Hashtable does not offer.
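
A minimal sketch of those atomic compound operations (the map contents are illustrative):

import java.util.concurrent.ConcurrentHashMap;

public class AtomicOpsDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        // These compound operations are atomic even with concurrent callers
        counts.putIfAbsent("hits", 0);             // insert only if missing
        counts.merge("hits", 1, Integer::sum);     // atomic increment
        counts.computeIfAbsent("misses", k -> 0);  // lazy initialization
        System.out.println(counts);                // e.g. {hits=1, misses=0}
    }
}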
