The expansion mechanism of HashMap---resize()

Although this topic is also covered in the article on HashMap's internals, rehashing (resize()) is worth examining on its own.

When to expand: whenever an element is added, the map checks how many elements it currently holds. If that number is greater than or equal to the threshold (the current array length multiplied by the load factor), the map expands automatically.
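The threshold arithmetic can be sketched as follows. This assumes HashMap's documented defaults (initial capacity 16, load factor 0.75). ThresholdDemo, threshold(), and shouldResize() are hypothetical names used only for illustration, not JDK methods, and shouldResize() applies the "greater than or equal to" rule stated above.

```java
// Illustration only: how the resize threshold follows from capacity and load factor.
// The constants mirror HashMap's documented defaults; the helper names are hypothetical.
class ThresholdDemo {
    static final int DEFAULT_CAPACITY = 16;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // threshold = capacity * loadFactor, truncated to an int
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // The "size >= threshold" rule described in the text
    static boolean shouldResize(int size, int threshold) {
        return size >= threshold;
    }

    public static void main(String[] args) {
        int thr = threshold(DEFAULT_CAPACITY, DEFAULT_LOAD_FACTOR);
        System.out.println("threshold = " + thr);   // threshold = 12
        System.out.println(shouldResize(11, thr));  // false
        System.out.println(shouldResize(12, thr));  // true
    }
}
```

With the defaults, the table of 16 slots is therefore replaced once it holds 12 entries.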

Resizing means recalculating the capacity. As elements keep being added to a HashMap object, its internal array eventually cannot hold any more, so the object has to lengthen the array to make room. Java arrays cannot grow automatically, however, so the approach is to replace the existing small array with a new, larger one, much as you would switch from a small bucket to a bigger one to hold more water.
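The "bigger bucket" idea in plain Java looks like this: a minimal sketch that doubles a String array with java.util.Arrays.copyOf. GrowArrayDemo and doubleCapacity() are hypothetical names. A HashMap cannot simply copy elements to the same positions like this, because each element's index depends on the array length; that is the job of transfer() below.

```java
import java.util.Arrays;

// Illustration only: a Java array has a fixed length, so "growing" it means
// allocating a larger array and copying the old contents into it.
class GrowArrayDemo {
    // Returns a new array twice as long; the extra slots are null
    static String[] doubleCapacity(String[] small) {
        return Arrays.copyOf(small, small.length * 2);
    }

    public static void main(String[] args) {
        String[] bucket = {"a", "b"};
        String[] bigger = doubleCapacity(bucket);
        System.out.println(bucket.length + " -> " + bigger.length); // 2 -> 4
        System.out.println(Arrays.toString(bigger));                // [a, b, null, null]
    }
}
```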

Let's analyze the resize source code. Because JDK 1.8 adds red-black trees, its version is more complicated, so for ease of understanding we start with the simpler JDK 1.7 code. The two are essentially the same; the specific differences are discussed later.

void resize(int newCapacity) {                    // pass in the new capacity
    Entry[] oldTable = table;                     // reference to the Entry array before expansion
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {        // the old array has already reached the maximum size (2^30)
        threshold = Integer.MAX_VALUE;            // raise the threshold to the int maximum (2^31 - 1) so it never expands again
        return;
    }
    Entry[] newTable = new Entry[newCapacity];    // allocate the new Entry array
    transfer(newTable);                           // move the data into the new Entry array
    table = newTable;                             // the HashMap's table field now refers to the new array
    threshold = (int) (newCapacity * loadFactor); // update the threshold
}

Here a larger array replaces the existing smaller one, and the transfer() method copies the elements of the original Entry array into the new Entry array.

void transfer(Entry[] newTable) {
    Entry[] src = table;                       // src references the old Entry array
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) {     // walk through the old Entry array
        Entry<K, V> e = src[j];                // head of the chain in old bucket j
        if (e != null) {
            src[j] = null;                     // drop the old array's reference (after the loop it references nothing)
            do {
                Entry<K, V> next = e.next;
                int i = indexFor(e.hash, newCapacity); // recompute each element's index in the new array
                e.next = newTable[i];          // [1] head insertion: link e in front of the bucket's current head
                newTable[i] = e;               // put the element into the array
                e = next;                      // move on to the next element in the Entry chain
            } while (e != null);
        }
    }
}

static int indexFor(int h, int length) {
    return h & (length - 1);
}

Why the indexFor method uses h & (length - 1) is explained in detail in part four, "Storage implementation", of the HashMap internals article.
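A quick way to convince yourself of the key property: when length is a power of two and h is non-negative, h & (length - 1) picks the same bucket as h % length, and for negative h it still yields an index in 0..length-1 where % would go negative. This is a minimal sketch; IndexForDemo is a hypothetical class, and its indexFor() simply copies the JDK 1.7 helper shown above.

```java
// Sketch: for a power-of-two length, masking with length - 1 acts like a modulo
// that never goes negative. indexFor() copies the JDK 1.7 helper above.
class IndexForDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // power of two
        for (int h = 0; h < 1000; h++) {
            if (indexFor(h, length) != h % length) {
                throw new AssertionError("mismatch at h=" + h);
            }
        }
        // For negative h, % can be negative while & stays within 0..length-1
        System.out.println(-7 % length);          // -7
        System.out.println(indexFor(-7, length)); // 9
    }
}
```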

Assigning newTable[i] to e.next means the singly linked list is built by head insertion: a new element placed at an index always goes to the head of that bucket's chain, so the element placed at that index first ends up at the end of the Entry chain (if a hash collision occurs). This differs from JDK 1.8, as explained in detail below. Also, elements that shared an Entry chain in the old array may land at different positions in the new array once their indexes are recalculated.
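The effect of head insertion on order can be isolated in a few lines. This is a sketch, not the JDK code: Entry here is a hypothetical stand-in with only a key and a next pointer, and moveAllHeadInsert() sends every element to a single bucket so that only the ordering effect is visible.

```java
// Minimal sketch of the head-insertion step marked [1] in transfer().
// Entry is a hypothetical stand-in holding only a key and a next pointer.
class HeadInsertDemo {
    static final class Entry {
        final int key;
        Entry next;
        Entry(int key, Entry next) { this.key = key; this.next = next; }
    }

    // Moves every entry of the chain into one target bucket using head insertion
    static Entry moveAllHeadInsert(Entry chain) {
        Entry bucket = null;
        for (Entry e = chain; e != null; ) {
            Entry next = e.next; // remember the rest of the old chain
            e.next = bucket;     // link e in front of the current head
            bucket = e;          // e becomes the new head
            e = next;
        }
        return bucket;
    }

    static String render(Entry head) {
        StringBuilder sb = new StringBuilder();
        for (Entry e = head; e != null; e = e.next) {
            sb.append(e.key).append(e.next != null ? " -> " : "");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Entry chain = new Entry(1, new Entry(2, new Entry(3, null)));
        System.out.println(render(chain));                    // 1 -> 2 -> 3
        System.out.println(render(moveAllHeadInsert(chain))); // 3 -> 2 -> 1
    }
}
```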

The following is an example to illustrate the expansion process.

The key assumption is a hash method of the form hash(){return key % table.length;}. In words: suppose our hash algorithm simply takes the key modulo the size of the table (that is, the length of the array).

Let the hash bucket array table have size 2, with keys 3, 7, and 5 put in the order 5, 7, 3. Modulo 2, they all collide in table[1]. Assume the load factor loadFactor = 1, that is, the map expands when the number of key-value pairs exceeds the size of the table. The next three steps resize the hash bucket array to 4 and then rehash all nodes.

Figure: JDK 1.7 HashMap expansion example
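The example can be replayed in code. This sketch assumes the toy hash key % table.length and assumes the old bucket table[1] holds the chain 5 -> 7 -> 3 (the put order); the chain order actually drawn in the figure may differ. ModResizeDemo, Node, hash(), and transfer() are simplified, hypothetical stand-ins for the JDK 1.7 code above.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the worked example: hash(key) = key % length, table grows from 2 to 4,
// and elements are moved JDK 1.7-style (recompute index, then head-insert).
class ModResizeDemo {
    static final class Node {
        final int key;
        Node next;
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    static int hash(int key, int length) {
        return key % length; // the toy hash from the example
    }

    static Node[] transfer(Node[] oldTable, int newCapacity) {
        Node[] newTable = new Node[newCapacity];
        for (Node e : oldTable) {
            while (e != null) {
                Node next = e.next;
                int i = hash(e.key, newCapacity); // new bucket for this key
                e.next = newTable[i];             // head insertion
                newTable[i] = e;
                e = next;
            }
        }
        return newTable;
    }

    static List<Integer> keys(Node head) {
        List<Integer> out = new ArrayList<>();
        for (Node e = head; e != null; e = e.next) out.add(e.key);
        return out;
    }

    public static void main(String[] args) {
        Node[] table = new Node[2];
        table[1] = new Node(5, new Node(7, new Node(3, null))); // every key % 2 == 1
        Node[] resized = transfer(table, 4);
        for (int i = 0; i < resized.length; i++) {
            System.out.println("table[" + i + "] = " + keys(resized[i]));
        }
    }
}
```

Running it prints table[1] = [5] and table[3] = [3, 7] (the other buckets are empty): the old chain is split across two buckets, and 7 and 3, which shared a bucket before and after, come out in reversed order.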

Now let's look at the optimization JDK 1.8 makes. Observe that the table always grows by powers of two (each resize doubles the length), so after rehashing an element either stays at its original index or moves by a power-of-two offset from it, namely the old capacity. This is what the following comment on resize() says:


/**
* Initializes or doubles table size. If null, allocates in
* accord with initial capacity target held in field threshold.
* Otherwise, because we are using power-of-two expansion, the
* elements from each bin must either stay at same index, or move
* with a power of two offset in the new table.
*
* @return the table
*/
final Node<K,V>[] resize() { ... }

The figure below illustrates this sentence. n is the length of the table. Figure (a) shows how the index positions of two keys, key1 and key2, are determined before expansion, and figure (b) shows how they are determined after expansion. hash1 is the result of hashing key1 and applying the high-bit mixing operation.

Figure 1: JDK 1.8 HashMap hash algorithm example

When the index is recalculated, n has doubled, so the mask n - 1 covers one more bit at the high end (shown in red), and the new index changes like this:

Figure 2: JDK 1.8 HashMap hash algorithm example

Therefore, when a JDK 1.8 HashMap expands, it does not need to recompute every element's index the way the JDK 1.7 implementation does. It only needs to check whether the bit of the original hash value that the mask newly covers is 1 or 0. If it is 0, the index does not change; if it is 1, the index becomes "original index + oldCap". The following figure shows a resize from 16 to 32:

Figure: JDK 1.8 HashMap expansion example (16 expanded to 32)
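The bit test can be checked directly. In this sketch, SplitBitDemo and newIndex() are hypothetical names, and hash1 = 5 and hash2 = 21 are illustrative values chosen because both map to bucket 5 of a 16-slot table; they are not taken from the figures. The loop confirms, on random hashes, that testing hash & oldCap gives the same index as masking with the doubled table's n - 1.

```java
import java.util.Random;

// Sketch of the JDK 1.8 shortcut: derive the new index from the old index plus
// a single bit test (hash & oldCap) instead of re-masking with newCap - 1.
class SplitBitDemo {
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int hash1 = 0b00101; // 5:  the bit worth oldCap is 0
        int hash2 = 0b10101; // 21: the bit worth oldCap is 1
        System.out.println((hash1 & (oldCap - 1)) + " " + (hash2 & (oldCap - 1))); // 5 5
        System.out.println(newIndex(hash1, oldCap) + " " + newIndex(hash2, oldCap)); // 5 21

        // The shortcut always agrees with masking by the doubled table's n - 1
        Random rnd = new Random(42);
        for (int n = 0; n < 100_000; n++) {
            int h = rnd.nextInt();
            if (newIndex(h, oldCap) != (h & ((oldCap << 1) - 1))) {
                throw new AssertionError("mismatch for hash " + h);
            }
        }
        System.out.println("bit test matches hash & (2 * oldCap - 1)");
    }
}
```

The agreement is exact because hash & (2 * oldCap - 1) is just the old index with the oldCap bit added on top.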

This design is quite clever. It avoids recomputing each index, and because the newly covered bit can be considered random (0 or 1), the resize spreads the previously colliding nodes evenly across the new buckets. This is the new optimization in JDK 1.8.

There is one more difference. When JDK 1.7 rehashes and migrates an old linked list, elements that land at the same index in the new array end up in reversed order, whereas JDK 1.8, as the figure above shows, keeps them in their original order. Interested readers can study the JDK 1.8 resize source code, which is very well written:

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // Already at the maximum: stop expanding and simply tolerate collisions
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Below the maximum: double the capacity
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Compute the new resize threshold
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // Move every bucket into the new bucket array
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // linked list: the optimized rehash block
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // stays at the original index
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // moves to original index + oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // put the original-index list into its bucket
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // put the (original index + oldCap) list into its bucket
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
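To see the linked-list branch of the code above in isolation, here is a simplified, self-contained sketch of the lo/hi split. Node and split() are hypothetical stand-ins, not HashMap's private classes, and the hash values 5, 21, 37, and 53 are chosen only because they all share bucket 5 in a 16-slot table.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the lo/hi split in JDK 1.8 resize(): one old bucket's chain
// is divided by (hash & oldCap), appending at the tail so relative order is kept.
class LoHiSplitDemo {
    static final class Node {
        final int hash;
        Node next;
        Node(int hash, Node next) { this.hash = hash; this.next = next; }
    }

    // Splits the chain from old index j into newTab[j] and newTab[j + oldCap]
    static void split(Node e, Node[] newTab, int j, int oldCap) {
        Node loHead = null, loTail = null, hiHead = null, hiTail = null;
        while (e != null) {
            Node next = e.next;
            if ((e.hash & oldCap) == 0) {                 // stays at index j
                if (loTail == null) loHead = e; else loTail.next = e;
                loTail = e;
            } else {                                      // moves to j + oldCap
                if (hiTail == null) hiHead = e; else hiTail.next = e;
                hiTail = e;
            }
            e = next;
        }
        if (loTail != null) { loTail.next = null; newTab[j] = loHead; }
        if (hiTail != null) { hiTail.next = null; newTab[j + oldCap] = hiHead; }
    }

    static List<Integer> hashes(Node head) {
        List<Integer> out = new ArrayList<>();
        for (Node e = head; e != null; e = e.next) out.add(e.hash);
        return out;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        // Old bucket 5 holds hashes 5, 21, 37, 53 in that order
        Node chain = new Node(5, new Node(21, new Node(37, new Node(53, null))));
        Node[] newTab = new Node[oldCap << 1];
        split(chain, newTab, 5, oldCap);
        System.out.println(hashes(newTab[5]));  // [5, 37]
        System.out.println(hashes(newTab[21])); // [21, 53]
    }
}
```

Both new buckets keep their elements in the original relative order, which is the contrast with the JDK 1.7 head-insertion transfer().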
