How to perfectly answer interview questions: the HashMap expansion mechanism (JDK 1.7)

Hello everyone, I am Gu Yan. Today I want to talk about a question that almost every interviewer raises: what is the expansion mechanism of HashMap? Since I am still a novice who is just learning to program, this post draws on several other blog posts and ends with a summary.

 This post only covers the HashMap expansion mechanism before JDK 1.8. JDK 1.8 introduces red-black trees into HashMap, which is beyond the scope of this article, so I will not describe it here.

HashMap expansion mechanism

What is resize?

Resize means recalculating the capacity. As elements are added to a HashMap object, the array inside it eventually cannot take any more, and the object has to enlarge that array so that more elements can be stored. Of course, an array in Java cannot grow in place, so a new, larger array replaces the existing small one. It is like storing water in a small bucket: to store more water, we have to switch to a bigger bucket.

When to expand?

 When an element is added to the container, the number of elements currently in it is checked. If that number is greater than or equal to the threshold, that is, the length of the current array multiplied by the load factor, the container expands automatically. As the source code below shows, JDK 1.7 additionally requires that the target bucket already be occupied.
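 As a quick numeric illustration of the threshold rule, here is a minimal sketch. It assumes a capacity of 16 and a load factor of 0.75 (the JDK defaults); the class name ThresholdDemo and the helper threshold are made up for this example and are not part of HashMap.

```java
public class ThresholdDemo {
    // Hypothetical helper (not a HashMap method): threshold = capacity * loadFactor
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        int capacity = 16;        // assumed initial capacity
        float loadFactor = 0.75f; // assumed load factor

        System.out.println(threshold(capacity, loadFactor));     // 12
        System.out.println(threshold(2 * capacity, loadFactor)); // 24
    }
}
```

 With these values, the map becomes eligible for expansion once it already holds 12 elements.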

The process of expansion!

 The expansion process of HashMap is described below using source code, pictures, and text.

/**
 * Adds a node to the HashMap.
 *
 * @param hash        hash code generated from the current key
 * @param key         key to add to the HashMap
 * @param value       value to add to the HashMap
 * @param bucketIndex bucket, i.e. the array index that this entry maps to
 */
void addEntry(int hash, K key, V value, int bucketIndex) {
    // Resize conditions: 1. the number of existing key-value mappings is >= threshold
    //                    2. the slot at bucketIndex in the underlying array is not null
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length); // the array length changes after resizing
        hash = (null != key) ? hash(key) : 0; // why is the hash computed again?
        bucketIndex = indexFor(hash, table.length); // the index depends on the array length, so it must be recomputed
    }
    createEntry(hash, key, value, bucketIndex);
}

/**
 * This is where linked lists appear. There are two cases:
 * 1. The bucket at bucketIndex was empty, so no linked list is formed.
 * 2. The bucket already held an entry. Per the Entry constructor, the new key-value
 *    mapping is placed in the array slot and the previous entry hangs off its next field.
 */
void createEntry(int hash, K key, V value, int bucketIndex) {
    HashMap.Entry<K, V> e = table[bucketIndex];
    table[bucketIndex] = new HashMap.Entry<>(hash, key, value, e);
    size++;
}

 In the addEntry method above, resize is performed only if size (the number of elements currently in the container) is greater than or equal to the threshold (array length multiplied by the load factor) and the slot at bucketIndex in the underlying array is not null. Otherwise, no expansion takes place.
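 The two-part condition is easy to misread, so here is a minimal sketch that restates it as a pure function. The class ResizeConditionDemo and the method needsResize are hypothetical names used only for illustration; the boolean bucketOccupied stands in for the check null != table[bucketIndex].

```java
public class ResizeConditionDemo {
    // Hypothetical restatement of the check in addEntry:
    // both conditions must hold before resize is called.
    static boolean needsResize(int size, int threshold, boolean bucketOccupied) {
        return size >= threshold && bucketOccupied;
    }

    public static void main(String[] args) {
        System.out.println(needsResize(11, 12, true));  // false: below the threshold
        System.out.println(needsResize(12, 12, false)); // false: the target bucket is empty
        System.out.println(needsResize(12, 12, true));  // true: both conditions hold
    }
}
```

 The second case is the subtle one: even at the threshold, an insert into an empty bucket does not trigger expansion.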

 The following will focus on the expansion process:

        void resize(int newCapacity) {   // the new capacity is passed in
            Entry[] oldTable = table;    // reference to the Entry array before resizing
            int oldCapacity = oldTable.length;
            if (oldCapacity == MAXIMUM_CAPACITY) {  // the old array is already at the maximum size (2^30)
                threshold = Integer.MAX_VALUE; // set the threshold to the int maximum (2^31-1) so no further resize happens
                return;
            }

            Entry[] newTable = new Entry[newCapacity];  // allocate a new Entry array
            transfer(newTable);                         // !! move the data into the new Entry array
            table = newTable;                           // the table field now references the new Entry array
            threshold = (int) (newCapacity * loadFactor); // update the threshold
        }

 Before expanding, the reference to the current array is stored in the oldTable variable. The code then checks whether the length of that array has already reached MAXIMUM_CAPACITY (2^30). If so, the threshold is set to Integer.MAX_VALUE and the expansion is abandoned, because the array is already at its maximum capacity and cannot grow any further.
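 The effect of repeated resize calls on capacity and threshold can be sketched as follows. This is only an illustration of the arithmetic in resize, not the HashMap code itself: the class CapacityGrowthDemo and the helpers nextCapacity and nextThreshold are hypothetical, and the load factor 0.75 is an assumption.

```java
public class CapacityGrowthDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30; // 2^30, as in the comment in resize

    // Hypothetical helper: capacity of the table after one resize attempt
    static int nextCapacity(int oldCapacity) {
        if (oldCapacity == MAXIMUM_CAPACITY) {
            return oldCapacity; // resize returns early, the table does not grow
        }
        return 2 * oldCapacity; // addEntry always asks for twice the current length
    }

    // Hypothetical helper: threshold after one resize attempt
    static int nextThreshold(int oldCapacity, float loadFactor) {
        if (oldCapacity == MAXIMUM_CAPACITY) {
            return Integer.MAX_VALUE; // no further resize will be triggered
        }
        return (int) (2 * oldCapacity * loadFactor);
    }

    public static void main(String[] args) {
        float loadFactor = 0.75f; // assumed load factor
        int capacity = 16;
        for (int step = 0; step < 3; step++) {
            int threshold = nextThreshold(capacity, loadFactor);
            capacity = nextCapacity(capacity);
            System.out.println("capacity=" + capacity + ", threshold=" + threshold);
        }
    }
}
```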

 The following figure shows the state after the program has executed Entry[] newTable = new Entry[newCapacity];:
[Figure]
 Here a larger array replaces the existing smaller one; the transfer() method copies the elements of the original Entry array into the new Entry array.

        void transfer(Entry[] newTable) {
            Entry[] src = table;                   // src references the old Entry array
            int newCapacity = newTable.length;
            for (int j = 0; j < src.length; j++) { // iterate over the old Entry array
                Entry<K, V> e = src[j];             // take each element of the old Entry array
                if (e != null) {
                    src[j] = null; // release the old array's reference (after the for loop, the old Entry array no longer references any object)
                    do {
                        Entry<K, V> next = e.next;
                        int i = indexFor(e.hash, newCapacity); // !! recompute each element's position in the array
                        e.next = newTable[i]; // mark [1]
                        newTable[i] = e;      // place the element in the array slot
                        e = next;             // move on to the next element in the Entry chain
                    } while (e != null);
                }
            }
        }

        static int indexFor(int h, int length) {
            return h & (length - 1);
        }

 The reference in newTable[i] is assigned to e.next (mark [1]); in other words, the head insertion method of a singly linked list is used. A new element at the same position is always placed at the head of the linked list, so the element placed at an index first ends up at the tail of the Entry chain (if there is a hash collision). Elements on the same Entry chain in the old array may land at different positions in the new array once their index is recalculated.
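 To make the last point concrete, the sketch below applies indexFor to three hash values before and after the capacity doubles. The hash values 3, 7 and 11 and the capacities 4 and 8 are arbitrary choices for illustration, and the class name IndexForDemo is made up.

```java
public class IndexForDemo {
    // Same bit mask as in the source above; it relies on length being a power of two
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int[] hashes = {3, 7, 11}; // illustrative values that all collide at capacity 4
        for (int h : hashes) {
            System.out.println(h + ": old index " + indexFor(h, 4)
                    + ", new index " + indexFor(h, 8));
        }
        // prints:
        // 3: old index 3, new index 3
        // 7: old index 3, new index 7
        // 11: old index 3, new index 3
    }
}
```

 All three values share bucket 3 in the old array, but after doubling, 7 moves to bucket 7 while 3 and 11 stay in bucket 3.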

 The transfer process is demonstrated below with pictures (red text in each picture marks what changed relative to the previous picture; this applies to all the pictures below and will not be repeated).

 The following figure shows the state after the program has executed src[j] = null; (this is the state in the first loop iteration):
[Figure]

 First, the reference to the table[] array is assigned to src.

 Then, Entry<K, V> e = src[j]; hands the linked list at src[j] over to the variable e. Since e now holds that list, src[j] can safely be set to null; once the loop finishes, the old array no longer references any entries and can be garbage collected.

 The following figure shows the state after the program has executed Entry<K, V> next = e.next; (this is the state in the first loop iteration):
[Figure]

 Here the value of e.next is backed up in the next variable, because the code that follows changes where e.next points.

 The following figure shows the state after the program has executed e.next = newTable[i]; (this is the state in the first loop iteration):
[Figure]

 Since newTable[3] is null at this point, e.next becomes null, as shown in the figure above.

 The following figure shows the state after the program has executed newTable[i] = e; (this is the state in the first loop iteration):
[Figure]

 The following figure shows the state after the program has executed e = next; (this is the state in the first loop iteration):
[Figure]
 As shown above, the Entry 1 node has been inserted into newTable. At the end of the loop body the condition e != null is checked, and while it holds, the process above is repeated until every node has been moved to newTable.
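 The whole walkthrough can be replayed with a small, self-contained simulation. This is a sketch under stated assumptions rather than the JDK code: Node is a stripped-down stand-in for HashMap.Entry that keeps only hash and next, the chain 3 -> 7 -> 11 and the capacities 4 and 8 are illustrative, and TransferDemo, chain and demo are made-up names. The loop inside transfer follows the same steps as the source above.

```java
public class TransferDemo {
    // Minimal stand-in for HashMap.Entry: only the fields that transfer touches
    static class Node {
        final int hash;
        Node next;

        Node(int hash, Node next) {
            this.hash = hash;
            this.next = next;
        }
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Same steps as transfer(): every node is head-inserted into its new bucket
    static void transfer(Node[] src, Node[] newTable) {
        int newCapacity = newTable.length;
        for (int j = 0; j < src.length; j++) {
            Node e = src[j];
            if (e != null) {
                src[j] = null;             // the old array lets go of the chain
                do {
                    Node next = e.next;    // back up the rest of the chain
                    int i = indexFor(e.hash, newCapacity);
                    e.next = newTable[i];  // head insertion
                    newTable[i] = e;
                    e = next;
                } while (e != null);
            }
        }
    }

    // Renders a chain as text, e.g. "11 -> 3"
    static String chain(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            if (sb.length() > 0) {
                sb.append(" -> ");
            }
            sb.append(n.hash);
        }
        return sb.toString();
    }

    // Old table of capacity 4 with chain 3 -> 7 -> 11 in bucket 3, moved to capacity 8
    static Node[] demo() {
        Node[] oldTable = new Node[4];
        oldTable[3] = new Node(3, new Node(7, new Node(11, null)));
        Node[] newTable = new Node[8];
        transfer(oldTable, newTable);
        return newTable;
    }

    public static void main(String[] args) {
        Node[] newTable = demo();
        System.out.println("newTable[3]: " + chain(newTable[3])); // newTable[3]: 11 -> 3
        System.out.println("newTable[7]: " + chain(newTable[7])); // newTable[7]: 7
    }
}
```

 Note that 3 came before 11 in the old chain but comes after it in the new one: head insertion reverses the relative order of nodes that stay in the same bucket.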

Summary

  • Expansion is an expensive operation, so when using HashMap, estimate the size of the map and pass a rough initial capacity at construction time to avoid frequent expansion.
  • The load factor can be changed, and it may even be greater than 1, but it is best left alone unless the situation is very special.
  • HashMap is not thread-safe. Do not modify a HashMap from multiple threads at the same time; use ConcurrentHashMap in a concurrent environment.
  • JDK 1.8 introduces red-black trees, which improve lookups in long collision chains from linear to logarithmic time.
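
 For the first point, one common way to pick the initial capacity is sketched below. The helper initialCapacityFor is hypothetical (it is not a JDK method), and the formula assumes the default load factor of 0.75; it deliberately errs on the large side.

```java
import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    // Hypothetical helper: an initial capacity large enough that expectedSize
    // entries stay below the threshold, assuming a load factor of 0.75
    static int initialCapacityFor(int expectedSize) {
        return (int) (expectedSize / 0.75f) + 1;
    }

    public static void main(String[] args) {
        int expectedSize = 1000;
        // HashMap(int initialCapacity) rounds the value up to a power of two internally
        Map<String, Integer> map = new HashMap<>(initialCapacityFor(expectedSize));
        for (int i = 0; i < expectedSize; i++) {
            map.put("key" + i, i);
        }
        System.out.println(map.size()); // 1000
    }
}
```

 For 1000 expected entries this requests a capacity of 1334, so the map should not need to expand while it is being filled.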

Origin blog.csdn.net/Handsome_Le_le/article/details/108470271