Why is the initial size of HashMap 16, why is the load factor 0.75, and what considerations drove the choice of these two values?

First look at the definition of HashMap:

public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable

HashMap is a subclass of AbstractMap that implements the Map interface.

HashMap()

Constructs an empty HashMap with the default initial capacity (16) and the default load factor (0.75).

HashMap(int initialCapacity)

Constructs an empty HashMap with the specified initial capacity and the default load factor (0.75).

HashMap(int initialCapacity, float loadFactor)

Constructs an empty HashMap with the specified initial capacity and load factor.

HashMap(Map<? extends K,? extends V> m)

Constructs a new HashMap with the same mappings as the specified Map.

Four constructors are provided:

1. HashMap() takes no parameters; the default initial capacity is 16 and the load factor is 0.75;

2. HashMap(int initialCapacity) specifies the initial capacity;

3. HashMap(int initialCapacity, float loadFactor) specifies both the initial capacity and the load factor;

4. HashMap(Map<? extends K,? extends V> m) builds a HashMap from an existing map.
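A quick usage sketch of the four constructors:

import java.util.HashMap;
import java.util.Map;

public class ConstructorDemo {
    public static void main(String[] args) {
        Map<String, Integer> a = new HashMap<>();          // 1. defaults: capacity 16, load factor 0.75
        Map<String, Integer> b = new HashMap<>(32);        // 2. requested capacity, default load factor
        Map<String, Integer> c = new HashMap<>(32, 0.5f);  // 3. capacity and load factor both specified
        Map<String, Integer> d = new HashMap<>(c);         // 4. copy the mappings of an existing map
    }
}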

First, let's analyze the initialization code:

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0) // a negative initial capacity is rejected
        throw new IllegalArgumentException("Illegal initial capacity: "
            + initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY) // cap the initial capacity at the maximum
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor)) // the load factor must be a positive number
        throw new IllegalArgumentException("Illegal load factor: "
            + loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
    // threshold temporarily holds the table size (the requested capacity rounded
    // up to a power of two); the real boundary, capacity * loadFactor, is
    // computed in resize(), and the table grows once the map exceeds it
}
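The tableSizeFor call above rounds the requested capacity up to the nearest power of two, which is why the bucket array size is always 16, 32, 64, and so on. A minimal standalone sketch of that rounding, based on JDK 8's bit-smearing implementation (renamed here, with MAXIMUM_CAPACITY inlined):

static int nextPowerOfTwo(int cap) {
    int n = cap - 1; // subtract 1 so an exact power of two maps to itself
    n |= n >>> 1;    // smear the highest set bit downwards...
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;   // ...until every bit below it is set
    int max = 1 << 30; // MAXIMUM_CAPACITY
    return (n < 0) ? 1 : (n >= max) ? max : n + 1;
}
// nextPowerOfTwo(10) == 16, nextPowerOfTwo(16) == 16, nextPowerOfTwo(17) == 32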

 

threshold = capacity * loadFactor; once the number of key-value pairs in the bucket array exceeds this limit, the capacity is increased. With the defaults this is 16 * 0.75 = 12, so the 13th insertion triggers the first doubling.
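A quick sketch of that arithmetic (plain Java reproducing the formula, not a call into the JDK):

public class ThresholdDemo {
    // Reproduces the threshold formula; the cast to int truncates.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // 12: the default map resizes on the 13th put
        System.out.println(threshold(32, 0.75f)); // 24: the boundary after one doubling
    }
}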

 

if ((tab = table) == null || (n = tab.length) == 0) // the table has not been allocated yet
    n = (tab = resize()).length;                    // first put: create it via resize()

 

This is the expansion performed when an element is put into the bucket array for the first time: the constructor does not allocate the table, so the first call to put finds it null and calls resize() to create it.

After every put, HashMap then checks whether the new size exceeds the threshold and expands if necessary:

if (++size > threshold)
    resize();
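To see both behaviors (lazy allocation on the first put, doubling on the 13th) you can peek at the private table field with reflection. This is a sketch, not production code: the field name is a JDK-internal detail, and JDK 9+ may require --add-opens java.base/java.util=ALL-UNNAMED to run it.

import java.lang.reflect.Field;
import java.util.HashMap;

public class ResizeDemo {
    // Reads HashMap's private "table" field (JDK-internal detail).
    static int tableLength(HashMap<?, ?> map) throws Exception {
        Field f = HashMap.class.getDeclaredField("table");
        f.setAccessible(true);
        Object[] table = (Object[]) f.get(map);
        return table == null ? 0 : table.length;
    }

    public static void main(String[] args) throws Exception {
        HashMap<Integer, Integer> map = new HashMap<>();
        System.out.println("before: " + tableLength(map)); // 0: nothing allocated yet
        for (int i = 1; i <= 13; i++) {
            map.put(i, i);
            if (i == 1 || i == 12 || i == 13)
                System.out.println(i + " entries: " + tableLength(map));
        }
        // Expected: 1 entries: 16, 12 entries: 16, 13 entries: 32
    }
}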

 

Now look at the sizing logic inside the resize() code:

 

Node<K,V>[] oldTab = table;
int oldCap = (oldTab == null) ? 0 : oldTab.length; // old table size; 0 on the first resize
int oldThr = threshold;                            // current resize boundary
int newCap, newThr = 0;                            // new table size and boundary
if (oldCap > 0) { // taken on later resizes, not when the table is first allocated
    if (oldCap >= MAXIMUM_CAPACITY) { // already as large as allowed
        threshold = Integer.MAX_VALUE;
        return oldTab;
    }
    else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
             oldCap >= DEFAULT_INITIAL_CAPACITY)
        newThr = oldThr << 1; // doubling stays in range: double both the
                              // table size and the threshold
}
else if (oldThr > 0)
    newCap = oldThr; // first resize after a capacity was requested: the
                     // constructor stashed tableSizeFor(initialCapacity) here
else {               // no-arg constructor: fall back to the defaults (16 and 0.75)
    newCap = DEFAULT_INITIAL_CAPACITY;
    newThr = (int) (DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
}
if (newThr == 0) { // compute the new boundary from the new size and the load factor
    float ft = (float) newCap * loadFactor;
    newThr = (newCap < MAXIMUM_CAPACITY && ft < (float) MAXIMUM_CAPACITY ?
              (int) ft : Integer.MAX_VALUE);
}
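To make the branching concrete, here is a standalone sketch (a hypothetical helper named nextSizing, not JDK code; the overflow clamps are omitted for brevity) that reproduces the capacity and threshold transitions above:

public class SizingDemo {
    static final int DEFAULT_INITIAL_CAPACITY = 16;
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Hypothetical helper mirroring resize()'s sizing decision:
    // returns {newCap, newThr} for the given old values.
    static int[] nextSizing(int oldCap, int oldThr, float loadFactor) {
        int newCap, newThr = 0;
        if (oldCap > 0) {
            newCap = oldCap << 1; // a later resize: double the table
            if (newCap < MAXIMUM_CAPACITY && oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // and double the threshold with it
        } else if (oldThr > 0) {
            newCap = oldThr; // first resize: capacity stashed by the constructor
        } else {
            newCap = DEFAULT_INITIAL_CAPACITY; // no-arg constructor: use the defaults
        }
        if (newThr == 0)
            newThr = (int) (newCap * loadFactor);
        return new int[] { newCap, newThr };
    }

    public static void main(String[] args) {
        // First allocation with the defaults: capacity 16, threshold 12
        System.out.println(java.util.Arrays.toString(nextSizing(0, 0, 0.75f)));
        // A later doubling: 16/12 becomes 32/24
        System.out.println(java.util.Arrays.toString(nextSizing(16, 12, 0.75f)));
    }
}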

Now back to the original question: why is the default bucket array size 16, why is the load factor 0.75, and what considerations drove the choice of these two values?

From the code above we can see that these two values mainly determine the threshold, which is the boundary that decides whether the current bucket array needs to be expanded.

We all know that expanding the bucket array means allocating new memory and copying every element from the old array into the new one, which is a time-consuming process. So why not simply make these two values larger? The threshold is the product of the two, and a larger product means expansion happens far less often.

The reason is this: if the initial bucket array is too large, memory is wasted. 16 is a compromise. Unlike a tiny size such as 1, 2, or 4, it does not trigger an expansion after only a few elements; and unlike a size in the tens of thousands, it does not leave most of the allocated space unused when only a handful of entries are stored.

The load factor is set to 0.75 rather than 1 because a larger factor makes collisions more likely: more key-value pairs land at the same bucket position, which lengthens lookups and degrades performance. Setting it too small is not appropriate either. If it were 0.1, then with 16 buckets the threshold would be (int)(16 * 0.1) = 1, and putting just two key-value pairs would already force an expansion, wasting space.
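A quick sanity check of the too-small case (assuming JDK 8 behavior; the requested capacity of 10 is rounded up to 16 by tableSizeFor):

import java.util.HashMap;

public class LoadFactorDemo {
    public static void main(String[] args) {
        // Requested capacity 10 rounds up to 16; threshold = (int)(16 * 0.1) = 1,
        // so the second put already triggers a resize.
        HashMap<String, Integer> sparse = new HashMap<>(10, 0.1f);
        sparse.put("a", 1); // size 1 == threshold 1: no resize yet
        sparse.put("b", 2); // size 2 > threshold 1: the table doubles to 32
        System.out.println(sparse); // the map works, but most slots sit empty
    }
}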
