Analysis of the HashMap data-insertion process

Overview

The underlying implementation of HashMap in JDK 1.7 is an array plus linked lists. That implementation has a well-known problem: resizing the internal array under multi-threaded access can produce an infinite loop. HashMap was therefore re-implemented in JDK 1.8 with an underlying structure of array plus linked lists plus red-black trees, which improved its performance.


HashMap source code walkthrough


Internal fields
static final int DEFAULT_INITIAL_CAPACITY

The default initial capacity, 16. The capacity must always be a power of 2. Briefly, this requirement improves how well keys spread across the array when entries are put into the map; the reason is analyzed below.
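A minimal sketch of why a power-of-2 capacity helps (the values below are illustrative): when n is a power of 2, (n - 1) & hash equals hash mod n but needs only a bitwise AND, and it stays non-negative even for negative hashes.

public class PowerOfTwoIndexDemo {
    public static void main(String[] args) {
        int n = 16;                        // power-of-2 capacity: n - 1 = 0b1111
        for (int hash : new int[] {15, 16, 17, 3333, -7}) {
            // (n - 1) & hash keeps only the low 4 bits, i.e. an index in [0, 15]
            System.out.printf("hash=%5d -> index %d%n", hash, (n - 1) & hash);
        }
    }
}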

static final int MAXIMUM_CAPACITY

The maximum capacity of the array; the value is 1 << 30.

static final float DEFAULT_LOAD_FACTOR

The default load factor, 0.75. The array decides whether to resize based on load factor * current capacity.
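As a worked example with the defaults above: 16 * 0.75 = 12, so the table is resized once the number of mappings exceeds 12. A quick check:

public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;           // DEFAULT_INITIAL_CAPACITY
        float loadFactor = 0.75f;    // DEFAULT_LOAD_FACTOR
        int threshold = (int) (capacity * loadFactor);
        System.out.println("resize once size exceeds " + threshold);  // 12
    }
}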

static final int TREEIFY_THRESHOLD

The treeify threshold, default 8. When the length of a bucket's linked list exceeds this value, converting the list into a red-black tree is considered, to improve lookup efficiency. This threshold is one of the two conditions that decide whether to treeify.

static final int UNTREEIFY_THRESHOLD

The untreeify threshold, default 6. When a red-black tree holds fewer nodes than this value, it is converted back into a linked list.

static final int MIN_TREEIFY_CAPACITY

The minimum table capacity for treeification, default 64. Only when the array's capacity has reached this value will a bucket's linked list be converted into a red-black tree; together with TREEIFY_THRESHOLD it determines whether treeification actually happens.
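Condensing how the three thresholds interact into one decision, roughly (my own simplification, not JDK code):

public class TreeifyDecisionDemo {
    // illustrative only: what happens when a bucket's linked list grows
    static String bucketShape(int listLength, int tableCapacity) {
        if (listLength >= 8) {                 // TREEIFY_THRESHOLD reached
            return (tableCapacity >= 64)       // MIN_TREEIFY_CAPACITY reached?
                    ? "convert to red-black tree"
                    : "resize the table instead of treeifying";
        }
        return "stay a linked list";
    }

    public static void main(String[] args) {
        System.out.println(bucketShape(9, 32));   // resize the table instead of treeifying
        System.out.println(bucketShape(9, 64));   // convert to red-black tree
        System.out.println(bucketShape(4, 128));  // stay a linked list
    }
}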

transient Node<K,V>[] table

The array HashMap uses to store its data.

Execution flow of storing data

Source code analysis of storing data

Storing data into the map requires calling the put method, so the walkthrough below starts from there:

public V put(K key, V value) {
    // delegate to putVal to add the entry to the array
    return putVal(hash(key), key, value, false, true);
}
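For reference, hash(key) in JDK 1.8 is not the raw hashCode(): it XORs the high 16 bits into the low 16 bits, so that the (n - 1) & hash indexing used below still takes the high bits into account:

static final int hash(Object key) {
    int h;
    // spread the high bits downward; a null key maps to bucket 0
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}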

Internally, the put method delegates to putVal; its execution flow is as follows:

/**
 * hash:  hash of the key being stored
 * key:   the key being stored
 * value: the value being stored
 * onlyIfAbsent: if true, do not overwrite the existing value of an equal key
 * evict: if false, the table is in creation mode
 * returns the previous value mapped to this key, or null if there was none
 **/
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // the table field is lazily initialized, so the first call to put
    // creates it via resize()
    if ((tab = table) == null || (n = tab.length) == 0)
        // initialize the table and record the current array length
        n = (tab = resize()).length;
    /**
     * p = tab[i = (n - 1) & hash] locates the bucket where this key goes.
     * The index is obtained via (n - 1) & hash, which is why the array
     * capacity must be a power of 2: when n is a power of 2, every low bit
     * of n - 1 is 1, so the AND depends only on the hash value, which
     * spreads the results well. To guarantee a power-of-2 capacity, any
     * user-supplied capacity is also processed internally; see tableSizeFor.
     *
     * If the node at this position is null, nothing has been stored there
     * yet, so a new node can be created and placed into the array directly.
     **/
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        // a node already exists at this position:
        Node<K,V> e; K k;
        // if the existing node's key equals the current key, point e at the
        // existing node; its value is replaced and the old value returned at the end
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // if the existing node is a tree node, insert into the tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // otherwise this position holds a linked list: traverse it
            for (int binCount = 0; ; ++binCount) {
                // walk to the last element and store the new node in the
                // tail's next field
                // (unlike JDK 1.7, which inserted new list elements at the
                // head, JDK 1.8 inserts at the tail)
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // after appending, if the list length reaches
                    // TREEIFY_THRESHOLD, treeifyBin decides internally
                    // whether to convert the bucket to a red-black tree
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // e != null means an element with an equal key exists: return the old value
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // based on the total number of mappings, decide whether the array
    // needs to grow: size > threshold triggers a resize
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
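A small usage sketch of the behavior described above: put returns the previous value mapped to an equal key (or null), and putIfAbsent corresponds to onlyIfAbsent = true:

import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        System.out.println(map.put("a", 1));          // null: no previous mapping
        System.out.println(map.put("a", 2));          // 1: old value returned, then replaced
        System.out.println(map.putIfAbsent("a", 3));  // 2: onlyIfAbsent, existing value kept
        System.out.println(map.get("a"));             // 2
    }
}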


The following is the array expansion (resize) process inside HashMap:

// internal array resize method
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    // array length before resizing
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    // resize threshold before resizing
    int oldThr = threshold;
    int newCap, newThr = 0;
    // oldCap > 0 means the array has already been initialized
    if (oldCap > 0) {
        // if the capacity before resizing already reached the maximum,
        // there is no room left to grow: return the old table
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // if the doubled capacity stays within the maximum and the old
        // capacity is at least the default initial capacity, set the new
        // threshold to twice the old one
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // capacity is 0 but a threshold exists: use it as the target capacity
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    // otherwise initialize the array with the defaults
    else {               // zero initial threshold signifies using default
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    // create the new array newTab with the new capacity
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // if the old table was null, this was the lazy first initialization
    // and the freshly created newTab is returned as-is;
    // otherwise this is a real expansion
    if (oldTab != null) {
        // visit every element of the old table
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            // e points at the head node of the current old-table slot
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // e.next == null means the slot's list holds a single
                // element, so its position in the new table is simply
                // recomputed as e.hash & (newCap - 1)
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // if e is a tree node, split the tree
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                // otherwise the slot holds a list of more than one node
                else {
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    // walk the list in a do-while loop, splitting it into
                    // two lists depending on whether (e.hash & oldCap) is 0
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // the list headed by loHead stays at the same index j
                    // as before the resize
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // the list headed by hiHead moves to index
                    // j + oldCap (old index plus old capacity)
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
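A worked example of the split rule (illustrative numbers): with oldCap = 16, every node in bucket j either stays at index j or moves to j + 16, depending on whether (hash & oldCap) is 0:

public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        for (int hash : new int[] {5, 21, 37, 53}) {  // all in bucket 5 at capacity 16
            int oldIdx = hash & (oldCap - 1);         // index before the resize
            int newIdx = hash & (newCap - 1);         // index after the resize
            // (hash & oldCap) == 0 -> stays at oldIdx; otherwise moves to oldIdx + oldCap
            System.out.printf("hash=%2d old=%d new=%2d (hash & oldCap)=%2d%n",
                              hash, oldIdx, newIdx, hash & oldCap);
        }
    }
}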


Another interesting point: to guarantee that the array capacity is a power of 2, HashMap runs any externally supplied capacity through the tableSizeFor method, so that no matter what value the caller passes in, the resulting capacity is always a power of 2.

The idea of the method rests on the fact that any power of 2 has exactly one bit set in its binary representation, with all other bits 0: the shifts below copy the highest set bit of cap - 1 into every lower position, and adding 1 then yields the smallest power of 2 greater than or equal to cap.

static final int tableSizeFor(int cap) {
    int n = cap - 1;   // subtract 1 so an exact power of 2 is not doubled
    n |= n >>> 1;      // the shifts below smear the highest set bit into
    n |= n >>> 2;      // every lower position, producing a value of the
    n |= n >>> 4;      // form 0...011...1
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
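A few sample inputs and outputs, using a copy of the method (it is package-private in java.util, so it cannot be called directly):

public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // a copy of the JDK 1.8 method so it can be called outside java.util
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1; n |= n >>> 2; n |= n >>> 4; n |= n >>> 8; n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        for (int cap : new int[] {1, 2, 3, 15, 16, 17, 1000}) {
            System.out.println(cap + " -> " + tableSizeFor(cap)); // 1, 2, 4, 16, 16, 32, 1024
        }
    }
}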


Summary
Through the study of the HashMap source code, we can conclude:

HashMap resizes automatically once its threshold is exceeded, and resizing degrades performance. In normal development, therefore, avoid resizing as much as possible by specifying the expected capacity when creating the HashMap (see the sketch below).
HashMap takes no synchronization measures when adding data, so it has thread-safety issues in a multi-threaded environment; ConcurrentHashMap can be used there instead.
The default load factor of 0.75 is an official empirical value obtained through experiments, and it is best left unchanged.
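A pre-sizing sketch for the first point (the expectedSize / loadFactor formula is a common idiom, not something HashMap computes for you):

import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    public static void main(String[] args) {
        int expectedSize = 1000;
        // request enough capacity that expectedSize entries stay below the
        // 0.75 threshold; tableSizeFor rounds the argument up to a power of 2
        Map<String, String> map = new HashMap<>((int) (expectedSize / 0.75f) + 1);
        for (int i = 0; i < expectedSize; i++) {
            map.put("key" + i, "value" + i);  // no resize occurs along the way
        }
        System.out.println(map.size());       // 1000
    }
}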

Take JDK 7 as an example:
  HashMap map = new HashMap();
After instantiation, the underlying layer creates an Entry[] table of length 16.
  ...possibly after several put operations...
map.put(key1, value1)
This first calls the hashCode() method of key1's class to compute key1's hash value, and then derives from it the storage position in the Entry array.
If no data is stored at that position, (key1, value1) is added successfully.
If data already exists at that position (one or more entries, with multiple entries stored as a linked list), key1's hash value is compared with those of the stored entries:
  If key1's hash value differs from the hash values of all stored entries, (key1, value1) is added successfully.
  If key1's hash value equals the hash value of some stored entry (key2, value2), the equals(key2) method of key1's class is called:
    If equals returns false, (key1, value1) is added successfully.
    If equals returns true, value2 is replaced with value1.

Continued insertion eventually involves resizing: when the size exceeds the threshold and the slot about to be stored into is not empty, the capacity is doubled by default and the original data is copied over into the new array.
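Since this walkthrough relies on both hashCode() and equals() of the key's class, a key type must implement them consistently. A hypothetical key class as a sketch:

import java.util.Objects;

// hypothetical key type: equal instances must produce equal hash codes,
// otherwise the bucket lookup described above cannot find existing entries
public final class Point {
    private final int x, y;

    public Point(int x, int y) { this.x = x; this.y = y; }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override public int hashCode() { return Objects.hash(x, y); }
}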

In JDK 8:
  new HashMap(): the underlying layer does not yet create an array of length 16.
  The underlying array in JDK 8 is Node[], not Entry[].
  When put() is called for the first time, the underlying layer creates an array of length 16.
  Underlying structure in JDK 7: array + linked list.
  Underlying structure in JDK 8: array + linked list + red-black tree.
  When the number of elements stored as a linked list at some array index exceeds 8 and the current array length is at least 64, all data at that index switches from array + linked list to array + red-black tree storage.

When the number of elements stored as a red-black tree at some array index drops below 6, all data at that index switches back from array + red-black tree to array + linked list storage.
 
