6. HashMap source code analysis

1. Data structure
[Figure: HashMap data structure, an array of buckets, each holding a single Node, a linked list, or a red-black tree]
As shown in the figure above, the underlying data structure of HashMap is an array plus linked lists plus red-black trees. When a bucket's linked list grows beyond 8 nodes (and the table has at least 64 buckets), the list is converted into a red-black tree. When a tree bin shrinks to 6 or fewer nodes (checked when the bin is split during a resize), it is converted back into a linked list. The left side of the figure shows the HashMap array. Each array element (bucket) can hold a single Node, a linked list, or a red-black tree. For example, the bucket at index 2 holds a linked list, and the bucket at index 9 holds a red-black tree.
Source code

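public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable {

    // Default initial capacity: 16
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;

    // Maximum capacity
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Default load factor
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // A bucket's linked list is converted into a red-black tree when a node is added to a list that already holds 8 nodes
    static final int TREEIFY_THRESHOLD = 8;

    // A tree bin with 6 or fewer nodes after being split during a resize is converted back into a linked list
    static final int UNTREEIFY_THRESHOLD = 6;

    // Lists are only converted into red-black trees when the table capacity is at least 64; otherwise the table is resized instead
    static final int MIN_TREEIFY_CAPACITY = 64;

    // Counts structural modifications; iterators use it to fail fast if the map changes structurally during iteration
    transient int modCount;

    // Actual number of key-value mappings; under concurrent access the value read may already be stale
    transient int size;

    // The array that stores the buckets
    transient Node<K,V>[] table;

    /**
     * Resize threshold, set in one of two ways:
     * If an initial capacity is given at construction, threshold temporarily holds tableSizeFor(initialCapacity),
     * i.e. the capacity rounded up to a power of two; e.g. an initial capacity of 19 becomes 32 (2^5).
     * When resize() allocates the table, threshold = capacity * loadFactor (0.75 by default).
     */
    int threshold;

    // Linked-list node
    static class Node<K,V> implements Map.Entry<K,V> {
        ......
    }

    // Red-black tree node
    static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
        ......
    }
}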
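The threshold comment above relies on tableSizeFor rounding a requested capacity up to a power of two. The sketch below re-implements that rounding outside HashMap so it can be run on its own; TableSize and threshold(...) are hypothetical names, and the bit-smearing body follows the logic of JDK 8's tableSizeFor.

```java
class TableSize {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of two >= cap (1 for cap <= 1), capped at MAXIMUM_CAPACITY.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;   // copy the highest set bit into the bits below it...
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;  // ...until all lower bits are 1, e.g. 18 (10010) -> 31 (11111)
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    // threshold = capacity * loadFactor, as computed once the table is allocated
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        int cap = tableSizeFor(19);
        System.out.println(cap + " " + threshold(cap, 0.75f)); // prints "32 24"
    }
}
```

Subtracting 1 first is what keeps an exact power of two unchanged: tableSizeFor(16) stays 16 instead of jumping to 32.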

Source code analysis

  1. HashMap permits a null key and null values, unlike Hashtable, and it is not thread-safe.
  2. The default load factor is 0.75, a value chosen as a trade-off between time and space costs. A higher load factor reduces space overhead (fewer resizes, so the array grows more slowly) but increases lookup cost (more hash collisions and longer linked lists). To avoid resizing, the array capacity must be at least the expected number of entries divided by the load factor.
  3. If a HashMap will hold a lot of data, set a sufficiently large initial capacity up front to avoid the performance cost of repeated resizing.
  4. HashMap is not thread-safe. Thread safety can be obtained by external locking or by wrapping the map with Collections.synchronizedMap, whose implementation wraps every method in a synchronized block.
  5. If the HashMap is structurally modified during iteration (other than through the iterator's own remove), the iterator fails fast by throwing ConcurrentModificationException, on a best-effort basis.
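Items 2, 3 and 5 can be illustrated with a short, self-contained sketch. HashMapUsage, initialCapacityFor, failsFast and removeViaIterator are hypothetical names; only HashMap, Iterator and ConcurrentModificationException are real JDK types. initialCapacityFor assumes the default load factor of 0.75.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class HashMapUsage {
    // Hypothetical helper: smallest capacity c with c * 0.75 >= expected,
    // i.e. c >= expected / loadFactor. HashMap then rounds c up to a power of two.
    static int initialCapacityFor(int expected) {
        return (int) Math.ceil(expected / 0.75);
    }

    // Returns true if a structural modification during iteration made the iterator fail fast.
    static boolean failsFast() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        try {
            for (String key : map.keySet()) {
                map.put(key + "!", 0); // adds a new key behind the iterator's back
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    // Removing through the iterator itself is allowed; returns the remaining size.
    static int removeViaIterator() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        for (Iterator<String> it = map.keySet().iterator(); it.hasNext(); ) {
            if (it.next().equals("b")) {
                it.remove();
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        // Presized so that 1000 insertions stay below the threshold
        Map<Integer, Integer> presized = new HashMap<>(initialCapacityFor(1000));
        for (int i = 0; i < 1000; i++) {
            presized.put(i, i);
        }
        System.out.println(presized.size() + " " + failsFast() + " " + removeViaIterator()); // prints "1000 true 2"
    }
}
```

For 1000 expected entries the helper returns 1334, which HashMap rounds up to 2048 buckets with a threshold of 1536, so no resize occurs while filling the map.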

2. Adding an entry
The source code for adding a key-value pair is shown below.
Source code

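public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable {

    // hash: the key's hash as computed by hash(key)
    // onlyIfAbsent: put() passes false, meaning the new value overwrites an existing value for the same key
    // evict: if false, the table is in creation mode (passed on to the afterNodeInsertion hook)
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
        // tab is the table, n its length, i the bucket index, p the node at index i
        Node<K,V>[] tab; Node<K,V> p;
        int n, i;
        // If the table is empty, initialize it with resize()
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // If the bucket at index i is empty, create the new node directly there
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        // Otherwise the bucket is occupied: a hash collision
        else {
            // e will hold the existing node for this key, if there is one
            Node<K,V> e;
            K k;
            // The first node has the same hash and an equal key: remember it in e
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // The bucket is a red-black tree: insert into the tree
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            // The bucket is a linked list: append the new node at the tail
            else {
                // Walk the list; binCount + 1 is the number of nodes up to and including p
                for (int binCount = 0; ; ++binCount) {
                    // e = p.next advances through the list;
                    // p.next == null means p is the tail node
                    if ((e = p.next) == null) {
                        // Append the new node at the tail
                        p.next = newNode(hash, key, value, null);
                        // The list held at least 8 nodes before this insert: treeify
                        // (treeifyBin resizes instead if the table is smaller than 64)
                        if (binCount >= TREEIFY_THRESHOLD - 1)
                            treeifyBin(tab, hash);
                        break;
                    }
                    // Found a node whose key equals the new key: stop walking
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    // Move p forward along the list
                    p = e;
                }
            }
            // e != null means a mapping for this key already exists
            if (e != null) {
                V oldValue = e.value;
                // Overwrite unless onlyIfAbsent is true and the old value is non-null
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                // Return the old value
                return oldValue;
            }
        }
        ++modCount;
        // If the number of mappings exceeds the threshold, resize
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
}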

Source code analysis

  1. First check whether the table has been initialized; if not, initialize it with resize().
  2. Compute the bucket index from the key's hash. If that bucket is empty, create a new node there directly and jump to 7; if the first node in the bucket has the same key, jump to 6; otherwise go to 3.
  3. On a hash collision, the bucket is either a linked list or a red-black tree.
  4. If it is a linked list, walk it; if the key is not found, append the new node at the tail and treeify the bucket if the list has grown too long.
  5. If it is a red-black tree, call the tree's insertion method putTreeVal.
  6. If steps 2, 4, or 5 found an existing node with the same key, decide from onlyIfAbsent whether to overwrite its value, then return the old value.
  7. Otherwise a new node was added: increment size and, if it exceeds the threshold, resize. The put is then complete.
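The seven steps can be traced in a stripped-down sketch. MiniHashMap is a hypothetical teaching class, not JDK code: it keeps only the array + linked-list path of putVal (no red-black trees, no onlyIfAbsent, no LinkedHashMap hooks), and its resize simply re-links every node instead of using the JDK's split. The hash() spreading function follows the logic of JDK 8's HashMap.hash.

```java
import java.util.Objects;

class MiniHashMap<K, V> {
    static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private Node<K, V>[] table;
    private int size;
    private int threshold;
    private final float loadFactor = 0.75f;

    // Mix the high 16 bits of hashCode into the low 16 bits; a null key hashes to 0
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public V put(K key, V value) {
        int hash = hash(key);
        if (table == null) resize();                       // step 1: lazy initialization
        int i = (table.length - 1) & hash;                 // bucket index (capacity is a power of two)
        Node<K, V> p = table[i];
        if (p == null) {                                   // step 2: empty bucket
            table[i] = new Node<>(hash, key, value, null);
        } else {
            while (true) {                                 // steps 3-4: walk the list
                if (p.hash == hash && Objects.equals(p.key, key)) {
                    V old = p.value;                       // step 6: existing key, overwrite
                    p.value = value;
                    return old;
                }
                if (p.next == null) {                      // tail reached: append
                    p.next = new Node<>(hash, key, value, null);
                    break;
                }
                p = p.next;
            }
        }
        if (++size > threshold) resize();                  // step 7: grow if needed
        return null;
    }

    public V get(Object key) {
        if (table == null) return null;
        int hash = hash(key);
        for (Node<K, V> e = table[(table.length - 1) & hash]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) return e.value;
        }
        return null;
    }

    public int size() { return size; }

    int capacity() { return table == null ? 0 : table.length; }

    @SuppressWarnings("unchecked")
    private void resize() {
        Node<K, V>[] old = table;
        int newCap = (old == null) ? 16 : old.length * 2;
        table = (Node<K, V>[]) new Node[newCap];
        threshold = (int) (newCap * loadFactor);
        if (old == null) return;
        for (Node<K, V> head : old) {                      // re-link every node (simplified rehash)
            for (Node<K, V> e = head; e != null; ) {
                Node<K, V> next = e.next;
                int j = (newCap - 1) & e.hash;
                e.next = table[j];                         // push onto the new bucket (order not preserved)
                table[j] = e;
                e = next;
            }
        }
    }
}
```

Because the capacity n is always a power of two, (n - 1) & hash gives the same result as Math.floorMod(hash, n) without a division. With the default capacity of 16, the 13th insertion pushes size past the threshold of 12 and doubles the table to 32.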

3. Adding to a linked list
Adding to a linked list is relatively simple: the new node is appended to the tail, just as LinkedList appends elements. When a list grows beyond 8 nodes, it is handed to treeifyBin for conversion into a red-black tree. treeifyBin checks the table size first: the list is converted into a red-black tree only if the array capacity is at least 64. If the capacity is smaller than 64, the table is resized instead and no tree is built.
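The decision described above can be captured in a small function. TreeifyRule, afterAppend and Action are hypothetical names for illustration; in the JDK this logic is split between the binCount check in putVal and the capacity check at the start of treeifyBin.

```java
class TreeifyRule {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { NONE, RESIZE, TREEIFY }

    // nodesBefore: list length before the new node was appended; capacity: table length
    static Action afterAppend(int nodesBefore, int capacity) {
        if (nodesBefore < TREEIFY_THRESHOLD) return Action.NONE;      // list is still short
        if (capacity < MIN_TREEIFY_CAPACITY) return Action.RESIZE;    // small table: grow instead
        return Action.TREEIFY;                                        // convert the bin to a red-black tree
    }

    public static void main(String[] args) {
        System.out.println(afterAppend(8, 16)); // prints RESIZE
        System.out.println(afterAppend(8, 64)); // prints TREEIFY
    }
}
```

With the default capacity of 16, an overlong bucket therefore first causes resizes (to 32, then 64) before any tree is built, since a larger table may spread the colliding keys across buckets.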
A common interview question is why the threshold is 8. Lookup in a linked list is O(n), while lookup in a red-black tree is O(log n). When a list holds only a few nodes, traversing it is fast enough, so converting to a red-black tree only pays off once the list grows long. However, a tree node is about twice the size of a list node, so given the conversion time and the extra space, a boundary value is needed. The boundary was chosen using the Poisson distribution: with well-distributed hash codes and the default load factor of 0.75, the number of nodes in a bucket follows a Poisson distribution with a parameter of about 0.5, which gives the following probability for each list length:

* 0:    0.60653066
* 1:    0.30326533
* 2:    0.07581633
* 3:    0.01263606
* 4:    0.00157952
* 5:    0.00015795
* 6:    0.00001316
* 7:    0.00000094
* 8:    0.00000006

The probability of a list reaching length 8 is 0.00000006, less than one in ten million. Under normal circumstances a list will therefore almost never reach 8 nodes. If one does, the hash function is most likely poorly distributed, and converting the list into a red-black tree keeps lookups fast even in that case. In everyday use of HashMap, you will almost never see a list converted into a red-black tree.
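The table of probabilities can be recomputed from the Poisson formula P(k) = e^(-λ) · λ^k / k! with λ = 0.5. BinSizeProbability and poisson(...) are hypothetical names; the printed values should agree with the table above up to rounding in the last digit.

```java
class BinSizeProbability {
    // P(k) = e^(-lambda) * lambda^k / k!, built up one factor at a time
    static double poisson(double lambda, int k) {
        double p = Math.exp(-lambda);
        for (int i = 1; i <= k; i++) {
            p *= lambda / i; // after the loop: e^(-lambda) * lambda^k / k!
        }
        return p;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, poisson(0.5, k));
        }
    }
}
```

Multiplying by λ/i inside the loop avoids computing k! separately, which keeps the intermediate values small.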

4. Adding to a red-black tree
The source code for adding a node to a red-black tree is shown below.
Source code

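public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable {

    static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {

        // h is the hash of key k; putVal calls this on the bucket's first TreeNode
        final TreeNode<K,V> putTreeVal(HashMap<K,V> map, Node<K,V>[] tab, int h, K k, V v) {
            Class<?> kc = null;
            boolean searched = false;
            // Find the root node
            TreeNode<K,V> root = (parent != null) ? root() : this;
            // Descend from the root until a free child slot is found
            for (TreeNode<K,V> p = root;;) {
                int dir, ph; K pk;
                // p's hash is greater than h: the new node belongs in p's left subtree
                if ((ph = p.hash) > h)
                    dir = -1;
                // p's hash is less than h: the new node belongs in p's right subtree
                else if (ph < h)
                    dir = 1;
                // Same hash and equal key (== or equals): the key already exists, return its node
                else if ((pk = p.key) == k || (k != null && k.equals(pk)))
                    return p;
                // Same hash but different keys: use compareTo if the key is Comparable
                else if ((kc == null &&
                          // kc is k's class if k implements Comparable of its own class, otherwise null
                          (kc = comparableClassFor(k)) == null) ||
                         // or compareTo cannot order k and pk (returns 0)
                         (dir = compareComparables(kc, k, pk)) == 0) {
                    // Search both subtrees once for an equal key
                    if (!searched) {
                        TreeNode<K,V> q, ch;
                        searched = true;
                        if (((ch = p.left) != null &&
                             (q = ch.find(h, k, kc)) != null) ||
                            ((ch = p.right) != null &&
                             (q = ch.find(h, k, kc)) != null))
                            return q;
                    }
                    // No equal key found: break the tie with a consistent order
                    dir = tieBreakOrder(k, pk);
                }

                TreeNode<K,V> xp = p;
                // Move to the child in direction dir; if that slot is empty, xp becomes the new node's parent
                if ((p = (dir <= 0) ? p.left : p.right) == null) {
                    Node<K,V> xpn = xp.next;
                    // Create the new node
                    TreeNode<K,V> x = map.newTreeNode(h, k, v, xpn);
                    // Put the new node into the empty child slot
                    if (dir <= 0)
                        xp.left = x;
                    else
                        xp.right = x;
                    // Link parent and child, and splice x into the next/prev list right after xp
                    xp.next = x;
                    x.parent = x.prev = xp;
                    if (xpn != null)
                        ((TreeNode<K,V>)xpn).prev = x;
                    /*
                     * balanceInsertion recolors or rotates the tree to restore the red-black properties:
                     * Coloring: the new node is always red. If its parent is black, nothing changes;
                     * if its parent is red, recoloring or rotation must restore the five red-black constraints.
                     * Rotation: needed when the parent is red and the uncle is black.
                     * If the node is an inner child (e.g. the right child of a left-child parent),
                     * first rotate left at the parent, then rotate right at the grandparent; the mirror case is symmetric.
                     */
                    // moveRootToFront makes the (possibly new) root the first node of the bucket, tab[index]
                    moveRootToFront(tab, balanceInsertion(root, x));
                    return null;
                }
            }
        }
    }
}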

Source code analysis

  1. Starting from the root, compare hashes: a smaller hash goes to the left subtree, a larger one to the right. If the hashes are equal and the keys are equal (== or equals), the key already exists in the red-black tree and the existing node is returned directly.
  2. If the hashes are equal but the keys differ, compareTo picks the direction when the key implements Comparable of its own class. If it does not, or compareTo returns 0, both subtrees are searched once for an equal key (returned if found), and tieBreakOrder then picks the direction.
  3. Repeat steps 1 and 2 down the tree until the child slot in the chosen direction is empty; the current node becomes the parent of the new node.
  4. Place the new node in that empty left or right slot, link it to its parent, and splice it into the bucket's next/prev list right after its parent.
  5. Rebalance by recoloring and rotation (balanceInsertion), move the root to the front of the bucket (moveRootToFront), and finish.
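The ordering rule in steps 1 and 2 can be isolated into a small function. This is a simplified, hypothetical sketch: TreeBinOrder and direction are illustrative names, the Comparable check is reduced to "instanceof Comparable and same class" (the JDK's comparableClassFor is stricter, requiring the class to implement Comparable of itself), and the one-time search of both subtrees is omitted. tieBreak follows the logic of JDK 8's tieBreakOrder.

```java
import java.util.Objects;

class TreeBinOrder {
    // Returns -1 (go left), 1 (go right) or 0 (an equal key is already present)
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int direction(int h, Object k, int ph, Object pk) {
        if (ph > h) return -1;                 // existing node's hash is larger: go left
        if (ph < h) return 1;                  // existing node's hash is smaller: go right
        if (Objects.equals(k, pk)) return 0;   // same hash and equal key
        // Simplified Comparable check (assumption: same class is enough here)
        if (k instanceof Comparable && pk != null && k.getClass() == pk.getClass()) {
            int d = ((Comparable) k).compareTo(pk);
            if (d != 0) return d < 0 ? -1 : 1;
        }
        // The JDK searches both subtrees for an equal key here before tie-breaking
        return tieBreak(k, pk);
    }

    // Class name first, then identity hash code; never returns 0
    static int tieBreak(Object a, Object b) {
        int d;
        if (a == null || b == null
                || (d = a.getClass().getName().compareTo(b.getClass().getName())) == 0) {
            d = (System.identityHashCode(a) <= System.identityHashCode(b)) ? -1 : 1;
        }
        return d;
    }

    public static void main(String[] args) {
        System.out.println(direction(5, "x", 9, "y")); // prints -1: the hash decides
        System.out.println(direction(7, "b", 7, "a")); // prints 1: compareTo decides
    }
}
```

The tie-break only has to be consistent for a single insertion; lookups of non-Comparable keys with equal hashes cannot rely on it, which is why the tree search falls back on equals.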

5. Lookup
Looking up a key in a HashMap takes two steps. First, locate the array index from the key's hash and check whether the first node in that bucket holds the key (same hash, and == or equals); if so, return it directly. Otherwise, if that node has a successor, check whether the bucket is a linked list or a red-black tree and search it with the corresponding method. The key source code of the linked-list search is as follows.

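// Walk the linked list looking for the key (e starts at the second node of the bucket)
do {
    // If the node's hash matches and its key is == or equals, this is the node we want
    // (when hashes collide, equals is what tells different keys apart)
    if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
        return e;
    // Otherwise move on to the next node
} while ((e = e.next) != null);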
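The role of equals in the loop above can be observed from outside with java.util.HashMap. CollidingKeys and BadKey are hypothetical names; BadKey deliberately returns a constant hashCode, so every key lands in the same bucket and only equals can tell keys apart.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class CollidingKeys {
    // Key type with a constant hash code: every instance collides with every other
    static final class BadKey {
        final String name;

        BadKey(String name) { this.name = name; }

        @Override public int hashCode() { return 42; }

        @Override public boolean equals(Object o) {
            return o instanceof BadKey && Objects.equals(((BadKey) o).name, name);
        }
    }

    static Map<BadKey, Integer> build(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey("k" + i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = build(3);
        System.out.println(map.get(new BadKey("k1"))); // a new but equal key is found: prints 1
        System.out.println(map.get(new BadKey("zz"))); // same hash, no equal key: prints null
    }
}
```

With enough colliding keys (for example build(100)), the bucket should eventually be converted into a red-black tree, but because BadKey is not Comparable the tree search still has to rely on equals; lookups stay correct, only slower.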


Origin blog.csdn.net/Jgx1214/article/details/109096631