Java SE collections (the underlying implementation of HashMap)

Simple API and some underlying concepts

The basic APIs were covered in my previous note. They are relatively easy to use, and the easy parts are usually not the point, so I won't document them at length here.

The structure of HashMap

This matches the common understanding: a HashMap is a dynamic container for storing key-value pairs. The key cannot be stored twice; putting the same key again behaves like an update. The value is the data mapped to the key, and both key and value are of type Object. As for what lies underneath, the figure shows the structure in jdk8.
[Figure: structure of HashMap in jdk8]
Before that, jdk7 used only an array + linked list as the underlying data structure, namely

jdk7:------array + linked list
jdk8:------array + linked list / red-black tree (a binary tree)

This structure diagram can be confusing for those of us who are new to SE. Leaving the array aside for a moment, why are Entry objects stored in a linked list or a binary tree?
The "dynamic" part of HashMap comes from the array. When the number of stored entries exceeds a threshold (this check lives in the put method), the array is expanded. We all know that HashMap stores an object according to the hashCode of its key, and that different objects may have the same hashCode, so how are they stored? When an object comes in, the map first perturbs the key's hashCode with an unsigned right shift and an XOR, then combines the result with the array length (length - 1) using a bitwise AND to obtain an array subscript. This spreads objects over the subscripts as evenly as possible, but some objects will always end up with the same subscript. In that case the Entry objects reference one another and form a linked list. A list that grows too long can cause efficiency problems, so when a list reaches a certain length it is converted into a tree.
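As a small illustration of the two cases described above (equal keys versus different keys that happen to share a hashCode), here is a minimal sketch using the public API. The class name CollisionDemo and the keys "Aa" and "BB" are my own choices for illustration; they are convenient because both strings have the same hashCode.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // Builds a small map whose two keys share a hashCode but are not equal.
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);  // same hashCode as "Aa", different key: both entries are kept
        map.put("Aa", 10); // same key again: the value is updated, no new entry is added
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Same hashCode: " + ("Aa".hashCode() == "BB".hashCode()));
        System.out.println(build());
    }
}
```

The map ends up with two entries, which is only possible because colliding keys are chained in the same bucket rather than overwriting each other.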

Source code tracking

Basic parameters

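Several constants and variables:
(1) DEFAULT_INITIAL_CAPACITY: the default initial capacity, 16
(2) MAXIMUM_CAPACITY: the maximum capacity, 1 << 30
(3) DEFAULT_LOAD_FACTOR: the default load factor, 0.75
(4) TREEIFY_THRESHOLD: the default treeify threshold, 8. Once a linked list reaches this length, treeification is considered
(5) UNTREEIFY_THRESHOLD: the default untreeify threshold, 6. Once the number of nodes in a tree drops to this value, converting it back to a linked list is considered
(6) MIN_TREEIFY_CAPACITY: the minimum table capacity for treeification, 64
When a single linked list reaches 8 nodes and the table length has reached 64, the list is treeified.
When a single linked list reaches 8 nodes but the table length has not reached 64, the table is expanded first.
(7) Node<K,V>[] table: the array
(8) size: the number of valid mappings, which is also the number of Entry objects
(9) threshold: when size reaches the threshold, expansion is considered
(10) loadFactor: the load factor, which affects how often expansion happens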
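To make the relationship between capacity, loadFactor and threshold concrete, here is a rough sketch of the arithmetic only. ThresholdDemo and thresholdFor are hypothetical names, and the constant simply mirrors the default value listed above; this is not the JDK code.

```java
class ThresholdDemo {
    static final float LOAD_FACTOR = 0.75f; // mirrors DEFAULT_LOAD_FACTOR

    // threshold = capacity * loadFactor, truncated to an int
    static int thresholdFor(int capacity) {
        return (int) (capacity * LOAD_FACTOR);
    }

    public static void main(String[] args) {
        // Capacities double on each expansion, so the threshold doubles as well.
        for (int capacity = 16; capacity <= 64; capacity <<= 1) {
            System.out.println("capacity " + capacity + " -> threshold " + thresholdFor(capacity));
        }
    }
}
```

With the default values, a table of length 16 is therefore expanded once it holds more than 12 mappings.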

Initialization

new HashMap()

public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

At this point threshold is 0, table is null, and size is 0.

Adding entries with put

  • put(key,value)
    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

Following the source here, put calls the putVal() and hash() methods, so we keep following them.

  • hash(Object key)

Purpose: perturb the hashCode value so that entries end up spread across the table as evenly as possible.
If the key is null, the hash value returned is 0.
If it is not null, the key's hashCode is XORed with its own high 16 bits (an int has 32 bits, so an unsigned right shift by 16 bits extracts the high 16 bits).
In other words, the high 16 bits and low 16 bits of the hashCode are XORed to reduce the probability of collisions.

    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
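The effect of the XOR is easiest to see with hash codes that differ only in their high bits. The sketch below is my own illustration: HashSpreadDemo and spread are hypothetical names, and the two sample values are chosen by hand. It applies the same expression as hash() and then masks with a table length of 16.

```java
class HashSpreadDemo {
    // Same expression as in hash(), applied directly to a raw hashCode.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int mask = 16 - 1;                  // table length 16, so only the low 4 bits are used
        int a = 0x00010000, b = 0x00020000; // differ only in the high 16 bits

        // Without spreading, both values fall into the same bucket.
        System.out.println((a & mask) + " " + (b & mask));
        // After spreading, the high bits influence the bucket index.
        System.out.println((spread(a) & mask) + " " + (spread(b) & mask));
    }
}
```

Because small tables only look at the low bits of the hash, folding the high bits down lets them take part in the index calculation.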
  • final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict)
    The first parameter: the hash value of the key
    The second parameter: the key object
    The third parameter: the value object
    The fourth parameter: if true, keep the existing value; otherwise replace it
    The fifth parameter: if false, the table is in creation mode
    Several local variables are declared at the start; they are explained here together with the following code comments
Node<K,V>[] tab;  // local array reference, later assigned the HashMap's table
Node<K,V> p;      // a node reference used as a temporary variable
int n, i;         // n is the length of tab, i is an index into tab
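The fourth parameter, onlyIfAbsent, can be observed from the public API. put passes false, as shown in its source above; putIfAbsent passes true in the JDK 8 source as far as I have read it. The following sketch (class and method names are mine) compares the two, including the case where the existing value is null.

```java
import java.util.HashMap;
import java.util.Map;

class OnlyIfAbsentDemo {
    static Map<String, Integer> run() {
        Map<String, Integer> map = new HashMap<>();

        map.put("a", 1);
        map.putIfAbsent("a", 2); // onlyIfAbsent = true: the existing value 1 is kept

        map.put("b", 1);
        map.put("b", 3);         // onlyIfAbsent = false: the value is replaced by 3

        map.put("c", null);
        map.putIfAbsent("c", 5); // an existing null value is still replaced
        return map;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The last case corresponds to the `oldValue == null` part of the condition that appears later in putVal.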

Continue to look down and see the following

if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;

Here the HashMap's table is assigned to the local variable tab, and n is assigned the length of that array. If the table is null or empty, the resize method is called, so we follow it next.

  • resize()
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;

This is only an excerpt. Setting aside the branches whose conditions are not met, the remaining code initializes the HashMap's parameters, namely the array length and the threshold, and resize returns an array of length 16 that becomes the table.
Control then returns to putVal, where the next check is:

if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);

This means that if the bucket at index i is empty, a new Node holding the key-value mapping is created and placed there.
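The index expression `(n - 1) & hash` works as a fast modulo because the table length n is always a power of two. A minimal sketch of that equivalence follows; IndexMaskDemo and index are hypothetical names, and the sample hash values are arbitrary.

```java
class IndexMaskDemo {
    // Bucket index for a table of length n, where n is assumed to be a power of two.
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        for (int hash : new int[] {5, 21, -7, 1 << 20}) {
            // For power-of-two n the mask gives the same result as a non-negative modulo.
            System.out.println(hash + " -> " + index(hash, n)
                    + " (floorMod: " + Math.floorMod(hash, n) + ")");
        }
    }
}
```

Unlike the `%` operator, the mask never yields a negative index, even for negative hash values.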
Continue to look at the source code, as follows

 if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;

This checks whether the key being added is the same as the key of the first node in the bucket. If so, e is set to that first node.

 else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);

Next it checks whether the first node of the bucket is a tree node; if so, the tree case is handled separately.
Execution continues below when the first node is neither a tree node nor a duplicate of the key.

// The first node of table[i] is not a tree node, and its key differs from the new mapping's key
for (int binCount = 0; ; ++binCount) {
	// If p's next node is null, the current p is the last node
	if ((e = p.next) == null) {
		// Link the new node to the end of table[i]
		p.next = newNode(hash, key, value, null);
		if (binCount >= TREEIFY_THRESHOLD - 1) { // -1 for 1st
			treeifyBin(tab, hash); // treeify
		}
		break;
	}
	// If the key is a duplicate, break out of the for loop
	if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
		break;
	// Otherwise keep walking down the list
	p = e;
}

The loop walks the linked list in the bucket and appends the new node at its tail, and along the way it also decides whether the list needs to be treeified. It then leaves the loop for the next check.

	// If e is not null, the key is a duplicate, so consider replacing the original value
	if (e != null) { // existing mapping for key
		V oldValue = e.value;
		if (!onlyIfAbsent || oldValue == null) {
			e.value = value;
		}
		afterNodeAccess(e);
		return oldValue;
	}

After leaving the loop, a non-null e means the loop ended through the break for a duplicate key, so the value is updated and the old value is returned.
Otherwise execution continues immediately with:

        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;

The number of mappings (size) is increased by one and compared with the threshold. If size exceeds the threshold, the array is resized to twice its capacity and the threshold is updated as well. Because the capacity doubles, the threshold doubles too.
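The size/threshold rule can be simulated without any real buckets. The sketch below is a simplified model of my own (ResizeSimulation and capacityAfter are hypothetical names): it assumes the default capacity of 16 and load factor of 0.75, assumes every put adds a new key, and ignores the MAXIMUM_CAPACITY limit.

```java
class ResizeSimulation {
    // Returns the table capacity after `puts` distinct keys have been added,
    // following only the size/threshold rule.
    static int capacityAfter(int puts) {
        int capacity = 0, threshold = 0, size = 0;
        for (int i = 0; i < puts; i++) {
            if (capacity == 0) {      // first put: lazy initialization to the defaults
                capacity = 16;
                threshold = 12;
            }
            if (++size > threshold) { // same comparison as at the end of putVal
                capacity <<= 1;       // capacity doubles
                threshold <<= 1;      // threshold doubles with it
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        for (int puts : new int[] {12, 13, 24, 25}) {
            System.out.println(puts + " puts -> capacity " + capacityAfter(puts));
        }
    }
}
```

Note that the comparison is a strict "greater than", so in this model the expansion happens on the put that makes size 13, not on the 12th.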

  • To sum up:

(1) On the first addition, the table is initialized as an array of length 16, with threshold = 12
(2) If it is not the first addition
① Check whether the key is a duplicate; if it is, replace the value
② If table[i] is not a tree, the nodes of table[i] are counted during the traversal. If binCount has reached 7 when the new node is appended (binCount does not count the first node, so the list already held 8 nodes), treeification is considered

  • When that happens and the table length has reached 64, the list is treeified.
  • When that happens and the table length has not reached 64, the table is expanded first.

③ If table[i] is a tree, it is handled separately, and the new mapping is attached directly to a leaf of the tree.
④ If size exceeds the threshold after the addition, the table is expanded.
Once the table is expanded, the positions of all mappings are recalculated.
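The steps summarized above can be sketched as a much simplified map. This is not the JDK implementation: MiniChainedMap is a hypothetical class of my own, it has no treeification, it hard-codes the default capacity and load factor, and its resize re-links nodes at the head of each new bucket for brevity.

```java
class MiniChainedMap<K, V> {
    // One entry in a bucket's singly linked list.
    static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value) {
            this.hash = hash;
            this.key = key;
            this.value = value;
        }
    }

    private Node<K, V>[] table; // created lazily on the first put
    private int size;
    private int threshold;

    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    int size() { return size; }

    int capacity() { return (table == null) ? 0 : table.length; }

    V get(Object key) {
        if (table == null) return null;
        int h = hash(key);
        for (Node<K, V> e = table[(table.length - 1) & h]; e != null; e = e.next) {
            if (e.hash == h && (e.key == key || (key != null && key.equals(e.key)))) {
                return e.value;
            }
        }
        return null;
    }

    V put(K key, V value) {
        if (table == null) resize();       // first addition: create the table
        int h = hash(key);
        int i = (table.length - 1) & h;
        Node<K, V> p = table[i];
        if (p == null) {
            table[i] = new Node<>(h, key, value);
        } else {
            while (true) {
                if (p.hash == h && (p.key == key || (key != null && key.equals(p.key)))) {
                    V old = p.value;       // duplicate key: replace the value
                    p.value = value;
                    return old;
                }
                if (p.next == null) {      // reached the tail: append the new node
                    p.next = new Node<>(h, key, value);
                    break;
                }
                p = p.next;
            }
        }
        if (++size > threshold) resize();  // size exceeds the threshold: expand
        return null;
    }

    @SuppressWarnings("unchecked")
    private void resize() {
        Node<K, V>[] old = table;
        int newCap = (old == null) ? 16 : old.length << 1;
        threshold = (int) (newCap * 0.75f);
        table = (Node<K, V>[]) new Node[newCap];
        if (old == null) return;
        for (Node<K, V> head : old) {      // recompute the bucket of every mapping
            Node<K, V> e = head;
            while (e != null) {
                Node<K, V> next = e.next;
                int i = (newCap - 1) & e.hash;
                e.next = table[i];
                table[i] = e;
                e = next;
            }
        }
    }

    public static void main(String[] args) {
        MiniChainedMap<Integer, String> map = new MiniChainedMap<>();
        for (int i = 0; i < 15; i++) map.put(i, "v" + i);
        System.out.println(map.size() + " entries, capacity " + map.capacity());
    }
}
```

Leaving out the tree case keeps the sketch short; the array, the chaining, the duplicate-key replacement and the threshold-driven expansion are the parts it is meant to show.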

Debug under unit test

The following is a step-by-step code debugging and tracking of the put method

Unit test example

In this unit test I add 15 mappings to the map. The keys are of type Integer (0-14), and the values can be anything.
To see why the hashCode of an Integer is its own value, open the source code of Integer:
press Ctrl+Shift+T, enter Integer, open the Integer class under java.lang, and find the two overloaded hashCode methods in the Outline view.
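The original test code was only shown as a screenshot, so the following is a rough reconstruction under my own assumptions: the class name IntegerKeysDemo and the String values are made up. It fills a map with the keys 0-14 and calls both hashCode overloads of Integer.

```java
import java.util.HashMap;
import java.util.Map;

class IntegerKeysDemo {
    // Adds 15 mappings with Integer keys 0-14; the values are arbitrary.
    static Map<Integer, String> fill() {
        Map<Integer, String> map = new HashMap<>();
        for (int i = 0; i < 15; i++) {
            map.put(i, "value" + i);
        }
        return map;
    }

    public static void main(String[] args) {
        // Both overloads return the int value itself.
        System.out.println(Integer.valueOf(7).hashCode()); // instance method
        System.out.println(Integer.hashCode(7));           // static overload
        System.out.println(fill().size());
    }
}
```

Since small non-negative keys hash to themselves and the high 16 bits are zero, the keys 0-14 land in buckets 0-14 of a length-16 table, which makes the debugger view easy to follow.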

Start debugging

Double-click to add a breakpoint at the put method, then select the unit test method, right-click, and choose Debug As to start debugging.
On the first initialization there is nothing in the map.
Then step to the next statement to enter the loop; here it is clicked twice, so the loop body executes twice.
After 12 executions the size equals the threshold of 12. Since the check is ++size > threshold, the next put pushes size past the threshold and the capacity of the map should double.
After that operation, the debugger shows that the capacity of the map has indeed doubled.

Origin blog.csdn.net/BlackBtuWhite/article/details/107715159