An analysis of Java's HashMap internals (put, get, resize)

Before analyzing HashMap, first look at the following figure to understand its structure.

(figure: HashMap structure — an array of buckets, each holding a linked list)

I drew a picture to briefly describe the structure of HashMap: an array plus linked lists. When we call put with a new key-value pair, HashMap computes an index from the key's hash and the current table length, creates a Node object from the hash, key, and value, and stores it in the Node[] array at that index. When the computed index already holds a Node, the new node is appended via Node[index].next, just like a -> aa -> a1 in the figure. This situation is called a hash collision.

Basic usage of HashMap

Map<String, Object> map = new HashMap<>();
map.put("student", "333");// stored normally, i=5
map.put("goods", "222");// stored normally, i=9
map.put("product", "222");// stored normally, i=2
map.put("hello", "222");// stored normally, i=11
map.put("what", "222");// stored normally, i=3
map.put("fuck", "222");// stored normally, i=7
map.put("a", "222");// stored normally, i=1
map.put("b", "222");// hash collision, i=2, goes on product.next
map.put("c", "222");// hash collision, i=3, goes on what.next
map.put("d", "222");// stored normally, i=4
map.put("e", "222");// hash collision, i=5, goes on student.next
map.put("f", "222");// stored normally, i=6
map.put("g", "222");// hash collision, i=7, goes on fuck.next

First we create a Map object and back it with a HashMap; storing and retrieving data is done by calling the put and get methods. Let's start the analysis with the constructor.

public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

The default load factor is 0.75. Its job is to determine an expansion threshold: when the number of entries reaches the threshold, HashMap performs a resize, doubling the table size and recomputing the threshold. Expansion threshold = capacity * load factor. Why 0.75? Don't ask me, look it up yourself (honestly I don't know, and I don't think it matters much here~)
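The threshold arithmetic is simple enough to sketch with the JDK's default values (capacity 16, load factor 0.75); the `threshold` helper method here is just for illustration, not part of HashMap's API:

```java
public class ThresholdDemo {
    // Expansion threshold = capacity * load factor
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // 12: the 13th entry triggers a resize
        System.out.println(threshold(32, 0.75f)); // 24: the doubled table's new threshold
    }
}
```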

Let's continue with the put method.

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

Uh, there's nothing to see here, so let's continue on to the putVal method.
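One thing worth noting before we move on is the hash(key) call: in JDK 8 it XORs the high 16 bits of the key's hashCode into the low 16 bits, so that the low bits used for indexing also depend on the high bits. A self-contained sketch of that method:

```java
public class HashDemo {
    // JDK 8's HashMap.hash(): XOR the high 16 bits of hashCode into the low 16,
    // so the low bits used for bucket indexing also depend on the high bits
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        System.out.println(hash(null));                              // 0: null keys go to bucket 0
        System.out.println("student".hashCode() == hash("student")); // false: high bits were folded in
    }
}
```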

transient Node<K,V>[] table;

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    //First check whether the table inside the container is empty; if it is, resize() is triggered to create it
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    //Compute the index via (n - 1) & hash; the calculation is expanded on separately below
    if ((p = tab[i = (n - 1) & hash]) == null)
        //If the computed index holds no data, create a Node and store it in tab directly
        tab[i] = newNode(hash, key, value, null);
    else {
        //tab[i] is not empty, so a value was already stored here
        Node<K,V> e; K k;
        //If the key is the same, save the old Node in e first; its value is replaced further down
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            //Hash collisions are resolved here: check whether node[index].next is empty;
            //if it is, create the new Node on next, like a -> aa -> a1 in my figure.
            //Roughly: a occupies index 0, and aa's (n - 1) & hash also yields index 0,
            //so we check a's next node; while next is non-empty, keep looping down the chain
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    //If this chain now exceeds 8 nodes, it is converted to a red-black tree,
                    //since red-black tree lookups are faster than a plain singly linked list
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        //e is non-null only when an existing key's value is being replaced
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    //Increment the modification counter
    ++modCount;
    //Check whether the load threshold is exceeded; if so, perform one resize
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

Although I added comments, let me briefly restate the logic:
1. First determine whether the hash table exists; when it does not, create it via resize.
2. Compute the index from the hash. If that slot holds no data, create a Node and store it there; the method ends.
3. If there is data at the target index, compare the keys with equals. On a hit, replace the value; the method ends.
4. If the keys differ but the index is the same, it is a hash collision. HashMap's strategy for resolving it is to traverse the linked list, find the last node, and append the value there, just like in my figure (the soul-painter artwork vividly expresses HashMap's data structure).
5. The last step is to check whether the expansion threshold has been reached. Once it has, perform an expansion, doubling the capacity, because the length of the hash table must always be a power of 2.
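Step 2's index calculation, (n - 1) & hash, can be sketched as follows. This assumes JDK 8's bit-spreading hash shown earlier; `indexFor` is a hypothetical helper for illustration, not HashMap's actual API:

```java
public class IndexDemo {
    // Reproduces HashMap's index calculation: spread the hashCode the way
    // JDK 8's hash() does, then mask with (n - 1); n must be a power of two
    static int indexFor(Object key, int n) {
        int h = key.hashCode();
        int hash = h ^ (h >>> 16);
        return (n - 1) & hash;   // for n = 16 this keeps only the low 4 bits
    }

    public static void main(String[] args) {
        System.out.println(indexFor("student", 16)); // 5, matching i=5 in the usage example
    }
}
```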

Well, that wraps up put; let's continue with get.

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

The get method, um, good, very simple: hash the key, fetch the node through getNode, and return its value. Get finished, haha. Just kidding, let's continue on to getNode.

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    //If the table exists, fetch the head of the chain for this hash, i.e. the first node
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        //Check whether first's hash and key both match; if so, return it directly
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        //If the head doesn't match, traverse the whole chain (for a red-black tree
        //node, traverse tree-style); traversal compares hash and equals at each node
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

To sum up: the get method is much simpler than put. The core logic is to take the chain at the computed index, then match hash and equals node by node until the target is found.
And with that, the get method is done.
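As a quick check of the behavior just described (value replacement on a duplicate key, null on a miss), a minimal usage sketch:

```java
import java.util.HashMap;
import java.util.Map;

public class GetDemo {
    public static void main(String[] args) {
        Map<String, Object> map = new HashMap<>();
        map.put("student", "333");
        map.put("student", "444");              // same key: value replaced, not chained

        System.out.println(map.get("student")); // 444
        System.out.println(map.get("missing")); // null
        System.out.println(map.size());         // 1
    }
}
```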

Now let's take a look at resize, HashMap's expansion mechanism.

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    //Check the old table; if it is empty there is no old data to handle
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    //Save the expansion threshold
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Grow the capacity and threshold: shifting left by 1 bit is the next power of 2
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    //If the table was empty, the initial expansion threshold is computed here
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    //Handle the old data: move it into newTab, the enlarged array
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                //If this element has no chain, place it directly
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                //Red-black tree handling
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    //Recompute the index for each node on the chain: if (e.hash & oldCap) is 0,
                    //the index is unchanged; if that bit of the hash is 1, the index given by
                    //hash & (n - 1) has changed, and the rule is new index = j + oldCap,
                    //i.e. shifted 16 (oldCap) positions higher
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

The resize method's job is to initialize the table and to perform expansion on it. The expansion rule is doubling, and
after the doubling is complete, the other important operation is redistributing the nodes on each linked list.

(e.hash & oldCap) == 0

Before I explain this expression, let me lay some groundwork.

The binary of 16 is 0001 0000
The binary of 32 is 0010 0000
The binary of 64 is 0100 0000

We know that each HashMap expansion shifts the capacity left by 1 bit, turning 2^m into 2^(m+1), which means the hash table length goes 16, 32, 64, ..., n.
We also know that the index in HashMap is hash & (n - 1), where n is the table length. When n = 16 this is hash & 0000 1111, which is actually the last four bits of the hash. When the table expands and n becomes 32, it is hash & 0001 1111, the last five bits.

Why do I mention this? Because it relates to the (e.hash & oldCap) == 0 expression above.

Suppose our HashMap has expanded from 16 to 32.
You could simply recompute every index with e.hash & (newCap - 1) and rebuild the chains, but the source's author uses another approach (in fact, I suspect the performance is about the same). He checks the fifth bit of e.hash directly (16 masks the last four bits, 32 the last five). If that bit is 0, the index computed by hash & (n - 1) has not changed, and the node stays at its current position. If the fifth bit is 1, the index produced by hash & (n - 1) has moved 16 (oldCap) positions higher.

So the author defines two linked lists here:
loHead / loTail, the low list (keeps its original index)
hiHead / hiTail, the high list (moves to index + oldCap)

Then the chain is split. If a node's computed index has not changed, it stays on the low list (spliced onto loTail.next);
if the computed index has changed, the node goes on the high list (spliced onto hiTail.next).
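The split rule can be checked with two toy hash values that share the same low four bits but differ in bit 4, the oldCap bit (a sketch of the arithmetic, not HashMap's actual code):

```java
public class SplitDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        int h1 = 0b0_0101;   // bit 4 (the oldCap bit) is 0
        int h2 = 0b1_0101;   // bit 4 is 1; low four bits identical to h1

        // In the old table both hashes land in the same bucket j = 5
        System.out.println(h1 & (oldCap - 1)); // 5
        System.out.println(h2 & (oldCap - 1)); // 5

        // After doubling, the oldCap bit alone decides where each node goes
        System.out.println((h1 & oldCap) == 0); // true: stays at j = 5
        System.out.println(h1 & (newCap - 1));  // 5
        System.out.println((h2 & oldCap) == 0); // false: moves to j + oldCap
        System.out.println(h2 & (newCap - 1));  // 21 (= 5 + 16)
    }
}
```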

Finally, two more soul-painter figures:
(figure: the linked list being redistributed)
(figure: the HashMap after the resize)

All right, that's it for HashMap. You may need to digest it on your own; anyway, I have finished digesting it.


Originally published at blog.csdn.net/a159357445566/article/details/108608708