Interview question learning: HashMap

Purpose

Notes prepared for interviews, kept in a form that is convenient for later review.

Resources

A video on high-frequency interview questions from Bilibili (Station B)

Core knowledge points

Underlying data structure

  1. JDK 1.7: array + linked list
  2. JDK 1.8: array + (linked list / red-black tree)
    // linked-list node
    static class Node<K,V> implements Map.Entry<K,V> {
        ...
    }
    // red-black tree node
    static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
        ...
    }

Treeification and untreeification

Why use a red-black tree

In JDK 1.8, the red-black tree was introduced to keep lookups fast when a bucket's linked list grows too long.

Why not treeify from the start

  1. The red-black tree exists to defend against hash-collision (DoS-style) attacks and to stop performance from collapsing when a linked list grows too long; treeification should be the exception, not the rule. (It can be seen as an optimization over 1.7, which managed without it for many years.)
    1.1 There is no need to treeify immediately: when a bucket holds only a few elements, a linked list is good enough. Red-black tree search/update costs O(log n), and a TreeNode has more member fields and takes more space than a plain Node. TreeNode code is also far more complex (rotations and so on), so a linked list is preferred unless the tree is genuinely needed.

Why the treeify threshold is 8

If hash values are sufficiently random, bucket lengths follow a Poisson distribution. With the default load factor of 0.75, the probability of a bucket's list reaching length 8 is about 0.00000006. The threshold 8 was chosen so that treeification is rare enough.
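
The figure quoted above can be checked against the Poisson formula cited in the HashMap source comment, P(k) = e^-0.5 * 0.5^k / k!. A quick sketch (not from the original post; the class and method names are mine):

```java
// Sketch: reproduces the bucket-length probabilities from the HashMap
// source comment, using a Poisson distribution with parameter 0.5
// (the average bucket occupancy under the default load factor).
public class PoissonCheck {
    // P(a bucket holds exactly k entries) = e^-0.5 * 0.5^k / k!
    static double poisson(int k) {
        double p = Math.exp(-0.5);
        for (int i = 1; i <= k; i++) {
            p *= 0.5 / i;
        }
        return p;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 8; k++) {
            System.out.printf("length %d: %.8f%n", k, poisson(k));
        }
        // poisson(8) is about 0.00000006, which is why 8 was chosen
        // as the treeify threshold.
    }
}
```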

When to treeify

  1. The linked list length exceeds the treeify threshold (8)
				// put element: the linked-list branch of the source logic
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        // check whether the treeify condition is met
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
  2. The array capacity is >= 64 (MIN_TREEIFY_CAPACITY)
    /**
     * Replaces all linked nodes in bin at index for given hash unless
     * table is too small, in which case resizes instead.
     */
    final void treeifyBin(Node<K,V>[] tab, int hash) {
        int n, index; Node<K,V> e;
        // if the capacity is too small, resize first instead of treeifying
        if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
            resize();
        else if ((e = tab[index = (n - 1) & hash]) != null) {
            TreeNode<K,V> hd = null, tl = null;
            do {
                TreeNode<K,V> p = replacementTreeNode(e, null);
                if (tl == null)
                    hd = p;
                else {
                    p.prev = tl;
                    tl.next = p;
                }
                tl = p;
            } while ((e = e.next) != null);
            if ((tab[index] = hd) != null)
                hd.treeify(tab);
        }
    }

When does it degenerate into a linked list

  1. When resizing, if a split tree is left with <= 6 elements, it degenerates back into a linked list
    // 6 or fewer tree elements after the split: untreeify
    if (lc <= UNTREEIFY_THRESHOLD)
        tab[index] = loHead.untreeify(map);
  2. When removing a tree node, if any of root, root.left, root.right, or root.left.left is null, the tree also degenerates into a linked list
    // check the node layout to see whether the untreeify condition holds
    if (root == null || root.right == null ||
        (rl = root.left) == null || rl.left == null) {
        tab[index] = first.untreeify(map);  // too small
        return;
    }

Relevant member variables

    // load factor
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    // one of the treeify conditions
    static final int TREEIFY_THRESHOLD = 8;
    // one of the untreeify conditions
    static final int UNTREEIFY_THRESHOLD = 6;
    // minimum table capacity before treeifying
    static final int MIN_TREEIFY_CAPACITY = 64;

How the index is calculated

  1. Compute the key's hashCode()
  2. Then call HashMap's hash() method for a secondary hash
    static final int hash(Object key) {
        int h;
        // (original hashCode) XOR (its high 16 bits)
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
  3. Finally bit-and with (capacity - 1) to get the index
    index: i = (n - 1) & hash

hashCode() already exists - why provide the hash() method?

The secondary hash() mixes the high 16 bits into the low bits so that hashes distribute more uniformly across buckets.

Why the array capacity is a power of two

  1. When computing the index: if the capacity is a power of two, the modulus can be replaced by a bit-and, (n - 1) & hash, which is faster
  2. When resizing, element migration is cheaper: an element with hash & oldCap == 0 stays at its old index, otherwise it moves to old index + oldCap
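
Both points can be verified with a few bit operations. The sketch below (the helper name indexFor is mine) mirrors HashMap's index computation:

```java
// Sketch: power-of-two capacity lets bit-and replace the modulus,
// and a single hash bit decides where an element goes after a resize.
public class PowerOfTwo {
    // index computation used by HashMap when n is a power of two
    static int indexFor(int hash, int n) {
        return hash & (n - 1);
    }

    public static void main(String[] args) {
        int n = 16, hash = 12345;

        // bit-and is equivalent to the modulus for power-of-two n
        System.out.println(indexFor(hash, n) == hash % n); // true

        // on resize to 2n, the element stays or moves by exactly oldCap,
        // decided by the bit (hash & n)
        int oldIndex = indexFor(hash, n);
        int newIndex = indexFor(hash, 2 * n);
        boolean stays = (hash & n) == 0;
        System.out.println(stays ? newIndex == oldIndex
                                 : newIndex == oldIndex + n); // true
    }
}
```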

Put method process

  1. HashMap creates the array lazily: only on first use
  2. Compute the index (bucket subscript)
  3. If the bucket is still empty, create a Node there and return
  4. If the bucket is already occupied
    4.1 If it is already a TreeNode, follow the red-black tree add-or-update logic
    4.2 If it is a plain Node, follow the linked-list add-or-update logic
    4.3 If the linked list length exceeds the treeify threshold, treeify
  5. Before returning, check whether the size exceeds the threshold; if so, resize
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // 1. HashMap creates the array lazily, on first use
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // 2. compute the index (bucket subscript): (n - 1) & hash
        if ((p = tab[i = (n - 1) & hash]) == null)
            // 3. bucket not occupied yet: create a Node there and return
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                // 4.1 already a TreeNode: red-black tree add-or-update logic
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                // 4.2 plain Node: linked-list add-or-update logic
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        // 4.3 list exceeds the treeify threshold: treeify
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        // 5. before returning, check whether size exceeds the threshold;
        //    resize once it does
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

How is 1.7 different from 1.8

  1. When inserting a node into a linked list, 1.7 uses head insertion, 1.8 uses tail insertion
  2. 1.7 resizes when size >= threshold and the target bucket is already occupied; 1.8 resizes as soon as size exceeds the threshold
  3. When resizing, 1.8 optimizes the index recalculation: hash & oldCap decides whether an element stays or moves, instead of recomputing the full index

Why the default load factor is 0.75f

  1. It is a good trade-off between space usage and query time
  2. A larger value saves space, but linked lists grow longer and query performance suffers
  3. A smaller value reduces collisions, but the map resizes more often and occupies more space

What problems arise under multithreading?

  1. Infinite loop ("dead link") during resize (1.7)
  2. Data loss or corruption (1.7 and 1.8)
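
A sketch (class name and counts are mine) that makes the data-loss problem visible. HashMap's behavior under concurrent puts is undefined, so the exact number of lost entries varies from run to run; ConcurrentHashMap keeps all of them:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: two threads each put 10_000 distinct keys into the same map.
// Plain HashMap putVal calls can overwrite each other's bucket heads,
// so entries are often lost; ConcurrentHashMap keeps all 20_000.
public class ConcurrentPutDemo {
    static void hammer(Map<Integer, Integer> map) {
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) map.put(i, i);
        });
        Thread t2 = new Thread(() -> {
            for (int i = 10_000; i < 20_000; i++) map.put(i, i);
        });
        t1.start(); t2.start();
        try {
            t1.join(); t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        Map<Integer, Integer> unsafe = new HashMap<>();
        hammer(unsafe);
        System.out.println("HashMap size: " + unsafe.size()); // often < 20000

        Map<Integer, Integer> safe = new ConcurrentHashMap<>();
        hammer(safe);
        System.out.println("ConcurrentHashMap size: " + safe.size()); // 20000
    }
}
```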

Can the key be null? What is required of key objects?

  1. A HashMap key may be null; many other Map implementations (e.g. Hashtable, ConcurrentHashMap) do not allow it
  2. A key object must implement hashCode and equals, and its content must not be modified after insertion (it should be immutable)
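
Why immutability matters can be shown with a hypothetical MutableKey class (entirely my own example, not from the post): once the key's hashCode changes, the entry sits in a bucket the map will never look in again.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: mutating a key after insertion strands its entry in the
// bucket computed from the OLD hash, so lookups start failing.
public class MutableKeyDemo {
    static class MutableKey {
        int id;
        MutableKey(int id) { this.id = id; }
        @Override public int hashCode() { return 31 + id; }
        @Override public boolean equals(Object o) {
            return o instanceof MutableKey && ((MutableKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey(1);
        map.put(key, "value");
        System.out.println(map.get(key)); // value

        key.id = 2; // mutate the key in place: hashCode changes

        // the map now probes the wrong bucket
        System.out.println(map.get(key)); // null
        // even the original id finds nothing: the stored key no longer equals it
        System.out.println(map.get(new MutableKey(1))); // null
    }
}
```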

How String's hashCode is designed: why multiply by 31 each time (good to understand)

    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }
  • The goal is a reasonably uniform hash distribution, so that different strings rarely produce the same hashCode (true uniqueness is impossible); 31 is an odd prime, which mixes well and is cheap to multiply by
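
Two small checks (a sketch, class name mine): 31 * h can be strength-reduced to a shift and a subtraction, and the loop above is just a base-31 polynomial over the characters.

```java
// Sketch: properties of the multiply-by-31 scheme in String.hashCode().
public class StringHash {
    public static void main(String[] args) {
        // 31 * h can be computed as (h << 5) - h: one shift, one subtract
        int h = 12345;
        System.out.println(31 * h == (h << 5) - h); // true

        // "abc".hashCode() == 'a'*31^2 + 'b'*31 + 'c'
        int manual = 'a' * 31 * 31 + 'b' * 31 + 'c';
        System.out.println(manual == "abc".hashCode()); // true
    }
}
```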

Origin blog.csdn.net/xiaozhengN/article/details/127194681