HashMap classic interview questions + source code analysis

1. HashMap classic interview questions

Talk about the underlying principle of HashMap?

HashMap is based on the principle of hashing. In JDK 8 it uses an array + linked list + red-black tree data structure. We store and retrieve objects through put and get. When we pass a key and value to the put() method, the key's hashCode() is first run through a hash calculation to determine its position in the bucket array, where the entry (Node) object is stored. When getting an object, get() computes the bucket position the same way, finds the correct key-value pair via the key object's equals() method, and returns the value object.
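The put/get flow described above can be exercised with a minimal example (the key names are arbitrary):

```java
import java.util.HashMap;
import java.util.Map;

public class Main {
    static Integer demo() {
        Map<String, Integer> map = new HashMap<>();
        map.put("apple", 1);     // the key is hashed to locate a bucket and a node is stored there
        map.put("apple", 2);     // same key: equals() matches, so the old value is replaced
        return map.get("apple"); // the hash locates the bucket, equals() finds the node
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 2
    }
}
```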

Talk about how put is implemented in HashMap?

1. Compute the hash of the key (the key's hashCode XORed with its upper 16 bits)

2. If the hash table is empty, call resize() to initialize the hash table

3. If there is no collision, add the element directly to the hash table

4. If a collision occurs (two keys map to the same bucket), three judgments are made

​ 4.1: If the key address is the same or the content after equals is the same, replace the old value

​ 4.2: If it is a red-black tree structure, call the insertion method of the tree

​ 4.3: If it is a linked list, traverse it in a loop until a node's next is empty and insert the new node by tail insertion; after inserting, check whether the list length has reached the threshold of 8, in which case it becomes a red-black tree. If during traversal a node is found whose hash and key match the element being put, its value is overwritten instead.

5. If the number of entries exceeds the threshold (capacity * load factor), call resize to expand the capacity
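Step 1 and the index calculation in step 3 can be sketched as follows; hash mirrors the JDK's static HashMap.hash method, and index shows how the bucket position is derived for a table of length n:

```java
public class Main {
    // Mirrors java.util.HashMap.hash: XOR the hashCode with its upper 16 bits; a null key hashes to 0
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table of length n (n is always a power of 2 in HashMap)
    static int index(Object key, int n) {
        return (n - 1) & hash(key);
    }

    public static void main(String[] args) {
        System.out.println(index(null, 16)); // 0: a null key always lands in bucket 0
        System.out.println(index("key", 16));
    }
}
```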

Let's talk about when a HashMap needs to be expanded, and how the expansion resize() is implemented.

Call scenarios:

1. Initializing the table array

2. When the number of elements reaches the threshold, i.e. ++size > loadFactor * capacity; this check also happens inside the putVal function

Implementation process: (details)

1. Determine whether the array has been initialized by checking whether the old array's capacity is greater than 0

no: initialize. Whether the no-argument constructor was used decides:

​ Yes: use the default capacity (16) and the default threshold (12)

​ No: use the capacity set in the constructor, which tableSizeFor has already rounded up to a power of 2

​ Yes: double the capacity (as long as the maximum is not exceeded), then recompute each element's index with the AND operation and copy the elements into the new hash table

In a nutshell: expansion allocates a new array twice as long as the old one, then traverses the entire old structure and re-hashes every element one by one into the new structure.

PS: This shows that the underlying data structure is an array, which eventually has to be expanded when capacity runs out

Talk about how get is implemented in HashMap?

Hash the key's hashCode and compute the index to locate the bucket. If the entry at the first position of the bucket matches, return it directly. Otherwise, search the red-black tree or traverse the linked list: if there is a hash conflict, walk the list and use the equals method on each node's key to find the matching node.

Why not use the key's hashCode directly as the hash value, but instead XOR it with its upper 16 bits?

​Because the array index is determined by an AND operation with (capacity - 1), and with the default capacity of 16 only the lowest four bits take part. The designers XOR the key's hash value with its upper 16 bits so that when the & operation determines the insertion index, the low bits already carry information from the high bits. This combination of high and low bits increases randomness and reduces the number of hash collisions.
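A concrete illustration (the hash codes are chosen for this sketch): two hash codes that differ only in their upper bits collide without the XOR, but separate after it:

```java
public class Main {
    static int spread(int h) { return h ^ (h >>> 16); } // the JDK's perturbation step

    public static void main(String[] args) {
        int h1 = 0x10000, h2 = 0x20000; // differ only above bit 15
        int mask = 15;                  // capacity 16 -> index = hash & 15
        System.out.println((h1 & mask) + " " + (h2 & mask));                 // 0 0: both land in bucket 0
        System.out.println((spread(h1) & mask) + " " + (spread(h2) & mask)); // 1 2: high bits now matter
    }
}
```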

Why 16? Why does it have to be a power of 2? What if the input value is not a power of 2 like 10?

The default initialization length of HashMap is 16, and each time the capacity is automatically expanded or manually initialized, it must be a power of 2.

1. To distribute data uniformly and reduce hash collisions. The array index is computed with a bit operation; if the length were not a power of 2, hash collisions would increase and array space would be wasted. (PS: If efficiency were not a concern, a remainder operation could be used instead of the bit operation, and then the length would not have to be a power of 2.)

2. If the input capacity is not a power of 2, HashMap uses shift and OR operations to obtain the smallest power of 2 greater than or equal to that number; for example, 10 becomes 16.
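The first point can be checked directly: for a power-of-two length n, & (n - 1) behaves like a remainder and every slot is reachable, while for a non-power-of-two such as 10 (which HashMap never actually uses), & (n - 1) would leave most slots unreachable:

```java
import java.util.HashSet;
import java.util.Set;

public class Main {
    static int reachableSlots(int n) {
        Set<Integer> slots = new HashSet<>();
        for (int h = 0; h < 1000; h++)
            slots.add(h & (n - 1)); // the index computation HashMap uses
        return slots.size();
    }

    public static void main(String[] args) {
        // Power of 2: & 15 behaves like mod 16, all 16 slots are usable
        System.out.println(reachableSlots(16)); // 16
        // Not a power of 2: 9 = 0b1001, so only bits 0 and 3 survive -> 4 of 10 slots reachable
        System.out.println(reachableSlots(10)); // 4
    }
}
```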

Talk about what happens when the hashCodes of two objects are equal?

​ There will be a hash collision. If the keys are equal, the old value is replaced; otherwise the new node is linked to the end of the linked list. If the list length then exceeds the threshold of 8 (and the table length is at least 64), the list is converted to a red-black tree.

Please explain the parameter loadFactor of HashMap, what is its function?

loadFactor indicates how full the HashMap is allowed to get, which affects the probability of hashing to the same array position. The default loadFactor is 0.75: when the number of elements reaches 75% of the array length, the HashMap is considered too crowded and needs to be expanded. loadFactor can be customized via the HashMap constructor.
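The threshold arithmetic can be checked directly (capacity and factor values taken from the text):

```java
public class Main {
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor); // the expansion trigger: size > threshold
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // 12: the default map resizes when the 13th element arrives
        System.out.println(threshold(32, 0.75f)); // 24: after one doubling
    }
}
```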

What if the size of the HashMap exceeds the capacity defined by the load factor?

If the threshold is exceeded, the expansion operation will be performed. In a nutshell, the size of the expanded array is twice that of the original array, and the original elements will be re-hashed into the new hash table.

Disadvantages of traditional HashMap (why introduce red-black tree?):

​ Before JDK 1.8, HashMap was implemented as an array + linked list. Even with a good hash function, it is difficult to achieve a perfectly uniform distribution of elements. When a large number of elements end up in the same bucket, that bucket holds a long linked list and the HashMap degenerates into a singly linked list: with n elements, traversal costs O(n), and the HashMap completely loses its advantage. To address this, JDK 1.8 introduced the red-black tree (search time complexity O(log n)).

What type of element is generally used as the Key when using HashMap?

​ Choose immutable types such as Integer and String. Every operation on a String, such as concatenation or splitting, creates a new String object. These classes also override hashCode() and equals() in a standard, well-behaved way. Being immutable, they are inherently thread-safe as keys.
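A sketch of why mutability matters (MutableKey is a made-up class for illustration): mutating a key after insertion changes its hashCode, so the stored entry can no longer be found from either the mutated key or an equal-looking fresh key, and the entry is effectively lost:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical mutable key class, for illustration only
class MutableKey {
    int id;
    MutableKey(int id) { this.id = id; }
    @Override public boolean equals(Object o) {
        return o instanceof MutableKey && ((MutableKey) o).id == id;
    }
    @Override public int hashCode() { return Objects.hash(id); }
}

public class Main {
    public static void main(String[] args) {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey k = new MutableKey(1);
        map.put(k, "v");
        k.id = 2; // mutate after insertion: hashCode changes, but the node stays in its old bucket
        System.out.println(map.get(k));                 // null: the lookup probes the wrong bucket
        System.out.println(map.get(new MutableKey(1))); // null: right bucket, but equals() now fails
    }
}
```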

2. HashMap source code analysis

Creating a HashMap with the no-argument constructor (the default capacity is 16): loadFactor is the load factor, an important parameter for expansion. At this point the array is not actually created; it is initialized when put adds an element for the first time.

static final float DEFAULT_LOAD_FACTOR = 0.75f;

public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

There is a small detail when creating a map: suppose we use the int constructor to create a HashMap with a specified initial capacity

// Create a map with a specified initial capacity of 17
Map<String, Object> map = new HashMap<>(17);

This calls another constructor via this(), passing the initial capacity and the default load factor. If the initial capacity is less than 0, an IllegalArgumentException is thrown directly. Note that the resulting capacity is not 17 but the smallest power of 2 greater than or equal to it, i.e. 32, and the threshold is recalculated from 32: 32 * 0.75 = 24, so the threshold for the next expansion is 24.

// Round the given capacity up to the smallest power of 2 >= cap via bit operations
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

public HashMap(int initialCapacity, float loadFactor) {
    // Validate the specified initial capacity
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    // If the specified capacity exceeds the default maximum capacity, clamp it; capacity is not unlimited
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    // The recomputed value (it holds the capacity until the table is created)
    this.threshold = tableSizeFor(initialCapacity);
}
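The rounding behaviour of tableSizeFor can be verified with a standalone copy of the method (MAXIMUM_CAPACITY is 1 << 30 in the JDK source):

```java
public class Main {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copied from the HashMap source above: smallest power of 2 >= cap
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(17)); // 32, as described in the text
        System.out.println(tableSizeFor(16)); // 16: an exact power of 2 is kept
        System.out.println(tableSizeFor(10)); // 16
    }
}
```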

Calling the ordinary put method to add an element; it is worth noting:

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
  • The method that actually adds the element, putVal, is called; it is explained in detail below
  • The first parameter of putVal comes from the hash method, which XORs the key's hashCode with its unsigned right shift by 16 bits and returns an int hash value; if key == null it returns 0 directly
  • The fourth parameter of putVal controls what happens on a hash collision when the key is the same: whether the old value is replaced. The default is false, i.e. when the key is the same the old value is replaced and returned
    • If putIfAbsent is called, then when a hash collision occurs and the key is the same, the old value is neither replaced nor is the new value added
@Override
public V putIfAbsent(K key, V value) {
    return putVal(hash(key), key, value, true, true);
}
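The difference in the fourth parameter is observable from the public API:

```java
import java.util.HashMap;
import java.util.Map;

public class Main {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("k", 1);
        System.out.println(map.put("k", 2));         // 1: old value returned, value replaced
        System.out.println(map.putIfAbsent("k", 3)); // 2: old value returned, NOT replaced
        System.out.println(map.get("k"));            // 2
    }
}
```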

Analysis of resize: array initialization and the expansion method

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    // If the old array's capacity is greater than 0, perform an expansion
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // A capacity was specified in a constructor and stashed in threshold; use it as the new capacity
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    // Initialization: the new array gets the default length 16, and the threshold is
    // default load factor (0.75) * default initial capacity (16) = 12
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        // After expansion, the new threshold is new capacity * load factor
        // e.g. if the new array length is 32, the new threshold is 32 * 0.75 = 24
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    // Create the new array; newCap is the new length computed above
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // Traverse the old array and transfer its elements to the new array
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // The current node has no successor, i.e. the bucket holds a single element
                if (e.next == null)
                    // Compute the index in the new array with hash & (newCap - 1) and store it directly
                    newTab[e.hash & (newCap - 1)] = e;
                // The bucket holds a red-black tree
                else if (e instanceof TreeNode)
                    // Split the tree's nodes and transfer them
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                // Only one case left: the bucket is a linked list; split it below
                else { // preserve order
                    // Split the current list into a low-position list and a high-position list
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // Append the current node to the low-position list
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // Append it to the high-position list
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // Transfer the two split lists into the new array
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
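The lo/hi split above can be checked arithmetically: (hash & oldCap) == 0 keeps a node at index j, otherwise it moves to j + oldCap, which matches recomputing the index against the doubled capacity. A sketch with oldCap = 16:

```java
public class Main {
    // New index after doubling from oldCap to 2 * oldCap, using the resize() split rule
    static int newIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1);                  // old bucket index
        return (hash & oldCap) == 0 ? j : j + oldCap; // lo list stays, hi list moves
    }

    public static void main(String[] args) {
        int oldCap = 16;
        // 5 and 21 share bucket 5 when the capacity is 16 (21 & 15 == 5)
        System.out.println(newIndex(5, oldCap));  // 5: the oldCap bit is clear -> low list, stays put
        System.out.println(newIndex(21, oldCap)); // 21: the oldCap bit is set -> moves to 5 + 16
        // Cross-check against a direct recomputation with the doubled capacity
        System.out.println((5 & 31) + " " + (21 & 31)); // 5 21
    }
}
```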

Let's take a look at how putVal adds elements. Since the putVal method is somewhat complex, it is analyzed in two parts. It really does two things: initializing the array and adding or replacing elements.

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    //------------------ Initialize the array ------------------
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // Check whether the table array holds elements. On the first put, table == null, so resize() is
    // called here mainly to create a Node array. A Node has four important fields: hash, key, value
    // and next. hash is the value returned by the hash operation; key and value are self-explanatory;
    // next is the pointer to the next node in the list
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Compute the index from the hash, then check whether table[index] already holds a value. If not,
    // create a Node and set its key, value and hash; since the new node has no successor, next is null
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    //------------------ The element-adding logic follows; it is omitted here and detailed below ------------------
}

The main adding logic of putVal: appending to a linked list or inserting into a red-black tree

// If there is already an element at this index
else {
    Node<K,V> e; K k;
    // Check whether the key being put equals the key of the first node (first by ==, then by equals);
    // if so, the old value will be replaced below and the previous value returned. If the keys differ,
    // the later logic runs and the element is eventually appended to the tail of the list
    if (p.hash == hash &&
        ((k = p.key) == key || (key != null && key.equals(k))))
        e = p;
    // If the current node is a TreeNode instance, add the element the red-black-tree way
    else if (p instanceof TreeNode)
        e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
    // Only one case is left: the bucket is a linked list. Traverse it to check for the same key;
    // if found, replace the value, otherwise insert at the tail
    else {
        // binCount is the key to deciding whether to treeify
        for (int binCount = 0; ; ++binCount) {
            // At this point binCount = 0 and the list holds 1 element
            if ((e = p.next) == null) {
                p.next = newNode(hash, key, value, null); // when this runs with binCount = 0, the list now holds two elements
                // Key step: when the loop count >= threshold - 1, i.e. >= 7, the list is converted to
                // a red-black tree. We usually say treeification happens at the threshold of 8, so why
                // compare against 7? binCount starts at 0 and reflects the list length *before* the new
                // element is appended: binCount = 0 means the original list had 1 node, so binCount = 7
                // means the original list had 8 nodes and the new element has just been appended as the
                // 9th. The check therefore uses the original list length (8) as its condition, and by
                // the time the conversion runs the list actually holds 9 elements
                if (binCount >= TREEIFY_THRESHOLD - 1)
                    // The treeify method itself is covered later; just note the conversion happens here
                    treeifyBin(tab, hash);
                break;
            }
            // While traversing, if the same key is found, exit the loop and replace the old value
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                break;
            p = e;
        }
    }
    if (e != null) { // existing mapping for key
        // Actually replace the old value and return the previous one
        V oldValue = e.value;
        if (!onlyIfAbsent || oldValue == null)
            e.value = value;
        afterNodeAccess(e);
        // Return the old value
        return oldValue;
    }
}
// Record the modification count
++modCount;
// Check whether the element count has reached the expansion condition
if (++size > threshold)
    // If so, call resize to expand
    resize();
afterNodeInsertion(evict);
// If no identical key exists in the array, nothing was replaced, so return null
return null;

The method that converts a linked list to a red-black tree

// Minimum table capacity required for treeification
static final int MIN_TREEIFY_CAPACITY = 64;

final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    // Check whether the array is null or the table length is less than 64
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        // When the table is null, resize() initializes the array.
        // When the table length is below the treeify capacity of 64 but a list has already reached
        // length 8 (long lists degrade lookups towards O(n)), resize() expands instead: each list is
        // split into a high-position list and a low-position list, the new indexes are computed from
        // the hash, and the elements are migrated from the old array to the new one
        resize();
    // The actual treeification logic
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        // Turn the singly linked list into a doubly linked list, converting each Node into a TreeNode
        TreeNode<K,V> hd = null, tl = null;
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        // Convert to a red-black tree
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}

Origin: blog.csdn.net/weixin_49137820/article/details/128489807