(Java) Notes --- Analysis of the underlying principles of HashMap and HashMap interview questions

Contents

1. Implemented interface

2. Default initial value

1. Default initial capacity

2. Default maximum capacity

3. Default Load Factor

3. Interconversion between linked list and red-black tree

4. The structure of the linked list in the hash bucket

5. Hash function

6. Expansion

7. Commonly used methods in HashMap

1. Constructor

2. Find, get the value according to the key

3. Check if the key exists 

4. Insert

5. Delete

8. HashMap FAQ


1. Implemented interface

At the bottom, HashMap implements the Map interface as well as the Cloneable and Serializable marker interfaces.

2. Default initial value

1. Default initial capacity

2^4 = 16; when no initial capacity is given, the capacity defaults to 16

2. Default maximum capacity

The default maximum capacity is 2^30 

3. Default Load Factor

The default load factor is 0.75; load factor = number of stored elements / table capacity
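A small illustrative snippet (the `LoadFactorDemo` class and `thresholdFor` helper are hypothetical, not JDK code) showing how the resize threshold falls out of this formula; with the default capacity 16 and load factor 0.75, the table resizes once a 13th element is inserted:

```java
public class LoadFactorDemo {
    // Hypothetical helper mirroring threshold = (int)(capacity * loadFactor)
    static int thresholdFor(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(thresholdFor(16, 0.75f)); // 12: resize on the 13th insert
        System.out.println(thresholdFor(32, 0.75f)); // 24
    }
}
```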

3. Interconversion between linked list and red-black tree

The hash buckets store linked-list nodes, but under certain conditions a bucket's linked list and a red-black tree are converted into each other.

If the number of nodes in a bucket's linked list exceeds 8 (TREEIFY_THRESHOLD) and the table capacity is at least 64 (MIN_TREEIFY_CAPACITY), the linked list is converted into a red-black tree; if the capacity is still below 64, the table is resized instead of treeified.

When the number of nodes in a red-black tree falls to 6 or fewer (UNTREEIFY_THRESHOLD), the tree degenerates back into a linked list.
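These rules correspond to the JDK 8 constants TREEIFY_THRESHOLD = 8, UNTREEIFY_THRESHOLD = 6, and MIN_TREEIFY_CAPACITY = 64. A minimal sketch of the decision (the `TreeifyRule` class and its `action` method are illustrative paraphrases, not JDK source):

```java
public class TreeifyRule {
    static final int TREEIFY_THRESHOLD = 8;     // list -> tree candidate once a bin exceeds this
    static final int MIN_TREEIFY_CAPACITY = 64; // below this table size, resize instead of treeifying

    // Illustrative paraphrase of the treeify decision in putVal/treeifyBin
    static String action(int binLength, int tableCapacity) {
        if (binLength <= TREEIFY_THRESHOLD) return "keep linked list";
        return (tableCapacity < MIN_TREEIFY_CAPACITY) ? "resize" : "treeify";
    }

    public static void main(String[] args) {
        System.out.println(action(9, 16)); // resize: the table is still small
        System.out.println(action(9, 64)); // treeify
        System.out.println(action(3, 16)); // keep linked list
    }
}
```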

4. The structure of the linked list in the hash bucket

// HashMap wraps the nodes of its underlying linked lists in a static nested class
// A node carries the key-value pair together with the hash computed from the key
static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;   // the node's hash value
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }
        
        // overrides Object.hashCode()
        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }
        
        // overrides Object.equals()
        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }

5. Hash function

The key is converted to an integer hash value, and that value is then reduced modulo the table length to compute the bucket index.

Parse:

1. If the key is null, it maps to bucket 0.

2. If the key is not null, its hash code is used. If the key is a custom type, the hashCode() method inherited from Object must be overridden.

3. (h = key.hashCode()) ^ (h >>> 16): the high 16 bits are left unchanged while being XORed into the low 16 bits. Because a small table only uses the low bits to select a bucket, this lets all bits of the hash code participate in the computation, reducing collisions.

4. After obtaining the hash value, the bucket index is computed as index = (table.length - 1) & hash.

5. The bucket index is conceptually a remainder; because the table size is always a power of 2, the modulo can be replaced by a bitwise AND to improve efficiency. This is one of the reasons the capacity always grows by a factor of 2.


Summary: many bits of a hash code would otherwise go unused, so HashMap's hash function uses a shift to fold the high 16 bits into the low 16 bits before mapping. In addition, the & operation is more efficient than taking a modulus.
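The two steps can be sketched as follows (the `HashIndexDemo` wrapper class is illustrative; the `hash` body mirrors the JDK 8 perturbation, and the check shows the bit mask equals the non-negative remainder whenever n is a power of two):

```java
public class HashIndexDemo {
    // Mirrors JDK 8's HashMap.hash(): fold the high 16 bits into the low 16 bits
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;                   // table length, always a power of two
        int h = hash("example-key");  // hypothetical key for demonstration
        int index = (n - 1) & h;      // bucket index via bit mask
        // For a power-of-two n, the mask is equivalent to the non-negative remainder
        System.out.println(index == Math.floorMod(h, n)); // true
        System.out.println("null key -> bucket " + ((n - 1) & hash(null))); // bucket 0
    }
}
```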

6. Expansion

Each expansion rounds cap up to the smallest power of 2 that is >= cap. The statement int n = cap - 1; guards against cap already being a power of 2: without the subtraction, a cap that is already a power of 2 would be doubled by the subsequent unsigned right-shift-and-OR steps, whereas with it such a cap is returned unchanged.

For example, with an initial cap of 10 the computation yields 16.
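The computation is JDK 8's `tableSizeFor` (the `TableSizeDemo` wrapper class is just for illustration); the OR-shift cascade smears the highest set bit of n into every lower bit, so n + 1 becomes the next power of two:

```java
public class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // JDK 8's tableSizeFor: smallest power of two >= cap.
    // Subtracting 1 first means an exact power of two is returned unchanged.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16, not 32, thanks to the cap - 1
        System.out.println(tableSizeFor(19)); // 32
    }
}
```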

7. Commonly used methods in HashMap

1. Constructor

 // Constructor 1: takes an initial capacity; the load factor defaults to 0.75
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }
    // Constructor 2: no-arg
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }
    // Constructor 3: takes both an initial capacity and a load factor
    public HashMap(int initialCapacity, float loadFactor) {
        // If the capacity is negative, throw IllegalArgumentException
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                    initialCapacity);
        // If the initial capacity exceeds the maximum, clamp it to 2^30
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;

        // Validate the load factor: if it is not positive, or is NaN, throw IllegalArgumentException
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                    loadFactor);
        // Store the load factor, and round the capacity up to a power of 2
        // Note: the constructor does not allocate the table array itself
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }
/*
Note: unlike the Java 7 constructors, Java 8 does not initialize the table array directly
in the constructor; construction of the table is deferred until resize()
*/

2. Find, get the value according to the key

/*
     1. Compute the key's hash, then use it to locate the node matching the key in the table
     2. If no node is found, return null -- a HashMap lookup can therefore return null
     3. If a node is found, return its value
    */
    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }
    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        // 1. First check that the hash table is not null
        // 2. Check that the table length is greater than zero (a non-null table has a nonzero length)
        // 3. (n - 1) & hash --> compute the bucket index
        // 4. Check whether the bucket itself is empty
        // If all the checks pass, the bucket holds at least one node; grab the first node
        if ((tab = table) != null && (n = tab.length) > 0 &&
                (first = tab[(n - 1) & hash]) != null) {

            // If the node's hash equals the key's hash, then also check whether the keys are equal;
            // if so, return this node
            // This further shows that HashMap keys must override hashCode and equals
            if (first.hash == hash && // always check first node
                    ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            // If there are nodes after the first one, check whether first is a TreeNode:
            // when a bucket's chain exceeds 8 nodes, HashMap replaces the list with a red-black
            // tree for performance, in which case the key is looked up in the red-black tree
            if ((e = first.next) != null) {
                if (first instanceof TreeNode) // the node's type tells us whether this is a list or a red-black tree
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                // The bucket holds a linked list:
                // walk down the list node by node and return the match once found
                do {
                    if (e.hash == hash &&
                            ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }

3. Check if the key exists 

    /*
1. First use getNode() to fetch the node matching the key
2. If the node is non-null the key exists, so return true; otherwise return false
3. Time complexity: O(1) on average; if the key's bucket holds a linked list the search is
   sequential, and if it holds a red-black tree the search follows the tree's properties
*/
    public boolean containsKey(Object key) {
        return getNode(hash(key), key) != null;
    }

4. Insert

    /*
1. First compute the key's hash via the hash function
2. Insert the key-value pair into the table at the computed hash position
3. The code below does nothing about thread safety, so HashMap is not thread-safe
4. Replacing an over-long list with a red-black tree is new in Java 8; it is a performance
   optimization, since under heavy collisions a red-black tree outperforms a linked list
*/
    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;

        // 1. If the table is empty, resize (which also performs the lazy initialization)
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;

        // 2. (n - 1) & hash --> compute the bucket index; if the bucket is empty, insert directly
        // p records the first node in the bucket
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;

            // 3. If the key matches the bucket's first node, no new node is inserted
            if (p.hash == hash &&
                    ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode) // 4. If the bucket holds a red-black tree, insert into the tree
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                // 5. The key differs and the node is not a TreeNode, so the bucket holds a linked list
                // a. Search the current list for the key
                // b. If found, do not insert
                // c. If not found, build a new node and append it to the tail of the list
                // d. Check binCount: binCount counts the nodes the list held before the insertion
                // e. After the new node is appended, if the list length exceeds TREEIFY_THRESHOLD,
                //    convert the list to a red-black tree
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        // p is the last node, so no node with this key was found in the list;
                        // append at the tail
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash); // convert the list to a red-black tree
                        break;
                    }

                    // If the key already exists, break out of the loop
                    if (e.hash == hash &&
                            ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }

            // If the key already exists, replace its node's value with the given value and return the old value
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
    /*
Note: afterNodeAccess and afterNodeInsertion are primarily implemented by LinkedHashMap;
HashMap declares these methods but leaves their bodies empty
*/
    // Callbacks to allow LinkedHashMap post-actions
    // Hooks run after a node is accessed, inserted, or removed;
    // LinkedHashMap overrides these three methods to preserve its ordering on insertion and removal
    void afterNodeAccess(Node<K,V> p) { }
    void afterNodeInsertion(boolean evict) { }
    void afterNodeRemoval(Node<K,V> p) { }
/*
LinkedHashMap extends HashMap and overrides the methods above so that the keys stored in a
LinkedHashMap remain ordered; note that "ordered" here does not mean natural ordering but
the order in which elements were inserted.
LinkedHashMap links the nodes in its hash buckets into a doubly linked list.
*/
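A short demonstration of that ordering guarantee (the `OrderDemo` class and its helper method are illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OrderDemo {
    // Returns the keys in iteration order; for LinkedHashMap this is insertion order
    static String insertionOrderKeys() {
        Map<String, Integer> linked = new LinkedHashMap<>();
        linked.put("banana", 1);
        linked.put("apple", 2);
        linked.put("cherry", 3);
        return String.join(",", linked.keySet());
    }

    public static void main(String[] args) {
        // A plain HashMap gives no such guarantee; its order depends on the bucket layout
        System.out.println(insertionOrderKeys()); // banana,apple,cherry
    }
}
```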

5. Delete

public V remove(Object key) {
        Node<K,V> e;
        return (e = removeNode(hash(key), key, null, false, true)) == null ?
                null : e.value;
    }
    final Node<K,V> removeNode(int hash, Object key, Object value,
                               boolean matchValue, boolean movable) {
        Node<K,V>[] tab; Node<K,V> p; int n, index;

        // 1. Check that the hash table exists
        // 2. index = (n - 1) & hash: compute the bucket index
        // 3. p records the first node in the bucket; if the bucket is empty, return null directly
        if ((tab = table) != null && (n = tab.length) > 0 &&
                (p = tab[index = (n - 1) & hash]) != null) {
            Node<K,V> node = null, e; K k; V v;

            // If the first node matches the key, record it in node
            if (p.hash == hash &&
                    ((k = p.key) == key || (key != null && key.equals(k))))
                node = p;
            else if ((e = p.next) != null) {
                // If the bucket holds a red-black tree, search the tree and record the result in node
                if (p instanceof TreeNode)
                    node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
                else {
                    // The bucket holds a linked list; walk it looking for a node with this key
                    do {
                        if (e.hash == hash &&
                                ((k = e.key) == key ||
                                        (key != null && key.equals(k)))) {
                            node = e;
                            break;
                        }
                        p = e;
                    } while ((e = e.next) != null);
                }
            }
            // node is non-null: the key was found in the HashMap
            if (node != null && (!matchValue || (v = node.value) == value ||
                    (value != null && value.equals(v)))) {
                // If the node sits in a red-black tree, remove it from the tree
                if (node instanceof TreeNode)
                    ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
                    // If it is the first node of the list, store the next node's address in the bucket
                else if (node == p)
                    tab[index] = node.next;
                else
                    p.next = node.next; // not the first node
                ++modCount;
                --size;

                // used by LinkedHashMap
                afterNodeRemoval(node);

                // deletion succeeded; return the removed node
                return node;
            }
        }

        // deletion failed; return null
        return null;
    }

8. HashMap FAQ

1. If new HashMap(19), how big is the bucket array?

In Java 8, new HashMap(19) does not allocate the bucket array; space is allocated only on the first insertion. The capacity chosen is the smallest power of 2 that is >= 19: since 2^4 = 16 < 19 and 2^5 = 32 >= 19, the bucket array has size 32.

2. When does HashMap open up the bucket array to occupy memory?

This was already answered above: the memory for the bucket array is allocated only when the first element is inserted.

3. When will HashMap expand?

When the number of elements in the table reaches load factor * table capacity, the table must be resized; the new capacity is again a power of 2 (double the old one).

4. What happens when two objects have the same hashcode?

In get(): if the hash codes are the same, the keys are compared with equals(); when a key matches, its value is returned, otherwise the search continues along the chain, and null is returned if no key matches.

When inserting: if the hash codes are the same, check whether the key already exists. If it does, replace the value mapped to that key; if not, insert a new node.

When deleting: if the hash codes are the same, the node may be the one to delete; compare keys with equals(), delete the node if they match, otherwise keep searching.

5. If two keys have the same hashcode, how do you get the value object?

Traverse the linked list (or tree) in the bucket that the hash code maps to, comparing keys with equals() until a match is found or the end of the chain is reached.

6. Do you understand the problems with resizing a HashMap?

Changing a HashMap's capacity forces every node in the old table to be rehashed into the new one, which is expensive; the benefit of expansion is that rehashing spreads the nodes out and shortens the chains.

7. Why override the hashcode() and equals() methods?

Overriding hashCode: internally, the bucket index is computed from the key's hash code (hash code, then the perturbed hash, then remainder/bit-mask gives the bucket position). For a custom type, the default Object.hashCode() is identity-based, so two logically equal keys would produce different hash values and land in different buckets; hashCode() must therefore be overridden so the hash is derived from the key's contents.

Overriding equals: when a hash collision occurs, keys must be compared for equality, which uses the equals() method. For a custom key type, equals() must be overridden so that the comparison checks the contents of the keys rather than their identities.
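A minimal illustrative key type (the `Point` class is hypothetical) showing both overrides; without them, `map.get(new Point(1, 2))` would miss, because the default identity-based hashCode/equals would treat the two instances as unrelated:

```java
import java.util.HashMap;
import java.util.Objects;

public class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }

    // Derive the hash from the key's contents so equal points share a bucket
    @Override public int hashCode() { return Objects.hash(x, y); }

    // Compare contents, not identity, so lookups find logically equal keys
    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    public static void main(String[] args) {
        HashMap<Point, String> map = new HashMap<>();
        map.put(new Point(1, 2), "found");
        // Succeeds only because hashCode and equals are overridden
        System.out.println(map.get(new Point(1, 2))); // found
    }
}
```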


Origin blog.csdn.net/qq_58710208/article/details/122154321