Introduction to HashMap

HashMap is a hash-table-based implementation of the Map interface that stores a collection of key-value pairs.

1. HashMap constructors

1. HashMap(): Sets the load factor to the default value (0.75).
2. HashMap(int initialCapacity): The load factor is still the default value (0.75); the parameter initialCapacity is the requested initial size of the table.

The table is not allocated when the object is instantiated; it is allocated when the first key-value pair is added.

The table length may not equal initialCapacity. The actual length is the smallest power of 2 greater than or equal to initialCapacity, computed by the tableSizeFor method described below.

3. HashMap(int initialCapacity, float loadFactor): The parameter loadFactor is the custom load factor, and initialCapacity is the requested initial size of the table; the actual size is determined in the same way as for HashMap(int initialCapacity).

4. HashMap(Map<? extends K, ? extends V> m): The load factor is still the default value (0.75). Let t = (size of m / load factor) + 1, truncated to an integer. If the current threshold is smaller than t, the threshold is reset to the smallest power of 2 greater than or equal to t.

2. The tableSizeFor method

This method computes the smallest power of 2 greater than or equal to the given number; the largest value it returns is 2^30. The source code is as follows:

    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }
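
To see what the method returns, here is a small stand-alone sketch. The method body is copied from the listing above; the class name TableSizeForDemo and the sample inputs are my own and are only for illustration.

```java
class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same body as tableSizeFor in the Java 8 listing above.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        // Illustrative inputs: zero, exact powers of 2, values in between, and an oversized request.
        int[] samples = {0, 1, 10, 16, 17, (1 << 30) + 1};
        for (int cap : samples) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```

For these inputs the results are 1, 1, 16, 16, 32 and 1073741824 (2^30): a power of 2 maps to itself, anything else is rounded up, and oversized requests are capped.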

Why the implementation works:

Suppose that the input parameter minus 1 (assumed to be at least 1) is written in binary as 0001XX...XXX, where the rightmost bit is position 0 and the leftmost 1 is at position m. Then the smallest power of 2 greater than or equal to the input parameter is 2^(m+1).

m is at most 30: the largest int is 2^31-1, and subtracting 1 gives 11...110 in binary (thirty 1s followed by a 0), whose leftmost 1 is at position 30. So in 0001XX...XXX the leftmost 1 and the bits to its right span at most 31 bits.

The first line of the method subtracts 1 from the input parameter to handle the case where the input is already a power of 2. In that case the method should return the input itself, but without the subtraction it would return twice the input.

The process of converting 0001XX...XXX (leftmost 1 at position m) into 0010...000 (a single 1 at position m+1):

n |= n >>> 1 is equivalent to n = n | (n >>> 1): 0001XX...XXX | (0001XX...XXX >>> 1) = 00011XX...XXX, which guarantees that the 1 bit immediately to the right of position m is 1;

n |= n >>> 2 is equivalent to n = n | (n >>> 2): 00011XX...XXX | (00011XX...XXX >>> 2) = 0001111XX...XXX, which guarantees that the (1+2) bits immediately to the right of position m are 1;

n |= n >>> 4 is equivalent to n = n | (n >>> 4): 0001111XX...XXX | (0001111XX...XXX >>> 4) = 00011111111XX...XXX, which guarantees that the (1+2+4) bits immediately to the right of position m are 1;

Similarly, after n |= n >>> 16 has executed, the (1+2+4+8+16) = 31 bits immediately to the right of position m are guaranteed to be 1 (as far as that many bits exist). Since m is at most 30, every bit to the right of position m is now 1: 000111...111 (m+1 ones in total). Adding 1 gives 001000...000, where the only 1 is at position m+1, which is 2^(m+1) in decimal. The return type is int, which cannot hold 2^31 as a positive value, so the result is capped at 2^30:

static final int MAXIMUM_CAPACITY = 1 << 30;
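
The bit-filling steps described above can be traced with a minimal sketch. The class name SmearTrace, the helper smear and the starting value 256 are assumptions made for this illustration; only the five shift-and-OR statements come from tableSizeFor.

```java
class SmearTrace {
    // Sets every bit below the highest 1 bit of n, using the same five steps as tableSizeFor.
    static int smear(int n) {
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return n;
    }

    public static void main(String[] args) {
        int n = 0b1_0000_0000; // 256: the highest 1 bit is at position m = 8
        System.out.println("start     " + Integer.toBinaryString(n));
        for (int shift : new int[] {1, 2, 4, 8, 16}) {
            n |= n >>> shift;
            System.out.println("after " + shift + "   " + Integer.toBinaryString(n));
        }

        // Effect of the initial "cap - 1" when cap is already a power of 2.
        int cap = 16;
        System.out.println(smear(cap - 1) + 1); // with the subtraction: 16
        System.out.println(smear(cap) + 1);     // without it: 32
    }
}
```

Each step doubles the run of 1s below position m until the run reaches position 0, which is why five steps are enough for a 32-bit int.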

3. Insert key-value pair (put method)

The put method delegates to putVal; its source code, with comments, is as follows:

    /**
     * Implements Map.put and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // The table is not allocated when the HashMap is instantiated; it is allocated on the first insertion
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // (n - 1) & hash is the bucket index (hash modulo the table length, since n is a power of 2); if that slot is empty, create a new Node
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            // If the first node in the bucket already has this key (compared with == or equals), the new value will overwrite the old one (done in the if (e != null) block below)
            if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // If the node is a tree node, insert using the red-black tree structure
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                // Hash collision: keep searching along the list that starts at this bucket
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        // No match found: append the new node to the end of the linked list
                        p.next = newNode(hash, key, value, null);
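                        // If the bucket already held 8 nodes, call treeifyBin(), which converts the list to a red-black tree only when the table length is at least 64 (otherwise it resizes the table)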
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    // Found the same key in this bucket's linked list, so stop the loop
                    if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        // If the number of nodes now exceeds the threshold, the table must be resized
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
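
The index computation (n - 1) & hash used in putVal can be tried on its own. The following is a minimal sketch: spread mirrors the hash(key) step that Java 8's HashMap applies before calling putVal, while the class name BucketIndexDemo, the helper indexFor and the sample keys are names I made up for this illustration.

```java
class BucketIndexDemo {
    // Same spreading step as HashMap.hash(Object) in Java 8: fold the high
    // 16 bits into the low 16 bits so that they also influence the bucket index.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table of length n; n must be a power of 2.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        for (String key : new String[] {"a", "hello", "HashMap"}) {
            int hash = spread(key);
            // For a power-of-2 n, the mask and the non-negative modulus agree.
            System.out.println(key + " -> bucket " + indexFor(hash, n)
                    + ", floorMod = " + Math.floorMod(hash, n));
        }
    }
}
```

Because the table length is always a power of 2, the mask gives the same result as a non-negative modulo, even for negative hash values, and it is cheaper than a division.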

[Figure: flow chart of the put method]

So the data structure of HashMap in Java 8 is array + linked list + red-black tree.

The red-black tree is a self-balancing binary search tree, so an in-order traversal visits its nodes in sorted order (the implementation source code of the red-black tree is more complicated and I have not studied it in depth, so this is all I know about it).
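
Whether a bucket is stored as a list or as a tree cannot be seen through the public API, but the map's behaviour can be checked under heavy collisions. The sketch below uses a hypothetical key class, BadKey, whose hashCode is deliberately constant so that every key lands in the same bucket; with 100 such keys on Java 8 I would expect that bucket to have been converted to a tree, though the code does not verify this.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // Hypothetical key type: a constant hashCode forces every instance into one bucket.
    static final class BadKey implements Comparable<BadKey> {
        final int id;

        BadKey(int id) { this.id = id; }

        @Override public int hashCode() { return 42; }

        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        @Override public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    // Inserts keys 0..count-1, all of which collide.
    static Map<BadKey, String> fill(int count) {
        Map<BadKey, String> map = new HashMap<>();
        for (int i = 0; i < count; i++) {
            map.put(new BadKey(i), "v" + i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, String> map = fill(100);
        System.out.println(map.size());                     // every key is kept despite the collisions
        System.out.println(map.get(new BadKey(57)));        // found through equals, not only through the hash
        System.out.println(map.put(new BadKey(57), "new")); // put returns the previous value
        System.out.println(map.size());                     // overwriting does not change the size
    }
}
```

The equal hash only decides the bucket; the key comparison in putVal and getNode is what tells the entries apart.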

4. Get the value by key (get method)

The source code of the get method, with comments:

    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }

    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        // First locate the bucket in the table array from the key's hash
        if ((tab = table) != null && (n = tab.length) > 0 && (first = tab[(n - 1) & hash]) != null) {
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                // If this bucket is stored as a tree, search it as a tree
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    // Not a tree: walk the list node by node until the key is found or the list is exhausted
                    if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }

First, locate the bucket in the table array that corresponds to the hash of the key, and check its first node.
If the first node does not match and it is a tree node, the bucket is searched as a tree; otherwise, the following nodes are checked one by one until the node whose key equals the input parameter is found or every node in that bucket's list has been visited.
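
One consequence of the get source above is that a null return value is ambiguous: it is returned both when no node is found and when the node's value is null. A minimal sketch (the class name GetDemo and the sample entries are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

class GetDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("one", 1);
        map.put("nothing", null); // HashMap allows null values
        map.put(null, 0);         // and one null key
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = sample();
        System.out.println(map.get("one"));             // 1
        System.out.println(map.get("nothing"));         // null: the key is mapped to null
        System.out.println(map.get("missing"));         // null: there is no mapping
        System.out.println(map.containsKey("nothing")); // true
        System.out.println(map.containsKey("missing")); // false
        System.out.println(map.get(null));              // 0
    }
}
```

When the difference matters, containsKey is the way to tell the two cases apart.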

5. Expansion

When the number of nodes exceeds the threshold, the HashMap doubles the size of its table, and the threshold doubles as well (the old threshold is shifted left by one bit: << 1). After the expansion, the nodes are redistributed among the buckets of the new table.
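
The effect of doubling can be sketched without touching HashMap internals. The example assumes the default capacity of 16 and the default load factor of 0.75, and it relies on the index formula (n - 1) & hash from putVal; the class name ResizeDemo, its helper methods and the sample hash values are my own.

```java
class ResizeDemo {
    // Bucket index for a table of the given capacity (a power of 2).
    static int index(int hash, int capacity) {
        return (capacity - 1) & hash;
    }

    // Threshold computed as capacity * loadFactor.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = oldCap << 1;
        System.out.println("threshold: " + threshold(oldCap, 0.75f)
                + " -> " + threshold(newCap, 0.75f));

        // These hashes all share bucket 5 while the capacity is 16.
        for (int hash : new int[] {5, 21, 37, 53}) {
            // The extra bit tested by the larger mask decides where the node goes.
            String where = (hash & oldCap) == 0 ? "stays" : "moves up by oldCap";
            System.out.println(hash + ": " + index(hash, oldCap)
                    + " -> " + index(hash, newCap) + " (" + where + ")");
        }
    }
}
```

Since the new mask has exactly one more bit than the old one, a node either keeps its index or moves to its old index plus the old capacity, so a bucket is split into at most two buckets.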

6. Thread Safety

HashMap is not thread-safe. Under high concurrency, thread safety can be achieved by using ConcurrentHashMap instead, or by wrapping the map with Collections.synchronizedMap(map). The source code of the wrapper is very simple: every method body is wrapped in a synchronized block, so I won't introduce it here.

ConcurrentHashMap is recommended. Hashtable and Collections.synchronizedMap(map) hold a single lock for the whole of every method, whereas ConcurrentHashMap in Java 8 synchronizes only part of the method (locking just the bucket being modified), so ConcurrentHashMap is more efficient.
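
As a minimal sketch of the two thread-safe options, the following counts increments from several threads. The class name ConcurrentCountDemo, the helper countWith, the key "hits" and the thread counts are assumptions chosen for the example; it shows correctness only and is not a benchmark.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ConcurrentCountDemo {
    // Starts `threads` workers that each increment the same key `perThread` times.
    static int countWith(Map<String, Integer> map, int threads, int perThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    // merge is one atomic read-modify-write on both map types used in main.
                    map.merge("hits", 1, Integer::sum);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return map.get("hits");
    }

    public static void main(String[] args) {
        int threads = 4;
        int perThread = 10_000;
        System.out.println(countWith(new ConcurrentHashMap<String, Integer>(), threads, perThread));
        System.out.println(countWith(
                Collections.synchronizedMap(new HashMap<String, Integer>()), threads, perThread));
    }
}
```

Both maps should report 40000 here. Passing a plain HashMap to the same helper may lose updates, because its read-modify-write is not atomic.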

Collection framework series (1) HashMap source code analysis - Java8