Interview-oriented programming - HashMap source code

Questions you will be able to answer after reading this article

  • HashMap's important methods
  • Why the load factor is 0.75
  • When resizing (expansion) happens
  • When a bucket turns into a red-black tree, and when it degenerates back into a linked list
  • Why hash conflicts are resolved with a linked list first and a red-black tree later, instead of a red-black tree from the start
  • The infinite loop problem in multi-threaded environments
  • The resize optimizations introduced in JDK 8

Basic structure

This article is based on JDK 1.8. As is well known, Java 8 introduced the red-black tree into HashMap, as shown below:

[Figure: HashMap structure in JDK 1.8 — an array of buckets, each holding a linked list or a red-black tree]

The source code contains the field transient Node<K,V>[] table;, which is the array of hash buckets. Each bucket holds a chain of Node objects; the Node source is shown below, where next points to the next node in the linked list.


   static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }

        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }

Important attributes

/**
 * Default initial capacity of HashMap
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
 * Maximum capacity of HashMap
 */
static final int MAXIMUM_CAPACITY = 1 << 30; // 1073741824

/**
 * Default load factor (resize factor).
 * Resizing happens when the number of entries exceeds current capacity * 0.75.
 * For example, with an initial capacity of 16, the table doubles to 32 once
 * more than 16 * 0.75 = 12 entries are stored.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;


/**
 * Untreeify threshold: when a bin shrinks to fewer nodes than this,
 * its red-black tree is converted back into a linked list.
 */
static final int UNTREEIFY_THRESHOLD = 6;


/**
 * Treeify threshold; the conversion itself happens in treeifyBin.
 *
 * @see HashMap#treeifyBin(java.util.HashMap.Node[], int)
 * <p>
 * Conversion rules:
 * list length >= TREEIFY_THRESHOLD && table length >= MIN_TREEIFY_CAPACITY: the list becomes a red-black tree
 * list length >= TREEIFY_THRESHOLD && table length < MIN_TREEIFY_CAPACITY: no tree; the table is resized instead, so the bin stays a linked list
 * <p>
 * Put differently: if the table is shorter than MIN_TREEIFY_CAPACITY, converting
 * the structure is not worth it. Many key-value pairs piling up in one slot means
 * those keys' hashes are equal modulo the table length, not that the hashes
 * themselves are equal. Since identical hashes are unlikely, resizing usually
 * spreads those keys across several slots once the modulo is taken against the
 * new, larger length.
 */
static final int TREEIFY_THRESHOLD = 8;

/**
 * Minimum table capacity for treeification
 */
static final int MIN_TREEIFY_CAPACITY = 64;


When to expand

From the comments above, you can see that expansion happens in two situations:

  1. When the number of stored entries exceeds current capacity * 0.75, the table expands; the sketch after this list shows it happening
  2. When the tree-conversion method treeifyBin is called but the table length is still below 64, the table expands instead of converting to a tree (the put method is described in detail later)
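To watch the first condition happen, here is a minimal sketch (my own, not from the JDK or the original article) that reads the internal table length via reflection. The class ResizeDemo and its capacity helper are invented for illustration; on JDK 9+ the reflective access may additionally require --add-opens java.base/java.util=ALL-UNNAMED.

import java.lang.reflect.Field;
import java.util.HashMap;

public class ResizeDemo {
    // reads HashMap's internal table length, purely for observation
    static int capacity(HashMap<?, ?> map) throws Exception {
        Field f = HashMap.class.getDeclaredField("table");
        f.setAccessible(true);
        Object[] table = (Object[]) f.get(map);
        return table == null ? 0 : table.length;
    }

    public static void main(String[] args) throws Exception {
        HashMap<Integer, Integer> map = new HashMap<>(); // capacity 16, threshold 16 * 0.75 = 12
        for (int i = 1; i <= 13; i++) {
            map.put(i, i);
            System.out.println("size=" + i + " capacity=" + capacity(map));
        }
        // the capacity stays 16 through size 12 and doubles to 32 on the 13th put
    }
}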

Why is the load factor 0.75

This is the result of a balance between space and performance:

  1. A smaller value such as 0.5 lowers the resize threshold: fewer hash conflicts and better lookup performance, but more wasted space
  2. A value as high as 1 does the opposite: less wasted space but far more hash conflicts. 0.75 is the compromise between the two, as the snippet below makes concrete
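As a quick illustration (my own snippet; LoadFactorDemo is an invented name), the two-argument constructor exposes exactly this trade-off:

import java.util.HashMap;
import java.util.Map;

public class LoadFactorDemo {
    public static void main(String[] args) {
        // all three start with 16 buckets; only the resize threshold differs
        Map<String, String> sparse   = new HashMap<>(16, 0.5f);  // threshold 8: fewer collisions, more empty slots
        Map<String, String> balanced = new HashMap<>(16, 0.75f); // threshold 12: the default compromise
        Map<String, String> dense    = new HashMap<>(16, 1.0f);  // threshold 16: denser table, more collisions
    }
}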

When to turn into a red-black tree and when to degenerate into a linked list

  1. When a linked list reaches length 8, it is converted into a red-black tree (at that point the list actually holds 9 nodes already; this is explained where the put method calls treeifyBin())

  2. When a tree shrinks to 6 nodes, it degenerates back into a linked list. The gap between the two thresholds (7 sits in between) prevents frequent conversions back and forth. Suppose both directions used 8: convert to a tree above 8, back to a list below 8. A HashMap that keeps inserting and deleting elements so that a bucket hovers around length 8 would then constantly flip between tree and list, which would be very inefficient. The sketch after this list shows how a bucket can be pushed past the treeify threshold.
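Here is a minimal sketch of forcing that conversion (TreeifyDemo and BadKey are invented for illustration): a constant hashCode sends every entry into the same bucket, so the list grows past TREEIFY_THRESHOLD; the first attempts only resize the table (16 -> 32 -> 64), and once the table reaches MIN_TREEIFY_CAPACITY the bucket becomes a tree.

import java.util.HashMap;
import java.util.Map;

public class TreeifyDemo {
    // invented key type: a constant hashCode forces every entry into one bucket
    static final class BadKey {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 20; i++) {
            map.put(new BadKey(i), i); // all 20 entries share a single bucket
        }
        System.out.println(map.size()); // 20, stored behind one red-black tree bin
    }
}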

Why not resolve hash conflicts with a red-black tree directly, instead of starting with a linked list and converting later?

Because a red-black tree needs left rotations, right rotations, and recoloring to stay balanced, while a singly linked list needs none of that. With fewer than 8 elements in a bucket, a linear scan of the list is already fast enough for lookups. Beyond 8 elements, the red-black tree's logarithmic lookup is needed to keep queries fast, even though inserting nodes becomes slower.

Core methods

get

get ultimately delegates to getNode, so that method is the focus:

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    //bitwise AND replaces the modulo: (n - 1) & hash locates the bucket's first node
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        //check the first node: if its hash matches the lookup hash
        //and its key is the same reference as (or equals()) the lookup key,
        //return it immediately
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;

        //the first node did not match, so check whether a next node exists
        if ((e = first.next) != null) {
            //a tree bin is searched as a tree; otherwise walk the plain linked list node by node
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
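Note that the hash handed to getNode is not the raw hashCode: HashMap.hash(key) first XORs the high 16 bits into the low 16 bits, so that the index computation (n - 1) & hash, which only looks at the low bits, still feels the influence of the high bits. This is the actual JDK 1.8 implementation (comments mine):

static final int hash(Object key) {
    int h;
    // null keys always map to bucket 0; otherwise spread the high half of the
    // hashCode downward, since small tables only use the low bits for indexing
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}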

put

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    //if the bucket array is empty, initialize it; resize() both expands and initializes
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    //compute the index from the hash and table length; if that slot is null, store a new node there
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    //the slot is already occupied, which means one of the following cases
    else {
        Node<K,V> e; K k;
        //case 1: the key matches the first node, so its value will be overwritten below
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        //case 2: the first node does not match, so search the tree or the list
        else if (p instanceof TreeNode)
            //search (and if absent, insert into) the red-black tree
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            //walk the linked list
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    //the list has reached the threshold at which the bin's structure may change
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        //recall: if the table length is below MIN_TREEIFY_CAPACITY,
                        //treeifyBin resizes instead of treeifying; that check lives
                        //inside treeifyBin itself, go and have a look

                        //also note the list already holds 9 nodes at this point:
                        //binCount >= 7 means the list held 8 nodes, and the
                        //if ((e = p.next) == null) above has just appended the
                        //ninth node to the eighth
                        treeifyBin(tab, hash);
                    break;
                }
                //the key matches an existing node: break out, the mapping was found
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }

        // reaching here means an existing mapping's value needs to be overwritten
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);//post-access callback (a no-op in HashMap; LinkedHashMap overrides it)
            return oldValue;
        }
    }

    // reaching here means a brand-new entry was added, not an overwrite
    ++modCount;//increment the structural modification counter (fail-fast iteration)
    //if the map now holds more entries than the resize threshold, grow the table
    if (++size > threshold)
        resize();
    
    afterNodeInsertion(evict);//post-insert callback (a no-op in HashMap; LinkedHashMap overrides it)
    return null;
}
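The onlyIfAbsent flag is what separates put from putIfAbsent: in JDK 1.8, putIfAbsent simply calls putVal with onlyIfAbsent = true. A quick usage example (PutDemo is my own illustrative class):

import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<>();
        map.put("k", "v1");                     // onlyIfAbsent = false
        map.putIfAbsent("k", "v2");             // onlyIfAbsent = true: the existing value is kept
        System.out.println(map.get("k"));       // v1
        System.out.println(map.put("k", "v3")); // prints the old value v1, then overwrites it
    }
}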

resize (expansion)

final Node<K, V>[] resize() {
    //the table before resizing
    Node<K, V>[] oldTab = table;
    //old table length and old threshold
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    //new capacity and new threshold
    int newCap, newThr = 0;
    if (oldCap > 0) {
        //if the old capacity has already reached the maximum (2^30), do not grow further
        if (oldCap >= MAXIMUM_CAPACITY) {
            //set the threshold to Integer.MAX_VALUE (2^31 - 1), since doubling oldCap would overflow
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        /*
         * Normal case: the new capacity is twice the old one (a left shift by 1 doubles it).
         *
         * If the doubled capacity is still below the maximum AND the old capacity is at
         * least the default initial capacity (16), the new threshold is simply double the
         * old threshold. (An old capacity >= 16 means either the constructor specified an
         * initial capacity of at least 16, or at least one resize has already happened.)
         */
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    /*
     * Marker 1
     * Reaching this else-if means the old table held no elements.
     * If the old threshold is greater than 0, use it as the new capacity.
     * This is the case where the constructor was given an initial capacity
     * (see the constructors: new HashMap<>(n) stores the capacity in threshold).
     */
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        // reaching here means the map came from the no-arg constructor and this is the first insertion
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int) (DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }

    //if the new threshold is still 0 (the Marker 1 case), derive it from the new capacity
    if (newThr == 0) {
        float ft = (float) newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float) MAXIMUM_CAPACITY ?
                (int) ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes", "unchecked"})
    Node<K, V>[] newTab = (Node<K, V>[]) new Node[newCap];
    //install the new table, then migrate the old entries
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K, V> e;
            //copy each non-empty bucket of the old table into the new one
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                //a single node: recompute its slot directly
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                //a tree bin: let the red-black tree split itself
                else if (e instanceof TreeNode)
                    ((TreeNode<K, V>) e).split(this, newTab, j, oldCap);
                else { // preserve order
                    /*
                     * Linked-list copy: the JDK 1.8 resize optimization.
                     * Unlike JDK 1.7, JDK 1.8 does not recompute every element's
                     * index from scratch. Because the capacity doubles, a node's
                     * new index can differ from its old one only in the single bit
                     * tested by (e.hash & oldCap): if that bit is 0 the node stays
                     * at index j (the "low" list), otherwise it moves to j + oldCap
                     * (the "high" list). Order within each list is preserved.
                     */
                    Node<K, V> loHead = null, loTail = null;
                    Node<K, V> hiHead = null, hiTail = null;
                    Node<K, V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        } else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
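A concrete illustration of the (e.hash & oldCap) check (SplitDemo is my own sketch). With oldCap = 16, only bit 4 of the hash decides whether a node keeps its old index or moves up by exactly oldCap:

public class SplitDemo {
    static void show(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);     // index in the old table
        int newIndex = hash & (2 * oldCap - 1); // index in the doubled table
        boolean moved = (hash & oldCap) != 0;   // the single bit resize() tests
        System.out.printf("hash=%5s old=%2d new=%2d moved=%b%n",
                Integer.toBinaryString(hash), oldIndex, newIndex, moved);
    }

    public static void main(String[] args) {
        show(0b01010, 16); // bit 4 clear: stays at index 10
        show(0b11010, 16); // bit 4 set:   moves to index 10 + 16 = 26
    }
}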

Infinite loop problem

JDK 1.7 inserted new linked-list nodes at the head, which reverses their order during a resize. In a multi-threaded environment this could create a circular reference while resizing, resulting in an infinite loop. JDK 1.8 improved this by inserting at the tail, preserving order.

However, although JDK 1.8 fixes that particular problem, infinite loops can still occur elsewhere under concurrent use, so use ConcurrentHashMap in multi-threaded code: blog.csdn.net/gs_albb/art…
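A minimal sketch of the thread-safe alternative (SafeCounter is my own example):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeCounter {
    public static void main(String[] args) throws InterruptedException {
        Map<String, Integer> counters = new ConcurrentHashMap<>();
        Runnable task = () -> {
            for (int i = 0; i < 1000; i++) {
                counters.merge("requests", 1, Integer::sum); // atomic read-modify-write
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(counters.get("requests")); // always 2000; a plain HashMap could lose updates
    }
}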

Hand-writing a simple HashMap

import lombok.Data; // Lombok generates the getters/setters used below (assumes Lombok on the classpath)

import java.util.ArrayList;
import java.util.List;

/**
 * A simple hand-written HashMap: array + linked list, no red-black tree.
 *
 * @author HeyS1
 * @date 2020/6/4
 */
public class MyHashMap<K, V> {

    @Data
    static class Entry<K, V>  {
        private Entry<K, V> next;
        private K key;
        private V value;

        Entry(Entry<K, V> next, K key, V value) {
            this.next = next;
            this.key = key;
            this.value = value;
        }
    }


    //default table length
    private static final int DEFAULT_INITIAL_CAPACITY = 16;
    //default load factor
    private static final float DEFAULT_LOAD_FACTOR = 0.75f;

    //current table length
    private int arraySize;
    //load factor
    private float loadFactor;
    //number of entries stored
    private int entryUseSize;
    //the bucket array
    private Entry<K, V>[] table;

    public MyHashMap() {
        this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
    }

    public MyHashMap(int arraySize, float loadFactor) {
        this.arraySize = arraySize;
        this.loadFactor = loadFactor;
        table = new Entry[this.arraySize];
    }


    public void put(K k, V v) {
        //compute the slot where this entry belongs
        int index = getIndex(k);
        Entry<K, V> t = table[index];

        if (t == null) {
            //empty slot: just place the new entry
            table[index] = createEntry(k, v);
        }
        //the slot already holds an entry
        else {
            if (keyEquals(t, k)) {
                //the head entry's key matches, so replace its value
                t.value = v;
            } else {
                //walk the linked list
                while (t != null) {
                    if (t.next == null) {
                        t.next = createEntry(k, v);
                        break;
                    }
                    if (keyEquals(t.next, k)) {
                        t.next.value = v;
                        break;
                    }

                    t = t.next;
                }
            }
        }

        //resize when the entry count >= table length * load factor
        if (entryUseSize >= arraySize * loadFactor) {
            //double the table length, keeping it a power of two so the bit mask in getIndex stays valid
            resize(2 * arraySize);
        }
    }



    public V get(K k) {
        int index = getIndex(k);
        Entry<K, V> t = table[index];
        //empty bucket: the key is not present (guards against a NullPointerException below)
        if (t == null) {
            return null;
        }
        if (keyEquals(t, k)) {
            return t.value;
        }

        //walk the linked list looking for the key
        while (t.next != null) {
            if (keyEquals(t.next, k)) {
                return t.next.value;
            }
            t = t.next;
        }
        return null;

    }

    /**
     * Resize: rehash every entry into a larger array.
     *
     * @param i the new table length
     */
    private void resize(int i) {
        arraySize = i;//reset the table length to i
        entryUseSize = 0;//reset the entry counter; the re-puts below rebuild it

        //collect every entry into a temporary list
        List<Entry<K, V>> list = new ArrayList<>();
        for (int j = 0; j < table.length; j++) {
            if (table[j] == null) {
                continue;
            }
            list.add(table[j]);

            while (table[j].next != null) {
                list.add(table[j].next);
                table[j] = table[j].next;
            }
        }

        //allocate the larger array and re-put every entry
        table = new Entry[i];
        for (Entry<K, V> kvEntry : list) {
            put(kvEntry.getKey(), kvEntry.getValue());
        }
    }


    private Entry<K, V> createEntry(K k, V v) {
        entryUseSize++;//count the new entry; this total decides when to resize
        return new Entry<>(null, k, v);
    }

    /**
     * Checks whether the entry's key matches the given key
     * (same reference, or equals()).
     *
     * @param entry a bucket entry
     * @param key   the key to compare against
     * @return true if the keys match
     */
    private boolean keyEquals(Entry entry, K key) {
        return entry.getKey() == key || (entry.getKey() != null && entry.getKey().equals(key));
    }

    /**
     * Computes the bucket index from the key's hash;
     * works because the table length is a power of two.
     *
     * @param k the key
     * @return the bucket index
     */
    private int getIndex(K k) {
        return hash(k) & (arraySize - 1);
    }

    /**
     * Hashes the key, XOR-ing the high 16 bits into the low 16
     * (the same spreading trick as java.util.HashMap.hash).
     *
     * @param key the key
     * @return the spread hash
     */
    private int hash(K key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public void getInfo() {
        System.out.println("当前数组长度:" + arraySize);
        System.out.println("当前元素个数:" + entryUseSize);
    }
}
public class Test {
    public static void main(String[] args) {
        MyHashMap<Integer, String> myMap = new MyHashMap<>();
        for (int i = 0; i < 500; i++) {
            myMap.put(i, i + "");
            System.out.println(myMap.get(i));
        }

        //put the same keys again; the new values overwrite the old ones
        for (int i = 0; i < 500; i++) {
            myMap.put(i, i + "-覆盖");
            System.out.println(myMap.get(i));
        }
        myMap.getInfo();
    }
}

Origin: juejin.im/post/7084068155491876895