HashMap principle and source code analysis

HashMap overall design

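[Figure: HashMap structure, a table array of slots, each slot holding a singly linked list or a red-black tree]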

To make the figure easier to follow, let's define a few terms:

  • The table array is called the bucket array (bucket)
  • An element of the table array, such as table[1], is called a slot
  • If a slot holds a singly linked list, that list is called the slot list

One inaccuracy in the figure: even when a slot list grows beyond 8 nodes, it is converted into a red-black tree only if the table length is also at least 64. Otherwise the table is resized and the list is not treeified.

As shown above, a HashMap is made up of the following components:

  • The table array
  • Singly linked lists
  • Red-black trees (only under certain conditions, explained later)
Questions to consider

Before reading the source code, let's consider a few questions. Keeping them in mind makes the source easier to understand.

  1. When HashMap resizes, does it re-hash every entry and store it again by calling putVal()?
  2. Why is a slot list converted into a red-black tree only once it grows beyond 8 nodes and the table length is at least 64?
  3. Why does HashMap use an array + linked list + red-black tree?
  4. Why does HashMap resize when size > threshold, rather than when size == table.length?
  5. Why does a red-black tree degenerate back into a linked list once it shrinks to 6 or fewer nodes?
  6. If HashMap does not need to re-hash when it resizes, how is the resize implemented?
  7. Is HashMap ordered?
  8. Is HashMap thread-safe?

Question 2: Why is a singly linked list converted into a red-black tree only once it grows beyond 8 nodes and the table array length is at least 64?

First we need to talk about the binary search tree. In the extreme case, a binary search tree degenerates into a linked list.

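[Figure: Figure 1, a balanced binary search tree; Figure 2, a binary search tree degenerated into a linked list]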

In the figures above, Figure 1 is a balanced binary search tree, where a lookup takes O(log n). Figure 2 has degenerated into a singly linked list, where a lookup takes O(n). If the tree can be kept in a shape like Figure 1, lookups stay efficient. This is what balanced binary search trees do, and the red-black tree is the most widely used balanced binary search tree in practice. A red-black tree stays balanced by recoloring nodes red or black and rotating subtrees as nodes are inserted and removed. The principle is not complicated, but the implementation is fairly involved. Interested readers can look it up; it will be explained in a separate article rather than here.
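The difference between the two figures is easy to reproduce. The sketch below uses a plain, unbalanced binary search tree. It is a hypothetical UnbalancedBst class written for illustration, not the red-black tree HashMap uses.

Inserting the keys 1..15 in sorted order produces a chain of height 15. Inserting the midpoints first produces a balanced tree of height 4.

```java
// Hypothetical, unbalanced binary search tree used only to show how insertion
// order affects height. This is NOT HashMap's TreeNode (a red-black tree).
class UnbalancedBst {
    private static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private Node root;

    void insert(int key) {
        if (root == null) { root = new Node(key); return; }
        Node cur = root;
        while (true) {
            if (key < cur.key) {
                if (cur.left == null) { cur.left = new Node(key); return; }
                cur = cur.left;
            } else if (key > cur.key) {
                if (cur.right == null) { cur.right = new Node(key); return; }
                cur = cur.right;
            } else {
                return; // duplicate keys are ignored
            }
        }
    }

    int height() { return height(root); }

    private static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    // Sorted insertion: every key goes to the right, so the tree is a linked list.
    static int heightAfterSortedInsert(int n) {
        UnbalancedBst t = new UnbalancedBst();
        for (int k = 1; k <= n; k++) t.insert(k);
        return t.height();
    }

    // Midpoint-first insertion over [1, n] produces a balanced tree.
    static int heightAfterBalancedInsert(int n) {
        UnbalancedBst t = new UnbalancedBst();
        insertMidpoints(t, 1, n);
        return t.height();
    }

    private static void insertMidpoints(UnbalancedBst t, int lo, int hi) {
        if (lo > hi) return;
        int mid = (lo + hi) >>> 1;
        t.insert(mid);
        insertMidpoints(t, lo, mid - 1);
        insertMidpoints(t, mid + 1, hi);
    }

    public static void main(String[] args) {
        System.out.println(heightAfterSortedInsert(15));   // 15: degenerated, O(n) lookups
        System.out.println(heightAfterBalancedInsert(15)); // 4: balanced, O(log n) lookups
    }
}
```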

That explains why a long list should become a red-black tree. Why, then, convert only beyond 8 nodes, and only when the table length is at least 64? It is a cost/benefit trade-off. While the table is small, resizing is cheap and spreads the entries out, so resizing is preferred over building a tree.
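The claim that resizing is often enough for a small table can be illustrated with a quick count. The hypothetical helper longestChain below buckets a set of hash values with (n - 1) & hash, the index rule HashMap uses, and reports the longest slot list. It assumes the values are already spread, which holds for small Integer keys because h >>> 16 is 0 for them.

Three cases show the effect of table size:
- Hashes 0, 16, ..., 240 all collide in a 16-slot table.
- The same hashes split into four lists of 4 at 64 slots.
- Identical hashes cannot be separated by any resize, which is where a tree still helps.

```java
// Hypothetical helper: length of the longest slot list for a given table length.
// Assumes the hash values have already been spread by HashMap.hash().
class ChainSpread {
    // Uses HashMap's index rule (n - 1) & hash, with n a power of two.
    static int longestChain(int[] hashes, int tableLength) {
        int[] counts = new int[tableLength];
        int max = 0;
        for (int h : hashes) {
            int slot = (tableLength - 1) & h;
            max = Math.max(max, ++counts[slot]);
        }
        return max;
    }

    public static void main(String[] args) {
        int[] hashes = new int[16];
        for (int i = 0; i < hashes.length; i++) hashes[i] = i * 16; // 0, 16, ..., 240
        System.out.println(longestChain(hashes, 16)); // 16: every key lands in slot 0
        System.out.println(longestChain(hashes, 64)); // 4: spread over slots 0, 16, 32, 48
        int[] identical = {7, 7, 7, 7, 7, 7, 7, 7, 7};
        System.out.println(longestChain(identical, 64)); // 9: resizing cannot split equal hashes
    }
}
```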

Question 3: Why does HashMap use an array + linked list + red-black tree?

The table array holds the slots. The slot index is computed directly as hash(key) & (table.length - 1), so the next access to the same key goes straight to its slot in O(1).

Why add linked lists, then? Because of hash collisions. A good hash algorithm is fast and spreads keys across different slots with few collisions, but it cannot rule collisions out completely. When a collision happens, the new entry is appended to the end of the slot's list. On lookup, the list is walked node by node, comparing the hash and then checking whether the key is equal, until the matching entry is found.

Why red-black trees are used as well was covered in Question 2.
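Here is a short sketch of the index calculation and of an unavoidable collision. The helper names spread and slotIndex are made up for this example; spread copies the hash() function shown later in this article.

The sketch shows two facts:
- For a power-of-two length n, (n - 1) & h equals the non-negative remainder of h mod n.
- Under String.hashCode() as specified, the distinct strings "Aa" and "BB" share the hash code 2112. They therefore always share a slot and must be chained.

```java
import java.util.HashMap;
import java.util.Map;

class SlotIndexDemo {
    // Same spreading step as HashMap.hash(Object).
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Slot index for a table whose length n is a power of two.
    static int slotIndex(Object key, int n) {
        return (n - 1) & spread(key);
    }

    public static void main(String[] args) {
        // For a power-of-two n, (n - 1) & h equals the non-negative remainder h mod n.
        int h = -123456789;
        System.out.println(((16 - 1) & h) == Math.floorMod(h, 16)); // true

        // Distinct keys with the same hashCode always land in the same slot.
        System.out.println("Aa".hashCode() + " " + "BB".hashCode()); // 2112 2112
        System.out.println(slotIndex("Aa", 16) == slotIndex("BB", 16)); // true

        // HashMap still tells them apart by comparing the keys with equals().
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println(map.get("Aa") + " " + map.get("BB")); // 1 2
    }
}
```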

Question 4: Why does HashMap resize when size > threshold, rather than when size == table.length?

As more data is stored, the probability that a newly added entry hits an already-occupied slot (a hash collision) keeps rising. HashMap therefore resizes before the table fills up. threshold marks that point: it equals capacity * loadFactor (16 * 0.75 = 12 by default), and the table is resized as soon as size exceeds it.
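The resize points for a default new HashMap<>() can be modeled in a few lines. ResizeTimeline is a hypothetical simulation, not JDK code, and it ignores treeification.

The model works as follows:
- It tracks only capacity and threshold.
- It applies the same ++size > threshold check as putVal().
- It doubles both values the way resize() does for tables of at least 16 slots.

```java
// Hypothetical model of when a default HashMap grows (treeification ignored).
class ResizeTimeline {
    // Table length after inserting `entries` distinct keys into new HashMap<>().
    static int capacityAfter(int entries) {
        int capacity = 16;
        int threshold = (int) (capacity * 0.75f); // 12
        for (int size = 1; size <= entries; size++) {
            if (size > threshold) {  // mirrors: if (++size > threshold) resize();
                capacity <<= 1;
                threshold <<= 1;     // resize() doubles the threshold as well
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(capacityAfter(12)); // 16: the 12th entry still fits
        System.out.println(capacityAfter(13)); // 32: the 13th entry triggers a resize
        System.out.println(capacityAfter(25)); // 64: the new threshold 24 is exceeded
    }
}
```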

Question 5: Why does a red-black tree degenerate back into a linked list once it shrinks to 6 or fewer nodes?

When a singly linked list holds only a few nodes, traversal time is negligible and lookups are fast enough. A list is also cheaper to maintain than a red-black tree, and when HashMap resizes, a list is easier to split.

The remaining questions are answered by the source code, which also shows how the operations above are implemented. First, let's look at the member variables and what they mean. Don't worry if some of them are unclear; each will be explained when the code that uses it comes up, so a general impression is enough for now.

Variable definitions
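    // Default initial capacity: 16
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    // Maximum capacity: 2^30
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Default load factor
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // A slot list is converted into a red-black tree when a node is added to a list
    // that already holds at least this many nodes (treeifyBin adds a table-size condition)
    static final int TREEIFY_THRESHOLD = 8;

    // A tree bin with this many nodes or fewer is turned back into a list
    // (checked when the bin is split during resize)
    static final int UNTREEIFY_THRESHOLD = 6;

    // Lists are treeified only if the table has at least 64 slots;
    // below that, a long list triggers a resize instead
    static final int MIN_TREEIFY_CAPACITY = 64;

    // The array of buckets (slots)
    transient Node<K,V>[] table;

    // Cached view returned by entrySet()
    transient Set<Map.Entry<K,V>> entrySet;

    // Number of key-value mappings
    transient int size;

    // Number of structural modifications, used by iterators to fail fast
    transient int modCount;

    // Resize once size exceeds this value (capacity * loadFactor)
    int threshold;

    // Load factor
    final float loadFactor;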

    /**
     * Singly linked list node
     * @param <K>
     * @param <V>
     */
    static class Node<K,V> implements Map.Entry<K,V> {
        // Cached hash of the key (the result of hash(key))
        final int hash;
        final K key;
        V value;
        // Next node in the slot list
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }

        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                        Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }
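Because Node implements Map.Entry with the equals() and hashCode() shown above, entries taken from a HashMap interoperate with other Map.Entry implementations. Below is a small usage example with the public API; EntryContractDemo is just a name chosen for this example. The comment about the iterator returning the map's own nodes reflects the JDK 8 source.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class EntryContractDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);

        // In JDK 8 the entry-set iterator hands out the map's own Node objects.
        Map.Entry<String, Integer> node = map.entrySet().iterator().next();

        // equals() compares key and value with Objects.equals, so any Map.Entry
        // implementation holding the same key and value is equal to the node.
        System.out.println(node.equals(new AbstractMap.SimpleEntry<>("a", 1))); // true

        // hashCode() is Objects.hashCode(key) ^ Objects.hashCode(value).
        System.out.println(node.hashCode() == (Objects.hashCode("a") ^ Objects.hashCode(1))); // true

        // setValue() changes the node in place, so the map sees the new value.
        node.setValue(2);
        System.out.println(map.get("a")); // 2
    }
}
```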
Constructors
    // Build a HashMap with the given initial capacity and load factor
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                    initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                    loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }

    // Build a map with the given initial capacity
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

    // No-arg constructor: everything uses the defaults
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }

    // Build a HashMap from an existing map, using the default load factor
    public HashMap(Map<? extends K, ? extends V> m) {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        // Copy the entries of the given map into the new map
        putMapEntries(m, false);
    }
    

The constructors above let you choose only two things: the initial capacity (initialCapacity) and the load factor (loadFactor). Everything else takes its default value.

Note that the first constructor stores tableSizeFor(initialCapacity) in threshold. The table itself is not allocated until the first insertion, and resize() then uses that value as the initial capacity.

The last constructor builds a new HashMap from an existing map. Let's look at how putMapEntries(m, false) is implemented.

    // final: cannot be overridden
    final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
        int s = m.size();
        if (s > 0) {
            // The bucket array has not been allocated yet: pre-size it
            if (table == null) { // pre-size
                // Required capacity = number of entries / load factor + 1
                float ft = ((float)s / loadFactor) + 1.0F;
                // Never exceed the maximum capacity
                int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                        (int)ft : MAXIMUM_CAPACITY);
                // If that is larger than the capacity currently stored in threshold,
                if (t > threshold)
                    // round it up to a power of two; resize() will use it as the initial capacity
                    threshold = tableSizeFor(t);
            }
            // The table exists, but m alone has more entries than the threshold
            else if (s > threshold)
                // Grow the table once up front (putVal() may resize again later)
                resize();
            // Iterate over the given map
            for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
                K key = e.getKey();
                V value = e.getValue();
                // Insert the entries one by one
                putVal(hash(key), key, value, false, evict);
            }
        }
    }

Here tableSizeFor(t) computes the new threshold, and putVal() inserts each entry into the new HashMap (putVal() is covered in the insertion section). First, let's look at tableSizeFor():

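    /**
     * Returns the smallest power of two that is greater than or equal to cap
     * (capped at MAXIMUM_CAPACITY).
     * n = cap - 1 is OR-ed with itself unsigned-shifted right by 1, 2, 4, 8 and 16 bits,
     * which sets every bit below its highest set bit; n + 1 is then a power of two.
     * @param cap
     * @return
     */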
    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }
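To see what tableSizeFor() returns, the sketch below copies it into a standalone class, TableSizeForDemo, a name chosen for this example. It also includes a hypothetical presizeFor helper that repeats the putMapEntries() arithmetic for the default load factor 0.75. For example, copying a 12-entry map asks for (int)(12 / 0.75 + 1) = 17, which rounds up to 32.

```java
class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copy of the bit-smearing routine above: smallest power of two >= cap.
    static int tableSizeFor(int cap) {
        int n = cap - 1;  // -1 so that an exact power of two maps to itself
        n |= n >>> 1;     // copy the highest set bit into the next bit
        n |= n >>> 2;     // ... the next 2 bits
        n |= n >>> 4;     // ... the next 4 bits
        n |= n >>> 8;     // ... the next 8 bits
        n |= n >>> 16;    // now every bit below the highest set bit is 1
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    // Hypothetical helper: capacity putMapEntries() requests when copying a map
    // of s entries into an empty HashMap with the default load factor 0.75.
    static int presizeFor(int s) {
        float ft = ((float) s / 0.75f) + 1.0F;
        int t = (ft < (float) MAXIMUM_CAPACITY) ? (int) ft : MAXIMUM_CAPACITY;
        return tableSizeFor(t);
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16: exact powers of two are kept
        System.out.println(tableSizeFor(17)); // 32
        System.out.println(presizeFor(12));   // 32
    }
}
```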
Insert data
    public V put(K key, V value) {
        // onlyIfAbsent = false: overwrite the value if the key exists, insert otherwise
        return putVal(hash(key), key, value, false, true);
    }
    
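    /**
     * Computes the hash used inside the map: key.hashCode() XOR-ed with its own
     * upper 16 bits (h >>> 16), so that the high bits also affect the slot index.
     * A null key always hashes to 0.
     * @param key
     * @return
     */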
    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
    
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // If the table is null or empty, initialize it first
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // If the target slot is empty, store the new node there directly
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
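            // The slot already holds data: check whether the key is already present.
            // If it is, update or keep the value depending on onlyIfAbsent;
            // if it is not, insert a new node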
            Node<K,V> e; K k;
            // The first node in the slot has this key; whether to overwrite it is decided below by onlyIfAbsent
            if (p.hash == hash &&
                    ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // If the slot holds a tree bin, insert into the red-black tree
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                // binCount counts the list nodes already traversed
                for (int binCount = 0; ; ++binCount) {
                    // Reached the end of the list without finding the key: insert
                    if ((e = p.next) == null) {
                        // Append a new node at the tail
                        p.next = newNode(hash, key, value, null);
                        // The list already held at least TREEIFY_THRESHOLD (8) nodes before this append:
                        // try to treeify (treeifyBin may resize instead)
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    // Found an existing node for this key; whether to update it is decided below by onlyIfAbsent
                    if (e.hash == hash &&
                            ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            // A mapping for the key already exists
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                // Overwrite unless onlyIfAbsent is set and the old value is non-null
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        // Increment the modification count,
        // used later by iterators to fail fast
        ++modCount;
        // Resize the table when the number of entries exceeds threshold
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }


To summarize the key points of the code above:

  • If the slot is empty, the new node is inserted directly
  • If the first node in the slot has the same key, that node's value is the one to update
  • Otherwise, the rest of the slot is searched for the key to decide whether to update or insert
  • If the slot holds a tree, the entry is inserted into the tree
  • If the slot holds a list, the list is walked; an existing key is updated, otherwise a new node is appended at the tail
    • When appending makes the list exceed 8 nodes, the list may be treeified. Note the "may": treeifyBin() checks one more condition, explained below
  • Finally, if size > threshold, the table is resized
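To make the steps above concrete, here is a deliberately simplified chained hash map, MiniHashMap. It is a hypothetical class written for this article, not JDK code.

What the sketch copies from HashMap:
- the hash() spreading function and the (n - 1) & hash indexing
- updating an existing key or appending to the tail of the slot list
- resizing when ++size > threshold

What it leaves out or changes:
- It has no red-black trees and no onlyIfAbsent.
- Its resize() simply re-indexes every node, instead of the lo/hi split HashMap uses.

```java
import java.util.Objects;

// Hypothetical simplified map for illustration only: no trees, no onlyIfAbsent,
// fixed load factor 0.75, initial capacity 16.
class MiniHashMap<K, V> {
    static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private Node<K, V>[] table = (Node<K, V>[]) new Node[16];
    private int threshold = 12; // 16 * 0.75
    private int size;

    // Same spreading function as HashMap.hash(Object).
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    V put(K key, V value) {
        int hash = hash(key);
        int i = (table.length - 1) & hash;
        Node<K, V> p = table[i];
        if (p == null) {
            table[i] = new Node<>(hash, key, value, null); // empty slot: store directly
        } else {
            while (true) {
                if (p.hash == hash && Objects.equals(p.key, key)) {
                    V old = p.value;                       // existing key: overwrite
                    p.value = value;
                    return old;
                }
                if (p.next == null) {
                    p.next = new Node<>(hash, key, value, null); // append at the tail
                    break;
                }
                p = p.next;
            }
        }
        if (++size > threshold) resize();
        return null;
    }

    V get(Object key) {
        int hash = hash(key);
        for (Node<K, V> e = table[(table.length - 1) & hash]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) return e.value;
        }
        return null;
    }

    // Naive resize: double the table and re-index every node from its stored hash.
    @SuppressWarnings("unchecked")
    private void resize() {
        Node<K, V>[] old = table;
        table = (Node<K, V>[]) new Node[old.length << 1];
        threshold <<= 1;
        for (Node<K, V> head : old) {
            Node<K, V> e = head;
            while (e != null) {
                Node<K, V> next = e.next;
                int j = (table.length - 1) & e.hash;
                e.next = table[j]; // head insertion: order within a slot is reversed
                table[j] = e;
                e = next;
            }
        }
    }

    int size() { return size; }
    int capacity() { return table.length; }

    public static void main(String[] args) {
        // Hash codes 0x10000 and 0x20000 differ only in the high 16 bits; without
        // spreading both would map to slot 0 of a 16-slot table.
        System.out.println((15 & hash(0x10000)) + " " + (15 & hash(0x20000))); // 1 2

        MiniHashMap<String, Integer> m = new MiniHashMap<>();
        System.out.println(m.put("Aa", 1)); // null: new key
        System.out.println(m.put("BB", 2)); // null: same hashCode as "Aa", appended to its list
        System.out.println(m.put("Aa", 3)); // 1: existing key, old value returned
        for (int k = 0; k < 11; k++) m.put("k" + k, k);
        System.out.println(m.size() + " " + m.capacity()); // 13 32
        System.out.println(m.get("Aa") + " " + m.get("BB")); // 3 2
    }
}
```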

Next, let's analyze the resize() code and the treeifyBin() code:

    final void treeifyBin(Node<K,V>[] tab, int hash) {
        int n, index; Node<K,V> e;
        // If the table has fewer than 64 slots, resize it instead of treeifying
        if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
            resize();
        // Otherwise convert the slot list into a tree
        else if ((e = tab[index = (n - 1) & hash]) != null) {
            TreeNode<K,V> hd = null, tl = null;
            do {
                TreeNode<K,V> p = replacementTreeNode(e, null);
                if (tl == null)
                    hd = p;
                else {
                    p.prev = tl;
                    tl.next = p;
                }
                tl = p;
            } while ((e = e.next) != null);
            if ((tab[index] = hd) != null)
                hd.treeify(tab);
        }
    }

From this you can see that a slot list is converted into a red-black tree only when it has grown beyond 8 nodes and the table array has at least 64 slots. Otherwise resize() is called instead, and the expansion may split the long list across two slots.
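The interplay between putVal(), treeifyBin() and resize() can be traced with a small model. TreeifyModel is hypothetical and is not JDK code. It follows a single slot whose keys all share the same hash, starting from new HashMap<>().

Because identical hashes never separate, the two resizes triggered by treeifyBin() (16 to 32, then 32 to 64) cannot shorten the list. The 11th colliding insertion finally converts it into a tree.

```java
// Hypothetical model, not JDK code: one slot whose keys all share the same hash,
// in a map created with new HashMap<>() (capacity 16, threshold 12).
class TreeifyModel {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    int capacity = 16;
    int threshold = 12;
    int size = 0;
    int binNodes = 0;          // nodes in the colliding slot
    boolean treeified = false;

    // One put() of a new key that lands in the colliding slot.
    void putCollidingKey() {
        int existing = binNodes;
        binNodes++;
        // putVal() calls treeifyBin() when appending to a list that already has at least 8 nodes
        if (!treeified && existing >= TREEIFY_THRESHOLD) {
            if (capacity < MIN_TREEIFY_CAPACITY) {
                resize();          // small table: grow instead of treeifying
            } else {
                treeified = true;  // large enough: convert the list into a tree
            }
        }
        if (++size > threshold) resize();
    }

    // Doubling the table cannot separate identical hashes, so binNodes is unchanged.
    void resize() {
        capacity <<= 1;
        threshold <<= 1;
    }

    public static void main(String[] args) {
        TreeifyModel m = new TreeifyModel();
        for (int i = 1; i <= 11; i++) {
            m.putCollidingKey();
            System.out.println("put " + i + ": capacity=" + m.capacity + ", treeified=" + m.treeified);
        }
        // puts 9 and 10 double the table (16 -> 32 -> 64); put 11 treeifies the slot
    }
}
```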

    final Node<K,V>[] resize() {
        // Record the old table, capacity and threshold before resizing
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        // The old table has already been allocated
        if (oldCap > 0) {
            // Already at the maximum capacity: set threshold to the largest int and stop growing
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            // Double the capacity (bounded by the maximum). If the old capacity was at least
            // the default 16, double the threshold too; otherwise newThr stays 0 and is recomputed below
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                    oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }

        else if (oldThr > 0) // No table yet, but a constructor stored the initial capacity in threshold: use it
            newCap = oldThr;
        else {               // Neither capacity nor threshold was set (no-arg constructor): use the defaults. The first put into new HashMap<>() takes this branch
            newCap = DEFAULT_INITIAL_CAPACITY;
            // New threshold = default load factor * default capacity (0.75 * 16 = 12)
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        // If newThr was not set above, compute it from the new capacity
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                    (int)ft : Integer.MAX_VALUE);
        }
        // Install the new threshold
        threshold = newThr;
        // Allocate the new bucket array the entries will be migrated into
        @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        // If there was an old table, migrate its entries
        if (oldTab != null) {
            // Walk the old table
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                // Only non-empty slots need work
                if ((e = oldTab[j]) != null) {
                    // Clear the old slot so the old array no longer references the nodes (helps GC)
                    oldTab[j] = null;
                    // A single node with no list behind it: put it straight into its new slot
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
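                    // Tree bins are split by TreeNode.split() (red-black trees are covered separately);
                    // a resulting half with 6 (UNTREEIFY_THRESHOLD) or fewer nodes becomes a list again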
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // A plain list: split the original list into two lists
                        // This split is what avoids re-inserting every entry through putVal()
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            // (hash & oldCap) == 0: the node goes to the "lo" list and keeps its index
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else { // Otherwise it goes to the "hi" list, i.e. original index + oldCap
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        // Place the lo list at the same index as in the old table
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        // Place the hi list at original index + oldCap
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

When a red-black tree bin is split during resize, a half with 6 or fewer nodes degenerates into a linked list. For plain lists, look at the code below. Nodes that satisfy this condition stay at the original slot index, while the others are placed at slot index + oldCap. Why?

if ((e.hash & oldCap) == 0) {
    if (loTail == null)
        loHead = e;
    else
        loTail.next = e;
    loTail = e;
}

For example, suppose hash(keyA) = 14 and cap = 16. In the old table the entry sits in slot 14 & (16 - 1) = 14. Because the table always doubles when it resizes, the new cap is 32. Does the data in slot 14's list need to move? A simple approach would be:

  • If the recomputed slot index is the same, no change is needed and the entry stays where it is
  • Otherwise, the entry is migrated to its new slot
  • Before Java 8, resizing recomputed hash & (length - 1) for every entry and re-inserted it into the new table; Java 8 optimizes this

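Here 14 & (32 - 1) = 14, so this entry does not need to move. Indeed, e.hash & oldCap = 14 & 16 = 0, which keeps it at its original position.

The key observation is that the capacity is always a power of two (2^n), so oldCap has exactly one bit set (16 is 10000 in binary). Doubling the capacity adds exactly that bit to the index mask: 15 (01111) becomes 31 (11111). If a hash has that bit cleared, its new index equals the old one. If the bit is set, the new index is the old index + oldCap.

For example, hash 30 (11110 in binary) was in slot 30 & 15 = 14. Since 30 & 16 = 16 is non-zero, it moves to 14 + 16 = 30. This is why a single bit test can replace recomputing the index for every node.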
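The bit trick can be checked by brute force. The hypothetical helper splitIndex below applies the lo/hi rule from resize(). It then compares the result with recomputing hash & (newCap - 1) over a range of hash values, including negative ones.

```java
// Hypothetical helper: the slot a node lands in after resize(), using the lo/hi rule.
class SplitCheck {
    // Stay at j if (hash & oldCap) == 0, otherwise move to j + oldCap.
    static int splitIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? j : j + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        System.out.println(splitIndex(14, oldCap)); // 14: bit 16 clear, stays
        System.out.println(splitIndex(30, oldCap)); // 30: bit 16 set, 14 + 16
        System.out.println(splitIndex(46, oldCap)); // 14: 46 = 32 + 14, bit 16 clear
        // The shortcut always agrees with recomputing hash & (newCap - 1).
        for (int h = -100_000; h <= 100_000; h++) {
            if (splitIndex(h, oldCap) != (h & (newCap - 1))) throw new AssertionError(h);
        }
        System.out.println("split rule verified");
    }
}
```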

Retrieve data
    public V get(Object key) {
        Node<K,V> e;
        // Compute the hash first, then call getNode
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }
    
    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        // If the table is non-empty and the target slot holds data
        if ((tab = table) != null && (n = tab.length) > 0 &&
                (first = tab[(n - 1) & hash]) != null) {
            // If the first node's hash equals the key's hash
            // and the keys are equal (== or equals()),
            // the first node is the target: return it directly
            if (first.hash == hash && // always check first node
                    ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            // The slot holds more nodes after the first one
            if ((e = first.next) != null) {
                // If it's a red-black tree, search the tree (trees are covered separately)
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    // Walk the list until the target is found
                    if (e.hash == hash &&
                            ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }
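One consequence of get() returning e.value only when getNode() finds a node: a null result cannot distinguish a missing key from a key mapped to null. HashMap accepts null values, and it also accepts a null key, which hash() maps to 0 so it lives in slot 0. containsKey() only checks whether a node exists, so it can tell the two cases apart. Here is a usage example; GetNullDemo is just an example name.

```java
import java.util.HashMap;
import java.util.Map;

class GetNullDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put(null, "null-key");  // hash(null) is 0, so this entry lives in slot 0
        map.put("present", null);   // null values are allowed as well

        System.out.println(map.get(null));              // null-key
        System.out.println(map.get("present"));         // null (key exists, value is null)
        System.out.println(map.get("absent"));          // null (no mapping at all)
        System.out.println(map.containsKey("present")); // true
        System.out.println(map.containsKey("absent"));  // false
    }
}
```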
Delete data

Now that we know how data is inserted and retrieved, deleting data is quite simple.

    public V remove(Object key) {
        Node<K,V> e;
        return (e = removeNode(hash(key), key, null, false, true)) == null ?
                null : e.value;
    }
    
    final Node<K,V> removeNode(int hash, Object key, Object value,
                               boolean matchValue, boolean movable) {
        Node<K,V>[] tab; Node<K,V> p; int n, index;
        // If the table is non-empty and the slot for this hash holds data
        if ((tab = table) != null && (n = tab.length) > 0 &&
                (p = tab[index = (n - 1) & hash]) != null) {
            Node<K,V> node = null, e; K k; V v;
            // The first node in the slot has this key: found it
            if (p.hash == hash &&
                    ((k = p.key) == key || (key != null && key.equals(k))))
                node = p;
            // Otherwise search the rest of the slot
            else if ((e = p.next) != null) {
                // Tree bin: search the tree
                if (p instanceof TreeNode)
                    node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
                else {
                    do {
                        // Walk the slot list looking for the key
                        if (e.hash == hash &&
                                ((k = e.key) == key ||
                                        (key != null && key.equals(k)))) {
                            node = e;
                            break;
                        }
                        p = e;
                    } while ((e = e.next) != null);
                }
            }
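            // If a node for the key was found, decide whether to remove it:
            // with matchValue = false (as in remove(key)) it is always removed,
            // with matchValue = true the value must match as well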
            if (node != null && (!matchValue || (v = node.value) == value ||
                    (value != null && value.equals(v)))) {
                // Tree node: remove it from the tree
                if (node instanceof TreeNode)
                    ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
                // The node is the first one in the slot: store its successor in the slot
                else if (node == p)
                    tab[index] = node.next;
                // Otherwise unlink it from the slot list
                else
                    p.next = node.next;
                // Increment the modification count (used for iterator fail-fast)
                ++modCount;
                // One fewer entry
                --size;
                // Post-removal hook left for subclasses (LinkedHashMap overrides it)
                afterNodeRemoval(node);
                return node;
            }
        }
        return null;
    }

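The matchValue flag is visible through the public API. In the JDK 8 source, remove(key) calls removeNode() with matchValue = false. HashMap's override of the two-argument Map.remove(key, value) passes matchValue = true, so it removes the entry only when the value matches. Here is a usage example; RemoveDemo is just an example name.

```java
import java.util.HashMap;
import java.util.Map;

class RemoveDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        // remove(key): matchValue = false, removes whatever value is mapped
        System.out.println(map.remove("a"));     // 1 (value of the removed node)
        System.out.println(map.remove("a"));     // null (nothing left to remove)

        // remove(key, value): matchValue = true, removes only on an exact match
        System.out.println(map.remove("b", 99)); // false, the value differs
        System.out.println(map.remove("b", 2));  // true
        System.out.println(map.isEmpty());       // true
    }
}
```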
Clear map
    public void clear() {
        Node<K,V>[] tab;
        // Increment the modification count
        modCount++;
        // If the table is non-empty
        if ((tab = table) != null && size > 0) {
            // Reset size
            size = 0;
            // Set every slot to null
            for (int i = 0; i < tab.length; ++i)
                tab[i] = null;
        }
    }

Check whether a value is present
    public boolean containsValue(Object value) {
        Node<K,V>[] tab; V v;
        // If the table is non-empty
        if ((tab = table) != null && size > 0) {
            // Scan every slot of the map
            for (int i = 0; i < tab.length; ++i) {
                // Walk each slot's list; return true as soon as the value is found
                for (Node<K,V> e = tab[i]; e != null; e = e.next) {
                    if ((v = e.value) == value ||
                            (value != null && value.equals(v)))
                        return true;
                }
            }
        }
        return false;
    }


Source: juejin.im/post/5d790540f265da03c128c532