An In-Depth Analysis of the HashMap Source Code

Brief introduction

In Java, HashMap is one of the most commonly used collection classes for storing key-value data. Its underlying implementation has evolved with JDK (Java Development Kit) versions: JDK 1.8 optimized the internals, replacing the previous array + linked list structure with array + linked list + red-black tree. The most common HashMap operations are as follows:
Create a map

Map<String, String> map = new HashMap<>();

// insert if the key does not exist, update if it already exists
map.put("test", "haha");

// get the value for a key
map.get("test");

// the key was inserted above, so this call updates its value
map.put("test", "hehe");

// remove the key and its value
map.remove("test");

// iterate over the keys
for (String key : map.keySet()) {
    System.out.println(key);
}

How it works

Starting with JDK 1.8, HashMap is implemented on top of an array + linked list + red-black tree structure, as shown below:
[Figure: the HashMap structure in JDK 1.8 — a bucket array whose entries hold linked lists or red-black trees]

Looking at the source, the HashMap class has one particularly important field: Node[] table, the hash bucket array, which is simply an array of Node. Let's see what Node is.

static class Node<K, V> implements Map.Entry<K, V> {
    final int hash;   // used to locate the index in the bucket array
    final K key;
    V value;
    Node<K, V> next;  // the next node in the linked list

    Node(int hash, K key, V value, Node<K, V> next) { ... }
    public final K getKey(){ ... }
    public final V getValue() { ... }
    public final String toString() { ... }
    public final int hashCode() { ... }
    public final V setValue(V newValue) { ... }
    public final boolean equals(Object o) { ... }
}

Node is an inner class of HashMap that implements the Map.Entry interface; it is essentially a mapping (a key-value pair). Each black dot in the figure above is a Node object.

Constructors

First, let's look at the constructors. From the source code you can see that HashMap has four of them.

Constructor 1

public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
}

Constructor 2

public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

Constructor 3

public HashMap(int initialCapacity, float loadFactor) {
   if (initialCapacity < 0)
       throw new IllegalArgumentException("Illegal initial capacity: " +
                                          initialCapacity);
   if (initialCapacity > MAXIMUM_CAPACITY)
       initialCapacity = MAXIMUM_CAPACITY;
   if (loadFactor <= 0 || Float.isNaN(loadFactor))
       throw new IllegalArgumentException("Illegal load factor: " +
                                          loadFactor);
   this.loadFactor = loadFactor;
   this.threshold = tableSizeFor(initialCapacity);
}

Constructor 4

public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}

Of these four constructors, the first is by far the most commonly used. It is also the simplest: it only sets the loadFactor field to its default value. Constructor 2 delegates to constructor 3, and constructor 3 in turn only sets a few fields. Constructor 4 copies the entries of another Map into its own storage structure (a shallow copy); it is not used very often.
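A quick sketch of the four constructors in use (the keys, values, and capacities here are arbitrary):

Map<String, String> m1 = new HashMap<>();         // constructor 1: capacity 16, load factor 0.75
Map<String, String> m2 = new HashMap<>(32);       // constructor 2: given capacity, default load factor
Map<String, String> m3 = new HashMap<>(32, 0.5f); // constructor 3: given capacity and load factor
m3.put("a", "1");
Map<String, String> m4 = new HashMap<>(m3);       // constructor 4: copy another map's entries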

To understand the constructors we need to look at a few HashMap fields. As the source above shows, the constructors do little more than initialize the fields below:

/** The default initial capacity - MUST be a power of two. */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/** The load factor used when none specified in constructor. */
static final float DEFAULT_LOAD_FACTOR = 0.75f; // load factor

/** The next size value at which to resize (capacity * load factor). */
int threshold; // the maximum number of key-value pairs the map can hold before resizing

/** The load factor for the hash table. */
final float loadFactor;

First, the Node[] table is initialized to some length (16 by default); loadFactor is the load factor (0.75 by default); and threshold is the maximum number of Node entries (key-value pairs) the HashMap can hold before growing, where threshold = length * loadFactor. In other words, once the array length is fixed, a larger load factor lets the map hold more key-value pairs before resizing.

Combining this with the definition of the load factor: threshold is the maximum number of elements allowed at the current loadFactor and length (array length). When the element count exceeds it, the map is resized (expanded) to twice its previous capacity. The default load factor of 0.75 is a balanced choice between space and time efficiency, and it is recommended that you not change it except in fairly special circumstances: if memory is plentiful but time efficiency is critical, lower loadFactor; conversely, if memory is tight and time efficiency matters less, raise it — the value may even be greater than 1.
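For example, with the defaults the threshold is 16 * 0.75 = 12, so a minimal sketch of when the first resize fires looks like this:

Map<Integer, Integer> map = new HashMap<>(); // length 16, threshold 16 * 0.75 = 12
for (int i = 1; i <= 13; i++) {
    map.put(i, i); // the 13th put makes size exceed the threshold and triggers resize()
}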

There is still a problem, though: even with a reasonable load factor and a well-designed hash function, overly long chains can occur, and once a chain grows too long it seriously hurts HashMap's performance. So JDK 1.8 further optimized the data structure by introducing the red-black tree: when a list grows too long (more than 8 nodes by default), it is converted into a red-black tree, whose fast insert, delete, and lookup operations keep HashMap performant.
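The relevant thresholds are defined as constants in the JDK 1.8 source (note that a bin is only treeified once the table itself holds at least MIN_TREEIFY_CAPACITY buckets; below that, a long bin triggers a resize instead):

static final int TREEIFY_THRESHOLD = 8;     // convert a list to a tree at this bin length
static final int UNTREEIFY_THRESHOLD = 6;   // convert a (split) tree back to a list at this length
static final int MIN_TREEIFY_CAPACITY = 64; // minimum table capacity before bins may be treeified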

Locating the bucket index

Whether we add, delete, or look up a key-value pair, locating its position in the hash bucket array is the critical first step. As mentioned earlier, HashMap combines an array with linked lists, so naturally we want the elements distributed as evenly as possible, ideally with just one element per bucket. Then, when the hash algorithm gives us a position, we know immediately that the element we want is there, with no list to traverse, which greatly improves lookup efficiency.

The hash is computed as follows:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

Here key.hashCode() calls the hash function that comes with the key's own type, which returns an int hash value. In theory this int could be used directly as the array index: a 32-bit signed int ranges from -2147483648 to 2147483647, giving a mapping space of roughly 4 billion slots, and as long as the hash function maps keys reasonably uniformly, collisions would be hard to produce in ordinary applications.

The problem is that an array 4 billion elements long does not fit in memory. So the hash value is reduced modulo the array length, and the remainder is used as the array index:

bucketIndex = indexFor(hash, table.length);

In JDK 1.7, the array index was obtained by the following method:

static int indexFor(int h, int length) {
    return h & (length - 1); // equivalent to h % length
}

This also explains why HashMap's array length is always a power of two: when length is a power of two, h & (length - 1) is equivalent to h % length (for non-negative h), and the bitwise AND is cheaper than a division, so it is a small but worthwhile optimization. For a detailed discussion of mod performance, see Wikipedia's article on the modulo operation.
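A minimal check of the equivalence (Math.floorMod is used because the plain % operator can return a negative remainder for negative hashes, while the mask never does):

int length = 16; // must be a power of two
for (int h : new int[] { 42, 12345, -7 }) {
    System.out.println((h & (length - 1)) + " == " + Math.floorMod(h, length));
}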

But there is still a problem: even if the hash values are well distributed, using only the low-order bits makes collisions much more likely, and real hash functions are not perfect to begin with. So the source code shifts the high bits down and mixes them in:
[Figure: how hash() computes the index — take key.hashCode(), XOR its high 16 bits into the low 16 bits, then AND with (table.length - 1)]

Shifting right by 16 bits takes exactly the upper half of the 32 bits and XORs it with the lower half. This mixes the high and low bits of the original hash code to increase the randomness of the low bits, while the mixed low bits still carry some of the high-order information.
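A small sketch of the effect (the two hash values are chosen artificially so that they differ only above the mask width):

int mask = 16 - 1;                 // table length 16
int h1 = 0x00010001, h2 = 0x00020001;
System.out.println((h1 & mask) + ", " + (h2 & mask)); // 1, 1 -> both land in bucket 1 without mixing
int m1 = h1 ^ (h1 >>> 16), m2 = h2 ^ (h2 >>> 16);
System.out.println((m1 & mask) + ", " + (m2 & mask)); // 0, 3 -> spread apart after mixing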

put()

The put() method delegates to putVal(); its key steps are annotated below:

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // 1. create tab if it is empty
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // 2. compute the index; a null bucket means we can insert directly
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // 3. the head node has the same key: overwrite the value directly
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // 4. the bin is a red-black tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // 5. the bin is a linked list: traverse it, counting its length
            for (int binCount = 0; ; ++binCount) {
                // the list does not contain the key: append the new node at the tail
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // if the list length reaches the treeify threshold, convert it to a tree
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // true here means the list already contains the key: stop traversing
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // check whether the key to insert already exists in the HashMap
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            // onlyIfAbsent means: only update the value when oldValue is null
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // 6. if the size exceeds the threshold, resize
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

In summary, putVal() proceeds as follows:

1. If the table array is empty or null, call resize() to create it.

2. Compute the array index i from the key's hash. If table[i] == null, insert a new node directly and go to step 6; if table[i] is not empty, go to step 3.

3. Check whether the first element of table[i] has the same key (same hash and equals()); if so, overwrite its value directly, otherwise go to step 4.

4. Check whether table[i] is a TreeNode, i.e., whether the bin is a red-black tree; if so, insert the key-value pair into the tree directly, otherwise go to step 5.

5. Traverse the list at table[i]. If the list length reaches the treeify threshold (8), convert the list to a red-black tree and insert there; otherwise insert into the list. If the key is found during traversal, simply overwrite its value.

6. After a successful insertion, check whether the number of entries (size) exceeds the threshold; if so, resize.
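One detail worth noting from the putVal() source: when a key already exists, the old value is returned; when a new mapping is created, null is returned. For example:

Map<String, String> map = new HashMap<>();
System.out.println(map.put("k", "v1")); // null: there was no previous mapping
System.out.println(map.put("k", "v2")); // v1: the overwritten value is returned
System.out.println(map.get("k"));       // v2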

Expansion mechanism

Resizing (resize) means recalculating the capacity. As elements keep being added to a HashMap and the internal array can no longer hold them, the array must be enlarged so that more elements fit. Of course, arrays in Java cannot grow automatically, so the approach is to replace the existing small array with a new, larger one — just as you would swap a small bucket for a big one when you need to carry more water.

Let's analyze the resize source. Since JDK 1.8 brings the red-black tree into the picture and is rather more complex, for ease of understanding we will start from the JDK 1.7 code, which is simpler to follow and not essentially different; the concrete differences are discussed below.

void resize(int newCapacity) {   // the new capacity is passed in
    Entry[] oldTable = table;    // reference to the Entry array before resizing
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {  // if the old size already reached the maximum (2^30)
        threshold = Integer.MAX_VALUE; // set threshold to Integer.MAX_VALUE (2^31-1) so we never resize again
        return;
    }

    Entry[] newTable = new Entry[newCapacity];  // allocate a new Entry array
    transfer(newTable);                         // !! move the data into the new Entry array
    table = newTable;                           // point HashMap's table field at the new array
    threshold = (int)(newCapacity * loadFactor);// update the threshold
}

Here a larger array replaces the existing small-capacity one, and the transfer() method copies the elements of the original Entry array into the new Entry array.

void transfer(Entry[] newTable) {
    Entry[] src = table;                   // src references the old Entry array
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) { // iterate over the old Entry array
        Entry<K,V> e = src[j];             // take each element of the old array
        if (e != null) {
            src[j] = null; // release the old array's reference (after the loop it references nothing)
            do {
                Entry<K,V> next = e.next;
                int i = indexFor(e.hash, newCapacity); // !! recompute each element's position
                e.next = newTable[i]; // mark [1]
                newTable[i] = e;      // place the element in the array
                e = next;             // move to the next element on the chain
            } while (e != null);
        }
    }
}

Assigning newTable[i] to e.next means the list uses head insertion in a single pass: a new element is always placed at the head of the chain for its slot, so an element placed first at a given index ends up at the tail of the chain (if there are hash collisions). This differs from JDK 1.8, as explained below. Also, elements that sat on the same Entry chain in the old array may well land at different positions in the new array once their indexes are recomputed.

The following example illustrates the resizing process. Assume the hash algorithm is simply key mod table size (the array length, using %). The hash bucket array has size = 2, and the keys 3, 7, and 5 are put in the order 5, 7, 3. After mod 2 they all collide at table[1]. Assume the load factor loadFactor = 1, i.e., we resize when the actual number of keys exceeds the table size. The next steps resize the hash bucket array to 4 and then rehash all the nodes.
[Figure: resizing a table of size 2 holding keys 5, 7, 3 into a table of size 4, rehashing each node]
Next, let's look at what JDK 1.8 optimized. Observe that we always expand to twice the previous size (the new length is double the old one), so after resizing each element ends up either at its original position or at its original position plus the old capacity (a power of two). The figure below shows what this means: for a table of length n, (a) shows how the index is determined for two example keys, key1 and key2, before resizing, and (b) shows the index determination for the same two keys after resizing, where the result differs only in the extra high bit of the hash that now participates in the AND.

After the elements' hashes are re-evaluated, since n has doubled, the mask n - 1 has one more 1 bit in the high position (shown in red), so the new index changes like this:
[Figure: index calculation before and after resizing — the mask n - 1 gains one high-order 1 bit, so the extra hash bit decides the new index]

Therefore, when the HashMap is resized, there is no need to recompute hashes as the JDK 1.7 implementation did; we only need to check whether the newly significant bit of the original hash is 0 or 1. If it is 0, the index is unchanged; if it is 1, the index becomes "original index + oldCap". The schematic below shows a resize from 16 to 32:
[Figure: resizing from 16 to 32 — nodes whose extra hash bit is 0 keep their index; nodes whose bit is 1 move to index + 16]
This design is indeed very clever: it saves the time of recomputing hash values, and since the newly considered bit can be regarded as effectively random (0 or 1), the resize process spreads previously colliding nodes evenly across the new buckets. This is one of JDK 1.8's optimizations. Note one small difference: when JDK 1.7 rehashed, elements that mapped to the same index in the new table ended up in reverse order on the list, whereas, as the figure shows, JDK 1.8 preserves their order. Interested readers can study the JDK 1.8 resize source, which is very well written:

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    // a non-empty table means it has already been initialized
    if (oldCap > 0) {
        // once the table capacity exceeds the maximum, stop growing and just let collisions happen
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // not at the maximum yet: double the capacity
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        /*
         * during initialization, threshold temporarily holds the
         * initialCapacity argument, so assign it to newCap
         */
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        /*
         * the no-arg constructor was used: the bucket array gets the default
         * capacity, and the threshold is default capacity * default load factor
         */
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // compute the new resize threshold
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    // create the new bucket array (the array is also first initialized here)
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // move every bucket into the new bucket array
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    // a red-black tree must be split when remapping
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    // traverse the list and group its nodes, preserving their original order
                    do {
                        next = e.next;
                        // original index
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // original index + oldCap
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // put the original-index group into its bucket
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // put the original-index + oldCap group into its bucket
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
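The (e.hash & oldCap) test in the loop above can also be tried in isolation. A minimal sketch (the hash values are picked arbitrarily):

int oldCap = 16;
for (int hash : new int[] { 5, 21, 37 }) {
    int oldIndex = hash & (oldCap - 1);
    int newIndex = (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    // the shortcut agrees with recomputing the index against the doubled table
    System.out.println(hash + ": " + oldIndex + " -> " + newIndex
            + " (recomputed: " + (hash & (oldCap * 2 - 1)) + ")");
}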

get()

HashMap's lookup operation is comparatively simple, and the principle is the same as described earlier: first locate the bucket where the key lives, then search the linked list or red-black tree in that bucket. The lookup completes in those two steps; the relevant code is as follows:

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // 1. locate the bucket that holds the key-value pair
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            // 2. if first is a TreeNode, use the red-black tree lookup
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // 3. search the linked list
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

remove()

HashMap's delete operation is also simple, taking just three steps: first locate the bucket, second traverse the list to find the node with an equal key, third delete the node and repair the structure. The relevant source is as follows:

public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        // 1. locate the bucket
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v;
        // if the key matches the first node of the list, point node at it
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        else if ((e = p.next) != null) {  
            // if it is a TreeNode, use the red-black tree lookup to locate the node to delete
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                // 2. traverse the list to find the node to delete
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        // 3. delete the node and repair the list or red-black tree
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                tab[index] = node.next;
            else
                p.next = node.next;
            ++modCount;
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}

Traversal

Like lookup, traversal is a fairly high-frequency operation. To traverse a HashMap we generally use one of the following forms:

for(Object key : map.keySet()) {
    // do something
}
or

for(HashMap.Entry entry : map.entrySet()) {
    // do something
}

As the snippets above show, we usually traverse a HashMap through its key set or its entry set. The foreach over the collection returned by keySet() is converted into an iterator-based traversal at compile time, equivalent to:

Set keys = map.keySet();
Iterator ite = keys.iterator();
while (ite.hasNext()) {
    Object key = ite.next();
    // do something
}

Notice that when the same HashMap is traversed repeatedly, the results come back in a consistent order — but that order is generally not the insertion order.
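A quick check with arbitrary keys (the exact output order depends on the keys' hash values, but it is stable across repeated iterations of the same map):

Map<String, Integer> map = new HashMap<>();
map.put("banana", 1);
map.put("apple", 2);
map.put("cherry", 3);
for (String key : map.keySet()) {
    System.out.println(key); // same order on every traversal, but not necessarily insertion order
}

The keySet() implementation below shows where this behavior comes from.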

public Set<K> keySet() {
    Set<K> ks = keySet;
    if (ks == null) {
        ks = new KeySet();
        keySet = ks;
    }
    return ks;
}

final class KeySet extends AbstractSet<K> {
    public final int size()                 { return size; }
    public final void clear()               { HashMap.this.clear(); }
    public final Iterator<K> iterator()     { return new KeyIterator(); }
    public final boolean contains(Object o) { return containsKey(o); }
    public final boolean remove(Object key) {
        return removeNode(hash(key), key, null, false, true) != null;
    }
    // some code omitted
}

final class KeyIterator extends HashIterator
    implements Iterator<K> {
    public final K next() { return nextNode().key; }
}

abstract class HashIterator {
    Node<K,V> next;        // next entry to return
    Node<K,V> current;     // current entry
    int expectedModCount;  // for fast-fail
    int index;             // current slot

    HashIterator() {
        expectedModCount = modCount;
        Node<K,V>[] t = table;
        current = next = null;
        index = 0;
        if (t != null && size > 0) { // advance to first entry
            // find the first bucket that references a list node
            do {} while (index < t.length && (next = t[index++]) == null);
        }
    }

    public final boolean hasNext() {
        return next != null;
    }

    final Node<K,V> nextNode() {
        Node<K,V>[] t;
        Node<K,V> e = next;
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        if (e == null)
            throw new NoSuchElementException();
        if ((next = (current = e).next) == null && (t = table) != null) {
            // find the next bucket that references a list node
            do {} while (index < t.length && (next = t[index++]) == null);
        }
        return e;
    }

    public final void remove() {
        Node<K,V> p = current;
        if (p == null)
            throw new IllegalStateException();
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        current = null;
        K key = p.key;
        removeNode(hash(key), key, null, false, false);
        expectedModCount = modCount;
    }
}

As the code above shows, traversing all the keys means first obtaining the KeySet object and then walking it through KeySet's iterator, KeyIterator. KeyIterator extends HashIterator, and the core logic lives in HashIterator. That logic is not complicated: on construction, HashIterator scans the bucket array for the first bucket that references a list node, then traverses the list that bucket points to. When the list is exhausted, it moves on to the next bucket that references a list node and continues from there; when no such bucket remains, the traversal ends.
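The expectedModCount bookkeeping above is what makes the iterator fail fast. A small sketch (with a couple of arbitrary entries):

Map<String, Integer> map = new HashMap<>();
map.put("a", 1);
map.put("b", 2);
Iterator<String> it = map.keySet().iterator();
while (it.hasNext()) {
    String key = it.next();
    if (key.equals("a"))
        it.remove();    // safe: the iterator resyncs expectedModCount after removing
    // map.remove(key); // would make modCount != expectedModCount and throw
                        // ConcurrentModificationException on the next call to next()
}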



Source: juejin.im/post/5d247595f265da1bb47d8adf