15. How HashMap works and its expansion mechanism

1. How HashMap works

As an important member of the Java Collections Framework, HashMap serves us in many programming scenarios. As a hash-table data structure, its inner workings really deserve a separate blog post of their own; because this article is only a brief summary of its expansion mechanism, it gives just a short overview of how HashMap works.

Internally, HashMap is implemented as an array of buckets, where each bucket stores the head node of a singly linked list. Each node stores one key-value pair as a unit (an Entry). HashMap resolves hash collisions by separate chaining (hash collisions are discussed later).

Java 8 optimized HashMap in several places; the summary and source-code analysis below are based mainly on Java 7.

The structure is illustrated below:

(Diagram: the bucket array, each non-empty bucket holding a singly linked list of Entry nodes.)
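To make the structure concrete, here is a minimal sketch of what such a bucket node looks like, modeled loosely on the Java 7 HashMap.Entry (the field names follow the real class, but this is an illustrative simplification, not the actual source):

class Entry<K, V> {
    final K key;       // the key of this key-value pair
    V value;           // the value; overwritten when put is called again with the same key
    Entry<K, V> next;  // the next node in the same bucket's singly linked list (null at the tail)
    int hash;          // cached hash of the key, so it need not be recomputed on lookup or resize

    Entry(int hash, K key, V value, Entry<K, V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
}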

HashMap provides two important basic operations: put(K, V) and get(K).

  • When put is called, HashMap computes the hash of the key K and maps it to one of its buckets. The bucket holds the head node of a singly linked list; the list is traversed looking for an Entry node whose key equals the given K. If one is found, its old value is replaced with the given value V and the old value is returned; otherwise a new Entry node is inserted into the list (at its head in the Java 7 code below).

The general idea of the put function:

1. Hash the key's hashCode() and compute the bucket index from it;
2. If there is no collision, place the entry directly in the bucket;
3. If there is a collision, chain the entries in that bucket as a linked list;
4. (In Java 8) if collisions make a list too long (its length reaches TREEIFY_THRESHOLD, which is 8), the list is converted into a red-black tree;
5. If a node with the same key already exists, replace its old value (keys are kept unique);
6. If the map becomes too full (size exceeds loadFactor * current capacity), resize.

public V put(K key, V value) {
    // HashMap allows null keys and null values.
    // When the key is null, putForNullKey stores the value in the first bucket of the array.
    if (key == null)
        return putForNullKey(value);
    // Recompute the hash from the key's hashCode.
    int hash = hash(key.hashCode());
    // Find the index of the bucket for this hash.
    int i = indexFor(hash, table.length);
    // If the Entry at index i is not null, traverse the singly linked list hanging off that bucket.
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            // The key already exists: store the new value and return the old one.
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    // The Entry at index i is null, meaning there is no Entry at this position yet.
    modCount++;
    // The list traversal found no Entry for this key, so a new Entry is created and added to the list;
    // if the bucket is non-empty and size has reached the threshold, this may trigger a resize.
    addEntry(hash, key, value, i);
    return null;
}
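A quick usage example of the behaviour just described, using nothing beyond the standard java.util.HashMap API: the old value is returned when a key is replaced, and a null key is accepted (handled by putForNullKey above):

import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        System.out.println(map.put("a", 1));   // null: there was no previous mapping for "a"
        System.out.println(map.put("a", 2));   // 1: the old value is replaced and returned
        System.out.println(map.put(null, 99)); // null: a null key is allowed
        System.out.println(map.get(null));     // 99
    }
}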
  • The get(K) operation is similar to put: HashMap computes the hash of the key, locates the corresponding bucket, then traverses that bucket's singly linked list, comparing keys until it finds the Entry whose key equals the given one, and returns its value.

The general idea of the get function:

1. Check the bucket's first node (if there is no chain, return it directly);
2. If there is a conflict, locate the corresponding entry via key.equals(k):
   if the bucket holds a tree (Java 8), search the tree via key.equals(k), O(log n);
   if it holds a linked list, search the list via key.equals(k), O(n).

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}
// Java 8 implementation: a bucket may hold either a linked list or a red-black tree.
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // Direct hit on the first node
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // Not the first node: search the rest of the bucket
        if ((e = first.next) != null) {
            // Get from the tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Get from the linked list
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
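As a small illustration of a lookup that has to walk a collided bucket, here is a self-contained sketch; the BadKey class is hypothetical, and its constant hashCode deliberately forces every instance into the same bucket, yet get still tells the keys apart via equals:

import java.util.HashMap;
import java.util.Objects;

public class CollisionDemo {
    // Hypothetical key type: a constant hashCode forces all instances into one bucket.
    static class BadKey {
        final String name;
        BadKey(String name) { this.name = name; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && Objects.equals(name, ((BadKey) o).name);
        }
    }

    public static void main(String[] args) {
        HashMap<BadKey, Integer> map = new HashMap<>();
        map.put(new BadKey("x"), 1);
        map.put(new BadKey("y"), 2); // collides with "x": resolved by chaining in the same bucket
        // get walks the bucket's list (or tree) and uses equals to pick the right node
        System.out.println(map.get(new BadKey("x"))); // 1
        System.out.println(map.get(new BadKey("y"))); // 2
    }
}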

The most common reason hash collisions occur frequently is that there are too many Entries to store and not enough buckets, much like demand outstripping supply. So when a HashMap already holds many Entries, increasing the number of buckets is worth considering: for the Entries stored afterwards it greatly reduces hash collisions.

This brings us to HashMap expansion. The above roughly answers why to expand; so when does expansion happen? By how much? And how is it done? That is what the second part summarizes.

2. HashMap expansion

2.1 When HashMap expands

When using HashMap we often encounter this constructor with two parameters:

public HashMap(int initialCapacity, float loadFactor);

The first parameter, initialCapacity, specifies the initial number of buckets, i.e. the size of the bucket array.
The second parameter, loadFactor, is a coefficient between 0 and 1 that, together with the capacity, determines the resize threshold; its default value is 0.75.
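For example (the capacities here are just for illustration), the resize threshold follows from these two parameters:

import java.util.HashMap;
import java.util.Map;

public class CapacityDemo {
    public static void main(String[] args) {
        // 32 buckets, load factor 0.75 -> resize threshold = 32 * 0.75 = 24 entries
        Map<String, Integer> sized = new HashMap<>(32, 0.75f);

        // Default constructor: 16 buckets, load factor 0.75 -> threshold = 12 entries
        Map<String, Integer> defaults = new HashMap<>();

        System.out.println(sized.size() + " " + defaults.size()); // both start empty: 0 0
    }
}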
The Alibaba Java Development Manual contains a recommendation on this:

(Screenshot of the relevant entry from the Alibaba Java Development Manual.)

Recall from the put function explained above: if traversing the bucket's singly linked list finds no matching Entry, addEntry(hash, key, value, i) is called to add a new Entry to that list.

void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        // size has reached the threshold AND the target bucket is not empty.
        /* A helpful way to read this: size >= threshold already means there are many Entries
           and hash collisions are severe. If the target bucket is not empty, adding this Entry
           would make an existing list even longer, so a resize is needed; if the target bucket
           is empty, there is no need to resize yet. */
        // Double the capacity.
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        // Recompute the bucket index for this hash in the resized table.
        bucketIndex = indexFor(hash, table.length);
    }
    // Create a new Entry in the given bucket.
    createEntry(hash, key, value, bucketIndex);
}
void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e); // insert the new Entry at the head of the list
    // Update size.
    size++;
}
  • size is the number of Entry records the map contains.

  • threshold is the size at which a resize is needed, and threshold = loadFactor * capacity.

  • capacity is the length of the bucket array.

threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
So we can now sum up when expansion happens:

When the number of Entries the map contains is greater than or equal to threshold = loadFactor * capacity, and the new Entry happens to fall on a non-empty bucket, expansion is triggered and the capacity is doubled.

In other words, size >= threshold does not by itself trigger expansion: if the new entry maps to an empty bucket it is simply placed there, and only if it maps to a non-empty bucket, which would lengthen a list, does expansion occur. But it makes expansion likely: from then on, as soon as a new Entry hits a hash collision, the map resizes immediately. For example, with the default capacity of 16 and load factor 0.75, the threshold is 12.
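As a rough picture of what the resize itself does, here is a simplified, Java 7-style sketch using hypothetical TinyMap/Node names; it is illustrative only and omits details of the real implementation such as the MAXIMUM_CAPACITY check and the optional re-hashing of keys during transfer:

class TinyMap<K, V> {
    static class Node<K, V> {
        final int hash; final K key; V value; Node<K, V> next;
        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    float loadFactor = 0.75f;
    @SuppressWarnings("unchecked")
    Node<K, V>[] table = (Node<K, V>[]) new Node[16];
    int threshold = (int) (16 * loadFactor);

    @SuppressWarnings("unchecked")
    void resize(int newCapacity) {
        Node<K, V>[] newTable = (Node<K, V>[]) new Node[newCapacity];
        for (Node<K, V> head : table) {                 // walk every old bucket
            Node<K, V> e = head;
            while (e != null) {
                Node<K, V> next = e.next;               // remember the rest of the old chain
                int i = e.hash & (newCapacity - 1);     // recompute the bucket index for the bigger table
                e.next = newTable[i];                   // head-insert into the new bucket
                newTable[i] = e;
                e = next;
            }
        }
        table = newTable;
        threshold = (int) (newCapacity * loadFactor);   // the new, larger resize threshold
    }
}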

3. Summary

We can now answer a few questions and deepen our understanding of HashMap:

  1. When would you use a HashMap? What are its characteristics?
    It is an implementation of the Map interface used to store key-value pairs; it accepts a null key, is not synchronized, and internally stores Entry(hash, key, value, next) objects.

  2. Do you know how HashMap works?
    HashMap is essentially a "hash table" data structure, i.e. a combination of an array and linked lists. It is an unsynchronized, hash-table-based implementation of the Map interface.
    It stores and retrieves values by hashing, through the put(key, value) and get(key) methods.

Storing: we pass a key-value pair K/V to put(), which calls K's hashCode() to compute the hash, derives the bucket position from it, and then stores the Entry object there. (HashMap stores both the key object and the value object in the bucket, as a Map.Entry.)
Retrieving: we pass the key to get(), which calls K's hashCode() to obtain the hash, derives the bucket position from it, and then uses K's equals() to pick out the right key-value pair and return its value object.

Collision: when two objects have the same hashcode, they map to the same bucket position and a "collision" occurs. HashMap resolves this with a linked-list structure, i.e. the colliding entries are chained into a list in that bucket. However, when a chain grows longer than 8 (the default), the list is converted into a red-black tree, and insertion and retrieval then operate on the tree.

Expansion: if the size of the HashMap exceeds the capacity multiplied by the defined load factor, it resizes. The default load factor is 0.75, i.e. when the map has filled 75% of its buckets, HashMap creates a bucket array twice the original size (jdk1.6, and not exceeding the maximum capacity) to resize the map, and places the original entries into the new bucket array. This process is called rehashing, because it re-applies the hash function to find each entry's new bucket position.

  3. Why is the capacity always expanded to a multiple of 2 (kept a power of 2)?
    The answer, of course, is performance. When HashMap locates the bucket for a key's hash value, it calls the indexFor(hash, table.length) method.

    /**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        return h & (length-1);
    }

& is the bitwise AND operator.
As you can see, the bucket position is obtained by ANDing the hash value h with length-1 (which is in fact the map's capacity minus 1): h & (length-1).

But why not compute this as h % length?
Because in Java the % and / operations are roughly ten times slower than &, so using & improves performance.

By restricting the length to a power of 2, h & (length-1) and h % length give the same result (for non-negative h). That is why the capacity must be a power of 2.

A simple example illustrates that the two computations agree:

Suppose a hashcode is 311, which is 100110111 in binary.

length is 16, which is 10000 in binary, so length-1 = 15 = 01111.

The % operation: 311 = 16 * 19 + 7, so the result is 7 (binary 0111).

The & operation: 100110111 & 01111 = 00111 = 7 (binary 0111).

Indeed, 100110111 = 100110000 + 0111 = (2^4 + 2^5 + 2^8) + 7 = 16 * 19 + 7, so the & operation keeps exactly the remainder of dividing by 16, consistent with %.

When length is a power of 2, length-1 acts as a mask that extracts the low-order bits of the hashcode, which are exactly the remainder, and the bitwise AND performs far better than the modulo operation.
Overall, requiring length to be a power of 2 means that indexFor can compute the bucket index with a bitwise AND, which performs much better than computing it with %.
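A quick way to check this equivalence in code (for non-negative hashes and a power-of-two length); IndexForDemo is just an illustrative class name:

public class IndexForDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1); // same as h % length when length is a power of 2 and h >= 0
    }

    public static void main(String[] args) {
        int h = 311;      // binary 1 0011 0111
        int length = 16;  // a power of two; length - 1 = 15 = binary 0 1111
        System.out.println(h % length);          // 7
        System.out.println(indexFor(h, length)); // 7: same result, computed with a single AND
        // If length is not a power of two, the two expressions can differ:
        System.out.println(13 % 12);        // 1
        System.out.println(13 & (12 - 1));  // 9
    }
}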

Reproduced from: https://www.jianshu.com/p/c3633291ecda

Origin: blog.csdn.net/weixin_34242658/article/details/91291092