Dissecting the underlying structure of Java's HashMap from the source

We finally come to the more complex HashMap. It has more internal fields, inner classes, and methods than ArrayList, and its layout is not as flat and straightforward, so be prepared: we will examine it from several specific angles.

Bucket structure

Each storage slot of a HashMap is called a bucket. When a key & value pair is put into the map, a bucket is assigned to store it based on the key's hash value.

Here is the definition of the buckets: the field table is the bucket structure, which is simply an array of nodes.

transient Node<K,V>[] table;
transient int size;

Node

Unlike a Collection, a HashMap is a map structure: it does not store single objects but key/value pairs.
Its basic storage unit is therefore an inner class: Node.

Node is defined as follows:

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
}

Besides the three visible fields key, value, and hash, a Node also holds a next pointer, so that multiple Nodes can form a singly linked list. This is one way to resolve hash collisions: if several nodes are assigned to the same bucket, they are chained into a list.

HashMap has a second internal node type, called TreeNode:

static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;
}

TreeNode inherits from Node and can form a red-black tree. Why does this exist? As mentioned above, if many nodes hash to the same bucket, the list can grow very long and access efficiency drops sharply. In that case, HashMap converts the list into a balanced binary tree (ordering by Comparable keys where possible) to recover some efficiency. In practice, we hope this situation never occurs.

With this knowledge in hand, we can look at several related constants defined in HashMap:

static final int TREEIFY_THRESHOLD = 8;
static final int UNTREEIFY_THRESHOLD = 6;
static final int MIN_TREEIFY_CAPACITY = 64;
  • TREEIFY_THRESHOLD: when the number of nodes in a bucket reaches this value, the list may be converted into a tree;
  • UNTREEIFY_THRESHOLD: when the number of nodes in a bucket drops below this value, the tree is converted back into a list;
  • MIN_TREEIFY_CAPACITY: if the number of buckets is below this value, the table is expanded first instead of converting a list into a tree;
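
All of this machinery is invisible to callers. As a small sketch (the BadKey class below is made up for illustration, it is not from the JDK), a map keeps working correctly even when every key lands in the same bucket and the chain is eventually treeified:

```java
import java.util.HashMap;

public class CollidingKeyDemo {
    // A deliberately bad key: constant hashCode forces every entry into one
    // bucket; Comparable lets the treeified bin order the keys.
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }  // all keys collide
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int compareTo(BadKey other) { return Integer.compare(id, other.id); }
    }

    public static void main(String[] args) {
        HashMap<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 100; i++)       // well past TREEIFY_THRESHOLD (8)
            map.put(new BadKey(i), i);
        System.out.println(map.size());     // 100
        System.out.println(map.get(new BadKey(57))); // 57, found via the tree
    }
}
```

Lookups degrade from O(1) toward O(log n) here rather than O(n), which is exactly what treeification buys.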

The put method: Key & Value

The insertion interface:

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

The put method delegates to the private method putVal. Note that the key's hash is not its raw hashCode: the final hash is hashCode ^ (hashCode >>> 16).

When the hash is mapped to a bucket position, only the low bits of the hash are used. Without the XOR, a group of keys differing only in their high bits would all gather in the same bucket. (This happens, for example, when the number of buckets is relatively small and the keys are Float values of consecutive integers.)
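
A small sketch of this spreading step. Float.hashCode returns the raw IEEE-754 bits, which for small consecutive integers vary only in the high bits, so their low 16 bits are all zero; after the XOR the indices spread out:

```java
public class HashSpreadDemo {
    // the same spreading used by HashMap.hash(): fold high bits into low bits
    static int spread(int h) { return h ^ (h >>> 16); }

    public static void main(String[] args) {
        int mask = (1 << 16) - 1;  // index mask for an (exaggeratedly large) table
        for (float f = 1.0f; f <= 4.0f; f += 1.0f) {
            int h = Float.hashCode(f);  // raw bit pattern of the float
            System.out.println(f + ": raw index=" + (h & mask)
                    + ", spread index=" + (spread(h) & mask));
        }
        // raw indices are all 0; spread indices are all distinct
    }
}
```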

The insertion logic:

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;

        // code segment 1
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);            
        else {
            Node<K,V> e; K k;
            // code segment 2
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // code segment 3
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                // code segment 4
                for (int binCount = 0; ; ++binCount) {
                    // code segment 4.1
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    // code segment 4.2
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            // code segment 5
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        // code segment 6
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
  • The leading if handles the initial case where the bucket array has not yet been allocated;
  • Code segment 1: i = (n - 1) & hash computes the bucket position for the hash; since n is a power of 2, this is an efficient modulo operation. If that slot is empty, a new Node is created and placed there and we are done; otherwise the conflicting node already there is denoted p;
  • Code segment 2: if the incoming key equals node p's key, the new value should go into this existing node, denoted e;
  • Code segment 3: if p is a tree node, the key & value are inserted into the tree;
  • Code segment 4: p is the head of a linked list (possibly a single node); in both cases insertion is done by scanning the list;
  • Code segment 4.1: if the tail of the list is reached, a new node is appended, and if necessary the list is converted into a tree;
  • Code segment 4.2: if a node with an equal key is found in the list, it is handled the same way as in code segment 2;
  • Code segment 5: the value is stored into node e;
  • Code segment 6: if the new size exceeds the threshold, the number of buckets is adjusted; the resize policy is discussed below.
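
The effects of these branches are observable through the public API. A quick sketch using only standard HashMap methods (putIfAbsent is the public entry point for the onlyIfAbsent=true path):

```java
import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        System.out.println(map.put("a", 1));         // null: slot was empty (code segment 1)
        System.out.println(map.put("a", 2));         // 1: existing node found, old value returned (segment 5)
        System.out.println(map.putIfAbsent("a", 3)); // 2: onlyIfAbsent, existing value kept
        System.out.println(map.get("a"));            // 2
    }
}
```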

remove method

Once you understand the put method, remove is easy, so we go directly to the private method removeNode.

public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}
    
final Node<K,V> removeNode(int hash, Object key, Object value,
                               boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    
    // code segment 1
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        
        // code segment 2
        Node<K,V> node = null, e; K k; V v;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
            
        // code segment 3
        else if ((e = p.next) != null) {
            // code segment 3.1
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                // code segment 3.2
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        
        // code segment 4
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {                 
            // code segment 4.1
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            // code segment 4.2
            else if (node == p)
                tab[index] = node.next;
            // code segment 4.3
            else
                p.next = node.next;
            ++modCount;
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}
  • Code segment 1: the condition checks whether the bucket for this hash is empty; if so, the key is certainly not in the map; otherwise its first node is denoted p;
  • Code segment 2: if node p's key equals the key parameter, the node to remove has been found, denoted node;
  • Code segment 3: otherwise, scan the remaining nodes in the bucket;
  • Code segment 3.1: if the bucket holds a tree, run the tree lookup logic;
  • Code segment 3.2: otherwise, run the linked-list scan logic;
  • Code segment 4: if a node was found, try to remove it;
  • Code segment 4.1: if it is a tree node, run the tree deletion logic;
  • Code segment 4.2: if node is the first node of the list, point the bucket at node.next;
  • Code segment 4.3: otherwise, unlink node from the middle of the list.
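
The matchValue parameter also has a public entry point: remove(key) ignores the value, while remove(key, value) only removes when the current value matches. A small sketch:

```java
import java.util.HashMap;

public class RemoveDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        System.out.println(map.remove("a"));    // 1: node found and unlinked
        System.out.println(map.remove("b"));    // null: bucket scan finds nothing
        map.put("a", 1);
        System.out.println(map.remove("a", 2)); // false: matchValue fails, entry kept
        System.out.println(map.remove("a", 1)); // true: value matches, entry removed
    }
}
```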

rehash

A rehash reallocates the buckets and re-hashes the existing nodes into their new bucket positions.

First look at the two member variables related to the number of buckets:

final float loadFactor;
int threshold;
  • loadFactor: the load factor, set when the HashMap is created; it is the upper limit on the ratio of entries in the map to the number of buckets. Once the map's load reaches this value, the number of buckets must be expanded;
  • threshold: when the map's size reaches this value, the buckets must be expanded; it is essentially 桶的容量 (bucket capacity) * loadFactor, apparently cached so that related operations need not recompute it.

The bucket expansion strategy is shown in the following function: for a required capacity cap, the actual capacity is the smallest power of 2 that is greater than or equal to cap.
Consequently, each expansion grows the capacity by a factor of 2.

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
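
A self-contained copy of this function (with MAXIMUM_CAPACITY inlined to its JDK value of 1 << 30) makes the behavior easy to verify:

```java
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same bit-smearing trick as the JDK: the OR cascade copies the highest
    // set bit of (cap - 1) into every lower position, so n + 1 is the next
    // power of two. Starting from cap - 1 keeps exact powers of two as-is.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1; n |= n >>> 2; n |= n >>> 4; n |= n >>> 8; n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16: already a power of two
        System.out.println(tableSizeFor(17)); // 32
    }
}
```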

Here is the actual expansion logic:

final Node<K,V>[] resize() {
    
     // computation of oldTab, oldCap, and newCap omitted here

    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                
                // branch 1
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // branch 2
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                // branch 3
                else { // preserve order
                    // linked-list splitting logic omitted here
                }
            }
        }
    }
    return newTab;
}
  • First, allocate the new bucket array;
  • Then scan the old buckets and migrate their elements;
  • Branch 1: the old bucket holds a single node; place it at its position in the new array;
  • Branch 2: the bucket holds a tree; run the tree-splitting logic;
  • Branch 3: the bucket holds a linked list; run the list-splitting logic.

Since the new bucket count is 2 times the old one, each old bucket corresponds to exactly two new buckets, and different old buckets never interfere with each other. That is why the migration logic above never needs to check whether a new bucket already contains nodes.
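
This splitting invariant is a direct consequence of the power-of-two masking and can be checked numerically (splitsCleanly is a made-up helper name for illustration): the new index differs from the old one by at most the single extra hash bit exposed by the wider mask.

```java
import java.util.Random;

public class ResizeIndexDemo {
    // After doubling from oldCap to 2*oldCap, a node's new index is either
    // its old index or old index + oldCap, decided by bit (hash & oldCap).
    static boolean splitsCleanly(int hash, int oldCap) {
        int oldIdx = hash & (oldCap - 1);
        int newIdx = hash & (2 * oldCap - 1);
        return newIdx == oldIdx || newIdx == oldIdx + oldCap;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int oldCap = 16;
        for (int i = 0; i < 1000; i++) {
            if (!splitsCleanly(rnd.nextInt(), oldCap))
                throw new AssertionError("cannot happen for power-of-two capacities");
        }
        System.out.println("every node lands at oldIdx or oldIdx + oldCap");
    }
}
```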

As you can see, a rehash is expensive. It is best to set a suitable capacity at initialization, if possible, to avoid rehashing altogether.

Finally, although the code above does not show it: over a HashMap's lifetime, the number of buckets only ever grows, never shrinks.

Iterator

At the heart of all HashMap iterators is HashIterator:

abstract class HashIterator {
    Node<K,V> next;        // next entry to return
    Node<K,V> current;     // current entry
    int expectedModCount;  // for fast-fail
    int index;             // current slot

    final Node<K,V> nextNode() {
        Node<K,V>[] t;
        Node<K,V> e = next;
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        if (e == null)
            throw new NoSuchElementException();
        if ((next = (current = e).next) == null && (t = table) != null) {
            do {} while (index < t.length && (next = t[index++]) == null);
        }
        return e;
    }
}

For simplicity, only the code around nextNode is kept. The principle is simple: next points to the next node, which necessarily sits in some bucket (index is the current bucket position). If there are further nodes in the same bucket, they can be found by following next.next, whether the bucket holds a linked list or a tree. Otherwise, scan forward to the next non-empty bucket.
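
The expectedModCount check in nextNode is what makes HashMap iteration fail fast: structurally modifying the map mid-iteration triggers a ConcurrentModificationException on the next call to next(). A small sketch:

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;

public class FailFastDemo {
    static boolean failsFast() {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        Iterator<String> it = map.keySet().iterator();
        it.next();
        map.put("c", 3);  // structural modification: ++modCount inside putVal
        try {
            it.next();    // modCount != expectedModCount
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsFast()); // true
    }
}
```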

On top of this node iterator, all the iterators visible to users are implemented:

final class KeyIterator extends HashIterator
    implements Iterator<K> {
    public final K next() { return nextNode().key; }
}

final class ValueIterator extends HashIterator
    implements Iterator<V> {
    public final V next() { return nextNode().value; }
}

final class EntryIterator extends HashIterator
    implements Iterator<Map.Entry<K,V>> {
    public final Map.Entry<K,V> next() { return nextNode(); }
}

Views

Part of the KeySet code. KeySet is not an independent Set but a view; its methods access the data inside the enclosing HashMap.

final class KeySet extends AbstractSet<K> {
    public final int size()                 { return size; }
    public final void clear()               { HashMap.this.clear(); }
    public final Iterator<K> iterator()     { return new KeyIterator(); }
    public final boolean contains(Object o) { return containsKey(o); }
    public final boolean remove(Object key) {
        return removeNode(hash(key), key, null, false, true) != null;
    }
}

EntrySet and Values are similar to KeySet and are not repeated here.
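
That these views are live, not copies, is easy to observe with standard API calls: removals through the view go through removeNode, and later insertions into the map are visible through the view.

```java
import java.util.HashMap;
import java.util.Set;

public class KeySetViewDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        Set<String> keys = map.keySet();
        keys.remove("a");                         // delegates to removeNode
        System.out.println(map.containsKey("a")); // false: map was modified
        map.put("c", 3);
        System.out.println(keys.contains("c"));   // true: view reflects the map
    }
}
```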

Takeaways

1. Keys and values are stored in nodes;
2. A node may be a linked-list node or a tree node;
3. Nodes are assigned to buckets based on the key's hash value;
4. If a bucket holds multiple nodes, they form either a linked list or a tree;
5. The load factor limits the ratio of nodes to buckets; the number of buckets expands when necessary;
6. The number of buckets must be a power of 2; reallocating the buckets is called a rehash, and it is a very expensive operation.

Origin www.cnblogs.com/longhuihu/p/11165744.html