Java source code Series 2 - HashMap

The HashMap source is long and complex; this is just a light analysis of the commonly used parts of the code. My knowledge is limited, so corrections are welcome.

Computing the hash value

Prerequisite: bitwise operations

The bitwise XOR operator ^: 1 ^ 1 = 0, 0 ^ 0 = 0, 1 ^ 0 = 1. Identical bits give 0, differing bits give 1. Bitwise XOR applies this to every bit of the binary representation.

   1111 0000 1111 1110
^  1111 1111 0000 1111
______________________
   0000 1111 1111 0001

The zero-fill (unsigned) right shift operator >>>: shifts the left operand right by the number of bit positions given by the right operand, filling the vacated high-order bits with zeros.

        1110 1101 1001 1111
>>> 4  
___________________________
        0000 1110 1101 1001
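Both operators can be checked directly in Java. A small standalone sketch (the class name is illustrative):

```java
public class BitOpsDemo {
    public static void main(String[] args) {
        int a = 0b1111_0000_1111_1110;
        int b = 0b1111_1111_0000_1111;
        // Bitwise XOR: identical bits -> 0, differing bits -> 1
        System.out.println(Integer.toBinaryString(a ^ b)); // 111111110001

        int c = 0b1110_1101_1001_1111;
        // Unsigned right shift: vacated high bits are filled with zeros
        System.out.println(Integer.toBinaryString(c >>> 4)); // 111011011001
    }
}
```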

The perturbation function

Why perturb the hash?

Theoretically, the hash value is an int. If we used it directly as an array index, consider that a 32-bit signed int ranges from -2147483648 to 2147483647, a mapping space of roughly 4 billion values. An array that large will not fit in memory, so the hash value cannot be used directly. Instead, the hash is first reduced modulo the array length, and the remainder is used as the array index.

Since only the last few bits are used, the chance of hash collisions increases greatly. This is where the perturbation function comes in.

The perturbation calculation

First hashCode() is called to obtain the hash value, then the perturbation is applied.

The value is shifted right by 16 bits, exactly half of 32 (int is 32-bit), and the upper half is XORed with the lower half. This mixes the high and low bits of the original hash code, increasing the randomness of the low-order bits. After mixing, the low bits carry features of the high bits, so the high-order information is preserved in disguised form as well.

static final int hash(Object key) {
  int h;
  return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
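To see the perturbation in isolation, the same function can be copied into a standalone class (HashDemo and the sample key are just illustrative):

```java
public class HashDemo {
    // Same perturbation as java.util.HashMap.hash(Object)
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        String key = "example";
        int raw = key.hashCode();
        int mixed = hash(key);
        // The low 16 bits of `mixed` now also depend on the high 16 bits of `raw`
        System.out.println(Integer.toBinaryString(raw));
        System.out.println(Integer.toBinaryString(mixed));
        System.out.println(hash(null)); // null keys always hash to 0
    }
}
```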

Computing the index by modulo

When computing the index, the hash value is reduced using the table length, so that the resulting index falls within the table's bounds. For a table of length n:

i = (n - 1) & hash
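A quick sanity check of this masking trick (the class name and values are illustrative):

```java
public class IndexDemo {
    public static void main(String[] args) {
        int n = 16; // table length, always a power of two
        int hash = 0b1010_0011_0110_1111_0101;
        // Masking with n-1 keeps only the low bits: equivalent to hash % n
        // for non-negative hashes, but always yields a valid index even
        // when the hash is negative.
        int i = (n - 1) & hash;
        System.out.println(i);            // 5
        System.out.println((n - 1) & -7); // 9, while -7 % 16 would be -7
    }
}
```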

Why is the HashMap array length always a power of 2?

Because then (array length - 1) is exactly a "low-bit mask". The & operation zeroes out the high bits of the hash value, leaving only the low bits to use as the array index. Take the default initial length of 16 as an example: 16 - 1 = 15, which is 0000 1111 in binary. ANDing a hash value with it works as follows:

    1010 0011 0110 1111 0101
&   0000 0000 0000 0000 1111
____________________________
    0000 0000 0000 0000 0101

What is stored in the table

The HashMap table does not store raw values directly; each entry is wrapped in a Node object instance, and the Node is what is stored in the table.

A Node object holds: hash, key, value, and next (the pointer used to chain nodes when hashes collide).
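A simplified sketch of that structure (the real java.util.HashMap.Node also implements Map.Entry&lt;K,V&gt; and overrides equals/hashCode/toString):

```java
// Simplified sketch of java.util.HashMap.Node
class Node<K, V> {
    final int hash;   // perturbed hash of the key, cached
    final K key;
    V value;
    Node<K, V> next;  // next node in the chain on hash collision

    Node(int hash, K key, V value, Node<K, V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
}
```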

Theoretical maximum capacity

int MAXIMUM_CAPACITY = 1 << 30;

2 to the power of 30.
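A quick check of this constant, and why it stops at 1 << 30 (class name illustrative):

```java
public class CapacityDemo {
    public static void main(String[] args) {
        int max = 1 << 30;
        System.out.println(max); // 1073741824
        // Shifting one bit further overflows int's positive range,
        // which is why the cap is 1 << 30 rather than 1 << 31.
        System.out.println(1 << 31); // -2147483648 (Integer.MIN_VALUE)
    }
}
```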

Load factor

The load factor is used to compute the load capacity (threshold): the maximum number of Nodes that can be held before resizing. With the current table length length and load factor loadFactor:

The load capacity is calculated as:

threshold = length * loadFactor

The default load factor is 0.75. That is, when the number of Nodes reaches 75% of the current table length, the table is expanded; otherwise the likelihood of hash collisions would keep rising. The load factor's role is to strike a balance between space and time efficiency.

float DEFAULT_LOAD_FACTOR = 0.75f
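Putting the two defaults together (class name illustrative):

```java
public class ThresholdDemo {
    public static void main(String[] args) {
        int length = 16;          // DEFAULT_INITIAL_CAPACITY
        float loadFactor = 0.75f; // DEFAULT_LOAD_FACTOR
        int threshold = (int) (length * loadFactor);
        // With the defaults, the 13th insertion (size > 12) triggers
        // a resize to length 32.
        System.out.println(threshold); // 12
    }
}
```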

What happens during expansion (resize)

  1. Create a new empty Entry array twice the length of the old one. Expansion happens when the number of Nodes exceeds the load capacity.

old << 1: shifting left by one bit is equivalent to old * 2.

  2. Re-hash

    Traverse the old Entry array and re-hash all Entries into the new array.

    Why re-hash? Because after the length doubles, the computed index changes (the array index is the hash code reduced modulo the array length).

    This spreads out chains that previously collided at one index, making the array sparser.

final Node<K,V>[] resize() {
    // save the existing table
    Node<K,V>[] oldTab = table;
    // save the existing table length
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    // save the existing load capacity (threshold)
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // if the current capacity already exceeds the maximum, we cannot
        // expand any further; collisions will just have to happen
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // shift the old capacity left by one bit, i.e. oldCap * 2
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                    oldCap >= DEFAULT_INITIAL_CAPACITY)
            // double the load capacity as well
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // load capacity is 0: compute it from the array size and load factor
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                    (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // traverse every element in the old array and re-hash it
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                // clear the old slot
                oldTab[j] = null;
                if (e.next == null)
                    // assign the node to its new index
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // optimize the chain
                    // split the original chain into two chains
                    // chain 1 goes to the low position (the original index)
                    Node<K,V> loHead = null, loTail = null;
                    // chain 2 goes to the high position (original index + old array length)
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // chain 1
                        // see the third reference for how this bit trick works
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // chain 2
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // chain 1 is placed at the original index
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // chain 2 is placed at the original index plus the old array length
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
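The split condition (e.hash & oldCap) == 0 works because doubling the capacity adds exactly one bit to the index mask, and that new bit is precisely the oldCap bit of the hash. A standalone sketch (values chosen for illustration):

```java
public class SplitDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        int h1 = 0b0010_0101; // the oldCap bit (value 16) is clear
        int h2 = 0b0011_0101; // the oldCap bit (value 16) is set

        // Old table: both hashes land in the same bucket
        System.out.println(h1 & (oldCap - 1)); // 5
        System.out.println(h2 & (oldCap - 1)); // 5

        // New table: the extra mask bit equals (hash & oldCap),
        // so h1 stays at index j and h2 moves to j + oldCap
        System.out.println(h1 & (newCap - 1)); // 5
        System.out.println(h2 & (newCap - 1)); // 21 = 5 + 16
    }
}
```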

Treeification

When a chain grows too long, it is converted into a red-black tree.

A chain is treeified when its length exceeds the treeify threshold (TREEIFY_THRESHOLD, 8); as treeifyBin shows, if the table length is still below MIN_TREEIFY_CAPACITY, the table is resized instead of treeifying.

final void treeifyBin(Node<K,V>[] tab, int hash) {
  int n, index; Node<K,V> e;
  if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
    resize();
  else if ((e = tab[index = (n - 1) & hash]) != null) {
    ...
  }
}

Why treeify?

Essentially, it is a security concern. Chain length affects query performance: if someone maliciously engineers hash collisions, they can mount a hash-collision denial-of-service attack, where the server's CPU is consumed by long chain lookups, making the service slow or unavailable.

Source series

Java source code Series 1 - ArrayList

Java source code Series 2 - HashMap

Java source code Series 3 - LinkedHashMap

References

Java 8 series: understanding HashMap anew

How is the hash method in the JDK HashMap source designed? (answer by Fat Jun)

Detailed HashMap source code analysis (JDK 1.8)

This article first appeared on my personal blog, chaohang.top.

Author: Zhang Chao

Please credit the source when reposting.

You are welcome to follow my WeChat official account to get updates first.


Origin juejin.im/post/5e5273fde51d4526d87c6de4