Understanding HashMap in depth: its clever bit manipulation tricks

Capacity is always a power of 2

We all know that the capacity of a HashMap is always a power of 2. HashMap has no field named capacity; the capacity is implicitly the length of the table array.
When the user supplies an arbitrary initial capacity, the constructor sets this.threshold = tableSizeFor(initialCapacity), which computes the smallest power of 2 greater than or equal to the given capacity. For example, 10 yields 16, and 16 also yields 16.

    /**
     * Returns a power of two size for the given target capacity.
     */
    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

Here is how it works:

  • Ignore the leading subtraction and the final addition for now, and focus on the five shift-and-OR operations. Assume n in binary is 000000...01XXXXXXXXXX...XXXXX, where 1 is the leftmost 1 bit and each X is an unknown bit (0 or 1). The shift-OR steps then proceed as shown in the figure:


  • At the start we only know that n has a single 1 bit at the front (the leftmost 1). After n |= n >>> 1, the leading 1*2 = 2 bits (counting from that leftmost 1) are guaranteed to be 1; after n |= n >>> 2, the leading 2*2 = 4 bits are 1. Continuing this pattern, after n |= n >>> 16 the leading 16*2 = 32 bits would be 1. Since an int has only 4 bytes, i.e., 32 bits, after the five shift-OR operations every bit to the right of the leftmost 1 is certainly 1. If n is relatively small, all bits to the right of the leftmost 1 may already be 1 before all five operations have run.
  • The final n + 1 exists because, after the ORs, every bit from the leftmost 1 down to the last bit is 1, giving the form 000111...11111. Adding 1 carries through all those 1s and produces a power of 2 of the form 001000...00000. Since the function is meant to return a power of 2 as the capacity, the role of the +1 is to turn the result of the five ORs into a power of 2.
  • The initial int n = cap - 1 exists so that when cap is already a power of 2, the function returns cap itself. Take cap = 1000 (binary): without the subtraction, the ORs would produce 1111, and adding 1 would give 10000, double the requested value. With the subtraction, n becomes 0111; the ORs leave it at 0111, and adding 1 gives back 1000. So the role of the -1 is to return cap unchanged when it is exactly a power of 2.
  • Finally, (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1 caps the result so that the capacity can never exceed MAXIMUM_CAPACITY. MAXIMUM_CAPACITY is defined as 1 << 30, i.e., 01000...000; because the most significant bit of an int is the sign bit, this is the largest power of 2 in the positive int range. If n after the five ORs is at least 1 << 30 (it must then be 01111...111), adding 1 would overflow. The (n < 0) ? 1 branch covers cap = 0: n becomes -1, which is all 1 bits and stays -1 through the ORs, so the function returns 1.
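To try the steps above, here is a small self-contained sketch. It copies tableSizeFor from the JDK 8 excerpt, defines MAXIMUM_CAPACITY itself as 1 << 30 (assumed to match HashMap's constant), and adds a hypothetical tableSizeForWithoutMinusOne variant purely to show what the cap - 1 step prevents. The trace method prints n in binary after each shift-OR so you can watch the 1 bits spread to the right.

```java
public class TableSizeForDemo {
    // Assumption: same value as HashMap.MAXIMUM_CAPACITY in JDK 8.
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copy of the tableSizeFor algorithm quoted above.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    // Hypothetical variant WITHOUT the initial "cap - 1", only for comparison.
    static int tableSizeForWithoutMinusOne(int cap) {
        int n = cap;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    // Prints n in binary after each shift-OR step.
    static void trace(int cap) {
        int n = cap - 1;
        System.out.println("cap=" + cap + ", n = cap - 1: " + Integer.toBinaryString(n));
        for (int shift = 1; shift <= 16; shift <<= 1) {
            n |= n >>> shift;
            System.out.println("  after n |= n >>> " + shift + ": " + Integer.toBinaryString(n));
        }
    }

    public static void main(String[] args) {
        trace(1000);
        int[] caps = {0, 1, 8, 10, 16, 17, (1 << 30) + 1};
        for (int cap : caps) {
            System.out.println(cap + " -> " + tableSizeFor(cap)
                    + " (without cap - 1: " + tableSizeForWithoutMinusOne(cap) + ")");
        }
    }
}
```

For instance, 10 maps to 16 and 16 stays 16, while the variant without cap - 1 turns 16 into 32.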

When indexing into the table, use & instead of modulo

Throughout the HashMap source you will see indexing such as first = tab[(n - 1) & hash] (where n is the length of table, i.e., the capacity, and hash is the key's hash value). Normally an index would be computed with hash % n, which yields a number in the range 0 ~ n-1 usable as an array subscript. However, because the capacity is always a power of 2, n here is also a power of 2.

When n is a power of 2, (n - 1) & hash gives the same result as hash % n for any non-negative hash. (For a negative hash, Java's % returns a negative number, whereas (n - 1) & hash still lands in 0 ~ n-1, matching Math.floorMod(hash, n).)
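For a non-negative hash, the equivalence can be written out as a short derivation. The notation is introduced here for illustration: $b_i$ are the bits of the hash $h$, and $n = 2^k$ with $0 \le k \le 30$ (bit 31 of a non-negative int is 0).

```latex
\begin{aligned}
h &= \sum_{i=0}^{30} b_i 2^i
   = 2^k \sum_{i=k}^{30} b_i 2^{i-k} + \sum_{i=0}^{k-1} b_i 2^i,
   \qquad 0 \le \sum_{i=0}^{k-1} b_i 2^i \le 2^k - 1, \\
h \bmod 2^k &= \sum_{i=0}^{k-1} b_i 2^i = h \mathbin{\&} (2^k - 1).
\end{aligned}
```

The first term is a multiple of $2^k$ and the second lies in $[0, 2^k - 1]$, so the second term is the remainder. Since $2^k - 1$ has 1s exactly in bits $0$ to $k-1$, ANDing with it keeps exactly those bits.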

  • Suppose n is 1000 (binary); subtracting 1 gives 0111. If hash is ...QWERXYZ, then after the & operation it becomes 000...000XYZ, i.e., just XYZ.
  • In terms of range: the required index range is 0 ~ 7, and the three bits XYZ also range over 0 ~ 7.
  • In principle, 0111 acts as a mask: after & with 0111, the last three bits are kept and all other bits become 0.
  • In terms of bit weights: the least significant bit has weight 1 ($2^0$), the second-lowest bit has weight 2 ($2^1$), and so on. ANDing with 0111 (1000 minus 1) discards every bit with weight >= $2^3$ and keeps only the bits with weights $2^2$, $2^1$ and $2^0$. That is exactly the remainder operation!
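A quick way to check the mask argument is to print the three candidate index computations side by side. This is a sketch; indexFor is a helper name chosen for this example (the JDK 8 source simply inlines (n - 1) & hash), and the sample hash values are arbitrary.

```java
public class MaskVsModDemo {
    // (n - 1) & hash, as used throughout HashMap; n must be a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 8; // 0b1000, so n - 1 = 0b0111 masks away all but the low 3 bits
        int[] hashes = {0, 5, 13, 77, 1_000_003, -1, -13};
        for (int h : hashes) {
            System.out.printf("hash=%d  (n-1)&hash=%d  hash%%n=%d  floorMod=%d%n",
                    h, indexFor(h, n), h % n, Math.floorMod(h, n));
        }
    }
}
```

For -13 the three columns read 3, -5 and 3: the mask agrees with Math.floorMod but not with Java's %, which is one reason & is a safe choice even when hashCode() is negative.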

Splitting a bucket list in resize()

Part of the source is excerpted below:

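    Node<K,V> loHead = null, loTail = null;
    Node<K,V> hiHead = null, hiTail = null;
    Node<K,V> next;
    do {
        // Inside the do block, e is appended to either the low list or the high list.
        // next only saves e's successor.
        next = e.next;
        // After doubling, one more bit can affect the element's index in the new table,
        // so check whether that bit is 0.
        // If the bit is 0, the index in the new table equals the index in the old table.
        if ((e.hash & oldCap) == 0) {
            // loTail == null means the low list is still empty; after the first append,
            // head and tail are never null again.
            if (loTail == null)
                loHead = e;
            else
                loTail.next = e; // append e after the current tail
            loTail = e;          // update tail
        }
        // If the bit is 1, the index in the new table differs from the old one.
        else {
            if (hiTail == null)
                hiHead = e;
            else
                hiTail.next = e;
            hiTail = e;
        }
    } while ((e = next) != null); // advance e to the saved next, since the do block has processed e
    // If that bit is 1 for every element in the bucket, they all go to the high list,
    // and the low list naturally stays null.
    if (loTail != null) {
        loTail.next = null; // cut the tail's old link into the original chain
        newTab[j] = loHead;
    }
    // If that bit is 0 for every element in the bucket, they all go to the low list,
    // and the high list naturally stays null.
    if (hiTail != null) {
        hiTail.next = null;
        newTab[j + oldCap] = hiHead;
    }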
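The excerpt above depends on the surrounding resize() code (e, j, oldCap, newTab). The following is a minimal runnable sketch of the same lo/hi split, assuming a hypothetical stripped-down Node class that carries only a hash, a label and a next pointer; the real HashMap.Node<K,V> also holds the key and value. The four hashes all end in 0101, so they share bucket 5 at capacity 16.

```java
public class SplitDemo {
    // Hypothetical minimal node; the real HashMap.Node<K,V> also holds key and value.
    static final class Node {
        final int hash;
        final String label;
        Node next;

        Node(int hash, String label, Node next) {
            this.hash = hash;
            this.label = label;
            this.next = next;
        }
    }

    // Splits the old bucket starting at e into newTab[j] (lo) and newTab[j + oldCap] (hi),
    // mirroring the lo/hi loop in the resize() excerpt.
    static void split(Node e, Node[] newTab, int j, int oldCap) {
        Node loHead = null, loTail = null;
        Node hiHead = null, hiTail = null;
        Node next;
        do {
            next = e.next;
            if ((e.hash & oldCap) == 0) {   // extra bit is 0: stays at index j
                if (loTail == null) loHead = e; else loTail.next = e;
                loTail = e;
            } else {                        // extra bit is 1: moves to j + oldCap
                if (hiTail == null) hiHead = e; else hiTail.next = e;
                hiTail = e;
            }
        } while ((e = next) != null);
        if (loTail != null) { loTail.next = null; newTab[j] = loHead; }
        if (hiTail != null) { hiTail.next = null; newTab[j + oldCap] = hiHead; }
    }

    // Joins the labels of a list with spaces.
    static String labels(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node p = head; p != null; p = p.next) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(p.label);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int oldCap = 16, j = 5;
        Node bucket = new Node(5, "A", new Node(21, "B", new Node(37, "C", new Node(53, "D", null))));
        Node[] newTab = new Node[2 * oldCap];
        split(bucket, newTab, j, oldCap);
        System.out.println("newTab[" + j + "]  = " + labels(newTab[j]));
        System.out.println("newTab[" + (j + oldCap) + "] = " + labels(newTab[j + oldCap]));
    }
}
```

Running it places A and C (hashes 5 and 37, bit 16 clear) in newTab[5] and B and D (hashes 21 and 53, bit 16 set) in newTab[21], each list keeping its original relative order.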

resize() expands the table and rehashes. Each element in a hash bucket is examined individually to decide whether it should stay in its original bucket or move to a new one.

Note, however, that the new bucket index is computed directly as j + oldCap (j is the original bucket index, oldCap the old capacity). Here is why:

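  • Suppose the old capacity is 0b10000 (16). The possible table indices then range over 0b0000 - 0b1111, i.e., only the last 4 bits, 0b????, affect which table index an element lands in.
  • Suppose four elements all have hash values whose last 4 bits are XYZQ. Because the capacity is currently 16, they are all placed in the same hash bucket (table index 0bXYZQ).
  • After resize() expands the table, the new capacity becomes 0b100000 (32), so now the last 5 bits, 0b?????, affect the table index. Compared with before, only the 5th bit from the right can change the result.
  • So if this critical bit is 0, the element stays at its original table index; if it is 1, the element moves to the new index original table index + old capacity.
  • As the figure shows, the difference between the original index and the new index is exactly the old capacity, 0b10000.
  • The figure uses colors to distinguish the elements. Note that after the list is split, the elements keep their previous relative order; for example, the blue element is still in front of the yellow one, because the do-while loop processes elements in their original order.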
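The claim that the new index is either j or j + oldCap can be checked mechanically. This sketch, with hypothetical helper names directIndex, shortcutIndex and shortcutHolds, compares the shortcut against recomputing (2 * oldCap - 1) & hash from scratch for random hashes and every power-of-two oldCap up to 1 << 29 (so the doubled capacity stays within 1 << 30).

```java
import java.util.Random;

public class NewIndexDemo {
    // Index recomputed from scratch for the doubled table.
    static int directIndex(int hash, int newCap) {
        return (newCap - 1) & hash;
    }

    // Index via the shortcut: stay at j, or move to j + oldCap.
    static int shortcutIndex(int hash, int oldCap) {
        int j = (oldCap - 1) & hash;                  // bucket index in the old table
        return (hash & oldCap) == 0 ? j : j + oldCap; // the one extra bit decides
    }

    // Compares the shortcut with direct recomputation for pseudo-random hashes.
    static boolean shortcutHolds(int oldCap, int samples, long seed) {
        Random rnd = new Random(seed);
        for (int i = 0; i < samples; i++) {
            int h = rnd.nextInt();
            if (directIndex(h, oldCap << 1) != shortcutIndex(h, oldCap)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (int oldCap = 1; oldCap <= (1 << 29); oldCap <<= 1) {
            System.out.println("oldCap=" + oldCap + " shortcut holds: "
                    + shortcutHolds(oldCap, 100_000, 42L));
        }
    }
}
```

The shortcut works because (2 * oldCap - 1) & hash is just the old low bits plus the single bit oldCap, and those two parts never overlap, so OR-ing them is the same as adding oldCap.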

The static function hash()

This function computes the hash value of a key.

    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
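  • It returns the XOR of the key's hashCode and that same hashCode unsigned-right-shifted by 16 bits.
  • From the table-indexing operation described earlier, when the capacity is small, only the few low bits of the hash value can ever affect the computed table index, which can lead to more hash collisions.
  • So the source uses (h = key.hashCode()) ^ (h >>> 16). The high 16 bits of the hash stay the same (the unsigned right shift fills the top 16 bits with 0, and XOR with 0 leaves any bit unchanged), while the low 16 bits are XORed with the high 16 bits and may change. The shift is 16 because an int has 32 bits in total, so shifting by half of that, 16 bits, brings the high half down onto the low half.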
    Take an 8-bit example: shift right (unsigned) by half the bits, then XOR. The first half of the bits stays the same, and the second half is perturbed by the first half.
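The effect of the mixing step can be seen with a small sketch. spread applies the same expression as HashMap.hash() to a raw int; spread8 is a hypothetical 8-bit analogue of the example above (shift by 4, half of 8, then XOR); the hashCode values are made up so that they differ only in their high 16 bits.

```java
public class HashSpreadDemo {
    // Same mixing step as HashMap.hash() in JDK 8, applied to an int hashCode.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Hypothetical 8-bit analogue: shift right by half of 8 bits, then XOR.
    static int spread8(int x) {
        x &= 0xFF;
        return x ^ (x >>> 4);
    }

    // Formats the low 8 bits of x as a zero-padded binary string.
    static String bits8(int x) {
        return String.format("%8s", Integer.toBinaryString(x & 0xFF)).replace(' ', '0');
    }

    public static void main(String[] args) {
        int x = 0b1011_0000;
        System.out.println(bits8(x) + " -> " + bits8(spread8(x)));

        int n = 16; // small table: only the low 4 bits choose the bucket
        // Hypothetical hashCodes that differ only in their high 16 bits.
        int[] codes = {0x0001_0000, 0x0002_0000, 0x0003_0000, 0x0004_0000};
        for (int h : codes) {
            System.out.printf("h=0x%08x  raw index=%d  spread index=%d%n",
                    h, (n - 1) & h, (n - 1) & spread(h));
        }
    }
}
```

Without the mixing step all four codes land in bucket 0; with it they land in buckets 1 to 4. In the 8-bit case, 10110000 becomes 10111011: the high half is unchanged and the low half picks up the high half's bits.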

Other

Based on the JDK 8 source.


Source: blog.csdn.net/anlian523/article/details/103812105