JDK source code: how HashMap's hash() method works

Disclaimer: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/GoSaint/article/details/88977196

Let's start with a piece of code and see how HashMap computes the hash value.

static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

The code above is known as the "perturbation function". As everyone knows, key.hashCode() calls the hash function that comes with the key's own type and returns an int hash code. In theory the hash code is an int, and if it were used directly as the subscript into HashMap's main array, then, since a 32-bit signed int ranges from -2147483648 to 2147483647, the mapping space would add up to about 4 billion slots. As long as the hash function maps keys evenly and loosely, collisions would be hard to come by in ordinary use. The problem is that an array with 4 billion entries does not fit in memory. Consider that, before any expansion, the initial size of HashMap's array is only 16. So the hash code cannot be used directly. It must first be taken modulo the array length, and the remainder is used as the array subscript. In the JDK 7 source this modulo step is done in the indexFor() function; in JDK 8 it is written inline, as in putVal() below.

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        ......
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
            // n is the length of the array
        if ((p = tab[i = (n - 1) & hash]) == null)
            // p = tab[i = (n - 1) & hash] computes the element's position in the array
        ......
} 

The putVal code is fairly simple: it performs an "and" operation between the hash value and the array length minus 1.
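To check that the "and" operation really stands in for the modulo, here is a minimal sketch. The class MaskIndexDemo and its helper indexFor are illustrative names, not the JDK source (the helper only borrows the name of the JDK 7 method), and the equivalence assumes the array length is a power of two. Math.floorMod is used for the comparison because the plain % operator can return a negative remainder for a negative hash.

```java
class MaskIndexDemo {
    // Hypothetical helper: maps a hash to a bucket index the same way
    // putVal does with (n - 1) & hash. Assumes length is a power of two.
    static int indexFor(int hash, int length) {
        return (length - 1) & hash;
    }

    public static void main(String[] args) {
        int length = 16;
        int[] samples = {0, 1, 15, 16, 17, 12345, -1, -12345,
                         Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int h : samples) {
            int masked = indexFor(h, length);
            int modded = Math.floorMod(h, length); // always in 0..length-1
            System.out.println(h + " -> mask " + masked + ", floorMod " + modded);
        }
    }
}
```

For every sample, including the negative ones, the two columns agree, which is why the cheaper bit operation can replace the remainder.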

Incidentally, this also explains why the length of HashMap's array is always a power of 2. With such a length, (array length - 1) is exactly a "low-bit mask". The "and" operation sets all the high bits of the hash value to zero and keeps only the low bits, which are then used as the array subscript. Take the initial length 16 as an example: 16 - 1 = 15, which in binary is 00000000 00000000 00001111. "And"-ing it with a hash value, as follows, keeps only the lowest four bits.

  10100101 11000100 00100101
& 00000000 00000000 00001111
----------------------------------
  00000000 00000000 00000101 
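The worked example above can be reproduced in a few lines. This is a sketch only: the class LowMaskDemo and the method reachableIndexes are illustrative names, and the comparison with a length of 10 is an added illustration of what goes wrong when the length is not a power of 2 (10 - 1 = 9 is 1001 in binary, so the mask has holes).

```java
import java.util.Set;
import java.util.TreeSet;

class LowMaskDemo {
    // Collects every index that (length - 1) & hash produces for hashes 0..999.
    static Set<Integer> reachableIndexes(int length) {
        Set<Integer> seen = new TreeSet<>();
        for (int hash = 0; hash < 1000; hash++) {
            seen.add((length - 1) & hash);
        }
        return seen;
    }

    public static void main(String[] args) {
        // The bit pattern from the example: 10100101 11000100 00100101
        int hash = 0b10100101_11000100_00100101;
        System.out.println(hash & (16 - 1));             // 5 (binary 0101)
        System.out.println(reachableIndexes(16).size()); // 16: every slot is reachable
        System.out.println(reachableIndexes(10));        // [0, 1, 8, 9]: six slots never used
    }
}
```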

But here a problem arises: no matter how loosely the hash values are distributed, if only the last few bits are used, collisions will be severe. Worse still, if the hash function itself is poorly designed and its values form an arithmetic sequence whose low bits repeat in a regular pattern, the result is extremely painful. This is where the value of the "perturbation function" shows, and you have probably guessed how by now. Look at the figure below.

[Figure: the high 16 bits of the hash code XORed into the low 16 bits]

The right shift is by 16 bits, exactly half of 32. XORing the hash code's own high half with its low half mixes the high and low bits of the original hash code, which increases the randomness of the low bits. The mixed low bits also carry some features of the high bits, so the high-bit information is retained in disguised form.
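A small experiment makes the effect concrete. This is a sketch under stated assumptions: SpreadDemo, spread and bucketsUsed are illustrative names, the table length is fixed at 16, and the 16 hash codes are hand-picked so that they differ only in their high 16 bits (i << 16 for i from 0 to 15). Only the expression h ^ (h >>> 16) is taken from the hash() method shown earlier.

```java
import java.util.HashSet;
import java.util.Set;

class SpreadDemo {
    // Same expression as in HashMap.hash(), applied to a raw hash code.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Counts how many distinct buckets of a 16-slot table are hit by the
    // hash codes 0 << 16, 1 << 16, ..., 15 << 16.
    static int bucketsUsed(boolean useSpread) {
        Set<Integer> buckets = new HashSet<>();
        for (int i = 0; i < 16; i++) {
            int h = i << 16;                      // low 16 bits are all zero
            int hash = useSpread ? spread(h) : h;
            buckets.add(hash & (16 - 1));         // index as in putVal
        }
        return buckets.size();
    }

    public static void main(String[] args) {
        System.out.println(bucketsUsed(false)); // 1: every key lands in bucket 0
        System.out.println(bucketsUsed(true));  // 16: one key per bucket
    }
}
```

Without the perturbation all 16 keys collide in a single bucket, because the mask never sees the bits in which they differ. With it, the high bits are folded into the low four bits and the keys spread over all 16 buckets.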
