Why does HashMap use h ^ (h >>> 16) to calculate the hash value? Why must the number of slots be 2^n?

Hello everyone, I am Yihang!

Yesterday at noon, a reader sent me a private message on WeChat asking: why does HashMap compute its hash value as (h = key.hashCode()) ^ (h >>> 16)? What does h ^ (h >>> 16) mean?

The following is the Java 8 HashMap source code that computes the hash for a key:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

After explaining for a while, I realized it is hard to make this clear without examples, so I wrote this article to walk through the techniques involved.

Let's start with the conclusion:

All of these operations exist to increase randomness and reduce the probability of hash collisions, so that entries are spread more evenly across the slots, which gives better hashing behavior and better read and write performance.

This article explores the following questions:

  1. Why does the hash calculation need h ^ (h >>> 16)?
  2. Why does the number of slots (array length) have to be 2^n?
  3. Can HashMap use null as a key?

With the conclusion and these questions in mind, let's analyze them together.

Preparation

Before the analysis, we need to review & (AND), | (OR), ^ (XOR), and the bit shift operators. These are prerequisites for everything that follows.

  • & (AND operation)

    If both binary values have a 1 in the same bit position, that bit of the result is 1; otherwise it is 0.

    Example:

      1111   (decimal: 15)
    & 1011   (decimal: 11)
    --------------------
    = 1011   (decimal: 11)
    

    Java code:

    public static void main(String[] args) {
        System.out.println("15 & 11 = " + (15 & 11));
    }
    

  • | (OR operation)

    If at least one of the two binary values has a 1 in the same bit position, that bit of the result is 1; otherwise it is 0.

    Example:

      1111   (decimal: 15)
    | 1011   (decimal: 11)
    --------------------
    = 1111   (decimal: 15)
    

    Java code:

    public static void main(String[] args) {
        System.out.println("15 | 11 = " + (15 | 11));
    }
    

  • ^ (XOR operation)

    If the two binary values have the same bit in a position, that bit of the result is 0; otherwise it is 1.

    Example:

      1111   (decimal: 15)
    ^ 1011   (decimal: 11)
    --------------------
    = 0100   (decimal: 4)
    

    Java code:

    public static void main(String[] args) {
        System.out.println("15 ^ 11 = " + (15 ^ 11));
    }
    

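  • Bit shift operators

    • Signed left shift <<

      Shifts the bits x positions to the left (the direction the operator points) and fills the x lowest (rightmost) bits with 0, for both positive and negative numbers. The worked examples here use 8 bits for brevity.

      Example: 20 << 2

      20 in binary (sign-magnitude, ones' complement, and two's complement are identical): 0001 0100
      After shifting left by two bits: 0101 0000
      Result: 80
      

      Example: -20 << 2

      Sign-magnitude:                   1001 0100
      Ones' complement:                 1110 1011      // sign bit unchanged, all other bits inverted
      Two's complement:                 1110 1100      // ones' complement + 1
      After shifting left by two bits:  1011 0000
      Ones' complement:                 1010 1111      // shifted two's complement - 1
      Sign-magnitude:                   1101 0000      // all bits except the sign bit inverted
      Result: -80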
      
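    • Signed right shift >>

      Shifts the bits x positions to the right. For a positive number, the x highest (leftmost) bits are filled with 0; for a negative number, they are filled with 1.

      Example: 20 >> 2

      Sign-magnitude (same as ones' and two's complement): 00010100
      Shift right by two bits (fill the two leftmost bits with 0)
      Sign-magnitude (same as ones' and two's complement): 00000101
      Result: 5
      

      Example: -20 >> 2

      Sign-magnitude:    10010100
      Ones' complement:  11101011    // sign bit unchanged, other bits inverted
      Two's complement:  11101100    // ones' complement + 1
      Shift right by two bits (fill the two leftmost bits with 1)
      Two's complement:  11111011
      Ones' complement:  11111010    // two's complement - 1
      Sign-magnitude:    10000101    // sign bit unchanged, other bits inverted
      Result: -5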
      
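    • Unsigned right shift >>>

      Similar to >>, but it ignores the sign bit and always fills the leftmost bits with 0.

      Example: 2 >>> 1

      Sign-magnitude (same as ones' and two's complement): 00000000 00000000 00000000 00000010
      Shift right by one bit (fill the leftmost bit with 0)
      Sign-magnitude (same as ones' and two's complement): 00000000 00000000 00000000 00000001
      Result: 1
      

      Example: -2 >>> 1

      Sign-magnitude:    10000000 00000000 00000000 00000010
      Ones' complement:  11111111 11111111 11111111 11111101  // sign bit unchanged, other bits inverted
      Two's complement:  11111111 11111111 11111111 11111110  // ones' complement + 1
      Shift right by one bit (unsigned shift: the leftmost bit is always filled with 0)
      Two's complement:  01111111 11111111 11111111 11111111
      Ones' complement:  01111111 11111111 11111111 11111111  // highest bit is 0, so the number is positive
      Sign-magnitude:    01111111 11111111 11111111 11111111  // same as the ones' complement
      Result: 2147483647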
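The operator examples above can be checked directly in Java. The following is a minimal sketch; the class name BitOpsDemo and the helper bits32 are illustrative names, not something from the JDK. Keep in mind that a Java int is 32 bits wide, while several of the hand-worked examples use only 8 bits.

```java
// Illustrative demo; BitOpsDemo and bits32 are hypothetical names.
class BitOpsDemo {
    // Formats an int as a 32-character binary string with leading zeros.
    static String bits32(int v) {
        return String.format("%32s", Integer.toBinaryString(v)).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println("15 & 11  = " + (15 & 11));   // 11
        System.out.println("15 | 11  = " + (15 | 11));   // 15
        System.out.println("15 ^ 11  = " + (15 ^ 11));   // 4
        System.out.println("20 << 2  = " + (20 << 2));   // 80
        System.out.println("-20 << 2 = " + (-20 << 2));  // -80
        System.out.println("20 >> 2  = " + (20 >> 2));   // 5
        System.out.println("-20 >> 2 = " + (-20 >> 2));  // -5
        System.out.println("2 >>> 1  = " + (2 >>> 1));   // 1
        System.out.println("-2 >>> 1 = " + (-2 >>> 1));  // 2147483647
        // The sign bit becomes 0 after the unsigned shift.
        System.out.println("-2 >>> 1 in binary: " + bits32(-2 >>> 1));
    }
}
```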
      

HashMap hash and slot calculation

The underlying data structure of HashMap is an array + linked lists + red-black trees, and computing the array slot is the first step of every read or write. What follows is not the full HashMap workflow, only the steps related to the hash and slot calculation discussed in this article; steps unrelated to this topic are not covered here.

  • The first step is to get the hash of the key

    h = key.hashCode()
    
  • The second step is to calculate the hash of HashMap

    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
    
  • The third step is to calculate the slot (array subscript): i = (n - 1) & hash

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // slot index: i = (n - 1) & hash
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            ....
    }
    
  • Remaining steps: storing the entry

    (omitted)
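The three steps above can be sketched as plain Java. This is a simplified illustration, not HashMap's actual put path: the hash method mirrors the Java 8 source quoted above, while the class name SlotSketch and the helper slotFor are hypothetical names, and n is assumed to be a power of two.

```java
// Simplified sketch of the hash and slot steps; SlotSketch and slotFor are hypothetical names.
class SlotSketch {
    // Steps 1 and 2: take key.hashCode() and fold the high 16 bits into the low 16 bits.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Step 3: map the hash to an array subscript; n is assumed to be a power of two.
    static int slotFor(Object key, int n) {
        return (n - 1) & hash(key);
    }

    public static void main(String[] args) {
        int n = 16; // default array length
        System.out.println("slot for \"hello world\": " + slotFor("hello world", n)); // 11
        System.out.println("slot for null: " + slotFor(null, n));                     // 0
    }
}
```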

Question 1: Why is the hash calculated as h ^ (h >>> 16)?

Following the steps above, let's work through an example:

public static void main(String[] args) {
    Map<String, String> map = new HashMap<>();
    map.put("hello world", "1");
}

The HashMap is instantiated without specifying a length, so the default array length n = 2^4 = 16 is used.

The slot calculation process is as follows:

   h = key.hashCode()    01101010 11101111 11100010 11000100
             h >>> 16    00000000 00000000 01101010 11101111 
------------------------------------------------------------
hash = h ^ (h >>> 16)    01101010 11101111 10001000 00101011
  (n - 1) = (2^4 - 1)    00000000 00000000 00000000 00001111
------------------------------------------------------------
     (2^4 - 1) & hash    00000000 00000000 00000000 00001011
       Decimal result    11
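The calculation above can be reproduced with a short trace program. This is a sketch for checking the numbers by hand; HelloWorldTrace, trace, and bits32 are illustrative names, and the array length 16 is the default assumed in this example.

```java
// Prints each intermediate value of the slot calculation in binary.
// HelloWorldTrace, trace and bits32 are hypothetical names used for illustration.
class HelloWorldTrace {
    // Formats an int as a 32-character binary string with leading zeros.
    static String bits32(int v) {
        return String.format("%32s", Integer.toBinaryString(v)).replace(' ', '0');
    }

    // Returns {h, h >>> 16, hash, slot} for the given key and array length n.
    static int[] trace(String key, int n) {
        int h = key.hashCode();
        int shifted = h >>> 16;
        int hash = h ^ shifted;
        int slot = (n - 1) & hash;
        return new int[] {h, shifted, hash, slot};
    }

    public static void main(String[] args) {
        int[] t = trace("hello world", 16);
        System.out.println("h = key.hashCode()     " + bits32(t[0]));
        System.out.println("h >>> 16               " + bits32(t[1]));
        System.out.println("hash = h ^ (h >>> 16)  " + bits32(t[2]));
        System.out.println("(n - 1) & hash         " + bits32(t[3]));
        System.out.println("slot (decimal)         " + t[3]); // 11
    }
}
```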

Process analysis:

  • h = key.hashCode()

    The hashCode() of an object is an int, so its value range is [-2147483648, 2147483647].

    text             hashCode()       binary
    "hello world"    1,794,106,052    01101010 11101111 11100010 11000100

  • h >>> 16

    Unsigned right shift of the hashCode by 16 bits:

    operation     value
    hashCode()    1,794,106,052
    binary        01101010 11101111 11100010 11000100
    h >>> 16      00000000 00000000 01101010 11101111

  • hash = h ^ (h >>> 16)

    What this does: the high 16 bits stay unchanged, and the low 16 bits are XORed with the high 16 bits. Letting the high 16 bits take part makes the result more random.

      01101010 11101111 11100010 11000100
    ^ 00000000 00000000 01101010 11101111
    -------------------------------------
    = 01101010 11101111 10001000 00101011
    
  • (n - 1) & hash

    In the code, n is the length of HashMap's array. When no initial capacity is specified, n defaults to 2^4 = 16.

    (n - 1) = 16 - 1 = 15

    This raises another question: why n - 1?

    Taking the default length 16 (2^4) as an example, the valid array subscripts are 0 to 15.

    Calculation method: hash % (2^4); in essence, take the remainder of the hash divided by the array length.

    Equivalent calculation method: hash & (2^4 - 1). Because the length is a power of two, n - 1 is a mask of all 1s in the low bits, so the AND keeps exactly the low 4 bits, which equals the remainder for a non-negative hash and needs no division.

         hash  01101010 11101111 10001000 00101011
    &
    (2^4 - 1)  00000000 00000000 00000000 00001111
    ----------------------------------------------
            =  00000000 00000000 00000000 00001011
    Decimal =  11
    

    From this, it can be concluded that the final slot to which "hello world" belongs is: 11
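The equivalence between the remainder and the mask can be checked with a small sketch. One detail worth noting: Java's % operator returns a negative result for a negative hash, so the comparison below uses Math.floorMod, which always returns a value in [0, n). The class name MaskVsModulo and its method are illustrative names, and n is assumed to be a power of two.

```java
// Compares hash & (n - 1) with the remainder of hash divided by n.
// MaskVsModulo and maskMatchesFloorMod are hypothetical names used for illustration.
class MaskVsModulo {
    // For a power-of-two n, the mask and the floor modulus agree for every int hash.
    static boolean maskMatchesFloorMod(int hash, int n) {
        return Math.floorMod(hash, n) == (hash & (n - 1));
    }

    public static void main(String[] args) {
        int n = 16;
        int[] hashes = {0x6AEF882B, 11, 16, 31, -7, Integer.MIN_VALUE};
        for (int hash : hashes) {
            System.out.println(hash
                    + "  %: " + (hash % n)                  // negative for a negative hash
                    + "  floorMod: " + Math.floorMod(hash, n)
                    + "  mask: " + (hash & (n - 1))
                    + "  equal: " + maskMatchesFloorMod(hash, n));
        }
    }
}
```

For example, -7 % 16 is -7, which is not a valid subscript, while both Math.floorMod(-7, 16) and -7 & 15 give 9.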

If the h ^ (h >>> 16) operation is not performed

We now understand the whole hash and slot calculation, but what would it look like without the h ^ (h >>> 16) step? The slot calculation becomes much simpler:

hash = key.hashCode()    01101010 11101111 11100010 11000100
              (n - 1)    00000000 00000000 00000000 00001111
------------------------------------------------------------
     (n - 1) & hash =    00000000 00000000 00000000 00000100

Looking at this example, you will find that only the low four bits of the hash value take part in the calculation; the rest play no role at all. As a result, keys whose hash values have the same low bits but different high bits all get the same slot subscript, which greatly increases the chance of collisions.

With h ^ (h >>> 16), however, the high bits take part in the low-bit calculation, which greatly increases the overall randomness.
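A small sketch makes the difference visible. The three hashCode values below are made up for illustration: they share the same low 16 bits and differ only in the high 16 bits. SpreadCollisionDemo and its methods are hypothetical names, and n = 16 is the default array length.

```java
// Shows how hash codes that differ only in the high bits collide without spreading.
// SpreadCollisionDemo and its method names are hypothetical, for illustration only.
class SpreadCollisionDemo {
    // The same mixing step as in HashMap.hash: fold the high 16 bits into the low 16 bits.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    static int slotWithoutSpread(int h, int n) {
        return (n - 1) & h;
    }

    static int slotWithSpread(int h, int n) {
        return (n - 1) & spread(h);
    }

    public static void main(String[] args) {
        int n = 16;
        // Made-up hashCode values: identical low 16 bits, different high 16 bits.
        int[] hashCodes = {0x00010004, 0x00020004, 0x00030004};
        for (int h : hashCodes) {
            System.out.println(Integer.toHexString(h)
                    + "  without spread: " + slotWithoutSpread(h, n)  // 4, 4, 4
                    + "  with spread: " + slotWithSpread(h, n));      // 5, 6, 7
        }
    }
}
```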

Question 2: Why does the number of slots (array length) have to be 2^n?

According to the source code, whether at initialization or during expansion while saving entries, the number of slots is always 2^n, and the slot index computed by (2^n - 1) & hash is well distributed. What would happen if the default number of slots n were 17 instead of 16 (2^4)?

The (n - 1) & hash operation would then proceed as follows:

         hash  01101010 11101111 10001000 00101011
&
(17 - 1) = 16  00000000 00000000 00000000 00010000
----------------------------------------------
            =  00000000 00000000 00000000 00000000
      Decimal =  0

Since 16 in binary is 00010000, only one bit actually takes part in the & (AND) operation, and every other bit is masked out by 0. The computed slot subscript can therefore only be 0 or 16, so all entries end up in just these two slots and the other indexes can never be hit. That would be catastrophic for HashMap: the more entries are saved, the worse access performance becomes.
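The effect can be observed by counting which slots are ever produced. The following sketch feeds the hash values 0 to 999 through (n - 1) & hash for n = 16 and for the hypothetical length n = 17; NonPowerOfTwoDemo and slotsHit are illustrative names.

```java
import java.util.Set;
import java.util.TreeSet;

// Collects the distinct slot subscripts produced by (n - 1) & hash.
// NonPowerOfTwoDemo and slotsHit are hypothetical names used for illustration.
class NonPowerOfTwoDemo {
    // Returns the sorted set of slots hit by the hash values 0 .. hashCount - 1.
    static Set<Integer> slotsHit(int n, int hashCount) {
        Set<Integer> slots = new TreeSet<>();
        for (int hash = 0; hash < hashCount; hash++) {
            slots.add((n - 1) & hash);
        }
        return slots;
    }

    public static void main(String[] args) {
        System.out.println("n = 16: " + slotsHit(16, 1000)); // all 16 slots, 0 through 15
        System.out.println("n = 17: " + slotsHit(17, 1000)); // [0, 16]
    }
}
```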

Question 3: Can HashMap use null as a key?

The answer is: yes.

This can be seen from the source code that computes the key's hash value:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

When key == null, the hash value is 0, so (n - 1) & hash is also 0: an entry with a null key is saved in slot 0.

Sample code:

public static void main(String[] args) {
    Map<String, String> map = new HashMap<>();
    map.put(null, "1");
    System.out.println(map.get(null)); // prints 1
}

The value can be retrieved normally, but there is a pitfall:

Because null can be used as a key, take care when saving and retrieving values. It can easily happen that the key variable is still null when the value is stored but has been assigned a value by the time it is retrieved, so the value cannot be found under the new key:

public static void main(String[] args) {
    HashMap<String, String> map = new HashMap<>();
    String key = null;
    map.put(key, "1");
    // ... other operations
    key = "k";
    System.out.println("Get with k: " + map.get(key));      // prints Get with k: null
    System.out.println("Get with null: " + map.get(null));  // prints Get with null: 1
}

That covers the "routine" behind the hash and slot calculation.

For beginners, it is enough that the code runs; masters aim to write it well. Good code starts with attention to every tiny detail, and the details add up to a better overall result. Even in something as small as the hash and slot calculation, the masters squeeze out as much performance as they can. Maybe that is the difference!

Origin blog.csdn.net/lupengfei1009/article/details/124397576