(Reprinted) Why X % length == X & (length - 1) in HashMap (converting the remainder operator % into the bitwise AND operator &)

Reprinted: Original link https://blog.csdn.net/ricardo18/article/details/108846384
Statement: If I violate anyone's rights, please contact me and I will delete it.
Corrections and criticism from experts are welcome.

1. Introducing the problem

Explanations of the HashMap source code usually cover the following points:

① The initial capacity is 1 << 4, that is, 2^4 = 16.
② The load factor is 0.75. When the number of elements stored in the HashMap exceeds 75% of the capacity, the table is expanded. As long as the new capacity stays within the range of the int type, the expansion goes to the next power of 2, that is, the original length is doubled.
③ When a new element is added, its position in the HashMap is calculated by the hash operation, which is the main subject of this article. It is divided into three steps:

Step 1: Take the hashCode value: key.hashCode()

Step 2: Bring the high-order bits into the computation: h >>> 16

Step 3: Take the modulus: (n - 1) & hash

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

tab[i = (n - 1) & hash];

PS: I added the sixth line of code myself. It shows how the hash is turned into an index into the table.
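
The three steps can be put together in a small standalone class. This is only a minimal sketch, not the JDK source: the class name HashIndexSketch and the helper indexFor are hypothetical names chosen for illustration, and n is assumed to be a power of 2.

```java
// Minimal sketch of the three steps. HashIndexSketch and indexFor are
// illustrative names, not part of java.util.HashMap.
class HashIndexSketch {

    // Steps 1 and 2: take hashCode() and XOR the high 16 bits into the low 16 bits.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Step 3: map the hash to a bucket index. Assumes n is a power of 2.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16; // the initial capacity, 1 << 4
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> bucket " + indexFor(hash(key), n));
        }
    }
}
```

Whatever the key, the printed bucket is always in the range 0 to n - 1, because (n - 1) & hash keeps only the low bits of the hash.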

We know that a good hash algorithm distributes elements more evenly and therefore reduces hash collisions. HashMap handles this very cleverly:

The first step is to get the hashCode. Object.hashCode() is declared native and returns an int (by default a value derived from the object, described in the original post as converted from its memory address; the exact derivation depends on the JVM). Usually we override this method in our own classes.

In the second step, the hash code is shifted right by 16 bits with an unsigned shift (the high bits are filled with 0), and the result is XORed bitwise with the original hash code. What is the use of this? It is a perturbation function, intended to reduce hash collisions. The shift of 16 bits is exactly half of the 32 bits of an int, so the high half and the low half are XORed together. This mixes the high and low bits of the original hash code and increases the randomness of the low bits. The mixed low bits also carry some features of the high bits, so the high-bit information is preserved in a disguised form. In short, both the high and the low bits take part in the calculation of the index.
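
The effect of the perturbation can be seen with two hand-picked hash codes. The following is a small sketch under assumptions of my own: the class name SpreadDemo, the helper spread, and the two sample values are chosen only for illustration.

```java
// Illustrative sketch: SpreadDemo and spread are hypothetical names.
class SpreadDemo {

    // The same perturbation as in hash(): fold the high 16 bits into the low 16 bits.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;         // table length, a power of 2
        int a = 0x00010000; // the two hash codes differ only in the high 16 bits
        int b = 0x00020000;

        // Without the perturbation only the low 4 bits count, so both land in bucket 0.
        System.out.println((a & (n - 1)) + " " + (b & (n - 1)));                 // 0 0
        // With the perturbation the high bits influence the index.
        System.out.println((spread(a) & (n - 1)) + " " + (spread(b) & (n - 1))); // 1 2
    }
}
```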

If you are interested, take a look at JDK 1.7. It performed four rounds of perturbation, whereas JDK 1.8 performs only one. I guess the change keeps collisions low while improving efficiency.
The focus of this article is the third step: the hash value obtained in the previous two steps is ANDed bitwise with the table length of the HashMap minus 1, that is, (n - 1) & hash. Many hash algorithms, in order to distribute elements uniformly, use a modulo operation instead, taking the hash value modulo the total length, that is, hash % n. We know that & is usually cheaper for the computer than %, so how can % be converted into &? HashMap computes the index with (n - 1) & hash, so why does this work?

This is the question this post sets out to answer.

2. Conclusion

We first give the conclusion:

When length = 2^n, X % length = X & (length - 1)

In other words, when the length is 2 to the nth power, the modulo operation % can be replaced by a bitwise AND operation (for non-negative X; in Java, % on a negative X gives a negative result, while & does not).

For example: 9 % 4 = 1. The binary value of 9 is 1001, 4 - 1 = 3, and the binary value of 3 is 0011. 9 & 3 = 1001 & 0011 = 0001 = 1.

Another example: 12 % 8 = 4. The binary value of 12 is 1100, 8 - 1 = 7, and the binary value of 7 is 0111. 12 & 7 = 1100 & 0111 = 0100 = 4.

In both examples the lengths, 4 and 8, are powers of 2, and the conclusion holds. What about when the length is not a power of 2?

For example: 9 % 5 = 4. The binary value of 9 is 1001, 5 - 1 = 4, and the binary value of 4 is 0100. 9 & 4 = 1001 & 0100 = 0000 = 0. Obviously the conclusion does not hold.
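
The three examples are easy to check in code. This is a minimal sketch; the class name RemainderVsAnd and the helper sameResult are hypothetical names used only here.

```java
// Illustrative check of the three worked examples.
class RemainderVsAnd {

    // True when the remainder and the bitwise AND agree for this x and length.
    static boolean sameResult(int x, int length) {
        return x % length == (x & (length - 1));
    }

    public static void main(String[] args) {
        System.out.println(9 % 4 + " " + (9 & 3));   // 1 1
        System.out.println(12 % 8 + " " + (12 & 7)); // 4 4
        System.out.println(9 % 5 + " " + (9 & 4));   // 4 0
        System.out.println(sameResult(9, 4) + " " + sameResult(12, 8) + " " + sameResult(9, 5));
    }
}
```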

Why is that? Let's analyze it in detail below.

3. Analysis process

First of all, we need to know the following rules:

①, "<<" left shift: 0s are added to the vacant bits on the right, the bits on the left are squeezed out of the word, and shifting left by one bit is equivalent to multiplying by 2 (as long as no significant bit is lost).

②, ">>" right shift: the bits on the right are squeezed out, and shifting right by one bit is equivalent to dividing by 2 (rounding down). In Java the vacated bits on the left are filled with the sign bit: 0 for a non-negative number and 1 for a negative number. In some other languages, such as C, the result for a negative number depends on the implementation.

③, ">>>" unsigned right shift: the bits on the right are squeezed out, and 0s are always added to the vacated bits on the left.
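
The three operators can be tried out directly in Java. The sketch below uses hypothetical names (ShiftDemo and its three helpers) and only small sample values.

```java
// Illustrative sketch of the three shift operators in Java.
class ShiftDemo {

    static int left(int x, int bits) { return x << bits; }           // vacated low bits become 0
    static int signedRight(int x, int bits) { return x >> bits; }    // fills with the sign bit
    static int unsignedRight(int x, int bits) { return x >>> bits; } // always fills with 0

    public static void main(String[] args) {
        System.out.println(left(5, 1));            // 10
        System.out.println(signedRight(20, 2));    // 5
        System.out.println(signedRight(-20, 2));   // -5
        System.out.println(unsignedRight(-1, 28)); // 15
    }
}
```

The last line shows the difference between >> and >>>: -1 is all 1 bits, so -1 >> 28 would still be -1, while -1 >>> 28 leaves only the four lowest 1 bits.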

These rules follow from the properties of binary numbers, and I believe everyone understands them well.

Given an arbitrary number whose binary digits are $X_n X_{n-1} X_{n-2} \dots X_1 X_0$, we can expand it by powers of 2:

$X_n X_{n-1} X_{n-2} \dots X_1 X_0 = X_n 2^n + X_{n-1} 2^{n-1} + \dots + X_1 2^1 + X_0 2^0$ (formula 3-1)

Whatever the number of digits, the power of 2 increases in turn from 0 for the lowest digit up to n for the highest digit.

Back to the conclusion above: when length = 2^n, X % length = X & (length - 1)

For division, the distributive law holds over a sum in the dividend, but not over a sum in the divisor:

Holds: (a + b) ÷ c = a ÷ c + b ÷ c (formula 3-2)

Does not hold: a ÷ (b + c) ≠ a ÷ b + a ÷ c

Using formula 3-1 and formula 3-2, when any number is divided by $2^k$, we can first write the number in the form of formula 3-1 and then divide term by term:

$(X_n X_{n-1} X_{n-2} \dots X_1 X_0) / 2^k = (X_n 2^n + X_{n-1} 2^{n-1} + \dots + X_1 2^1 + X_0 2^0) / 2^k = X_n 2^n / 2^k + X_{n-1} 2^{n-1} / 2^k + \dots + X_1 2^1 / 2^k + X_0 2^0 / 2^k$

If we want to find the remainder of the above formula, I believe everyone can tell at a glance:

①. When 0 <= k <= n, the remainder is $X_{k-1} 2^{k-1} + X_{k-2} 2^{k-2} + \dots + X_1 2^1 + X_0 2^0$. That is, every term whose power is greater than or equal to k is discarded (each of them is divisible by $2^k$), and only the terms whose power is smaller than k are left (their sum is smaller than $2^k$, so it cannot be divided any further). What is left is the remainder.

②. When k > n, the remainder is the entire number.
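
As a worked instance of the two cases, take 13 (binary 1101) with k = 2, and 5 (binary 101) with k = 4. The numbers are chosen here only for illustration:

```latex
\begin{align*}
13 &= 1101_2 = 1\cdot 2^3 + 1\cdot 2^2 + 0\cdot 2^1 + 1\cdot 2^0 \\
\frac{13}{2^2} &= \frac{1\cdot 2^3 + 1\cdot 2^2}{2^2} + \frac{0\cdot 2^1 + 1\cdot 2^0}{2^2}
               = (2 + 1) + \frac{1}{4} \\
13 \bmod 2^2 &= 0\cdot 2^1 + 1\cdot 2^0 = 1 \qquad (\text{case 1: } k = 2 \le n = 3) \\
5 \bmod 2^4 &= 1\cdot 2^2 + 0\cdot 2^1 + 1\cdot 2^0 = 5 \qquad (\text{case 2: } k = 4 > n = 2)
\end{align*}
```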

At this point we are very close to proving the conclusion. Going back to the binary shift operations mentioned above, shifting right by n bits means dividing by $2^n$. From this we get a very important conclusion:

To take the remainder of a number divided by $2^n$, we can write the number in binary and shift it right by n places. The n bits that are shifted out are the remainder.
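
This observation can be sketched in Java for non-negative numbers. The class name ShiftRemainder and its two helpers are hypothetical names; the remainder is recovered by shifting the quotient back and subtracting, which leaves exactly the bits the shift removed.

```java
// Illustrative sketch for non-negative x: quotient and remainder of x / 2^n via shifts.
class ShiftRemainder {

    // Shifting right by n drops the low n bits, which divides by 2^n.
    static int quotient(int x, int n) {
        return x >>> n;
    }

    // The remainder is whatever the shift removed: x minus the quotient shifted back.
    static int remainder(int x, int n) {
        return x - ((x >>> n) << n);
    }

    public static void main(String[] args) {
        int x = 13, n = 2;                                          // 13 = 1101, divide by 2^2 = 4
        System.out.println(quotient(x, n) + " " + (x / 4));         // 3 3
        System.out.println(remainder(x, n) + " " + (x % 4));        // 1 1
    }
}
```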

Now that we know how to calculate the remainder, how do we obtain the n bits that were removed?

Let's look at $2^0, 2^1, 2^2, \dots, 2^n$ in binary:

0001,0010,0100,1000,10000…

Subtracting one from each of these numbers gives:

0000,0001,0011,0111,01111…

According to the rule of the AND operator &, a result bit is 1 only when both input bits are 1, otherwise it is 0. Since $2^k - 1$ consists of exactly k low 1 bits, ANDing any binary number with $2^k - 1$ keeps its low k bits and clears the rest. Those low k bits are precisely the remainder, so the remainder of any number divided by $2^k$ can be computed as a bitwise AND with $(2^k - 1)$.
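
A brute-force check makes the claim concrete. The sketch below uses hypothetical names (MaskCheck, mask, holdsUpTo) and also shows the one caveat worth knowing in Java: for a negative X the % operator keeps the sign of the dividend, so the mask agrees with Math.floorMod rather than with %.

```java
// Illustrative brute-force check of x & (2^k - 1) == x % 2^k.
class MaskCheck {

    // Low-bit mask for 2^k: k one-bits, for example k = 3 gives 0111.
    static int mask(int k) {
        return (1 << k) - 1;
    }

    // True if the mask and the remainder agree for every x in [0, limit).
    static boolean holdsUpTo(int k, int limit) {
        int length = 1 << k;
        for (int x = 0; x < limit; x++) {
            if ((x & mask(k)) != x % length) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 10; k++) {
            System.out.println("k=" + k + " holds: " + holdsUpTo(k, 100_000));
        }
        // Negative x: % keeps the sign of the dividend, the mask does not.
        System.out.println(-9 % 4);               // -1
        System.out.println(-9 & mask(2));         // 3
        System.out.println(Math.floorMod(-9, 4)); // 3
    }
}
```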

This proves the conclusion given earlier:

When length = 2^n, X % length = X & (length - 1)

Note that the length must be a power of 2 for the formula to hold; otherwise it is wrong.

4. Summary

Through the analysis above, we have shown that the formula is correct. Returning to the implementation of HashMap, we now know why its initial capacity is 1 << 4 and why each expansion doubles it: the capacity must always be a power of 2, because only then does (n - 1) & hash give the same result as hash % n.
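
To see what would go wrong with a capacity that is not a power of 2, the following sketch lists which bucket indexes (n - 1) & hash can ever produce. The class name BucketCoverage and the helper reachable are hypothetical, and the capacity 10 is just a sample value.

```java
import java.util.TreeSet;

// Illustrative sketch: which bucket indexes can (n - 1) & hash reach?
class BucketCoverage {

    // Collect every index produced for the hash values 0 .. maxHash - 1.
    static TreeSet<Integer> reachable(int n, int maxHash) {
        TreeSet<Integer> seen = new TreeSet<>();
        for (int hash = 0; hash < maxHash; hash++) {
            seen.add((n - 1) & hash);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(reachable(16, 1000).size());     // 16, every bucket is used
        System.out.println(reachable(16 << 1, 1000).size()); // 32, still every bucket after doubling
        System.out.println(reachable(10, 1000));            // [0, 1, 8, 9]
    }
}
```

With n = 10 the mask is 9 (binary 1001), so only four of the ten buckets can ever be chosen. Doubling a power of 2 always gives another power of 2, so the mask stays a run of 1 bits.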

Origin blog.csdn.net/qq_45531729/article/details/112370306