Understanding Integer.bitCount() (as simply as possible)

The bitCount(int i) function counts the number of 1 bits in the binary representation of an int. For example, 5 is 101 in binary, so bitCount(5) returns 2.
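For comparison, here is a minimal baseline sketch. It is not the JDK's method. It counts 1 bits by testing the lowest bit and shifting it out. The class and method names are hypothetical.

```java
class NaiveBitCount {
    // Baseline: add the lowest bit, then shift it out; at most 32 iterations.
    static int naiveBitCount(int i) {
        int count = 0;
        while (i != 0) {
            count += i & 1; // adds 1 if the lowest bit is set
            i >>>= 1;       // unsigned shift, so negative inputs also reach 0
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(5)); // prints 101
        System.out.println(naiveBitCount(5));          // prints 2
        System.out.println(Integer.bitCount(5));       // prints 2
        System.out.println(naiveBitCount(-1));         // prints 32
    }
}
```

The JDK version below gets the same answer without a loop.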

The JDK 1.8 source is shown below. At first glance it left me completely baffled. After two hours of analysis the confusion turned into delight, so I wrote this article.

public static int bitCount(int i) {
    // HD, Figure 5-2
    i = i - ((i >>> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >>> 2) & 0x33333333);
    i = (i + (i >>> 4)) & 0x0f0f0f0f;
    i = i + (i >>> 8);
    i = i + (i >>> 16);
    return i & 0x3f;
}

Basics

  • & (bitwise AND): a bit of a & b is 1 only when the corresponding bits of a and b are both 1; otherwise it is 0. This makes & useful for extracting a fixed range of bits. For example, with 1001 1100, the high four bits are kept by 1001 1100 & 1111 0000 = 1001 0000, and the low four bits are kept by 1001 1100 & 0000 1111 = 0000 1100.
  • ">>>" is the unsigned right shift. It shifts the bits right and fills the vacated high bits with 0, whereas ">>" copies the sign bit. Java has no "<<<" operator, because "<<" already fills with 0. (How negative numbers are represented is beyond the scope of this article.)
  • The hexadecimal masks in the code have the following binary forms:
Hex         Binary
0x55555555  01010101 01010101 01010101 01010101
0x33333333  00110011 00110011 00110011 00110011
0x0f0f0f0f  00001111 00001111 00001111 00001111
0x3f        00000000 00000000 00000000 00111111
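You can check the operators and masks above directly in Java. In this sketch, bin is a hypothetical helper that pads Integer.toBinaryString with leading zeros. It uses String.format so that it also runs on Java 8.

```java
class MaskAndShiftDemo {
    // Binary string of value, left-padded with zeros to the given width.
    static String bin(int value, int width) {
        return String.format("%" + width + "s", Integer.toBinaryString(value)).replace(' ', '0');
    }

    public static void main(String[] args) {
        int x = 0b1001_1100;
        System.out.println(bin(x & 0b1111_0000, 8)); // prints 10010000: high four bits kept
        System.out.println(bin(x & 0b0000_1111, 8)); // prints 00001100: low four bits kept

        int neg = -8; // 11111111 11111111 11111111 11111000
        System.out.println(neg >> 1);  // prints -4: >> copies the sign bit
        System.out.println(neg >>> 1); // prints 2147483644: >>> shifts in a 0

        // The masks from the table, printed as 32-bit binary strings
        for (int mask : new int[] {0x55555555, 0x33333333, 0x0f0f0f0f, 0x3f}) {
            System.out.println(bin(mask, 32));
        }
    }
}
```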

Starting from the result

  • An int occupies 4 bytes. Let C1, C2, C3, C4 denote the bytes of the input i from left (most significant) to right. Let D1, D2, D3, D4 denote the bytes of i after the first three lines of code have run.

  • The return value is i & 0x3f. This mask keeps only the low 6 bits of i, all of which lie in D4, so D4 alone determines the result. Six bits are enough because the count is at most 32, which is 100000 in binary. But why does D4 hold the final answer? The answer lies in the last two lines of code:

i = i + (i >>> 8);  // line 1
i = i + (i >>> 16); // line 2
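  • A byte has 8 bits, so i >>> 8 shifts every byte one position to the right (the lowest byte falls off), and i >>> 16 shifts every byte two positions to the right. Write i as [D1, D2, D3, D4]. Line 1 adds [0, D1, D2, D3], giving [D1, D1+D2, D2+D3, D3+D4]. Line 2 then adds [0, 0, D1, D1+D2], giving [D1, D1+D2, D1+D2+D3, D1+D2+D3+D4]. Each Dk is at most 8, so no sum exceeds 32 and nothing carries into the next byte. The lowest byte therefore holds D1 + D2 + D3 + D4. The higher bytes hold partial sums, which the final & 0x3f discards.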
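The sketch below makes the partial sums visible. It runs only the last two lines on a hand-built value whose bytes are D1 = 1, D2 = 2, D3 = 3, D4 = 4. These values are an arbitrary choice, and sumBytes is a hypothetical name.

```java
class LastTwoLines {
    // Applies only the last two lines of bitCount to a word whose bytes are
    // already per-byte counts D1..D4 (each at most 8).
    static int sumBytes(int i) {
        i = i + (i >>> 8);  // each byte += the byte to its left
        i = i + (i >>> 16); // each byte += the byte two places to its left
        return i;           // low byte now holds D1 + D2 + D3 + D4
    }

    public static void main(String[] args) {
        int d = 0x01020304; // D1=1, D2=2, D3=3, D4=4
        int s = sumBytes(d);
        System.out.println(Integer.toHexString(s)); // prints 103060a: bytes 1, 3, 6, 10
        System.out.println(s & 0x3f);               // prints 10
    }
}
```

The bytes of the intermediate result are running totals (1, 3, 6, 10), which is why the JDK masks the result instead of returning i directly.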

Divide and conquer

  • Going back to before the last two lines: what do D1, D2, D3, and D4 represent, and why does adding them give the result?
  • Conclusion: D1 is the number of 1s in C1, and likewise D2, D3, and D4 are the numbers of 1s in C2, C3, and C4. So D1 + D2 + D3 + D4 is the answer.
  • So how does C1 become D1? That is what the first three lines of code do:
i = i - ((i >>> 1) & 0x55555555); 
i = (i & 0x33333333) + ((i >>> 2) & 0x33333333);
i = (i + (i >>> 4)) & 0x0f0f0f0f;
  • Take C1 = 10110011 as an example, and look at each of its bits from a different angle: each bit tells how many 1s that bit contains. This sounds trivial, since a bit of 1 contains one 1 and a bit of 0 contains zero. After the first line of code runs, however, the idea becomes useful: each pair of bits now records how many 1s that pair contained. For example, bits 1-2 are 10, which contain one 1, recorded as 01. Bits 3-4 are 11, which contain two 1s, recorded as 10. The remaining pairs follow the same pattern.
  • After the first line, C1' = 01 10 00 10. Bits 1-2 contain one 1, bits 3-4 contain two, bits 5-6 contain none, and bits 7-8 contain two.
  • After the second line, C1'' = 0011 0010. The first four bits contain three 1s, and the last four contain two.
  • After the third line, D1 = 00000101, which is 5 in decimal. As a partial result, it can be added directly to D2, D3, and D4.

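The trace for C1 = 10110011 can be reproduced with the sketch below. It places C1 in the low byte and prints the value after each of the first three lines. It then checks the "Dk counts the 1s in Ck" claim on a full int. The class name, the helper names, and the sample input 0xB3FF0001 are hypothetical choices.

```java
class FirstThreeLines {
    // Runs the first three lines of JDK 8's bitCount and returns the value
    // after each line, so the intermediate states can be inspected.
    static int[] steps(int i) {
        int a = i - ((i >>> 1) & 0x55555555);                // counts per 2-bit field
        int b = (a & 0x33333333) + ((a >>> 2) & 0x33333333); // counts per 4-bit field
        int c = (b + (b >>> 4)) & 0x0f0f0f0f;                // counts per byte: D1..D4
        return new int[] {a, b, c};
    }

    // Low byte of v as an 8-digit binary string (works on Java 8)
    static String bin8(int v) {
        return String.format("%8s", Integer.toBinaryString(v & 0xff)).replace(' ', '0');
    }

    public static void main(String[] args) {
        int[] s = steps(0b10110011);    // C1 from the example, in the low byte
        System.out.println(bin8(s[0])); // prints 01100010, i.e. 01 10 00 10
        System.out.println(bin8(s[1])); // prints 00110010, i.e. 0011 0010
        System.out.println(bin8(s[2])); // prints 00000101, i.e. 5

        // On a full int, every byte Dk equals the number of 1s in byte Ck.
        int input = 0xB3FF0001;
        int d = steps(input)[2];
        for (int shift = 24; shift >= 0; shift -= 8) {
            int ck = (input >>> shift) & 0xff;
            int dk = (d >>> shift) & 0xff;
            System.out.println(dk + " == " + Integer.bitCount(ck));
        }
    }
}
```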

Understanding the first three lines of code

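  • The last two lines have already been analyzed, and we now know what the first three lines achieve, but how they achieve it is still hard to see. Rewriting them in an equivalent form makes this easier. The original source is more efficient, though, for two reasons. First, line 1 saves one & because, for a 2-bit field with value 2a + b, subtracting the high bit a leaves a + b, which is the number of 1s. Second, line 3 needs only one mask after the addition. Each 4-bit count is at most 4, so their sum (at most 8) fits in 4 bits and cannot spill into the neighbouring field.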
Source code:
i = i - ((i >>> 1) & 0x55555555); 
i = (i & 0x33333333) + ((i >>> 2) & 0x33333333);
i = (i + (i >>> 4)) & 0x0f0f0f0f;

Equivalent form:
i = (i & 0x55555555) + ((i >>> 1) & 0x55555555);
i = (i & 0x33333333) + ((i >>> 2) & 0x33333333);
i = (i & 0x0f0f0f0f) + ((i >>> 4) & 0x0f0f0f0f);
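One quick way to gain confidence in the rewrite is to compare both versions on many inputs. The sketch below does this with random ints. The class name, method names, and seed 42 are arbitrary choices. The sketch also prints the 2-bit identity v - (v >>> 1) behind the source's subtraction.

```java
import java.util.Random;

class EquivalentForm {
    // First three lines as written in the JDK 8 source
    static int sourceForm(int i) {
        i = i - ((i >>> 1) & 0x55555555);
        i = (i & 0x33333333) + ((i >>> 2) & 0x33333333);
        i = (i + (i >>> 4)) & 0x0f0f0f0f;
        return i;
    }

    // The article's equivalent form: mask both operands, then add
    static int equivalentForm(int i) {
        i = (i & 0x55555555) + ((i >>> 1) & 0x55555555);
        i = (i & 0x33333333) + ((i >>> 2) & 0x33333333);
        i = (i & 0x0f0f0f0f) + ((i >>> 4) & 0x0f0f0f0f);
        return i;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42); // fixed seed so runs are repeatable
        for (int n = 0; n < 1_000_000; n++) {
            int x = rnd.nextInt(); // includes negative values
            if (sourceForm(x) != equivalentForm(x)) {
                throw new AssertionError("forms differ for " + x);
            }
        }
        // For a 2-bit field v = 2a + b, v - a equals a + b:
        for (int v = 0; v < 4; v++) {
            System.out.println(v + " - " + (v >>> 1) + " = " + (v - (v >>> 1)));
        }
        System.out.println("forms agree");
    }
}
```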
  • Still taking C1 = 10110011 as an example, treat each bit as an interval. This matches the earlier observation that a bit of 1 contains one 1 and a bit of 0 contains zero. Number the intervals 1 to 8 from left to right. Call the odd-numbered positions "odd intervals" and the even-numbered positions "even intervals".
  • In the first line of the equivalent form, the term before the plus sign, i & 0x55555555, picks out the 1s in the even intervals. The term after it, (i >>> 1) & 0x55555555, shifts the odd intervals one place right and picks out their 1s. Adding the two merges each adjacent odd/even pair into one 2-bit interval, so C1 goes from 8 intervals to 4.
  • Likewise, the second line merges these into two 4-bit intervals, and the third line merges those into a single 8-bit interval. This completes the step from C1 to D1.
  • The whole process is divide and conquer run in reverse: rather than splitting the problem, it starts from the smallest pieces and merges them upward.
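The same merge step can be written as a loop that keeps doubling the field width. This sketch extends the always-masked equivalent form to 8- and 16-bit fields; it is not the JDK code. The JDK replaces the last two masked merges with plain additions and a final & 0x3f, which takes fewer operations. The class name and the MASKS table are hypothetical.

```java
class MergeLoop {
    // Masks that select the right-hand field of each pair, for field widths
    // 1, 2, 4, 8 and 16 bits.
    private static final int[] MASKS = {
        0x55555555, // pairs of 1-bit fields
        0x33333333, // pairs of 2-bit fields
        0x0f0f0f0f, // pairs of 4-bit fields
        0x00ff00ff, // pairs of 8-bit fields
        0x0000ffff  // pairs of 16-bit fields
    };

    static int bitCountByMerging(int i) {
        int shift = 1;
        for (int mask : MASKS) {
            // right field of each pair + left field shifted down beside it
            i = (i & mask) + ((i >>> shift) & mask);
            shift <<= 1; // the merged fields are twice as wide
        }
        return i; // one 32-bit field holding the total count
    }

    public static void main(String[] args) {
        System.out.println(bitCountByMerging(0b10110011)); // prints 5
        System.out.println(bitCountByMerging(-1));         // prints 32
    }
}
```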

Finally, if you draw the process above on paper, it becomes clear at a glance.


Source: blog.csdn.net/qq_27007509/article/details/112246576