Question
It has long been known that the CPU uses branch prediction when it executes conditional statements such as if, so when a condition almost always evaluates the same way, the branch is very cheap. Conversely, when the outcome is hard to predict, mispredictions are expensive. Does that mean that using arithmetic to avoid the if will perform better than the if?
Scenario 1
I ran into this problem in a function today, so I tested it as follows.
The function under test counts how many bits are set to 1 in an int32 value. My solution splits the int32 into 4 bytes and counts the bits of each byte separately. Here is a version that uses if tests (the ? : ternary operator counts as an if statement here).
// Counts the 1 bits in a 32-bit value, one byte at a time.
static int getInt32TrueCount(int value)
{
    if (value == 0)
    {
        return 0;
    }

    return getByteTrueCount(value & 0xFF) +
        getByteTrueCount((value >> 8) & 0xFF) +
        getByteTrueCount((value >> 16) & 0xFF) +
        getByteTrueCount((value >> 24) & 0xFF);
}

// Counts the 1 bits in the low byte; every ?: is a data-dependent test.
static int getByteTrueCount(int value)
{
    if (value == 0)
    {
        return 0;
    }

    int a = (value & 0x1) == 0 ? 0 : 1;
    int b = (value & 0x2) == 0 ? 0 : 1;
    int c = (value & 0x4) == 0 ? 0 : 1;
    int d = (value & 0x8) == 0 ? 0 : 1;

    int e = (value & 0x10) == 0 ? 0 : 1;
    int f = (value & 0x20) == 0 ? 0 : 1;
    int g = (value & 0x40) == 0 ? 0 : 1;
    int h = (value & 0x80) == 0 ? 0 : 1;

    return a + b + c + d + e + f + g + h;
}
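To make the byte split concrete, here is a small sketch. The class and method names (ByteSplitDemo, SplitBytes) are made up for illustration and are not part of the original program. It also shows why the & 0xFF mask matters: >> on a negative int is an arithmetic shift that copies the sign bit, and the mask discards those copies.

```csharp
using System;

public static class ByteSplitDemo
{
    // Hypothetical helper: returns the four bytes of value, least significant first,
    // using the same shift-and-mask expressions as getInt32TrueCount.
    public static int[] SplitBytes(int value)
    {
        return new[]
        {
            value & 0xFF,
            (value >> 8) & 0xFF,
            (value >> 16) & 0xFF,
            (value >> 24) & 0xFF,
        };
    }

    public static void Main()
    {
        // 0x12345678 splits into 0x78, 0x56, 0x34, 0x12.
        Console.WriteLine(string.Join(" ", SplitBytes(0x12345678))); // 120 86 52 18

        // Without the mask, the sign bit is smeared across the upper bits.
        Console.WriteLine(int.MinValue >> 24);          // -128
        Console.WriteLine((int.MinValue >> 24) & 0xFF); // 128
    }
}
```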
As you can see, every bit test involves an if, and worse, the outcome of each test is essentially random, so the CPU cannot predict it. I wrote a timing program; it takes 12 seconds on my machine (i5-6500, Release build, .NET Core 2.0).
// Requires: using System; using System.Diagnostics;
static void GetBitCountTest()
{
    var watch = Stopwatch.StartNew();

    var rand = new Random();

    // 100 million random inputs; rand.Next() runs inside the timed loop.
    for (int i = 0; i < 100_000_000; i++)
    {
        int value = rand.Next();
        int p = getInt32TrueCount(value);
    }

    watch.Stop();
    Console.WriteLine("GetBitCount elapsed: " + watch.Elapsed.ToString());
}
Scenario 2
The second approach replaces the if test with arithmetic: after the AND operation, the masked bit is shifted right to bit position 0, so the result is either 0 or 1.
// Same byte split as before, but the per-byte count is branch-free.
static int getInt32TrueCount2(int value) {
    if (value == 0) {
        return 0;
    }

    return getByteTrueCount2(value & 0xFF) +
        getByteTrueCount2((value >> 8) & 0xFF) +
        getByteTrueCount2((value >> 16) & 0xFF) +
        getByteTrueCount2((value >> 24) & 0xFF);
}

// Masks each bit and shifts it down to position 0, giving 0 or 1 without a test.
static int getByteTrueCount2(int value) {
    return (value & 0x1) +
        ((value & 0x2) >> 1) +
        ((value & 0x4) >> 2) +
        ((value & 0x8) >> 3) +

        ((value & 0x10) >> 4) +
        ((value & 0x20) >> 5) +
        ((value & 0x40) >> 6) +
        ((value & 0x80) >> 7);
}
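As a sanity check, the mask-and-shift form should give the same answer as the ternary form for every byte value. The sketch below is a compact, loop-based restatement of the two per-byte functions, not the author's code; the names BitCountCheck, CountByteBranching, CountByteBranchFree and CountMismatches are made up for illustration. It is meant for checking correctness, not for timing.

```csharp
using System;

public static class BitCountCheck
{
    // Ternary form: the same test getByteTrueCount applies to each bit.
    public static int CountByteBranching(int value)
    {
        int count = 0;
        for (int bit = 0; bit < 8; bit++)
        {
            count += (value & (1 << bit)) == 0 ? 0 : 1;
        }
        return count;
    }

    // Mask-and-shift form: the same expression getByteTrueCount2 uses for each bit.
    public static int CountByteBranchFree(int value)
    {
        int count = 0;
        for (int bit = 0; bit < 8; bit++)
        {
            count += (value & (1 << bit)) >> bit;
        }
        return count;
    }

    // Counts the byte values (0..255) on which the two forms disagree.
    public static int CountMismatches()
    {
        int mismatches = 0;
        for (int value = 0; value <= 0xFF; value++)
        {
            if (CountByteBranching(value) != CountByteBranchFree(value))
            {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void Main()
    {
        // 0xA5 is 1010 0101 in binary: bit 5 is set, bit 6 is clear.
        Console.WriteLine((0xA5 & 0x20) >> 5); // 1
        Console.WriteLine((0xA5 & 0x40) >> 6); // 0
        Console.WriteLine(CountMismatches());  // 0
    }
}
```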
Running the same test against getInt32TrueCount2, the execution time dropped to 2 seconds, about 6 times faster.
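For anyone who wants to repeat the comparison, here is a minimal sketch of a harness that times both versions on the same inputs. It is not the author's benchmark: the class and method names, the seed, and the input count are assumptions chosen for illustration. It generates the random values before starting the clock so that Random.Next() is not measured, and it sums the results so the calls cannot be discarded as dead code. The numbers will vary by machine and runtime, and a JIT may compile the ternary without a real branch, in which case the gap may be smaller than the one reported above.

```csharp
using System;
using System.Diagnostics;

public static class BitCountTiming
{
    // Per-byte count with ternary tests (same bit tests as getByteTrueCount).
    static int ByteBranching(int b) =>
        ((b & 0x01) == 0 ? 0 : 1) + ((b & 0x02) == 0 ? 0 : 1) +
        ((b & 0x04) == 0 ? 0 : 1) + ((b & 0x08) == 0 ? 0 : 1) +
        ((b & 0x10) == 0 ? 0 : 1) + ((b & 0x20) == 0 ? 0 : 1) +
        ((b & 0x40) == 0 ? 0 : 1) + ((b & 0x80) == 0 ? 0 : 1);

    // Per-byte count with mask and shift (same expressions as getByteTrueCount2).
    static int ByteBranchFree(int b) =>
        (b & 0x01) + ((b & 0x02) >> 1) + ((b & 0x04) >> 2) + ((b & 0x08) >> 3) +
        ((b & 0x10) >> 4) + ((b & 0x20) >> 5) + ((b & 0x40) >> 6) + ((b & 0x80) >> 7);

    public static int CountBranching(int v) =>
        ByteBranching(v & 0xFF) + ByteBranching((v >> 8) & 0xFF) +
        ByteBranching((v >> 16) & 0xFF) + ByteBranching((v >> 24) & 0xFF);

    public static int CountBranchFree(int v) =>
        ByteBranchFree(v & 0xFF) + ByteBranchFree((v >> 8) & 0xFF) +
        ByteBranchFree((v >> 16) & 0xFF) + ByteBranchFree((v >> 24) & 0xFF);

    // Pre-generates the inputs; the seed only makes runs repeatable.
    public static int[] MakeInputs(int count, int seed)
    {
        var rand = new Random(seed);
        var inputs = new int[count];
        for (int i = 0; i < count; i++)
        {
            inputs[i] = rand.Next();
        }
        return inputs;
    }

    public static void Main()
    {
        int[] inputs = MakeInputs(10_000_000, 12345); // arbitrary size and seed

        var watch = Stopwatch.StartNew();
        long sumBranching = 0;
        foreach (int v in inputs) sumBranching += CountBranching(v);
        watch.Stop();
        Console.WriteLine("branching:   " + watch.Elapsed + " (sum " + sumBranching + ")");

        watch.Restart();
        long sumBranchFree = 0;
        foreach (int v in inputs) sumBranchFree += CountBranchFree(v);
        watch.Stop();
        Console.WriteLine("branch-free: " + watch.Elapsed + " (sum " + sumBranchFree + ")");
    }
}
```

The two printed sums should be identical; if they differ, one of the two versions is wrong and the timings are not comparable.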
Summary:
In performance-critical code, avoid branches whose outcome is hard to predict, and use arithmetic operations instead where possible.