Yo, I found that the optimization of Dubbo seems not thorough enough?

Well, well. I ran it several times, and the pure if version is consistently much better than if + switch. So should the source code be changed to if/else throughout? The throughput is higher, and the code would no longer look neither fish nor fowl the way the current mix of an if followed by a switch does.
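For reference, the three shapes being compared look roughly like this. It is only a sketch: the enum, the method names, and the return values are made up for illustration and are not Dubbo's actual source, and RECEIVED is assumed to be the hot state.

```java
class DispatchShapes {
    // Hypothetical stand-in for the channel state; names are illustrative only.
    enum ChannelState { RECEIVED, CONNECTED, DISCONNECTED, SENT, CAUGHT }

    // Shape 1: pure switch over every state.
    static int viaSwitch(ChannelState s) {
        switch (s) {
            case CONNECTED:    return 1;
            case DISCONNECTED: return 2;
            case SENT:         return 3;
            case RECEIVED:     return 4;
            case CAUGHT:       return 5;
            default:           return -1;
        }
    }

    // Shape 2: the (assumed) hot state is checked with an if first,
    // everything else still goes through the switch.
    static int viaIfThenSwitch(ChannelState s) {
        if (s == ChannelState.RECEIVED) {
            return 4;
        }
        switch (s) {
            case CONNECTED:    return 1;
            case DISCONNECTED: return 2;
            case SENT:         return 3;
            case CAUGHT:       return 5;
            default:           return -1;
        }
    }

    // Shape 3: pure if chain, hot state first.
    static int viaIfChain(ChannelState s) {
        if (s == ChannelState.RECEIVED)     return 4;
        if (s == ChannelState.CONNECTED)    return 1;
        if (s == ChannelState.DISCONNECTED) return 2;
        if (s == ChannelState.SENT)         return 3;
        if (s == ChannelState.CAUGHT)       return 5;
        return -1;
    }

    public static void main(String[] args) {
        // All three shapes must agree; only their speed is in question.
        for (ChannelState s : ChannelState.values()) {
            System.out.println(s + " -> " + viaSwitch(s) + " "
                    + viaIfThenSwitch(s) + " " + viaIfChain(s));
        }
    }
}
```

The three methods are functionally identical, so any throughput difference comes purely from how the dispatch is compiled and predicted.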

Next I changed the benchmark so that state is generated randomly, and ran it again.

I ran it many times, and pure if still has the highest throughput. Why is pure if the best across the board?

Decompile if and switch
===================================================================================

My impression was that switch should be better than if. Setting CPU branch prediction aside, that should at least hold from the bytecode point of view, so let's look at the bytecode each one generates.

First, the decompiled switch, showing only the key part.

In other words, the switch compiles to a tableswitch. Once getstatic has loaded the value, the value is used as an index into a jump table and execution jumps straight to the corresponding bytecode offset, so the lookup is O(1).

For example, a value of 1 jumps directly to line 64 of the bytecode listing, and a value of 4 jumps directly to line 100.

There are a few more details about switch. When the case values are non-contiguous and far apart, a lookupswitch is generated instead. According to what I have read online it is searched with a binary search (I have not verified this), so the time complexity is O(log n) rather than a direct index lookup. I think that is plausible, because the entries of the generated lookupswitch are sorted by value.

Also, when the case values are non-contiguous but the gaps are small, a tableswitch is still generated, with the missing values filled in. In this example the cases in my switch are 1, 3, 5, 7, and 9, and the compiler automatically fills in 2, 4, 6, and 8, all of which point to the line that default jumps to.
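To see the three situations side by side, something like the following can be compiled and inspected with `javap -c SwitchShapes`. This is a sketch: the class and method names are made up, and which instruction gets emitted is ultimately the compiler's decision, so the comments describe what javac is expected to produce rather than a guarantee.

```java
class SwitchShapes {
    // Contiguous cases 1..4: expected to compile to a tableswitch.
    static String dense(int v) {
        switch (v) {
            case 1:  return "one";
            case 2:  return "two";
            case 3:  return "three";
            case 4:  return "four";
            default: return "other";
        }
    }

    // Cases 1, 3, 5, 7, 9: small gaps, so a tableswitch is still expected,
    // with 2, 4, 6 and 8 filled in and pointing at the default target.
    static String smallGaps(int v) {
        switch (v) {
            case 1:  return "one";
            case 3:  return "three";
            case 5:  return "five";
            case 7:  return "seven";
            case 9:  return "nine";
            default: return "other";
        }
    }

    // Cases far apart: a jump table would be mostly filler,
    // so a lookupswitch (sorted key/offset pairs) is expected.
    static String sparse(int v) {
        switch (v) {
            case 1:     return "one";
            case 100:   return "hundred";
            case 10000: return "ten thousand";
            default:    return "other";
        }
    }

    public static void main(String[] args) {
        System.out.println(dense(3) + ", " + smallGaps(4) + ", " + sparse(100));
    }
}
```

The behaviour of the three methods is ordinary switch behaviour; the interesting part is only visible in the javap output.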

Now let's look at the decompiled if.

As you can see, if loads the variable and compares it against a condition every time, whereas switch loads the variable once, looks it up in the table, and jumps to the right line. From this point of view switch should be more efficient than if. Of course, if the first condition of the if matches, it goes straight to the goto and the remaining comparisons are never executed.

So, judging by the generated bytecode, switch should be more efficient than if, yet the test results show if ahead of switch, both when the state is generated randomly and when 99.99% of the calls use the same state.

First of all, the benefit of CPU branch prediction is certain. As for why if still beats switch in the random case, I am not sure. It may be an optimization done by the JIT, or perhaps, even with random input, the gain from correct predictions outweighs the cost of mispredictions?

Could it be that I have too few enum values for switch to show its advantage? Still, switch should not be weaker than if in the random case. So I added 7 more enum values and tested again with 12 values in total.

It seems the gap has narrowed. That looked promising, so I went through all 26 letters of the alphabet, and to tell you the truth, I sang the alphabet song while typing them.

After expanding the number of branches I ran another round of tests. This time switch rose to the challenge and finally came out ahead of if.

Digression: I have seen comparisons between if and switch on the Internet whose conclusion is that switch is better than if. But first of all, their JMH benchmarks are not written correctly. They define a constant to test if and switch against, and the result of the benchmark method is never consumed. I don't know how the JIT will optimize such code. After writing dozens of lines, the whole thing may simply be optimized into returning a fixed value.

Summarize the test results
============================================================================

After all these comparisons, let's summarize.

First, extracting the hot branch from the switch and checking it separately with an if, which takes full advantage of CPU branch prediction, is indeed better than a pure switch. In our test results the throughput is roughly twice as high.

With a hot branch, throughput improves even more when if + switch is replaced by a pure if chain. It is 3.3 times that of pure switch and 1.6 times that of if + switch.

With random branches, the difference between the three is not very big, but pure if is still the best.

From the bytecode perspective, the switch mechanism should be more efficient, whether it is O(1) or O(log n), but the test results do not show that.

With few branches, if is better than switch, and I don't know why. Perhaps with so few values the cost of the table lookup outweighs its benefit? Anyone who knows can leave a message at the end of the article.

With many branches, switch is better than if. I have not tested with even more values. If you are interested you can test it yourself, but that is the trend.

CPU branch prediction
============================================================================

Next, let's look at how branch prediction works and why it exists in the first place. Before we talk about branch prediction, though, we need to introduce the instruction pipeline, which is how modern microprocessors execute instructions.

In essence, a CPU fetches instructions and executes them. Doing so takes five major steps: instruction fetch (IF), instruction decode (ID), execute (EX), memory access (MEM), and write back (WB). Wikipedia has a diagram of this five-stage pipeline.

Of course, there may actually be more steps. The point is that executing one instruction goes through many steps, and because the work is split up like this, the steps of different instructions can overlap to improve throughput.

So instruction pipelining is an attempt to keep every part of the processor busy: incoming instructions are broken into a series of sequential steps, each handled by a different processor unit, so that different parts of several instructions are processed in parallel.

It is just like an assembly line in a factory. As soon as the feet of one Ultraman have been attached, I start attaching the feet of the next Ultraman. I don't wait for the previous Ultraman to be fully assembled before starting on the next one.
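A rough back-of-the-envelope calculation shows why this helps. The sketch below assumes an idealized pipeline in which every stage takes exactly one cycle and nothing ever stalls, which real processors do not guarantee; the class and method names are made up.

```java
class PipelineCycles {
    // No pipelining: each instruction goes through all stages
    // before the next one is allowed to start.
    static long sequentialCycles(long instructions, int stages) {
        return instructions * stages;
    }

    // Ideal pipelining: the first instruction needs `stages` cycles,
    // after that one instruction completes every cycle.
    static long pipelinedCycles(long instructions, int stages) {
        if (instructions == 0) {
            return 0;
        }
        return stages + instructions - 1;
    }

    public static void main(String[] args) {
        long n = 1000;
        int k = 5; // the five classic stages: IF, ID, EX, MEM, WB
        System.out.println("sequential: " + sequentialCycles(n, k) + " cycles");
        System.out.println("pipelined:  " + pipelinedCycles(n, k) + " cycles");
    }
}
```

Under these assumptions a long instruction stream approaches one instruction per cycle, which is exactly the throughput a stalled or flushed pipeline loses.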

Of course, it is not that rigid, and execution is not necessarily in order. If some instructions are waiting and later instructions do not depend on their results, the later ones can be executed ahead of time. This is called out-of-order execution.

Let's go back to our branch prediction.

Just as in life, code constantly faces choices, and only after a choice is made does it know which way to go. In practice, though, code turns out to make the same choice most of the time, so people came up with the branch predictor: let it predict the direction and execute the instructions along that path in advance.

What if the prediction is wrong? Here it differs from life: the CPU can throw away everything it executed speculatively and start over. That still has a cost, though. The deeper the pipeline, the more work a misprediction wastes. The misprediction penalty is 10 to 20 clock cycles, so there is a downside.

Put simply, the branch predictor guesses which instructions a jump will lead to and executes them ahead of time, so that the result is already available when it is really needed, which improves efficiency.

There are many branch prediction methods, including static prediction, dynamic prediction, random prediction, and so on. Wikipedia lists 16 kinds.

Let me briefly describe the three I mentioned. Static prediction is the blunt one, like guessing on the multiple-choice questions of an English exam: whatever the question is, I choose A. It always predicts the same direction and charges ahead regardless, simple and crude.

Dynamic prediction decides the direction based on history. For example, if the last several outcomes were true, it goes ahead and executes the instructions on the true path. If the most recent outcomes change to false, it switches to executing the instructions on the false path. This is really an application of the principle of locality.

Random prediction is what the name says. It is the other way of guessing on multiple-choice questions: guess blindly, pick a direction at random, and execute it.
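To make the difference between the first two concrete, here is a toy simulation. It is only a sketch: the static predictor always guesses "taken", and the dynamic one is a two-bit saturating counter, a common textbook scheme that I am using as a stand-in. Real CPUs use far more elaborate predictors, and the class and method names here are made up.

```java
class PredictorSketch {
    // Static prediction: always guess "taken", no matter what happened before.
    static int staticAlwaysTakenHits(boolean[] outcomes) {
        int hits = 0;
        for (boolean taken : outcomes) {
            if (taken) {
                hits++;
            }
        }
        return hits;
    }

    // Dynamic prediction with a two-bit saturating counter:
    // states 0 and 1 predict "not taken", states 2 and 3 predict "taken".
    // Each real outcome nudges the counter one step in its direction.
    static int twoBitCounterHits(boolean[] outcomes) {
        int counter = 2; // arbitrary starting state: weakly "taken"
        int hits = 0;
        for (boolean taken : outcomes) {
            boolean predictedTaken = counter >= 2;
            if (predictedTaken == taken) {
                hits++;
            }
            counter = taken ? Math.min(3, counter + 1) : Math.max(0, counter - 1);
        }
        return hits;
    }

    public static void main(String[] args) {
        // A branch that is false 100 times and then true 100 times.
        boolean[] outcomes = new boolean[200];
        for (int i = 100; i < 200; i++) {
            outcomes[i] = true;
        }
        System.out.println("static hits:  " + staticAlwaysTakenHits(outcomes));
        System.out.println("dynamic hits: " + twoBitCounterHits(outcomes));
    }
}
```

On a branch whose outcome stays the same for long stretches, the history-based predictor is wrong only briefly around each change, while the static one is wrong for the whole stretch that goes against its fixed guess.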

There are many more that I won't list one by one. If you are interested, you can research them on your own. By the way, in 2018 Google's Project Zero and other researchers disclosed a catastrophic security vulnerability called Spectre, which exploits the CPU's branch prediction and speculative execution to leak sensitive information. I won't go into it here, and a link will be attached at the end of the article.

Later there was another attack called BranchScope, which also exploits the branch predictor. Whenever something new comes out, it always brings both benefits and drawbacks.

By now we know what instruction pipelining and branch prediction are, and why Dubbo is optimized the way it is. But the article is not over yet. I also want to mention a very famous question on Stack Overflow. Just look at the vote count.

Why is processing a sorted array faster than processing an unsorted array?
======================================================================================

This question was raised at the beginning of that blog post. Obviously it is also related to branch prediction. Since you have read this far, let's analyze it again. Try answering the question in your head first. We all know the answer by now, so see whether your reasoning is clear.

The question is about a loop that sums the elements of an array: the loop runs faster once the array has been sorted.
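The code in question looks roughly like the following. This is my own reconstruction modeled on that question rather than a verbatim copy, with the loop pulled into a helper method whose name is made up, and the naive timing in main is only for a quick look, not a proper benchmark.

```java
import java.util.Arrays;
import java.util.Random;

class SortedArraySum {
    // Adds up every element that is >= 128, repeated `rounds` times.
    // The if inside the inner loop is the branch the CPU has to predict.
    static long sumAtLeast128(int[] data, int rounds) {
        long sum = 0;
        for (int r = 0; r < rounds; r++) {
            for (int value : data) {
                if (value >= 128) {
                    sum += value;
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] unsorted = new int[32768];
        Random rnd = new Random(0);
        for (int i = 0; i < unsorted.length; i++) {
            unsorted[i] = rnd.nextInt(256); // values in 0..255
        }
        int[] sorted = unsorted.clone();
        Arrays.sort(sorted);

        int rounds = 10_000; // assumption: enough work for the timing to be visible
        long t0 = System.nanoTime();
        long sumUnsorted = sumAtLeast128(unsorted, rounds);
        long t1 = System.nanoTime();
        long sumSorted = sumAtLeast128(sorted, rounds);
        long t2 = System.nanoTime();

        System.out.println("unsorted: sum=" + sumUnsorted + " ms=" + (t1 - t0) / 1_000_000);
        System.out.println("sorted:   sum=" + sumSorted + " ms=" + (t2 - t1) / 1_000_000);
    }
}
```

Both calls compute the same sum, since sorting does not change the elements. How large the timing gap is depends on the JVM and the hardware, and a JIT that compiles the condition into branch-free code may make it disappear entirely.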

Then experts from all over showed up. Let's see what the author of the top-voted answer said.

His very first sentence went straight to the heart of the matter.

You are a victim of branch prediction fail.

Right after that comes a picture. He is clearly an old hand at this.

He asks us to go back to the 19th century, a time before long-distance communication, when radio was not yet widespread. If you were the switchman at a railway junction, how would you know which way to set the switch when a train was approaching?

Stopping and restarting a train consumes a lot of energy. If the train had to stop at every junction so that you could ask the driver where he was going and then set the switch, restarting it each time would cost a lot of time. What can you do? Guess!

If you guess correctly, the train doesn't need to stop and just keeps going. If you guess wrong, it has to stop, back up, wait for you to flip the switch, and set off again.

So it all comes down to guessing well! Take a gamble, and your bicycle might turn into a motorcycle.

Then he pointed out the assembly code corresponding to the key line, namely the jump instruction. It corresponds to the railway junction: the point where a track has to be chosen.

I won't analyze the rest in detail. You can all see that with a sorted array, once execution reaches a value greater than 128, every value after it must also be greater than 128, so every branch prediction from then on is correct. That is why it runs so efficiently.

With an unsorted array the values come in no particular order, so the prediction is often wrong. Every time it is wrong, the instruction pipeline has to be flushed and the work redone, so of course it is slower.

That is why he said the asker was a victim of branch misprediction.

Origin blog.csdn.net/m0_65484000/article/details/122007809