[Repost] https://blog.csdn.net/weixin_33845477/article/details/89104450

Alibaba programmer work tips: understand CPU branch prediction to improve code efficiency


Use the characteristics of the hardware to improve performance.

 


The value of sharing technology lies not only in shortening the path to building applications and speeding up business launches through commercial products and open source projects. It also shows in the tips that good programmers share about improving work efficiency, optimizing performance, and improving user experience, all of which help raise our own ability to work.

The author of this article, whose pen name is rendered here as "off the ridge", is from the Alibaba middleware technology team. He is a member of the project team for Dubbo, Alibaba's open source microservice project, and is also the lead of Arthas, an open source project for online Java diagnostics.

I. Basic concepts
a. Dubbo: a high-performance, lightweight open source Java RPC framework that provides three core capabilities: interface-oriented remote method invocation, intelligent fault tolerance and load balancing, and automatic service registration and discovery;

b. ChannelEventRunnable: the callback interface for all network events in Dubbo;

c. JMH: the Java Microbenchmark Harness, a tool suite designed for micro-benchmarking code. During performance optimization, JMH can be used to quantify the effect of an optimization.

II. Origin of the problem
There is a very well-known question on Stack Overflow: why is processing a sorted array faster than processing an unsorted array? The conclusion of that question is that branch prediction plays a very important role in code efficiency.
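To make the question concrete, here is a minimal sketch of the kind of loop it discusses. The class name, method name, array size and threshold are illustrative assumptions, not code from the original article. It only shows that the same data produces the same sum in either order, so any speed difference has to come from how predictable the `if` is. Whether a difference is actually measurable depends on the JVM, the JIT compiler and the CPU, so timing should be left to a harness such as JMH.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative sketch only: names and sizes are assumptions, not the
// original Stack Overflow code.
final class SortedBranchDemo {

    // Sums only the elements >= threshold. The `if` is the conditional
    // branch the CPU has to predict on every iteration.
    static long sumAtLeast(int[] data, int threshold) {
        long sum = 0;
        for (int value : data) {
            if (value >= threshold) {
                sum += value;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // One million values in [0, 256), fixed seed for repeatability.
        int[] unsorted = new Random(42).ints(1_000_000, 0, 256).toArray();
        int[] sorted = unsorted.clone();
        Arrays.sort(sorted);

        // Unsorted: the branch outcome looks random from one element to the next.
        // Sorted: a long run of "skip" followed by a long run of "add".
        long a = sumAtLeast(unsorted, 128);
        long b = sumAtLeast(sorted, 128);
        System.out.println("same result: " + (a == b));
    }
}
```

The work done is identical in both calls; only the order of branch outcomes changes, which is exactly the property a branch predictor is sensitive to.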

Modern CPUs support both branch prediction and instruction pipelining. Combined, the two can greatly improve CPU efficiency and therefore the efficiency of code execution. For a simple if jump, the CPU can do branch prediction fairly well. For a switch jump, however, the CPU has far less to work with, because a switch essentially uses an index to fetch an address from an array of addresses and then jumps to that address.
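As a rough mental model of that table lookup, the dispatch can be pictured as indexing into an array of targets and then calling whatever is found there. The sketch below is a conceptual model under that assumption; `JumpTableSketch`, `TARGETS` and `dispatch` are hypothetical names, and what a real JIT compiler emits for a given switch depends on the JVM and the CPU.

```java
import java.util.function.IntUnaryOperator;

// Conceptual model only: it imitates "take an address from an array by
// index, then jump". It is not what the compiler literally generates.
final class JumpTableSketch {

    enum ChannelState { CONNECTED, DISCONNECTED, SENT, RECEIVED, CAUGHT }

    // One "jump target" per case, stored in ordinal order.
    private static final IntUnaryOperator[] TARGETS = {
        acc -> acc + ChannelState.CONNECTED.ordinal(),
        acc -> acc + ChannelState.DISCONNECTED.ordinal(),
        acc -> acc + ChannelState.SENT.ordinal(),
        acc -> acc + ChannelState.RECEIVED.ordinal(),
        acc -> acc + ChannelState.CAUGHT.ordinal(),
    };

    static int dispatch(ChannelState state, int acc) {
        // Step 1: load the target by index. Step 2: jump to it.
        // The CPU has to guess which of several targets comes next,
        // not just whether a single branch is taken.
        return TARGETS[state.ordinal()].applyAsInt(acc);
    }

    public static void main(String[] args) {
        System.out.println(dispatch(ChannelState.RECEIVED, 10));
    }
}
```

An if has two possible outcomes, while the indexed jump has as many possible targets as there are cases, which is the difference the paragraph above is pointing at.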

III. Thinking and hypothesis
An important principle in writing efficient code is to avoid making the CPU flush its pipeline. Judging from the discussion on Stack Overflow, a higher branch prediction success rate reduces the probability of pipeline flushes. So, beyond the hardware level, can we help the CPU at the code level by making the decision in advance, and get more efficient code as a result?

IV. Verification
Dubbo's ChannelEventRunnable contains a switch that checks the channel state. Once a channel is established, its state is ChannelState.RECEIVED in more than 99.9% of cases, so we can consider checking for this state up front.
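The idea can be sketched as follows. This is a hypothetical stand-in for a network-event callback, not Dubbo's actual ChannelEventRunnable source: the class name `HotPathFirstSketch`, the `log` field and the handler bodies are assumptions made only so the example is self-contained.

```java
// Hypothetical stand-in for a network-event callback. This is NOT the
// real Dubbo ChannelEventRunnable; handler bodies just record a name.
final class HotPathFirstSketch implements Runnable {

    enum ChannelState { CONNECTED, DISCONNECTED, SENT, RECEIVED, CAUGHT }

    private final ChannelState state;
    private final StringBuilder log; // records which handler ran

    HotPathFirstSketch(ChannelState state, StringBuilder log) {
        this.state = state;
        this.log = log;
    }

    @Override
    public void run() {
        // Hot path first: per the article, more than 99.9% of events on an
        // established channel are RECEIVED, so this if is almost always taken.
        if (state == ChannelState.RECEIVED) {
            log.append("received");
            return;
        }
        // Cold path: the rare states still go through the switch.
        switch (state) {
            case CONNECTED:
                log.append("connected");
                break;
            case DISCONNECTED:
                log.append("disconnected");
                break;
            case SENT:
                log.append("sent");
                break;
            case CAUGHT:
                log.append("caught");
                break;
            default:
                throw new IllegalStateException("unexpected state: " + state);
        }
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        new HotPathFirstSketch(ChannelState.RECEIVED, log).run();
        System.out.println(log);
    }
}
```

The behavior is the same as a single switch over all five states. The only change is that the dominant case is decided by a highly predictable conditional branch before the multi-way jump is reached.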

The following JMH benchmark verifies whether checking in advance improves code execution efficiency.

    // Package name taken from the benchmark output in this article.
    package io.github.hengyunabc.jmh;

    import java.util.Date;
    import java.util.Random;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Fork;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.Param;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.Setup;
    import org.openjdk.jmh.annotations.State;
    import org.openjdk.jmh.infra.Blackhole;

    public class TestBenchMarks {

        public enum ChannelState {
            CONNECTED, DISCONNECTED, SENT, RECEIVED, CAUGHT
        }

        @State(Scope.Benchmark)
        public static class ExecutionPlan {
            @Param({ "1000000" })
            public int size;
            public ChannelState[] states = null;

            @Setup
            public void setUp() {
                ChannelState[] values = ChannelState.values();
                states = new ChannelState[size];
                Random random = new Random(new Date().getTime());
                for (int i = 0; i < size; i++) {
                    int nextInt = random.nextInt(1000000);
                    // Almost every draw is > 100, so roughly 99.99% of the
                    // entries are RECEIVED; the rest are spread over all states.
                    if (nextInt > 100) {
                        states[i] = ChannelState.RECEIVED;
                    } else {
                        states[i] = values[nextInt % values.length];
                    }
                }
            }
        }

        // Baseline: every state goes through the switch.
        @Fork(value = 5)
        @Benchmark
        @BenchmarkMode(Mode.Throughput)
        public void benchSiwtch(ExecutionPlan plan, Blackhole bh) {
            int result = 0;
            for (int i = 0; i < plan.size; ++i) {
                switch (plan.states[i]) {
                case CONNECTED:
                    result += ChannelState.CONNECTED.ordinal();
                    break;
                case DISCONNECTED:
                    result += ChannelState.DISCONNECTED.ordinal();
                    break;
                case SENT:
                    result += ChannelState.SENT.ordinal();
                    break;
                case RECEIVED:
                    result += ChannelState.RECEIVED.ordinal();
                    break;
                case CAUGHT:
                    result += ChannelState.CAUGHT.ordinal();
                    break;
                }
            }
            bh.consume(result);
        }

        // Variant: the dominant RECEIVED state is checked with an if first;
        // only the rare states reach the switch.
        @Fork(value = 5)
        @Benchmark
        @BenchmarkMode(Mode.Throughput)
        public void benchIfAndSwitch(ExecutionPlan plan, Blackhole bh) {
            int result = 0;
            for (int i = 0; i < plan.size; ++i) {
                ChannelState state = plan.states[i];
                if (state == ChannelState.RECEIVED) {
                    result += ChannelState.RECEIVED.ordinal();
                } else {
                    switch (state) {
                    case CONNECTED:
                        result += ChannelState.CONNECTED.ordinal();
                        break;
                    case SENT:
                        result += ChannelState.SENT.ordinal();
                        break;
                    case DISCONNECTED:
                        result += ChannelState.DISCONNECTED.ordinal();
                        break;
                    case CAUGHT:
                        result += ChannelState.CAUGHT.ordinal();
                        break;
                    }
                }
            }
            bh.consume(result);
        }
    }

Notes on the benchmark:

benchSiwtch uses a plain switch;
benchIfAndSwitch first uses an if to check whether the state is ChannelState.RECEIVED, and only falls through to the switch otherwise.
The benchmark results are:

    Result "io.github.hengyunabc.jmh.TestBenchMarks.benchSiwtch":
      576.745 ±(99.9%) 6.806 ops/s [Average]
      (min, avg, max) = (490.348, 576.745, 618.360), stdev = 20.066
      CI (99.9%): [569.939, 583.550] (assumes normal distribution)

    # Run complete. Total time: 00:06:48

    Benchmark                         (size)   Mode  Cnt     Score    Error  Units
    TestBenchMarks.benchIfAndSwitch  1000000  thrpt  100  1535.867 ± 61.212  ops/s
    TestBenchMarks.benchSiwtch       1000000  thrpt  100   576.745 ±  6.806  ops/s

As the results show, checking the hot case in advance with an if raised throughput to nearly 3 times that of the plain switch (1535.867 vs 576.745 ops/s, about 2.7x). This technique can be applied in performance-critical code.

V. Summary
It is hard for the CPU to do branch prediction for a switch.
If some case of a switch has a relatively high probability, it can be checked in advance with an if at the code level, so that the CPU's branch prediction mechanism is put to full use.
----------------
Disclaimer: This is an original article by CSDN blogger "weixin_33845477" and follows the CC 4.0 BY-SA license. When reposting, please attach the original source link and this statement.
Original link: https://blog.csdn.net/weixin_33845477/article/details/89104450


Origin www.cnblogs.com/jinanxiaolaohu/p/12181854.html