Flame Graphs: Essential for Performance Optimization

Introduction

This article introduces flame graphs and how to use them, showing how to quickly locate performance hot spots in software.
Combined with a best-practice example, it should help readers understand the structure and principles of flame graphs, reason about where CPU time goes, and locate performance bottlenecks.

Background

Current Status

Suppose flame graphs did not exist. How would you go about tuning program code? Let's take a look.

1. Function switch method

Back when I had just started working and was still a technical novice, I could only rely on guesswork to troubleshoot problems. I could roughly guess that a certain piece of feature code might be the cause, so my approach was to comment out or switch off the suspect code, run the program again, and check CPU consumption to narrow down the problem. (Even now I still find some veterans debugging performance this way.)

public void demo() {
    if (feature1Enabled) {
        // feature 1
        handle1();
    }

    if (feature2Enabled) {
        // feature 2
        handle2();
    }

    if (feature3Enabled) {
        // feature 3
        handle3();
    }

    // feature 4
    handle4();
}

This approach depends entirely on "experience" and "luck", and it changes the code structure. If this is code on an integration branch that has already passed testing, modifying it just to debug performance is quite dangerous. Of course, there is always the Git repository for a "one-click restore".

2. StopWatch instrumentation method

When a program has a performance problem and you are not sure which piece of code is slow, you can measure elapsed time directly. Simply log the elapsed time before and after each method call to determine which method is the most time-consuming.

public void demo() {
    // Guava Stopwatch; reset before each call so every log line reflects only that method
    Stopwatch stopwatch = Stopwatch.createStarted();

    handle1();
    log.info("method handle1 cost: {} ms",
             stopwatch.elapsed(TimeUnit.MILLISECONDS));

    stopwatch.reset().start();
    handle2();
    log.info("method handle2 cost: {} ms",
             stopwatch.elapsed(TimeUnit.MILLISECONDS));

    stopwatch.reset().start();
    handle3();
    log.info("method handle3 cost: {} ms",
             stopwatch.elapsed(TimeUnit.MILLISECONDS));

    stopwatch.reset().start();
    handle4();
    log.info("method handle4 cost: {} ms",
             stopwatch.stop().elapsed(TimeUnit.MILLISECONDS));
}

Compared with the previous approach, the advantage here is that the code logic is untouched; we only add observation points and locate the bottleneck by measuring elapsed time. However, if the call stack of the method is very deep, you have to add instrumentation in the sub-methods as well. The workflow then becomes: instrument -> redeploy -> locate -> instrument -> redeploy -> locate -> ... and since you are still changing code, there is always a chance of introducing errors. Tiring and inefficient!

3. Locating hot threads with the top command

Enterprise software services are generally deployed on Linux. For experienced engineers, the most convenient way to check performance is to locate hot threads with top (-p filters by process id, -H shows individual threads):

top -p pid -H

[Image: top thread view showing per-thread CPU usage]

It is obvious that the thread with PID 103 consumes about 40% of the CPU, and its stack is found as follows (I'll skip explaining the lookup here and assume you already know it; see the commands after the screenshot if needed):
[Image: jstack output showing the stack of the hot thread]
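
For reference, a common way to do that lookup on Linux is sketched below; the thread id 103 comes from the screenshot above, and <java_pid> stands for your own Java process id:

# top shows the thread id in decimal, while jstack shows it as a hexadecimal nid
printf '%x\n' 103          # -> 67
# dump the stacks of the Java process and find the matching thread
jstack <java_pid> | grep -A 20 'nid=0x67'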

At this point we can conclude that the thread consuming the most CPU is writing to disk files. Tracing the code eventually shows that a large number of INFO logs are being written under high concurrency, making disk writes the bottleneck.

To summarize: the top command is very effective for finding CPU hot spots, but it has a few problems:

  • The topmost entry is whatever consumes the most CPU at that moment, but it is not necessarily the root cause of the performance problem. For example, a bug may cause a flood of ERROR logs, so writing logs to disk ends up consuming the most CPU, yet logging is not the real culprit.
  • top pushes you to look only at the highest consumer. After you fix the most CPU-hungry issue, you often just uncover the next one; you can only see one problem at a time and never the whole picture.
  • Plain text has limited expressiveness: you must be quite familiar with Linux and JVM commands, and text quickly falls short when you need to analyze the correlation between two or more values. At that point you really need a different kind of analysis tool, a graph.

What is a flame graph

Flame graphs are named for their resemblance to flames.
[Image: a typical flame graph]

The above is a typical flame graph. It is composed of rectangles of various widths and colors, each labeled with text. The top edge of the whole picture is jagged, resembling clusters of flames, hence the name.
Flame graphs are generated as SVG, so they are interactive. When the mouse hovers over a block, its full label is shown; clicking a block zooms in and expands the graph upward from that block.

Features
Before analyzing with a flame graph, we first need to understand its basic structure:

  • Each column represents a call stack, and each cell represents a function in that stack
  • The text on a cell identifies the function, and the number shows how many times it appeared in the samples
  • The Y axis is the depth of the call stack; the X axis merges multiple call stacks and lays them out in alphabetical order, not chronological order
  • The width along the X axis indicates how frequently that stack appeared in the sampled data, so the wider the cell, the more likely it is the cause of a performance bottleneck (note: likely, not certain); see the small folded-stack example after this list
  • The colors carry no meaning and are assigned randomly (perhaps the authors just wanted it to look more like a flame)
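
To make the merging concrete, here is a tiny hypothetical "folded" stack file of the kind flame graph tools consume; the function names and counts are made up purely for illustration. Each line is a semicolon-separated call stack followed by the number of samples in which it appeared, and the width of each block in the rendered graph is proportional to that count:

main;handleRequest;parseJson 5
main;handleRequest;writeLog 30
main;housekeeping 2

Rendered, handleRequest would span 35 of the 37 samples, with writeLog forming the widest plateau on top.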

What can flame graphs do

By now you know what a flame graph is, but how do you use it to locate software problems? We need a methodology for finding performance bottlenecks.
First, be clear about what counts as high CPU consumption:

Criterion for high CPU consumption: the call stacks that appear most frequently in the samples are the ones eating CPU.

As described above, we now know the structure of a flame graph and what its building blocks mean. Our focus should be on the width of the blocks: width represents how many times a call stack appeared across the whole sampling period. Frequency of appearance is a proxy for CPU time, so the more often a stack shows up, the more likely it is to be consuming CPU.
[Image: flame graph with the widest blocks highlighted]

But simply looking for the widest block is not enough. For example, the root at the bottom and the blocks in the middle are very wide, which only shows that these are "entry methods", that is, methods that every call passes through.
We should instead pay attention to the wide "plateaus" at the top of the flame, that is, frames with no child calls but a high sampling frequency. This indicates that the method itself either runs for a long time or is executed extremely often (for example, long polling); most CPU time is actually spent inside these plateaus, and that is where the root cause of the performance bottleneck lies.

Methodology in one line: look for the "flat-topped mountains" in the flame graph; the functions on those plateaus are the likely performance problems!

Best Practices

Practice is the sole criterion for testing truth! Below I use a small demo to show how to locate a program performance problem and deepen our understanding of flame graphs.

The demo program is as follows:

package com;

import lombok.SneakyThrows;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Demo {

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executorService = Executors.newFixedThreadPool(20);

        // keep submitting the four tasks so the pool is always busy
        while (true) {
            executorService.submit(Demo::handle1);
            executorService.submit(Demo::handle2);
            executorService.submit(Demo::handle3);
            executorService.submit(Demo::handle4);
        }
    }

    @SneakyThrows
    private static void handle4() {
        Thread.sleep(1000);
    }

    @SneakyThrows
    private static void handle2() {
        Thread.sleep(50);
    }

    @SneakyThrows
    private static void handle3() {
        Thread.sleep(100);
    }

    @SneakyThrows
    private static void handle1() {
        Thread.sleep(50);
    }
}

The code is very simple. Of course, nobody would write it like this in reality; it exists purely to demonstrate the performance analysis.
It opens a thread pool and keeps submitting four tasks whose execution times differ. The performance bottleneck is clearly the handle4 task. Knowing the answer in advance, let's compare it with the flame graph and see whether the result matches our expectation!

1. Pull JVM stack information

I am currently running the program on my Mac, and it is easy to launch from IDEA. So how do we get the PID of the running main process?
The top command comes in handy again. Since the program runs a busy while loop, it will obviously sit at the top of the CPU ranking; we just need to find the highest-CPU PID whose command is java.
[Image: top output on macOS with the java process at the top]

We can see COMMAND = java with the highest CPU usage, PID = 20552.
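As an aside, the JDK's jps tool also lists running Java processes with their main class and PID, which avoids hunting through top output (the output below is only illustrative):

jps -l
20552 com.Demo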
Now execute the following command to dump the stack information into the file tmp.txt:

jstack -l 20552 > tmp.txt
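
Note that a single jstack dump is only one sample. Since the flame graph idea is based on sampling (see the Principle Analysis section below), you get a more representative picture by taking several dumps and appending them to the same file, roughly like this:

for i in $(seq 1 20); do
    jstack -l 20552 >> tmp.txt
    sleep 0.5
done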

2. Generate a flame graph

There are many tools for generating flame graphs. I usually use FastThread to analyze stacks online, which is very convenient; it also supports generating flame graphs, making it easy to locate problems.
[Image: FastThread homepage with the upload form]

Open the official website, select the stack file you just dumped, and click Analyze. You only need to wait a few seconds (normally 3 to 5) for the site to finish its analysis, and then you can view the flame graph.

The FastThread analysis report is very rich; for common problems, you can often locate the issue directly from the conclusions it gives. We do not need the rest of the report for now (I may share it in a later article), so scroll straight down to the Flame Graph section.
[Image: flame graph section of the FastThread report]

It is immediately clear that there are four "flat-topped mountains", with com.Demo.handle4 the widest, followed by com.Demo.handle3, exactly as expected!

Principle Analysis

Building on the small demo above, let's dig into the principle behind how a flame graph is generated.

By way of analogy, suppose we want to find out what a person is busy with and what takes up most of his time. What would we do?
Looking at it purely from the time dimension, and ignoring cost, I would set up a surveillance camera to watch him 24 hours a day from every angle, then have someone review the footage frame by frame and tally up what he does. The conclusion might be: he sleeps 8 hours, works 8 hours, plays with his phone 4 hours, eats 2 hours, and spends 2 hours on other things. From this we can conclude that sleeping takes up most of his time.

From the above, a set of analysis process can be summarized as follows:

Record (monitor) -> Analyze & merge (frame-by-frame review) -> Top N -> Draw conclusions

Applying the same process to a CPU: how do we check what it is doing during execution, and which things (processes/threads) take up most of its time?
The brute-force approach is to record the executing method stacks at every single moment, then aggregate and merge them to find the most time-consuming stacks. The problems with this approach are:

  • Huge volume of data
  • Long analysis time

In fact, sampling is enough to observe what the CPU is doing. It is a matter of probability: if the CPU spends a long time executing a certain method, that method has a high probability of being caught in the samples and will dominate the merged result. There is some error, but with enough samples the difference becomes negligible.
By the same logic, we can dump the thread stacks to see what most threads are doing and aggregate by how often each method appears across the stacks. The methods that appear most often are the ones the CPU spends the most time executing.

"pool-1-thread-18" #28 prio=5 os_prio=31 tid=0x00007f9a8d4c0000 nid=0x8d03 sleeping[0x000000030be59000]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at com.Demo.handle2(Demo.java:31)
    at com.Demo$$Lambda$2/1277181601.run(Unknown Source)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

    Locked ownable synchronizers:
- <0x00000006c6921ac0> (a java.util.concurrent.ThreadPoolExecutor$Worker)

As for how the jstack output is turned into the flame graph format, the community already provides tools for the common dump formats; stackcollapse-jstack.pl from Brendan Gregg's FlameGraph project processes jstack output.

Example input:

"MyProg" #273 daemon prio=9 os_prio=0 tid=0x00007f273c038800 nid=0xe3c runnable [0x00007f28a30f2000]
    java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:121)
        ...
        at java.lang.Thread.run(Thread.java:744)

Example output:

MyProg;java.lang.Thread.run;java.net.SocketInputStream.read;java.net.SocketInputStream.socketRead0 1
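
Assuming you have cloned the FlameGraph repository, folding the dump and rendering the SVG locally looks roughly like this (the file names are just examples):

./stackcollapse-jstack.pl tmp.txt > out.folded
./flamegraph.pl out.folded > flame.svg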

Summary & Outlook

That concludes this introduction to flame graphs; I hope you now have one more way to troubleshoot problems!
Tools exist for a reason: a good tool stands out precisely because it solves real pain points, so I always try to approach new ones with an open mind.
I will introduce more troubleshooting techniques in the future. If you like the style of this article, please follow or leave a comment. Discussion is welcome!

Source: blog.csdn.net/weixin_43975482/article/details/126596202