[Technical Vision] DevEco Profiler in the HarmonyOS development kit helps you easily analyze application performance issues

Author: shizhengtao, Huawei performance tuning tool expert

Application performance optimization has always been a major challenge for developers. The HarmonyOS NEXT Developer Preview unveiled at HDC 2023 includes DevEco Profiler, a tool in the HarmonyOS development kit that offers a solution to application lag. What capabilities does it provide? This article walks you through them.

1. Realtime Monitor: efficiently detect lag problems

Realtime Monitor tracks a series of performance indicators in real time while the application runs and displays them in a visual panel. It is very simple to use: select the application process you want to observe in the upper left corner of the DevEco Profiler interface, and monitoring starts automatically.

Figure 1 Realtime Monitor

In Realtime Monitor, you can see the following indicator items:

① System performance events: Using the performance detection capabilities built into HarmonyOS NEXT, this item automatically surfaces runtime events related to performance and stability.

② System abnormal events: Using the anomaly detection capabilities built into HarmonyOS NEXT, this item automatically surfaces abnormal runtime events.

③ Foreground Ability: Shows the UIAbility that the application currently displays in the foreground. When an indicator looks abnormal, you can quickly see which UIAbility was running at that moment.

④ CPU usage: Shows the CPU usage of the application and the overall CPU load of the system. Continuously high CPU usage often leads to excessive energy consumption and deserves close attention.

⑤ Memory usage: Shows the memory usage of the application and the overall memory load of the system. If the application's memory rises periodically, a memory leak has probably occurred and needs special attention.

⑥ Device FPS: Shows the current frame rate of the device. If the application interface is static but the FPS is high, there may be an over-rendering problem. If the interface is changing substantially but the FPS is low, frames are probably being dropped, which is the lag problem mentioned earlier.

⑦ Device GPU utilization: Shows the current GPU utilization of the device. When the frame rate is too low, comparing GPU utilization with CPU utilization gives a first indication of whether the bottleneck is on the GPU side or the CPU side.

⑧ Device energy consumption: Shows the power consumption of each physical component used by the application, as well as the total, to help you quickly analyze how the application's energy consumption is distributed.

With these real-time indicators, you can quickly understand how the application process is performing, and quickly detect and narrow down performance problems when they occur.
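The rules of thumb in the list above can be written down as a small checker. This is only a sketch for building intuition: the `MetricSample` shape, the hint names, and every threshold are assumptions made for illustration, and none of them is an API or a value from DevEco Profiler. The leak check in particular is deliberately crude.

```typescript
// Hypothetical sample shape for illustration; DevEco Profiler does not expose this as an API.
interface MetricSample {
  cpuPercent: number;  // application CPU usage, 0-100
  gpuPercent: number;  // device GPU utilization, 0-100
  memoryMb: number;    // application memory usage in MB
  fps: number;         // device frame rate
  uiChanging: boolean; // true while the interface is visibly changing
}

type Hint =
  | "possible-over-rendering"
  | "possible-frame-loss"
  | "sustained-high-cpu"
  | "possible-memory-leak";

// Illustrative thresholds (assumptions, not values used by the tool).
const HIGH_FPS = 50;
const LOW_FPS = 45;
const HIGH_CPU_PERCENT = 80;
const LEAK_GROWTH_FACTOR = 1.2;

// Apply the rules of thumb from the indicator list to a window of samples.
function collectHints(samples: MetricSample[]): Hint[] {
  const hints: Hint[] = [];
  if (samples.length === 0) return hints;

  // Static interface but high FPS: possible over-rendering.
  if (samples.some(s => !s.uiChanging && s.fps >= HIGH_FPS)) {
    hints.push("possible-over-rendering");
  }
  // Changing interface but low FPS: possible frame loss.
  if (samples.some(s => s.uiChanging && s.fps < LOW_FPS)) {
    hints.push("possible-frame-loss");
  }
  // CPU usage high in every sample of the window.
  if (samples.every(s => s.cpuPercent >= HIGH_CPU_PERCENT)) {
    hints.push("sustained-high-cpu");
  }
  // Crude leak check: memory never drops and ends well above where it started.
  let nonDecreasing = true;
  for (let i = 1; i < samples.length; i++) {
    if (samples[i].memoryMb < samples[i - 1].memoryMb) nonDecreasing = false;
  }
  const first = samples[0].memoryMb;
  const last = samples[samples.length - 1].memoryMb;
  if (samples.length >= 2 && nonDecreasing && last > first * LEAK_GROWTH_FACTOR) {
    hints.push("possible-memory-leak");
  }
  return hints;
}

// When FPS is too low, compare GPU and CPU utilization for a first guess.
function likelyBottleneck(sample: MetricSample): "gpu" | "cpu" {
  return sample.gpuPercent > sample.cpuPercent ? "gpu" : "cpu";
}

// Made-up samples: a static screen that keeps rendering while memory grows.
const monitorSamples: MetricSample[] = [
  { cpuPercent: 30, gpuPercent: 20, memoryMb: 100, fps: 60, uiChanging: false },
  { cpuPercent: 35, gpuPercent: 25, memoryMb: 130, fps: 60, uiChanging: false },
  { cpuPercent: 32, gpuPercent: 22, memoryMb: 160, fps: 60, uiChanging: false },
];
console.log(collectHints(monitorSamples));
```

In practice you read these signals off the Realtime Monitor panel by eye; the sketch only makes explicit which combination of indicators points to which kind of problem.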

2. Scenario-based analysis: trace problems directly to the source code

When we first designed DevEco Profiler, we settled on a core concept: provide scenario-based, low-threshold tuning tools with a top-down UX, guiding developers to work through performance data layer by layer instead of plunging straight into an ocean of detail. This is very important in performance tuning. Through the interface design, we hope to show developers the fault model behind each type of performance problem intuitively, so that they can make a preliminary judgment and narrow down the problem as soon as they have the performance data, and then analyze the captured data in a targeted way. A clear line of attack is one of the necessary conditions for solving performance problems.

Another extremely important point is that, in every scenario, we want to help developers locate the problematic line of code directly. After locating the bottleneck function with the tool, you can double-click the function stack frame to open the relevant source file in the DevEco Studio editor, then analyze and optimize it right away.

In addition to the already released basic tuning templates for function hotspots and memory analysis, the DevEco Profiler unveiled at the HDC conference in August brings developers two advanced templates this year that truly embody the scenario-based concept: Launch Insight and Frame Insight.

Launch Insight: breaks down the application cold-start process in full, captures the time spent in each stage, and helps developers quickly find the time-consuming bottlenecks in a cold start.

Frame Insight: records the rendering data of each frame, automatically identifies janky frames, and provides the system trace information and function stack samples for the same period, helping developers efficiently analyze where and why the jank occurred.

Next, let us take a closer look at the Frame template and see how it helps developers analyze frame loss in a targeted, methodical way.

3. Frame Insight: efficiently locate lag problems

As mentioned in the previous section, the tool aims to show developers the fault model behind a performance problem intuitively. So before introducing the Frame template, it helps to briefly understand how graphics rendering works in HarmonyOS NEXT and, when lag occurs, at which stages it can arise and why.

In HarmonyOS NEXT, the graphics system uses unified rendering and follows a typical pipeline model. Taking a 60 Hz refresh rate as an example (each Vsync period is about 16.7 ms), the whole process is shown in Figure 2 below. At 90 Hz, each Vsync period is 11.1 ms.

Figure 2 60Hz refresh rate rendering process
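The Vsync period quoted above is simply the reciprocal of the refresh rate. As a quick check, here is a minimal helper (the function name is ours, chosen for illustration):

```typescript
// Vsync period in milliseconds for a given display refresh rate.
function vsyncPeriodMs(refreshRateHz: number): number {
  if (refreshRateHz <= 0) {
    throw new RangeError("refresh rate must be positive");
  }
  return 1000 / refreshRateHz;
}

for (const hz of [60, 90, 120]) {
  console.log(`${hz} Hz -> ${vsyncPeriodMs(hz).toFixed(1)} ms per Vsync`);
}
// 60 Hz -> 16.7 ms per Vsync
// 90 Hz -> 11.1 ms per Vsync
// 120 Hz -> 8.3 ms per Vsync
```

The higher the refresh rate, the less time each stage of the pipeline has to finish its work within one period.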

In the rendering process, the application side first responds to input events such as the user's taps on the screen. After processing them, it submits the result to the Render Service. The Render Service then coordinates the GPU and other resources, and the final image is sent to the screen for display. From this process, you can probably already deduce the corresponding fault models, shown in Figures 3 and 4.

Figure 3 Fault model of application lagging leading to frame loss

Figure 4 Fault model of Render Service stuck causing frame loss

Anywhere in this pipeline, lag can occur on either the application side or the Render Service side, and the end user then sees dropped frames. We named these two situations AppDeadlineMissed and RenderDeadlineMissed respectively. Generally speaking, the former is caused by application logic that is not efficient enough, and the latter by an interface structure that is too complex or a GPU load that is too heavy. Both fault models can be visualized through the Frame template. With this background in place, let us get to the point.

The first step is to select the template and record. This is very simple: with just a few mouse clicks, developers can start recording and reproduce the stuttering and dropped frames, as shown in Figure 5. During recording, DevEco Profiler uses the rich DFX tools in HarmonyOS NEXT to automatically collect the performance data that developers need in frame loss scenarios. Once recording and processing are complete, the analysis can begin.
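To make the two fault models concrete, the sketch below classifies a single frame by comparing each side's finish time with its deadline. The `FrameRecord` shape and its field names are hypothetical and are not a data format exported by DevEco Profiler. Only the two verdict names come from the text above.

```typescript
// Hypothetical per-frame record; field names are illustrative only.
interface FrameRecord {
  appEndMs: number;         // when the application side finished the frame
  appDeadlineMs: number;    // expected completion time on the application side
  renderEndMs: number;      // when Render Service finished the frame
  renderDeadlineMs: number; // expected completion time on the Render Service side
}

type FrameVerdict = "OnTime" | "AppDeadlineMissed" | "RenderDeadlineMissed";

// The application side is checked first because it sits upstream in the pipeline.
function classifyFrame(frame: FrameRecord): FrameVerdict {
  if (frame.appEndMs > frame.appDeadlineMs) return "AppDeadlineMissed";
  if (frame.renderEndMs > frame.renderDeadlineMs) return "RenderDeadlineMissed";
  return "OnTime";
}

// Made-up timings on a 60 Hz display (about 16.7 ms per Vsync period).
const slowAppFrame: FrameRecord = {
  appEndMs: 21.0, appDeadlineMs: 16.7, renderEndMs: 36.0, renderDeadlineMs: 33.4,
};
const slowRenderFrame: FrameRecord = {
  appEndMs: 12.0, appDeadlineMs: 16.7, renderEndMs: 38.5, renderDeadlineMs: 33.4,
};
console.log(classifyFrame(slowAppFrame));    // AppDeadlineMissed
console.log(classifyFrame(slowRenderFrame)); // RenderDeadlineMissed
```

The verdict tells you where to look first: application logic for AppDeadlineMissed, and interface complexity or GPU load for RenderDeadlineMissed.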

Figure 5 Frame template recording analysis

After the recording is completed, you can observe a series of data lanes, as shown in Figure 6.

Figure 6 Frame template data lane

① Frame lane: Visually presents the performance data corresponding to the frame loss fault models, helping developers quickly locate the period in which frames were dropped and make a preliminary judgment about the cause.

② ArkTS Callstack lane: Captures and presents ArkTS function hotspots to help developers locate time-consuming bottlenecks on the ArkTS side.

③ Native Callstack lane: Captures and presents native C++ function hotspots to help developers locate time-consuming bottlenecks on the native side.

④ CPU Core lane: Captures and presents the running details of each CPU core, helping developers locate performance issues caused by thread priority, system scheduling, and so on, and see how each thread actually ran.

⑤ System Trace lane: Captures and presents the system trace and user trace of each process, helping developers view the running details of the system and the exact running time of key tasks.

When analyzing frame loss, start by expanding the first lane, the Frame lane. In this lane, we capture key points in the graphics rendering process and visualize them, as shown in Figure 7.

Figure 7 Frame Lane

① Application frame processing: Shows the processing time of each frame on the application side. The length of each box represents the time taken. Green boxes are frames completed within the expected period, and red boxes are frames that were not.

② RenderService frame processing: Shows the processing time of each frame on the Render Service side. The boxes follow the same display logic as on the application side.

③ Submission relationship: Lines connect each frame submitted by the application side to the frame received and processed by the Render Service side, and both are marked with matching numbers, so you can see at a glance how the application's frames flow into the system's rendering process.

④ Expected start and end processing time: Two vertical lines mark the expected start time and the expected completion time of the selected frame. When a frame times out, you can use these two lines to inspect the other performance data from the same period.

⑤ Detailed data: Provides the detailed data of the selected frame. The Corresponding Slice and Preceding Flows jump buttons take you quickly to the corresponding system trace details for further analysis.

With this lane, developers can quickly find where frames were dropped and make a preliminary diagnosis. If red appears in application frame processing, examine the processing logic in the UI thread further: is it too complex or inefficient, or are its resources being preempted by other tasks? If red appears in RenderService frame processing, examine whether the interface layout is too complex. The latter case can be analyzed further with tools such as ArkUI Inspector in DevEco Studio and is not covered in this article. Let us continue with the former.

After finding a frame whose processing timed out, developers have two options: analyze the system trace, or analyze the sampled function hotspots. The former requires an in-depth understanding of the whole system and its key trace points, which may be difficult for developers at this stage, so we recommend analyzing the function hotspots directly. The method is very simple: find the ArkTS Callstack lane and select it. Here is a small trick: click the Collection button in the lane's information area to pin the application frame processing lane to the top, as shown in Figure 8. This keeps the context in view while you work.

Figure 8 Pin the key lanes to the top

After you select the time range of a red frame in the ArkTS Callstack lane, the details panel below shows the function hotspots for that period. We provide two views of the hotspots. One is a Top-Down tree list, as shown in Figure 9. The other is a flame graph, which many developers may be more familiar with, as shown in Figure 10. You can use whichever you prefer. Generally speaking, if the function stacks in the selected period are relatively complex, a flame graph is the more efficient way to find hotspots. In addition to these two views, the tool can automatically find the bottleneck path, shown on the right side of Figures 9 and 10. When you click a function stack frame node in the Top-Down list or flame graph on the left, the tool calculates the most time-consuming call path from that node downwards. When you click a function stack frame on this path, the view on the left automatically expands to or focuses on the corresponding node, making it faster to pinpoint the bottleneck.

Figure 9 Function hotspot Top-Down view

Figure 10 Function hot spot flame graph
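One plausible way to compute a "most time-consuming call path" is to start at the clicked node and, at each level, follow the child with the largest inclusive time. The sketch below does exactly that. Whether DevEco Profiler uses this rule is an assumption on our part, and the `CallNode` shape and the function names in the sample tree are hypothetical.

```typescript
// Hypothetical call-tree node; totalMs is inclusive time (self plus callees).
interface CallNode {
  fn: string;
  totalMs: number;
  children: CallNode[];
}

// Greedy descent: at each level, follow the child with the largest inclusive time.
function heaviestPath(start: CallNode): string[] {
  const path: string[] = [start.fn];
  let current = start;
  while (current.children.length > 0) {
    let next = current.children[0];
    for (const child of current.children) {
      if (child.totalMs > next.totalMs) next = child;
    }
    path.push(next.fn);
    current = next;
  }
  return path;
}

// Made-up sample tree for illustration.
const clickedNode: CallNode = {
  fn: "onClick", totalMs: 100, children: [
    {
      fn: "loadList", totalMs: 70, children: [
        { fn: "parseJson", totalMs: 50, children: [] },
        { fn: "formatRows", totalMs: 15, children: [] },
      ],
    },
    { fn: "updateBadge", totalMs: 20, children: [] },
  ],
};
console.log(heaviestPath(clickedNode).join(" -> "));
// onClick -> loadList -> parseJson
```

Reading the path from top to bottom mirrors the Top-Down view: each step narrows the search to the callee that accounts for most of the remaining time.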

Once you have identified a hot function, simply double-click the function node, and the Profiler automatically opens the corresponding source file and focuses on the relevant line of code. There is one prerequisite: the source file must belong to the current project and be compiled in DevEco Studio.

4. Verification and iterative optimization

After the previous steps, developers should have been able to locate the bottleneck code, but the task is far from over. Once the optimization is done, you need to use the capabilities described above again to verify the result. Generally speaking, performance optimization cannot be achieved overnight and has to be improved step by step. This requires experience and, even more so, patience. Any given lag is likely to be caused by several sub-problems stacked on top of each other. This is one of the reasons why performance optimization is often assigned to the experts in a team.

Of course, this also points to a way for us programmers to earn a promotion and a raise: learn to tune, and solve difficult problems such as performance problems. The more of them you solve, the more naturally you become the technical backbone of your team. I hope that this article and our tuning tool DevEco Profiler can be of some help to developers along that path. Thank you for reading.
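One simple way to make the verification step measurable is to compare the share of frames that exceed the frame budget before and after a change. The sketch below assumes you have noted per-frame processing times from two recordings. The helper name and all the numbers are made up for illustration and are not output from the tool.

```typescript
// Fraction of frames whose processing time exceeds the frame budget.
function missedFrameRate(frameDurationsMs: number[], budgetMs: number): number {
  if (frameDurationsMs.length === 0) return 0;
  const missed = frameDurationsMs.filter(d => d > budgetMs).length;
  return missed / frameDurationsMs.length;
}

// Frame budget at 60 Hz, about 16.7 ms.
const frameBudgetMs = 1000 / 60;

// Made-up per-frame durations from two recordings of the same scenario.
const beforeOptimization = [12, 30, 15, 25, 14, 40, 13, 16];
const afterOptimization = [12, 15, 14, 18, 13, 16, 12, 15];

console.log(missedFrameRate(beforeOptimization, frameBudgetMs)); // 0.375
console.log(missedFrameRate(afterOptimization, frameBudgetMs));  // 0.125
```

A lower rate after the change suggests progress. Any remaining missed frames are candidates for the next round of recording and analysis.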

Origin blog.csdn.net/HarmonyOSDev/article/details/132907873