[Performance Optimization] Use Perfetto to locate bottlenecks in application startup performance

Many articles cover Android application startup optimization, but most focus on what was changed to improve startup performance. Few explain how to measure startup performance and locate its bottlenecks.

Drawing on my own experience with Perfetto, this article explains how to measure the startup time of in-vehicle applications and, once it is measured, how to analyze the performance bottlenecks.

Before analyzing startup performance, let's briefly review the basics of application startup time on Android.

Application startup time

Time to Initial Display (TTID)

The Time to Initial Display (TTID) is the time from when the system receives the launch intent to when the application displays its first frame, that is, when the user first sees the application's interface.

Measure TTID

When the application has drawn its first frame, logcat prints a line like the following.

/system_process I/ActivityTaskManager: Displayed xxxx/.MainActivity: +401ms

The Displayed metric in the logcat output does not necessarily cover the time until all resources are loaded and displayed. It leaves out resources that are not referenced in the layout file and resources that the app creates as part of object initialization.

Sometimes the logcat line contains an additional field, total, as follows:

ActivityManager: Displayed com.android.myexample/.StartupTiming: +3s534ms (total +1m22s643ms)

In this case, the first time measurement covers only the activity that was first drawn. The total time measurement starts when the app process starts and can include another activity that was started first but did not display anything on the screen. The total time is shown only when it differs from the single activity's time.
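When many launches are measured, reading these lines by hand gets tedious. The following is a minimal Python sketch, not part of any Android tooling, that extracts the component name and both durations from a Displayed line. The function names and the returned dictionary keys are my own, and the duration format is assumed to be the one in the two log lines above.

```python
import re

# Milliseconds per unit in durations such as "+1m22s643ms".
_UNIT_MS = {"d": 86_400_000, "h": 3_600_000, "m": 60_000, "s": 1_000, "ms": 1}

# "Displayed <component>: +<time>" with an optional "(total +<time>)" suffix.
_DISPLAYED = re.compile(
    r"Displayed (?P<component>\S+): \+(?P<first>[0-9a-z]+)"
    r"(?: \(total \+(?P<total>[0-9a-z]+)\))?"
)


def duration_ms(text):
    """Convert a duration such as '3s534ms' into milliseconds."""
    parts = re.findall(r"(\d+)(ms|d|h|m|s)", text)
    return sum(int(number) * _UNIT_MS[unit] for number, unit in parts)


def parse_displayed(line):
    """Return component, displayed time and total time, or None if no match."""
    match = _DISPLAYED.search(line)
    if match is None:
        return None
    total = match.group("total")
    return {
        "component": match.group("component"),
        "displayed_ms": duration_ms(match.group("first")),
        "total_ms": duration_ms(total) if total else None,
    }


line = ("ActivityManager: Displayed com.android.myexample/.StartupTiming: "
        "+3s534ms (total +1m22s643ms)")
print(parse_displayed(line))
```

Feeding it the output of adb logcat line by line and keeping the non-None results gives a simple table of TTID values per launch.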

The time measured with the am start -S -W command, which some blogs recommend, is in fact also the initial display time. In most cases the initial display time does not represent the application's real startup time. For example, if the application must synchronize the latest data from the network at startup, the real startup time ends only when all of that data has been loaded. This is the TTFD, the Time to Full Display, introduced next.

Time to Full Display (TTFD)

The Time to Full Display (TTFD) is the time from when the system receives the launch intent to when the application has finished loading all resources and view hierarchies, that is, when the user can actually use the application.

Measure TTFD

Method 1: Call reportFullyDrawn()

The application can call reportFullyDrawn() once it has finished loading all resources and view hierarchies. This tells the system that the application is fully displayed, so that it can compute the full display time. If the method is never called, the system can compute only the TTID, not the TTFD. After the call, logcat prints a line like the following.

system_process I/ActivityTaskManager: Fully drawn xxxx/.MainActivity: +1s54ms

Method 2: Frame splitting

Frame splitting is currently the most common way to measure the startup time of in-vehicle applications. The startup is recorded as a video and the video is then stepped through frame by frame; there are several ways to do both the recording and the splitting.

A common approach is to film the application's startup with a camera that records at 60 fps (a phone with a 60 fps camera also works), then step through the video in a player such as PotPlayer and count the frames between the tap on the desktop (the launcher) and the moment the application's screen is fully displayed. Dividing that frame count by 60 gives the startup time in seconds.

The above method is suitable for testers. Here is another method that is more suitable for developers: FFmpeg frame splitting.

FFmpeg download address: http://ffmpeg.org/download.html?aemtn=tg-on

First use adb to connect to the Android device and use the screen recording command to record the video when the application starts.

adb shell screenrecord /sdcard/launch.mp4

Pull the recording to your computer with adb pull /sdcard/launch.mp4, then use FFmpeg to check the video's frame rate.

ffmpeg -i launch.mp4 

If the video's frame rate is below 60 fps, use FFmpeg to resample it to 60 fps.

ffmpeg -i launch.mp4 -filter:v fps=60 output.mp4

Split the resampled video into one image per frame, then count the frames between the tap on the desktop (the launcher) and the moment the application's screen is fully displayed.

ffmpeg -i output.mp4 output_%04d.jpg

Alternatively, convert the resampled video into an animated GIF and count the frames in an image viewer (the one built into macOS will do).

ffmpeg -i output.mp4 -vf fps=60,scale=320:-1:flags=lanczos -loop 0 output.gif
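Once the two frames of interest are identified among the per-frame images, turning the frame difference into a time is simple arithmetic. Here is a small Python sketch of that step. It assumes the output_%04d.jpg naming from the split command above and a constant 60 fps; the two file names in the example are hypothetical.

```python
import re


def frame_index(filename):
    """Extract the frame number from a name such as 'output_0123.jpg'."""
    match = re.search(r"_(\d+)\.jpg$", filename)
    if match is None:
        raise ValueError(f"unexpected frame file name: {filename}")
    return int(match.group(1))


def startup_seconds(tap_frame, displayed_frame, fps=60):
    """Time between two frame images of a constant-frame-rate video."""
    frames = frame_index(displayed_frame) - frame_index(tap_frame)
    if frames < 0:
        raise ValueError("the displayed frame must come after the tap frame")
    return frames / fps


# Hypothetical frames: tap visible in frame 42, app fully displayed in frame 138.
print(startup_seconds("output_0042.jpg", "output_0138.jpg"))  # 96 frames at 60 fps
```

At 60 fps each frame covers about 16.7 ms, so that is the resolution limit of this method.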

That concludes the introduction to application startup time and how to measure it. For more detail, refer to Android's official document "Application Startup Time | Application Quality | Android", which is very thorough.

For reference, typical startup time targets for mainstream in-vehicle applications (taking the 8155 platform as an example) are as follows:

  • Cold start TTFD

Large third-party Internet applications should stay below 2.6 s, and vehicle system applications below 1.6 s.

  • Warm start TTFD

Generally this should stay below 0.8 s.

These figures come from my personal experience. Different vehicle manufacturers will certainly have different performance requirements.

Introduction to Perfetto

Perfetto is a system-level tracing tool introduced in Android 10. It supports Android, Linux and Chrome and replaces Systrace. Unlike Profiler and AGI, it is not limited to a single application but shows the running state of the entire system. Use Perfetto for system-level tracing and analysis when you need to see whether an application affects the stability and smoothness of the system or, conversely, how the system affects the application.

For the basics of Perfetto, see the official Android video I translated earlier: [Translation] Modern Android Development Skills - Getting Started with Perfetto

Get started quickly with Perfetto

There are many ways to use Perfetto; I personally recommend the record_android_trace script. It is a helper script provided by Perfetto for collecting performance data from an Android device over adb. The script does the following:

  • Automatically detects whether the perfetto binary is available on the device and, if not, attempts to download it and push it to the device.
  • Automatically sets trace configuration parameters such as trace duration, buffer size and output file path.
  • Automatically runs the perfetto command and pulls the output file to your computer once the trace is complete.
  • Automatically opens the output file in a browser so that you can view and analyze the trace.

Download address of record_android_trace: https://raw.githubusercontent.com/google/perfetto/master/tools/record_android_trace

record_android_trace is used as follows:

./record_android_trace [options] [category1] [category2] ...

Here, options are optional parameters, such as:

  • -o OUT_FILE: Specifies the path of the output file. If not specified, the default is perfetto_trace.pb.
  • -t TIME: Specifies the trace duration. If not specified, the default is 10 seconds.
  • -b SIZE: Specifies the trace buffer size. If not specified, the default is 32 MB.

The categories are the atrace or ftrace categories to trace. Use --list to see the trace categories supported by the device. The output may look like this:

link@link-PC:~/Desktop$ ./record_android_trace --list
         gfx - Graphics
       input - Input
        view - View System
     webview - WebView
          wm - Window Manager
          am - Activity Manager
          sm - Sync Manager
       audio - Audio
       video - Video
      camera - Camera
         hal - Hardware Modules
         res - Resource Loading
      dalvik - Dalvik VM
          rs - RenderScript
      bionic - Bionic C Library
       power - Power Management
          pm - Package Manager
          ss - System Server
    database - Database
     network - Network
         adb - ADB
    vibrator - Vibrator
        aidl - AIDL calls
       nnapi - NNAPI
         rro - Runtime Resource Overlay
         pdx - PDX services
       sched - CPU Scheduling
         irq - IRQ Events
         i2c - I2C Events
        freq - CPU Frequency
        idle - CPU Idle
        disk - Disk I/O
        sync - Synchronization
       workq - Kernel Workqueues
  memreclaim - Kernel Memory Reclaim
  regulators - Voltage and Current Regulators
  binder_driver - Binder Kernel driver
  binder_lock - Binder global lock trace
   pagecache - Page cache
      memory - Memory
     thermal - Thermal event
         gfx - Graphics (HAL)
         ion - ION allocation (HAL)

For example, to trace sched, gfx and view for 5 seconds with a 16 MB buffer and write the output to trace.perfetto-trace, run the following command:

./record_android_trace -o trace.perfetto-trace -t 5s -b 16mb sched gfx view
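If traces are captured repeatedly, for example to compare several launches, it can help to assemble this command in a script. The sketch below is only an illustration: build_trace_command is a hypothetical helper, it uses just the -o, -t and -b options described above, and it assumes the script sits in the current directory. Duplicate category names are dropped because --list can report the same name twice (gfx in the listing above).

```python
import shlex


def build_trace_command(categories, out_file="trace.perfetto-trace",
                        duration="5s", buffer_size="16mb",
                        script="./record_android_trace"):
    """Build the argument list for a record_android_trace invocation."""
    # dict.fromkeys keeps the first occurrence of each name and preserves order.
    unique = list(dict.fromkeys(categories))
    if not unique:
        raise ValueError("at least one trace category is required")
    return [script, "-o", out_file, "-t", duration, "-b", buffer_size, *unique]


command = build_trace_command(["sched", "gfx", "view", "gfx"])
print(shlex.join(command))
# To actually record, pass the list to subprocess.run(command, check=True).
```

Keeping the command as a list rather than a single string avoids shell quoting problems when it is later handed to subprocess.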

Analyzing startup performance with Perfetto

Analyzing an application's startup performance with Perfetto is straightforward. First, start record_android_trace with the command below, then launch the application while the trace is recording:

./record_android_trace -o trace.perfetto-trace -t 15s -b 200mb gfx input view webview wm am sm audio video camera hal res dalvik rs bionic power pm ss database network adb vibrator aidl nnapi rro pdx sched irq i2c freq idle disk sync workq memreclaim regulators binder_driver binder_lock pagecache memory gfx ion

After 15 s, record_android_trace automatically opens the trace in the browser. The Android App Startups track shows the application's startup time. Note that the time shown in Android App Startups is the application's TTID, the initial display time.

Select Metrics in the left sidebar of Perfetto, choose android_startup and click Run. Perfetto then analyzes the application's startup data automatically and produces output like the following.

android_startup {
  startup {
    startup_id: 1
    startup_type: "warm"
    package_name: "com.xxx.xxx.weather"
    process_name: "com.xxx.xxx.weather"
    process {
      name: "com.xxx.xxx.weather"
      uid: 1000
      pid: 3376
    }
    zygote_new_process: false
    activity_hosting_process_count: 1
    event_timestamps {
      intent_received: 100680138137
      first_frame: 102167532928
    }
    to_first_frame {
      dur_ns: 1487394791
      dur_ms: 1487.394791
      main_thread_by_task_state {
        running_dur_ns: 1316606193
        runnable_dur_ns: 34121303
        uninterruptible_sleep_dur_ns: 20429636
        interruptible_sleep_dur_ns: 84415940
        uninterruptible_io_sleep_dur_ns: 12221457
        uninterruptible_non_io_sleep_dur_ns: 8208179
      }
      time_activity_manager {
        dur_ns: 16070209
        dur_ms: 16.070209
      }
      time_activity_start {
        dur_ns: 97578437
        dur_ms: 97.578437
      }
      time_activity_resume {
        dur_ns: 833413073
        dur_ms: 833.413073
      }
      time_choreographer {
        dur_ns: 481555469
        dur_ms: 481.555469
      }
      time_inflate {
        dur_ns: 1241538748
        dur_ms: 1241.538748
      }
      time_get_resources {
        dur_ns: 6173178
        dur_ms: 6.173178
      }
      time_verify_class {
        dur_ns: 1675365
        dur_ms: 1.675365
      }
      time_gc_total {
        dur_ns: 82049531
        dur_ms: 82.049531
      }
      time_dlopen_thread_main {
        dur_ns: 15522344
        dur_ms: 15.522344
      }
      time_lock_contention_thread_main {
        dur_ns: 4711976
        dur_ms: 4.711976
      }
      time_jit_thread_pool_on_cpu {
        dur_ns: 375033124
        dur_ms: 375.033124
      }
      time_gc_on_cpu {
        dur_ns: 81314427
        dur_ms: 81.314427
      }
      jit_compiled_methods: 218
      other_processes_spawned_count: 6
    }
    verify_class {
      name: "com.xxx.xxx.weather.service.VoiceActionManager"
      dur_ns: 1675365
    }
    dlopen_file: "/system/priv-app/Weather/Weather.apk!/lib/arm64-v8a/libffavc.so"
    dlopen_file: "/system/priv-app/Weather/Weather.apk!/lib/arm64-v8a/libpag.so"
    dlopen_file: "/vendor/lib64/hw/[email protected]"
    dlopen_file: "libadreno_utils.so"
    dlopen_file: "/vendor/lib64/hw/[email protected]"
    dlopen_file: "/vendor/lib64/hw/gralloc.msmnile.so"
    dlopen_file: "libadreno_app_profiles.so"
    dlopen_file: "libEGL_adreno.so"
    system_state {
      dex2oat_running: false
      installd_running: false
      broadcast_dispatched_count: 0
      broadcast_received_count: 0
      most_active_non_launch_processes: "media.codec"
      most_active_non_launch_processes: "app_process"
      most_active_non_launch_processes: "media.hwcodec"
      most_active_non_launch_processes: "/vendor/bin/hw/vendor.qti.hardware.display.allocator-service"
      most_active_non_launch_processes: "/system/bin/audioserver"
      installd_dur_ns: 0
      dex2oat_dur_ns: 0
    }
    slow_start_reason: "GC Activity"
    slow_start_reason: "Main Thread - Time spent in Running state"
    slow_start_reason: "Time spent in view inflation"
  }
}

android_startup is a data structure that records and analyzes the startup performance of Android applications. It contains information about the startup process such as the startup type, startup time, slow-start reasons, dependencies such as loaded libraries, and system state.

The android_startup output is protobuf text format; here it describes the startup of a weather application named com.xxx.xxx.weather. The most important part is the slow_start_reason entries at the end, which list the likely reasons for a slow startup. The "Perfetto Practice" section below analyzes them in detail.
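When the metric output is saved to a text file, the slow_start_reason entries can be pulled out with a few lines of Python. This is a rough sketch that assumes the plain string form of slow_start_reason shown above; the function name is my own, and other Perfetto versions may format these entries differently.

```python
import re


def slow_start_reasons(metric_text):
    """Return every slow_start_reason string found in the metric text."""
    pattern = r'^\s*slow_start_reason:\s*"([^"]*)"'
    return re.findall(pattern, metric_text, flags=re.MULTILINE)


sample = '''
    system_state {
      dex2oat_running: false
    }
    slow_start_reason: "GC Activity"
    slow_start_reason: "Main Thread - Time spent in Running state"
    slow_start_reason: "Time spent in view inflation"
'''
for reason in slow_start_reasons(sample):
    print(reason)
```

Collecting these strings over several traces shows quickly which reasons recur on every launch and which appear only occasionally.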

The meanings of other fields are as follows:

startup_id: A unique identifier; here it marks the first startup in the trace.

startup_type: An enumeration; here it indicates a warm start, that is, the application process already exists but has no activity in the foreground.

package_name and process_name: The application's package name and process name.

process: Information about the application process, including its name, user identifier (uid) and process identifier (pid).

zygote_new_process: Whether a new process was created through zygote; here it is false.

activity_hosting_process_count: The number of processes that host activities for this startup; here it is 1.

event_timestamps: Timestamps of key events. For example, intent_received is the time the launch intent was received and first_frame is the time the first frame was displayed.

to_first_frame: The time from receiving the launch intent to displaying the first frame, with a breakdown: the total duration, the time the main thread spent in each scheduling state, the time spent in various operations, and so on.

verify_class: Information about class verification, including the class name and the time taken.

dlopen_file: The shared library files opened during startup.

system_state: System state information, including whether dex2oat or installd was running, how many broadcasts were dispatched or received, and which non-launch processes were most active.
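The relationship between these fields can be checked with the numbers from the output above. The short Python sketch below recomputes to_first_frame from the two event timestamps and shows what share of that time the main thread spent in each state. The helper names are mine; all values are copied from the android_startup output.

```python
def to_first_frame_ns(intent_received_ns, first_frame_ns):
    """Duration from launch intent to first frame, in nanoseconds."""
    return first_frame_ns - intent_received_ns


def state_shares(total_ns, states_ns):
    """Percentage of the total duration spent in each main-thread state."""
    return {state: round(100 * ns / total_ns, 1) for state, ns in states_ns.items()}


# Values taken from event_timestamps and main_thread_by_task_state above.
total = to_first_frame_ns(100680138137, 102167532928)
main_thread = {
    "running": 1316606193,
    "runnable": 34121303,
    "uninterruptible_sleep": 20429636,
    "interruptible_sleep": 84415940,
}

print(total)  # matches to_first_frame.dur_ns
print(state_shares(total, main_thread))
```

The difference of the two timestamps equals dur_ns (1487394791), and the main thread is in the Running state for most of that time, which is consistent with the "Main Thread - Time spent in Running state" entry in slow_start_reason. The two uninterruptible_*_io fields (12221457 and 8208179) also add up to uninterruptible_sleep_dur_ns.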

Perfetto Practice

GC triggered on startup

Phenomenon: "GC Activity" appears in slow_start_reason, indicating that GC activity during the startup phase slowed down the application's startup.

Analysis: Click [show timeline] to return to the Perfetto timeline. In the startup timeline there is a thread named HeapTaskDaemon, which is the application's GC thread. It was active for about 100 ms during startup, stretching the activityResume slice by about 100 ms. To rule out a one-off occurrence, we measured several times and found that GC was triggered on every startup.

Reason: Working back along the timeline, we found that the application loads a special font of about 13 MB during startup. The application developers confirmed that the font had already been moved to the system layer, so the application no longer needs to load it. After the font was removed, GC is no longer triggered on every startup.

Time-consuming operations on the main thread

Phenomenon: "Main Thread - Time spent in Running state" appears in slow_start_reason, which means that the main thread performed too many time-consuming operations during the startup phase.

Reason: This is very common in application development. Developers tend to put cross-process data fetching in the Activity's onCreate or onStart method, which runs on the main thread. Although these IPC calls do not trigger an ANR, they slow down startup and should be moved to a thread pool or a coroutine.

OpenDexFilesFromOat takes time

Phenomenon: "Main Thread - Time spent in OpenDexFilesFromOat*" appears in slow_start_reason, indicating that a lot of time was spent reading dex files during the startup phase.

Reason: This is fairly common on in-vehicle Android systems. It may be caused by changes the system vendor made to the dex2oat process to speed up boot. If it does not take much time, it can be ignored.

I say "may" here because today's in-vehicle OSes modify stock Android heavily in order to boot quickly, so each case needs a detailed analysis of its own.

Continuous multi-frame drawing timeout

Phenomenon: Perfetto reports a startup time of about 1.3 s for the application, which is not long. Frame splitting, however, shows slight stuttering right after startup, which extends the actual startup time to 2.1 s. In Perfetto, after the first frame is drawn, the drawing time of subsequent frames 2, 4 and 5 exceeds 150 ms.

Analysis: The frame drawing timeline in Perfetto shows that most of the time is spent in View layout, which indicates that several page redraws are triggered after the first frame.

Reason: Combining the code with the application log, we found that the application first refreshes the page with empty data at startup and then refreshes it again after fetching data over the IPC interface. In addition, a code defect caused the data refresh to run four times in a row, producing the symptom above. After the defect was fixed, the consecutive drawing timeouts after the first frame no longer occurred.

References

https://developer.android.com/topic/performance/vitals/launch-time#time-initial

Origin blog.csdn.net/linkwj/article/details/132460341