Android startup optimization: a survey of solutions

Why an application's speed matters can be summed up in two words: "user experience".

1. User experience

User experience (UE or UX for short) is the purely subjective impression a user forms while using a product. For a well-defined group of users, however, the common features of that experience can be identified through well-designed experiments. As computer technology and the Internet have developed, the shape of technological innovation has changed, and user-centered, people-oriented design is receiving more and more attention. User experience has therefore been called the essence of the Innovation 2.0 model. In China's exploration of the Application Innovation Park model, an Innovation 2.0 approach for the knowledge society, user experience comes first among the "three tests" of the innovation mechanism.

The ISO 9241-210 standard defines user experience as a person's perceptions and responses that result from the use or anticipated use of a product, system or service. In plain terms: is this thing good to use, and is it convenient to use? User experience is therefore subjective, and it focuses on the effect produced in actual use. A note to the ISO definition adds that user experience covers everything a user feels before, during, and after using a product or system, including emotions, beliefs, preferences, perceptions, physical and psychological responses, behaviors, and accomplishments. The note also lists three factors that affect user experience: the system, the user, and the context of use.

However, all user experience presupposes that the product is actually being used.

The first thing a user encounters when using a product is its startup speed, which is why every product keeps pursuing startup optimization.

In an earlier article on building a percentile-based metrics system and using standards to quantify optimization results, I listed many startup optimization items. Those are the optimizations most commonly used in application-layer development.

2. Variations on routine optimization

After applying the general optimizations above, can startup speed be improved further? Yes: by trading off space against time. The most prominent example of this approach is the "directed acyclic graph launcher".

2.1 Analysis of the directed acyclic graph launcher

Its author also started from routine optimization, then split the startup work into many independent tasks, and finally used a directed acyclic graph to schedule those tasks in dependency order, which speeds up startup.

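2.1.1 Measuring cold start time
  1. Measure with adb
adb shell am start -W package/XXXActivity
  2. Interpret the output
  • TotalTime: the application's startup time, covering process creation, Application initialization, and Activity initialization up to the interface being displayed.
  • WaitTime: the total time AMS (ActivityManagerService) takes to start the Activity; it is generally larger than TotalTime.
  • Below Android 5.0 there is no WaitTime, so we only need to look at TotalTime.
  • The sample output below contains the warning "Activity not started, intent has been delivered to currently running top-most instance", which means the app was already running. Force-stop the app before measuring to get a true cold start.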
Starting: Intent { act=android.intent.action.MAIN cat=[android.intent.category.LAUNCHER] cmp=***/***.SplashActivity }
Warning: Activity not started, intent has been delivered to currently running top-most instance.
Status: ok
LaunchState: UNKNOWN (0)
Activity: ***/***.MainActivity
TotalTime: 785
WaitTime: 787
Complete
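To compare TotalTime across several runs, the output above can be parsed programmatically. The following is a minimal sketch in plain Java. The class name AmStartOutput and its parseTimings method are hypothetical helpers made up for this illustration; they are not part of adb or the Android SDK. The sample text is taken from the output shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: extracts numeric fields such as TotalTime from `am start -W` output. */
public class AmStartOutput {

    /** Returns every "Key: <integer>" line as a map entry; all other lines are ignored. */
    public static Map<String, Long> parseTimings(String output) {
        Map<String, Long> timings = new LinkedHashMap<>();
        for (String line : output.split("\\R")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // for example the final "Complete" line
            }
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            if (value.matches("\\d+")) {
                timings.put(key, Long.parseLong(value));
            }
        }
        return timings;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "Status: ok",
                "LaunchState: UNKNOWN (0)",
                "TotalTime: 785",
                "WaitTime: 787",
                "Complete");
        Map<String, Long> timings = parseTimings(sample);
        // prints "TotalTime=785 ms, WaitTime=787 ms"
        System.out.println("TotalTime=" + timings.get("TotalTime")
                + " ms, WaitTime=" + timings.get("WaitTime") + " ms");
    }
}
```

Averaging TotalTime over several force-stopped launches gives a steadier baseline than a single run.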
2.1.2 Business, tool, UI, and other components are all initialized in Application, which takes time

Cold start process:

  1. The Application's onCreate() method

This is where we usually do a lot of initialization: various libraries, business components, and so on. If there are too many tasks here and they all run serially on the main thread, cold start speed suffers badly.

The main fix therefore targets the time spent here.
  2. Turn the main thread's serial tasks into concurrent tasks

When it comes to concurrency, the most commonly used tools on Android are AsyncTask (deprecated since API level 30) and thread pools. But whether moving every task onto worker threads really reduces cold start time, whether the gain outweighs the cost of the refactoring, and whether it is the optimal solution are all questions we need to look at closely. How to choose a suitable thread pool is therefore the first problem to solve.

Note that more worker threads is not always better. The best choice is a number of runnable threads that the CPU can actually schedule sensibly.

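  3. Choosing a thread pool for task scheduling
3.1 Main-thread jank caused by the thread pool

First we need to agree on what a thread is on Android: it is the basic unit of CPU scheduling, and the essence of concurrency is sharing CPU time slices. So if the tasks in our thread pool consume a great deal of CPU, the result is obvious: serial tasks appear to have become concurrent tasks, yet the main thread janks, and our change has an unwanted side effect.

3.2 How do we choose a thread pool?

Given the above, the best options in this scenario come down to a fixed-size thread pool and a cached thread pool.

3.3 With both available, how do we decide which one a task goes to?

Determine whether the task is CPU-bound or I/O-bound.

We need to know whether a task is CPU-bound (computation, for example) or I/O-type (memory-allocation work, for example). The former uses many CPU time slices, so we schedule it on the fixed-size pool. The latter uses few, so we schedule it on the cached pool. This makes full use of the CPU without over-occupying it, lets tasks run concurrently, and so reduces startup time.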
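The split described above can be sketched with the standard java.util.concurrent executors. This is only an illustration under stated assumptions: the class StartupExecutors, the Kind enum, and the pool-size formula (cores minus one, clamped to the range 2 to 4) are choices made for this example, not something the article or the launcher prescribes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical dispatcher: CPU-bound work goes to a fixed pool, I/O-type work to a cached pool. */
public class StartupExecutors {

    public enum Kind { CPU_BOUND, IO_BOUND }

    // Assumption: cap CPU-heavy work at (cores - 1) threads, clamped to 2..4,
    // so the main thread keeps some CPU time for itself.
    public static final int CPU_POOL_SIZE =
            Math.max(2, Math.min(Runtime.getRuntime().availableProcessors() - 1, 4));

    private final ExecutorService cpuPool = Executors.newFixedThreadPool(CPU_POOL_SIZE);
    private final ExecutorService ioPool = Executors.newCachedThreadPool();

    /** Routes the task to the pool that matches its kind. */
    public Future<?> submit(Kind kind, Runnable task) {
        ExecutorService pool = (kind == Kind.CPU_BOUND) ? cpuPool : ioPool;
        return pool.submit(task);
    }

    /** Lets queued tasks finish and then releases the threads. */
    public void shutdown() {
        cpuPool.shutdown();
        ioPool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        StartupExecutors executors = new StartupExecutors();
        Future<?> parse = executors.submit(Kind.CPU_BOUND,
                () -> System.out.println("parse config on " + Thread.currentThread().getName()));
        Future<?> read = executors.submit(Kind.IO_BOUND,
                () -> System.out.println("read cache on " + Thread.currentThread().getName()));
        parse.get();
        read.get();
        executors.shutdown();
    }
}
```

The fixed pool bounds how many CPU-heavy tasks compete with the main thread at once, while the cached pool can grow freely because its tasks spend most of their time waiting.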

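  4. Use Systrace to determine how long a task takes
    1. Instrument the code
import androidx.core.os.TraceCompat;

private void initAnalyzeAync() {
    TraceCompat.beginSection("initAnalyzeAync");
    try {
        PbnAnalyze.setEnableLog(BuildConfig.DEBUG);
        PbnAnalyze.setEnableAnalyzeEvent(true);
        initAnalyze();
    } finally {
        // Always close the section, even if initialization throws.
        TraceCompat.endSection();
    }
}
    2. Locate systrace.py
    3. Run the Python command
python systrace.py -t 10 -a <package_name> -o xxtrace.html sched gfx view wm am app
    4. Run the app and wait for the HTML file to be generated
    5. Open the HTML file and check the timings: CPU Duration is the time spent on the CPU, and Wall Duration is the total elapsed time

    6. Here CPU Duration accounts for almost all of Wall Duration, so this is a CPU-bound task, and when optimizing we should put it in the fixed-size thread pool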
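The same CPU Duration versus Wall Duration comparison can be approximated in code. The sketch below is plain JVM Java and uses ThreadMXBean for per-thread CPU time. That API is not available on Android, where android.os.SystemClock.currentThreadTimeMillis() plays a similar role. The class TaskProfiler and the 80% threshold are assumptions chosen for this illustration, not values taken from Systrace or the article.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: classifies a task by comparing its CPU time with its wall-clock time. */
public class TaskProfiler {

    // Assumption: call a task CPU-bound when CPU time is at least 80% of wall time.
    public static final double CPU_BOUND_RATIO = 0.8;

    public static boolean isCpuBound(long cpuNanos, long wallNanos) {
        return wallNanos > 0 && (double) cpuNanos / wallNanos >= CPU_BOUND_RATIO;
    }

    /** Runs the task on the calling thread and reports whether it looked CPU-bound. */
    public static boolean profile(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long cpuStart = threads.getCurrentThreadCpuTime();
        long wallStart = System.nanoTime();
        task.run();
        long cpuNanos = threads.getCurrentThreadCpuTime() - cpuStart;
        long wallNanos = System.nanoTime() - wallStart;
        return isCpuBound(cpuNanos, wallNanos);
    }

    public static void main(String[] args) {
        boolean busyLoop = profile(() -> {
            long sum = 0;
            for (int i = 0; i < 50_000_000; i++) {
                sum += i % 7;
            }
            if (sum < 0) {
                System.out.println(sum); // keeps the loop from being optimized away
            }
        });
        boolean sleeping = profile(() -> {
            try {
                Thread.sleep(100); // waits without using the CPU, like blocking I/O
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("busy loop CPU-bound: " + busyLoop + ", sleep CPU-bound: " + sleeping);
    }
}
```

A task flagged as CPU-bound this way would go to the fixed-size pool, and the rest to the cached pool.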
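  5. Task scheduling order

We have solved the problem of choosing a suitable thread pool for each task. In practice, though, our tasks must run in a certain order. When everything ran serially on the main thread the order was easy to control, but once tasks run concurrently on multiple threads, and on different thread pools, how do we control the order in which they execute?

The directed acyclic graph data structure solves this problem neatly. Setting the implementation details aside, the core idea is that each task uses a CountDownLatch to record its in-degree:

  • First execute the tasks whose in-degree is 0
  • Decrement the in-degree of each task that depends on them (countDownLatch.countDown()); when a task's in-degree reaches 0, execute it
  • Repeat the two steps above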
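The three steps above can be sketched in plain Java as follows. This is an illustrative reconstruction, not the launcher author's actual code: the class DagLauncher and its add and start methods are hypothetical names. One design choice here is an assumption of this sketch: tasks are submitted in topological order (computed with Kahn's algorithm), so that a bounded pool cannot fill up with tasks that are all waiting on work that has not been scheduled yet.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical DAG launcher sketch: each task's CountDownLatch holds its remaining in-degree. */
public class DagLauncher {
    private final Map<String, Runnable> tasks = new HashMap<>();
    private final Map<String, List<String>> dependents = new HashMap<>();
    private final Map<String, Integer> inDegree = new HashMap<>();

    public DagLauncher add(String name, Runnable body, String... dependsOn) {
        tasks.put(name, body);
        inDegree.put(name, dependsOn.length);
        dependents.computeIfAbsent(name, k -> new ArrayList<>());
        for (String dep : dependsOn) {
            dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(name);
        }
        return this;
    }

    /** Kahn's algorithm: take a task with in-degree 0, then decrement its dependents. */
    List<String> topologicalOrder() {
        Map<String, Integer> remaining = new HashMap<>(inDegree);
        Deque<String> ready = new ArrayDeque<>();
        remaining.forEach((name, degree) -> { if (degree == 0) ready.add(name); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String name = ready.poll();
            order.add(name);
            for (String next : dependents.get(name)) {
                if (remaining.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (order.size() != tasks.size()) {
            throw new IllegalStateException("cycle or unknown dependency");
        }
        return order;
    }

    /** Runs every task on the pool and returns the order in which the tasks completed. */
    public List<String> start(ExecutorService pool) throws InterruptedException {
        List<String> order = topologicalOrder();
        Map<String, CountDownLatch> latches = new HashMap<>();
        for (String name : order) {
            latches.put(name, new CountDownLatch(inDegree.get(name)));
        }
        List<String> finished = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch allDone = new CountDownLatch(order.size());
        for (String name : order) {
            pool.execute(() -> {
                try {
                    latches.get(name).await(); // wait until the in-degree reaches 0
                    tasks.get(name).run();
                    finished.add(name);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    // Decrement the in-degree of every task that depends on this one.
                    for (String next : dependents.get(name)) latches.get(next).countDown();
                    allDone.countDown();
                }
            });
        }
        allDone.await();
        return finished;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<String> finished = new DagLauncher()
                .add("ui", () -> {}, "network", "logger")
                .add("network", () -> {}, "logger")
                .add("logger", () -> {})
                .start(pool);
        pool.shutdown();
        System.out.println(finished); // prints [logger, network, ui]
    }
}
```

In the usage example the three tasks form a chain, so the completion order is fully determined. Tasks with no dependency between them may finish in either order.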

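  6. Waiting for tasks to finish

A common scenario: SplashActivity can only start after some library has finished initializing. Put plainly, that task must block in Application. With our concurrent multithreaded task scheduling, the simplest approach is for the task manager to use a CountDownLatch. When building the graph, mark the tasks that must block in Application. Once the tasks have been started, call countDownLatch.await(). Each time one of the marked tasks finishes, call countDownLatch.countDown(). When all the blocking tasks have finished, the wait ends.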
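A minimal sketch of this waiting scheme in plain Java is shown below. The class BlockingStartup and its runAndAwait method are hypothetical names. The timeout is an addition of this sketch so that a stuck task cannot block startup forever; the description above uses a plain await().

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: block the caller until all tasks marked as blocking have finished. */
public class BlockingStartup {

    /**
     * Starts all tasks on the pool, then waits only for the mustFinish tasks.
     * Returns false if they did not all finish within the timeout.
     */
    public static boolean runAndAwait(ExecutorService pool,
                                      List<Runnable> mustFinish,
                                      List<Runnable> background,
                                      long timeoutMillis) throws InterruptedException {
        CountDownLatch gate = new CountDownLatch(mustFinish.size());
        for (Runnable task : mustFinish) {
            pool.execute(() -> {
                try {
                    task.run();
                } finally {
                    gate.countDown(); // one blocking task is done
                }
            });
        }
        for (Runnable task : background) {
            pool.execute(task); // nobody waits for these
        }
        return gate.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Runnable initSdk = () -> System.out.println("SDK required by the splash screen is ready");
        Runnable warmCache = () -> System.out.println("cache warmed in the background");
        boolean ready = runAndAwait(pool,
                Collections.singletonList(initSdk),
                Collections.singletonList(warmCache),
                2_000);
        System.out.println("blocking tasks finished in time: " + ready);
        pool.shutdown();
    }
}
```

In an Android app the call to runAndAwait would sit at the end of Application.onCreate(), so SplashActivity only starts once the marked tasks are done.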


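3. Startup optimization at the system level

The officially promoted Baseline Profile is reported to improve startup performance by 30% to 40%, and it is a general-purpose solution. The Juejin article "Android 强推的 Baseline Profiles 国内能用吗?我找 Google 工程师求证了!" ("Can the Baseline Profiles that Android is pushing be used in China? I asked a Google engineer to confirm!") investigates its feasibility in China.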

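3.1 How it works

Baseline Profile is not something new, and it is not a Jetpack library either. It is simply a file that lives on the Android system.

  • On Android 5.0 and 6.0, our code is fully AOT-compiled at install time. AOT gives better runtime performance, but it brings other problems: installation takes much longer and more disk space is used.
  • On Android 7.0 and later, Android supports a hybrid mode in which JIT and AOT coexist. On these versions the ART virtual machine records the application's hot code at runtime and stores it at /data/misc/profiles/cur/0/<package name>/primary.prof. ART then AOT-compiles that hot code, which is far more flexible than full AOT compilation.

However, ART has to run for a while before it can identify the hot code, and because each user's usage scenarios and usage time differ, the hot code it ends up recording is not necessarily optimal.

Google's idea is to let developers collect the hot code themselves and package it into the APK.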

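3.2 Usage

  1. Collect the hot code

A Baseline Profile is really just a file that records our application's hot code. It ends up in the APK at assets/dexopt/baseline.prof. With it, the ART virtual machine can perform the corresponding AOT compilation.

3.2.1 Automatically collect hot code and generate the file

Google recommends using Macrobenchmark from Jetpack. It is a performance library for Android, and with it we can generate the Baseline Profile file:

  • Add a dependency on the ProfileInstaller library to your app's build.gradle to enable local and Play Store Baseline Profile compilation. This is the only way to sideload a Baseline Profile locally.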
dependencies {
     implementation("androidx.profileinstaller:profileinstaller:1.2.0-beta01")
}
  • Define a new test named BaselineProfileGenerator, similar to:
@ExperimentalBaselineProfilesApi
@RunWith(AndroidJUnit4::class)
class BaselineProfileGenerator {
    @get:Rule val baselineProfileRule = BaselineProfileRule()

    @Test
    fun startup() =
        baselineProfileRule.collectBaselineProfile(packageName = "com.example.app") {
            pressHome()
            // This block defines the app's critical user journey. Here we are interested in
            // optimizing for app startup. But you can also navigate and scroll
            // through your most important UI.
            startActivityAndWait()
        }
}
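  • Connect a userdebug or rooted Android Open Source Project (AOSP) emulator running Android 9 or higher.

(Generating a Baseline Profile requires a userdebug or rooted build running Android 9 or higher. A Google-distributed emulator without Google APIs is the ideal build for this workflow, because you cannot use adb root on emulators with Google Play.)

  • Run adb root from a terminal to make sure the adb daemon is running with root permissions.
  • Run the test and wait for it to complete.
  • Find the location of the generated profile in logcat by searching for the log tag Benchmark.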
com.example.app D/Benchmark: Usable output directory: /storage/emulated/0/Android/media/com.example.app
# List the output baseline profile
ls /storage/emulated/0/Android/media/com.example.app
SampleStartupBenchmark_startup-baseline-prof.txt
  • Pull the generated file from your device.
adb pull /storage/emulated/0/Android/media/com.example.app/SampleStartupBenchmark_startup-baseline-prof.txt .
  • Rename the generated file to baseline-prof.txt and copy it to the src/main directory of your app module.
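3.2.2 Manually define profile rules

You can define profile rules manually in an app or library module by creating a file named baseline-prof.txt in the src/main directory. This is the same folder that contains the AndroidManifest.xml file.

The file specifies one rule per line. Each rule represents a pattern for matching the methods or classes in the app or library that need to be optimized.

The syntax for these rules is a superset of the human-readable ART profile format (HRF) used by adb shell profman --dump-classes-and-methods. It closely resembles the syntax for descriptors and signatures, but also allows wildcards to simplify writing rules.

The following example shows some of the Baseline Profile rules included in the Jetpack Compose library: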

HSPLandroidx/compose/runtime/ComposerImpl;->updateValue(Ljava/lang/Object;)V
HSPLandroidx/compose/runtime/ComposerImpl;->updatedNodeCount(I)I
HLandroidx/compose/runtime/ComposerImpl;->validateNodeExpected()V
PLandroidx/compose/runtime/CompositionImpl;->applyChanges()V
HLandroidx/compose/runtime/ComposerKt;->findLocation(Ljava/util/List;I)I
Landroidx/compose/runtime/ComposerImpl;
3.2.3 Rule syntax

These rules take one of two forms, targeting either methods or classes. A method rule uses the following pattern:

[FLAGS][CLASS_DESCRIPTOR]->[METHOD_SIGNATURE]

A class rule uses the following pattern:

[CLASS_DESCRIPTOR]

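These patterns can contain wildcards so that a single rule covers multiple methods or classes. For guided help when writing rules with this syntax in Android Studio, see the Android Baseline Profiles plugin.

A wildcard rule might look like this: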

HSPLandroidx/compose/ui/layout/**->**(**)**
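To make the two rule forms concrete, here is a small sketch that splits a rule line into its parts. The class ProfileRule is a hypothetical helper written for this illustration and is not part of any Android library or tool. The regular expression is a simplification that assumes every class descriptor starts with L. The meaning of the flags (H for hot, S for startup, P for post-startup) follows the official Baseline Profiles documentation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for a single line of baseline-prof.txt. */
public class ProfileRule {
    // [FLAGS][CLASS_DESCRIPTOR]->[METHOD_SIGNATURE], where the "->..." part is absent for class rules.
    private static final Pattern RULE = Pattern.compile("^([HSP]*)(L.+?)(?:->(.+))?$");

    public final String flags;
    public final String classDescriptor;
    public final String methodSignature; // null for a class rule

    private ProfileRule(String flags, String classDescriptor, String methodSignature) {
        this.flags = flags;
        this.classDescriptor = classDescriptor;
        this.methodSignature = methodSignature;
    }

    public static ProfileRule parse(String line) {
        Matcher m = RULE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a profile rule: " + line);
        }
        return new ProfileRule(m.group(1), m.group(2), m.group(3));
    }

    public boolean isClassRule()   { return methodSignature == null; }
    public boolean isHot()         { return flags.indexOf('H') >= 0; }
    public boolean isStartup()     { return flags.indexOf('S') >= 0; }
    public boolean isPostStartup() { return flags.indexOf('P') >= 0; }

    public static void main(String[] args) {
        ProfileRule method = parse(
                "HSPLandroidx/compose/runtime/ComposerImpl;->updateValue(Ljava/lang/Object;)V");
        System.out.println(method.flags + " | " + method.classDescriptor + " | " + method.methodSignature);

        ProfileRule wildcard = parse("HSPLandroidx/compose/ui/layout/**->**(**)**");
        System.out.println(wildcard.classDescriptor + " | " + wildcard.methodSignature);

        ProfileRule clazz = parse("Landroidx/compose/runtime/ComposerImpl;");
        System.out.println("class rule: " + clazz.isClassRule());
    }
}
```

A parser like this could be used to lint a hand-written baseline-prof.txt before building, for example to reject lines that match neither form.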

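Measuring improvements

Warning: Make sure you measure Baseline Profile performance on a physical device running Android 7 or higher.

Measuring automatically with the Macrobenchmark library

Macrobenchmark lets you control pre-measurement compilation through the CompilationMode API, including BaselineProfile usage.

If you have already set up a BaselineProfileRule test in a Macrobenchmark module, you can define a new test in that module to evaluate its performance: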

@RunWith(AndroidJUnit4::class)
class BaselineProfileBenchmark {
  @get:Rule
  val benchmarkRule = MacrobenchmarkRule()

  @Test
  fun startupNoCompilation() {
    startup(CompilationMode.None())
  }

  @Test
  fun startupBaselineProfile() {
    startup(CompilationMode.Partial(
      baselineProfileMode = BaselineProfileMode.Require
    ))
  }

  private fun startup(compilationMode: CompilationMode) {
    benchmarkRule.measureRepeated(
      packageName = "com.example.app",
      metrics = listOf(StartupTimingMetric()),
      iterations = 10,
      startupMode = StartupMode.COLD,
      compilationMode = compilationMode
    ) { // this = MacrobenchmarkScope
        pressHome()
        startActivityAndWait()
    }
  }
}

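The results of a passing run were shown as a screenshot, which is not reproduced here. They came from a small test; larger applications will benefit more from Baseline Profiles.

Note that although the example above looks at StartupTimingMetric, there are other important metrics worth considering, such as jank (frame metrics), which can also be measured with Jetpack Macrobenchmark.

Manually measuring app improvements

First, let's measure the unoptimized app startup for reference.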

PACKAGE_NAME=com.example.app
# Force Stop App
adb shell am force-stop $PACKAGE_NAME
# Reset compiled state
adb shell cmd package compile --reset $PACKAGE_NAME
# Measure App startup
# This corresponds to `Time to initial display` metric
# For additional info https://developer.android.com/topic/performance/vitals/launch-time#time-initial
adb shell am start-activity -W -n $PACKAGE_NAME/.ExampleActivity \
 | grep "TotalTime"

Next, let's sideload the Baseline Profile.

Note: This workflow is only supported on Android 9 (API 28) through Android 11 (API 30).

# Unzip the Release APK first
unzip release.apk
# Create a ZIP archive
# Note: The name should match the name of the APK
# Note: Copy baseline.prof{m} and rename it to primary.prof{m}
cp assets/dexopt/baseline.prof primary.prof
cp assets/dexopt/baseline.profm primary.profm
# Create an archive
zip -r release.dm primary.prof primary.profm
# Confirm that release.dm only contains the two profile files:
unzip -l release.dm
# Archive:  release.dm
#   Length      Date    Time    Name
# ---------  ---------- -----   ----
#      3885  1980-12-31 17:01   primary.prof
#      1024  1980-12-31 17:01   primary.profm
# ---------                     -------
#                               2 files
# Install APK + Profile together
adb install-multiple release.apk release.dm

To verify that the package was optimized at install time, run the following command:

# Check dexopt state
adb shell dumpsys package dexopt | grep -A 1 $PACKAGE_NAME

The output should state that the package is compiled.

[com.example.app]
  path: /data/app/~~YvNxUxuP2e5xA6EGtM5i9A==/com.example.app-zQ0tkJN8tDrEZXTlrDUSBg==/base.apk
  arm64: [status=speed-profile] [reason=install-dm]

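Now we can measure app startup performance as before, but without resetting the compiled state.

Note: Make sure you do not reset the package's compiled state.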

# Force Stop App
adb shell am force-stop $PACKAGE_NAME
# Measure App startup
adb shell am start-activity -W -n $PACKAGE_NAME/.ExampleActivity \
 | grep "TotalTime"

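Note: For better stability and accuracy, it is recommended to use Macrobenchmark to measure the performance impact, because it can repeat measurements in a loop, capture traces for performance debugging, and improve reliability (for example, by clearing the operating system's disk cache).

There are a few additional considerations when creating Baseline Profiles:
  • Android 5 through Android 6 (API levels 21 through 23) already AOT-compile the APK at install time.
  • The rules file must be named baseline-prof.txt and placed in the root directory of the main source set (it should be a sibling of the AndroidManifest.xml file).
  • These files are only used if you are using Android Gradle Plugin 7.1.0-alpha05 or higher (Android Studio Bumblebee Canary 5).
  • Bazel does not currently support reading Baseline Profiles and merging them into the APK.
  • A Baseline Profile cannot be larger than 1.5 MB compressed. Libraries and applications should therefore aim to define a small set of profile rules that maximizes impact.
  • Broad rules that compile too much of the application can slow down startup because of increased disk access. You should test the performance of your Baseline Profiles.

Known issues

There are currently several known issues with Baseline Profiles:

  • Baseline Profiles are not packaged correctly when APKs are built from an app bundle. To fix this, use com.android.tools.build:gradle:7.3.0-beta02 or higher (issue).
  • Baseline Profiles are packaged correctly only for the primary classes.dex file. This affects apps with more than one .dex file. To fix this, use com.android.tools.build:gradle:7.3.0-beta02 or higher (issue).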
  • User (non-root) builds do not allow resetting the ART profile cache. To work around this, androidx.benchmark:benchmark-macro-junit4:1.1.0-rc02 includes a fix (issue) that reinstalls the app during benchmarking.
  • Android Studio Profilers do not install Baseline Profiles when profiling the app (issue).
  • Non-Gradle build systems (Bazel, Buck, etc.) do not support compiling baseline profiles into output APKs.

4. Summary

  • Routine optimization is something we have to do;
  • Trading space for time on top of routine optimization is very effective;
  • System-level startup optimization has drawbacks, but it is worth trying:

    • Device manufacturers customize Android heavily, so it may not be usable on their devices

Origin juejin.im/post/7119368593644470285