App startup optimization for Android

Background

Why bring up startup optimization again? Launching the app is the only way a user can get into it, so startup is the first step of every core user journey. Startup comes in three kinds: cold start, warm start, and hot start. Unless otherwise specified, "startup" in this article means cold start. If startup takes too long, users leave before the app finishes launching, which we call cold start bounce. Startup is therefore a key checkpoint in the overall experience and one of the app's core metrics. As the business grows more complex, more and more work has to happen during startup, and many apps, ours included, find that a startup scheme that once worked keeps degrading until the experience suffers. Startup optimization is a well-worn topic. This article focuses on the current state of 1688 and shares what we actually did, in the hope that it is useful to you.

The evolution of startup optimization techniques

Before getting into our own work, let us review how startup optimization techniques have developed, from setting an image as the windowBackground in the early days to the wide range of techniques in use today. The picture below gives a brief summary:

The picture lists only some of the well-known solutions. It is not exhaustive and is for reference only; readers who work on startup optimization regularly will recognize most of them. Many companies spare no effort on startup and even resort to hacks that squeeze out every bit of performance. For individuals or teams with a limited development budget, some of these techniques are not a good fit. Below we describe our approach to getting the largest benefit at limited cost.

Note: The content of this article applies to all Android developers.

What did 1688 do for startup optimization?

Before optimizing, first analyze the current state. The 1688 app is published under the name "Alibaba" and is available in all major app stores; it is referred to below as the Alibaba app. What was its state before this work?

  • windowBackground is used to improve perceived launch speed

  • Individual time-consuming tasks have been optimized one by one

  • Startup tasks are scheduled in batches across multiple threads

Brief description of the batch scheduling implementation: startup tasks are divided into four batches. The first batch runs in Application onCreate, and the second batch then runs on a worker thread. The third batch runs on a worker thread from HomeActivity onCreate while the home page renders, and afterwards a worker thread runs the fourth batch. Within a batch, tasks are grouped according to configured parameters: tasks in the same group run concurrently, and the rest run sequentially in the configured order. This scheme has a major drawback: apart from the tasks that run deterministically in Application, there is no way to know exactly when a worker-thread task will finish. All we can say is that it starts as early as possible. As the business grew more complex, any task that had to finish before entering the home page could only be added to the Application stage. And because writing startup tasks one by one was tedious, several unrelated things ended up inside a single task, so the tasks themselves gradually decayed.

Startup definition

First, we need to be clear about what "cold start" means. Android's official definition of startup is in the documentation: https://developer.android.com/topic/performance/vitals/launch-time. The startup time can be read from Logcat by filtering on "Displayed".
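
To compare launch times across runs, the Displayed lines can be parsed by a small script. The sketch below is a minimal plain-Java illustration and not part of the 1688 tooling. The class name DisplayedLine is made up, and the pattern assumes lines of the shape shown in the official documentation, such as "Displayed com.android.myexample/.StartupTiming: +3s534ms".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DisplayedLine {
    // Assumed shape: "Displayed <component>: +<seconds>s<millis>ms", seconds optional.
    private static final Pattern DISPLAYED =
            Pattern.compile("Displayed\\s+(\\S+):\\s+\\+(?:(\\d+)s)?(\\d+)ms");

    /** Returns the reported launch time in milliseconds, or -1 if the line does not match. */
    static long parseMillis(String logLine) {
        Matcher m = DISPLAYED.matcher(logLine);
        if (!m.find()) {
            return -1;
        }
        long seconds = m.group(2) == null ? 0 : Long.parseLong(m.group(2));
        return seconds * 1000 + Long.parseLong(m.group(3));
    }
}
```

Collecting these values over several cold starts gives a more stable baseline than a single reading.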

So how does Alibaba Group define Android app startup? It is divided into four stages:

  • System initialization
  • Application initialization
  • First screen display
  • Startup complete (interactive)

This article focuses on optimizing two of these stages: application initialization and first screen display. As metrics, application initialization is the time from app startup until just before onResume of the home page Activity, and first screen display is the time from app startup until home page rendering is 80% complete. In the rest of this article, application initialization optimization covers the span from app startup to onResume of the home page Activity, and concentrates on using multi-core capabilities for better task scheduling in the Application stage so that CPU idle time within individual tasks is reduced. Home page display optimization covers the span from onResume of the home page Activity until home page rendering is 80% complete, and concentrates on optimization strategies for the home page itself.

The picture below shows the overall plan for startup optimization:

Application initialization phase optimization

As mentioned above, the old startup framework has several drawbacks:

  • Apart from the tasks that run deterministically in Application, there is no way to know when other tasks will run, so later code paths need a large number of check-init calls.

  • Tasks gradually become bloated and decay.

  • Tasks compete for locks, and the behavior differs considerably across device models. A single task can take much longer than expected without the upper layers noticing.

As the business keeps developing, we increasingly need specific tasks to be scheduled at specific times, so that at any later point it is clear whether the tasks a module depends on have been initialized. Pre-initialization tasks for the home page and the later elastic scheduling scheme need the same determinism. Problems that used to be hidden among worker threads must now be exposed and dealt with. Next, startup tasks need to be reorganized and rearranged so that each has a single responsibility, which also guards against future decay. Single-responsibility tasks are in turn the basis for "lock-free" task scheduling, which makes better use of multiple cores, reduces idle time within individual tasks, and brings the overall time to an optimum. Finally, for multiple launch scenarios, such as launch by another client, launch from an external web link, automatic login, and switching between foreground and background, startup tasks must support orchestration customized per scenario.

On top of this, we need more complete monitoring that can observe the running state of any task in production. During development, we can integrate with the test platform and set up gate checks that enforce control over startup tasks and detect whether abnormally slow tasks have appeared, and so on. Every Android developer in the group should be able to see clearly how startup tasks are scheduled, and how the startup path of the business modules they own executes in each startup scenario.
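
As an illustration of the monitoring and gate-check idea, the sketch below wraps each task so its duration is recorded, then lists the tasks that exceeded a time budget. It is a minimal sketch in plain Java, not Yasuo's implementation: the TaskMonitor class, the task names, and the budget value are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class TaskMonitor {
    private final Map<String, Long> elapsedNanos = new ConcurrentHashMap<>();

    /** Wraps a task so that its wall-clock duration is recorded under the given name. */
    Runnable timed(String name, Runnable body) {
        return () -> {
            long start = System.nanoTime();
            try {
                body.run();
            } finally {
                elapsedNanos.put(name, System.nanoTime() - start);
            }
        };
    }

    /** Sorted names of recorded tasks that ran longer than the budget. */
    List<String> overBudget(long budgetMillis) {
        List<String> slow = new ArrayList<>();
        for (Map.Entry<String, Long> e : elapsedNanos.entrySet()) {
            if (e.getValue() > budgetMillis * 1_000_000L) {
                slow.add(e.getKey());
            }
        }
        Collections.sort(slow);
        return slow;
    }
}
```

A development-time gate could fail the build when overBudget returns a non-empty list, while in production the same durations could be reported to a monitoring backend.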

Based on these requirements, we rebuilt the startup framework and named it Yasuo.

And as you move forward with the strong wind, you must also pay attention to what is behind you.

As the well-known line goes, face the wind! We hope the new startup framework makes the app launch as fast as the wind, hence the name Yasuo.

So what does Yasuo's overall structure look like? Building on Taobao's DAG scheduling framework, we customized and wrapped a middle layer, connected it to our existing startup library, and migrated the entire startup path at relatively low cost.

The picture above shows Yasuo's overall architecture in its simplest layered form. At the bottom it plugs into Taobao's DAG scheduling capabilities, and it inherits from 1688's previous startup framework. The core modules are:

  • core (upper layer encapsulation based on DAG)

  • statistics (monitoring statistics module)

  • common (public module)

  • config (startup configuration module)

  • api (API module called by startup tasks)

  • bootstrap (launcher entry)

We mainly did three important things:

  • Break up the startup tasks and re-split them into fine-grained tasks with a single responsibility each

  • Build the Yasuo framework

  • Rearrange the tasks and achieve lock-free scheduling as far as possible by various means

For readers outside the group: many open-source DAG scheduling frameworks are available. You can pick a suitable one and build a launcher similar to Yasuo on top of it. Most of the ideas carry over.
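
To make the DAG idea concrete, here is a minimal sketch of dependency-ordered scheduling using only the JDK. It is not Taobao's framework or Yasuo: the StartupDag class and the task names in the usage note are hypothetical, and a production scheduler would validate the whole graph before submitting any task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

final class StartupDag {
    private final Map<String, Runnable> tasks = new LinkedHashMap<>();
    private final Map<String, List<String>> deps = new HashMap<>();

    /** Registers a task together with the names of the tasks it must wait for. */
    StartupDag task(String name, Runnable body, String... dependsOn) {
        tasks.put(name, body);
        deps.put(name, Arrays.asList(dependsOn));
        return this;
    }

    /** Runs every task on the pool, each only after its dependencies; blocks until all finish. */
    void runAll(Executor pool) {
        Map<String, CompletableFuture<Void>> futures = new HashMap<>();
        for (String name : tasks.keySet()) {
            schedule(name, pool, futures, new HashSet<>());
        }
        CompletableFuture.allOf(futures.values().toArray(new CompletableFuture[0])).join();
    }

    private CompletableFuture<Void> schedule(String name, Executor pool,
            Map<String, CompletableFuture<Void>> futures, Set<String> visiting) {
        CompletableFuture<Void> done = futures.get(name);
        if (done != null) {
            return done;                       // already scheduled via another path
        }
        if (!tasks.containsKey(name)) {
            throw new IllegalArgumentException("unknown task: " + name);
        }
        if (!visiting.add(name)) {
            throw new IllegalStateException("dependency cycle at: " + name);
        }
        List<CompletableFuture<Void>> parents = new ArrayList<>();
        for (String dep : deps.get(name)) {
            parents.add(schedule(dep, pool, futures, visiting));
        }
        visiting.remove(name);
        // The task is submitted to the pool only once all of its parents have completed.
        done = CompletableFuture.allOf(parents.toArray(new CompletableFuture[0]))
                .thenRunAsync(tasks.get(name), pool);
        futures.put(name, done);
        return done;
    }
}
```

For example, registering "storage" with no dependencies, "network" depending on "storage", and "home" depending on both guarantees that they run in that order, while independent tasks are free to run concurrently on the pool. Because every task has a future, later code can ask whether a given task has finished instead of guessing.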

Breaking up startup tasks is delicate work that is easier said than done. First, we need a fairly complete understanding of every startup task in order to find where to start taking it apart. Because our app has been under development for many years, almost nobody could explain what each startup task actually did, and it took a lot of effort to read and dismantle them bit by bit. In the end, fewer than 40 original tasks were split into more than 70.

Rearranging the tasks took even more effort. Most people do not dare to touch legacy code for fear of breaking something, especially on the startup path, where a bug can turn into a production incident. This time we dove deep into the sea of legacy code and made sweeping changes. Three points matter most when planning the arrangement:

  • Lock-free

  • Ordering (explicit and implicit dependencies)

  • Orchestration for multiple scenarios

First, we went into the group's second-party libraries to split them and arrange them lock-free, determining ordering dependencies from their internal code. Second, we studied our own startup tasks carefully, worked with colleagues across the group, ran countless experiments, and went through two or three release iterations before settling on the final arrangement (tears). During startup, besides possible lock contention between our own tasks, common problems that anyone may run into include:

  • The SharedPreferences lock when reading data. The first read of a SharedPreferences file loads the file on a newly created thread, and readers wait until that load finishes.

  • The lock in loadLibrary0 (loading .so libraries)

  • File locks when reading caches

  • If you inflate views ahead of time, note that inflation with the same Context is also locked.

For these problems, our main remedies are to prefetch SharedPreferences and caches, and to spread the .so loading tasks across the stages as much as possible. We recommend the official Android analysis tool systrace for analysis. Yasuo's internal modules add trace sections to every startup task and stage, so once the trace switch is turned on you can analyze a release build, which is closest to real-world data. We also recommend Google's newer tool, https://ui.perfetto.dev/, which can display systrace HTML files and can also record traces on recent Android versions. For more features, consult its documentation and explore.
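
The prefetching idea can be sketched without any Android dependency: start the slow load early on a background executor, so that the later reader blocks only if the load is still running. This is a minimal sketch and an assumption about how such prefetching might look, not the authors' code. The Prefetched class is hypothetical; on Android the loader would be something like a first read from Context.getSharedPreferences, which is not shown here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

final class Prefetched<T> {
    private final CompletableFuture<T> value;

    private Prefetched(CompletableFuture<T> value) {
        this.value = value;
    }

    /** Starts the load immediately on the given executor; call this in an early startup stage. */
    static <T> Prefetched<T> start(Supplier<T> loader, Executor io) {
        return new Prefetched<>(CompletableFuture.supplyAsync(loader, io));
    }

    /** Returns the value, blocking only if the early load has not finished yet. */
    T get() {
        return value.join();
    }

    /** True once the load has completed, so callers can avoid blocking the main thread. */
    boolean isReady() {
        return value.isDone();
    }
}
```

The point of the design is to move the lock wait into a stage where nothing depends on the result yet, rather than to remove the lock itself.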

Startup task scheduling in the latest production version of the Alibaba app (9.10.3.0):

As the picture shows, the early part of the path in particular is essentially lock-free. Lock-free here does not mean that no lock contention may ever occur. It means that, stage by stage, the time-consuming peaks are removed, the time spent in each stage becomes smoother, and the whole reaches an optimum. Compatibility across device models must be considered as well, with close attention to ROI. Task arrangement within a single stage must also take thread scheduling capacity into account, and implicit dependencies (dependencies between stages) can be used to resolve some stubborn lock contention problems. Remember to look at the problem as a whole and strike a balance.

Finally, in the cold start of the main process we enabled the following stages (including launches triggered from outside the app):

  • Application attach
  • Application onCreate (which contains 3 sub-stages)
  • onCreate of the first Activity
  • Launch from an external link
  • Push Activity onCreate (launch from an offline push)
  • A stage triggered once startup becomes interactive
  • After the main thread becomes idle
  • After the main thread has been idle for 5 seconds

In addition, there are stages for other processes, such as the mini program process. More than 70 tasks are scheduled in total, 80% of which run before startup becomes interactive. In other words, these tasks are already initialized and usable by the time the user reaches the home page and can interact. Compared with the old startup framework, throughput is 4 to 5 times higher.
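
One way to picture stage-based scheduling is a dispatcher that runs the tasks registered for a stage when that stage is entered, and remembers which stages have finished. The sketch below is a simplified assumption, not Yasuo's API: StageDispatcher, its Stage names (a subset loosely mirroring the list above), and the single-threaded execution are all illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class StageDispatcher {
    // Hypothetical stage names for illustration only.
    enum Stage { APP_ATTACH, APP_ON_CREATE, FIRST_ACTIVITY_ON_CREATE, INTERACTIVE, MAIN_IDLE, MAIN_IDLE_5S }

    private final Map<Stage, List<Runnable>> tasksByStage = new EnumMap<>(Stage.class);
    private final Set<Stage> finished = EnumSet.noneOf(Stage.class);

    synchronized void register(Stage stage, Runnable task) {
        tasksByStage.computeIfAbsent(stage, s -> new ArrayList<>()).add(task);
    }

    /** Runs every task registered for the stage, then marks the stage as finished. */
    synchronized void enter(Stage stage) {
        if (finished.contains(stage)) {
            return;                            // each stage fires at most once
        }
        for (Runnable task : tasksByStage.getOrDefault(stage, Collections.emptyList())) {
            task.run();
        }
        finished.add(stage);
    }

    /** Lets later code ask deterministically whether a stage's tasks are done. */
    synchronized boolean isFinished(Stage stage) {
        return finished.contains(stage);
    }
}
```

With a query like isFinished, a module no longer needs to force initialization through check-init calls: it can rely on the stage it depends on having completed.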

The latest online version of Alibaba APP (9.10.3.0) startup phase diagram (ordinary cold start):


For the application initialization stage as a whole, the new scheme leaves little room to cut time compared with the old one: averaged over all users, it is 100-200 ms faster. Note, however, that the old framework scheduled only 10 tasks in the Application phase. The new scheme frees up a great deal of room for the home page display and later stages, makes the whole startup controllable, with gate checks during development and visibility after release, and lays the foundation for the later elastic startup scheme.

A small tip: if your splash image resembles the Alibaba app's, that is, a large solid background color plus a logo or some content, we recommend implementing it with a layer-list. Use a shape item with a color value for the background, and a small image for the logo or content. This reduces the time spent decoding the windowBackground; the change is small and the payoff is good.

Home page display optimization

First, let’s introduce the basic structure of the homepage:

The homepage consists of a second-floor container, a static main page (including the search component and the multi-tab component), and the dynamically built homepage tab and industry tab pages (built with CyberT containers).

The 1688 homepage is characterized by complex business forms, many reactive tasks, deeply nested containers, and high demands on dynamic capability. After years of development, its various marketing capabilities, such as pop-up layers, floating windows, and floating bars, are intertwined. Without affecting business capabilities or interaction, and with ROI in mind, the most immediately effective strategy is to trade space for time, using the large amount of headroom that the Yasuo startup framework described above has won for us.

In this context, although the deterministic startup stages and the orchestrable task system provided by Yasuo do not cut much startup time by themselves, they give us a lot of room to optimize the homepage, so that we can make full use of multiple cores to buy time for homepage rendering.

The main optimization ideas fall into three points:

  • Keep the main thread free of interference at startup (move time-consuming tasks earlier or later)

  • Rework the rendering container so that part of the home page rendering logic runs in the application initialization stage (folding rendering time)

  • Control rendering data and refreshes so that repeated operations do not occupy the main thread at startup (guaranteeing rendered content)

1. Ensure the main thread environment at startup

When the app has just started, the CPU is at its busiest. The initialization of many second-party libraries makes the number of worker threads shoot up, and the main thread has to compete fiercely for time slices. So by the time execution reaches the home page display stage, we must make sure the main thread is not occupied by other tasks. For example, tasks that need the main thread, such as WebView initialization and Flutter initialization, can be moved out of the way and triggered after the homepage has rendered or when the main thread is idle.

The problem of WebView initialization running too early:

On entering the homepage, a marketing pop-up layer often appears, offering red envelopes or directing traffic to a promotion venue. This pop-up layer is implemented in H5 and depends on the WebView initialization task. While debugging the legacy code, we found that the old startup design only tried "as far as possible" to have the relevant initialization done before a module ran, without guaranteeing the order. Legacy code therefore calls check init before it runs: if the relevant initialization has not yet run, the initialization task is forcibly woken up and executed. As a result, WebView was initialized before entering the homepage Activity. WebView initialization takes a long time, so this wasted CPU resources in the home page display stage.

We made several improvements to address this:

  • Split the WebView initialization task, separate out the modules the marketing pop-up layer needs so as to reduce their cost, and run them early in the Activity onCreate stage without letting them become the stage's bottleneck.

  • Move the trigger logic for the marketing pop-up layer to the callback fired after the home page is 80% rendered.

  • Pre-judge the pop-up logic: decide ahead of time whether the pop-up layer needs to be shown at all, so that resources are not wasted initializing a page that is never displayed. This balances the business effect (the pop-up display rate) against the performance cost. (After launch, this scheme improved the overall metrics by about 200 ms.)

2. Home page rendering in advance

The homepage rendering framework uses the CyberT container, a componentized page delivery system developed by 1688. Pages are built and rendered from dynamically delivered component templates, styles, and data, which satisfies both the highly dynamic nature of e-commerce and the homepage's need for native-level performance. The homepage tab and industry tab pages are built with CyberT containers.

To make full use of the headroom that Yasuo's orchestration capability gives us, we reworked the rendering logic and built the optimizations into CyberT, moving part of the logic forward into the application initialization stage. Specifically, we implemented:

  • CyberT protocol parsing and View pre-generation

  • Pre-generation of layout files

  • Pre-creation of home page component templates

CyberT protocol parsing and View pre-generation:

While debugging, we found that the main-thread callback for protocol parsing landed in the home page display stage. Because the main thread was busy, the callback waited a long time and the CyberT page could not be generated. Generating the CyberT page can start only after the CyberT initialization task has finished. So, provided it does not become a bottleneck of the startup stage, we use the cached protocol and component data during application initialization to parse the protocol and create the Views ahead of time and keep them in memory. When the homepage Fragment is created, the Views are added directly to the View tree and the subsequent rendering logic runs. The whole process uses the Application context as the Context, applies use-once logic, and releases the Views promptly to prevent memory leaks. We also monitor the success rate in production, correlate device scores with the success rate, deliver configuration switches, and adjust the optimization strategy for devices with different scores to get the most from the scheme. (After launch, this scheme improved the overall metrics by 150 ms to 200 ms.)
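
The use-once logic and success-rate monitoring described above can be sketched as a small cache whose entries are handed out at most once. This is a minimal illustration in plain Java, not CyberT's implementation: UseOnceCache and its method names are hypothetical, and the cached values stand in for pre-created Views.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class UseOnceCache<K, V> {
    private final ConcurrentHashMap<K, V> entries = new ConcurrentHashMap<>();
    private final AtomicInteger hits = new AtomicInteger();
    private final AtomicInteger misses = new AtomicInteger();

    /** Stores an object that was created ahead of time. */
    void put(K key, V value) {
        entries.put(key, value);
    }

    /** Hands the entry out at most once; returns null on a miss so the caller builds it normally. */
    V take(K key) {
        V value = entries.remove(key);
        (value != null ? hits : misses).incrementAndGet();
        return value;
    }

    /** Drops anything that was never consumed, so pre-created objects do not linger in memory. */
    void clear() {
        entries.clear();
    }

    /** Fraction of take() calls that found a pre-created entry; 0.0 before any call. */
    double hitRate() {
        int total = hits.get() + misses.get();
        return total == 0 ? 0.0 : (double) hits.get() / total;
    }
}
```

Removing the entry on take means the cache never holds a reference to an object that is already attached elsewhere, and the hit rate is the kind of figure that could be reported per device score.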

Pre-generation of layout files:

By analyzing trace files, we found that parsing XML takes a noticeable amount of time on the main thread. So for the layout files of the Activity, the Fragment, and the second-floor View, all of which the homepage must load, we also pre-create them in the Application stage. To avoid thread interruptions and lock contention in the application caused by excessive creation and inflation, we run the work sequentially on a single-thread pool. Here too we monitor the success rate and scale the optimization strategy accordingly.
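
Running the pre-creation jobs one after another on a single worker thread can be sketched as follows. This is an assumption about the general shape, in plain Java with no Android classes: SequentialPrecreator is a hypothetical name, and the Callable stands in for whatever inflates a layout.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class SequentialPrecreator<T> {
    // One worker thread: jobs run strictly in submission order and never contend with each other.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Map<String, Future<T>> pending = new ConcurrentHashMap<>();

    void precreate(String name, Callable<T> factory) {
        pending.put(name, worker.submit(factory));
    }

    /** Returns the item if it was created successfully in time, else null (create it normally). */
    T takeIfReady(String name) {
        Future<T> future = pending.remove(name);
        if (future == null) {
            return null;
        }
        if (!future.isDone()) {
            future.cancel(false);              // too late to be useful; skip it if not started
            return null;
        }
        try {
            return future.get();
        } catch (Exception e) {
            return null;                       // creation failed or was cancelled: counts as a miss
        }
    }

    /** Stops accepting jobs and waits for queued ones; returns false on timeout or interrupt. */
    boolean drain(long timeoutMillis) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A failed or late job simply produces a miss, so the normal creation path always remains the fallback.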

Template pre-creation:

When CyberT builds and renders a page, it usually first requests or reads the component template in order to generate the View for the corresponding block. We defined an initialization task for template pre-creation, which turns the template files and the cached component data into Views ahead of time and puts them into a cache pool. This reduces the resources wasted on refreshes triggered by later IO waits and brings component generation time down to the microsecond level.

3. Rendering data control/refresh control

To let users interact as early as possible, 1688 currently renders cached data first and then performs a diff refresh once the network data comes back. Control here really means auditing the code for soundness. Are there unreasonable callbacks that refresh the page so often that users cannot interact? Is network data forcing a refresh before the cached data has even been rendered, delaying the moment the user can interact? We addressed these issues and put strict control on the rendering logic before and after the interactive point, to prevent resources being wasted by unsound logic during rendering. In the early phase of optimization, making the unreasonable parts of legacy code reasonable often brings unexpected gains.
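
The cache-first, diff-refresh strategy can be illustrated with a deliberately simple model in which the page is a list of component payloads. This sketch is an assumption for illustration, not the 1688 rendering code: HomeFeed is a hypothetical class, and the positional comparison is much cruder than a real diff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class HomeFeed {
    private List<String> onScreen = new ArrayList<>();
    private int refreshCount = 0;

    /** First paint from cache, so the page becomes interactive early. */
    void renderCached(List<String> cached) {
        onScreen = new ArrayList<>(cached);
    }

    /** Applies network data; returns the changed positions and skips the refresh if none changed. */
    List<Integer> onNetworkData(List<String> fresh) {
        List<Integer> changed = changedPositions(onScreen, fresh);
        if (!changed.isEmpty()) {
            onScreen = new ArrayList<>(fresh);
            refreshCount++;
        }
        return changed;
    }

    int refreshCount() {
        return refreshCount;
    }

    /** Positions whose payload differs between the rendered list and the fresh list. */
    static List<Integer> changedPositions(List<String> rendered, List<String> fresh) {
        List<Integer> changed = new ArrayList<>();
        int max = Math.max(rendered.size(), fresh.size());
        for (int i = 0; i < max; i++) {
            String before = i < rendered.size() ? rendered.get(i) : null;
            String after = i < fresh.size() ? fresh.get(i) : null;
            if (!Objects.equals(before, after)) {
                changed.add(i);
            }
        }
        return changed;
    }
}
```

Counting refreshes this way is also a cheap means of spotting the redundant callbacks mentioned above: identical network data should leave the count unchanged.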

Enable performance scaling

As mentioned above, some homepage display strategies have a success rate problem. From observing devices offline and comparing production data, we found that on low-end devices with weak CPUs, View pre-generation and pre-creation have a high probability of cache misses, which wastes time slices and can even make startup slower. So for long-tail users we designed a dynamic performance scaling scheme. Roughly, inputs such as the user's device score, user segment, and network environment are combined with configuration delivered by the remote platform and fed into a decision engine on the client. The resulting strategy is stored locally and takes effect at the next startup. We then watch indicators such as the production View cache success rate, startup time on low-end devices, and startup time for different user segments to verify the strategy's benefit.

For example, on low-end devices we turn off preloading and pre-generation, limit the number of videos playing on the same screen, limit GIF animations, and postpone time-consuming tasks such as WebView init and Flutter init to the stage after the main thread has been idle for 5 seconds, so that the experience on low-end devices comes first. On high-end devices, we not only enable the relevant strategies to get the most from the optimizations, but also preload marketing capabilities and marketing pages, as long as they do not become a bottleneck for the stage, so as to respond quickly to marketing needs. The goal is a balance between user experience and business conversion.
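
A decision engine of this kind can be reduced to a small mapping from device tier to strategy. The sketch below is a hypothetical simplification: the ScalingDecision class, the three tiers, the score thresholds (which remote configuration would supply), and the middle-tier strategy are all assumptions, and persisting the result for the next startup is left out.

```java
final class ScalingDecision {
    enum Tier { LOW, MID, HIGH }

    /** The switches the next startup would read; field names are illustrative. */
    static final class Strategy {
        final boolean preCreateViews;
        final boolean preloadMarketing;
        final boolean deferHeavyInitUntilIdle5s;   // e.g. WebView / Flutter init

        Strategy(boolean preCreateViews, boolean preloadMarketing, boolean deferHeavyInitUntilIdle5s) {
            this.preCreateViews = preCreateViews;
            this.preloadMarketing = preloadMarketing;
            this.deferHeavyInitUntilIdle5s = deferHeavyInitUntilIdle5s;
        }
    }

    /** Scores below lowBelow are LOW, scores from highFrom upward are HIGH, the rest are MID. */
    static Tier tierFor(int deviceScore, int lowBelow, int highFrom) {
        if (deviceScore < lowBelow) {
            return Tier.LOW;
        }
        return deviceScore >= highFrom ? Tier.HIGH : Tier.MID;
    }

    static Strategy decide(Tier tier) {
        switch (tier) {
            case LOW:
                return new Strategy(false, false, true);   // protect the basic experience
            case HIGH:
                return new Strategy(true, true, false);    // enable every optimization
            default:
                return new Strategy(true, false, false);   // assumed middle ground
        }
    }
}
```

In practice the inputs would also include user segment and network environment, and the chosen strategy would be verified against the production indicators listed above.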

How to prevent regressions

We built the launcher container, Yasuo. The container provides production monitoring, development-time gate checks, and test platform integration, which together control every step from development to release and prevent task decay. On the homepage side, part of the optimization is built on the CyberT container, which has its own gate checks, and part is optimization of the homepage itself. With the full-path logging system we are developing, we have added instrumentation at key points, so we have monitoring and are notified promptly when an exception occurs.

During development we manage the code repositories with git, and permissions on the repositories for the startup library and the startup pool are restricted, so changes are clearly visible when they go out. Combined with code review, this lowers the chance of problems on the core path. In addition, the startup orchestration class and the startup pool class are generated by scripts. The scripts are simple and easy to maintain, which also lowers maintenance cost.

Effect

As of this writing, the Alibaba app's startup time has been reduced from about 3 seconds in August and September to 1.9 seconds, a decrease of 36.67%.

Compared with the data from August and September, the cold start bounce rate dropped by 75%.

A drill-down analysis of type B buyers and potential type B buyers shows that their data is better than the overall figures; strategies tailored to user segments will follow.

As the elastic scaling strategy rolls out, delivering strategies by device model, user segment, and environment should produce even better results.

Origin blog.csdn.net/maniuT/article/details/132782222