iQIYI App: Performance Optimization for Low-End Android Devices

Background

In the smartphone market, high-end models tend to attract the most attention, but low-end models also hold a share that cannot be ignored. To serve this segment, many manufacturers keep releasing low-end phone series. In addition, as system hardware iterates rapidly, devices that were mid-range or high-end a few years ago now count as low-end. The iQIYI app has a huge user base, and a considerable portion of those users are on low-end devices. Optimizing for low-end devices gives these users a stable, smooth, and efficient experience. This article introduces the iQIYI app's optimization strategy for low-end devices along three dimensions: cold start, fluency, and loading speed.

Low-end device tiering strategy

Before discussing the optimizations, let's look at what counts as a low-end device. The classification is usually based on factors such as device model, memory size, and system version. The iQIYI app has its own tiering strategy: in the policy backend, optimization strategies can be configured per scenario (startup, fluency, and so on) and per tier (memory, model, system version, and so on) to ensure the best experience in each scenario.
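
As a rough illustration of per-scenario, per-tier configuration, the sketch below grades a device from its memory and API level and looks up whether an optimization is enabled. The thresholds, the tier and scenario names, and the DeviceGrader and PolicyTable classes are all assumptions made for this example; the article does not disclose the actual grading rules.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical device tiers; the real grading rules are not given in the article. */
enum DeviceTier { LOW, MID, HIGH }

/** Hypothetical scenarios that can be configured independently. */
enum Scenario { STARTUP, FLUENCY, LOADING }

final class DeviceGrader {
    // Assumed thresholds, for illustration only.
    static final int LOW_RAM_MB = 4096;
    static final int MID_RAM_MB = 8192;
    static final int LOW_API_LEVEL = 28;

    /** Grades a device from its total RAM (in MB) and Android API level. */
    static DeviceTier grade(int totalRamMb, int apiLevel) {
        if (totalRamMb <= LOW_RAM_MB || apiLevel <= LOW_API_LEVEL) {
            return DeviceTier.LOW;
        }
        return totalRamMb <= MID_RAM_MB ? DeviceTier.MID : DeviceTier.HIGH;
    }
}

/** In-memory stand-in for a policy configuration delivered by a backend. */
final class PolicyTable {
    private final Map<Scenario, Map<DeviceTier, Boolean>> enabled =
            new EnumMap<>(Scenario.class);

    void set(Scenario scenario, DeviceTier tier, boolean on) {
        enabled.computeIfAbsent(scenario, k -> new EnumMap<>(DeviceTier.class))
               .put(tier, on);
    }

    /** Unconfigured combinations default to "not enabled". */
    boolean isEnabled(Scenario scenario, DeviceTier tier) {
        Map<DeviceTier, Boolean> byTier = enabled.get(scenario);
        return byTier != null && byTier.getOrDefault(tier, false);
    }
}
```

A caller would grade the device once at startup and then consult the table at each optimization point, for example policy.isEnabled(Scenario.STARTUP, tier).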

Startup optimization

Startup is the first door an app opens for its users. Its duration directly affects the subsequent viewing experience and user retention, and has a significant impact on business metrics. Startup optimization is therefore a key part of technical optimization work.

Overview

Start and end of the startup phase: the iQIYI app takes Application.attachBaseContext as the starting point and the display of home page data as the end point. The duration of this phase is treated as the normal online cold start time. It covers the Application creation stage, the creation and display of MainActivity, advertising, the loading and rendering of the home page's top and bottom navigation data, and the loading and rendering of home page data.

Across this process we need to sort out what work the business layer does and evaluate whether each piece is necessary, when it should run, and whether it is scheduled on a reasonable thread. We need to monitor the main thread to see whether it falls into long sleeps, monitor and manage main-thread and background messages to see whether any unexpected tasks are triggered, and check whether system resources are fully used and whether idle time is fully used for preloading.

Atomization of business functions: to schedule startup tasks in an orderly way and allocate resources reasonably, we developed a task management framework, TaskManager. Business functionality is wrapped in custom Tasks and split finely enough. For each Task we define its execution dependencies on other Tasks, the thread it is expected to run on, its execution timing, and so on, and then hand it to TaskManager for unified handling. This task management layer is the foundation of our startup optimization.
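
The dependency part of such a framework can be sketched as follows. This is a minimal, single-threaded illustration and not the actual TaskManager: the StartupTask and SimpleTaskManager names are hypothetical, and thread selection and execution timing are left out so that only dependency ordering is shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical startup task: a name, the names it depends on, and the work itself. */
final class StartupTask {
    final String name;
    final List<String> dependsOn;
    final Runnable work;

    StartupTask(String name, List<String> dependsOn, Runnable work) {
        this.name = name;
        this.dependsOn = dependsOn;
        this.work = work;
    }
}

/** Minimal dependency-ordered runner (depth-first topological order). */
final class SimpleTaskManager {
    private static final int VISITING = 1;
    private static final int DONE = 2;

    private final Map<String, StartupTask> tasks = new LinkedHashMap<>();

    void register(StartupTask task) {
        tasks.put(task.name, task);
    }

    /** Runs every task after its dependencies and returns the execution order. */
    List<String> runAll() {
        List<String> order = new ArrayList<>();
        Map<String, Integer> state = new LinkedHashMap<>();
        for (String name : tasks.keySet()) {
            visit(name, state, order);
        }
        return order;
    }

    private void visit(String name, Map<String, Integer> state, List<String> order) {
        Integer current = state.get(name);
        if (current != null && current == DONE) {
            return; // already executed as someone else's dependency
        }
        if (current != null && current == VISITING) {
            throw new IllegalStateException("Dependency cycle at " + name);
        }
        StartupTask task = tasks.get(name);
        if (task == null) {
            throw new IllegalArgumentException("Unknown task " + name);
        }
        state.put(name, VISITING);
        for (String dependency : task.dependsOn) {
            visit(dependency, state, order);
        }
        task.work.run();
        state.put(name, DONE);
        order.add(name);
    }
}
```

Registering tasks in any order and calling runAll() executes each task exactly once, after everything it depends on. A production scheduler would additionally dispatch independent tasks to different threads.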

Optimization practice

Merging the splash screen and the home page

In the early days, the iQIYI app had two Activities: the splash screen and the home page. This caused several user experience problems: low-end devices lagged noticeably on the transition from the splash screen to the home page, the home page was not displayed immediately, and users could see the home page data and images loading from scratch.

In most cases the splash stage shows a splash ad. We merged the splash screen and the home page into a single Activity, used the splash ad stage to load the home page underneath the splash page, and separated home page data loading from UI display so that they run in parallel. By making the most of this stage, the home page can be displayed as soon as the splash screen ends, and lag on low-end devices is significantly reduced.

Rendering the home page also affects the splash ad to some extent: the ad countdown display became unstable, and a few ad types did not animate smoothly. To solve this, the loading of the home page is split into many steps so that ad callbacks can still be triggered in time, and the countdown is rendered with a SurfaceView to keep it stable.

Task scheduling optimization

Depending on the startup type, tasks in the startup phase are moved earlier, delayed, or skipped so that users see the target page as early as possible. The following are the adjustments iQIYI made to tasks on the normal startup path of low-end devices.

Resolve lock contention

Native library loading lock contention: library loading is guarded by a lock in the C layer. Even if the Java layer starts several threads to load libraries, the loads are still executed one after another in the C layer, so the Java threads block and wait. iQIYI needs to load libraries in the Application stage, and the main thread must wait for loading to complete before calling the related JNI methods. In addition, because the playback module can be launched externally straight into the play page, playback-related libraries need to be preloaded, which can put the main thread into a waiting state. By identifying the target landing page during startup, we decide whether to preload the playback-related libraries, which avoids this lag for the majority of users, who launch normally to the home page. (Test device: Redmi K40, Android 12.)

Resource lock contention: before the iQIYI home page was displayed, other modules preloaded layout files on worker threads, causing fierce contention for locks in the LayoutInflater / ResourcesManager / AssetManager layers. We rescheduled layout preloading to run after the home page is displayed and restricted it to a single worker thread. After the change, the number of lock conflicts dropped sharply, which lets the home page display faster.
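
The idea of deferring preloading until the home page is shown, and serializing it on one worker thread, can be sketched in plain Java as below. DeferredPreloader and its method names are hypothetical; in the app the jobs would be layout inflations, which are represented here by arbitrary Runnables.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Holds preload jobs back until the home page is shown, then runs them on one thread. */
final class DeferredPreloader {
    // Completed once by markHomeShown(); every queued job is chained on it.
    private final CompletableFuture<Void> homeShown = new CompletableFuture<>();
    // A single worker means the jobs never contend with each other for locks.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Queues a job. It runs only after markHomeShown(), on the single worker thread. */
    CompletableFuture<Void> preload(Runnable job) {
        return homeShown.thenRunAsync(job, worker);
    }

    /** Call when the home page has been displayed. */
    void markHomeShown() {
        homeShown.complete(null);
    }

    /** Lets the worker thread exit once the queued jobs are finished. */
    void shutdown() {
        worker.shutdown();
    }
}
```

Jobs queued before markHomeShown() do not start early, and jobs queued afterwards are submitted to the same worker immediately. The relative order of jobs queued before the signal is not guaranteed by CompletableFuture, so this sketch should not be used where ordering matters.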

Baseline Profiles

Baseline Profiles: Google launched Baseline Profiles in 2022, allowing developers to build a customized baseline profile of hot code into the APK. When the app is installed, the system uses the profile to compile the hot code ahead of time. The code paths included in the profile can then skip interpretation and just-in-time compilation at runtime, which speeds up code execution on first launch.

Startup Profiles: Startup Profiles are a subset of the Baseline Profiles described above. They are used to improve the code layout in the APK's DEX files, further optimizing the included classes and methods. The iQIYI app uses a startup profile to place startup-phase code in the same DEX file. With these two strategies combined, the first-launch speed of the iQIYI app improved by about 10% on some models.

External link launch optimization

External links are another important launch path. Such launches usually come from H5 pages, shares, third-party apps, and so on. Unlike a normal cold start, the destination is often not the home page but a specific target page. For iQIYI the most common case is the play page. If the landing page is identified early (in the Application stage), task priorities can be adjusted for it. When an external link opens the play page, we identify the play page in advance and initialize player-related tasks ahead of time. With this strategy, playback starts about 1.5 seconds sooner for external-link launches on low-end devices.
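
A minimal sketch of landing-page-aware planning is shown below. The LandingPage enum, the LaunchPlanner class, the task names, and the rule for recognizing a play-page link are all hypothetical. The sketch only illustrates that the same startup entry point can produce different task orders: player initialization moves forward for external play-page launches, and playback library preloading is skipped for normal home page launches.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical landing pages resolved early, in the Application stage. */
enum LandingPage { HOME, PLAYER }

final class LaunchPlanner {
    /** Assumed rule for this example: a link containing "player" targets the play page. */
    static LandingPage resolve(String externalLink) {
        boolean toPlayer = externalLink != null && externalLink.contains("player");
        return toPlayer ? LandingPage.PLAYER : LandingPage.HOME;
    }

    /** Returns hypothetical startup task names in the order they should run. */
    static List<String> plan(LandingPage page) {
        List<String> tasks = new ArrayList<>();
        tasks.add("baseInit");
        if (page == LandingPage.PLAYER) {
            // Player work is moved forward; home page work is pushed back.
            tasks.add("loadPlayerNativeLibs");
            tasks.add("initPlayer");
            tasks.add("homeDataPrefetch");
        } else {
            // Normal launch: no playback library preloading on the critical path.
            tasks.add("homeDataPrefetch");
        }
        return tasks;
    }
}
```

For a normal launch (no external link) the plan contains no player tasks, while a play-page link yields a plan in which initPlayer runs before homeDataPrefetch.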

Fluency optimization

Overview

Most pages of the iQIYI app are built on the self-developed Card framework, a highly reusable UI framework. The basic UI layout and business logic are implemented in native code, while basic container styles are controlled by CSS delivered from the backend, which allows whole page blocks to be reused and content styles to be fine-tuned dynamically. It is our solution for reusing entire pages and fine-tuning locally on both platforms (Android and iOS). The fluency optimization of in-app pages is built on this framework. The Card framework has the following features:

High reusability: a content block (Block) is composed of controls, a row is composed of multiple Blocks, a Card is composed of multiple rows, and an entire list page is composed of multiple Cards. The smallest reusable business unit is the Block.
Highly dynamic: CSS files can be configured in the backend to dynamically modify the style of a given UI element (text size, color, rounded corners, and so on).

Optimization practice

Native styles

The dynamic nature and reusability of Card pages make the UI layout complex. One type of Block has to be compatible with multiple styles. For example, each of the four corners of an image must support several kinds of corner badge: image only, text only, image plus text, selectable states, and so on. Supporting all these styles results in a large number of Views and deep nesting, so some pages do not scroll smoothly enough on low-end devices.

To address this, we selected some Cards whose business form is stable, fixed their styles, and greatly streamlined their layouts. This significantly raises the frame rate when scrolling vertically. For example, in the waterfall-flow Card we reduced the number of Views from 40+ to 17 and the layout depth from 6 levels to 2. Across different low-end devices, this increased the scrolling frame rate by about 10% to 20%.

Merged View drawing

The layout simplification above yields a clear improvement, but because business forms are so varied, some necessary Views cannot be removed. To further reduce the number and depth of Views, multiple Views in commonly used Block layouts are merged into one custom View, which uses its Canvas to draw text, images, buttons, and other styled elements. This approach effectively reduces the View count and nesting depth, but the custom View still has to handle the click events and pressed states of each element itself. On low-end devices, this strategy raises the scrolling frame rate by about 1 to 2 fps.

Pre-creation & asynchronous loading

Layout pre-creation: the three Card styles shown in the figure above all use the same Block type (image on top, text below). We preload these common, highly reusable Block layouts into a cache pool during startup, so that pre-created layouts can be used directly while the list scrolls, reducing the inflate time spent during UI drawing.

Asynchronous layout creation: pre-creation works well for common layouts, but uncommon layouts still make up the majority. For these, AsyncLayoutInflater is used to asynchronously create the layouts that are about to appear during scrolling, reducing the time the UI thread spends creating layouts while scrolling. It also improves the efficiency of RecyclerView prefetching.

RecyclerView prefetching: many iQIYI app pages use nested RecyclerViews to build horizontally scrolling product forms. Most Cards of this kind display 3 to 5 items per screen. Setting a different prefetch count for each form (setInitialPrefetchItemCount, default 2) reduces the lag when such a Card is first exposed.

Fewer non-UI tasks on the main thread: we measured main-thread time consumption during scrolling and found that some non-drawing tasks were running on the main thread. Work such as JSON parsing and opening database connections was moved off the UI thread into asynchronous tasks.
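
The layout cache pool used for pre-creation can be sketched generically as follows. LayoutCachePool is a hypothetical name, and the type parameter V stands in for an inflated Android View so that the example runs on a plain JVM. The sketch is not thread-safe; it assumes both methods are called from the same thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Pool of pre-created layout instances, keyed by layout type. */
final class LayoutCachePool<V> {
    private final Map<String, Deque<V>> cache = new HashMap<>();

    /** Creates `count` instances ahead of time, for example during startup idle time. */
    void preCreate(String layoutKey, int count, Supplier<V> inflater) {
        Deque<V> queue = cache.computeIfAbsent(layoutKey, k -> new ArrayDeque<>());
        for (int i = 0; i < count; i++) {
            queue.addLast(inflater.get());
        }
    }

    /** Returns a pre-created instance if one is cached, otherwise inflates on demand. */
    V obtain(String layoutKey, Supplier<V> inflater) {
        Deque<V> queue = cache.get(layoutKey);
        V cached = (queue == null) ? null : queue.pollFirst();
        return cached != null ? cached : inflater.get();
    }
}
```

While the pool has entries for a key, obtain() costs no inflation at all. Once the pool is drained it falls back to the normal inflate path, so callers never need to check whether pre-creation happened.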

Cold start UI message scheduling

During a cold start on a low-end device, resource consumption gradually climbs to a peak: a large number of UI messages (which must run on the UI thread) and other background tasks are waiting to execute. Through message interception and instrumentation, we found that more than 4,000 messages (within 15 seconds) need to run on the UI thread in this stage. On low-end devices, each of these messages takes between 1 ms and 150 ms. While they run, the system's UI rendering messages are delayed, so users on low-end devices experience problems such as scroll lag and slow click response right after the app launches.

To address this, we intercept all messages sent to the UI thread and add them to a custom message queue. We then monitor whether the system UI message queue is idle. When it is, messages are taken from the custom queue and forwarded to the system UI message queue for execution. In addition, a whitelist mechanism lets certain high-priority messages through directly, and a fallback mechanism handles exceptions.
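
The queueing and whitelist logic can be sketched in plain Java as below. UiMessage and DeferredUiQueue are hypothetical names, running a message directly stands in for forwarding it to the system UI message queue, and the idle signal is modeled as an explicit call to drainWhenIdle. On Android, one possible source of that signal is MessageQueue.IdleHandler, but the article does not say which mechanism is used, and the fallback mechanism is not covered here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;

/** Hypothetical UI message: a tag used for whitelisting plus the work to run. */
final class UiMessage {
    final String tag;
    final Runnable action;

    UiMessage(String tag, Runnable action) {
        this.tag = tag;
        this.action = action;
    }
}

/** Custom queue that holds back non-whitelisted UI messages until the UI thread is idle. */
final class DeferredUiQueue {
    private final Deque<UiMessage> deferred = new ArrayDeque<>();
    private final Set<String> whitelist;

    DeferredUiQueue(Set<String> whitelist) {
        this.whitelist = whitelist;
    }

    /** Interception point: whitelisted messages pass straight through, others wait. */
    void post(UiMessage message) {
        if (whitelist.contains(message.tag)) {
            message.action.run(); // stands in for forwarding to the system queue
        } else {
            deferred.addLast(message);
        }
    }

    /** Called when the UI queue is idle. Runs at most `budget` deferred messages, in FIFO order. */
    int drainWhenIdle(int budget) {
        int executed = 0;
        while (executed < budget && !deferred.isEmpty()) {
            deferred.pollFirst().action.run();
            executed++;
        }
        return executed;
    }

    int pendingCount() {
        return deferred.size();
    }
}
```

The per-idle budget is a design choice made for this sketch: draining only a few messages per idle slot keeps each slot short, so rendering messages that arrive in the meantime are not delayed by a long backlog.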

With UI message scheduling, lag on low-end devices during the cold start phase improved significantly. Online big-data monitoring shows a marked reduction in frozen frames and dropped frames, and the frame rate during the cold start phase increased by about 8 fps.

Performance downgrade strategy

On low-end devices, downgrading certain effects can effectively reduce lag in specific scenarios. The iQIYI app has implemented the following downgrade strategies.

Animation downgrade: top and bottom navigation animations are downgraded to static images, playback control animations are turned off, animations for some product features are simplified, and so on.

Playback downgrade: strategies such as delaying playback while scrolling and not starting playback in some scenarios.

ViewPager preloading downgrade: preloading of the left and right tabs is disabled, reducing View drawing time and memory overhead.

Image downgrade: animations on some pages are not played, and images use the RGB_565 pixel format.

Loading speed optimization

Overview

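Besides the startup optimization described above, we also focus on optimizing a number of important pages. Users visit these pages very frequently, so optimizing them noticeably improves the user experience. For example, search is one of the features users use most often, so we broke the search flow down in fine detail and optimized each of its steps.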

Optimization practice

Pre-request

A page's rendering flow usually starts in the Activity's onCreate method, then issues a network request for the necessary data, and renders the page once the data arrives. Our search scenario used to follow this flow as well.

Is it possible to obtain the required data in advance?

In fact, when the user taps the search box on the home page, the parameters needed for the network request are already available, so the request can be issued at the moment of the tap. The network request and the page transition then run concurrently, which shortens the time spent waiting for the request. The weaker the device, the longer it takes to get from the tap to the page's onCreate method, and the more time this optimization saves. In tests on low-end devices, the time was reduced by about 200 ms.
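
The pre-request pattern can be sketched with a CompletableFuture, as below. SearchPreRequest and its method names are hypothetical, the request is represented by a Supplier so that the example runs without a network, and the sketch assumes both methods are called from the same (UI) thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Starts the search request at tap time and hands the in-flight result to the page. */
final class SearchPreRequest {
    private CompletableFuture<String> pending;

    /** Called from the tap handler, before the search page exists. */
    void onSearchBoxClicked(Supplier<String> request) {
        pending = CompletableFuture.supplyAsync(request);
    }

    /**
     * Called from the page's creation callback. Reuses the in-flight request if there
     * is one, otherwise falls back to starting the request now.
     */
    CompletableFuture<String> onPageCreated(Supplier<String> request) {
        CompletableFuture<String> result =
                (pending != null) ? pending : CompletableFuture.supplyAsync(request);
        pending = null; // each pre-request is consumed at most once
        return result;
    }
}
```

The fallback branch matters: if the page is opened through a path that never triggered the pre-request, it still loads through the normal flow, just without the head start.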

Batched delivery

When the user enters a page, only the first screen of data is visible. Data further down the page is not shown until the user scrolls. We therefore give priority to the first-screen data. By reducing the size of the first data delivery, we reduce the time needed for data retrieval, transmission, and parsing. Once the first screen has rendered, we issue a second interface request so that the remaining data is ready as soon as the user scrolls. This approach is used on many key pages such as the home page, the half-screen play page, and search. In the search scenario, for example, tests showed a reduction of about 200 ms.
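
A minimal sketch of the two-step loading is shown below. PageSource and FirstScreenLoader are hypothetical names, and the offset and limit parameters are an assumed paging contract. The actual interface parameters are not described in the article.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical paged data source, for example a wrapper around a list interface. */
interface PageSource<T> {
    List<T> fetch(int offset, int limit);
}

/** Loads a small first-screen batch first, then the follow-up batch after rendering. */
final class FirstScreenLoader<T> {
    private final PageSource<T> source;
    private final int firstScreenCount;
    private final int followUpCount;
    private final List<T> loaded = new ArrayList<>();

    FirstScreenLoader(PageSource<T> source, int firstScreenCount, int followUpCount) {
        this.source = source;
        this.firstScreenCount = firstScreenCount;
        this.followUpCount = followUpCount;
    }

    /** Step 1: a deliberately small request, so fetching and parsing finish quickly. */
    List<T> loadFirstScreen() {
        List<T> first = source.fetch(0, firstScreenCount);
        loaded.addAll(first);
        return first;
    }

    /** Step 2: call once the first screen has rendered, continuing where step 1 stopped. */
    List<T> loadFollowUp() {
        List<T> rest = source.fetch(loaded.size(), followUpCount);
        loaded.addAll(rest);
        return rest;
    }
}
```

The follow-up request uses the number of items already loaded as its offset, so the two batches never overlap even if the first response contains fewer items than requested.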

Pre-creation

Layout pre-creation: while the intermediate search page is idle, we create highly reusable layouts in advance and put them in a cache. When the results page is actually rendered, the pre-created layouts are used directly, which avoids the time spent inflating Views.

Fragment pre-creation: while the intermediate search page is idle, the Fragment container of the results page is created in advance. The container no longer has to be created when the results page is displayed, which saves its creation time.

Main thread optimization

While a page is loading, we want the main-thread tasks related to that page to run first as far as possible. If important tasks are preempted, page rendering suffers. Our internally developed Tracepeed tool takes over the main thread's Looper.loop(), which lets us find time-consuming tasks on the main thread. When long-running, low-priority tasks turn out to run first, the task scheduling can be adjusted so that important tasks take precedence.

For example, image loading is an important task while the search results page loads. On low-end devices, however, we found that in some cases another task preempted the main thread, and image loading sometimes took up to 1 second. For this scenario we adjusted the scheduling so that the image loading task runs first, and its time consumption stabilized at a little over 100 ms.
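
One simple way to express "important tasks first" is a priority queue with a first-in, first-out tie-breaker, sketched below. PrioritizedTask and PriorityTaskQueue are hypothetical names and the numeric priorities are arbitrary. The article does not describe how the real scheduling adjustment is implemented, so this only illustrates the ordering idea.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical main-thread task with a priority and an arrival sequence number. */
final class PrioritizedTask {
    final String name;
    final int priority; // larger value = more important
    final long sequence;

    PrioritizedTask(String name, int priority, long sequence) {
        this.name = name;
        this.priority = priority;
        this.sequence = sequence;
    }
}

/** Orders tasks by priority (highest first), then by arrival order. */
final class PriorityTaskQueue {
    private final PriorityQueue<PrioritizedTask> queue = new PriorityQueue<>(
            Comparator.comparingInt((PrioritizedTask t) -> t.priority)
                      .reversed()
                      .thenComparingLong(t -> t.sequence));
    private long nextSequence = 0;

    void submit(String name, int priority) {
        queue.add(new PrioritizedTask(name, priority, nextSequence++));
    }

    /** Returns the task names in the order they would be executed. */
    List<String> drain() {
        List<String> order = new ArrayList<>();
        PrioritizedTask next;
        while ((next = queue.poll()) != null) {
            order.add(next.name);
        }
        return order;
    }
}
```

With this ordering, an image loading task submitted after several low-priority tasks still runs before them, while tasks of equal priority keep their submission order.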

Business logic optimization

We also analyzed the specific logic of individual business scenarios and optimized it:

Empty image optimization: when the selection is loaded, the backend in some scenarios delivers an empty image. This empty image was also being loaded, which increased page loading time, so we now restrict the loading of empty images.
High-frequency logic optimization: frequently called methods deserve particular attention. Optimizing commonly used high-frequency methods reduces the time consumption of every page. For example, we avoid executing useless methods during the initialization of basic controls, which lowers the cost of these high-frequency methods.
Asynchronous execution of time-consuming methods: page loading also includes some low-priority business logic. Once identified, this logic can be executed through the asynchronous framework, reducing page loading time.

Regression prevention

After continuous optimization, page loading time reached a stable level. During ongoing development iterations, however, we noticed some increases in time consumption. How can this kind of regression be prevented effectively?

We instrument key methods in the code and run pipeline tasks on a fixed daily schedule, averaging the results over multiple runs. Fluctuations in page loading time are then detected visually. If a regression appears, comparing two tasks shows directly where their time consumption differs, and that difference is then analyzed and optimized.
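
The averaging and comparison step can be sketched as follows. LoadTimeCheck is a hypothetical name, and the relative tolerance is an assumption: the article describes visual detection of fluctuations and does not mention a numeric threshold.

```java
/** Compares the mean page loading time of two pipeline runs. */
final class LoadTimeCheck {
    /** Arithmetic mean of the samples, in milliseconds. */
    static double mean(long[] samplesMs) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long sum = 0;
        for (long sample : samplesMs) {
            sum += sample;
        }
        return (double) sum / samplesMs.length;
    }

    /**
     * Returns true if the current mean exceeds the baseline mean by more than
     * `tolerance` (for example 0.10 for 10%). The tolerance is an assumed parameter.
     */
    static boolean hasRegressed(long[] baselineMs, long[] currentMs, double tolerance) {
        return mean(currentMs) > mean(baselineMs) * (1.0 + tolerance);
    }
}
```

Averaging over several runs before comparing reduces the chance that a single noisy measurement is reported as a regression.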

Summary and Outlook

Optimizing for low-end devices covers many aspects. The methods introduced above target the most important performance problems in several core business scenarios and have effectively improved how the app runs on low-end devices. Tool-based analysis, online monitoring, and measurement standards are not covered here, but they are also important means of performance optimization. Android is heavily fragmented, and there is still a long way to go in optimizing for low-end phones. In the future we will keep refining our work and looking for new optimization breakthroughs, using technological innovation to give users a stable and smooth experience and to promote high-quality growth.

This article is shared from the WeChat public account - iQIYI Technology Product Team (iQIYI-TP).