iOS Performance Optimization: Optimizing App Launch Speed

Author: Damonwong, iOS developer

Source: Old Driver Technical Weekly (ID: LSJCoding)

Sessions: https://developer.apple.com/videos/play/wwdc2019/423/

Apple pays special attention to user experience, and over the past few years it has kept optimizing app launch time. Notably, last year's WWDC 2019 keynote [1] mentioned that over the preceding year Apple's engineering teams had made app launch up to twice as fast, the 200% improvement referred to throughout this article.

Even with that 200% improvement, some questions remain unanswered:

  • Where does such a large improvement come from?

  • As developers, what else can we do to optimize launch speed?

So today we will use WWDC2019 - 423 - Optimizing App Launch [2] to talk about everything related to launch.

Glossary

First, some terms related to launch.

Mach-O

Mach-O is the collective name for the file formats that executable code takes at different stages on iOS. There are three main types:

  • Executable: the executable file, the app's main binary

  • Dylib: a dynamic library, called a DSO or DLL on other platforms

  • Bundle: a type unique to Apple platforms; a dylib that cannot be linked against and can only be loaded at runtime via dlopen()

A Mach-O file has three basic parts:

  • Header: basic information about the Mach-O file, such as the CPU architecture, the file type, and the number of load commands.

  • Load Commands: the load command area that follows the Header. It describes how the file is organized and how it is laid out in virtual memory, and tells the loader how to set up and load the binary data.

  • Data: the data of each segment referenced by the Load Commands.
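
To make the header fields concrete, here is a minimal Swift sketch that decodes the fixed-size header at the start of a thin 64-bit Mach-O file. The field order follows struct mach_header_64 in <mach-o/loader.h>; the type and function names (MachOHeader, parseMachOHeader, readUInt32LE) are illustrative, and fat (multi-architecture) files and big-endian data are deliberately not handled.

```swift
import Foundation

/// Fields of a 64-bit Mach-O header, in the order used by `struct mach_header_64`.
struct MachOHeader: Equatable {
    let magic: UInt32             // identifies the file as 64-bit Mach-O
    let cpuType: Int32            // CPU architecture, e.g. arm64
    let cpuSubtype: Int32
    let fileType: UInt32          // executable, dylib, bundle, ...
    let loadCommandCount: UInt32  // number of load commands that follow the header
    let loadCommandsSize: UInt32  // total size of those load commands in bytes
    let flags: UInt32
}

/// File-type values for the three image kinds described above.
enum MachOFileType: UInt32 {
    case executable = 0x2  // MH_EXECUTE
    case dylib = 0x6       // MH_DYLIB
    case bundle = 0x8      // MH_BUNDLE
}

let machOMagic64: UInt32 = 0xfeed_facf  // MH_MAGIC_64
let machOHeader64Size = 32              // 8 fields of 4 bytes (the last one is reserved)

/// Reads a little-endian UInt32 at `offset`, or returns nil when out of bounds.
func readUInt32LE(_ bytes: [UInt8], at offset: Int) -> UInt32? {
    guard offset >= 0, offset + 4 <= bytes.count else { return nil }
    return UInt32(bytes[offset])
        | (UInt32(bytes[offset + 1]) << 8)
        | (UInt32(bytes[offset + 2]) << 16)
        | (UInt32(bytes[offset + 3]) << 24)
}

/// Decodes the header, or returns nil if the bytes are not a thin 64-bit Mach-O.
func parseMachOHeader(_ bytes: [UInt8]) -> MachOHeader? {
    guard bytes.count >= machOHeader64Size else { return nil }
    var fields: [UInt32] = []
    for index in 0..<7 {
        guard let value = readUInt32LE(bytes, at: index * 4) else { return nil }
        fields.append(value)
    }
    guard fields[0] == machOMagic64 else { return nil }
    return MachOHeader(
        magic: fields[0],
        cpuType: Int32(bitPattern: fields[1]),
        cpuSubtype: Int32(bitPattern: fields[2]),
        fileType: fields[3],
        loadCommandCount: fields[4],
        loadCommandsSize: fields[5],
        flags: fields[6]
    )
}
```

Passing the first 32 bytes of a binary to parseMachOHeader and comparing fileType with MachOFileType tells you whether the file is an Executable, a Dylib, or a Bundle.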

Most Mach-O files contain the following three segments:

  • __TEXT: the code segment, containing the Mach-O header, code, and constants. Read-only; it cannot be modified.

  • __DATA: the data segment, containing global variables, static variables, and so on. Readable and writable.

  • __LINKEDIT: metadata describing how to load the program, including method and variable metadata (location, offset) and code signature information. Read-only; it cannot be modified.

Image

Refers to one of Executable, Dylib or Bundle.

Framework

Many things are called a Framework, but in this article a Framework is a dylib together with a special directory structure around it that holds the files the dylib needs.

Virtual Memory

Virtual memory is an intermediate layer built between physical memory and processes. It is a continuous logical address space, and the logical address may not have a corresponding actual physical memory address, or multiple logical addresses can correspond to one physical memory address.

Page Fault

When a process accesses a logical address that has no corresponding physical address, a Page Fault occurs.

Lazy Reading

If a page you want to read is not in memory, a Page Fault is triggered. For a file that has been mapped into the address space with mmap(), the system then reads the required page from disk. This on-demand process is called Lazy Reading.
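
As a small illustration of lazy reading, the sketch below maps a file with Foundation's Data(contentsOf:options:) and the .alwaysMapped option and then touches a single byte. With a mapped file, the operating system typically brings a page into memory only when it is first accessed. The function name byte(inMappedFileAt:offset:) is illustrative, not an existing API.

```swift
import Foundation

/// Maps the file at `url` into memory and returns the byte at `offset`,
/// or nil if the offset lies outside the file.
/// Creating the mapping does not read the whole file; the page that holds
/// `offset` is normally faulted in by the first access to it.
func byte(inMappedFileAt url: URL, offset: Int) throws -> UInt8? {
    let mapped = try Data(contentsOf: url, options: .alwaysMapped)
    guard offset >= 0, offset < mapped.count else { return nil }
    return mapped[offset]
}
```

Reading one byte from a large mapped file this way costs roughly one page of I/O rather than a read of the entire file, which is the same property dyld relies on when it maps images.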

COW(Copy-On-Write)

When a process needs to modify the content of a page, the kernel first copies the page, applies the modification to the copy, and remaps the logical address to the new physical memory. This process is called Copy-On-Write.

Dirty Page & Clean Page

After an image is loaded, a page whose content has been modified is called a Dirty Page; it contains process-specific information. The opposite is a Clean Page, which can be regenerated from disk.

Shared RAM

When multiple Mach-O images depend on the same Dylib (e.g. UIKit), the system makes the logical addresses through which those images reach the Dylib all point to the same region of physical memory, so the memory is shared. Dirty Pages are specific to a process and cannot be shared.

Address space layout randomization (ASLR)

When an Image is loaded into the logical address space, the system uses ASLR to randomize its starting address, which prevents attackers from locating a function by adding a known offset to the starting address.

When the system uses ASLR to choose a random address, the entire range from 0 up to that address is marked inaccessible, meaning it cannot be read, written, or executed. This region is the __PAGEZERO segment; it is at least 4 KB on 32-bit systems and at least 4 GB on 64-bit systems.

Code Sign

iOS code signing lets the system verify that an Image is safe to load. When an Image is signed, a separate cryptographic hash is generated for the content of each page and stored in __LINKEDIT. When a page is loaded, the system checks its hash to make sure the content has not been tampered with.
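
The per-page check can be sketched as follows. This is only a model of the idea: real code signatures use cryptographic hashes, while this sketch uses FNV-1a purely as a dependency-free stand-in, and the names fnv1a, pageHashes, and isPageIntact as well as the 4096-byte page size are assumptions made for illustration.

```swift
/// FNV-1a 64-bit hash. A stand-in only: it is NOT cryptographically secure.
func fnv1a(_ bytes: ArraySlice<UInt8>) -> UInt64 {
    var hash: UInt64 = 0xcbf2_9ce4_8422_2325
    for byte in bytes {
        hash ^= UInt64(byte)
        hash = hash &* 0x0000_0100_0000_01b3
    }
    return hash
}

/// "Signing": computes one hash per page of the image.
func pageHashes(of image: [UInt8], pageSize: Int = 4096) -> [UInt64] {
    return stride(from: 0, to: image.count, by: pageSize).map { start in
        fnv1a(image[start..<min(start + pageSize, image.count)])
    }
}

/// "Loading": re-hashes a single page and compares it with the stored hash.
func isPageIntact(_ pageIndex: Int, in image: [UInt8],
                  signedHashes: [UInt64], pageSize: Int = 4096) -> Bool {
    guard pageIndex >= 0, pageIndex < signedHashes.count else { return false }
    let start = pageIndex * pageSize
    guard start < image.count else { return false }
    let end = min(start + pageSize, image.count)
    return fnv1a(image[start..<end]) == signedHashes[pageIndex]
}
```

Because each page has its own hash, a page can be verified at the moment it is loaded without hashing the rest of the image.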

dyld(dynamic loader)

dyld is the binary loader on iOS that loads Images. Many people think dyld is only responsible for loading the dynamic libraries an application depends on, but that is not the whole picture. The steps dyld performs are described below; for reference, see dyld startup process [3].

Load dylibs

Before loading a Mach-O image, dyld parses its Header and Load Commands to find the dylibs the image depends on, then does the same for each of those, recursively loading all required dylibs.

Generally speaking, an app depends on roughly 100 to 400 dylibs. Most of them are system dylibs, which load relatively quickly because they are cached and shared.
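
To check the dylib count for your own app, dyld exposes the list of loaded images through _dyld_image_count() and _dyld_get_image_name() in the MachO module. The sketch below wraps them; the function names loadedImagePaths and isSystemImage are illustrative, and the path prefixes used to recognize system images are a heuristic assumption rather than an official rule.

```swift
#if canImport(MachO)
import MachO
#endif

/// Returns the paths of all images dyld has loaded into the current process.
/// On platforms without dyld it returns an empty list.
func loadedImagePaths() -> [String] {
    #if canImport(MachO)
    return (0..<_dyld_image_count()).compactMap { index -> String? in
        guard let name = _dyld_get_image_name(index) else { return nil }
        return String(cString: name)
    }
    #else
    return []
    #endif
}

/// Heuristic (assumption): treats images under /System/ or /usr/lib/ as system dylibs.
func isSystemImage(_ path: String) -> Bool {
    return path.hasPrefix("/System/") || path.hasPrefix("/usr/lib/")
}
```

For example, loadedImagePaths().filter(isSystemImage).count compared with loadedImagePaths().count gives a rough idea of how many of the loaded images come from the system.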

Fix-ups

Because of ASLR and code signing, newly loaded dylibs are still independent of one another. Binding them together requires a process called Fix-ups. There are two main kinds of Fix-ups: Rebase and Bind.

PIC(Position Independent Code)

Because of code signing, dyld cannot modify instructions directly. To make fix-ups possible at runtime, code generation uses dynamic PIC (Position Independent Code), so that code that must not be modified reaches its targets through a level of indirection. To call a function, the code first creates a pointer to that function in the __DATA segment and then makes the call indirectly through that pointer.

Rebase

Rebasing corrects an image's internal pointers for the fact that, because of ASLR, a Mach-O image is loaded at a random base address. An offset, the slide, is added to each internal pointer. It is calculated as follows:

  Slide = actual_address - preferred_address

The locations of all pointers that need rebasing are encoded in __LINKEDIT. dyld walks the pointers in __DATA that need rebasing and adds the slide to each one. Page Faults and COW may occur repeatedly during this process, which costs I/O performance. However, because rebasing works through addresses in order, the kernel can read data ahead of time to reduce the I/O cost.
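
The arithmetic can be worked through in a few lines of Swift. This is a toy model: the __DATA segment is represented as an array of 64-bit pointer slots, and the names computeSlide, rebase, and rebaseSlots are illustrative rather than anything dyld exposes.

```swift
/// slide = actual_address - preferred_address
func computeSlide(actualAddress: UInt64, preferredAddress: UInt64) -> UInt64 {
    return actualAddress &- preferredAddress
}

/// Adds the slide to every pointer slot that is marked as needing a rebase.
/// Slots that are not listed (plain data, for example) are left untouched.
func rebase(dataSegment: inout [UInt64], rebaseSlots: [Int], slide: UInt64) {
    for slot in rebaseSlots {
        dataSegment[slot] = dataSegment[slot] &+ slide
    }
}
```

For instance, if an image prefers 0x1_0000_0000 but is loaded at 0x1_0450_0000, the slide is 0x450_0000, and an internal pointer stored as 0x1_0000_4000 becomes 0x1_0450_4000 after rebasing.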

Binding

Binding resolves references to external symbols. For example, when we use UITableView, our code refers to the symbol _OBJC_CLASS_$_UITableView, but that symbol is not in our Mach-O image; it has to come from UIKit.framework, so Binding connects the reference to its definition.

At runtime, dyld has to find the implementation that corresponds to each symbol name. This takes a lot of computation, including lookups in the symbol table. Once found, the value is stored in the corresponding pointer in __DATA. Although Binding is more computation-heavy than Rebasing, it requires very little I/O, because Rebasing has already touched the relevant pages.
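
A simplified model of binding, under the assumption that exported symbols are available as a dictionary from name to address and that the pointers in __DATA are an array of slots. BindEntry, BindError, and bind(entries:exportedSymbols:pointerSlots:) are hypothetical names for illustration; the address used in the usage note is made up.

```swift
/// One binding request: "store the address of `symbolName` in pointer slot `slot`".
struct BindEntry {
    let symbolName: String
    let slot: Int
}

enum BindError: Error, Equatable {
    case symbolNotFound(String)
}

/// Looks up each symbol by name and writes its address into the matching
/// pointer slot, so that code can later reach the symbol indirectly.
func bind(entries: [BindEntry],
          exportedSymbols: [String: UInt64],
          pointerSlots: inout [UInt64]) throws {
    for entry in entries {
        guard let address = exportedSymbols[entry.symbolName] else {
            throw BindError.symbolNotFound(entry.symbolName)
        }
        pointerSlots[entry.slot] = address
    }
}
```

Binding _OBJC_CLASS_$_UITableView with an export table such as ["_OBJC_CLASS_$_UITableView": 0x1_8a00_1230] fills the chosen slot with that address; a missing symbol throws symbolNotFound, which in a real launch would abort the process. The lookup by name is the part that is expensive in computation but cheap in I/O.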

dyld2 & dyld3

Before iOS 13, all third-party apps used dyld 2 to start the app. The main process is as follows:

  • Parse Mach-O's Header and Load Commands, find its dependent libraries, and recursively find all dependent libraries

  • Load Mach-O file

  • Symbol search

  • Binding and rebasing

  • Run the initialization program

All of the above happens while the app is launching and involves a large amount of computation and I/O. To speed up launch, Apple's development team officially introduced dyld3 in WWDC2017 - 413 - App Startup Time: Past, Present, and Future [4].

dyld3 is divided into three components:

  • An out-of-process Mach-O parser

    • Preprocesses all search paths, @rpaths, and environment variables that can affect launch

    • Parses the Mach-O Header and dependencies, and performs all symbol lookups

    • Writes the results into a launch closure

    • It is an ordinary daemon process, so it can be tested with the usual testing infrastructure

  • An in-process engine that runs the launch closure

    • This part runs inside the app's process

    • Validates the launch closure, maps in the dylibs, and then jumps to the main function

    • It no longer needs to parse Mach-O Headers and dependencies or perform symbol lookups

  • A launch closure caching service

    • Launch closures for system apps are built into the shared cache, so not even a separate file needs to be opened

    • For third-party apps, the launch closure is built when the app is installed or updated

    • On iOS, tvOS, and watchOS, all of this is done before the app launches. On macOS, because apps can be sideloaded, the in-process engine can call out to the daemon on first launch if necessary; after that, the launch closure is used.

dyld 3 moves much of the expensive searching, computation, and I/O ahead of launch, which greatly improves launch speed.
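
The difference between doing this work on every launch and doing it once can be modelled in Swift. This is a conceptual sketch only: a real closure records far more than a load order, and LaunchClosure, resolveLoadOrder, and LaunchClosureCache are hypothetical names, with the dependency graph supplied as a plain dictionary.

```swift
/// Precomputed launch information (here reduced to the order in which to load images).
struct LaunchClosure: Equatable {
    let loadOrder: [String]
}

/// Recursively walks the dependency graph, listing each image after its dependencies.
/// This is the kind of work dyld 2 repeats on every launch.
func resolveLoadOrder(of image: String, dependencies: [String: [String]]) -> [String] {
    var visited = Set<String>()
    var order: [String] = []
    func visit(_ name: String) {
        guard visited.insert(name).inserted else { return }
        for dependency in dependencies[name] ?? [] {
            visit(dependency)
        }
        order.append(name)
    }
    visit(image)
    return order
}

/// Builds the closure once (for example at install time) and reuses it afterwards.
final class LaunchClosureCache {
    private var closures: [String: LaunchClosure] = [:]
    private(set) var buildCount = 0

    func closure(for app: String, dependencies: [String: [String]]) -> LaunchClosure {
        if let cached = closures[app] {
            return cached
        }
        buildCount += 1
        let built = LaunchClosure(loadOrder: resolveLoadOrder(of: app, dependencies: dependencies))
        closures[app] = built
        return built
    }
}
```

Asking the cache for the same app twice runs the dependency walk only once; the second call returns the stored result, which is the saving the in-process engine benefits from.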

App launch

With the terminology out of the way, let's get to the main topic.

Why is app launch so important?

  • Launch is the user's first interaction with the app, so keep it as short as possible and give the user a good first impression

  • Launch is representative of the overall performance of your code. If launch performance is poor, other parts of the app probably do not perform well either.

  • Launch consumes CPU and memory, which affects system performance and battery life

So launch time is well worth optimizing.

Startup type

App launches fall into three categories:

  • Cold Launch, which meets the following conditions:

    • The device has just rebooted

    • The app is not in memory

    • No process exists for the app

  • Warm Launch, which meets the following conditions:

    • The app was recently terminated

    • The app is still partially in memory

    • No process exists for the app

  • Resume, which means bringing a suspended app back and meets the following conditions:

    • The app is suspended

    • The app is entirely in memory

    • A process exists for the app

App startup phase

App launch is divided into three phases:

  • Initialize the app

  • Prepare and draw the first frame (this is not the first frame after data has arrived; it can be a placeholder view). At this point the user can already interact with the app, for example by switching tabs in the tab bar

  • Once all the data for the page has been obtained, draw the complete first frame of the page

Here Apple once again stressed its recommendation: the time from the user tapping the app icon to the app being interactive, that is, the end of the second phase, should be within 400 ms. So far, most apps have not reached this goal.

Next, we break these three phases down into six parts and look at what each one does and what can be optimized.

System Interface

To initialize the app, the system mainly does two things: Load dylibs and libSystem init.

In 2017, Apple described the optimizations dyld3 brought to system apps. With iOS 13, dyld3 was opened up to third-party apps, which means the system caches your app's runtime dependencies for warm launches in order to reduce launch time. This is one of the reasons for the 200% improvement.

The video only says that warm launch time is improved. In theory, dyld3 should be able to improve cold launch time on iOS as well. I do not know whether this has to do with the multitasking features added for iPad or with functionality that has not been fully enabled yet; why the speaker mentioned only warm launches is still unclear.

In addition, developers can make the following optimizations to the Load dylibs stage:

  • Avoid linking frameworks you do not use. In Xcode, check the "Linked Frameworks and Libraries" section of the project for unnecessary entries.

  • Avoid loading dynamic libraries during launch, and build the project's Pods as static libraries. This matters especially for Swift projects, where the time cost is very large.

  • Hard-link all your dependencies so that they can benefit from the system's caching optimizations.

Some may wonder whether static linking is still needed now that dyld3 is in use. It is. If you are interested, read Static linking vs dyld3 [5], which includes a detailed comparison with data.

The libSystem init part mainly initializes low-level system components. Its cost is fixed, so developers do not need to worry about it.

Static Runtime Initialization

This stage initializes the Objective-C and Swift runtimes. It calls all +load methods and registers class information with the runtime.

In principle, developers are advised not to do any work at this stage. To avoid losing launch time here, you can do the following:

  • When developing a framework, expose a dedicated initialization API

  • Reduce the work done in +load

  • Use +initialize for lazy initialization
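
In Objective-C, +load runs for every class during launch, while +initialize runs lazily, just before the class receives its first message. Swift offers similar laziness through static let (initialized on first access, once, in a thread-safe way) and lazy stored properties. The sketch below shows the pattern; FeatureRegistry, routes, and expensiveSetupCount are hypothetical names used only to make the deferred work observable.

```swift
/// A registry whose expensive setup is deferred until it is first needed.
final class FeatureRegistry {
    /// `static let` is initialized on first access, not at launch.
    static let shared = FeatureRegistry()

    /// Counts how often the expensive setup has run (for demonstration).
    private(set) var expensiveSetupCount = 0

    /// Built on first access instead of in a +load-style launch hook.
    lazy var routes: [String: String] = {
        self.expensiveSetupCount += 1
        return [
            "home": "HomeViewController",
            "settings": "SettingsViewController"
        ]
    }()
}
```

Creating the registry costs almost nothing; the route table is only built when routes is first read, and later reads reuse the stored value.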

UIKit Initialization

Two things mainly happen at this stage:

  • UIApplication and UIApplicationDelegate are instantiated

  • Event processing and system integration begin

So optimizing this stage is relatively simple; there are two things to do:

  • Minimize the work done when initializing a UIApplication subclass, or better, do not subclass UIApplication at all

  • Reduce the initialization work in UIApplicationDelegate

Application Initialization

This stage is mainly the callback of the life cycle method, which is the part that developers are most familiar with.

Call the App life cycle method of UIApplicationDelegate:

  application:willFinishLaunchingWithOptions: 
  application:didFinishLaunchingWithOptions:

And the UI life cycle method of UIApplicationDelegate:

  applicationDidBecomeActive:

At the same time, iOS 13 adds a new callback for UISceneDelegate:

  scene:willConnectToSession:options:
  sceneWillEnterForeground:
  sceneDidBecomeActive:

These are also called at this stage. If you are interested, look out for the session Getting the Most out of Multitasking. No video is available for the time being; I suspect the live demo went wrong, so the video was not released.

At this stage, developers can make the following optimizations:

  • Postpone work that is not related to launch

  • Share resources between scenes

First Frame Render

This stage mainly does the work of creating, laying out and drawing the view, and submits the prepared first frame to the rendering layer for rendering. The following functions are frequently called:

 loadView
 viewDidLoad 
 layoutSubviews

At this stage, developers can make the following optimizations:

  • Flatten the view hierarchy, and lazily load views that are not needed right away

  • Optimize layout and reduce the number of constraints

For more details, see WWDC2018 - 220 - High Performance Auto Layout [6].

Extend

Most apps will obtain data asynchronously and finally present it to the user. We call this part Extend.

Because every app behaves differently in this part, Apple recommends that developers measure it with os_signpost and then analyze and optimize step by step.
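
A minimal sketch of wrapping a piece of work in a signpost interval, using OSLog and os_signpost from the os module (available on Apple platforms from iOS 12). The subsystem string com.example.app and the helper name signposted are placeholders chosen for this example; on platforms without the os module the helper simply runs the work.

```swift
import Foundation
#if canImport(os)
import os

/// Signposts logged to the .pointsOfInterest category show up in the
/// Points of Interest track in Instruments.
let launchLog = OSLog(subsystem: "com.example.app", category: .pointsOfInterest)
#endif

/// Runs `work` between a begin and an end signpost with the given name.
func signposted<T>(_ name: StaticString, _ work: () throws -> T) rethrows -> T {
    #if canImport(os)
    os_signpost(.begin, log: launchLog, name: name)
    defer { os_signpost(.end, log: launchLog, name: name) }
    #endif
    return try work()
}
```

A call such as signposted("Load feed") { loadFeed() }, where loadFeed stands for your own function, then appears as a named interval on the Instruments timeline, so the Extend part can be measured per app.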

Measuring App startup time

To find problems in the launch process, you need to measure repeatedly and compare before and after. If the variables are not well controlled, however, the measurements will be misleading.

So, to make sure the measured data truly reflects the problem, we must reduce sources of variance, measure in a controlled and repeatable environment, and base the analysis on consistent results.

Consistent conditions

To keep the environment consistent, we can do the following:

  • Reboot the phone and wait 2-3 minutes

  • Enable airplane mode or use a simulated network

  • Use an unchanging iCloud account, or none at all

  • Build in release mode

  • Measure warm launches

Switching iCloud accounts affects performance, so do not switch accounts or turn iCloud on while measuring.

Points to note when measuring

  • Test with representative data whenever possible

    Without representative data, the results will be skewed

  • Test on a range of devices, both old and new

  • Finally, you can also use XCTest: run the test several times and take the average

For details on measuring launch time with XCTest, see WWDC2019 - 417 - Improving Battery Life and Performance [7]. When I tried it, however, some of the APIs did not appear to be available yet, so it could not be used for the time being.
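
When averaging several runs by hand, it also helps to look at the spread, since a large spread suggests the test conditions were not consistent. A small sketch, with LaunchStats and summarize(launchTimesMs:) as illustrative names and the sample values in the usage note invented for the example:

```swift
import Foundation

/// Mean and sample standard deviation of a set of launch measurements.
struct LaunchStats: Equatable {
    let mean: Double
    let standardDeviation: Double
}

/// Summarizes launch times given in milliseconds.
/// Returns nil for fewer than two samples, where a spread is not meaningful.
func summarize(launchTimesMs samples: [Double]) -> LaunchStats? {
    guard samples.count > 1 else { return nil }
    let mean = samples.reduce(0, +) / Double(samples.count)
    let squaredDeviations = samples.map { ($0 - mean) * ($0 - mean) }
    let variance = squaredDeviations.reduce(0, +) / Double(samples.count - 1)
    return LaunchStats(mean: mean, standardDeviation: variance.squareRoot())
}
```

For five runs of 400, 420, 380, 410, and 390 ms, the mean is 400 ms and the standard deviation is about 15.8 ms.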

Use Instruments to analyze and optimize the app startup process

Optimization method

Apple has given us three suggestions for optimization methods. The overall idea is similar to the optimization of the various stages mentioned above.

Minimize Work

  • Postpone work not related to the first frame

  • Remove blocking work from the main thread

  • Reduce memory usage

Prioritize Work

  • Define the priority of the task.

  • Make good use of GCD to optimize your startup speed.

  • Keep important things first

For a more in-depth look at GCD, see WWDC2017 - 706 - Modernizing Grand Central Dispatch Usage [8].
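
One possible way to express these priorities with GCD is to run only first-frame work synchronously and hand everything else to global queues with an appropriate quality of service. The sketch below is an assumption about how an app might organize this, not a prescribed pattern: LaunchPriority, LaunchTask, and runLaunchTasks are hypothetical names, and the mapping of categories to .userInitiated and .utility is a judgment call.

```swift
import Foundation

/// How urgently a piece of launch work is needed (illustrative categories).
enum LaunchPriority {
    case beforeFirstFrame  // must finish before the first frame is drawn
    case soonAfterLaunch   // the user is waiting for it, but it need not block drawing
    case whenIdle          // nobody is waiting for it
}

struct LaunchTask {
    let name: String
    let priority: LaunchPriority
    let work: () -> Void
}

/// Runs first-frame work immediately on the calling thread and dispatches
/// the rest asynchronously. `group` lets the caller observe completion.
func runLaunchTasks(_ tasks: [LaunchTask], group: DispatchGroup) {
    for task in tasks {
        switch task.priority {
        case .beforeFirstFrame:
            task.work()
        case .soonAfterLaunch:
            DispatchQueue.global(qos: .userInitiated).async(group: group) { task.work() }
        case .whenIdle:
            DispatchQueue.global(qos: .utility).async(group: group) { task.work() }
        }
    }
}
```

Called from application:didFinishLaunchingWithOptions:, this keeps the main thread limited to what the first frame needs, while the system schedules the remaining work according to its quality of service.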

Optimize Work

  • Simplify existing tasks, such as requesting only necessary data.

  • Optimize algorithm and data structure

  • Cache resources and calculations

Use Instruments to analyze the app startup process

When we know how to optimize, we need to analyze our startup process. Xcode 11's Instruments has added an App launch template for this, allowing developers to better analyze the startup speed of their App.

After running, you can see the specific time of each stage, optimize according to the data, and see the time-consuming function calls.

System Optimization

Apple made many optimizations last year; the ones highlighted here are those related to launch speed.

Perhaps because of time constraints, the session explains very little of this part, and it is hard to tell what the 200% actually consists of.

However, in The Talk Show Live From WWDC 2019, With Craig Federighi and Greg Joswiak [9], Craig Federighi explained where the 200% improvement comes from:

Isn’t that crazy that was quite a discovery for us. No it turns out that over times as in terms of the way the apps were encrypted and the way fair play worked and so forth. The encryption became part of the critical path actually of launching the apps. I mean the processors are capable or up and through the thing that actually it was a problem. And then there are other optimizations that based on what was visible to system at certain things. And so it actually cut out optimization opportunities and so when we really identified that opportunity we said okay. We can actually come up with better format that’s gonna eliminate that being on the critical path, It’s going to enable all these pre-binding things. And then we did a whole bunch of other work to optimize the objective-c runtime to optimize the linker the dynamic linker a bunch of other things and you put it all together. And yeah that I mean a cold launch this is we’ve never had a win like this to launch time in a single release.

Judging from this passage, in addition to the contribution of dyld3, taking app encryption off the critical path of launch is also one of the optimizations.

Monitor app launch for users in production

Xcode 11 adds a new metrics panel to the Xcode Organizer, where you can view data from your users along several dimensions, including average launch time.

After analyzing your launch process with Instruments and making your optimizations, you can use the Xcode Organizer to see how effective they were.

You can also collect customized data through MetricKit [10], newly released last year; see WWDC2019 - 417 - Improving Battery Life and Performance [11].

[1] WWDC 2019 keynote: https://developer.apple.com/videos/play/wwdc2019/101/

[2] WWDC2019 - 423 - Optimizing App Launch: https://developer.apple.com/videos/play/wwdc2019/423/

[3] dyld startup process: https://leylfl.github.io/2018/05/28/dyld startup process/

[4] WWDC2017 - 413 - App Startup Time: Past, Present, and Future: https://developer.apple.com/videos/play/wwdc2017/413/

[5] Static linking vs dyld3: https://allegro.tech/2018/05/Static-linking-vs-dyld3.html

[6] WWDC2018 - 220 - High Performance Auto Layout: https://developer.apple.com/videos/play/wwdc2018/220/

[7] WWDC2019 - 417 - Improving Battery Life and Performance: https://developer.apple.com/videos/play/wwdc2019/417/

[8] WWDC2017 - 706 - Modernizing Grand Central Dispatch Usage: https://developer.apple.com/videos/play/wwdc2017/706/

[9] The Talk Show Live From WWDC 2019, With Craig Federighi and Greg Joswiak: https://daringfireball.net/2019/06/the_talk_show_live_from_wwdc_2019

[10] MetricKit: https://developer.apple.com/documentation/metrickit

[11] WWDC2019 - 417 - Improving Battery Life and Performance: https://developer.apple.com/videos/play/wwdc2019/417/


Origin blog.csdn.net/Px01Ih8/article/details/109252032