Author: Damonwong, iOS developer
Source: Old Driver Technical Weekly (ID: LSJCoding)
Session: https://developer.apple.com/videos/play/wwdc2019/423/
Apple pays special attention to user experience and has been optimizing app launch time for the past few years. In particular, last year's WWDC 2019 keynote [1] mentioned that over the past year the Apple team had improved app launch speed by 200%.
Despite this 200% improvement, some questions remain unanswered, such as:
Where did such a large time saving come from?
As developers, what else can we do to improve launch speed?
So today, drawing on WWDC2019-423-Optimizing App Launch [2], let's talk about app launch.
Glossary
First, let's introduce some terms related to launch.
Mach-O
Mach-O is the umbrella name for the executable file formats iOS uses at different stages of running. They fall mainly into three categories:
Executable - the executable file, the app's main binary
Dylib - a dynamic library, called a DSO or DLL on other platforms
Bundle - a type unique to Apple platforms: a dylib that cannot be linked against and can only be loaded at runtime via dlopen()
The basic structure of Mach-O, as shown in the figure below, has three parts:
Header contains basic information about the Mach-O file, such as the CPU architecture, the file type, and the number of load commands.
Load Commands is the region following the Header. It describes how the file is organized and how it is laid out in virtual memory, telling the loader how to set up and load the binary data.
Data contains the data of each segment referenced by the Load Commands.
Most Mach-O files include the following three segments:
__TEXT - the code segment, containing the Mach-O header, code, and constants. Read-only.
__DATA - the data segment, containing global variables, static variables, etc. Readable and writable.
__LINKEDIT - information on how to load the program, including method and variable metadata (location, offset) and code signature information. Read-only.
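To make the Header fields concrete, here is a minimal C sketch that reads a buffer as a 64-bit Mach-O header and reports which kind of image it is. MachHeader64 is a simplified redeclaration of struct mach_header_64 from <mach-o/loader.h>, so the sketch builds on any platform, and image_kind is a hypothetical helper; the magic and file type values are the ones defined in loader.h. It assumes the file and the host share a byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified redeclaration of struct mach_header_64 (<mach-o/loader.h>). */
typedef struct {
    uint32_t magic;      /* 0xfeedfacf for a 64-bit Mach-O */
    int32_t  cputype;    /* CPU architecture */
    int32_t  cpusubtype; /* CPU variant */
    uint32_t filetype;   /* Executable, Dylib, Bundle, ... */
    uint32_t ncmds;      /* number of load commands after the header */
    uint32_t sizeofcmds; /* total size of those load commands in bytes */
    uint32_t flags;
    uint32_t reserved;
} MachHeader64;

/* Values as defined in <mach-o/loader.h>. */
#define MH_MAGIC_64_VALUE 0xfeedfacfu
#define MH_EXECUTE_VALUE  0x2u
#define MH_DYLIB_VALUE    0x6u
#define MH_BUNDLE_VALUE   0x8u

/* Returns the image kind, or NULL if buf is not a 64-bit Mach-O header.
   Assumes the file uses the host's byte order. */
const char *image_kind(const unsigned char *buf, size_t len) {
    assert(buf != NULL);
    MachHeader64 h;
    if (len < sizeof h) return NULL;
    memcpy(&h, buf, sizeof h);          /* copy out to avoid unaligned reads */
    if (h.magic != MH_MAGIC_64_VALUE) return NULL;
    switch (h.filetype) {
    case MH_EXECUTE_VALUE: return "Executable";
    case MH_DYLIB_VALUE:   return "Dylib";
    case MH_BUNDLE_VALUE:  return "Bundle";
    default:               return "Other";
    }
}
```

In a real file, the ncmds load commands (sizeofcmds bytes in total) start immediately after this header.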
Image
Refers to one of Executable, Dylib or Bundle.
Framework
The term Framework is used for many things, but in this article a Framework is a dylib together with a special directory structure around it that holds the files the dylib needs.
Virtual Memory
Virtual memory is a layer of indirection between physical memory and processes. It presents a contiguous logical address space: a logical address may have no physical memory behind it, and several logical addresses may map to the same physical memory.
Page Fault
When a process accesses a logical address that has no corresponding physical memory, a Page Fault occurs.
Lazy Reading
Files are mapped into memory with mmap(), but their pages are not read in right away. When a page that is not yet in memory is accessed, a Page Fault is triggered and the system reads in just that page. This is called Lazy Reading.
COW (Copy-On-Write)
When a process needs to modify the contents of a page, the kernel first copies that page, applies the modification to the copy, and remaps the logical address to the new physical memory. This is called Copy-On-Write.
Dirty Page & Clean Page
After an image is loaded, a page whose contents have been modified is called a Dirty Page; it contains process-specific information. The opposite is a Clean Page, which can be regenerated from disk.
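Copy-On-Write and dirty pages can be observed with POSIX mmap(). The sketch below maps a temporary file with MAP_PRIVATE, writes to the mapping, and checks that the file on disk is unchanged: the written page became a private, dirty copy while the file still holds the clean data. cow_demo is a hypothetical helper, and this is a general POSIX demonstration, not dyld's own code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps a temporary file privately, writes through the mapping, and checks
   the file on disk. Returns 1 if the write stayed private (copy-on-write),
   0 if it reached the file, -1 on error. */
int cow_demo(void) {
    char path[] = "/tmp/cow_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);                       /* removed from disk once fd is closed */

    const char original[] = "clean page";
    const size_t len = sizeof original;
    if (write(fd, original, len) != (ssize_t)len) { close(fd); return -1; }

    /* MAP_PRIVATE: reads share the file's clean page; the first write
       makes the kernel copy it, and the copy becomes a dirty page. */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }
    assert(p[0] == 'c');                /* first read: page faulted in lazily */
    p[0] = 'C';                         /* first write: copy-on-write */
    int mapping_changed = (p[0] == 'C');
    munmap(p, len);

    char on_disk[sizeof original];
    ssize_t n = pread(fd, on_disk, len, 0);
    close(fd);
    if (n != (ssize_t)len) return -1;
    /* The clean data on disk is untouched; only the private copy changed. */
    return mapping_changed && memcmp(on_disk, original, len) == 0;
}
```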
Shared RAM
When multiple Mach-Os depend on the same dylib (e.g. UIKit), the system maps the logical addresses each of them uses for that dylib onto the same physical memory, so the memory is shared. Dirty Pages are private to a process and cannot be shared.
Address space layout randomization (ASLR)
When an image is loaded into the logical address space, the system uses ASLR to randomize its start address, so attackers cannot locate a function from a known start address plus an offset.
When ASLR picks the random address, the whole range from 0 up to that address is marked inaccessible: it cannot be read, written, or executed. This region is the __PAGEZERO segment; its size is 4KB+ on 32-bit systems and 4GB+ on 64-bit systems.
Code Sign
iOS code signing lets the system make sure an image is safe to load. When code is signed, a separate cryptographic hash is generated for each page and stored in __LINKEDIT; at load time the system checks each page to ensure its content has not been tampered with.
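The per-page idea can be sketched in C: hash every page at signing time, then check a page when it is loaded. This is only an illustration; FNV-1a stands in for the cryptographic hash real code signing uses, the 4096-byte page size is an assumption, and sign_pages / verify_page are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIGN_PAGE_SIZE 4096u   /* assumed page size for hashing */

/* FNV-1a (64-bit): a simple stand-in, NOT a cryptographic hash. */
static uint64_t fnv1a(const unsigned char *p, size_t n) {
    uint64_t h = 14695981039346656037ull;
    for (size_t i = 0; i < n; i++) { h ^= p[i]; h *= 1099511628211ull; }
    return h;
}

static size_t page_count(size_t len) { return (len + SIGN_PAGE_SIZE - 1) / SIGN_PAGE_SIZE; }

static size_t page_len(size_t len, size_t page) {
    size_t off = page * SIGN_PAGE_SIZE;
    return len - off < SIGN_PAGE_SIZE ? len - off : SIGN_PAGE_SIZE;  /* last page may be short */
}

/* "Signing": record one hash per page (the role of the signature in __LINKEDIT). */
void sign_pages(const unsigned char *image, size_t len, uint64_t *hashes) {
    for (size_t i = 0; i < page_count(len); i++)
        hashes[i] = fnv1a(image + i * SIGN_PAGE_SIZE, page_len(len, i));
}

/* "Loading": check a single page when it is brought in; 1 means intact. */
int verify_page(const unsigned char *image, size_t len, const uint64_t *hashes, size_t page) {
    assert(page < page_count(len));
    return fnv1a(image + page * SIGN_PAGE_SIZE, page_len(len, page)) == hashes[page];
}
```

Verifying page by page is what lets the check happen lazily, only for pages that are actually faulted in.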
dyld (dynamic loader)
dyld is the loader iOS uses to load images. Many people think dyld is only responsible for loading the dynamic libraries an app depends on, but that is a misunderstanding. dyld works as follows (reference: dyld startup process [3]):
Load dylibs
Before loading a Mach-O, dyld parses its Header and Load Commands to learn which dylibs it depends on, then does the same for each of those dylibs, recursively loading every dylib that is required.
An app typically depends on around 100-400 dylibs. Most of them are system dylibs, which load relatively quickly thanks to caching and sharing.
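The recursive loading described above is essentially a depth-first walk of the dependency graph that loads each dylib only once. The C sketch below records images in dependency-first order; the image names and the dependency table are hypothetical, and it assumes the graph has no cycles.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dependency table: each image lists up to 3 dylibs, NULL-terminated. */
typedef struct { const char *name; const char *deps[4]; } Image;

static const Image kImages[] = {
    { "MyApp",     { "LibUI", "LibNet", NULL } },
    { "LibUI",     { "LibCore", NULL } },
    { "LibNet",    { "LibCore", NULL } },
    { "LibCore",   { "libSystem", NULL } },
    { "libSystem", { NULL } },
};

#define MAX_LOADED 16
typedef struct { const char *order[MAX_LOADED]; int count; } LoadState;

static const Image *find_image(const char *name) {
    for (size_t i = 0; i < sizeof kImages / sizeof kImages[0]; i++)
        if (strcmp(kImages[i].name, name) == 0) return &kImages[i];
    return NULL;
}

static int already_loaded(const LoadState *s, const char *name) {
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->order[i], name) == 0) return 1;
    return 0;
}

/* Depth-first: load an image's dependencies before the image itself,
   and skip anything already loaded, so shared dylibs load only once. */
void load_recursive(LoadState *s, const char *name) {
    if (already_loaded(s, name)) return;
    const Image *img = find_image(name);
    assert(img != NULL);
    for (int i = 0; img->deps[i] != NULL; i++)
        load_recursive(s, img->deps[i]);
    assert(s->count < MAX_LOADED);
    s->order[s->count++] = img->name;
}
```

LibCore is needed by both LibUI and LibNet, yet it is recorded once, which mirrors how a shared dylib is loaded only once per process.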
Fix-ups
Because of ASLR and Code Sign, each newly loaded dylib starts out isolated from the others. Binding them together requires a step called Fix-ups, which comes in two main kinds: Rebase and Bind.
PIC (Position Independent Code)
Because of code signing, dyld cannot modify instructions directly. To make fix-ups possible at runtime, the compiler generates PIC (Position Independent Code), so that code which code-signing restrictions prevent from being modified can still reach its targets through indirect addresses. To call a method, the code first sets up a pointer to that method in the __DATA segment and then calls it indirectly through that pointer.
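The indirection that PIC relies on can be imitated with a C function pointer: the calling code never changes, and only a writable pointer slot (standing in for the one in __DATA) is patched at runtime. data_slot, loader_bind, call_through_slot, and the two impl functions are hypothetical stand-ins, not dyld APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Two hypothetical implementations the "loader" could bind to. */
static int impl_v1(int x) { return x + 1; }
static int impl_v2(int x) { return x * 2; }

/* Stand-in for a pointer slot in __DATA: writable, so it can be fixed up. */
static int (*data_slot)(int) = NULL;

/* Stand-in for code in __TEXT: never modified, always calls indirectly. */
int call_through_slot(int x) {
    assert(data_slot != NULL);   /* must be bound before first use */
    return data_slot(x);
}

/* What the loader does at runtime: patch the data, never the code. */
void loader_bind(int (*target)(int)) { data_slot = target; }
```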
Rebase
Rebase corrects data for the fact that, because of ASLR, a Mach-O is loaded at a random start address: an offset is added to every internal pointer. The offset is calculated as follows:
Slide = actual_address - preferred_address
All the information needed for Rebase is encoded in __LINKEDIT. dyld then walks through the pointers in __DATA that need rebasing and adds the offset to each one. Page Faults and COW can occur repeatedly during this process, costing I/O. However, because Rebase processes contiguous addresses, the kernel reads data ahead, which reduces I/O.
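The Slide formula and the rebase loop fit in a few lines of C. In this sketch the rebase information is simplified to an array of pointer indices, which is an assumption; the real encoding in __LINKEDIT is more compact. compute_slide and rebase are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Slide = actual_address - preferred_address */
uint64_t compute_slide(uint64_t actual_address, uint64_t preferred_address) {
    return actual_address - preferred_address;
}

/* Add the slide to every internal pointer the rebase info points at.
   Here the "rebase info" is just a list of indices into data. */
void rebase(uint64_t *data, const size_t *rebase_offsets, size_t count, uint64_t slide) {
    assert(data != NULL && rebase_offsets != NULL);
    for (size_t i = 0; i < count; i++)
        data[rebase_offsets[i]] += slide;
}
```

Because the arithmetic is unsigned, a "negative" slide (actual address below the preferred one) still works through modular wraparound.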
Binding
Binding resolves references to external symbols. For example, to use UITableView we need the symbol _OBJC_CLASS_$_UITableView, but that symbol is not in our Mach-O; it has to come from UIKit.framework, so Binding is needed to connect the two.
At runtime, dyld has to find the implementation that corresponds to each symbol name, which takes a lot of computation, including symbol table lookups. Once found, the value is recorded in the corresponding pointer in __DATA. Although Binding needs more computation than Rebasing, it needs very little I/O, because Rebasing has already done that work.
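Binding, by contrast, is a lookup by name followed by a pointer write. The sketch below uses a linear search over a hypothetical export table, where a real loader would use a more efficient structure; the symbol addresses are made up, and ExportedSymbol, BindEntry, and bind_all are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { const char *name; uintptr_t address; } ExportedSymbol;

/* One bind entry: which __DATA slot to fill, and with which symbol. */
typedef struct { size_t slot; const char *symbol; } BindEntry;

/* Symbol search: the computation-heavy part of binding. */
static uintptr_t lookup(const ExportedSymbol *table, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].name, name) == 0) return table[i].address;
    return 0;  /* not found */
}

/* Fill each slot with the resolved address; returns the number of
   symbols that could not be resolved (0 means everything was bound). */
size_t bind_all(uintptr_t *data, const BindEntry *binds, size_t nbinds,
                const ExportedSymbol *exports, size_t nexports) {
    assert(binds != NULL || nbinds == 0);
    size_t missing = 0;
    for (size_t i = 0; i < nbinds; i++) {
        uintptr_t addr = lookup(exports, nexports, binds[i].symbol);
        if (addr == 0) missing++;
        data[binds[i].slot] = addr;
    }
    return missing;
}
```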
dyld2 & dyld3
Before iOS 13, all third-party apps were launched with dyld 2. The main steps are:
Parse the Mach-O Header and Load Commands, find its dependent libraries, and recursively find all of their dependencies
Load the Mach-O files
Look up symbols
Bind and rebase
Run initializers
All of these steps happen while the app launches and involve a lot of computation and I/O. To speed up launch, the Apple team officially introduced dyld3 in WWDC2017-413-App Startup Time: Past, Present, and Future [4].
dyld3 consists of three components:
An out-of-process Mach-O parser
- Pre-processes all search paths, @rpaths, and environment variables that can affect launch
- Then parses the Mach-O Header and dependencies and completes all symbol lookups
- Finally, saves these results as a launch closure
- It is an ordinary daemon process, so it can be tested with the usual test infrastructure
An in-process engine that runs the launch closure
- Runs inside the app's process
- Validates the launch closure, maps the dylibs, and then jumps to main
- Does not need to parse Mach-O Headers or dependencies, and does no symbol lookup
A launch closure cache service
- Launch closures for system apps are built into the shared cache, so no separate file even needs to be opened
- For third-party apps, the launch closure is built when the app is installed or updated
On iOS, tvOS, and watchOS, all of this happens before the app launches. On macOS, because apps can be side-loaded, the in-process engine contacts a daemon on first launch; after that, launches can use the launch closure.
dyld3 moves much of the time-consuming lookup, computation, and I/O ahead of launch, which greatly improves launch speed.
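The core idea of the launch closure, doing the expensive lookups once ahead of time and only replaying the results at launch, can be sketched in C as follows. LaunchClosure, build_closure, and launch_with_closure are hypothetical; real dyld3 closures record much more than resolved symbols and are validated before use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SYMBOLS 8

typedef struct { const char *name; uintptr_t address; } Export;

/* Hypothetical closure: symbol lookup results computed ahead of time. */
typedef struct { size_t count; uintptr_t resolved[MAX_SYMBOLS]; } LaunchClosure;

static size_t g_lookups = 0;  /* counts expensive symbol searches */

static uintptr_t slow_lookup(const Export *ex, size_t n, const char *name) {
    g_lookups++;
    for (size_t i = 0; i < n; i++)
        if (strcmp(ex[i].name, name) == 0) return ex[i].address;
    return 0;
}

/* Done once (e.g. at install or update time): search and save results. */
LaunchClosure build_closure(const Export *ex, size_t n, const char **needed, size_t count) {
    assert(count <= MAX_SYMBOLS);
    LaunchClosure c = { .count = count };
    for (size_t i = 0; i < count; i++)
        c.resolved[i] = slow_lookup(ex, n, needed[i]);
    return c;
}

/* Done at every launch: no searching, just apply the cached results. */
void launch_with_closure(const LaunchClosure *c, uintptr_t *slots) {
    for (size_t i = 0; i < c->count; i++) slots[i] = c->resolved[i];
}
```

However many times the app launches, the lookup count stays at what build_closure paid once.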
App launch
With these terms out of the way, let's get to the main topic.
Why is app launch so important?
App launch is the user's first interaction with your app, so keeping it short gives a good first impression
Launch reflects the overall performance of your code; if launch performance is poor, other parts probably don't perform well either
Launch consumes CPU and memory, which affects system performance and battery life
So launch time is worth optimizing.
Launch types
App launches fall into three categories:
Cold Launch, which meets the following conditions:
- After a reboot
- The app is not in memory
- No process exists for the app
Warm Launch, which meets the following conditions:
- The app was recently terminated
- The app is still partially in memory
- No process exists for the app
Resume, meaning a suspended app continues running, which meets the following conditions:
- The app is suspended
- The whole app is still in memory
- The app's process still exists
App launch phases
App launch is divided into three stages:
Initializing the app
Preparing and drawing the app's first frame (not the frame drawn after data arrives; it can be a placeholder view). At this point the user can already interact with the app, for example by switching tabs
Drawing the page completely once all of its data has been fetched
Here Apple again stresses that the time from the user tapping the app icon until the app becomes interactive, that is, the end of the second stage, should be within 400ms. Most apps do not meet this goal yet.
Next, we break these three stages down into the following six parts and look at what each part does and what can be optimized.
System Interface
To initialize the app, the system mainly does two things: Load dylibs and libSystem init.
In 2017 Apple described the optimizations dyld3 brought to system apps. This year dyld3 was made available to developers' apps as well, which means iOS will cache your app's runtime dependencies to speed up warm launch and reduce launch time. This is one of the reasons for the 200% improvement.
The video only mentions warm launch improvements. In theory dyld3 should be able to speed up cold launch on iOS as well; I don't know whether this is related to the new iPad multitasking support or whether not all of its features have been enabled yet. The speaker mentioned only warm launch, and why is not yet clear.
In addition, during the Load dylibs stage, developers can make the following optimizations:
Avoid linking unused frameworks: check in Xcode whether the "Linked Frameworks and Libraries" section of the project contains unused links.
Avoid loading dynamic libraries at launch: package the project's Pods as static libraries, especially in Swift projects, where the cost is very high.
Hard-link your dependencies, since hard-linked dependencies benefit from the caching optimization described above.
Some may wonder whether static linking is still needed once dyld3 is in use. In fact it still is. If you are interested, see Static linking vs dyld3 [5], which includes a detailed data comparison.
libSystem init mainly initializes low-level system components. Its cost is fixed, so developers don't need to worry about it.
Static Runtime Initialization
This stage initializes the Objective-C and Swift runtimes: it calls all +load methods and registers class information with the runtime.
In principle, developers should avoid doing anything at this stage. To avoid unnecessary launch cost, you can:
When developing a framework, expose a dedicated initialization API
Reduce the work done in +load
Use +initialize for lazy initialization
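Objective-C's +initialize runs lazily, once, before a class is first used, whereas +load runs for every class at launch. A rough C analogue of that "pay on first use, exactly once" pattern is pthread_once(); in the sketch below, expensive_setup, lookup_square, and init_runs are hypothetical, and the point is only to show setup moving out of launch and into first use.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

static pthread_once_t g_once = PTHREAD_ONCE_INIT;
static int g_init_runs = 0;     /* how many times setup actually ran */
static int g_table[256];        /* hypothetical expensive-to-build state */

/* Heavy setup that would otherwise sit in +load and run at launch. */
static void expensive_setup(void) {
    g_init_runs++;
    for (int i = 0; i < 256; i++) g_table[i] = i * i;
}

/* Like +initialize: the first real use pays the cost, exactly once,
   even if several threads race to be first. */
int lookup_square(int i) {
    assert(i >= 0 && i < 256);
    pthread_once(&g_once, expensive_setup);
    return g_table[i];
}

int init_runs(void) { return g_init_runs; }
```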
UIKit Initialization
Two things mainly happen at this stage:
Instantiating UIApplication and UIApplicationDelegate
Starting event processing and system integration
So the optimizations here are fairly simple; you need to do two things:
Minimize the work done when initializing your UIApplication subclass, or don't subclass UIApplication at all
Reduce the initialization work of your UIApplicationDelegate
Application Initialization
This stage mainly consists of lifecycle callbacks, the part developers know best.
The app lifecycle methods of UIApplicationDelegate are called:
application:willFinishLaunchingWithOptions:
application:didFinishLaunchingWithOptions:
as well as its UI lifecycle method:
applicationDidBecomeActive:
In addition, iOS 13 adds new UISceneDelegate callbacks:
scene:willConnectToSession:options:
sceneWillEnterForeground:
sceneDidBecomeActive:
These are also called at this stage. If you're interested, see the session Getting the Most out of Multitasking. There is no video for it yet; I suspect the live demo went wrong, so the video wasn't released.
Optimizations developers can make at this stage:
Defer work unrelated to launch
Share resources between scenes
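Deferring work unrelated to launch can be as simple as queueing it and draining the queue after the first frame. This C sketch is only an illustration: defer_until_after_launch, run_deferred_work, and the setup_analytics task are hypothetical, and on iOS such work would more typically be dispatched to a later run-loop pass or a background queue.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*Task)(void);

#define MAX_DEFERRED 16
static Task g_deferred[MAX_DEFERRED];
static size_t g_ndeferred = 0;

/* Instead of running non-critical work during launch, queue it. */
void defer_until_after_launch(Task t) {
    assert(g_ndeferred < MAX_DEFERRED);
    g_deferred[g_ndeferred++] = t;
}

/* Call once the first frame is on screen. */
void run_deferred_work(void) {
    for (size_t i = 0; i < g_ndeferred; i++) g_deferred[i]();
    g_ndeferred = 0;
}

/* Hypothetical non-critical task, e.g. warming up an analytics SDK. */
static int g_analytics_ready = 0;
static void setup_analytics(void) { g_analytics_ready = 1; }
```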
First Frame Render
This stage creates, lays out, and draws views, and commits the prepared first frame to the render server. The following methods are called frequently:
loadView
viewDidLoad
layoutSubviews
Optimizations developers can make at this stage:
Flatten the view hierarchy and lazily load views that aren't needed
Optimize layout and reduce constraints
For more details, see WWDC2018-220-High Performance Auto Layout [6]
Extended
Most apps fetch data asynchronously and eventually present it to the user; we call this part Extended.
Because this part differs from app to app, Apple recommends measuring it with os_signpost, then analyzing and optimizing it step by step.
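os_signpost is an Apple API, so as a portable stand-in the C sketch below marks a named interval with a monotonic clock and reports its length in milliseconds. Interval, interval_begin, and interval_end_ms are hypothetical helpers; on Apple platforms you would use os_signpost_interval_begin / os_signpost_interval_end from <os/signpost.h> so the intervals show up in Instruments.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

typedef struct { const char *name; struct timespec start; } Interval;

/* Begin a named interval (the role a signpost "begin" plays). */
Interval interval_begin(const char *name) {
    Interval iv = { .name = name };
    clock_gettime(CLOCK_MONOTONIC, &iv.start);
    return iv;
}

/* End the interval and return its length in milliseconds. */
double interval_end_ms(const Interval *iv) {
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    double ms = (double)(now.tv_sec - iv->start.tv_sec) * 1000.0
              + (double)(now.tv_nsec - iv->start.tv_nsec) / 1e6;
    assert(ms >= 0.0);   /* a monotonic clock never goes backwards */
    return ms;
}
```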
Measuring app launch time
To find problems in the launch process, you need to measure many times and compare before and after. If the variables are not well controlled, the results will contain errors.
So, to make sure the measurements reflect real problems, reduce sources of variability and measure in a controlled, consistent environment, then analyze consistent results.
Consistent conditions
To keep the environment consistent, you can:
Reboot the device and wait 2-3 minutes
Enable airplane mode or use a simulated network
Avoid iCloud, or at least keep the same iCloud account
Build in Release mode
Measure warm launches
Switching iCloud accounts affects performance, so don't switch accounts or turn iCloud on during measurement.
Points to note when measuring
Test with representative data whenever possible
Testing without representative data skews the results
Test on both old and new devices
Finally, you can also measure with XCTest: run it several times and take the average
For how to measure launch time with XCTest, see WWDC2019-417-Improving Battery Life and Performance [7]. When I tried it, though, some of the APIs did not seem to be available yet, so it can't be used for now.
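Averaging several runs is simple arithmetic, but it helps to decide up front whether the first run counts. The helper below (a hypothetical average_ms) takes an optional number of warm-up runs to skip; skipping the first, cold run is an assumption that matches the advice above to measure warm launches, not something the session prescribes.

```c
#include <assert.h>
#include <stddef.h>

/* Average launch time in ms over runs[warmup..n-1]. */
double average_ms(const double *runs, size_t n, size_t warmup) {
    assert(n > warmup);
    double sum = 0.0;
    for (size_t i = warmup; i < n; i++) sum += runs[i];
    return sum / (double)(n - warmup);
}
```

With runs of 900, 400, 420, and 380 ms, the plain average is 525 ms, while skipping the first run gives 400 ms, which shows how much a single cold run can distort the result.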
Use Instruments to analyze and optimize the app startup process
Optimization methods
Apple gives three suggestions. The overall idea is similar to the per-stage optimizations above.
Minimize Work
Defer work not related to the first frame
Remove blocking work from the main thread
Reduce memory usage
Prioritize Work
Define the priority of each task
Make good use of GCD to speed up launch
Do the important things first
For a deeper understanding of GCD, see WWDC2017-706-Modernizing Grand Central Dispatch Usage [8]
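GCD expresses priority through quality-of-service classes (user-interactive, user-initiated, utility, background). Without using GCD itself, the idea of running the most important startup work first can be sketched in C by sorting tasks by priority; the Priority enum, StartupTask, and order_by_priority are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Rough analogue of GCD QoS levels, most important first. */
typedef enum {
    PRIO_USER_INTERACTIVE = 0,
    PRIO_USER_INITIATED   = 1,
    PRIO_UTILITY          = 2,
    PRIO_BACKGROUND       = 3
} Priority;

typedef struct { const char *name; Priority prio; } StartupTask;

static int by_priority(const void *a, const void *b) {
    const StartupTask *x = a, *y = b;
    return (int)x->prio - (int)y->prio;
}

/* Order startup tasks so the most important ones run first. */
void order_by_priority(StartupTask *tasks, size_t n) {
    assert(tasks != NULL || n == 0);
    qsort(tasks, n, sizeof tasks[0], by_priority);
}
```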
Optimize Work
Simplify existing work, for example by requesting only the data you need
Optimize algorithms and data structures
Cache resources and calculations
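Caching calculations usually means memoization: compute a value the first time it is needed and serve it from the cache afterwards. In this C sketch, expensive is a hypothetical placeholder (it computes Fibonacci numbers), and the computation counter exists only to show that repeated calls are served from the cache.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_SIZE 64
static uint64_t g_cache[CACHE_SIZE];
static int g_cached[CACHE_SIZE];
static int g_computations = 0;

/* Hypothetical expensive calculation (Fibonacci as a placeholder). */
static uint64_t expensive(int n) {
    g_computations++;
    uint64_t a = 0, b = 1;
    for (int i = 0; i < n; i++) { uint64_t t = a + b; a = b; b = t; }
    return a;
}

/* Compute once, then serve later calls from the cache. */
uint64_t cached(int n) {
    assert(n >= 0 && n < CACHE_SIZE);
    if (!g_cached[n]) { g_cache[n] = expensive(n); g_cached[n] = 1; }
    return g_cache[n];
}
```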
Use Instruments to analyze the app startup process
Once we know how to optimize, we need to analyze our own launch process. Xcode 11's Instruments adds an App Launch template for this purpose, making it easier for developers to analyze their app's launch speed.
After a run, you can see how long each stage takes and which function calls are expensive, and optimize based on the data.
System Optimization
Last year Apple made many optimizations; the following highlights are the ones related to launch speed.
However, perhaps because of time constraints, the session says very little about this part, and it is hard to tell what produced the 200%.
But in The Talk Show Live From WWDC 2019, With Craig Federighi and Greg Joswiak [9], Craig Federighi explained why launch got 200% faster:
Isn’t that crazy that was quite a discovery for us. No it turns out that over times as in terms of the way the apps were encrypted and the way fair play worked and so forth. The encryption became part of the critical path actually of launching the apps. I mean the processors are capable or up and through the thing that actually it was a problem. And then there are other optimizations that based on what was visible to system at certain things. And so it actually cut out optimization opportunities and so when we really identified that opportunity we said okay. We can actually come up with better format that’s gonna eliminate that being on the critical path, It’s going to enable all these pre-binding things. And then we did a whole bunch of other work to optimize the objective-c runtime to optimize the linker the dynamic linker a bunch of other things and you put it all together. And yeah that I mean a cold launch this is we’ve never had a win like this to launch time in a single release.
From this passage, besides the contribution of dyld3, taking app encryption (FairPlay) off the critical path of launch was also one of the optimizations.
Monitoring launch for real users
Xcode 11 adds a new metrics panel to the Xcode Organizer, where you can view user data across several dimensions, including average launch time.
After analyzing your launch process with Instruments and making optimizations, you can use the Xcode Organizer to see how effective they were.
You can also collect custom data with MetricKit [10], released last year; see WWDC2019-417-Improving Battery Life and Performance [11].
[1] WWDC 2019 keynote: https://developer.apple.com/videos/play/wwdc2019/101/
[2] WWDC2019 - 423 - Optimizing App Launch: https://developer.apple.com/videos/play/wwdc2019/423/
[3] dyld startup process: https://leylfl.github.io/2018/05/28/dyld startup process/
[4] WWDC2017 - 413 - App Startup Time: Past, Present, and Future: https://developer.apple.com/videos/play/wwdc2017/413/
[5] Static linking vs dyld3: https://allegro.tech/2018/05/Static-linking-vs-dyld3.html
[6] WWDC2018 - 220 - High Performance Auto Layout: https://developer.apple.com/videos/play/wwdc2018/220/
[7] WWDC2019 - 417 - Improving Battery Life and Performance: https://developer.apple.com/videos/play/wwdc2019/417/
[8] WWDC2017 - 706 - Modernizing Grand Central Dispatch Usage: https://developer.apple.com/videos/play/wwdc2017/706/
[9] The Talk Show Live From WWDC 2019, With Craig Federighi and Greg Joswiak: https://daringfireball.net/2019/06/the_talk_show_live_from_wwdc_2019
[10] MetricKit: https://developer.apple.com/documentation/metrickit
[11] WWDC2019 - 417 - Improving Battery Life and Performance: https://developer.apple.com/videos/play/wwdc2019/417/