Baidu APP iOS package size 50M optimization practice (7): Compiler optimization

1. Introduction

The first six articles in the Baidu APP iOS package size optimization series covered the overall optimization plan, image optimization, resource optimization, code optimization, useless class optimization, HEIC image optimization practice, and useless method cleaning. Image optimization was approached in depth from three angles: useless images, Asset Catalog, and the HEIC format. Resource optimization covered large resources, useless configuration files, and duplicate resources. Code optimization covered useless class cleanup, useless module slimming, useless method slimming, duplicate code reduction, utility class slimming, and solidifying AB experiments. This article focuses on compiler optimization. In Baidu APP practice, compiler optimization includes GCC language compilation optimization, Swift compilation optimization, LTO optimization, stripping debugging symbols, stripping symbol tables, removing unreferenced code, Asset optimization, C++ virtual function optimization, and compiler-side slimming of third-party SDKs. We also cover instruction set architecture optimization, Xcode upgrade optimization, and Swift built-in dynamic library optimization. The underlying principles of these three all involve the compiler, so we introduce them together in this article.

Review of Baidu APP iOS package volume optimization practice series of articles:

1. "Baidu APP iOS package size 50M optimization practice (1): Overview"

2. "Baidu APP iOS package size 50M optimization practice (2): Image optimization"

3. "Baidu APP iOS package size 50M optimization practice (3): Resource optimization"

4. "Baidu APP iOS package size 50M optimization practice (4): Code optimization"

5. "Baidu APP iOS package size 50M optimization practice (5): Useless class optimization and HEIC image optimization practice"

6. "Baidu APP iOS package size 50M optimization practice (6): Useless method cleaning"

2. Compiler optimization

2.1 Program Overview

picture

2.2 GCC language compilation optimization

2.2.1 Overview

GCC language compilation optimization produces smaller binaries and is effective for Objective-C (OC), C, and C++.

2.2.2 Objective-C/C++ compilation optimization

Objective-C/C++ code is edited and compiled with Xcode. The configuration path is: Build Settings -> Apple Clang - Code Generation -> Optimization Level. The available values are as follows:

picture

picture

The default optimization level in Xcode is -Os, but we use -Oz. The WWDC 2019 session "What's New in Clang and LLVM" (https://developer.apple.com/videos/play/wwdc2019/409/) explains the principle in detail: -Oz reduces code size by identifying identical instruction sequences across the functions in a compilation unit. Each repeated run of consecutive machine instructions is outlined into a separate function, and every original occurrence is replaced by a call to it, so identical machine code is stored only once. The cost is a deeper function call stack, which has some impact on performance. As iPhone hardware keeps getting more powerful, this performance loss is affordable. The official demo below illustrates the principle. The hasse and kakutani functions contain the same machine instructions; with -Oz the compiler generates an OUTLINED_FUNCTION_0 function that both hasse and kakutani call, which reduces the package size.

picture

picture

The officially stated benefit is 25%. In our practice, -Oz gave a 10% size reduction for Objective-C/C++ code and a 30% reduction for C and C++ code.
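As a minimal sketch, the same setting can be pinned in a build configuration file instead of the Xcode UI. The file name SizeRelease.xcconfig is a hypothetical example; GCC_OPTIMIZATION_LEVEL is the build setting behind the Optimization Level option, and the value z corresponds to -Oz.

```shell
#!/bin/sh
# Write a build configuration file that selects -Oz for Clang.
# SizeRelease.xcconfig is a hypothetical name; attach the file to the
# Release configuration of the target you want to shrink.
cat > SizeRelease.xcconfig <<'EOF'
// Apple Clang - Code Generation -> Optimization Level
// z = Smallest, Aggressive Size Optimizations (-Oz)
GCC_OPTIMIZATION_LEVEL = z
EOF

cat SizeRelease.xcconfig
```

Keeping the value in a file makes the change reviewable and easy to reuse across targets.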

2.2.3 C&C++ compilation optimization

On iOS, many low-level modules are implemented in C and C++, such as network libraries, playback kernels, visual processing, and on-device intelligence. These modules also have to support multiple platforms such as Android and iOS, so they are usually built with one of two cross-platform build tools, CMake or GN. CMake is a widely used cross-platform build tool that generates the corresponding project files by reading the instructions in CMakeLists.txt. GN, short for Generate Ninja, is an alternative to CMake that was open sourced by Google and is written in C++. It is mainly used for cross-compilation and lets you specify the target output platform.

Whether you use CMake or GN, the optimization configuration is the same: set the C++ compiler flags (the cppFlags option) to "-Oz" and the C compiler flags (the cFlags option) to "-Oz".
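The following sketch shows one way these flags might be expressed in each tool. It assumes the flags map to CMake's CMAKE_C_FLAGS_RELEASE / CMAKE_CXX_FLAGS_RELEASE variables and to GN's cflags_c / cflags_cc lists; the file names and the config name optimize_for_size are hypothetical, and the modules described above may wire the flags differently.

```shell
#!/bin/sh
# CMake: an initial-cache script, loaded with `cmake -C size_flags.cmake ...`.
cat > size_flags.cmake <<'EOF'
set(CMAKE_C_FLAGS_RELEASE "-Oz -DNDEBUG" CACHE STRING "C flags for Release")
set(CMAKE_CXX_FLAGS_RELEASE "-Oz -DNDEBUG" CACHE STRING "C++ flags for Release")
EOF

# GN: a config fragment to paste into a BUILD.gn file; targets opt in by
# adding ":optimize_for_size" to their `configs` list.
cat > optimize_for_size.gn.txt <<'EOF'
config("optimize_for_size") {
  cflags_c = [ "-Oz" ]
  cflags_cc = [ "-Oz" ]
}
EOF
```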

2.3 Swift compilation optimization

Swift compilation optimization has two parameters, Optimization Level and Compilation Mode. The configuration path is: Build Settings -> Swift Compiler - Code Generation.

picture

The Optimization Level values are as follows:

picture

The core principle of Optimize for Size is the same as that of -Oz in the GCC language compilation optimization described earlier: both reduce the size of the compiled product by outlining and reusing repeated runs of consecutive machine instructions. This also has some impact on performance, but on current hardware the impact is negligible.

The Compilation Mode values are as follows:

picture

Enabling Optimize for Size [-Osize] and Whole Module together gives the best result. In practice, this reduced the size of the Swift portion of the package by 10%.
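A minimal sketch of the two Swift settings in a configuration file follows. SWIFT_OPTIMIZATION_LEVEL and SWIFT_COMPILATION_MODE are the build settings behind the two options; the file name SwiftSize.xcconfig is a hypothetical example.

```shell
#!/bin/sh
# Write the two Swift settings into a configuration file meant for Release builds.
cat > SwiftSize.xcconfig <<'EOF'
// Swift Compiler - Code Generation -> Optimization Level: Optimize for Size
SWIFT_OPTIMIZATION_LEVEL = -Osize
// Swift Compiler - Code Generation -> Compilation Mode: Whole Module
SWIFT_COMPILATION_MODE = wholemodule
EOF

cat SwiftSize.xcconfig
```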

2.4 LTO optimization

LTO, or Link Time Optimization, is an optimization strategy officially provided by Apple. According to the official explanation, LTO optimizes the program as a whole: it is a cross-module optimization performed by the LLVM toolchain during the link phase. With it, the compiler can inline functions, remove redundant code that is never called, and optimize globally so the program runs faster. These measures effectively reduce code size and improve the program's efficiency.

The configuration path is: Build Settings -> Apple Clang - Code Generation -> Link-Time Optimization. The setting value is Incremental. It needs to be turned on in both the main project and the Framework to be optimized.

picture
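Because LTO has to be enabled in the main project and in each framework, one option is to keep the setting in a shared configuration file that every target includes. The sketch below assumes that layout; all file names and the target names App and FrameworkA are hypothetical. LLVM_LTO is the build setting behind Link-Time Optimization, and YES_THIN is its Incremental value.

```shell
#!/bin/sh
# Shared settings file (hypothetical name).
cat > SizeShared.xcconfig <<'EOF'
// Apple Clang - Code Generation -> Link-Time Optimization: Incremental
LLVM_LTO = YES_THIN
EOF

# Each target's own configuration file includes the shared one, so the app
# and the framework are built with the same LTO mode.
for target in App FrameworkA; do
    printf '#include "SizeShared.xcconfig"\n' > "$target.xcconfig"
done

cat SizeShared.xcconfig App.xcconfig FrameworkA.xcconfig
```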

The optimization effect of LTO is reflected in the following three aspects:

1. Function inlining:

LTO can inline some functions, that is, embed the function call code directly into the call point at compile time to reduce the overhead of function calls. This can improve program execution efficiency.

2. Remove useless code:

LTO can identify and remove useless code in the program, such as unused variables, functions, and classes. This can reduce the size of the generated binary files, thereby making the program load faster and run more efficiently.

3. Global optimization effect:

LTO performs global optimization on the program and can identify and optimize code branches that are impossible to execute in the program. For example, if a certain branch of an if statement will never be executed, LTO will remove it from the generated binary file, which can improve program execution efficiency and code quality.

Negative impacts of LTO include:

1. Reduce the readability of Link Map:

Link Map is a file generated by the linker that describes how object files are linked together. With LTO enabled, because of global optimization, the object file names in the generated Link Map may start with a number, such as 0.arm64.thinlto.o, which makes the Link Map much harder to read. If you need to read the Link Map, disable LTO first.

2. Increase compilation and linking time:

Enabling LTO will cause the compilation and linking process to become more time-consuming. This is because during the link phase, LTO performs a large number of global optimizations, which requires more computing resources and time. For online packaging or offline compilation, this will lead to longer time consumption.

2.5 Strip debugging symbols

picture

Symbols Hidden by Default sets the default visibility of symbols. If it is set to YES, Xcode marks all symbols as "private extern" by default, and the package size is slightly reduced. For dynamic libraries, set it to NO; otherwise link errors will occur.

2.6 Strip symbol table

The configuration path is: Build Settings -> Strip Linked Product, and the selected attribute value is YES.

picture

Strip Linked Product removes unnecessary symbol information. Once the symbols are removed, symbolication is only possible with the dSYM file, so "Debug Information Format" must be changed to "DWARF with dSYM File".

picture

Strip Debug Symbols During Copy works on a similar principle to Strip Linked Product. It mainly strips the symbol tables of third-party libraries copied into the project. Set it to YES for Release and NO for Debug; otherwise you cannot debug third-party libraries with symbolic breakpoints.

picture
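The symbol-related options from sections 2.5 and 2.6 can be sketched as a pair of configuration files, one per build configuration. The file names are hypothetical; the setting names are the ones behind Symbols Hidden by Default (GCC_SYMBOLS_PRIVATE_EXTERN), Strip Linked Product (STRIP_INSTALLED_PRODUCT), Debug Information Format (DEBUG_INFORMATION_FORMAT), and Strip Debug Symbols During Copy (COPY_PHASE_STRIP).

```shell
#!/bin/sh
# Release: hide and strip symbols, keep a dSYM for symbolication.
cat > StripRelease.xcconfig <<'EOF'
// Use NO for dynamic library targets, as noted in section 2.5
GCC_SYMBOLS_PRIVATE_EXTERN = YES
STRIP_INSTALLED_PRODUCT = YES
DEBUG_INFORMATION_FORMAT = dwarf-with-dsym
COPY_PHASE_STRIP = YES
EOF

# Debug: keep symbols in copied libraries so symbolic breakpoints still work.
cat > StripDebug.xcconfig <<'EOF'
COPY_PHASE_STRIP = NO
EOF
```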

2.7 Eliminate unreferenced code

The configuration path is: Build Settings -> Dead Code Stripping, and the selected attribute value is YES.

picture

This optimization removes unused code in static languages such as C, C++, and Swift from the installation package at link time. It is ineffective for Objective-C, because Objective-C is a dynamic language built on the Runtime mechanism: code that static analysis judges to be unused may still be called at runtime.

2.8 Asset optimization

The Asset compilation optimization configuration path is: Build Settings -> Asset Catalog Compiler -> Optimization.

picture

The Optimization values are as follows:

picture

Selecting Space can optimize the package size to a certain extent, but the benefits are small.

2.9 C++ reduces the use of virtual functions

Reducing the use of virtual functions can actually reduce the space occupied by the virtual function table, thereby reducing the size of the package. The virtual function table is a data structure used to implement dynamic binding, which stores pointers to virtual functions of a class. Therefore, reducing the use of virtual functions reduces the number of these pointers, thereby reducing the size of the virtual function table and ultimately the package size.

2.10 Third-party SDK compiler slimming down

The compiler settings and their principles have been described in detail above, but changing only the main project's settings is not enough. To get the best result, every framework must be adjusted according to the configuration above: enable the optimizations in each framework's build configuration and set the compiler parameters to the appropriate values. You also need to make sure that the libraries and dependencies each framework uses are configured correctly so that they keep working with the optimizations enabled.

As a flagship application, Baidu APP integrates many third-party SDKs, such as Baidu Maps, Baidu Netdisk, and Du Xiaoman. The teams that own these SDKs therefore need to be pushed to apply the same compiler optimizations so that the application as a whole slims down. Their optimizations can include, but are not limited to, image optimization, resource optimization, and code optimization. These measures effectively reduce the size of the application and improve its performance, giving users a better experience.
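Since every framework has to carry the same settings, a small audit script can help spot projects that were missed. The sketch below is an assumption about how such a check might look, not the tooling used by Baidu APP: the function name audit_setting is hypothetical, and it relies on build settings appearing in project.pbxproj as lines of the form `KEY = value;`.

```shell
#!/bin/sh
# Print every value a build setting takes in the Xcode projects under a directory.
# Usage: audit_setting <root-dir> <SETTING_NAME>
audit_setting() {
    root=$1
    key=$2
    find "$root" -name project.pbxproj | sort | while IFS= read -r pbx; do
        # Collect the distinct values of KEY across all build configurations.
        values=$(sed -n "s/^[[:space:]]*$key = \(.*\);\$/\1/p" "$pbx" \
            | sort -u | paste -s -d ' ' -)
        printf '%s: %s=%s\n' "$pbx" "$key" "${values:-<unset>}"
    done
}
```

For example, `audit_setting . GCC_OPTIMIZATION_LEVEL` lists one line per project. A project that still shows `s`, or `<unset>`, has not been switched to -Oz; settings supplied through xcconfig files are not visible to this check.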

3. Instruction set architecture optimization

3.1 Common instruction set architecture of iPhone

All iPhones use low-power ARM processors. The ARM instruction set architectures used on iOS fall into four types: armv6, armv7, armv7s, and arm64. Within the 32-bit family they are backward compatible. For example, an armv7s device can also run armv7 code, but armv7 code cannot take advantage of the newer hardware features of that device. The simulator on an Intel Mac does not run ARM instructions; it runs x86 instructions. The 32-bit simulator uses i386, and the 64-bit simulator uses x86_64. The instruction set architectures supported by different devices are as follows.

picture

As hardware devices continue to be updated, the market share of early devices (such as iPhone4, iPhone5 and iPad) has become insignificant. Therefore, for mobile devices, we only need to support the arm64 architecture. Similarly, for the simulator, we only need to support the x86_64 architecture. From the perspective of package volume optimization, currently each of our libraries only needs to support the arm64 and x86_64 architectures, and there is no need to support other architectures.

Optimizing the instruction set architecture can reduce the size of packages uploaded to the AppStore, but it has no optimization effect on the size of packages downloaded by users. This is because Apple's App Thinning mechanism generates different compiled products based on the hardware architecture of different device models, so the packages downloaded from the AppStore by users of different devices will also be different.

3.2 Instruction set architecture settings

  • Architectures: Build Settings -> Architectures. The value is Standard architectures - $(ARCHS_STANDARD), which resolves to armv7 and arm64 when building for a device, and to x86_64, i386, and arm64 when building for the simulator.

  • Build Active Architecture Only: Build Settings -> Build Active Architecture Only. When the value is Yes, only the architecture of the current target device is compiled, usually arm64 for a real device and x86_64 for the simulator. When it is No, all supported architectures are compiled.

  • Excluded Architectures: Build Settings -> Excluded Architectures. The value lists the architectures to exclude. For example, setting it to arm64 means the product contains no arm64 slice.

picture

3.3 Remove useless architecture

Use the lipo command to extract the required architectures from the Mach-O file in the old framework, merge them, and then replace the old framework's Mach-O file with the merged binary.

  • Use the lipo -info command to view the instruction set architectures contained in the framework. As shown below, AbcSDK.framework supports x86_64, i386, arm64, and armv7.

lipo -info AbcSDK.framework/AbcSDK
Architectures in the fat file: AbcSDK.framework/AbcSDK are: x86_64 i386 arm64 armv7

  • Use lipo to extract the specified architectures. As shown below, the arm64 slice is extracted from AbcSDK.framework into AbcArm64, and the x86_64 slice into AbcArmX86_64.

lipo AbcSDK.framework/AbcSDK -thin arm64 -output AbcArm64
lipo AbcSDK.framework/AbcSDK -thin x86_64 -output AbcArmX86_64

Verify the architecture information of AbcArm64 and AbcArmX86_64:

lipo -info AbcArm64
Non-fat file: AbcArm64 is architecture: arm64
lipo -info AbcArmX86_64
Non-fat file: AbcArmX86_64 is architecture: x86_64

  • Use lipo to merge the architectures: AbcArm64 and AbcArmX86_64 are merged into a new file, newAbc. As expected, newAbc contains two architectures, x86_64 and arm64, which lipo -info confirms.

# Merge x86_64 and arm64
lipo -create AbcArm64 AbcArmX86_64 -output newAbc
lipo -info newAbc
Architectures in the fat file: newAbc are: x86_64 arm64

  • Replace the original binary. After the steps above, AbcSDK.framework, which had four instruction set architectures (x86_64, i386, arm64, and armv7), is slimmed down to a component that supports only x86_64 and arm64.

mv -f newAbc AbcSDK.framework/AbcSDK
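When many frameworks need the same treatment, the steps above can be wrapped in a function. This is a sketch under stated assumptions: the function name thin_framework and the LIPO override variable are hypothetical, the binary is assumed to contain both arm64 and x86_64 slices, and only the lipo options already shown above are used.

```shell
#!/bin/sh
# Keep only the arm64 and x86_64 slices of a framework binary.
# LIPO defaults to the real tool and can be overridden, e.g. with a stub in tests.
LIPO=${LIPO:-lipo}

# Usage: thin_framework AbcSDK.framework/AbcSDK
thin_framework() {
    binary=$1
    work=$(mktemp -d) || return 1
    # Extract the two slices to keep.
    "$LIPO" "$binary" -thin arm64 -output "$work/arm64" || return 1
    "$LIPO" "$binary" -thin x86_64 -output "$work/x86_64" || return 1
    # Merge them and replace the original binary in place.
    "$LIPO" -create "$work/arm64" "$work/x86_64" -output "$work/merged" || return 1
    mv -f "$work/merged" "$binary"
    rm -rf "$work"
}
```

Running `lipo -info` on the binary afterwards confirms which slices remain. Back up the framework first, since the function overwrites the binary in place.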

4. Xcode upgrade optimization

Apple has always been committed to improving developer productivity, releasing a new version of Xcode every year with many optimizations. It has also taken active steps on package size. For example, Xcode 14, released in 2022, not only adds new features but also has stronger parallel compilation capabilities, which significantly speed up project builds. Its package size optimizations are also quite noticeable.

To find the specific techniques behind the package size improvements in Xcode 14, I went through a lot of WWDC material and finally found the answer in the WWDC 2022 session "Improve app size and runtime performance" (https://developer.apple.com/videos/play/wwdc2022/110363/#). Xcode 14 optimizes package size in the following three ways:

  • Message send calls: each call is reduced from 12 bytes to 8 bytes;

  • Retain and release calls: each call is reduced from 8 bytes to 4 bytes;

  • Autorelease optimization: the mov instruction used in autorelease elision is removed, saving 4 bytes.
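To get a feel for what these per-call savings add up to, the arithmetic can be sketched as follows. The call-site counts are purely hypothetical placeholders, not measurements from Baidu APP; only the per-call byte figures come from the list above.

```shell
#!/bin/sh
# Hypothetical call-site counts; replace them with numbers measured from your own binary.
msg_send_calls=1000000
retain_release_calls=500000

# Per-call savings from the list above: 12 -> 8 bytes and 8 -> 4 bytes.
saved_bytes=$(( msg_send_calls * (12 - 8) + retain_release_calls * (8 - 4) ))
echo "estimated saving: $(( saved_bytes / 1024 )) KiB"
```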

5. Swift built-in dynamic library optimization

Since its release at WWDC 2014, the Swift language has developed significantly with Apple's backing, and its advantages represent the direction of iOS development. As adoption keeps growing, Swift is expected to eventually replace Objective-C as the preferred language for iOS development. Swift has already become a must-have language for major companies and applications: among the top 20 domestic apps by daily active users, all except Pinduoduo have adopted Swift.

However, as soon as you start developing in Swift, you will find that the Swift system libraries are added to the IPA package. Systems earlier than iOS 12.2 do not ship with the Swift system libraries, so Xcode embeds them when packaging the IPA. The Swift system libraries shown below can be found in the Frameworks directory of the IPA package.

picture

Furthermore, if the Watch app bundled with the app also uses Swift, the Watch app's dynamic library directory will contain the Swift system libraries as well, so the IPA package ends up with two embedded copies.

The optimization is very simple: raise the minimum system version supported by the app to iOS 12.2. Systems 12.2 and above ship with the Swift system libraries, so they no longer need to be embedded in the app. In Baidu APP's package size optimization practice, the IPA size dropped by more than 30M after this change, and the 30M of dynamic libraries no longer appeared in the IPA. After submitting to the App Store, the data in the App Store Connect backend showed the following benefits:

  • For iPhoneX and below models, such as iPhoneX, iPhone8, and iPhone7, the installation package size is reduced by 20M, and the download package size is reduced by 10M;

  • Models above iPhone

For Baidu APP, iPhoneX and below models account for less than 5%, but increasing the minimum version number supported by the APP will lead to the loss of some users. Taking these two factors into consideration, we decided not to use the optimization solution of Swift's built-in dynamic library.
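To check whether a given build still embeds the Swift system libraries described in this section, the unpacked app bundle can be searched for them. The sketch below assumes the libraries follow the usual libswift*.dylib naming; the function name embedded_swift_libs and the example path are hypothetical.

```shell
#!/bin/sh
# List Swift system libraries embedded anywhere inside an unpacked .app bundle,
# including a bundled Watch app's own Frameworks directory.
# Usage: embedded_swift_libs Payload/MyApp.app
embedded_swift_libs() {
    find "$1" -type f -name 'libswift*.dylib' | sort
}
```

An empty result after raising the minimum supported version to 12.2 indicates that the libraries are no longer packaged.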

6. Summary

Compared with code optimization, resource optimization, and image optimization, compiler optimization has the highest return on investment (ROI) in package size optimization. However, it also has the widest impact, because changing a library's compiler configuration affects all of that library's code, so the quality of the optimization must be strictly controlled. In Baidu APP's optimization practice, compiler-side optimization reduced the package size by 30M, with the benefit realized across all of its in-house libraries. The top 15 third-party SDKs by size also achieved this benefit.

This article systematically introduces Baidu APP's compiler optimization solutions, including GCC language compilation optimization, Swift compilation optimization, LTO optimization, stripping debugging symbols, stripping symbol tables, removing unreferenced code, Asset optimization, C++ virtual function optimization, and compiler-side slimming of third-party SDKs. It also introduces instruction set architecture optimization, Xcode upgrade optimization, and Swift built-in dynamic library optimization. We will introduce the principles and implementation of other optimizations in detail in future articles, so stay tuned.

——END——

References:

[1] GCC compiler configuration: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

[2] LTO usage: https://llvm.org/docs/LinkTimeOptimization.html

[3] Xcode Build Setting Reference: https://developer.apple.com/library/archive/documentation/DeveloperTools/Reference/XcodeBuildSettingRef/1-Build_Setting_Reference/build_setting_ref.html#//apple_ref/doc/uid/TP40003931-CH3-SW102

[4] What's New in Clang and LLVM: https://developer.apple.com/videos/play/wwdc2019/409/

[5] Xcode 14 introduction: https://developer.apple.com/documentation/xcode-release-notes/xcode-14-release-notes

[6] Improve app size and runtime performance: https://developer.apple.com/videos/play/wwdc2022/110363/#

Recommended reading:

Baidu search content HTAP table storage system

In the era of big models, what does the Baidu developer platform that “everyone can do AI” look like?

Hundreds of thousands of QPS, Baidu's stability guarantee practice for hot event search

Baidu search trillion-scale feature calculation system practice

Support OC code reconstruction practice through Python script (3): Adaptation of data item use module to access data path

Origin my.oschina.net/u/4939618/blog/10319259