The road to optimizing mini program compiler performance

Author | Marco

Introduction

The mini program compiler is the build module in Baidu's developer tools that converts mini program source code into runtime code. As the business grew, the old compiler suffered from slow compilation and high memory usage. We rebuilt the compiler at scale on a self-developed architecture and applied many optimizations, including multi-threading, code caching, and sourcemap handling, greatly improving performance and memory usage. This article introduces the design ideas and optimization methods of the new compiler, along with some techniques that also apply to general-purpose bundlers.

The full text is 6,629 words; the estimated reading time is 17 minutes.

01 Preface

A mini program compiler is needed at every stage of mini program development, preview, and release, so compiler performance directly affects developers' efficiency and their experience with the developer tools.

The old compiler (based on webpack4) built large projects very slowly and used a lot of memory, drawing complaints from developers. After extensive research and development, we adopted a fully self-developed architecture for the new compiler, optimized it heavily for building mini program projects, and essentially solved the old compiler's problems.

The following figure compares build times for several projects:

The new compiler is 2 to 7 times faster than the old one, supports features such as real-time compilation and hot reloading, uses less memory, and produces better build output.

The following sections describe the new compiler's development path: framework selection, how the new compiler works, and its performance and output optimization methods.

02 Framework selection

When designing the new compiler, the primary pain point was clear: performance. We prioritized solving performance problems, while also adopting other techniques and ideas that benefit the compiler.

The old version of the compiler based on webpack4 has the following problems:

  • Large projects build too slowly.

  • dev startup is slow and incremental compilation is slow: only loader caching is supported, and the uncached bundle phase is slow as well.

  • Extension development on webpack4 requires patching some modules to work, making maintenance difficult.

  • Parts of webpack's bundle process cannot be tailored to the mini program code structure, resulting in wasted build work.

Design goals for the new compiler:

  • Faster full compilation, eliminating webpack's wasted build steps.

  • Full caching to speed up both first-time and incremental compilation.

  • Real-time compilation, reducing dev startup and second-compilation time.

  • Multi-threaded compilation acceleration and page hot reloading.

  • An optimized output structure and smaller output size.

2.1 Mainstream build tools

Below are the mainstream front-end build tools we investigated. Each has its own applicable scenarios, strengths, and weaknesses.

When designing the new compiler architecture, the design concepts and technical features of these tools were worth drawing on.

Webpack build process:

Webpack advantages: complete features, an active community, strong configurability, and strong extensibility.

Webpack disadvantages: complex configuration, slow builds, and difficult secondary development.

Parcel building process:

Parcel advantages: zero configuration, fast builds, and native support for multi-threading and full caching; threads share data through lmdb, avoiding cross-thread communication overhead.

Parcel disadvantages: a small ecosystem, limited customization, heavy use of native Node modules, and poor compatibility.

Vite build process:

Vite advantages: relatively simple configuration, on-demand compilation, fast startup, and a good dev experience.

Vite disadvantages: a small ecosystem, and two separate build pipelines for dev and release.

Other mini program platforms:

  • WeChat builds mini programs with gulp and C++ modules and pre-builds npm modules, giving good performance and development experience.

  • Alipay builds mini programs with webpack and uses esbuild to speed up code compression.

  • The Douyin mini program uses a self-developed compiler with a relatively simple build process.

2.2 New version of compiler

When designing the new compiler framework, we drew on the workflows of mainstream bundlers and, given the characteristics of mini program code, decided not to build a general-purpose bundler but to focus on optimizing bundling performance for mini programs.

In the end we chose a self-developed compiler and did a great deal of optimization work. The new compiler's optimizations include the following:

1. Supports multiple Compilers working together, decoupling the builds of different project types such as dynamic library development.

2. Full-process caching in the compilation phase saves more than 90% of second-build time.

3. dev builds use on-demand compilation by default, improving single-page compilation performance.

4. Supports multi-threaded compilation with babel and swc, speeding up full compilation by 2 to 7 times.

5. Adopts the newer index map sourcemap format, removing unnecessary parsing and merging and greatly reducing bundle-phase time.

6. Applies build-time marking optimizations to js, css, and swan template compilation to reduce bundle merge time.

7. For js compression and obfuscation in the preview and release stages, terser and esbuild are used in parallel: esbuild quickly generates preview packages, while terser guarantees the compression rate for release packages.

Judging from the results, the new compiler significantly improves speed, resource usage, and maintainability over the old one.

03 How the new version of the compiler works

The new compiler's processing flow is similar to Parcel's: the Compiler drives the overall flow, and Processors perform code conversion. The basic flow is as follows:

Several important modules:

  • CompileEntry is the compiler's entry module, handling cli communication, dev server communication, command invocation, and so on.

  • CompileManager is the compilation manager, responsible for downloading and managing dependent resources and coordinating multiple Compilers.

  • Compiler is the compiler module that turns project source code into runtime code; a single project build may involve multiple Compilers.

  • Processor is a unit processor that handles an individual compilation task, such as code conversion or code merging.

Note: a Mini Program App project uses 1 Compiler; dynamic library and dynamic extension projects use 2.

3.1 The Compiler

The Compiler builds a single mini program project, compiling the developer's source code into runnable code.

Responsibilities:

1. Create a running context and provide config, fs file processing, watcher monitoring, logger and other modules for use by the Processor.

2. Full compilation, and second compilation when files change; a second compilation follows the same full-compilation flow but mostly reuses cached results.

3. Manage, schedule, and run the Processor processing unit.

4. Maintain Processor dependencies and result cache.

Features:

1. Implements full-process caching, writing each Processor's inputs and outputs to the cache. With the cache, second-compilation time drops by 90%.

2. Supports on-demand compilation; on-demand single-page compilation, incremental compilation, and full compilation all follow the same Processor flow.

3. Automatically tracks cache-input dependencies through a Proxy, so there is no need to hand-craft a cache hash for each Processor, which causes fewer bugs than in webpack or parcel.

4. Maintains only Processor dependencies, not a ModuleGraph, simplifying processing.

For full-process caching, every bundler has its own implementation, but the basic principle is the same: generate a unique hash for a processing unit from its current inputs and dependencies; if the hashes match, the results match.

Because webpack and parcel maintain a ModuleGraph, their cache computation and reuse are more complex. The mini program compiler computes the cache only from Processor inputs and call dependencies.

3.2 The Processor

Processor has the following characteristics:

1. Given the same inputs, a Processor is guaranteed to produce the same output, and both inputs and outputs must be serializable to json; this is what enables full Processor caching.

2. A Processor's uri is its build ID; within a single build, the same ID yields the same result. For example, when processing the app.js file the uri is js:app.js. This unifies how Processors address resources.

3. Processors can call each other: processWith invokes another Processor and continues, while processWithResult invokes one and waits for its result.

Note: the inputs here include the uri, the app config, and contextFreeData.
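A minimal sketch of this Processor contract follows. The compiler's real API is not public; `processWithResult` and the `js:app.js` uri scheme come from the text, and everything else (the `Compiler` class shape, `register`) is an assumption for illustration:

```javascript
// Hypothetical Compiler that routes uris like "js:app.js" to processors
// by scheme, caches JSON-serializable results, and lets processors call
// one another via processWithResult.
class Compiler {
  constructor() {
    this.processors = new Map(); // scheme -> handler function
    this.cache = new Map();      // uri + serialized input -> result
  }
  register(scheme, handler) {
    this.processors.set(scheme, handler);
  }
  // Invoke a processor and wait for its result (the text's processWithResult).
  async processWithResult(uri, input) {
    const scheme = uri.split(':')[0];
    const key = uri + JSON.stringify(input);
    if (this.cache.has(key)) return this.cache.get(key);
    const handler = this.processors.get(scheme);
    const result = await handler(uri, input, this);
    // Inputs and outputs must survive JSON serialization for full caching.
    this.cache.set(key, JSON.parse(JSON.stringify(result)));
    return result;
  }
}

// A toy "js" processor: trims the source as a stand-in for real transforms.
const compiler = new Compiler();
compiler.register('js', async (uri, input) => ({ uri, code: input.code.trim() }));
```

With the same uri and input, a second call hits the cache and returns an identical result, which is the property the full-process cache relies on.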

Several commonly used Processors:

1. The JS Processor converts es6 code into es5 code and is the most time-consuming module.

2. The Swan Processor converts swan template code into view-layer js code.

3. The Css Processor uses postcss for unit conversion, dependency collection, and other css work.

4. The Bundle Processor merges the files produced by the preceding transformers according to the bundle algorithm and emits the results.

Processor workflow:

A Processor run goes through transform -> bundle. In a mini program, the js, css, and swan template bundles are independent and can be processed in parallel. This differs from webpack's processing model and resembles parcel's pipeline.
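The parallel bundle step can be sketched as follows; `bundleJs`, `bundleCss`, and `bundleSwan` are illustrative stand-ins for the real Bundle Processors, which the source does not detail:

```javascript
// Because the js, css, and swan bundles are independent, they can run
// concurrently instead of through a single chunk graph as in webpack.
async function bundleAll(modules) {
  const [js, css, swan] = await Promise.all([
    bundleJs(modules.filter((m) => m.type === 'js')),
    bundleCss(modules.filter((m) => m.type === 'css')),
    bundleSwan(modules.filter((m) => m.type === 'swan')),
  ]);
  return { js, css, swan };
}

// Stub bundlers: concatenate module code (real ones apply a bundle algorithm).
async function bundleJs(mods) { return mods.map((m) => m.code).join(';\n'); }
async function bundleCss(mods) { return mods.map((m) => m.code).join('\n'); }
async function bundleSwan(mods) { return mods.map((m) => m.code).join('\n'); }
```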

3.3 Performance and output optimization methods

3.3.1 Multi-core compilation optimization

Since Node's worker threads initialize faster and communicate more efficiently than child processes, the new compiler uses multi-threading for multi-core optimization.

There are 2 options for multi-threaded compilation:

  • Option 1: multi-thread scheduling at the Processor level. Because Processors call each other, the actual handling would be very complex and would incur communication costs.

  • The old compiler implemented a workerthread-loader on top of webpack, with limited gains (10%~15%).

  • Parcel's better solution builds on a shared lmdb cache, eliminating inter-thread communication while keeping reads and writes efficient.

  • Option 2: multi-thread scheduling only for js transpilation, incurring just two communication hops.

  • Use jest-worker with babel transform for multi-threaded js transpilation, or use swc's multi-threading.

Most build time is spent in js transpilation (js pulls in many node_modules dependencies, all of which must be converted), while css and swan module conversion takes far less time.

We therefore chose option 2, multi-threading only the js transpilation: the processing stays simple and the gains are better. The overall improvement:

  • With jest-worker multi-threaded babel transpilation, 4 threads more than double the speed.

  • With swc transpilation, 4 threads deliver more than a 4x speedup.

JS Processor multi-threading:

Inputs:

uri: the Processor's build ID

contextFreeData: data that is immutable within a single build, such as configuration items from app.json

context args: global parameters, such as optimization experiment switches and the multi-thread switch

The js conversion defines a unified transformer interface. On top of it, three backends are implemented (single-threaded babel, multi-threaded babel, and swc), and the backend can be switched at any time.
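The unified interface can be sketched as a simple strategy pattern. The backend names follow the text; the interface shape and `fakeTransform` body are assumptions, since the actual transformer signatures are not shown:

```javascript
// All three backends expose the same transform(code) shape, so callers
// never care which one is active.
const transformers = {
  'babel': { transform: async (code) => fakeTransform(code) },
  'babel-worker': { transform: async (code) => fakeTransform(code) },
  'swc': { transform: async (code) => fakeTransform(code) },
};

// Pick a backend per environment: swc when enabled, otherwise babel
// with or without worker threads.
function pickTransformer({ multiThread, useSwc }) {
  if (useSwc) return transformers['swc'];
  return multiThread ? transformers['babel-worker'] : transformers['babel'];
}

// Stand-in for a real es6 -> es5 transform.
function fakeTransform(code) {
  return { code: code.replace(/\bconst\b/g, 'var') };
}
```

Because every backend satisfies the same interface, switching modes (as the environments below do) is a one-line configuration change rather than a code change.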

This allows flexible settings per compilation environment:

1. In the developer tools, developers can switch between multi-threaded and swc compilation modes according to their machine configuration to improve efficiency.

2. The cloud compilation pipeline enables multi-threaded compilation by default to improve performance.

3. webIDE defaults to a single thread to reduce resource consumption.

3.3.2 SWC compilation optimization

Compared with the old compiler, the new compiler's multi-threading mode roughly doubled speed, but during dev the first compilation of some large project pages was still slow, taking more than 10 seconds, mostly in the js transform.

swc is now essentially mature for js transpilation and speeds it up by more than 4x in most scenarios. We therefore added swc multi-threaded transpilation support, bringing the first compilation of large project pages under 5 seconds.

Two swc plug-ins need to be written to adapt to swc translation:

  • @swanide/swc-require-rename extracts module path information from require/import/export, enabling later analysis of module dependencies in js.

  • @swanide/swc-web-debug instruments js code to support breakpoint debugging on real devices.

swc compilation brings a huge performance gain, but some problems surfaced in use:

1. swc has a memory leak: too many full compilations during the dev stage drive memory usage up, requiring a manual compiler restart.

2. The swc plugin API surface is small, and some things that are easy with babel are hard in swc.

3. Because swc plugins are written in Rust, a plugin built against one @swc/core version cannot be used with another, and plugins must be built per platform, which complicates deployment.

In practice, scenarios that swc cannot handle well are downgraded to babel.
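The downgrade path can be sketched as a simple wrapper; the transform functions are injected stubs here, standing in for real @swc/core and @babel/core calls:

```javascript
// Try swc first; if it throws (unsupported syntax, plugin limitation, ...),
// fall back to babel. The `backend` field records which path was taken.
async function transformWithFallback(code, swcTransform, babelTransform) {
  try {
    return { backend: 'swc', ...(await swcTransform(code)) };
  } catch (err) {
    return { backend: 'babel', ...(await babelTransform(code)) };
  }
}
```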

3.3.3 Code compression and runtime caching

In the dev stage, compiled code is left uncompressed and runs in the simulator. In the preview and release stages, package-size limits make code compression necessary to shrink the output.

There are three optional code compression tools:

1. terser: the highest compression rate and smallest output, but the slowest.

2. swc: fast compression, but incomplete mangle support and a poor compression rate.

3. esbuild: the fastest (more than 10x faster than terser), with mangle support, but a compression rate short of terser's.

Finally, after comparison and consideration, the following compression scheme was selected:

1. The preview phase does not need sourcemaps, so drop them and compress with esbuild to speed up previews (a big win for automatic preview scenarios).

2. The release phase compresses with multi-threaded terser and keeps the sourcemaps.
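The stage-dependent policy above can be summarized in code. This is a sketch of the decision only; the option names are illustrative and the actual esbuild/terser invocations are omitted:

```javascript
// Compression policy per build stage, following the trade-offs above:
// dev skips compression, preview favors speed, release favors size.
function compressionPlan(stage) {
  switch (stage) {
    case 'dev':
      return { minifier: null, sourcemap: true };             // uncompressed, debuggable
    case 'preview':
      return { minifier: 'esbuild', sourcemap: false };       // fast, no sourcemap needed
    case 'release':
      return { minifier: 'terser', sourcemap: true, threads: true }; // best compression
    default:
      throw new Error(`unknown stage: ${stage}`);
  }
}
```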

Runtime caching keeps the build's intermediate results in memory, including Processor outputs and code-compression results, saving most of the rebuild time on a second build. Because the cache holds only strings and json objects, it saves 40% to 60% of memory compared with the old webpack-based compiler, an acceptable level of memory usage.

3.3.4 Swan template processing optimization

The old swan template handling used swan-loader for conversion. Because import scoping was not handled properly in its design, the <template> tag and filter functions could only be inlined into the page code; pages that use template and filter heavily ended up with very large generated code.

The new compiler corrects the import scoping, changes template and filter generation from inlining to require references, and merges the code in the bundle stage so that identical modules are reused, fixing a long-standing problem.

New compiler swan template processing flow:

After a single swan file is processed by a Processor, the possible outputs are:

  • component module, used to generate pages and custom components

  • template module

  • filter module: filter functions and sjs filter functions

  • transformed intermediate document code

Converting swan templates into different types of js modules and maintaining their dependencies allows finer-grained control during later code merging.

For historical reasons, when an import/include contains sjs or template references, the template module cannot be generated directly and must be generated in the final entry template. The new compiler also offers a static template compilation option that strictly limits import scope and generates template module code directly; for mini program projects generated by taro, it cuts output size by about 30%.

3.3.5 Sourcemap optimization

Since the compiler must support js code debugging and runtime error tracking, sourcemaps need to be generated in both the dev and release phases.

When webpack generates code, sourcemaps must be merged and recomputed. For larger projects the merge takes a long time, and sourcemaps are recomputed on every recompile.

During research we found that browser devtools support the index map form of the sourcemap protocol very well. The new compiler optimizes sourcemap merging based on the index map format: the previous multi-file merge computation becomes a matter of computing offsets and splicing content. As a result, js bundle time dropped from seconds or tens of seconds to a fixed cost of under 3 seconds.
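An index map avoids re-merging mappings by recording each concatenated chunk's line offset and embedding the chunks' original maps unchanged. A minimal sketch (file names and chunk shape are illustrative; the `version`/`sections`/`offset` fields follow the sourcemap spec):

```javascript
// Build an index map for a concatenated bundle: each section points at a
// chunk's unmodified sourcemap plus the line where that chunk starts.
function buildIndexMap(chunks) {
  let line = 0;
  const sections = [];
  for (const chunk of chunks) {
    sections.push({ offset: { line, column: 0 }, map: chunk.map });
    line += (chunk.code.match(/\n/g) || []).length; // lines this chunk occupies
  }
  return { version: 3, sections };
}

const chunks = [
  { code: 'var a = 1;\n', map: { version: 3, sources: ['a.js'], mappings: 'AAAA' } },
  { code: 'var b = 2;\n', map: { version: 3, sources: ['b.js'], mappings: 'AAAA' } },
];
const indexMap = buildIndexMap(chunks);
```

The cost is one pass to count lines and splice JSON, independent of how many mappings each module's map contains, which is why bundle-phase sourcemap time becomes roughly constant.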

An interesting aside: vscode's js-debugger did not support index map debugging until June 2022, even though the index map format dates back to 2011; Microsoft was a bit slow here.

3.3.6 Follow-up work

After development was completed, the new compiler was rolled out progressively:

In the first stage, the old and new compilers coexist in the developer tools: dev and preview use the new compiler, and release uses the old one.

In the second stage, all internal pipeline previews and releases use the new compiler.

In the third stage, the developer tools switch entirely to the new compiler.

Some minor compatibility issues still surfaced after the new compiler went live; they need to be exposed as early as possible before the full switchover.

The new compiler already includes a lot of optimization for mini program projects, but some work remains, including:

HMR hot reload: requires interface adaptation between the runtime framework and the developer tools, so reaching the expected behavior takes lengthy debugging.

Tree-shaking: for es6 modules, dead code can be eliminated during the transform stage.

Scope hoisting: theoretically feasible; the code-size reduction still needs verification.

Because the new compiler must remain fully compatible with the old compiler's build output, there is still room for optimization in the bundle packaging scenario; future work can optimize the packaged output further in conjunction with the runtime framework.

04 Summary

The new compiler adopts a self-developed bundling solution. Compared with the old webpack-based compiler it achieves huge performance gains, completely solving the slow-compilation and high-resource-usage problems, and it also holds a clear performance advantage over competitors' compilers.

Some of the optimizations introduced here, such as swc transpilation, esbuild compression, and the sourcemap optimization, also apply to other front-end builds and speed them up.

Everyone on the new compiler project worked hard and contributed many great ideas, and most of the problems we hit were solved effectively. We will keep pushing on both performance and output optimization to continually improve developer experience and runtime efficiency.

——END——

Recommended reading

Optimization practice of Baidu APP iOS package size 50M (6) Useless method cleaning

Real-time interception and problem distribution strategy based on abnormal online scenarios

Extremely optimized SSD parallel read scheduling

The practice of AI text creation and publishing on Baidu App

DeeTune: Design and application of Baidu network framework based on eBPF

Origin my.oschina.net/u/4939618/blog/10114374