Centralized Dependency Resolution Management - Cloud Analysis

Author: Qian Jiawei, R&D Engineer, Product R&D and Engineering Architecture Department-Client Infrastructure-App Infra-DevOps-Developer Tools

Foreword

CocoaPods cloud analysis is one of a series of cloud infrastructure capabilities provided by the Developer Tools team under ByteDance's Client Infrastructure department. The Developer Tools team is committed to building the next generation of mobile cloud infrastructure. The team uses cloud IDE technology, distributed builds, compile and link optimization, and related technologies to improve the quality, cost, security, efficiency, and experience of the company's business development and delivery process.

1. Background

Under the componentized iOS development model, CocoaPods has become the standard dependency management tool in the iOS industry. However, as business capabilities keep expanding and iterating, the number of components keeps growing. The result is a sharp increase in the complexity of App projects, a serious decline in the efficiency of dependency management, and even potential stability problems. To manage the component dependencies of large-scale projects faster and more reliably, the iOS build team created a centralized dependency management service, cloud dependency analysis. It consolidates the dependency management process at the toolchain level, speeds up dependency resolution, and aggregates failures in one place.

2. What is Cloud Dependency Analysis?

For iOS projects managed with CocoaPods, every pod install must first synchronize the component index (the Spec repository) to the local machine, generally by cloning a git repository. It then reads the Podfile, the lockfile, and other configuration files, and proceeds through dependency analysis, dependency download, project integration, and the remaining steps.

Cloud Analysis is a cloud service built on ByteDance's self-developed product library platform. The toolchain uploads local project build materials, the service quickly returns the dependency analysis result, and iOS project dependencies are managed centrally. The service relies on the product library for the full component index. During environment preparation, the local Cloud Analysis tool collects the local project materials and uploads them to the cloud as a dependency resolution task. The cloud applies a series of optimizations, together with server performance, to return a resolution result quickly. Once the result is received locally, the usual dependency download and project integration steps follow.

Adopting Cloud Analysis is also very easy: no configuration files need to be added and the existing R&D workflow does not change. The only steps are to add the Cloud Analysis RubyGem plugin to the CocoaPods toolchain and to pass a switch parameter to the pod install command to enable the optimization.

3. How to speed up the resolution

3.1 Product library (full component index information)

In the CocoaPods-based iOS development system, product management is fairly coarse. Separate git repositories serve directly as index repositories for build products (podspec files) and double as product repositories. As iOS projects grow more complex, the amount of file information in these git repositories grows, which makes component index queries difficult and repository synchronization slow.

The BitNest product library is the company's self-developed mobile product management system, used to manage the build products generated during continuous integration. It centrally manages the podspec sources scattered across git repositories and can pull and query podspec information quickly through a complete set of CLI commands. With the product library, the cloud analysis service has real-time access in the cloud to the complete podspec source information. A CocoaPods task no longer needs to update the podspec sources first, and it no longer fails to find the latest version of a component's podspec because a local source was not updated in time.

3.2 Cache mechanism

Before introducing the caching mechanism, here is a brief overview of how dependency analysis runs in pod install. On the first execution (ignoring the lockfile), CocoaPods reads the plugin, source, target, pod, and other declarations from the Podfile through its DSL and creates the corresponding objects, which completes the preparation phase. Within each Target object, each pod becomes a Dependency object with its own Requirement object. All Dependency objects of all Target objects are pushed onto a stack one by one, and a dependency Graph is created.

Each Dependency object then searches the Source repository named in its requirement for the matching pod. If the requirement names no repository, the public Sources declared in the Podfile are traversed instead. Once the pod is found, a version list is built, all versions that satisfy the requirement are selected from it, and the corresponding podspec files are read. The resolver creates a new Dependency for each implicit pod declared in the Spec object and adds it to the analysis stack and the Graph. If, while traversing the Graph, a Spec version turns out not to satisfy another dependency of the same name, the resolver pops the stack and rewinds the Graph. This repeats until every Dependency has been matched to a Spec object, at which point the analysis is complete.

As this shows, CocoaPods dependency management repeatedly creates the same objects and repeats the same sorting and searching, which greatly reduces R&D efficiency. Now imagine that the objects a CocoaPods task needs are always ready: whenever a task request arrives, dependency analysis starts immediately and the result comes back quickly.

The cloud analysis service centralizes all CocoaPods dependency management tasks and builds an object cache for the repetitive work. Objects are cached lazily as they are first created, so the next task can enter dependency resolution immediately.
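
The pop-and-rewind behaviour of dependency resolution can be illustrated with a toy backtracking resolver. This is a Python sketch written for this article's readers, not CocoaPods' Ruby code and not Molinillo's actual algorithm. The INDEX data, the Requirement format (a pod name plus a set of acceptable versions), and the resolve function are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Requirement = Tuple[str, FrozenSet[str]]  # (pod name, acceptable versions)

# Hypothetical index: pod name -> {version: dependencies}, newest version first.
INDEX: Dict[str, Dict[str, List[Requirement]]] = {
    "A": {"2.0": [("C", frozenset({"2.0"}))], "1.0": [("C", frozenset({"1.0"}))]},
    "B": {"1.0": [("C", frozenset({"1.0"}))]},
    "C": {"2.0": [], "1.0": []},
}


def resolve(pending: List[Requirement], chosen: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return {pod: version} satisfying every requirement, or None."""
    if not pending:
        return chosen
    (name, allowed), rest = pending[0], pending[1:]
    if name in chosen:
        # Already activated: a conflict here fails this branch (backtrack).
        return resolve(rest, chosen) if chosen[name] in allowed else None
    for version, deps in INDEX.get(name, {}).items():
        if version not in allowed:
            continue
        # Activate this version and queue its own (implicit) dependencies.
        result = resolve(rest + deps, {**chosen, name: version})
        if result is not None:
            return result
    return None  # no candidate worked; the caller tries its next version


wanted = [("A", frozenset({"1.0", "2.0"})), ("B", frozenset({"1.0"}))]
resolved = resolve(wanted, {})
print(resolved)
```

In this toy data the resolver first tries A 2.0, finds that its dependency on C 2.0 conflicts with B's dependency on C 1.0, rewinds, and settles on A 1.0. Every retry re-reads version lists and specs, which is the repeated work the object cache is meant to avoid.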

3.2.1 Sorted Version cache

When analyzing each pod, CocoaPods needs the latest version that satisfies the dependency, so it creates a Version object for every version number in the source repository and sorts them. At present, the number of versions of most internal products has reached the tens of thousands. If no source is specified, both the binary versions and the source-code versions are sorted and read before the latest version that meets the requirement is found. Because component version numbers are segmented with "." and "-", most component versions have four or five fields or more. Each comparison during sorting therefore has to walk four or more fields, across tens of thousands of versions, which multiplies the cost of sorting and greatly increases the time taken.

To obtain an ordered version list faster, the product library service maintains, for every pod component, a version file sorted from newest to oldest. Each time a new pod version is added, the product library inserts it into the file; when a version is deleted, the corresponding entry is removed.

With ordered version files available, the main purpose of the version cache in Cloud Analysis is to keep the segmented version information in each Version object, so that it can quickly decide whether a version satisfies a dependency's requirement. The version cache speeds up the dependency management process by about 10-12 seconds.

When there is no version cache, Cloud Analysis first reads the version file and obtains an ordered version list directly. If the length of that list differs from the number of entries in the component's version directory in the source, the version list is considered wrong and Cloud Analysis falls back to the original method, to guarantee a correct analysis. On a cache hit, it still checks whether the length of the cached version list equals the number of entries in the pod's version directory. If they differ (a new version exists but the cache has not been refreshed), the differing versions are found from the version list and the cache is corrected.
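
Keeping a version file ordered on every add and delete, rather than re-sorting on every read, might look like the following Python sketch. It is an in-memory stand-in, not the product library's implementation. The SortedVersionFile class is hypothetical, and sort_key assumes purely numeric, dot-separated versions, which is simpler than the real version format.

```python
import bisect
from typing import Iterable, List, Tuple


def sort_key(version: str) -> Tuple[int, ...]:
    """Simplified key: purely numeric, dot-separated versions only."""
    return tuple(int(part) for part in version.split("."))


class SortedVersionFile:
    """In-memory stand-in for the per-pod version file."""

    def __init__(self, versions: Iterable[str] = ()) -> None:
        # Stored ascending so bisect can be used; read back newest first.
        self._entries: List[Tuple[Tuple[int, ...], str]] = sorted(
            (sort_key(v), v) for v in versions
        )

    def add(self, version: str) -> None:
        entry = (sort_key(version), version)
        i = bisect.bisect_left(self._entries, entry)
        if i == len(self._entries) or self._entries[i] != entry:
            self._entries.insert(i, entry)  # binary search for the slot, no re-sort

    def remove(self, version: str) -> None:
        entry = (sort_key(version), version)
        i = bisect.bisect_left(self._entries, entry)
        if i < len(self._entries) and self._entries[i] == entry:
            del self._entries[i]

    def newest_first(self) -> List[str]:
        return [v for _, v in reversed(self._entries)]


versions = SortedVersionFile(["1.2.0", "1.10.0"])
versions.add("1.3.0")
print(versions.newest_first())
```

Readers of the list never pay for a sort; the ordering cost is spread over the comparatively rare publish and delete operations.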
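
A minimal Python sketch of caching segmented versions so that each version string is split only once. The segments, compare, and satisfies functions are hypothetical, and the ordering rule (a textual segment sorts below a numeric one, missing segments count as 0) is a simplification loosely modeled on RubyGems-style ordering, not the exact behaviour of CocoaPods' Version class.

```python
import re
from functools import lru_cache
from itertools import zip_longest
from typing import Tuple, Union

Segment = Union[int, str]


@lru_cache(maxsize=None)
def segments(version: str) -> Tuple[Segment, ...]:
    """Split on '.' and '-' once; later calls for the same string hit the cache."""
    return tuple(int(p) if p.isdigit() else p for p in re.split(r"[.-]", version))


def compare(a: str, b: str) -> int:
    """Return -1, 0 or 1. Missing trailing segments are treated as 0."""
    for x, y in zip_longest(segments(a), segments(b), fillvalue=0):
        if x == y:
            continue
        if isinstance(x, int) and isinstance(y, int):
            return -1 if x < y else 1
        if isinstance(x, str) and isinstance(y, str):
            return -1 if x < y else 1
        # Mixed types: a textual (pre-release style) segment sorts lower.
        return -1 if isinstance(x, str) else 1
    return 0


def satisfies(version: str, operator: str, required: str) -> bool:
    c = compare(version, required)
    return {"=": c == 0, ">=": c >= 0, ">": c > 0, "<=": c <= 0, "<": c < 0}[operator]


print(satisfies("2.3.1.4", ">=", "2.3"))
print(segments.cache_info())
```

With tens of thousands of versions per pod and four or more fields per version, reusing the parsed segments across tasks avoids repeating the same string splitting in every comparison.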
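
The length check and fallback can be sketched in Python as follows. The ordered_versions function and its callback parameters are hypothetical names. For brevity, a stale cache is rebuilt from the version file as a whole, whereas the service described here patches in only the differing versions.

```python
from typing import Callable, Dict, List


def ordered_versions(
    pod: str,
    cache: Dict[str, List[str]],
    read_version_file: Callable[[str], List[str]],
    list_version_dirs: Callable[[str], List[str]],
    sort_versions: Callable[[List[str]], List[str]],
) -> List[str]:
    on_disk = list_version_dirs(pod)  # unordered directory names in the source
    cached = cache.get(pod)
    if cached is not None and len(cached) == len(on_disk):
        return cached  # cache hit and nothing was added or removed
    listed = read_version_file(pod)  # pre-sorted, newest first
    if len(listed) != len(on_disk):
        # The version file is out of step with the source: use the original method.
        listed = sort_versions(on_disk)
    cache[pod] = listed  # fill the cache, or correct a stale one
    return listed


def numeric_sort(versions: List[str]) -> List[str]:
    return sorted(versions, key=lambda v: [int(p) for p in v.split(".")], reverse=True)


cache: Dict[str, List[str]] = {}
version_file = ["1.10.0", "1.2.0", "1.0.0"]
directories = ["1.0.0", "1.2.0", "1.10.0"]
first = ordered_versions("Demo", cache, lambda p: version_file, lambda p: directories, numeric_sort)
print(first)
```

Comparing lengths is a cheap consistency check: it avoids reading or sorting anything when the cache is current, at the price of not detecting a change that leaves the count unchanged.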

3.2.2 Spec Object Cache

Once CocoaPods has found, among the sorted versions, those that satisfy the dependency requirement, it reads the podspec of every satisfying version and traverses them during dependency resolution. If no specific version is given, the podspec files of all versions are read; if no specific source is given, the pod is read from every source that contains it. Reading 10,000 podspec files takes about 30 seconds (depending on the disk).

Cloud Analysis caches the contents of every podspec file that an analysis task reads from disk. When a later task needs a Spec object, it can be fetched directly by three fields: source, pod_name, and version.

At the same time, Spec correctness must be guaranteed, in particular against a Spec whose content changes without a version change. The Spec object cache is organized as a multi-dimensional array. By checking the modification time of the podspec file, the cached podspec content is refreshed to the latest commit. This ensures that the checksum computed in the cloud equals the one computed by a local dependency analysis, which keeps cloud dependency analysis correct. In the future, statistics such as the hit count and expiration time of Spec objects will be added to implement a cleanup strategy for the Spec cache.
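
A minimal Python sketch of a lazily filled Spec cache keyed by (source, pod_name, version) and validated by file modification time. The SpecCache class is hypothetical. A flat dictionary stands in for the multi-dimensional array, and raw file content stands in for a parsed Spec object.

```python
import os
import tempfile
from typing import Dict, Tuple

Key = Tuple[str, str, str]  # (source, pod_name, version)


class SpecCache:
    def __init__(self) -> None:
        self._entries: Dict[Key, Tuple[int, str]] = {}  # key -> (mtime_ns, content)
        self.hits = 0

    def get(self, source: str, pod_name: str, version: str, path: str) -> str:
        key = (source, pod_name, version)
        mtime = os.stat(path).st_mtime_ns
        entry = self._entries.get(key)
        if entry is not None and entry[0] == mtime:
            self.hits += 1
            return entry[1]  # unchanged on disk: no file read needed
        # First use, or the podspec changed without a version bump: reload.
        with open(path, encoding="utf-8") as handle:
            content = handle.read()
        self._entries[key] = (mtime, content)
        return content


with tempfile.TemporaryDirectory() as repo:
    path = os.path.join(repo, "Demo.podspec")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("spec v1")
    cache = SpecCache()
    first = cache.get("internal", "Demo", "1.0.0", path)
    second = cache.get("internal", "Demo", "1.0.0", path)  # served from memory
    # Same version, new content; bump the mtime explicitly so the change is visible.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("spec v2")
    stat = os.stat(path)
    os.utime(path, ns=(stat.st_atime_ns, stat.st_mtime_ns + 1_000_000_000))
    third = cache.get("internal", "Demo", "1.0.0", path)
```

The stat call is far cheaper than reading and parsing a podspec, so the check keeps the cache trustworthy without giving up most of the saving.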

3.2.3 Cache reuse

Cloud Analysis also caches analysis results, so that an identical analysis task can reuse them directly next time. After the cloud receives a set of materials, it computes a global hash and segmented hashes over them, and caches the complete analysis result and the analysis result graph (Graph) respectively. For the next analysis task, if the materials are exactly the same, a usable complete analysis result is returned directly. If they do not match, a first-level platform information key is computed from the target, platform, and similar information to identify the specific app. A hash is then computed for each component dependency in turn, which yields a second-level hash array key whose value is an analysis result graph (Graph). The hash array key is matched fuzzily: a similar graph with the same number of dependencies is selected and used to replace the locked_dependencies in the materials, which speeds up the analysis. Fuzzy matching does have limits: it cannot accelerate analysis tasks whose uploaded materials include a lockfile.
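
The two-level lookup can be sketched in Python as follows. Everything here is an assumption made for illustration: the shape of the material dictionary, the ResultCache class, and in particular the similarity rule, which picks the cached graph with the same number of dependencies and the largest overlap of dependency hashes. The source does not specify how similarity is scored.

```python
import hashlib
import json
from typing import Any, Dict, FrozenSet, List, Optional, Tuple


def digest(value: Any) -> str:
    return hashlib.sha256(json.dumps(value, sort_keys=True).encode("utf-8")).hexdigest()


def app_key(material: dict) -> str:
    """First-level key: identifies the app from target and platform information."""
    return digest([material["target"], material["platform"]])


def dependency_hashes(material: dict) -> FrozenSet[str]:
    """Second-level key: one hash per declared dependency."""
    return frozenset(digest(dep) for dep in material["dependencies"])


class ResultCache:
    def __init__(self) -> None:
        self.full: Dict[str, Any] = {}  # global hash -> complete result
        self.graphs: Dict[str, List[Tuple[FrozenSet[str], Any]]] = {}

    def store(self, material: dict, result: Any, graph: Any) -> None:
        self.full[digest(material)] = result
        entry = (dependency_hashes(material), graph)
        self.graphs.setdefault(app_key(material), []).append(entry)

    def lookup(self, material: dict) -> Tuple[str, Optional[Any]]:
        key = digest(material)
        if key in self.full:
            return "exact", self.full[key]
        wanted = dependency_hashes(material)
        best, best_overlap = None, 0
        for hashes, graph in self.graphs.get(app_key(material), []):
            overlap = len(wanted & hashes)
            if len(hashes) == len(wanted) and overlap > best_overlap:
                best, best_overlap = graph, overlap
        return ("similar", best) if best is not None else ("miss", None)


base = {"target": "App", "platform": "ios 12.0",
        "dependencies": [["A", "~> 1.0"], ["B", "= 2.0"]]}
cache = ResultCache()
cache.store(base, result={"App": ["A 1.4", "B 2.0"]}, graph="graph-1")
changed = {**base, "dependencies": [["A", "~> 1.0"], ["B", "= 2.1"]]}
print(cache.lookup(base)[0], cache.lookup(changed)[0])
```

An exact hit returns the stored result as is. A similar hit only supplies a starting graph, so the resolver still runs, but most dependencies are already pinned.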

3.3 Material pruning

Cloud Analysis converts CocoaPods objects into byte streams for transmission. The uploaded materials and the returned analysis results are as follows:

1. Upload materials

The Cloud Analysis toolchain uploads the Podfile object, the Molinillo Graph object generated from the lockfile, the specified Source objects, the plugin adapter, and the Spec objects of all external sources (specifically, pre-release objects specified by git, path, or podspec). In practice, Cloud Analysis does not need all the information in these local objects, so they can be pruned. For example, the Podfile object only needs its list of target_definitions; the Molinillo Graph object only needs the nodes for the pods and not the log that records node operations; the Source object only needs name and repo_dir; and so on. In addition, some resolution optimization plugins need to transmit extra Config objects through the plugin adapter.

2. Returned result

The result returned by Cloud Analysis is a hash with each Target as key and the corresponding array of Specs as value. Before the result is returned, the Source of every Spec is pruned, because later steps use only the url field of each Spec's Source, to classify specs and generate the lockfile. The other fields of the Source object can therefore be dropped to minimize the transferred content and shorten the response time. Pruning the returned result reduces the size of the transmitted content by more than 10 MB.
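
The pruning on both sides can be sketched in Python with plain dictionaries standing in for the CocoaPods objects. The function names and every field other than target_definitions, name, repo_dir, and url are hypothetical, and JSON is used only to compare payload sizes, not as the actual wire format.

```python
import json
from typing import Dict, List


def prune_podfile(podfile: dict) -> dict:
    """Upload side: only the target definitions are needed in the cloud."""
    return {"target_definitions": podfile["target_definitions"]}


def prune_source(source: dict) -> dict:
    """Upload side: a Source is identified by its name and repo_dir."""
    return {"name": source["name"], "repo_dir": source["repo_dir"]}


def prune_result(result: Dict[str, List[dict]]) -> Dict[str, List[dict]]:
    """Return side: keep only the url of each Spec's Source."""
    return {
        target: [{**spec, "source": {"url": spec["source"]["url"]}} for spec in specs]
        for target, specs in result.items()
    }


result = {
    "App": [
        {
            "name": "Demo",
            "version": "1.0.0",
            "source": {
                "url": "https://example.com/specs.git",
                "name": "internal",
                "repo_dir": "/tmp/repos/internal",
                "metadata": {"minimum_version": "1.0.0"},
            },
        }
    ]
}
pruned = prune_result(result)
saved = len(json.dumps(result)) - len(json.dumps(pruned))
print(saved)
```

The saving per Spec is small, but a large project returns many Specs per Target, each carrying its own copy of the Source, so the total adds up.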

3.4 Resolution strategy compatibility

To ensure that resolution results are correct and unique (a single source of truth), Cloud Analysis is compatible with ByteDance's internal toolchain for CocoaPods resolution strategy optimization. Based on the build configuration parameters in the project, the local Cloud Analysis plugin identifies the resolution strategy in use and passes it to the Cloud Analysis server, which activates the corresponding resolution algorithm for fast resolution. Combined with the existing resolution optimizations and the cloud-side acceleration mechanisms, the CocoaPods dependency management process can return within seconds.

4. Summary

This article shared a solution for moving CocoaPods to the cloud at ByteDance. It consolidates and reuses a large number of repetitive iOS pipeline build tasks and speeds up dependency management while preserving the correctness of dependency resolution, which improves R&D efficiency. The cloud analysis service has completed its first stage of development, is in production, and is used by several core product lines within the company. For example, after Toutiao adopted the cloud analysis service, the dependency analysis phase of its pipeline was sped up by more than 60%. Next, for CocoaPods download optimization, an engineering cache service is under technical exploration, and related technical articles will follow, so stay tuned!

Extended reading

Detailed explanation of CocoaPods principle: https://mp.weixin.qq.com/mp/appmsgalbum?__biz=MzA5MTM1NTc2Ng==&action=getalbum&album_id=1477103239887142918&scene=173&from_msgid=2458325057&from_itemidx=1&count=3&nolastread=1#wechat_redirect

CocoaPods optimization: https://www.infoq.cn/article/adqsbwtvsyzuvh429p8w

Join us

We are the Developer Tools team under ByteDance's Client Infrastructure department. The team is made up of IDE experts and build system experts, and is committed to using client cloud technology and compilation and build technology to improve the quality, cost, security, efficiency, and experience of the development and delivery process across the company's businesses. We have also seen many exciting new opportunities along the way, and we hope more people interested in compilation toolchain technology will join us to explore them together.

Job link

https://jobs.bytedance.com/referral/pc/position/detail/?token=MTsxNjYzMTM3NTEwNTU2OzY2ODgyMDc4MjQ2MDQyMzUyNzI7Njc5OTgyMjIzMjczNTQ1MTQwMA

Origin my.oschina.net/u/4180867/blog/5580617