AutoNavi Android: a high-performance, high-stability code coverage practice

Foreword

Code coverage is a software-testing metric that reflects the proportion of code exercised by tests.

During software iteration, coverage matters beyond testing. Coverage during real user sessions is also a valuable indicator that should not be ignored. As the business expands and features are updated, a large amount of outdated code accumulates. This code is rarely or never executed, and often falls into disrepair. Besides inflating the package size, unmaintained code can also pose stability risks. Collecting code coverage in the production environment therefore lets us understand how online code is actually used, and gives us a basis for taking unused code offline.

Goal

Our goal is clear. Driven by cloud configuration, we collect whether each class is reached online and how often it is used. We upload this data to the cloud, process it on the platform, and provide query and report capabilities.

As shown in the figure above, we want coverage data to be queryable and displayed intuitively on the platform, available whenever needed. It can then inform decisions such as retiring old code and scheduling and allocating resources. Ultimately, users get a smaller installation package and a better experience.

Through the cloud control center, we can turn coverage collection on or off. We can also dynamically adjust how the app schedules and allocates resources such as home-page entry slots (the so-called "diamond positions") and threads, based on coverage (class usage frequency). Of these parts, coverage collection is the most important. The industry has many mature solutions, but each fits its own scenarios.

Our requirement is to collect class-level code usage coverage while affecting users and app operation as little as possible. The collection scheme should be minimally hacky and simple to implement. It should balance stability and performance, and it must neither intrude on the packaging process nor increase package size. After in-depth exploration, we developed a brand-new solution that fully meets these requirements.

Solution comparison

The table below compares the common solutions with our in-house solution across several indicators; green means better.

As the table shows:

JaCoCo solution

Emma, Cobertura and similar tools work the same way. They all rely on instrumentation and support collection on all Android versions and at all granularities. However, instrumentation affects package size and performance, so it is not suitable for large-scale online use.

Hook PathClassLoader solution

This approach is simple to implement, needs no source changes, and supports all Android versions. However, hooking PathClassLoader costs performance and may even affect app stability.

Hacked ClassTable access solution

This approach supports on-demand collection and has almost no impact on app performance. However, the hack can cause compatibility problems, and it is more complex to implement.

In-house solution

  • Excellent performance and on-demand collection, without compromising app performance
  • Simple implementation with no "black technology", so stability and compatibility are excellent
  • Supports cross-process and plug-in collection

The comparison shows that the in-house solution better meets our needs for collecting online code coverage. It is stable and performs well, and it has almost no impact on users. How does it achieve high performance and high stability? The following sections explain.

Solution overview

Principle

To collect class-level code coverage, we need to know which classes are loaded and used while the app runs. In a standard Java application, you can query this directly by calling ClassLoader's findLoadedClass method. In an Android app it is not so simple, because the Android system applies the following optimization:

To improve startup performance, the system treats app-defined classes, that is, classes loaded by PathClassLoader, specially. Calling findLoadedClass directly on such a class actually loads it if it has not been loaded yet.

This is not what we want, because the query itself would change the state we are trying to observe.

So we cannot call findLoadedClass directly to query a class's loading status. In-depth research, however, showed that a ClassLoader ultimately determines loading status by looking up its ClassTable field. If we could access that ClassTable ourselves, the problem would be solved. Following this idea, we proposed copying the ClassTable pointer and then querying class loading status indirectly through standard APIs.

This solution gives hack-free access to the ClassTable and sidesteps the class-loading optimization we do not need. It obtains class loading status in just a few lines of code. Besides being concise, it has the following advantages:

  • Acquisition is more than 5 times faster than with the common solution
  • ClassTable is accessed through a standard API, giving excellent compatibility and stability
  • Only one reflective access is needed, with no "black technology", so it is simple and stable
  • It does not affect class loading or app execution
  • It fully supports multi-process and plug-in collection

One caveat:

The ClassTable field was introduced in Android N, so this method only works on Android N and above. Weighing necessity against ROI, we did not adapt it to versions below Android N.
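The pointer-copy idea can be illustrated with the minimal sketch below. It is not the original implementation. It assumes the copy target is a plain ClassLoader subclass (hypothetically named ClassTableProbe) rather than a PathClassLoader, so the PathClassLoader shortcut described above should not apply to it. It also assumes the field is ART's private `long classTable` in java.lang.ClassLoader. On a runtime without that field (a desktop JVM, or Android below N), or where hidden-API restrictions have not been lifted, `tryCreate` simply returns null.

```java
import java.lang.reflect.Field;

/**
 * Hypothetical probe: a bare ClassLoader subclass that shares the host loader's native
 * ClassTable pointer, so findLoadedClass() looks the name up in the host's table.
 * Assumes ART's private java.lang.ClassLoader field "long classTable" (Android N and later).
 */
final class ClassTableProbe extends ClassLoader {

    private ClassTableProbe(ClassLoader parent) {
        super(parent);
    }

    /**
     * Returns a probe that reads host's ClassTable, or null when the field is unavailable
     * (non-ART runtime, Android below N, or hidden-API restrictions still in effect).
     */
    static ClassTableProbe tryCreate(ClassLoader host) {
        try {
            // The single reflective step: locate ClassLoader.classTable.
            Field classTable = ClassLoader.class.getDeclaredField("classTable");
            classTable.setAccessible(true);
            ClassTableProbe probe = new ClassTableProbe(host);
            // Copy the pointer value so both loaders refer to the same native table.
            classTable.setLong(probe, classTable.getLong(host));
            return probe;
        } catch (NoSuchFieldException | IllegalAccessException | RuntimeException e) {
            return null;
        }
    }

    /** True if the class is already in the shared table (assumed not to trigger loading on ART). */
    boolean isLoaded(String className) {
        return findLoadedClass(className) != null;
    }
}
```

A caller would presumably create one probe per ClassLoader (PathClassLoader for the host, each plug-in's BaseDexClassLoader) and call `isLoaded` for each candidate class name. Because the probe shares the host's table, it should only be used for lookups and never to define classes.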

Collection process

Based on this solution, we designed a complete code coverage collection feature. The key process is as follows:

As can be seen, the entire client-side collection process is serial, which makes process control and data integration convenient. The design ideas are:

  • During collection, the app's data is split into two parts: host class data used by the main process and sub-processes, and plug-in class data.
  • Collection is query-based. The main process, each sub-process and each plug-in provide an interface for querying class loading status.
  • The flow is serial and driven by the main process. The main process calls the corresponding interfaces in turn to collect data from itself, the sub-processes and the plug-ins.
  • Each version only collects and reports data for classes not yet loaded. The first collection takes the full set of classes as input. Each later collection takes the classes still unloaded after the previous collection, so the more collections run, the fewer classes need to be queried.
  • The main process and sub-processes query in turn. Each one takes as input the classes still unloaded after the previous query, so later sub-processes have fewer classes to query. Queries for the same plug-in instance in different processes work the same way.

As shown below:

  • When collection finishes, it produces one copy of host data and N copies of plug-in data (for N plug-ins). These are diffed against the previous collection results, and only the incremental data is uploaded to the server.
  • The server platform handles storage, de-obfuscation (mapping), module association and other processing. It then aggregates the results and displays them as reports.

It is worth noting that:

  • Classes used by the main process and sub-processes all belong to the host, so their results should be merged into a single data set. Likewise, no matter how many processes load a plug-in, it should end up with only one data set.
  • Splitting data into these two parts during collection improves efficiency and makes later de-obfuscation easier. On the platform, merging them for display is more meaningful.
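The serial flow with shrinking input can be sketched as follows. This is a hypothetical illustration. Each stage is modeled as a `Predicate<String>` that stands in for one process's query interface, such as the main process's own lookup or an AIDL call into a sub-process. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch of the serial, main-process-driven collection round. */
final class SerialCollector {

    /** Outcome of one round: classes found loaded, and classes still unloaded everywhere. */
    static final class Result {
        final Set<String> loaded = new LinkedHashSet<>();
        final Set<String> remaining = new LinkedHashSet<>();
    }

    /**
     * @param candidates classes not yet seen loaded in this app version
     * @param stages     per-process "is this class loaded?" queries, in call order
     *                   (main process first, then each sub-process)
     */
    static Result collect(Set<String> candidates, List<Predicate<String>> stages) {
        Result result = new Result();
        result.remaining.addAll(candidates);
        for (Predicate<String> isLoadedInProcess : stages) {
            List<String> hits = new ArrayList<>();
            // Each stage only sees what every earlier stage reported as unloaded.
            for (String cls : result.remaining) {
                if (isLoadedInProcess.test(cls)) {
                    hits.add(cls);
                }
            }
            // Host classes are merged into one result, no matter which process loaded them.
            result.remaining.removeAll(hits);
            result.loaded.addAll(hits);
        }
        return result;
    }
}
```

For plug-ins, the same routine would run once per plug-in with that plug-in's own candidate set. Each plug-in therefore still ends with a single result, however many processes loaded it.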

Version management

Most Android apps are obfuscated, and obfuscated class names change between versions, so coverage data must be managed per app version.

With per-version management, data from the previous version is cleared when a new version arrives, so data from different versions never gets mixed. Once a class is recorded as used in the current version, later collections in that version no longer query it.

The first collection of each version uses the app's full set of class names as input. Each collection then produces the set of still-unused classes, which becomes the input for the next one. The number of classes to watch therefore shrinks with every collection within a version, which avoids pointless queries and improves collection performance.
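The per-version bookkeeping above can be sketched as a small state holder. The class and method names are hypothetical, and persistence (file, SharedPreferences, database) is left out. The full class list is passed as a `Supplier`, so that it is only materialized when a new version is seen.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical per-version state: the classes still worth querying in the current version. */
final class VersionedCandidates {
    private String version;
    private Set<String> unloaded = new LinkedHashSet<>();

    /** Returns the classes to query for appVersion, resetting state when the version changed. */
    Set<String> candidatesFor(String appVersion, Supplier<Set<String>> fullClassList) {
        if (!appVersion.equals(version)) {
            // New version: obfuscated names differ, so previous data is discarded.
            version = appVersion;
            unloaded = new LinkedHashSet<>(fullClassList.get());
        }
        return new LinkedHashSet<>(unloaded);
    }

    /** Records classes seen loaded; they are not queried again within this version. */
    void markLoaded(Collection<String> loadedClasses) {
        unloaded.removeAll(loadedClasses);
    }
}
```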

Obtaining class name data

Class name data can be obtained in two ways:

1. From the installation package

Class names in the installation package can be read through PathClassLoader. For a plug-in, read them through its corresponding BaseDexClassLoader, as follows:

import dalvik.system.BaseDexClassLoader;
import dalvik.system.DexFile;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

public static List<String> getClassesFromClassLoader(BaseDexClassLoader classLoader)
        throws ClassNotFoundException, NoSuchFieldException, IllegalAccessException {
    // Class names live in BaseDexClassLoader.pathList.dexElements[i].dexFile; read them via reflection.

    // 1. Read the pathList field (a dalvik.system.DexPathList)
    Field pathListF = BaseDexClassLoader.class.getDeclaredField("pathList");
    pathListF.setAccessible(true);
    Object pathList = pathListF.get(classLoader);

    // 2. Read the dexElements array from pathList
    Field dexElementsF = Class.forName("dalvik.system.DexPathList").getDeclaredField("dexElements");
    dexElementsF.setAccessible(true);
    Object[] elements = (Object[]) dexElementsF.get(pathList);

    // 3. Read the dexFile field of each DexPathList$Element
    Field dexFileF = Class.forName("dalvik.system.DexPathList$Element").getDeclaredField("dexFile");
    dexFileF.setAccessible(true);
    List<String> classes = new ArrayList<>(256);
    for (Object element : elements) {
        DexFile dexFile = (DexFile) dexFileF.get(element);
        if (dexFile == null) {
            continue; // elements without dex code carry no DexFile
        }
        // Enumerate every class name in this dex file (DexFile is deprecated since API 26 but still works)
        Enumeration<String> entries = dexFile.entries();
        while (entries.hasMoreElements()) {
            classes.add(entries.nextElement());
        }
    }
    return classes;
}

This method is simple and direct, but it loads every class name in the DexFile into memory at once. In our tests, every 10,000 classes take about 0.8 MB of memory. For large apps that routinely have tens of thousands of classes, this is a non-trivial overhead: 50,000 classes, for example, would need roughly 4 MB. The second approach is therefore also worth considering.

2. Download from the cloud

Extract the class name data on the build platform and upload it to the cloud platform. The app then downloads it when needed.

Choose between the two methods based on the number of classes. For apps with many classes, such as large apps, the cloud approach is recommended. Ordinary apps and plug-ins can read class names directly from the installation package.

Sub-process collection

Classes that are not loaded in the main process are handed to sub-processes for another query. This requires each sub-process to expose a query interface that supports cross-process calls. We chose AIDL, which is simple, reliable and easy to reuse.

The specific approach is:

Define the query interface in AIDL, along with a corresponding Action. In the Service's onBind method, return the Binder implementation of the query interface for that Action, so that it can be called remotely.

Cross-process calls are expensive, so calling the query interface once per class is clearly unacceptable. We therefore use files plus batch queries. Files carry the data: both loaded and unloaded classes are written to files, and only file paths are passed between the interfaces. File I/O can use BufferedReader and BufferedWriter for better performance.

The call flow is shown in the figure:

The benefits are clear:

  • Collecting one process takes only one cross-process call, so the cost is very low
  • It avoids the memory overhead of serializing data
  • It avoids the limit on passing large data directly across processes, since Binder transactions have a limited buffer size
  • The collection flow is simpler, and processes can be collected on demand
  • Data is easy to filter, which avoids re-querying already-loaded classes and improves collection performance
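The sub-process side of the file-based batch query can be sketched as below. The class name is hypothetical, and the AIDL plumbing is omitted: only the input and output file paths would cross the process boundary, in a single call. `isLoaded` stands in for the sub-process's own loading-status lookup. The sketch uses java.io rather than java.nio.file, because java.nio.file is only available from API 26 while the scheme targets Android N and above.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.function.Predicate;

/** Hypothetical sub-process side of one batch query: one class name per line in every file. */
final class BatchFileQuery {

    /** Splits input into loadedOut and remainingOut; returns the number of loaded classes. */
    static int query(File input, File loadedOut, File remainingOut, Predicate<String> isLoaded)
            throws IOException {
        int loadedCount = 0;
        try (BufferedReader in = reader(input);
             BufferedWriter loaded = writer(loadedOut);
             BufferedWriter remaining = writer(remainingOut)) {
            String cls;
            while ((cls = in.readLine()) != null) {
                if (cls.isEmpty()) {
                    continue;
                }
                if (isLoaded.test(cls)) {
                    loaded.write(cls);
                    loaded.newLine();
                    loadedCount++;
                } else {
                    // Still unloaded: becomes input for the next process in the chain.
                    remaining.write(cls);
                    remaining.newLine();
                }
            }
        }
        return loadedCount;
    }

    private static BufferedReader reader(File f) throws IOException {
        return new BufferedReader(new InputStreamReader(new FileInputStream(f), StandardCharsets.UTF_8));
    }

    private static BufferedWriter writer(File f) throws IOException {
        return new BufferedWriter(new OutputStreamWriter(new FileOutputStream(f), StandardCharsets.UTF_8));
    }
}
```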

Plug-in collection

Host classes are queried through the ClassTable of PathClassLoader.

Plug-ins are usually loaded by BaseDexClassLoader or one of its subclasses, so the ClassTable of the corresponding ClassLoader must be queried.

For plug-ins used in sub-processes, the only extra work is the cross-process call, plus returning the loaded and remaining classes to the main process for processing.

The collection steps are as follows:

  • When a sub-process's classes are queried, the plug-in classes running in that process are queried at the same time. The results are written to files named after each plug-in.
  • Plug-in collection in the main process is the last step of the whole flow. The main process checks for each plug-in's data file (generated by sub-processes), merges it, and then deletes the file.
  • Finally, any remaining plug-in data files are processed. These belong to plug-ins that ran only in sub-processes.

At this point, class loading data for all plug-ins has been collected.
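The final main-process step can be sketched as follows. It assumes, hypothetically, that sub-processes wrote each plug-in's loaded classes to a `<pluginName>.txt` file in one shared directory. One loop handles both cases from the steps above: plug-ins the main process also ran, and plug-ins that ran only in sub-processes.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical main-process merge of per-plug-in files written by sub-processes. */
final class PluginDataMerger {
    private static final String SUFFIX = ".txt";

    /**
     * @param mainProcessLoaded plug-in name to classes the main process found loaded
     * @return plug-in name to loaded classes merged across all processes (one set per plug-in)
     */
    static Map<String, Set<String>> merge(Map<String, Set<String>> mainProcessLoaded, File dir)
            throws IOException {
        Map<String, Set<String>> merged = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : mainProcessLoaded.entrySet()) {
            merged.put(e.getKey(), new LinkedHashSet<>(e.getValue()));
        }
        File[] files = dir.listFiles((d, name) -> name.endsWith(SUFFIX));
        if (files == null) {
            return merged;
        }
        for (File f : files) {
            String name = f.getName();
            String plugin = name.substring(0, name.length() - SUFFIX.length());
            // Creates the entry for plug-ins that only ran in sub-processes.
            Set<String> target = merged.computeIfAbsent(plugin, k -> new LinkedHashSet<>());
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(new FileInputStream(f), StandardCharsets.UTF_8))) {
                String cls;
                while ((cls = in.readLine()) != null) {
                    if (!cls.isEmpty()) {
                        target.add(cls);
                    }
                }
            }
            // Consumed: delete so it is not merged again (best effort, result ignored).
            f.delete();
        }
        return merged;
    }
}
```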

Resolving the mapping

When viewing code coverage data, we want to see the original class names, so resolving the obfuscation mapping is essential.

De-obfuscation can run either on the client or on the server. For security reasons, we chose the server.

Mapping files are generated during packaging, one per installation package. When the platform builds an official package, a script generates a mapping file between obfuscated and original class names. When needed, the server fetches the matching mapping file by app version, restores the original class names, and matches them against modules to associate them.

The platform finally displays code coverage data that has been de-obfuscated and associated with modules and plug-ins.
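As an illustration of the server-side lookup, the sketch below reads class entries from a mapping file. It assumes the standard ProGuard/R8 `mapping.txt` format, where class lines look like `original.Name -> obfuscated.name:`, member lines are indented, and R8 comment lines start with `#`. The scripts described above may produce a different file. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical server-side reader for class entries of a ProGuard/R8 mapping file. */
final class MappingReader {

    /** Returns obfuscated class name to original class name; closes source when done. */
    static Map<String, String> readClassMapping(Reader source) throws IOException {
        Map<String, String> obfuscatedToOriginal = new HashMap<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                // Skip member lines (indented), comments and anything that is not "a -> b:".
                if (line.isEmpty() || Character.isWhitespace(line.charAt(0))
                        || line.startsWith("#") || !line.endsWith(":")) {
                    continue;
                }
                int arrow = line.indexOf(" -> ");
                if (arrow < 0) {
                    continue;
                }
                String original = line.substring(0, arrow);
                String obfuscated = line.substring(arrow + " -> ".length(), line.length() - 1);
                obfuscatedToOriginal.put(obfuscated, original);
            }
        }
        return obfuscatedToOriginal;
    }

    /** Classes missing from the mapping (e.g. kept by keep rules) keep their reported name. */
    static String deobfuscate(Map<String, String> mapping, String className) {
        return mapping.getOrDefault(className, className);
    }
}
```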

Data Storage and Incremental Computing

The collected data must be stored. To make incremental computation easy, we chose a database for storage, because it provides deduplication and sorting with good performance. Specifically:

  • Create a table with a single column named class, declared as the primary key so that it rejects nulls and duplicates.
  • Before each collection, record the table's row count. During collection, write the loaded class names into the table, and the database deduplicates them automatically. After collection, read the rows beyond the pre-collection row count, using the old count as an offset. Those rows are the increment, and only they need to be uploaded to the server.
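The row-count/offset idea can be shown without a real database. The sketch below uses an insertion-ordered set as an in-memory stand-in for the single-column table, since both reject duplicates and keep insertion order. The comments give the SQLite statements it is meant to mirror. Those statements and all names are assumptions, not the original implementation.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** In-memory stand-in for: CREATE TABLE loaded_class (class TEXT PRIMARY KEY NOT NULL) */
final class LoadedClassTable {
    private final Set<String> rows = new LinkedHashSet<>();

    // SELECT COUNT(*) FROM loaded_class
    int rowCount() {
        return rows.size();
    }

    // INSERT OR IGNORE INTO loaded_class (class) VALUES (?)  -- duplicates are dropped
    void insertAll(Collection<String> classes) {
        rows.addAll(classes);
    }

    // SELECT class FROM loaded_class ORDER BY rowid LIMIT -1 OFFSET ?
    List<String> rowsAfter(int offset) {
        List<String> result = new ArrayList<>();
        int index = 0;
        for (String cls : rows) {
            if (index++ >= offset) {
                result.add(cls);
            }
        }
        return result;
    }

    /** One collection round: returns only the class names that were not in the table before. */
    List<String> recordAndGetIncrement(Collection<String> loadedThisRound) {
        int before = rowCount();
        insertAll(loadedThisRound);
        return rowsAfter(before);
    }
}
```

With a real SQLite table, reading by rowid order after the old row count assumes that no rows are deleted between collections within a version.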

Performance and stability

After repeated testing and tuning, collection over 50,000+ classes takes about 0.5 s per run on average. Memory grows by about 500 KB during collection, and CPU usage shows no significant increase.

The solution has also been verified online across multiple versions of AMap (Gaode Map), with no related crashes or ANRs found.

Other

Bypassing the hidden-API blacklist and greylist

Starting with Android P, the ClassTable member variable is on the official hidden-API blacklist/greylist. The SDK restriction must therefore be bypassed before the field can be accessed through reflection. We use meta-reflection plus setting an exemption; for a concrete implementation, refer to the open-source project FreeReflection on GitHub.
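For reference, the core of the meta-reflection approach looks roughly like the sketch below. It follows the general technique as commonly described for Android P/Q and is not this article's code. `Class.forName` and `Class.getDeclaredMethod` are themselves called through reflection, so the runtime attributes the hidden-API lookups to a framework class. `VMRuntime.setHiddenApiExemptions` is then called with the prefix "L", which matches every class descriptor. Both `VMRuntime` methods are hidden platform APIs, and newer Android versions have tightened this route, so treat the sketch as illustrative only. Off Android, it simply returns false.

```java
import java.lang.reflect.Method;

/** Hypothetical helper: exempt all hidden APIs via meta-reflection (Android P/Q era technique). */
final class HiddenApiExemption {

    /** Returns true if the exemption call succeeded; false on any failure, including non-Android JVMs. */
    static boolean exemptAll() {
        try {
            // Reflection methods obtained through reflection: the lookups below are made by Class itself.
            Method forName = Class.class.getDeclaredMethod("forName", String.class);
            Method getDeclaredMethod =
                    Class.class.getDeclaredMethod("getDeclaredMethod", String.class, Class[].class);

            Class<?> vmRuntime = (Class<?>) forName.invoke(null, "dalvik.system.VMRuntime");
            Method getRuntime = (Method) getDeclaredMethod.invoke(vmRuntime, "getRuntime", new Class<?>[0]);
            Method setHiddenApiExemptions = (Method) getDeclaredMethod.invoke(
                    vmRuntime, "setHiddenApiExemptions", new Class<?>[] {String[].class});

            Object runtime = getRuntime.invoke(null);
            // "L" is the common prefix of all class descriptors, so every hidden API is exempted.
            setHiddenApiExemptions.invoke(runtime, (Object) new String[] {"L"});
            return true;
        } catch (ReflectiveOperationException | RuntimeException e) {
            return false;
        }
    }
}
```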

Collection Timing and Frequency

Although collection is short and imperceptible to users, we minimize its impact further. It runs on a background thread and starts only after the app has been in the background for a while.

Moreover, we only need the proportion and general picture of code usage, so one collection per cold start is enough.

Data from many cold starts across many users is enough to reflect real code usage. If per-class usage frequency is needed, it can be aggregated on the server.
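This timing policy can be sketched as a small scheduler: a background thread, a delayed start after the app goes to the background, and at most one run per cold start. The class name, the delay parameter and the `onAppBackgrounded`/`onAppForegrounded` hooks are hypothetical; on Android they would be driven by real lifecycle signals. Cancelling a pending run when the app returns to the foreground is our assumption about what "in the background for a while" implies. "Once per cold start" is modeled as once per process, since a cold start creates a new process.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical scheduler: at most one collection per process, delayed after going to background. */
final class CoverageCollectionScheduler {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread t = new Thread(runnable, "coverage-collector");
                t.setDaemon(true);
                t.setPriority(Thread.MIN_PRIORITY);
                return t;
            });
    private final Runnable collectTask;
    private final long delayMillis;
    private ScheduledFuture<?> pending; // guarded by this
    private boolean collected;          // guarded by this

    CoverageCollectionScheduler(Runnable collectTask, long delayMillis) {
        this.collectTask = collectTask;
        this.delayMillis = delayMillis;
    }

    /** Call when the app moves to the background; returns true if a run was scheduled. */
    synchronized boolean onAppBackgrounded() {
        if (collected || pending != null) {
            return false;
        }
        pending = executor.schedule(this::runOnce, delayMillis, TimeUnit.MILLISECONDS);
        return true;
    }

    /** Call when the app returns to the foreground; cancels a run that has not started yet. */
    synchronized void onAppForegrounded() {
        if (pending != null) {
            pending.cancel(false);
            pending = null;
        }
    }

    private void runOnce() {
        synchronized (this) {
            pending = null;
            if (collected) {
                return;
            }
            collected = true;
        }
        collectTask.run(); // outside the lock so lifecycle callbacks never wait on collection
    }
}
```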

Closing thoughts

As a metric, code coverage gives us a basis for retiring old code. It also reflects how popular each feature is, which informs resource allocation and scheduling decisions. It is an indispensable tool in software development.

Our new solution is concise but not simplistic. It achieves hack-free collection and delivers high-performance collection of production code coverage, while keeping stability high and leaving source code untouched. It has been verified across multiple versions of AMap and is a mature, stable and efficient solution. We share it here in the hope that it offers reference and ideas to others with similar needs.


Origin blog.csdn.net/weixin_61845324/article/details/132691460