Android code coverage in practice: high performance and high stability

Preface

Code coverage is a software testing metric that reflects what proportion of the code is exercised and to what extent.

During software iteration, code coverage during testing matters, but code coverage during real user usage is also a very valuable indicator that should not be ignored. As the business expands and features are updated, a large amount of outdated and abandoned code accumulates. This code is either rarely or never used any more, or has fallen into disrepair for lack of maintenance. It not only inflates the application package size but may also introduce stability risks. It is therefore very important to collect code coverage in the production environment, understand how the online code is actually used, and provide a basis for retiring useless code.

Goal

Our goal is very clear: driven by cloud configuration, collect whether each class is reached online and how frequently it is used, upload the data to the cloud, process it on the platform, and provide query and report display capabilities.

8d64ed55b5b677838bb3a6538e486e1d.jpeg

As shown in the figure above, we expect code coverage data to be queryable and intuitively displayed on the platform, available whenever needed. It provides a basis for decisions such as retiring old code and scheduling and allocating resources, and ultimately gives users a smaller App installation package and a better experience.

Through the cloud control center, we can control whether coverage collection is enabled, and we can also dynamically adjust the scheduling and allocation strategy for resources such as diamond slots and threads in the App based on coverage (class usage frequency). Of all these parts, the coverage collection solution is the most important. The industry has many mature solutions, but each has its own suitable scenarios. Our goal is to collect class-granularity code usage coverage while affecting users and the running App as little as possible. The collection solution should involve as few hacks as possible, be simple to implement, balance stability and performance, and neither intrude into the packaging process nor affect package size. After in-depth exploration, we developed a brand-new solution that meets all of these requirements.

Solution comparison

The following table compares common solutions with our self-developed solution on various indicators. Green indicates better.

8cb78dd622aecd2a96c90f661e2c0904.png

As can be seen from the table:

JaCoCo solution

Similar tools include Emma and Cobertura. They are all implemented through instrumentation and support collection on all versions and at all granularities. However, instrumentation has a certain impact on package size and performance, so it is not suitable for large-scale online use.

Hooking PathClassLoader

It is simple to implement, does not intrude into the source code, and supports all Android versions. However, hooking PathClassLoader not only affects performance but may even affect App stability.

Hacked access to ClassTable

It can collect on demand and has almost no impact on App performance, but the hack may cause compatibility issues and is more complex to implement.

Self-developed solution

  • Excellent performance: supports on-demand collection without compromising App performance

  • Simple to implement: uses no "black magic", with excellent stability and compatibility

  • Supports cross-process and plug-in collection

The comparison shows that the self-developed solution better meets our requirements for collecting online code coverage: it is stable, performs well, and has almost no impact on users. So how does it achieve high performance and high stability? See the introduction below.

Solution introduction

Principle

To collect class-granularity code coverage, we need to know which classes are loaded and used while the App is running. In a Java application, this can be queried directly by calling the findLoadedClass method of ClassLoader, but in an Android App it is not that simple, because the Android system has made the following optimization:

To improve startup performance, for the App's own classes, that is, classes loaded by PathClassLoader, calling findLoadedClass directly triggers a load of the class if it has not been loaded yet.

This is not what we want.

Although we cannot call findLoadedClass directly to query the loading status of a class, after in-depth research and analysis we found that ClassLoader ultimately obtains the loading status by querying its ClassTable field. If we could also access the ClassTable, would the problem not be solved? Following this line of thinking, we came up with a solution: copy the ClassTable pointer and access the class loading status indirectly through the standard API.

This solution achieves hack-free access to ClassTable while bypassing the class loading optimization that we do not need. It obtains the class loading status in just a few lines of code. It also has the following advantages:

  • Collection is more than 5 times faster than ordinary solutions

  • Uses the standard API to access ClassTable, with excellent compatibility and stability

  • Uses only one reflection call and no "black magic", so it is simple and stable

  • Does not affect class loading or the running App

  • Supports collection across multiple processes and plug-ins

One thing to note, however:

The ClassTable field was introduced in Android N, so this method only applies to Android N and above. Weighing necessity against ROI, we have not adapted it to versions below Android N.

Collection process

Based on the above solution, we designed a complete code coverage collection function. The key processes are as follows:

cd9255384372a8dec3894020860fde49.jpeg

As the figure shows, the entire on-device collection process is serial, which makes flow control and data integration very convenient. The design ideas are as follows:

  • During collection, the App's data is divided into two parts: host data, used by the main process and sub-processes, and plug-in data.

  • Collection is query-based: the main process, sub-processes, and plug-ins each provide an interface for querying class loading status.

  • The process is serial and controlled by the main process, which calls the corresponding interfaces in sequence to collect data from the main process, sub-processes, and plug-ins.

  • Each version only collects and reports classes not yet recorded as loaded. The first collection takes the complete set of classes as input; each subsequent collection takes the classes that were still unloaded after the previous collection. The more collections have run, the fewer classes need to be queried.

  • The main process and sub-processes are queried in sequence, each query using the classes still unloaded after the previous query, so later sub-processes need fewer queries. The same applies when one plug-in is queried across its instances in different processes.

  • As shown below:

outside_default.png

  • At the end of collection, one copy of host class data and N copies of plug-in class data are generated (for N plug-ins). This data is diffed against the previous collection results, and the incremental data is uploaded to the service.

  • The service platform stores the data, deobfuscates it, associates it with modules, and finally aggregates and displays it as reports.
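The serial, query-based flow above can be sketched in plain Java. This is only an illustration under stated assumptions: `SerialCollector`, `Result`, and the use of one `Predicate` per process as the "is this class loaded?" query are hypothetical names, standing in for the real main-process, sub-process, and plug-in interfaces.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch of the serial collection loop driven by the main process. */
final class SerialCollector {

    /** Outcome of one collection round. */
    static final class Result {
        final Set<String> loaded = new LinkedHashSet<>();  // classes found loaded somewhere
        final List<String> remaining = new ArrayList<>();  // input for the next collection
    }

    /**
     * Queries each process in order. Every process only sees the classes
     * that no earlier process reported as loaded.
     *
     * @param input   class names still unknown for this App version
     * @param queries one "is this class loaded?" query per process (assumed shape)
     */
    static Result collect(List<String> input, List<Predicate<String>> queries) {
        Result result = new Result();
        List<String> pending = new ArrayList<>(input);
        for (Predicate<String> isLoaded : queries) {
            List<String> stillUnloaded = new ArrayList<>();
            for (String name : pending) {
                if (isLoaded.test(name)) {
                    result.loaded.add(name);
                } else {
                    stillUnloaded.add(name);
                }
            }
            pending = stillUnloaded; // later processes receive a shorter list
        }
        result.remaining.addAll(pending);
        return result;
    }
}
```

Because each query only receives what the previous one left over, the cost of querying later processes shrinks as earlier ones account for more classes.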

It is worth noting:

  • The classes used by the main process and sub-processes all belong to the host, so their collection results should be merged into one data set. Similarly, no matter how many processes a plug-in is loaded in, only one data set should be generated for it.

  • Splitting the data into two parts during collection improves collection efficiency and simplifies later deobfuscation; on the platform, displaying them combined is more meaningful.

Version management

Most Android App code is obfuscated, and the obfuscated class names differ from version to version, so coverage data must be managed per App version.

With per-version management, each new version clears the previous version's data to avoid mixing data. Once a class has been recorded as used in the current version, later collections in that version do not query it again.

The first collection in each version takes the complete set of App class names as input. Each collection produces the set of still-unused classes, which becomes the input for the next collection. The number of classes to examine therefore shrinks with every collection within a version, which avoids meaningless queries and improves collection performance.
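The per-version bookkeeping described above might look roughly like the following. It is a minimal in-memory sketch: `VersionedCoverageState` and its methods are hypothetical, and a real implementation would persist this state between App launches.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical per-version state: resets on a new App version, shrinks as classes are seen. */
final class VersionedCoverageState {
    private String version;                                  // version the data belongs to
    private final Set<String> used = new LinkedHashSet<>();  // classes recorded as used in this version
    private List<String> unused = new ArrayList<>();         // input for the next collection

    /** Returns the class names to query; a version change clears the old data first. */
    List<String> inputFor(String appVersion, List<String> allClassNames) {
        if (!appVersion.equals(version)) {
            version = appVersion;
            used.clear();
            unused = new ArrayList<>(allClassNames); // first collection: complete set
        }
        return new ArrayList<>(unused);
    }

    /** Records classes found loaded so they are never queried again in this version. */
    void record(Collection<String> loadedNow) {
        Set<String> loaded = new HashSet<>(loadedNow);
        unused.removeIf(loaded::contains);
        used.addAll(loaded);
    }

    /** Returns a copy of the classes recorded as used in the current version. */
    Set<String> usedClasses() {
        return new LinkedHashSet<>(used);
    }
}
```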

Class name data acquisition

Class name data can be obtained in two ways:

1. Obtain from the installation package

The class names in the installation package can be obtained from PathClassLoader, and those of a plug-in from its corresponding BaseDexClassLoader, using the following method:

import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

import dalvik.system.BaseDexClassLoader;
import dalvik.system.DexFile;

public static List<String> getClassesFromClassLoader(BaseDexClassLoader classLoader) throws ClassNotFoundException, IllegalAccessException {
    // The class names live in BaseDexClassLoader.pathList.dexElements[i].dexFile
    // and can be read through reflection.
    // ReflectUtils is the authors' own helper (not shown) that looks up a field by name.

    // First get the pathList field
    Field pathListF = ReflectUtils.getField("pathList", BaseDexClassLoader.class);
    pathListF.setAccessible(true);
    Object pathList = pathListF.get(classLoader);

    // Get the dexElements field of pathList
    Field dexElementsF = ReflectUtils.getField("dexElements", Class.forName("dalvik.system.DexPathList"));
    dexElementsF.setAccessible(true);
    Object[] dexElements = (Object[]) dexElementsF.get(pathList);

    // Get the dexFile field of each element
    Field dexFileF = ReflectUtils.getField("dexFile", Class.forName("dalvik.system.DexPathList$Element"));
    dexFileF.setAccessible(true);
    ArrayList<String> classes = new ArrayList<>(256);
    for (Object element : dexElements) {
        // Get the DexFile; elements without dex code have none, so skip them
        DexFile dexFile = (DexFile) dexFileF.get(element);
        if (dexFile == null) {
            continue;
        }
        // Walk the DexFile entries to collect the class names
        Enumeration<String> enumeration = dexFile.entries();
        while (enumeration.hasMoreElements()) {
            classes.add(enumeration.nextElement());
        }
    }
    return classes;
}

This method is simple and direct, but it loads all the class names in the DexFile into memory at once. In our tests, every 10,000 classes occupy about 0.8 MB of memory, so for a large App with tens of thousands of classes the memory overhead is considerable. In that case, consider the second way.

2. Cloud download

Obtain the class name data from the build platform and upload it to the cloud platform; the App downloads it when needed.

Which method to choose depends on the number of classes. When there are very many classes, as in a large App, the cloud method is recommended; for ordinary Apps or plug-ins, the class names can be obtained directly from the installation package.

Sub-process collection

Classes that the main process has not loaded are handed to the sub-processes to be queried again. This requires each sub-process to provide a query interface that supports cross-process calls. We chose AIDL for this because it is simple, reliable, and easy to reuse.

The specific steps are:

Define the query interface through AIDL along with a corresponding Action. In the Service's onBind method, return the Binder implementation of the query interface according to the Action, so that it can be called remotely.

Because cross-process calls are expensive, calling the query interface once per class is clearly unacceptable. We therefore use a file plus batch query approach: files serve as the data carrier, both loaded and unloaded classes are written to files, and only file paths are passed across the interface. File operations can use BufferedReader and BufferedWriter to improve performance.

The calling process is as shown in the figure:

6fdc20333382ec192af61b60ec951954.jpeg

The benefits of this approach are clear:

  • Collecting from a process takes only one cross-process call, so the cost is extremely low

  • Avoids the memory overhead of data serialization

  • Sidesteps the problem that large data cannot be transferred directly across processes

  • The collection flow is simpler, and only the processes that are needed have to be collected

  • Makes data filtering easy, avoids querying already-loaded classes again, and improves collection performance
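The file plus batch query idea can be sketched as one method that a sub-process might run when the main process hands it file paths. The class name `BatchFileQuery`, the one-class-name-per-line file layout, and the `Predicate` standing in for the real loading-status query are all assumptions for illustration; the AIDL plumbing is omitted.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.function.Predicate;

/** Hypothetical batch query: one call handles a whole file of class names. */
final class BatchFileQuery {

    /**
     * Reads candidate class names from {@code input} (one per line), writes the
     * loaded ones to {@code loadedOut} and the rest to {@code remainingOut}.
     *
     * @return the number of classes found loaded
     */
    static int query(File input, File loadedOut, File remainingOut,
                     Predicate<String> isLoaded) throws IOException {
        int loadedCount = 0;
        try (BufferedReader in = new BufferedReader(
                     new InputStreamReader(new FileInputStream(input), StandardCharsets.UTF_8));
             BufferedWriter loaded = new BufferedWriter(
                     new OutputStreamWriter(new FileOutputStream(loadedOut), StandardCharsets.UTF_8));
             BufferedWriter remaining = new BufferedWriter(
                     new OutputStreamWriter(new FileOutputStream(remainingOut), StandardCharsets.UTF_8))) {
            String name;
            while ((name = in.readLine()) != null) {
                if (name.isEmpty()) {
                    continue; // tolerate blank lines
                }
                if (isLoaded.test(name)) {
                    loaded.write(name);
                    loaded.newLine();
                    loadedCount++;
                } else {
                    remaining.write(name);
                    remaining.newLine();
                }
            }
        }
        return loadedCount;
    }
}
```

In this shape only three file paths would need to cross the process boundary, and the remaining-classes file can be passed straight on as the input for the next process.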

Plug-in collection

For host classes, just query the ClassTable of the PathClassLoader.

Plug-ins are generally loaded through BaseDexClassLoader or a subclass of it, so the ClassTable of the corresponding ClassLoader must be queried.

For plug-ins used in sub-processes, the only extra step is a cross-process interface call that returns the loaded and remaining classes to the main process for processing.

The collection steps are as follows:

  • When a sub-process's classes are queried, the plug-in classes running in that process are queried as well, and the data is written to files named after the plug-ins.

  • Collecting the main process's plug-ins is the last step of the whole process. At this point, the data files for each plug-in (generated by sub-processes) are detected and merged, and then deleted.

  • Finally, the remaining plug-in data files are processed. These belong to plug-ins that run only in sub-processes.

At this point, the class loading data of all plug-ins has been obtained.
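The merge step can be pictured as a union per plug-in name. The sketch below works on in-memory maps rather than the per-plug-in files the text describes, and `PluginResultMerger` is a hypothetical name; it only illustrates that a plug-in loaded in several processes ends up with a single data set.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical merge of per-process plug-in results into one data set per plug-in. */
final class PluginResultMerger {

    /**
     * @param perProcess for each process, a map from plug-in name to the classes
     *                   of that plug-in found loaded in that process
     * @return one merged, sorted set of loaded classes per plug-in
     */
    static Map<String, Set<String>> merge(List<Map<String, Set<String>>> perProcess) {
        Map<String, Set<String>> merged = new TreeMap<>();
        for (Map<String, Set<String>> processResult : perProcess) {
            for (Map.Entry<String, Set<String>> entry : processResult.entrySet()) {
                merged.computeIfAbsent(entry.getKey(), k -> new TreeSet<>())
                      .addAll(entry.getValue());
            }
        }
        return merged;
    }
}
```

A plug-in that appears only in a sub-process's result simply passes through unchanged, which corresponds to the last step in the list above.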

Deobfuscation (resolving the mapping)

When looking at code coverage data, we expect to see the original class names, so deobfuscation is unavoidable.

Deobfuscation can be performed on the device or on the server. For security reasons, we chose the server.

Mapping files are produced by the packaging process, one per installation package. When the build platform builds a release package, we use a script to generate a mapping file between obfuscated and plaintext class names. When needed, the server looks up the mapping file by App version, resolves the original class names, and associates them with their modules.

What the platform finally displays is the deobfuscated code coverage data, associated with modules and plug-ins.
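As an illustration of the server-side lookup, the sketch below builds an obfuscated-to-original class name table. It assumes the input follows the ProGuard/R8 mapping.txt layout, where class lines are unindented and look like `original.Name -> obfuscated.name:`; the script-generated file described above may use a different format, and `ClassMapping` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical resolver from obfuscated class names back to original ones. */
final class ClassMapping {

    /** Parses class lines of the assumed form "original -> obfuscated:". */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> obfuscatedToOriginal = new HashMap<>();
        for (String line : lines) {
            // Skip blanks, comments, indented member lines, and anything that is not a class line.
            if (line.isEmpty() || line.startsWith("#")
                    || Character.isWhitespace(line.charAt(0)) || !line.endsWith(":")) {
                continue;
            }
            int arrow = line.indexOf(" -> ");
            if (arrow < 0) {
                continue;
            }
            String original = line.substring(0, arrow);
            String obfuscated = line.substring(arrow + 4, line.length() - 1);
            obfuscatedToOriginal.put(obfuscated, original);
        }
        return obfuscatedToOriginal;
    }

    /** Returns the original name, or the reported name itself if it has no mapping entry. */
    static String resolve(Map<String, String> mapping, String reportedName) {
        return mapping.getOrDefault(reportedName, reportedName);
    }
}
```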

Data storage and incremental calculation

The collected data needs to be stored. To make incremental data easy to compute, we chose a database as the storage solution, because it provides deduplication and sorting out of the box and performs well. The specific method is:

  • Create a data table containing a single column named class. The column is declared as the primary key and accepts neither null values nor duplicates.

  • Before each collection, get the row count. During collection, insert the loaded class names into the table and let the database deduplicate automatically. After collection, get the row count again. The difference from the row count before collection is the size of the increment, and only these newly added rows need to be uploaded to the service.
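The row-count arithmetic can be demonstrated without a real database. The sketch below substitutes an insertion-ordered set for the primary-key table, which is an assumption made purely so the example runs as plain Java; `IncrementalStore` is a hypothetical name, and on a device the same idea would be expressed with the database's own insert and count operations.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical stand-in for a one-column table whose primary key rejects duplicates. */
final class IncrementalStore {
    // Insertion-ordered and duplicate-free, like rows appended under a primary key.
    private final LinkedHashSet<String> table = new LinkedHashSet<>();

    /** Stores the loaded class names and returns only the rows that are new. */
    List<String> addAndGetIncrement(Collection<String> loadedClassNames) {
        int rowsBefore = table.size();              // row count before collection
        table.addAll(loadedClassNames);             // duplicates are silently ignored
        int increment = table.size() - rowsBefore;  // row count after minus before
        List<String> rows = new ArrayList<>(table);
        // New rows were appended last, so the increment is the tail of the table.
        return new ArrayList<>(rows.subList(rowsBefore, rowsBefore + increment));
    }
}
```

For example, storing `[a, b]` and then `[b, c, a, d]` yields an increment of `[c, d]` on the second call, since `a` and `b` were already present.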

Performance and stability

After repeated testing and tuning, collection over 50,000+ classes takes about 0.5 s per run on average. During collection, memory usage grows by about 500 KB and CPU usage does not rise significantly.

The solution has also been verified online across multiple Amap releases, with no related crashes or ANRs found.

Other

Bypassing the blacklist and greylist

Since Android P, the ClassTable member variable has been officially placed on the blacklist and greylist, so the SDK restriction must be bypassed before accessing it through reflection. We use meta-reflection plus setting exemptions. For a concrete implementation, see the open source project FreeReflection on GitHub. If you want to know more, you can search on Google.

Collection timing and frequency

Although the collection process is brief and has no noticeable impact, to minimize the effect on the running App we run the collection on a worker thread and start it only after the App has been in the background for a period of time.

Also, since we only need to know the proportion and general picture of code usage, collecting once after each cold start is enough.

Data from multiple cold starts across many users is enough to reflect real code usage. If you need usage frequency data for each class, it can be obtained by aggregating statistics on the server side.
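One possible way to express the timing rules above (worker thread, delay after entering the background, once per cold start) is sketched below. `CollectionScheduler` and its callback names are hypothetical; in a real App they would be wired to whatever foreground/background signal the App already has, and the delay value is left to the caller.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical scheduler: collect once per process lifetime, after a quiet period in the background. */
final class CollectionScheduler {
    private final ScheduledExecutorService worker =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "coverage-collector");
                t.setDaemon(true); // never keeps the process alive
                return t;
            });
    private final AtomicBoolean collected = new AtomicBoolean(false);
    private final Runnable collectTask;
    private final long delayMillis;
    private ScheduledFuture<?> pending;

    CollectionScheduler(Runnable collectTask, long delayMillis) {
        this.collectTask = collectTask;
        this.delayMillis = delayMillis;
    }

    /** Call when the App moves to the background. */
    synchronized void onAppBackgrounded() {
        if (collected.get() || pending != null) {
            return; // already collected since this cold start, or already scheduled
        }
        pending = worker.schedule(() -> {
            if (collected.compareAndSet(false, true)) {
                collectTask.run(); // runs on the worker thread
            }
        }, delayMillis, TimeUnit.MILLISECONDS);
    }

    /** Call when the App returns to the foreground: cancel a collection that has not started. */
    synchronized void onAppForegrounded() {
        if (pending != null) {
            pending.cancel(false);
            pending = null;
        }
    }
}
```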

Closing remarks

As a measurement method, code coverage not only provides a basis for removing old code but also reflects how popular a given feature is, and can inform decisions such as resource allocation and scheduling. It is an important and indispensable tool in software development.

Our new solution is concise but not simplistic. It achieves hack-free collection, and collects production code coverage with high performance while ensuring high stability and without intruding into the source code. It has been verified across multiple versions of Amap and is a mature, stable, and efficient solution. We share it here in the hope that it offers some reference and ideas for readers with the same needs.


Origin blog.csdn.net/amap_tech/article/details/132572876