Say goodbye to dead code! Alibaba Entertainment's lossless code coverage statistics scheme

Author | Sun Long, senior wireless development engineer at Alibaba Entertainment

Editor | Tu Min

Background

To keep up with rapid product iteration, teams usually pour large amounts of R&D resources into developing new features, while the governance of useless code gets little attention. Over time, an online application accumulates a great deal of dead code, and with staff turnover and ownership changes, the cost of managing that useless code keeps growing. The end result is an oversized installation package, which lowers download conversion rates, runs into app-store limits (for example, apps over 100 MB could not be listed on the Google Play store), and drags down development efficiency.

How do we govern useless code? The first step is static code scanning. For Android apps, static analysis tools such as ProGuard can analyze code reference relationships during the build phase and automatically strip code that is never referenced, reducing the installation package size.

Static scanning alone is not enough, though, because it says nothing about how online users actually use the app. We also need a scheme for collecting code coverage statistics from real users online.

Taking online code coverage statistics for Android applications as the entry point, I will share Youku's technical thinking and practical implementation of dead-code governance.

The traditional collection scheme

First, statistics code has to be placed at every point we want to measure; when that code executes, it records and reports the fact. An application typically has tens of thousands of lines of code, so adding this by hand is clearly unrealistic. Instead, the statistics code is usually inserted during the build phase (hereinafter, instrumentation) via Aspect-Oriented Programming (AOP), using mature AOP middleware such as Jacoco or ASM.

Second, we need to decide what granularity to collect at. From fine to coarse, granularity is generally divided into instruction, branch, method, and class level. The finer the granularity, the more accurate the coverage results, but the greater the performance cost. For example, collecting at instruction granularity means instrumenting every instruction, which roughly doubles the instruction count, inflating the installation package and degrading runtime performance.

Youku tried branch-level instrumentation with Jacoco, hoping to cover as many users as possible, since covering more users makes the results more accurate. In testing, however, this scheme increased the installation package by 10 MB and seriously degraded runtime performance, so we resolved to abandon it.

To balance collection against performance and package size, class-level granularity is now generally used for instrumentation, partly because it has little impact on performance, and partly because collecting at too fine a granularity would increase the governance burden on business teams. Even so, this scheme is not perfect:

1) Runtime performance: the statistics code runs when a class is first loaded, and the app boot process loads thousands of classes, so startup performance takes a hit; 

2) Package size: every class has statistics code inserted into it, and for an app as large as Youku this adds noticeably to the installation package;

3) Build time: the build process has to instrument every class, which lengthens builds.
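To make the traditional approach concrete, here is a minimal sketch of what class-level instrumentation amounts to: a build-time transform (Jacoco or ASM in practice) injects a recording call into each class's static initializer, so the class is counted the first time it is loaded and used. All names below are hypothetical, and real tools inject bytecode rather than source:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical recorder; in a real build the record() call would be
// injected into every class by an ASM/Jacoco bytecode transform.
class CoverageRecorder {
    static final Set<String> LOADED = ConcurrentHashMap.newKeySet();
    static void record(String className) { LOADED.add(className); }
}

// What a class looks like after the build-time transform:
class OrderService {
    static { CoverageRecorder.record("OrderService"); } // injected code
    int placeOrder() { return 1; }
}

public class Demo {
    public static void main(String[] args) {
        new OrderService().placeOrder(); // first use triggers <clinit>
        System.out.println(CoverageRecorder.LOADED.contains("OrderService")); // true
    }
}
```

The drawbacks listed above follow directly from this shape: the injected static block runs during class loading at startup, and every class carries the extra code in the package.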

The new collection scheme: SlimLady

▐   Goals

Youku wanted a lossless scheme for collecting online code coverage, with the following core goals:

  1. Runtime performance: no impact; 

  2. Package size: no impact;

  3. Build time: no impact.

▐   Implementation

By studying the virtual machine source code, we found that class-level code coverage can be obtained by dynamically querying the VM for information about which classes have been loaded. The "coverage collection" part of the figure below is a schematic of how SlimLady collects this data; here we focus only on that part, and the other parts are explained later in the overall design.

ClassTable

According to the Java Virtual Machine specification, a class must be loaded by the VM before it can be used. On Android, class loading is done by a ClassLoader, and the result is ultimately saved in a native-layer ClassTable. So if we can get the ClassTable of every ClassLoader object, we can determine which classes the VM has loaded.

First, get all the ClassLoader objects. Classes in the APK, unless otherwise specified, are generally loaded by the default PathClassLoader; dynamically loaded code requires a custom ClassLoader. For example, Atlas creates a corresponding ClassLoader for each Bundle and loads that Bundle's classes through it. Once you know which ClassLoaders the app uses, getting hold of them is straightforward.

Second, obtain the address of the ClassTable object from each ClassLoader. Reading through the Java source of the ClassLoader class hierarchy, you can see that ClassLoader has a member variable classTable (on Android 7.0 and above) that holds the address of the native-layer ClassTable object. We can obtain this address by reflection:

ClassLoader classLoader = XXX;  // the ClassLoader being inspected
Field classTableField = ClassLoader.class.getDeclaredField("classTable");
classTableField.setAccessible(true);
// address of the native-layer ClassTable object
long classTableAddr = classTableField.getLong(classLoader);

On Android 9.0, however, the classTable member was put on the dark greylist, which restricts direct reflection. We can work around this restriction by reflecting via a system class (meta-reflection):

ClassLoader classLoader = XXX;  // the ClassLoader being inspected
// reflect through Class.getDeclaredField itself, so the caller on the
// stack is a system class and the hidden-API check is bypassed
Method metaGetDeclaredField = Class.class.getDeclaredMethod("getDeclaredField", String.class);
Field classTableField = (Field) metaGetDeclaredField.invoke(ClassLoader.class, "classTable");
classTableField.setAccessible(true);
long classTableAddr = classTableField.getLong(classLoader);

At this point, we have the addresses of all the ClassTable objects, which hold all of the class-loading information.

Class name list 

Reading the source further, ClassTable has a method that queries by class name whether a class has been loaded (described in detail below). So we just need the list of all class names, then call that method for each name to determine whether the class has been loaded.

The list of class names in the APK can be obtained through DexFile, as follows:

List<String> classes = new ArrayList<>();
// enumerate every class name packaged in the APK's dex files
// (note: this DexFile constructor is deprecated on newer API levels)
DexFile df = new DexFile(context.getPackageCodePath());
for (Enumeration<String> iter = df.entries(); iter.hasMoreElements(); ) {
    classes.add(iter.nextElement());
}

The class names of dynamically loaded code can likewise be obtained through DexFile.

Determining whether a class is loaded

Reading class_table.cc in the ART source, ClassTable has a Lookup method that takes a class name and the hash of that name, and returns the address of the class object, as follows:

mirror::Class* ClassTable::Lookup(const char* descriptor, size_t hash)

If the return value is nullptr, the class has not been loaded; otherwise, it has.


We obtain the address of this method in three steps:

  1. Load the so: class_table.cc is compiled into libart.so, so we call dlopen on libart.so to get its handle. In fact, libart.so is already loaded in the current process before we do this, so the call only fetches the handle and costs almost nothing;

  2. Symbol lookup: use readelf to find the mangled symbol of Lookup: _ZN3art10ClassTable6LookupEPKcj;

  3. Method pointer: call dlsym with the handle and the symbol to obtain the address of the Lookup method.

Note: starting with Android 7.0, Google restricts apps from calling private system native libraries. We work around this limitation by finding libart.so's load address via /proc/self/maps and parsing a copy of its symbol table ourselves.
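The maps-scanning part of that workaround can be sketched as follows: find the line in /proc/self/maps that ends with the library name and parse its start address. The maps line below is a canned example so the snippet runs anywhere; on a device you would read /proc/self/maps itself, and the real path of libart.so varies by Android version:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class MapsParser {
    // Scan maps lines ("start-end perms offset dev inode path") for the
    // first mapping of the given library and return its start address.
    static long findBase(BufferedReader maps, String libName) throws IOException {
        String line;
        while ((line = maps.readLine()) != null) {
            if (line.endsWith(libName)) {
                String start = line.substring(0, line.indexOf('-'));
                return Long.parseLong(start, 16);
            }
        }
        return 0; // not mapped
    }

    public static void main(String[] args) throws IOException {
        // Canned example line; paths and addresses are illustrative.
        String sample =
            "7bf2c00000-7bf3000000 r-xp 00000000 fd:00 1234 /apex/com.android.art/lib64/libart.so";
        long base = findBase(new BufferedReader(new StringReader(sample)), "libart.so");
        System.out.println(Long.toHexString(base)); // 7bf2c00000
    }
}
```

With the base address in hand, the native side can map the ELF file itself, walk its symbol table for _ZN3art10ClassTable6LookupEPKcj, and add the symbol's offset to the base to get a callable function pointer.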

At this point, we can call ClassTable's Lookup method, passing a class name and its hash, to determine whether that class has been loaded.
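The hash that pairs with Lookup is, in my reading of the AOSP sources, ART's ComputeModifiedUtf8Hash: a simple multiply-by-31 accumulation over the bytes of the descriptor. This is an assumption about ART internals and should be re-verified against the target ART version. A Java sketch (for the ASCII descriptors typical of class names, UTF-8 bytes match modified UTF-8):

```java
import java.nio.charset.StandardCharsets;

public class DescriptorHash {
    // Sketch of ART's ComputeModifiedUtf8Hash (assumption; verify against
    // the target ART version): hash = hash * 31 + byte over the descriptor.
    // ART uses uint32_t; Java's int wraps around the same way.
    static int modifiedUtf8Hash(String descriptor) {
        int hash = 0;
        for (byte b : descriptor.getBytes(StandardCharsets.UTF_8)) {
            hash = hash * 31 + (b & 0xFF);
        }
        return hash;
    }

    public static void main(String[] args) {
        // Class names are passed in JNI descriptor form.
        System.out.println(modifiedUtf8Hash("Ljava/lang/Object;"));
    }
}
```

Passing a hash computed with the wrong algorithm makes Lookup silently miss, so this is the detail most worth double-checking when porting the scheme across Android versions.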

Summary

In this way, we know which classes have been loaded at a given moment. We upload them, aggregate the reports, and compare against the list of all class names to obtain code coverage data. This scheme requires no instrumentation, so coverage is collected losslessly.
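The comparison just described boils down to a set difference: start from the full class list and remove every class that any report saw loaded; whatever remains is a dead-code candidate. A minimal sketch with illustrative class names:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CoverageAggregator {
    // Classes never seen loaded in any uploaded report are candidates
    // for removal; names and report shapes here are illustrative.
    static Set<String> unusedClasses(Set<String> allClasses, List<Set<String>> reports) {
        Set<String> unused = new HashSet<>(allClasses);
        for (Set<String> loaded : reports) {
            unused.removeAll(loaded);
        }
        return unused;
    }

    public static void main(String[] args) {
        Set<String> all = new HashSet<>(Arrays.asList("A", "B", "C", "D"));
        List<Set<String>> reports = Arrays.asList(
            new HashSet<>(Arrays.asList("A", "B")),
            new HashSet<>(Arrays.asList("B", "C")));
        System.out.println(unusedClasses(all, reports)); // only "D" remains
    }
}
```

In production this comparison runs server-side over de-obfuscated class names, as described in the overall design below, but the core operation is the same.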

Overall design of the new scheme

The collection scheme above is the core; around it sit the supporting processes and the downstream parts of the pipeline. The overall design is shown below:

1) APK distribution: the build center builds the latest APK and distributes it to users; 

2) Trigger: after the user installs the app and, during use, sends it to the background for 10 s, a roll against the sampling rate decides whether this session is a hit; if it is, code coverage collection is triggered;

3) Configuration delivery: when needed, configuration can be pushed from the configuration center to dynamically adjust the feature switch and the sampling rate;

4) Data collection: the code coverage collection middleware (SlimLady) records which classes are loaded, writes the loaded class names to a file, compresses it, and hands the compressed data to the upload middleware;

5) Data upload: the upload middleware sends the data to the cloud;

6) Data download: the server periodically downloads the data from the cloud;

7) Class information: the server obtains class information from the build center, including the list of all class names and the obfuscation mapping file;

8) Data analysis: the server decompresses the coverage data by version, de-obfuscates it, and aggregates the statistics. The aggregated result includes each class and how many times it was loaded; comparing this against the list of all class names reveals which classes were never loaded, and the results are saved to the database;

9) Result presentation: the front end reads the aggregated results from the database and displays code coverage by module, along with module popularity, module size, and other information.
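The sampling decision in step 2 might look like the following sketch. The class name and seed are illustrative, and a real implementation would take the rate from the configuration center in step 3 rather than a constructor argument:

```java
import java.util.Random;

public class SamplingTrigger {
    private final Random random;
    private final double sampleRate; // e.g. 0.01 == 1% of sessions collect

    SamplingTrigger(double sampleRate, long seed) {
        this.sampleRate = sampleRate;
        this.random = new Random(seed);
    }

    // Called after the app has been in the background for 10 s:
    // decide whether this session should collect coverage.
    boolean shouldCollect() {
        // nextDouble() is in [0, 1), so rate 0 never hits and rate 1 always hits
        return random.nextDouble() < sampleRate;
    }

    public static void main(String[] args) {
        SamplingTrigger never = new SamplingTrigger(0.0, 42);
        SamplingTrigger always = new SamplingTrigger(1.0, 42);
        System.out.println(never.shouldCollect());  // false
        System.out.println(always.shouldCollect()); // true
    }
}
```

Keeping the rate server-configurable lets the team widen sampling for a new release and throttle it back once enough reports have been aggregated.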

Summary

This scheme breaks away from traditional instrumentation-based statistics: it obtains information dynamically from the virtual machine and collects code coverage losslessly. With coverage data in hand, there is a lot we can do, for example: take useless modules offline; slim down or retire modules with low call volume and large size; and add a code coverage gate during the integration phase.

【End】


Origin blog.csdn.net/csdnnews/article/details/105021188