Implementation principles of hotfix: the Class approach and the Dex approach

Principle of the Class approach

Basic principle: when a class is loaded, the class loader searches an array of elements, and each element corresponds to one dex. The repaired class is packaged into its own dex, and that dex is inserted at the front of the element array. Because the array is searched from front to back, the repaired class in the patch dex is found first, so it is loaded instead of the original class.

Implementation

  1. Get the app's PathClassLoader through the Context, and create a DexClassLoader for the patch dex that was delivered.

  2. Get the pathList of each class loader and the elements array of each pathList. Put the DexClassLoader's elements in front of the PathClassLoader's elements, then assign the merged array back to the PathClassLoader's elements field.
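
The merge in step 2 can be sketched in plain Java. This is a minimal sketch, not the code of any particular framework. On Android the two arrays would be read by reflection from the class loaders (in AOSP the fields are, as far as I know, `pathList` on BaseDexClassLoader and `dexElements` on DexPathList, and they may differ between versions). Here plain String arrays stand in for the element arrays so the ordering logic can run anywhere. `ElementMerger` and `prepend` are hypothetical names.

```java
import java.lang.reflect.Array;

class ElementMerger {
    /** Returns a new array holding the patch elements first, then the original elements. */
    static Object prepend(Object patchElements, Object originalElements) {
        int patchLen = Array.getLength(patchElements);
        int origLen = Array.getLength(originalElements);
        // Keep the component type of the original array so the result can be
        // assigned back to the same field by reflection.
        Class<?> componentType = originalElements.getClass().getComponentType();
        Object merged = Array.newInstance(componentType, patchLen + origLen);
        System.arraycopy(patchElements, 0, merged, 0, patchLen);
        System.arraycopy(originalElements, 0, merged, patchLen, origLen);
        return merged;
    }

    public static void main(String[] args) {
        // Stand-ins for the element arrays of the patch loader and the app loader.
        String[] patch = {"patch.dex"};
        String[] app = {"classes.dex", "classes2.dex"};
        String[] merged = (String[]) prepend(patch, app);
        System.out.println(String.join(",", merged));
    }
}
```

Because lookup walks the array from index 0, any class present in both the patch dex and an original dex is taken from the patch dex.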

Problems encountered on the Dalvik virtual machine

unexpectedDex crash

The unexpectedDex crash is thrown on the Dalvik virtual machine.

Scenario: class A references class B, and B is the class to be repaired (the class delivered in the patch).

The crash is thrown only when three conditions are met at the same time:

  1. The patch class is referenced by something other than a const-class or instance-of instruction.
  2. The referencing class was verified successfully in the dexopt phase and is marked with the CLASS_ISPREVERIFIED flag.
  3. The two classes are not in the same dex.

When the app resolves the referenced class (A references B, so this happens when B is loaded on behalf of A), the virtual machine performs this check. If all three conditions hold, the app crashes.
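
The three conditions can be written as a single predicate. The sketch below is a plain-Java model of the rule described above, not Dalvik's actual code; `PreverifyCheck` and `wouldCrash` are hypothetical names.

```java
class PreverifyCheck {
    /** Returns true when the reference from class A to class B would be rejected. */
    static boolean wouldCrash(boolean viaConstClassOrInstanceOf,
                              boolean referrerPreverified,
                              boolean sameDex) {
        boolean condition1 = !viaConstClassOrInstanceOf; // an ordinary reference
        boolean condition2 = referrerPreverified;        // A carries CLASS_ISPREVERIFIED
        boolean condition3 = !sameDex;                   // A and B are in different dex files
        return condition1 && condition2 && condition3;
    }

    public static void main(String[] args) {
        // Patched B in its own dex, A preverified, ordinary reference: crash.
        System.out.println(wouldCrash(false, true, false));
        // Breaking condition 2 (A is not preverified): no crash.
        System.out.println(wouldCrash(false, false, false));
        // Breaking condition 1 (const-class or instance-of reference): no crash.
        System.out.println(wouldCrash(true, true, false));
    }
}
```

Since the predicate is a conjunction, making any one condition false is enough to avoid the crash, which is why the two solutions that follow each attack a single condition.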

Because the patch class is always placed in a separate dex, the third condition cannot be avoided. A fix can only target condition 1 or condition 2.

Installing an application includes a dexopt stage, which optimizes the dex into an odex. The virtual machine then loads and runs the odex.

What the dexopt stage does

dexopt checks whether the classes referenced from a class's static methods, private methods, constructors, and virtual methods are in the same dex as that class (for example, whether classes B, C, D, and E referenced from those methods of A are in the same dex as A).

If they are all in the same dex, the virtual machine applies some optimizations to class A and marks it with the CLASS_ISPREVERIFIED flag.

For example, if A references B and both are in the same dex, class A is marked with the CLASS_ISPREVERIFIED flag.

When the exception is thrown

When class A (the class marked in the dexopt stage) is loaded and used later, the virtual machine looks at the flag set by verification and runs a check that validates that earlier result.

If the three conditions above are all met during this check, the unexpectedDex exception is thrown. The class is loaded only when the check passes.

QZone's solution: instrumentation to prevent preverify

The third condition cannot be avoided in this scheme, so the fix has to target the first or second condition.

QZone targets the second condition: it prevents classes from being preverified by instrumenting them (inserting stub code).

Solution: a class is marked only when the special methods above (constructors, static methods, and so on) reference classes in the same dex, so a class that references another dex is not marked. The simplest way is to reference a class in another dex from the constructor. Since the two are not in the same dex, the class is not marked.

Implementation:

Create an empty class and put it in a separate dex.

In the constructor of every class, reference the empty class in that separate dex. Every class then has a cross-dex reference, so no class in the app is marked.

The separate dex must be loaded first, though, because the app's PathClassLoader cannot otherwise find the empty class. Relying on the parent delegation model (a class that is already loaded is returned from the cache), load the empty class first so that later references to it succeed.

Drawbacks:

There is a performance cost: the instrumentation interferes with the verification and optimization that dexopt would otherwise perform when producing the odex.

App startup becomes slower and runtime memory usage increases.

QFix's solution: referencing the patch class early through const-class

QFix targets the first condition.

The exception is thrown only for references made by something other than a const-class or instance-of instruction.

If the patch class is referenced through const-class, no exception is thrown even when the reference crosses dexes and the referencing class carries the flag. In addition, once a class has been loaded, later lookups return it from the cache. QFix exploits both mechanisms.

How the Dalvik virtual machine loads a class:

The virtual machine first looks in the dex cache. On a hit the class is returned directly, with no further loading or verification. On a miss the class is loaded and verified, and then placed in the dex cache.

Implementation idea

When the app starts and the patch has been installed, the patch class is referenced in advance through const-class. This reference does not throw (it is a const-class reference), and it loads the patch class into the virtual machine's cache ahead of time. Later references, even ordinary ones that would conflict with the flag, are not checked: the class is returned directly from the cache.
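
The idea can be modeled in a few lines of plain Java. This is a sketch under simplifying assumptions, not Dalvik or QFix code: the cache is a set of class names, the referrer is assumed to be preverified and in a different dex from the patch class, and `ResolveCacheModel` and `resolve` are hypothetical names. Only the parameter name `fromUnverifiedConstant` comes from the text.

```java
import java.util.HashSet;
import java.util.Set;

class ResolveCacheModel {
    // Classes that have already been resolved (stand-in for the dex cache).
    private final Set<String> resolvedCache = new HashSet<>();

    /** Resolves a patch class on behalf of a preverified referrer in another dex. */
    String resolve(String className, boolean fromUnverifiedConstant) {
        if (resolvedCache.contains(className)) {
            return "cache hit"; // returned directly, the check below is never reached
        }
        if (!fromUnverifiedConstant) {
            // Ordinary reference, preverified referrer, different dex:
            // all three conditions hold, so the reference is rejected.
            throw new IllegalAccessError("unexpected dex for " + className);
        }
        resolvedCache.add(className);
        return "loaded";
    }

    public static void main(String[] args) {
        ResolveCacheModel vm = new ResolveCacheModel();
        // Early const-class style reference at startup warms the cache.
        System.out.println(vm.resolve("PatchedB", true));
        // A later ordinary reference no longer triggers the check.
        System.out.println(vm.resolve("PatchedB", false));
    }
}
```

Without the first call, the second call would throw, which is the crash the scheme is designed to avoid.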

Implementation code:
  1. The patch classes would have to be referenced through const-class when the application starts, but the app cannot know in advance which classes will be repaired, and pre-referencing every class in the application is not realistic.
  2. QFix instead calls the virtual machine's native class-resolution function directly through a native hook. The dexId and classId of each class are saved when the app is packaged. At run time, QFix finds the dexId and classId of the patch class and calls the virtual machine's class-resolution function from the JNI side with the fromUnverifiedConstant parameter set to true. That value means the call is treated as coming from a const-class or instance-of instruction, so the preverify check is skipped. After this call the patch class is in the cache, and later uses find it there without any check.

Because this calls a native function of the virtual machine, it needs adaptation for different virtual machine builds and carries stability risks. The shared document notes a problem on x86.

Problems encountered on the ART virtual machine

Cascading optimization, described below, is not the only problem; other problems are noted in the section on the Dex approach.

On the ART virtual machine, method inlining causes bigger problems. Whichever virtual machine is used, the dex goes through an optimization step at install time.

Different Android versions use different odex compilers. Early versions used the Quick compiler, and later versions mostly use the Optimizing compiler.

Different compilers have different conditions for inlining a method, and the Optimizing compiler performs cascading inlining: if method1 calls method2, which calls method3, which calls method4, and each called method meets the virtual machine's inlining conditions, then the compiled method1 directly contains the code of method2, method3, and method4 (method2 contains the code of method3 and method4, and method3 contains the code of method4). Inlining means the callee's code is written directly into the caller instead of being invoked through its method id.

Problem

Suppose ClassA references the patch class, and a method of the patch class met the inlining conditions when the virtual machine optimized the app. The old method body has then been written into the referencing class. When a new class is delivered as a fix, it loads normally, but calls from ClassA never reach it, because the old implementation is already embedded in ClassA.

Because of inlining, execution never jumps to the new method and the referencing class keeps running the old code. That code still uses the indexes recorded for the old method to look up members and strings, but those indexes may have changed in the new patch class, so accesses made through them can hit the wrong item and crash.
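
The effect of inlining on a hotfix can be imitated in plain Java. This is only an analogy, not how ART compiles code: a map stands in for lookup by method id, and capturing the old implementation stands in for copying its body into the caller. `InlineModel`, `METHODS`, and `Patch.message` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class InlineModel {
    // Stand-in for lookup by method id: name -> current implementation.
    static final Map<String, Supplier<String>> METHODS = new HashMap<>();

    static String run() {
        METHODS.put("Patch.message", () -> "old");

        // Normal call: the implementation is looked up every time the caller runs.
        Supplier<String> callerWithCall = () -> METHODS.get("Patch.message").get();

        // "Inlined" call: the body that existed at compile time is fixed into the caller.
        Supplier<String> bodyAtCompileTime = METHODS.get("Patch.message");
        Supplier<String> callerWithInlinedBody = () -> bodyAtCompileTime.get();

        // The hotfix replaces the method after the caller was compiled.
        METHODS.put("Patch.message", () -> "new");

        return callerWithCall.get() + "/" + callerWithInlinedBody.get();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "new/old"
    }
}
```

The caller that goes through the lookup sees the fix, while the caller with the inlined body keeps returning the old result.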

Solution

Because of cascading inlining, the class to be repaired, its subclasses, and every class that calls it must all be included in the patch and delivered together, so the patch becomes very large.

Principle of the Dex approach

The system APIs that the Class approach tampers with are fairly low-level, so it has adaptation and compatibility issues.

Tinker later took the Dex approach to hotfix.

Principle: replace the entire dex. Delivering a whole dex is impractical, so only the diff between the old and new dex is delivered.

The diff of the new and old dex is generated on the server side through the diff algorithm:

Sigma uses the more common BsDiff

Tinker went further and designed DexDiff, a diff algorithm based on the dex structure, which makes the diff package smaller and the merge more efficient.

Steps

  1. The server diffs the old and new dex and produces a diff package. The patch process requests this package and merges it with the installed dex locally, which restores the new dex.
  2. Create a new DexClassLoader from the new dex and set it as the parent of the app's PathClassLoader. Under the parent delegation model, classes are then loaded by the new DexClassLoader first, so the repaired classes are the ones loaded.

Why should the merge be done in a separate process?

  1. Even if the business process crashes endlessly, the patch process can still fix the problem.
  2. The business process may be in the middle of its own work, and merging the patch there may cause a crash.
  3. An independent process does not depend on the main process starting. Other business processes can also launch the patch process when they start, so the repair is done in one place.

Important point

Some of the PatchCore code (the core merge code of the patch process) is loaded by the PathClassLoader together with the Application. If PatchCore calls business logic without being decoupled from it, the old business classes are loaded at that point by the PathClassLoader. From then on those classes are served from the PathClassLoader's cache rather than from the DexClassLoader built from the merged dex, so the calling class and the loaded class are inconsistent and problems occur. PatchCore therefore has to be decoupled from business code.

In other words, if a class is accessed before the new dex has been installed as the parent of the PathClassLoader, it is loaded by the PathClassLoader from the old dex. Because of the cache, later lookups keep returning that old class instead of the repaired class in the DexClassLoader.
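
The stale-cache pitfall can be reproduced with a toy model of parent delegation. This is a sketch in plain Java, not Android's class loader implementation: strings stand in for classes, a map stands in for each loader's dex, and `ToyLoader` and its members are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class ToyLoader {
    ToyLoader parent; // mutable, like replacing the parent by reflection
    private final Map<String, String> ownDex;                    // class name -> version in this loader's dex
    private final Map<String, String> defined = new HashMap<>(); // classes this loader has already loaded

    ToyLoader(Map<String, String> ownDex) {
        this.ownDex = ownDex;
    }

    String load(String name) {
        String cached = defined.get(name);
        if (cached != null) {
            return cached; // 1. already loaded here: the cache wins
        }
        if (parent != null) {
            String fromParent = parent.load(name); // 2. parent delegation
            if (fromParent != null) {
                return fromParent;
            }
        }
        String own = ownDex.get(name); // 3. this loader's own dex
        if (own != null) {
            defined.put(name, own);
        }
        return own;
    }

    static Map<String, String> dex(String version) {
        Map<String, String> classes = new HashMap<>();
        classes.put("Login", version);
        classes.put("Feed", version);
        return classes;
    }

    public static void main(String[] args) {
        ToyLoader pathLoader = new ToyLoader(dex("old"));
        ToyLoader patchLoader = new ToyLoader(dex("new"));

        pathLoader.load("Login");        // business class touched before the patch is installed
        pathLoader.parent = patchLoader; // merged dex installed as the parent

        System.out.println("Login=" + pathLoader.load("Login")); // Login=old
        System.out.println("Feed=" + pathLoader.load("Feed"));   // Feed=new
    }
}
```

Login was loaded too early and stays on the old version, while Feed is picked up from the patched loader. Running old and new classes side by side like this is the inconsistency that decoupling is meant to prevent.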

Common basic issues

Dex-based hotfix has to solve several basic, typical problems:

  1. The patch entry point and the patch core code must be isolated from business code.
  2. Patch merging must be done in a separate process.
  3. The obfuscation mapping changes with each build: without intervention, the obfuscation result differs from build to build, so even a small change produces a very large diff between the two dex files. Save the mapping of the baseline build and apply it when building the new package, so that obfuscation stays consistent and introduces no extra differences.
  4. The multidex split changes with each build: a large app has several dex files and cross-dex references, and the split can differ between builds even when the code is unchanged. Save the split result when building the baseline package and split the new package the same way.
  5. After the patch process finishes merging, the main process shows a black screen or an ANR as soon as it uses the patch. The virtual machine does not run a dex directly. It goes through dexopt, which is done at install time and also when a dex is loaded dynamically, and which is triggered by the system. The black screen happens because the main process is the first to use the dynamically loaded dex and therefore triggers dexopt itself. dexopt should instead be triggered in the patch process immediately after it merges the new dex.

How to trigger dexopt

Create a new DexClassLoader manually in the patch process. The virtual machine then performs a full dexopt in that independent process. (Even with dexopt moved to the patch process there are still some ANRs; those problems are described later.)

The impact of ART dex2oat on hotfix

dex2oat is the process that compiles dex. On the ART virtual machine, a dex must be compiled into machine code before the virtual machine can load and run it.

dex2oat compilation mode

There are more than a dozen modes in the compilation process, of which only three are of greater concern:

  1. interpret-only: used at first boot or at install time. Only verification is done; the code is still interpreted and no machine code is compiled. Performance is on par with the Dalvik virtual machine.
  2. speed: triggered by new DexClassLoader. Full machine-code compilation is done.
  3. speed-profile: used when the system performs an OTA upgrade or hybrid compilation (a background dexopt job wakes up when the system is idle). Only the hot code recorded in the app's profile is compiled.

Full machine-code compilation: to improve performance, ART compiles the code fully into machine code. This is triggered automatically when the class loader finds no odex file under the optimized directory passed to it. The first time a DexClassLoader is created for a dex, nothing has been compiled yet and no odex file exists, so a full compilation is run.

Solution evolution
  • If this happens in the main process at startup, the main process runs the full compilation itself and hangs.
  • If the full compilation is done in the patch process, dex2oat still takes a long time (several minutes on some devices) and uses a lot of resources, so the patch process may never get enough time to finish. For example, a user who always opens the app, reads the news for a few seconds, and then kills it never leaves enough time, so the optimization and the repair never complete. The compilation can also slow down the main process and cause an ANR.
  • Tinker's solution: the patch process first performs a lightweight compilation. If it completes, the patch is used. If it does not, the user keeps running the old code for now, and full compilation is avoided, since a full compilation may consume too many resources and stall the business process. How full compilation is avoided is described later.

Lightweight compilation also takes time, so the first startup is slow. In addition, the independent process is not throttled: when it later runs a full compilation it may grab resources, make the main process sluggish, and cause an ANR. (The probability is small, and Tinker chose to accept it because overall app performance is good enough.)

  • While the app runs in the foreground, the patch process may still seize resources and cause an ANR. A further optimization is therefore applied: after downloading the patch, the patch process performs only the lightweight compilation, and the main process uses the lightly compiled patch first. Full compilation is deferred to a suitable time: when the app is in the background and another app is in the foreground, when the screen is locked, or when the system runs its background dexopt while the device is idle.
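
The deferral rule in the last bullet can be written as a small decision function. This is a hypothetical sketch of the policy as described here, not Tinker's actual code; `CompilePolicy`, `Action`, and `next` are invented names, and the real conditions are likely more detailed.

```java
class CompilePolicy {
    enum Action { LIGHT_COMPILE, FULL_COMPILE, WAIT }

    /** Decides what the patch process should do next for a downloaded patch. */
    static Action next(boolean lightCompileDone, boolean appInForeground, boolean screenLocked) {
        if (!lightCompileDone) {
            // Always do the cheap compilation first so the patch becomes usable quickly.
            return Action.LIGHT_COMPILE;
        }
        if (!appInForeground || screenLocked) {
            // The user is not interacting with the app: a full compilation is safe now.
            return Action.FULL_COMPILE;
        }
        // The app is in use: keep running on the lightly compiled patch.
        return Action.WAIT;
    }

    public static void main(String[] args) {
        System.out.println(next(false, true, false)); // LIGHT_COMPILE
        System.out.println(next(true, true, false));  // WAIT
        System.out.println(next(true, false, false)); // FULL_COMPILE
    }
}
```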

Avoiding full compilation

There are three options:

  1. Atlas solution: modify the execution mode of the ART virtual machine on the native side and load the dex file directly through the low-level DexFile interface. This affects dex loading for the whole process, and DexFile is deprecated in Android O and above, so it has usability and compatibility issues.
  2. TBS solution: if null is passed as optDir to new DexClassLoader, oat_location is left empty and no full compilation is done. (On 8.0 the system ignores the path passed in.)
  3. Tinker solution: dexopt is performed by running a command-line tool, so before the system triggers full compilation, Tinker invokes the dex2oat command itself with the interpret-only compilation mode, which performs only the lightweight compilation. The app first runs on the result of this lightweight compilation, which behaves the same as on first boot or right after installation (comparable to the Dalvik virtual machine). Letting the app run first is also a fallback when problems occur.
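
A manual dex2oat invocation of the kind described in option 3 could be assembled roughly as follows. This is a sketch, not Tinker's code. The flags `--dex-file`, `--oat-file`, `--instruction-set`, and `--compiler-filter` are dex2oat options as far as I know, but the exact set needed varies by Android version and should be checked on the target devices. The paths and the `Dex2OatCommand` class are hypothetical. The sketch only builds the argument list; on a device it could be passed to `ProcessBuilder`.

```java
import java.util.ArrayList;
import java.util.List;

class Dex2OatCommand {
    /** Builds a dex2oat argument list that requests the lightweight interpret-only mode. */
    static List<String> build(String dexPath, String oatPath, String instructionSet) {
        List<String> cmd = new ArrayList<>();
        cmd.add("dex2oat");
        cmd.add("--dex-file=" + dexPath);               // merged patch dex to compile
        cmd.add("--oat-file=" + oatPath);               // where the compiled output is written
        cmd.add("--instruction-set=" + instructionSet); // target ISA of the running process
        cmd.add("--compiler-filter=interpret-only");    // verify only, no machine code
        return cmd;
    }

    public static void main(String[] args) {
        // Hypothetical paths, for illustration only.
        List<String> cmd = build("/data/local/tmp/patch.dex", "/data/local/tmp/patch.odex", "arm64");
        System.out.println(String.join(" ", cmd));
    }
}
```

Because the output file already exists when the class loader later looks for it, the system has no reason to start its own full compilation for that dex.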

Impact of Android N hybrid compilation on hot fixes

Hybrid compilation: AOT, interpretation, and JIT modes coexist.

Users may only ever exercise a small part of the classes. There is no need to compile all of the code when only 20 to 30% of it is used.

Before N, installing on the ART virtual machine required full compilation, so installation took a long time, and relying on JIT compilation at run time would be very slow.

N solves this with hybrid compilation, which shortens installation and makes system OTA upgrades faster: installation and first startup use interpret-only mode with no compilation (the same behavior as the Dalvik virtual machine). Which code gets compiled, and when, is determined by the incremental compilation process on N:

Android N virtual machine incremental compilation process

While the app runs, the virtual machine records the code that is executed and saves it in a profile file. The system schedules BackgroundDexOptService through JobScheduler. The service runs when the screen is off or the device is charging, for example at night or whenever the phone is idle, and compiles the recorded code. This hot code runs frequently, so compiling it makes later launches much faster. In this way the app is compiled incrementally. Compilation produces base.odex and base.art (the latter is called the app image).

Because the virtual machine regards this as hot code, it loads it in advance when the app starts: the classes are put into the dex cache in one pass, through ClassLinker, when the app's ClassLoader is created.

So as soon as the Application starts, some classes (the previously compiled hot code) are already loaded before the app has done anything.

Impact of ART hybrid compilation on hotfix: three situations
  • The class to be repaired is not in the app image: the Dex approach relies on parent delegation and expects the class to be loaded through the parent. If the class is not in the app image, it has not been loaded in advance, so the mechanism works and the patch takes effect.
  • The classes to be repaired are partly in the app image: some classes come from the new dex and some from the old one. Calls between them use mismatched addresses and crash.
  • The classes to be repaired are all in the app image: every repaired class was already collected as hot code and loaded in advance, so the patch does not take effect.

Solution

On Android N and above, give up setting the parent and replace the app's PathClassLoader itself.

Implementation steps
  1. Create a DexClassLoader for the patched dex.
  2. Get the LoadedApk through ContextImpl, and from it the PathClassLoader it holds. This is the PathClassLoader the system created for the app.
  3. Replace that field with the patched class loader through reflection.

Principle: the app image is loaded in advance into the cache of the system-created PathClassLoader. The app now runs on the replacement class loader, and the app image is not associated with it.

Impact: without the app image some performance is sacrificed, but the repair takes effect. According to the statistics, the impact is very small.

This article is a reprinted article

Original link: Implementation principles of hot repair Class genre and Dex genre - Nuggets (juejin.cn)


Origin blog.csdn.net/m0_65909361/article/details/132939045