Has your debug package become laggy on Android 14? | Dewu Technology

1. Background

Why is my app so laggy? Who poisoned the code?

One day I noticed that our debug package had become extremely laggy. After a few simple comparison tests, I found that the problem was specific to debug packages running on Android 14.


2. Troubleshooting log

Routine investigation

I started with systrace and dutrace, our in-house trace tool for debug packages.

Conclusion: the CPU was mostly idle and the main thread was not obviously blocked. The time seemed to be going into plain method execution.

A suspicious finding

This first pass didn't turn up much, but while using dutrace I noticed an anomaly. First, a brief introduction to how dutrace works:

dutrace uses inline hooks to insert atrace points before and after each ArtMethod executes, and then displays the results in the Perfetto UI. It has the following advantages:

1. It supports offline analysis of function execution flow and per-function time cost.

2. For analyzing function call flows:

a. You can view the function calls of the entire process (including framework functions);

b. You can specify which functions and threads to monitor, which filters out irrelevant traces;

c. Configuration is dynamic and does not require repackaging.

3. It works with existing UI analysis tools, which also show key system threads (such as rendering time, thread locks and GC time) alongside I/O operations, CPU load and other events.
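The core idea behind dutrace, emitting a begin marker before a method runs and an end marker after it returns, can be sketched in plain C++. This is a minimal illustration under stated assumptions: `ScopedTrace`, `TracedCall` and the in-memory `TraceLog()` are hypothetical names, and a real tool would emit atrace sections (for example via the NDK's `ATrace_beginSection`/`ATrace_endSection`) from an inline hook on ArtMethod execution rather than wrapping calls by hand.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the atrace sink: instead of writing real
// atrace sections, this sketch records begin/end events in memory.
struct TraceEvent {
    std::string name;
    bool begin;  // true for a begin marker, false for an end marker
};

std::vector<TraceEvent> &TraceLog() {
    static std::vector<TraceEvent> log;
    return log;
}

// RAII marker: records "begin" on construction and "end" on destruction,
// so the end marker is emitted even on early return or exception.
class ScopedTrace {
public:
    explicit ScopedTrace(std::string name) : name_(std::move(name)) {
        TraceLog().push_back({name_, true});
    }
    ~ScopedTrace() { TraceLog().push_back({name_, false}); }
    ScopedTrace(const ScopedTrace &) = delete;
    ScopedTrace &operator=(const ScopedTrace &) = delete;

private:
    std::string name_;
};

// Conceptually what the hook does: bracket the original call with markers.
template <typename Fn>
auto TracedCall(const std::string &name, Fn &&original) {
    ScopedTrace trace(name);
    return original();
}
```

Because the markers are always balanced and strictly nested per thread, a viewer such as Perfetto can rebuild them into a call tree.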


(Figure: dutrace flow chart)

Hooking before and after ArtMethod execution means handling the three ways ART can interpret a method.

ART runtime interpreters

  1. The C++ interpreter: the traditional switch-based interpreter. It is generally used only in a debugging environment, when method tracing is active, when an instruction is not supported, or when the bytecode hits an exceptional case (such as failed structured-locking verification).
  2. The mterp fast interpreter: at its core it uses a handler table to map each instruction to its handler and switches between instructions via hand-written assembly, which improves interpreter performance.
  3. Nterp: a further optimization of mterp. Nterp does not need to maintain a separate managed-code stack; it uses the same stack frame layout as compiled native code, and the entire decode-and-execute loop is implemented in assembly, further narrowing the performance gap between the interpreter and compiled code.
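To make the difference between switch dispatch and handler-table dispatch concrete, here is a toy stack-machine interpreter written both ways. It is a hypothetical sketch: the opcodes are invented (not Dalvik bytecode), and in portable C++ the two loops may well compile to similar machine code. The real speed-up of mterp and nterp comes from hand-written assembly handlers that jump straight to the next handler, which this sketch does not try to reproduce.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy bytecode, NOT Dalvik bytecode: just enough to contrast dispatch styles.
enum Op : std::uint8_t { kPush, kAdd, kMul, kHalt };

struct Insn {
    Op op;
    std::int64_t arg;  // only used by kPush
};

// Style 1: switch dispatch, conceptually like ART's C++ "switch" interpreter.
std::int64_t RunSwitch(const std::vector<Insn> &code) {
    std::vector<std::int64_t> stack;
    for (std::size_t pc = 0;; ++pc) {
        const Insn &insn = code[pc];
        switch (insn.op) {
            case kPush: stack.push_back(insn.arg); break;
            case kAdd: { std::int64_t b = stack.back(); stack.pop_back(); stack.back() += b; break; }
            case kMul: { std::int64_t b = stack.back(); stack.pop_back(); stack.back() *= b; break; }
            case kHalt: return stack.back();
        }
    }
}

// Style 2: handler-table dispatch, the idea mterp/nterp build on.
struct Vm {
    std::vector<std::int64_t> stack;
    bool halted = false;
};
using Handler = void (*)(Vm &, const Insn &);

void DoPush(Vm &vm, const Insn &i) { vm.stack.push_back(i.arg); }
void DoAdd(Vm &vm, const Insn &) { std::int64_t b = vm.stack.back(); vm.stack.pop_back(); vm.stack.back() += b; }
void DoMul(Vm &vm, const Insn &) { std::int64_t b = vm.stack.back(); vm.stack.pop_back(); vm.stack.back() *= b; }
void DoHalt(Vm &vm, const Insn &) { vm.halted = true; }

// Indexed by opcode value, so dispatching an instruction is one indirect call.
constexpr Handler kHandlers[] = {DoPush, DoAdd, DoMul, DoHalt};

std::int64_t RunTable(const std::vector<Insn> &code) {
    Vm vm;
    for (std::size_t pc = 0; !vm.halted; ++pc) {
        kHandlers[code[pc].op](vm, code[pc]);
    }
    return vm.stack.back();
}
```

Both functions compute (2 + 3) * 4 for the program `{kPush 2, kPush 3, kAdd, kPush 4, kMul, kHalt}`; only the way each instruction reaches its implementation differs.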

While checking which of these interpreters was in use, I noticed the anomaly: on Android 14, methods were actually being executed by the switch interpreter. I retested interpretation on several Android versions: Android 12 uses mterp, and Android 13 uses nterp, falling back to the switch interpreter only while debugging. In theory Android 14 should also use nterp, so why was it using the slowest, switch-based interpreter? The backtraces of method execution on Android 12, 13 and 14 are shown below, in that order.

(Figures: method-execution backtraces on Android 12, 13 and 14)

Checking the suspicion

I began to suspect that the interpreter choice was causing the lag. Looking through art/runtime/interpreter/mterp/nterp.cc, I found that it had indeed changed: if the runtime is Java-debuggable, nterp is not used. Next, I tried to prove that this was the cause.
isJavaDebuggable is controlled by the Runtime field RuntimeDebugState runtime_debug_state_ (see runtime.cc). We could find the Runtime instance and modify runtime_debug_state_ through its offset, but the source shows it can also be set by calling _ZN3art7Runtime20SetRuntimeDebugStateENS0_17RuntimeDebugStateE (art::Runtime::SetRuntimeDebugState):

void Runtime::SetRuntimeDebugState(RuntimeDebugState state) {
  if (state != RuntimeDebugState::kJavaDebuggableAtInit) {
    // We never change the state if we started as a debuggable runtime.
    DCHECK(runtime_debug_state_ != RuntimeDebugState::kJavaDebuggableAtInit);
  }
  runtime_debug_state_ = state;
}
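A simplified model of how the debug state feeds into the interpreter choice may help here. It assumes the Android 14 `RuntimeDebugState` enum order is kNonJavaDebuggable, kJavaDebuggable, kJavaDebuggableAtInit (the workaround code later relies on the same 0/2 values), and `CanRuntimeUseNterpModel` is a hypothetical stand-in that keeps only the debuggable check described above; the real `CanRuntimeUseNterp` checks more conditions. Note that the `DCHECK` in `SetRuntimeDebugState` is only active in debug builds of ART, so on release builds nothing stops a caller from leaving `kJavaDebuggableAtInit`.

```cpp
// Simplified model, NOT ART source. Enum order assumed to match
// art/runtime/runtime.h on Android 14.
enum class RuntimeDebugState : int {
    kNonJavaDebuggable = 0,
    kJavaDebuggable = 1,
    kJavaDebuggableAtInit = 2,
};

struct RuntimeModel {
    RuntimeDebugState runtime_debug_state_ = RuntimeDebugState::kNonJavaDebuggable;

    // Both debuggable states count as "Java-debuggable".
    bool IsJavaDebuggable() const {
        return runtime_debug_state_ != RuntimeDebugState::kNonJavaDebuggable;
    }

    // Release builds compile the DCHECK out, so the model just assigns.
    void SetRuntimeDebugState(RuntimeDebugState state) { runtime_debug_state_ = state; }
};

// Hypothetical stand-in for the Android 14 rule: nterp is refused
// whenever the runtime is Java-debuggable.
bool CanRuntimeUseNterpModel(const RuntimeModel &rt, bool nterp_supported) {
    return nterp_supported && !rt.IsJavaDebuggable();
}
```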

I tried to verify this by calling SetRuntimeDebugState. Setting isJavaDebuggable to false in the test package did not remove the lag, and setting it to true in the production package made it only slightly laggy. That overturned my guess that the interpreter mode was causing the lag.

Checking native time consumption

Next I suspected that native methods were taking the time, so I tried locating the problem with simpleperf.

Conclusion: the time was spent almost entirely in interpreter stacks executing code; there were no other notable stacks.


Narrowing down to DEBUG_JAVA_DEBUGGABLE

Next, I started from the source of debuggable and narrowed the scope step by step to find the variable responsible.

The debuggable attribute in AndroidManifest.xml determines the runtimeFlags that the system process passes when it starts our process.

The sixth parameter of the start method in frameworks/base/core/java/android/os/Process.java is runtimeFlags. For a debuggable app, the system server's process-start code (ProcessList in AOSP) adds the following flags to runtimeFlags, so the first step was to narrow down which of these flags matters.

if (debuggableFlag) {
    runtimeFlags |= Zygote.DEBUG_ENABLE_JDWP;
    runtimeFlags |= Zygote.DEBUG_ENABLE_PTRACE;
    runtimeFlags |= Zygote.DEBUG_JAVA_DEBUGGABLE;
    // Also turn on CheckJNI for debuggable apps. It's quite
    // awkward to turn on otherwise.
    runtimeFlags |= Zygote.DEBUG_ENABLE_CHECKJNI;

    // Check if the developer does not want ART verification
    if (android.provider.Settings.Global.getInt(mService.mContext.getContentResolver(),
            android.provider.Settings.Global.ART_VERIFIER_VERIFY_DEBUGGABLE, 1) == 0) {
        runtimeFlags |= Zygote.DISABLE_VERIFIER;
        Slog.w(TAG_PROCESSES, app + ": ART verification disabled");
    }
}

To change our process's startup parameters, we have to hook the system process. That means rooting the phone, installing a hook framework (the code below uses the Xposed API), and then modifying the parameters in a hook on Process.start:

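// Runs inside system_server, where Process.start is called.
// Mirrors Zygote.DEBUG_JAVA_DEBUGGABLE (1 << 8 in AOSP); Zygote is a hidden API,
// so the constant is defined locally. Verify it against your AOSP branch.
private static final int DEBUG_JAVA_DEBUGGABLE = 1 << 8;

XposedBridge.hookAllMethods(
        Process.class,
        "start",
        new XC_MethodHook() {
            @Override
            protected void beforeHookedMethod(MethodHookParam param) throws Throwable {
                // Process.start(processClass, niceName, uid, gid, gids, runtimeFlags, ...)
                final String niceName = (String) param.args[1];
                final int uid = (int) param.args[2];
                final int runtimeFlags = (int) param.args[5];
                XposedBridge.log("process_xx " + runtimeFlags);
                // isDebuggable() is the author's own filter for the target package (not shown).
                if (isDebuggable(niceName, uid)) {
                    // Clear only DEBUG_JAVA_DEBUGGABLE; keep every other flag.
                    param.args[5] = runtimeFlags & ~DEBUG_JAVA_DEBUGGABLE;
                    XposedBridge.log("process_xx " + param.args[5]);
                }
            }
        }
);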
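The only thing the hook changes is a single bit. The sketch below models that flag arithmetic in C++; the bit positions are my recollection of the constants in Zygote.java and should be treated as assumptions to verify against your AOSP branch, and the function names are hypothetical. Because the other debug flags (such as JDWP and CheckJNI) are left untouched, any change in behaviour can be attributed to DEBUG_JAVA_DEBUGGABLE alone.

```cpp
#include <cstdint>

// Bit positions assumed from Zygote.java; check them for your branch.
constexpr std::uint32_t DEBUG_ENABLE_JDWP = 1u << 0;
constexpr std::uint32_t DEBUG_ENABLE_CHECKJNI = 1u << 1;
constexpr std::uint32_t DEBUG_JAVA_DEBUGGABLE = 1u << 8;

// What the hook does to param.args[5] for the test package: clear one bit
// and leave every other flag exactly as the system server computed it.
constexpr std::uint32_t StripJavaDebuggable(std::uint32_t runtime_flags) {
    return runtime_flags & ~DEBUG_JAVA_DEBUGGABLE;
}

// The reverse experiment on production packages: set the same bit.
constexpr std::uint32_t AddJavaDebuggable(std::uint32_t runtime_flags) {
    return runtime_flags | DEBUG_JAVA_DEBUGGABLE;
}
```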

This time the results were clear. After DEBUG_JAVA_DEBUGGABLE was removed from its runtimeFlags, the test package was no longer laggy; conversely, production packages, including apps installed from the app store, all became laggy once the DEBUG_JAVA_DEBUGGABLE flag was added. This shows that DEBUG_JAVA_DEBUGGABLE is the variable responsible.

Narrowing down to DeoptimizeBootImage

I then went back to the ART source to see what DEBUG_JAVA_DEBUGGABLE affects:

if ((runtime_flags & DEBUG_JAVA_DEBUGGABLE) != 0) {
  runtime->AddCompilerOption("--debuggable");
  runtime_flags |= DEBUG_GENERATE_MINI_DEBUG_INFO;
  runtime->SetRuntimeDebugState(Runtime::RuntimeDebugState::kJavaDebuggableAtInit);
  {
    // Deoptimize the boot image as it may be non-debuggable.
    ScopedSuspendAll ssa(__FUNCTION__);
    runtime->DeoptimizeBootImage();
  }
  runtime_flags &= ~DEBUG_JAVA_DEBUGGABLE;
  needs_non_debuggable_classes = true;
}

This block is where DEBUG_JAVA_DEBUGGABLE takes effect. SetRuntimeDebugState had already been ruled out by the earlier test, and DEBUG_GENERATE_MINI_DEBUG_INFO was not the cause either. Could it be runtime->DeoptimizeBootImage()? I took a package with debuggable set to false, actively called DeoptimizeBootImage through the symbol _ZN3art7Runtime19DeoptimizeBootImageEv, and the lag reproduced!

Root-cause analysis

DeoptimizeBootImage makes the AOT-compiled methods in the boot image Java-debuggable: it reinitializes their entry points so that they run in the interpreter instead of using AOT code. Tracing into Instrumentation::InitializeMethodsCode, the decision again comes down to CanUseNterp(method) and CanRuntimeUseNterp: Android 13 can use nterp here, while Android 14 can only use the switch interpreter.

I then hooked CanRuntimeUseNterp to return true directly, but the app was still laggy: even with the hook in place, the affected methods still ran in the switch interpreter. Thinking about it the other way round, my hook came too late. DeoptimizeBootImage had already run, so by the time these basic methods were called, they had already been assigned the switch interpreter.
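Why a late hook does not help can be shown with a small model: entry points are chosen once, when DeoptimizeBootImage reinitializes the methods, and later changes to the predicate have no effect until something reinitializes the methods again. Everything here (`Entry`, `Method`, `DeoptimizeBootImageModel`) is hypothetical and heavily simplified; real ART stores the entry point in each ArtMethod and applies many more rules.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical, heavily simplified model of per-method entry points.
enum class Entry { kAot, kNterp, kSwitch };

struct Method {
    std::string name;
    Entry entry = Entry::kAot;  // boot-image methods start out with AOT code
};

// Model of the deoptimization step: every boot-image method loses its AOT
// entry point and gets an interpreter, chosen by asking the predicate now.
// The choice is stored; the predicate is not consulted again later.
void DeoptimizeBootImageModel(std::vector<Method> &boot_methods,
                              const std::function<bool()> &can_runtime_use_nterp) {
    for (Method &m : boot_methods) {
        m.entry = can_runtime_use_nterp() ? Entry::kNterp : Entry::kSwitch;
    }
}
```

Flipping the predicate to true after `DeoptimizeBootImageModel` has run leaves every method on `kSwitch`, which matches what the late hook on CanRuntimeUseNterp observed.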


To confirm, I tested a debuggable package on Android 13: I first hooked CanRuntimeUseNterp to return false, then called DeoptimizeBootImage, and the lag reappeared.

Preliminary conclusion: after deoptimization, boot-image methods run on nterp on Android 13 but on the switch interpreter on Android 14. Boot-image methods are very basic and fine-grained and are called extremely often, so executing them in the switch interpreter is very expensive.

Verifying that it is a system issue

If this is a system problem, everyone should run into it, not just our app. So I asked a few friends to check their own debug packages, and sure enough they all had the same problem: the same package installed on Android 14 and Android 13 gives a completely different experience.

Reporting the issue

Someone had already reported on the Issue Tracker that debug packages are slow on Android 14:
https://issuetracker.google.com/issues/311251587. There was no resolution yet, so I added the root cause I had identified.


I also filed a separate issue:
https://issuetracker.google.com/issues/328477628

3. Temporary solution

While waiting for Google's reply, I also thought about how the app layer could work around the problem and make the debug package smooth again, for example by re-optimizing the methods in the boot image. With that idea in mind I studied the ART code again and found that Android 14 adds a new method, UpdateEntrypointsForDebuggable, which resets each method's entry point (AOT code, nterp and so on) according to the current rules. So if I hook CanRuntimeUseNterp to return true and then call UpdateEntrypointsForDebuggable again, shouldn't the methods go back to nterp?

void Instrumentation::UpdateEntrypointsForDebuggable() {
  Runtime* runtime = Runtime::Current();
  // If we are transitioning from non-debuggable to debuggable, we patch
  // entry points of methods to remove any aot / JITed entry points.
  InstallStubsClassVisitor visitor(this);
  runtime->GetClassLinker()->VisitClasses(&visitor);
}
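The property this workaround depends on is that UpdateEntrypointsForDebuggable revisits every class and recomputes entry points from the runtime's state at the time of the call. The model below captures only that; `WorkaroundRuntime`, `DebugState`, `Interp` and `BootImageNterpModel` are hypothetical names, AOT code is ignored entirely, and the toggle sequence mirrors the refined approach described next (drop to kNonJavaDebuggable, recompute, restore kJavaDebuggableAtInit).

```cpp
#include <vector>

// Simplified model, not ART code. Enum values assumed to match Android 14.
enum class DebugState : int { kNonJavaDebuggable = 0, kJavaDebuggable = 1, kJavaDebuggableAtInit = 2 };
enum class Interp { kNterp, kSwitch };

struct WorkaroundRuntime {
    DebugState state = DebugState::kJavaDebuggableAtInit;  // a debug package
    std::vector<Interp> methods;  // boot-image and app methods alike

    bool IsJavaDebuggable() const { return state != DebugState::kNonJavaDebuggable; }

    // Model of UpdateEntrypointsForDebuggable(): visit every method and pick
    // its interpreter from the state as it is *right now*.
    void UpdateEntrypointsForDebuggable() {
        for (Interp &m : methods) {
            m = IsJavaDebuggable() ? Interp::kSwitch : Interp::kNterp;
        }
    }
};

// Toggle sequence: temporarily non-debuggable while entry points are
// recomputed, then back to kJavaDebuggableAtInit for the rest of the runtime.
void BootImageNterpModel(WorkaroundRuntime &rt) {
    rt.state = DebugState::kNonJavaDebuggable;     // setRuntimeDebugState(..., 0)
    rt.UpdateEntrypointsForDebuggable();
    rt.state = DebugState::kJavaDebuggableAtInit;  // setRuntimeDebugState(..., 2)
}
```

In the model, calling `UpdateEntrypointsForDebuggable()` while still debuggable changes nothing; only the toggled sequence moves methods to nterp while leaving the final state debuggable.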

I tried hooking CanRuntimeUseNterp and calling UpdateEntrypointsForDebuggable again, and the app became much smoother!

This solution still left some problems, though. Compared with a package with debuggable set to false, there was still some lag. I found that the boot-image methods now ran on nterp, but most of the APK's own code was still executed by the switch interpreter, so I changed approach.
What if I set RuntimeDebugState to non-debuggable before calling UpdateEntrypointsForDebuggable, and set it back to debuggable afterwards? The final code is as follows. The hook framework used is https://github.com/bytedance/android-inline-hook.

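#include <jni.h>
#include <android/log.h>
#include "shadowhook.h"  // ByteDance ShadowHook (android-inline-hook)

// The original assumes a LOGE macro; this is one typical definition.
#define LOGE(...) __android_log_print(ANDROID_LOG_ERROR, "ArtMethodTrace", __VA_ARGS__)

// Points at art::Runtime::instance_, so *instance_ is the Runtime singleton.
static void **instance_ = nullptr;

// Project helper not shown in the article: byte offset of the field in `object`
// whose value equals `target`, searched in [begin, end), or -1 if not found.
int findOffset(const void *object, int begin, int end, const void *target);

extern "C" JNIEXPORT void JNICALL
Java_test_ArtMethodTrace_bootImageNterp(JNIEnv *env, jclass clazz) {
    void *handler = shadowhook_dlopen("libart.so");
    if (handler == nullptr) {
        LOGE("dlopen libart.so failed");
        return;
    }
    instance_ = static_cast<void **>(shadowhook_dlsym(handler, "_ZN3art7Runtime9instance_E"));
    auto getSystemThreadGroup = reinterpret_cast<jobject (*)(void *runtime)>(
            shadowhook_dlsym(handler, "_ZNK3art7Runtime20GetSystemThreadGroupEv"));
    auto UpdateEntrypointsForDebuggable = reinterpret_cast<void (*)(void *instrumentation)>(
            shadowhook_dlsym(handler,
                    "_ZN3art15instrumentation15Instrumentation30UpdateEntrypointsForDebuggableEv"));
    // state: 0 = kNonJavaDebuggable, 2 = kJavaDebuggableAtInit
    auto setRuntimeDebugState = reinterpret_cast<void (*)(void *runtime, int state)>(
            shadowhook_dlsym(handler, "_ZN3art7Runtime20SetRuntimeDebugStateENS0_17RuntimeDebugStateE"));
    // Every symbol is required: setRuntimeDebugState is called unconditionally below.
    if (instance_ == nullptr || getSystemThreadGroup == nullptr ||
        UpdateEntrypointsForDebuggable == nullptr || setRuntimeDebugState == nullptr) {
        LOGE("symbol lookup in libart.so failed");
        shadowhook_dlclose(handler);
        return;
    }
    // Locate a Runtime field we can recognise by value (the system thread group),
    // then derive the Instrumentation address from its offset.
    jobject thread_group = getSystemThreadGroup(*instance_);
    int vm_offset = findOffset(*instance_, 0, 4000, thread_group);
    if (vm_offset < 0) {
        LOGE("vm_offset not found");
        shadowhook_dlclose(handler);
        return;
    }
    // 368 matches the Runtime layout the author tested; it may differ on other builds.
    void *instrumentation = reinterpret_cast<char *>(*instance_) + vm_offset - 368;

    setRuntimeDebugState(*instance_, 0);              // temporarily non-debuggable
    UpdateEntrypointsForDebuggable(instrumentation);  // recompute all entry points
    setRuntimeDebugState(*instance_, 2);              // back to kJavaDebuggableAtInit
    shadowhook_dlclose(handler);
    LOGE("bootImageNterp success");
}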
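The code above relies on findOffset without showing it. One plausible implementation, offered only as a hypothetical sketch, scans the object in pointer-sized, pointer-aligned steps and returns the first offset whose stored value equals the target (here the system thread group jobject). It assumes the searched field is pointer-aligned and that the first `end` bytes of the object are readable; reading past the real end of the Runtime object could crash.

```cpp
#include <cstring>

// Hypothetical helper: returns the first pointer-aligned byte offset in
// [begin, end) at which `object` stores the pointer value `target`, or -1.
int findOffset(const void *object, int begin, int end, const void *target) {
    const char *base = static_cast<const char *>(object);
    const int step = static_cast<int>(sizeof(void *));
    // Round begin up to pointer alignment relative to the object start.
    int start = (begin + step - 1) / step * step;
    for (int offset = start; offset + step <= end; offset += step) {
        const void *value;
        // memcpy avoids type-punning through an incompatible pointer type.
        std::memcpy(&value, base + offset, sizeof(value));
        if (value == target) return offset;
    }
    return -1;
}
```

Because a jobject is a pointer type, it converts implicitly to `const void *`, so the call `findOffset(*instance_, 0, 4000, thread_group)` compiles against this signature.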

4. Final notes

Recently I also came across an article by a Qualcomm engineer in the community. Building on the problem I identified, he analyzed it in more detail and confirmed that Google will fix it in Android 15. For Android 14 devices sold outside China, Google plans to ship the fix through an update to the com.android.art APEX module. However, because Google's updates cannot be delivered to devices in mainland China, each phone manufacturer needs to merge the two fixing changes on its own. [1]

Until then, if you need a temporary fix for laggy debuggable packages, the workaround above can be used.


Reference article:

[1] https://juejin.cn/post/7353106089296789556


Text / Wuyou


This is an original article by Dewu Technology. For more articles, see the Dewu Technology official website.


Reproduction without Dewu Technology's permission is strictly prohibited; violators will be held legally liable.

Origin my.oschina.net/u/5783135/blog/11054175