[Android Performance Optimization] How to monitor ANR issues?

Preface

ANR stands for Application Not Responding. Android has a built-in ANR mechanism that monitors timeouts in the components the system interacts with (Activity, etc.) and in user interaction (InputEvent). This lets the system determine whether the application process (its main thread) is stuck or responding too slowly.

Compared with crashes, ANRs have more complex causes and are harder to diagnose. This article covers the following topics:

  1. ANR workflow
  2. How to monitor ANR?
  3. How to locate the cause of ANR?

ANR workflow

ANRs can be triggered in several scenarios, which usually fall into the following categories:

[Figure: ANR trigger scenarios]

The basic principle is the watchdog idea: if a dispatched event is not consumed within a certain time limit, an ANR is triggered.
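To make the timeout idea concrete, here is a minimal plain-Java model: arm a timer when an event is dispatched, disarm it when the app reports completion, and record an ANR if the timer fires first. In many AOSP versions, services and broadcasts follow this pattern with delayed Handler messages, but the class and method names below are hypothetical, and the timeout table lists commonly cited AOSP defaults that vary across Android versions and vendors.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Simplified model of the system-side timeout (illustrative, not AOSP code):
 * arm a timer on dispatch, disarm it on completion, report an ANR if it fires first.
 */
final class AnrTimerModel {
    /** Commonly cited AOSP defaults in ms; exact values vary by version and vendor. */
    enum Kind {
        INPUT(5_000), FG_BROADCAST(10_000), BG_BROADCAST(60_000),
        FG_SERVICE(20_000), BG_SERVICE(200_000), PROVIDER_PUBLISH(10_000);

        final long timeoutMs;

        Kind(long timeoutMs) { this.timeoutMs = timeoutMs; }
    }

    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private final ConcurrentHashMap<String, Object> pending = new ConcurrentHashMap<>();
    private final List<String> anrs = new CopyOnWriteArrayList<>();

    /** Event dispatched to the app: arm the timer using the default for its kind. */
    void dispatch(String eventId, Kind kind) {
        dispatch(eventId, kind.timeoutMs);
    }

    /** Event dispatched to the app: arm the timer with an explicit timeout. */
    void dispatch(String eventId, long timeoutMs) {
        Object token = new Object();
        pending.put(eventId, token);
        timer.schedule(() -> {
            // Still pending with the same token: the app never finished in time.
            if (pending.remove(eventId, token)) {
                anrs.add(eventId);
            }
        }, timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** App reports the event as handled: disarm by removing the pending entry. */
    void finished(String eventId) {
        pending.remove(eventId); // the timer task will later find nothing to report
    }

    /** Waits (up to one minute) for armed timers to fire, then returns timed-out events. */
    List<String> shutdownAndGetAnrs() {
        timer.shutdown(); // already-scheduled delayed tasks still run after shutdown()
        try {
            timer.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return anrs;
    }
}
```

For example, an event dispatched with a 50 ms timeout and never finished shows up in the returned list, while an event finished before its timeout does not.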

The overall process works as follows:

  1. After an ANR occurs, the system collects data from many processes and dumps their stacks to generate an ANR trace file. The first process collected is always the one where the ANR occurred.
  2. The system sends the SIGQUIT signal to these application processes, and each of them starts dumping its stack when it receives the signal.
  3. Once an application process has dumped its stack, it communicates with the system process through a socket to write the trace file.
  4. After the trace file is written, if the process where the ANR occurred is a foreground process, the ANR dialog pops up; otherwise, the process is killed directly.

How to monitor ANR?

With the ANR workflow in mind, how can we detect when an ANR occurs?

ANR WatchDog detection ideas

Since an ANR is caused by the main thread failing to respond within a certain time, a natural idea is to post a task to the main thread: if the task is not executed within a given interval, we assume an ANR has occurred.
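A minimal sketch of this check in plain Java: a tick task is posted to the "main thread", the watchdog sleeps for one interval, and the main thread is considered stalled if the tick still has not run. The main thread is modeled here as a java.util.concurrent.Executor; on Android it could be backed by a Handler on Looper.getMainLooper(). That mapping, and the class and method names, are assumptions of this sketch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;

/** App-side watchdog sketch: a tick that does not run within one interval means a stall. */
final class MainThreadWatchDog {

    /** One check cycle: post a tick, wait one full interval, report whether it ran. */
    static boolean isBlocked(Executor mainThread, long intervalMs) {
        CountDownLatch ran = new CountDownLatch(1);
        mainThread.execute(ran::countDown); // the "tick" posted to the main thread
        try {
            Thread.sleep(intervalMs);        // give the main thread one full interval
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;                    // shutting down: report nothing
        }
        return ran.getCount() > 0;           // tick never ran -> main thread stalled
    }

    /** Repeats check cycles on a daemon thread; calls onBlocked once per stalled cycle. */
    static Thread start(Executor mainThread, long intervalMs, Runnable onBlocked) {
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                if (isBlocked(mainThread, intervalMs)) {
                    onBlocked.run();
                }
            }
        }, "anr-watchdog");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

Each cycle always lasts one full interval, so the watchdog polls at a fixed period that is not synchronized with the moment the main thread starts blocking.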

This approach has two main problems:

  1. Inaccuracy: a stall that exceeds the timeout does not necessarily cause an ANR. For example, the 5-second timeout is only one of the conditions for an ANR caused by an unconsumed TouchEvent, and the other ANR types use timeouts other than 5 seconds.
  2. Missed detection: if the check interval is set to 5 seconds, some TouchEvent ANRs can be missed, because the watchdog's check cycle is not synchronized with the moment the main thread starts blocking.
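The missed-detection problem can be checked with a little arithmetic. The sketch below assumes the watchdog posts a tick at t = k * period and checks it at t = (k + 1) * period, that the main thread is otherwise idle, and that posting and checking take no time; the class name is hypothetical.

```java
/**
 * Worked example of missed detection under simplifying assumptions:
 * ticks are posted at k * period and checked at (k + 1) * period,
 * the main thread is otherwise idle, and all times are non-negative ms.
 */
final class WatchdogCoverage {

    /** True if a stall covering [blockStart, blockStart + blockDuration) gets reported. */
    static boolean detects(long blockStart, long blockDuration, long period) {
        // A tick posted before the stall begins runs immediately, so only ticks posted
        // at or after blockStart can get stuck. Find the first such post time (ceiling).
        long firstPost = ((blockStart + period - 1) / period) * period;
        // That tick is still pending at its check time only if the stall outlasts it.
        return firstPost + period < blockStart + blockDuration;
    }
}
```

With a 5-second period, a 5-second stall starting at t = 2 s goes unreported: the tick posted at t = 0 has already run, and the tick posted at t = 5 s runs when the stall ends at t = 7 s, before its check at t = 10 s. In the worst case, a stall lasting almost two periods is missed.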

ANR signal monitoring ideas

When describing the overall ANR process above, we saw that the SIGQUIT signal is sent when an ANR occurs. So can we implement ANR monitoring by listening for this signal? In fact, both xCrash and Matrix monitor ANRs this way.

Note that by default, the process's SignalCatcher listens for SIGQUIT and performs the stack dump that generates the ANR trace file. Therefore, after we intercept SIGQUIT ourselves, we need to re-send SIGQUIT to SignalCatcher.

If SIGQUIT is not re-sent to SignalCatcher, ActivityManagerService (AMS) keeps waiting for the ANR process to write its stack information until a 20-second timeout expires, after which AMS gives up and continues with the rest of the ANR flow. As a result, the ANR dialog appears very late (only after the 20-second timeout), and no complete ANR trace file is generated in the /data/anr directory.
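Real implementations perform this interception in native code (typically installing a handler with sigaction and re-sending the signal to the SignalCatcher thread with a thread-directed kill such as tgkill). A desktop JVM cannot model signals faithfully, so the sketch below only models the control flow, with hypothetical names: the monitor records the signal for its own ANR logic and then passes it on. With forwarding disabled, SignalCatcher never produces a trace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

/** Control-flow model only (not real signal handling): intercept SIGQUIT, then pass it on. */
final class SigquitForwardingModel {
    static final int SIGQUIT = 3; // signal number of SIGQUIT on Linux

    /** Stand-in for the runtime's SignalCatcher, which writes the ANR trace on SIGQUIT. */
    static final class SignalCatcher implements IntConsumer {
        final List<String> traces = new ArrayList<>();

        @Override
        public void accept(int signo) {
            if (signo == SIGQUIT) {
                traces.add("stack dump");
            }
        }
    }

    /** Our monitor: note the signal for ANR confirmation, then optionally forward it. */
    static IntConsumer monitor(IntConsumer signalCatcher, List<Integer> received, boolean forward) {
        return signo -> {
            received.add(signo);             // hand the event to our own ANR logic
            if (forward) {
                signalCatcher.accept(signo); // without this, no trace is written
            }
        };
    }
}
```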

Handling false positives

Receiving SIGQUIT does not necessarily mean that an ANR has occurred in our process.

Matrix's documentation mentions two cases of false positives:

  1. Another process has an ANR. The process where an ANR occurs is not the only one whose stack is dumped: the system also collects stack dumps from many other processes to generate the ANR trace file, so our process can receive SIGQUIT because of another app's ANR.
  2. SIGQUIT is sent by the device manufacturer or by developers themselves. Sending SIGQUIT is actually very easy.

Therefore, after receiving the signal we need one more check: before the ANR dialog is shown, the process where the ANR occurred is marked with the NOT_RESPONDING flag, and we can read this flag through ActivityManager.

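private static boolean checkErrorState() {
    try {
        Application application = sApplication == null ? Matrix.with().getApplication() : sApplication;
        ActivityManager am = (ActivityManager) application.getSystemService(Context.ACTIVITY_SERVICE);
        // The original excerpt ends above; the rest is a sketch based on the public ActivityManager API.
        List<ActivityManager.ProcessErrorStateInfo> procs = am.getProcessesInErrorState();
        if (procs == null) return false; // no process is currently in an error state
        for (ActivityManager.ProcessErrorStateInfo proc : procs) {
            // A real ANR: our own process is flagged NOT_RESPONDING
            if (proc.pid == android.os.Process.myPid()
                    && proc.condition == ActivityManager.ProcessErrorStateInfo.NOT_RESPONDING) {
                return true;
            }
        }
    } catch (Throwable t) {
        // Treat any failure as "ANR not confirmed"
    }
    return false;
}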
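Depending on timing, the NOT_RESPONDING flag may not be set yet at the moment SIGQUIT arrives, so a single check can miss a real ANR. One way to handle this is to poll the check for a bounded time. The sketch below is an assumption, not Matrix's code: isNotResponding stands in for a check such as checkErrorState(), and the class name, timeout and interval are hypothetical parameters you would tune.

```java
import java.util.function.BooleanSupplier;

/** Sketch: after SIGQUIT, poll a NOT_RESPONDING check until it passes or time runs out. */
final class AnrConfirmer {

    /** Returns true if the check passes before timeoutMs elapses, false otherwise. */
    static boolean confirm(BooleanSupplier isNotResponding, long timeoutMs, long intervalMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (isNotResponding.getAsBoolean()) {
                return true;                        // flag observed: treat as a real ANR
            }
            if (System.nanoTime() >= deadline) {
                return false;                       // never flagged: likely a false positive
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
    }
}
```

On Android, such a call would run on a background thread, for example confirm(() -> checkErrorState(), 20_000, 500); those numbers are illustrative only.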


Origin: blog.csdn.net/m0_70748458/article/details/130506156