[Android] Android ANR generation process and analysis method

Foreword

ANR problems on Android have always been hard to solve: they are difficult to reproduce, and even once reproduced they are not easy to analyze. This article walks through how an ANR is generated, where its log files come from, and how to use them to locate the cause. Online ANR monitoring is also tricky; after reading this article, on-device ANR monitoring solutions such as WeChat Matrix should be easier to follow.

When an ANR occurs, the system shows a dialog like this:
[Image: ANR dialog]

What is ANR

ANR (Application Not Responding) means that an application has failed to respond for too long, at which point a dialog pops up on screen (as shown above). It is not a RuntimeException and cannot be caught with try/catch. The dialog is raised by the System Server process, so the app process is not aware of it at the Java layer (it can be detected at the native layer, as analyzed below). When an ANR occurs, the System Server process prints a log to Logcat and writes more detailed logs as files under the /data/anr directory (generally called ANR trace files).

ANR causes

An ANR is generally caused by one of the following:

Service Timeout : a foreground service does not finish within 20s, or a background service within 200s;
BroadcastQueue Timeout : a foreground broadcast is not handled within 10s, or a background broadcast within 60s;
ContentProvider Timeout : a provider is not published within 10s;
InputDispatching Timeout : an input event (key or touch) is not handled within 5s.
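
For quick reference, the thresholds above can be written down as a small lookup type. This is only a sketch: the enum name AnrTimeout and its methods are made up for illustration, and the values are copied from the list above rather than read from a device or from the framework.

```java
import java.time.Duration;

/** ANR thresholds as listed in this article (hypothetical helper, not a framework class). */
enum AnrTimeout {
    SERVICE_FOREGROUND(Duration.ofSeconds(20)),
    SERVICE_BACKGROUND(Duration.ofSeconds(200)),
    BROADCAST_FOREGROUND(Duration.ofSeconds(10)),
    BROADCAST_BACKGROUND(Duration.ofSeconds(60)),
    CONTENT_PROVIDER(Duration.ofSeconds(10)),
    INPUT_DISPATCHING(Duration.ofSeconds(5));

    private final Duration limit;

    AnrTimeout(Duration limit) {
        this.limit = limit;
    }

    Duration limit() {
        return limit;
    }

    /** Returns true when the elapsed time has reached or passed the limit. */
    boolean isExceededBy(Duration elapsed) {
        return elapsed.compareTo(limit) >= 0;
    }
}
```

For example, AnrTimeout.INPUT_DISPATCHING.isExceededBy(Duration.ofSeconds(10)) is true, which is the situation reproduced later in this article.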

The fourth case, input event timeout (mainly touch events), is the most common. The timeout usually happens because the main thread is blocked, for example by a time-consuming task, a heavy computation, a deadlock, or a sleep.

ANR generation process

Service Timeout means that a Service lifecycle callback such as onCreate took too long. The following example shows how the onCreate callback of a Service produces an ANR.

onCreate is invoked as a result of startService, so the analysis starts from startService. The following code is based on Android SDK 29.
The call chain is as follows:

Context.startService
ContextImpl.startService
ActivityManagerService.startService
ActiveServices.startServiceLocked
ActiveServices.startServiceInnerLocked
ActiveServices.bringUpServiceLocked
ActiveServices.realStartServiceLocked

Then focus on the code of ActiveServices.realStartServiceLocked:

  private final void realStartServiceLocked(ServiceRecord r,
            ProcessRecord app, boolean execInFg) throws RemoteException {
        ...
        // This call posts a delayed message (20 seconds for a foreground service)
        bumpServiceExecutingLocked(r, execInFg, "create");
        ...
        try {
            ...
            // Ask the app process to create the Service; this is where onCreate gets called
            app.thread.scheduleCreateService(r, r.serviceInfo,
            ...
        } catch (DeadObjectException e) {
            ...

bumpServiceExecutingLocked schedules the delayed message:

 private final void bumpServiceExecutingLocked(ServiceRecord r, boolean fg, String why) {
   ...
   scheduleServiceTimeoutLocked(r.app);
   ...
}

Next, look at scheduleServiceTimeoutLocked:

    void scheduleServiceTimeoutLocked(ProcessRecord proc) {
        // Obtain the timeout message
        Message msg = mAm.mHandler.obtainMessage(
                ActivityManagerService.SERVICE_TIMEOUT_MSG);
        msg.obj = proc;
        // Post it with a delay: 20 seconds for a foreground service, 200 seconds for a background one
        mAm.mHandler.sendMessageDelayed(msg,
                proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
    }

The constants SERVICE_TIMEOUT and SERVICE_BACKGROUND_TIMEOUT are defined as follows:

    // Path: com.android.server.am.ActiveServices.java

    // How long we wait for a service to finish executing.
    static final int SERVICE_TIMEOUT = 20*1000;

    // How long we wait for a service to finish executing.
    static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;

So the ANR timeout is 20 seconds for a foreground service and ten times that, 200 seconds, for a background service.

Next, see how ActivityThread in the app process handles the request to create the Service:

  private void handleCreateService(CreateServiceData data) {
        Service service = null;
        try {
            java.lang.ClassLoader cl = packageInfo.getClassLoader();
            service = packageInfo.getAppFactory()
                    .instantiateService(cl, data.info.name, data.intent);
        } catch (Exception e) {
            ...
        }

        try {
            ...
            ContextImpl context = ContextImpl.createAppContext(this, packageInfo);
            context.setOuterContext(service);

            Application app = packageInfo.makeApplication(false, mInstrumentation);
            service.attach(context, this, data.info.name, data.token, app,
                    ActivityManager.getService());
            // Key point: invoke the onCreate lifecycle callback
            service.onCreate();
            mServices.put(data.token, service);
            try {
                // Key point: tell AMS the Service has been created; this clears the delayed message in the handler
                ActivityManager.getService().serviceDoneExecuting(
                        data.token, SERVICE_DONE_EXECUTING_ANON, 0, 0);
            } catch (RemoteException e) {
                throw e.rethrowFromSystemServer();
            }
            ...

After service.onCreate returns, AMS is notified that the Service has been created.

ActivityManagerService.serviceDoneExecuting calls into ActiveServices.serviceDoneExecutingLocked:

  private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying,boolean finishing) {
    ...
    // Remove the delayed message posted earlier
    mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
    ...
  }

So if the Service's onCreate callback completes within 20 seconds, the delayed message is removed from the handler and never runs.

If onCreate has not finished within 20 seconds, the delayed message posted earlier is delivered.

That message is the one that triggers ANR handling.
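
The arm-then-disarm pattern described above can be reproduced in plain Java. The following is a minimal sketch, not framework code: it swaps the Android Handler for a ScheduledExecutorService, and the class ServiceTimeoutSketch and its methods are hypothetical names chosen for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the AMS side: arms a timer, disarms it when the work finishes. */
final class ServiceTimeoutSketch {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private final AtomicBoolean anrReported = new AtomicBoolean(false);
    private ScheduledFuture<?> pendingTimeout;

    /** Plays the role of scheduleServiceTimeoutLocked: post a delayed "timeout" task. */
    synchronized void scheduleTimeout(long timeoutMs) {
        pendingTimeout = timer.schedule(() -> anrReported.set(true), timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Plays the role of serviceDoneExecutingLocked: remove the pending task if it has not run yet. */
    synchronized void doneExecuting() {
        if (pendingTimeout != null) {
            pendingTimeout.cancel(false);
            pendingTimeout = null;
        }
    }

    boolean anrReported() {
        return anrReported.get();
    }

    void shutdown() {
        timer.shutdownNow();
    }

    /** Runs "work" lasting workMs against a timeout of timeoutMs; returns whether the timeout fired. */
    static boolean simulate(long workMs, long timeoutMs) {
        ServiceTimeoutSketch sketch = new ServiceTimeoutSketch();
        try {
            sketch.scheduleTimeout(timeoutMs);
            try {
                Thread.sleep(workMs); // stands in for Service.onCreate()
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            sketch.doneExecuting();
            return sketch.anrReported();
        } finally {
            sketch.shutdown();
        }
    }
}
```

With short work and a long timeout, for example simulate(20, 500), the timeout task is cancelled before it runs; with simulate(300, 50) it fires first, which corresponds to the ANR path.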

The delayed message SERVICE_TIMEOUT_MSG is handled in MainHandler as follows:

//com.android.server.am.ActivityManagerService
final class MainHandler extends Handler {
        public MainHandler(Looper looper) {
            super(looper, null, true);
        }

        @Override
        public void handleMessage(Message msg) {
            switch (msg.what) {
            ...
            case SERVICE_TIMEOUT_MSG: {
                mServices.serviceTimeout((ProcessRecord)msg.obj);
            ...
            }
            ...
            }
        }
}

mServices.serviceTimeout is implemented as follows:

  void serviceTimeout(ProcessRecord proc) {
      ...
      proc.appNotResponding(null, null, null, null, false, 
      ...
  }

So when a Service ANR occurs, execution reaches ProcessRecord.appNotResponding.

Other types of ANR also end up in ProcessRecord.appNotResponding, for example the input event timeout:

//com.android.server.am.ActivityManagerService
   /**
     * Handle input dispatching timeouts.
     * @return whether input dispatching should be aborted or not.
     */
    boolean inputDispatchingTimedOut(ProcessRecord proc, String activityShortComponentName,
            ApplicationInfo aInfo, String parentShortComponentName,
            WindowProcessController parentProcess, boolean aboveSystem, String reason) {
        if (checkCallingPermission(FILTER_EVENTS) != PackageManager.PERMISSION_GRANTED) {
            throw new SecurityException("Requires permission " + FILTER_EVENTS);
        }

        final String annotation;
        if (reason == null) {
            annotation = "Input dispatching timed out";
        } else {
            annotation = "Input dispatching timed out (" + reason + ")";
        }

        if (proc != null) {
            synchronized (this) {
                if (proc.isDebugging()) {
                    return false;
                }

                if (proc.getActiveInstrumentation() != null) {
                    Bundle info = new Bundle();
                    info.putString("shortMsg", "keyDispatchingTimedOut");
                    info.putString("longMsg", annotation);
                    finishInstrumentationLocked(proc, Activity.RESULT_CANCELED, info);
                    return true;
                }
            }
            // An input event timeout also ends up in ProcessRecord.appNotResponding
            proc.appNotResponding(activityShortComponentName, aInfo,
                    parentShortComponentName, parentProcess, aboveSystem, annotation);
        }

        return true;
    }

The timeout handling flows for input events, broadcasts, and providers are not analyzed one by one here.

All roads lead to ProcessRecord.appNotResponding: every type of ANR eventually ends up in this function.

Handling ANRs

ANR handling consists of the following steps:

Collect the ids of the processes whose stacks need to be dumped
Notify each of these processes to dump its thread stacks; the output goes to the /data/anr directory
Print the Logcat log
Show the ANR dialog for a foreground process; a background process gets no dialog

The detailed process is as follows:

//com.android.server.am.ProcessRecord
   void appNotResponding(String activityShortComponentName, ApplicationInfo aInfo,
          String parentShortComponentName, WindowProcessController parentProcess,
          boolean aboveSystem, String annotation) {
      // Collect the ids of the processes whose stacks will be dumped: firstPids, lastPids and nativeProcs
      ArrayList<Integer> firstPids = new ArrayList<>(5);
      SparseArray<Boolean> lastPids = new SparseArray<>(20);

      synchronized (mService) {
          ...
          // In case we come through here for the same app before completing
          // this one, mark as anring now so we will bail out.
          setNotResponding(true);

          // Dump thread traces as quickly as we can, starting with "interesting" processes.
          firstPids.add(pid);

          // Don't dump other PIDs if it's a background ANR
          if (!isSilentAnr()) {
              int parentPid = pid;
              if (parentProcess != null && parentProcess.getPid() > 0) {
                  parentPid = parentProcess.getPid();
              }
              if (parentPid != pid) firstPids.add(parentPid);

              if (MY_PID != pid && MY_PID != parentPid) firstPids.add(MY_PID);

              for (int i = getLruProcessList().size() - 1; i >= 0; i--) {
                  ProcessRecord r = getLruProcessList().get(i);
                  if (r != null && r.thread != null) {
                      int myPid = r.pid;
                      if (myPid > 0 && myPid != pid && myPid != parentPid && myPid != MY_PID) {
                          if (r.isPersistent()) {
                              firstPids.add(myPid);
                              if (DEBUG_ANR) Slog.i(TAG, "Adding persistent proc: " + r);
                          } else if (r.treatLikeActivity) {
                              firstPids.add(myPid);
                              if (DEBUG_ANR) Slog.i(TAG, "Adding likely IME: " + r);
                          } else {
                              lastPids.put(myPid, Boolean.TRUE);
                              if (DEBUG_ANR) Slog.i(TAG, "Adding ANR proc: " + r);
                          }
                      }
                  }
              }
          }
      }
      // Start assembling the Logcat log
      // Log the ANR to the main log.
      StringBuilder info = new StringBuilder();
      info.setLength(0);
      info.append("ANR in ").append(processName);
      if (activityShortComponentName != null) {
          info.append(" (").append(activityShortComponentName).append(")");
      }
      info.append("\n");
      info.append("PID: ").append(pid).append("\n");
      if (annotation != null) {
          info.append("Reason: ").append(annotation).append("\n");
      }
      if (parentShortComponentName != null
              && parentShortComponentName.equals(activityShortComponentName)) {
          info.append("Parent: ").append(parentShortComponentName).append("\n");
      }

      ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);

      // Collect the ids of the native processes to dump
      // don't dump native PIDs for background ANRs unless it is the process of interest
      String[] nativeProcs = null;
      if (isSilentAnr()) {
          for (int i = 0; i < NATIVE_STACKS_OF_INTEREST.length; i++) {
              if (NATIVE_STACKS_OF_INTEREST[i].equals(processName)) {
                  nativeProcs = new String[] { processName };
                  break;
              }
          }
      } else {
          nativeProcs = NATIVE_STACKS_OF_INTEREST;
      }

      int[] pids = nativeProcs == null ? null : Process.getPidsForCommands(nativeProcs);
      ArrayList<Integer> nativePids = null;

      if (pids != null) {
          nativePids = new ArrayList<>(pids.length);
          for (int i : pids) {
              nativePids.add(i);
          }
      }
      // Key point: start dumping the stacks
      // For background ANRs, don't pass the ProcessCpuTracker to
      // avoid spending 1/2 second collecting stats to rank lastPids.
      File tracesFile = ActivityManagerService.dumpStackTraces(firstPids,
              (isSilentAnr()) ? null : processCpuTracker, (isSilentAnr()) ? null : lastPids,
              nativePids);

      String cpuInfo = null;
      if (isMonitorCpuUsage()) {
          mService.updateCpuStatsNow();
          synchronized (mService.mProcessCpuTracker) {
              cpuInfo = mService.mProcessCpuTracker.printCurrentState(anrTime);
          }
          info.append(processCpuTracker.printCurrentLoad());
          info.append(cpuInfo);
      }

      info.append(processCpuTracker.printCurrentState(anrTime));

      // Write the log to Logcat
      Slog.e(TAG, info.toString());
      if (tracesFile == null) {
          // There is no trace file, so dump (only) the alleged culprit's threads to the log
          Process.sendSignal(pid, Process.SIGNAL_QUIT);
      }

      synchronized (mService) {
          ...
          // A background process is killed directly, with no ANR dialog
          if (isSilentAnr() && !isDebugging()) {
              kill("bg anr", true);
              return;
          }
          // Mark the app process as not responding
          // Set the app's notResponding state, and look up the errorReportReceiver
          makeAppNotRespondingLocked(activityShortComponentName,
                  annotation != null ? "ANR " + annotation : "ANR", info.toString());

          // mUiHandler can be null if the AMS is constructed with injector only. This will only
          // happen in tests.
          // Show the ANR dialog
          if (mService.mUiHandler != null) {
              // Bring up the infamous App Not Responding dialog
              Message msg = Message.obtain();
              msg.what = ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG;
              msg.obj = new AppNotRespondingDialog.Data(this, aInfo, aboveSystem);

              mService.mUiHandler.sendMessage(msg);
          }
      }
  }

Next, see how ActivityManagerService dumps the stacks:

  File tracesFile = ActivityManagerService.dumpStackTraces(firstPids,
                (isSilentAnr()) ? null : processCpuTracker, (isSilentAnr()) ? null : lastPids,
                nativePids);

The ActivityManagerService.dumpStackTraces function:

//com.android.server.am.ActivityManagerService
 public static File dumpStackTraces(ArrayList<Integer> firstPids,
           ProcessCpuTracker processCpuTracker, SparseArray<Boolean> lastPids,
           ArrayList<Integer> nativePids) {
       ArrayList<Integer> extraPids = null;

       Slog.i(TAG, "dumpStackTraces pids=" + lastPids + " nativepids=" + nativePids);

       // Measure CPU usage as soon as we're called in order to get a realistic sampling
       // of the top users at the time of the request.
       if (processCpuTracker != null) {
           processCpuTracker.init();
           try {
               Thread.sleep(200);
           } catch (InterruptedException ignored) {
           }

           processCpuTracker.update();
           ...
       // Create the ANR output file: ANR_TRACE_DIR = "/data/anr";
       final File tracesDir = new File(ANR_TRACE_DIR);
       // Each set of ANR traces is written to a separate file and dumpstate will process
       // all such files and add them to a captured bug report if they're recent enough.
       maybePruneOldTraces(tracesDir);

       // NOTE: We should consider creating the file in native code atomically once we've
       // gotten rid of the old scheme of dumping and lot of the code that deals with paths
       // can be removed.
       File tracesFile = createAnrDumpFile(tracesDir);
       if (tracesFile == null) {
           return null;
       }
       // The file has been created; start dumping
       dumpStackTraces(tracesFile.getAbsolutePath(), firstPids, nativePids, extraPids);
       return tracesFile;
   }

The overload of ActivityManagerService.dumpStackTraces that performs the actual dumping:

 //com.android.server.am.ActivityManagerService
 public static void dumpStackTraces(String tracesFile, ArrayList<Integer> firstPids,
            ArrayList<Integer> nativePids, ArrayList<Integer> extraPids) {

        Slog.i(TAG, "Dumping to " + tracesFile);

        // We don't need any sort of inotify based monitoring when we're dumping traces via
        // tombstoned. Data is piped to an "intercept" FD installed in tombstoned so we're in full
        // control of all writes to the file in question.

        // We must complete all stack dumps within 20 seconds.
        long remainingTime = 20 * 1000;

        // First collect all of the stacks of the most important pids.
        if (firstPids != null) {
            int num = firstPids.size();
            for (int i = 0; i < num; i++) {
                Slog.i(TAG, "Collecting stacks for pid " + firstPids.get(i));
                final long timeTaken = dumpJavaTracesTombstoned(firstPids.get(i), tracesFile,
                                                                remainingTime);

                remainingTime -= timeTaken;
                if (remainingTime <= 0) {
                    Slog.e(TAG, "Aborting stack trace dump (current firstPid=" + firstPids.get(i) +
                           "); deadline exceeded.");
                    return;
                }

                if (DEBUG_ANR) {
                    Slog.d(TAG, "Done with pid " + firstPids.get(i) + " in " + timeTaken + "ms");
                }
            }
        }

        // Next collect the stacks of the native pids
        if (nativePids != null) {
            for (int pid : nativePids) {
                Slog.i(TAG, "Collecting stacks for native pid " + pid);
                final long nativeDumpTimeoutMs = Math.min(NATIVE_DUMP_TIMEOUT_MS, remainingTime);

                final long start = SystemClock.elapsedRealtime();
                Debug.dumpNativeBacktraceToFileTimeout(
                        pid, tracesFile, (int) (nativeDumpTimeoutMs / 1000));
                final long timeTaken = SystemClock.elapsedRealtime() - start;

                remainingTime -= timeTaken;
                if (remainingTime <= 0) {
                    Slog.e(TAG, "Aborting stack trace dump (current native pid=" + pid +
                        "); deadline exceeded.");
                    return;
                }

                if (DEBUG_ANR) {
                    Slog.d(TAG, "Done with native pid " + pid + " in " + timeTaken + "ms");
                }
            }
        }

        // Lastly, dump stacks for all extra PIDs from the CPU tracker.
        if (extraPids != null) {
            for (int pid : extraPids) {
                Slog.i(TAG, "Collecting stacks for extra pid " + pid);

                final long timeTaken = dumpJavaTracesTombstoned(pid, tracesFile, remainingTime);

                remainingTime -= timeTaken;
                if (remainingTime <= 0) {
                    Slog.e(TAG, "Aborting stack trace dump (current extra pid=" + pid +
                            "); deadline exceeded.");
                    return;
                }

                if (DEBUG_ANR) {
                    Slog.d(TAG, "Done with extra pid " + pid + " in " + timeTaken + "ms");
                }
            }
        }
        Slog.i(TAG, "Done dumping");
    }
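
The function above shares one 20-second budget across all dumps: each dump subtracts the time it took, and the remaining pids are skipped once the budget is used up. The following plain-Java sketch isolates that bookkeeping. DumpBudgetSketch and its Dumper interface are hypothetical names, and the dumper here is whatever callback the caller supplies, not the real tombstoned-based dump.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the shared time budget used while dumping stacks. */
final class DumpBudgetSketch {
    /** Dumps one pid given the remaining budget and returns the time it took, both in ms. */
    interface Dumper {
        long dump(int pid, long remainingMs);
    }

    /**
     * Dumps pids in order while the shared budget lasts.
     * Returns the pids that were attempted; later pids are skipped once the budget is spent.
     */
    static List<Integer> dumpWithinBudget(List<Integer> pids, long budgetMs, Dumper dumper) {
        List<Integer> attempted = new ArrayList<>();
        long remaining = budgetMs;
        for (int pid : pids) {
            attempted.add(pid);
            remaining -= dumper.dump(pid, remaining);
            if (remaining <= 0) {
                break; // deadline exceeded: abort, as the framework code does with "return"
            }
        }
        return attempted;
    }
}
```

With a 20000 ms budget and a dumper that always takes 12000 ms, only the first two of three pids are attempted. This is one reason a trace file may not contain every process that was queued for dumping.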

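As shown, dumping traces uses two functions, dumpJavaTracesTombstoned and Debug.dumpNativeBacktraceToFileTimeout, for the Java layer and the native layer respectively. The native layer is handled by calling the android.os.Debug class directly, while the Java layer goes through dumpJavaTracesTombstoned. Start with the Java layer.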

ActivityManagerService.dumpJavaTracesTombstoned:

 /**
     * Dump java traces for process {@code pid} to the specified file. If java trace dumping
     * fails, a native backtrace is attempted. Note that the timeout {@code timeoutMs} only applies
     * to the java section of the trace, a further {@code NATIVE_DUMP_TIMEOUT_MS} might be spent
     * attempting to obtain native traces in the case of a failure. Returns the total time spent
     * capturing traces.
     */
    private static long dumpJavaTracesTombstoned(int pid, String fileName, long timeoutMs) {
        final long timeStart = SystemClock.elapsedRealtime();
        boolean javaSuccess = Debug.dumpJavaBacktraceToFileTimeout(pid, fileName,
                (int) (timeoutMs / 1000));
        if (javaSuccess) {
            // Check that something is in the file, actually. Try-catch should not be necessary,
            // but better safe than sorry.
            try {
                long size = new File(fileName).length();
                if (size < JAVA_DUMP_MINIMUM_SIZE) {
                    Slog.w(TAG, "Successfully created Java ANR file is empty!");
                    javaSuccess = false;
                }
            } catch (Exception e) {
                Slog.w(TAG, "Unable to get ANR file size", e);
                javaSuccess = false;
            }
        }
        if (!javaSuccess) {
            Slog.w(TAG, "Dumping Java threads failed, initiating native stack dump.");
            if (!Debug.dumpNativeBacktraceToFileTimeout(pid, fileName,
                    (NATIVE_DUMP_TIMEOUT_MS / 1000))) {
                Slog.w(TAG, "Native stack dump failed!");
            }
        }

        return SystemClock.elapsedRealtime() - timeStart;
    }

This in turn calls Debug.dumpJavaBacktraceToFileTimeout to perform the dump.

Here is the Debug class:

//android.os.Debug
  /**
     * Append the Java stack traces of a given native process to a specified file.
     *
     * @param pid pid to dump.
     * @param file path of file to append dump to.
     * @param timeoutSecs time to wait in seconds, or 0 to wait forever.
     * @hide
     */
    public static native boolean dumpJavaBacktraceToFileTimeout(int pid, String file,
                                                                int timeoutSecs);

    /**
     * Append the native stack traces of a given process to a specified file.
     *
     * @param pid pid to dump.
     * @param file path of file to append dump to.
     * @param timeoutSecs time to wait in seconds, or 0 to wait forever.
     * @hide
     */
    public static native boolean dumpNativeBacktraceToFileTimeout(int pid, String file,
                                                                  int timeoutSecs);

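So dumping traces ultimately comes down to these two functions of the android.os.Debug class:
dumpJavaBacktraceToFileTimeout and dumpNativeBacktraceToFileTimeout.

Both methods are declared native, so we need to look at the Android platform source.

Note that both methods are marked @hide, so they cannot be called from the app side.

How the native layer dumps traces

Searching the Android source for the C++ code behind dumpJavaBacktraceToFileTimeout leads to frameworks/base/core/jni/android_os_Debug.cpp, which contains the implementation: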

frameworks/base/core/jni/android_os_Debug.cpp

static jboolean android_os_Debug_dumpJavaBacktraceToFileTimeout(JNIEnv* env, jobject clazz,
        jint pid, jstring fileName, jint timeoutSecs) {
    const bool ret = dumpTraces(env, pid, fileName, timeoutSecs, kDebuggerdJavaBacktrace);
    return ret ? JNI_TRUE : JNI_FALSE;
}

Following the call leads to the debuggerd_trigger_dump function in system/core/debuggerd/client/debuggerd_client.cpp:

bool debuggerd_trigger_dump(pid_t tid, DebuggerdDumpType dump_type, unsigned int timeout_ms,
                            unique_fd output_fd) {
  ...
  // Send the signal.
  const int signal = (dump_type == kDebuggerdJavaBacktrace) ? SIGQUIT : BIONIC_SIGNAL_DEBUGGER;
  sigval val = {.sival_int = (dump_type == kDebuggerdNativeBacktrace) ? 1 : 0};
  if (sigqueue(pid, signal, val) != 0) {
    log_error(output_fd, errno, "failed to send signal to pid %d", pid);
    return false;
  }
  ...
}

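For a Java backtrace, this function uses sigqueue (bionic/libc/bionic/signal.cpp) to send a SIGQUIT signal to the target process.

Next, look at where SIGQUIT is received.

Every app process has a SignalCatcher thread dedicated to handling SIGQUIT; see art/runtime/signal_catcher.cc: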

void* SignalCatcher::Run(void* arg) {
  SignalCatcher* signal_catcher = reinterpret_cast<SignalCatcher*>(arg);
  ...
  // Set up mask with signals we want to handle.
  SignalSet signals;
  signals.Add(SIGQUIT);
  signals.Add(SIGUSR1);

  while (true) {
    int signal_number = signal_catcher->WaitForSignal(self, signals);
    if (signal_catcher->ShouldHalt()) {
      runtime->DetachCurrentThread();
      return nullptr;
    }

    switch (signal_number) {
    case SIGQUIT:
      signal_catcher->HandleSigQuit();
      break;
    case SIGUSR1:
      signal_catcher->HandleSigUsr1();
      break;
    default:
      LOG(ERROR) << "Unexpected signal %d" << signal_number;
      break;
    }
  }
}

Once SIGQUIT is caught, it is handed to HandleSigQuit:

void SignalCatcher::HandleSigQuit() {
  Runtime* runtime = Runtime::Current();
  std::ostringstream os;
  os << "\n"
      << "----- pid " << getpid() << " at " << GetIsoDate() << " -----\n";

  DumpCmdLine(os);

  // Note: The strings "Build fingerprint:" and "ABI:" are chosen to match the format used by
  // debuggerd. This allows, for example, the stack tool to work.
  std::string fingerprint = runtime->GetFingerprint();
  os << "Build fingerprint: '" << (fingerprint.empty() ? "unknown" : fingerprint) << "'\n";
  os << "ABI: '" << GetInstructionSetString(runtime->GetInstructionSet()) << "'\n";

  os << "Build type: " << (kIsDebugBuild ? "debug" : "optimized") << "\n";

  runtime->DumpForSigQuit(os);

  if ((false)) {
    std::string maps;
    if (android::base::ReadFileToString("/proc/self/maps", &maps)) {
      os << "/proc/self/maps:\n" << maps;
    }
  }
  os << "----- end " << getpid() << " -----\n";
  Output(os.str());
}

Along the way it calls DumpForSigQuit in art/runtime/runtime.cc, which collects more detailed information, including thread stacks.

void Runtime::DumpForSigQuit(std::ostream& os) {
  // Print backtraces first since they are important do diagnose ANRs,
  // and ANRs can often be trimmed to limit upload size.
  thread_list_->DumpForSigQuit(os);
  GetClassLinker()->DumpForSigQuit(os);
  GetInternTable()->DumpForSigQuit(os);
  GetJavaVM()->DumpForSigQuit(os);
  GetHeap()->DumpForSigQuit(os);
  oat_file_manager_->DumpForSigQuit(os);
  if (GetJit() != nullptr) {
    GetJit()->DumpForSigQuit(os);
  } else {
    os << "Running non JIT\n";
  }
  DumpDeoptimizations(os);
  TrackedAllocators::Dump(os);
  GetMetrics()->DumpForSigQuit(os);
  os << "\n";

  BaseMutex::DumpAll(os);

  // Inform anyone else who is interested in SigQuit.
  {
    ScopedObjectAccess soa(Thread::Current());
    callbacks_->SigQuit();
  }
}

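An ANR dump contains a lot of information; see the relevant source code for details.

This completes the analysis of the whole flow, from the occurrence of an ANR to the printing of its logs.

ANR analysis method

Now that we know how an ANR comes about, let's look at how to locate its cause once it has happened.
As described above, an ANR produces logs in two places: Logcat and the trace files under /data/anr/.

Two scenarios are simulated below to reproduce an ANR: one caused by a time-consuming operation and one caused by a deadlock.

Scenario 1: ANR caused by a time-consuming operation

For convenience, the main thread simply sleeps for 10s.

The Activity has a button; clicking it puts the main thread to sleep for 10s, as in the code below, which obviously causes an ANR.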

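import android.os.Bundle
import android.os.SystemClock
import android.widget.Button
import androidx.appcompat.app.AppCompatActivity

class AnrTestActivity : AppCompatActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_anr_test)
        findViewById<Button>(R.id.button).setOnClickListener {
            // Block the main thread for 10s to provoke an ANR
            SystemClock.sleep(10000)
        }
    }
}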

Click the button twice in a row; about 5s later the ANR dialog appears.
[Image: ANR dialog]
Logcat prints the following log:

2022-10-02 15:38:00.505 594-5381/system_process E/ActivityManager: ANR in com.devnn.demo (com.devnn.demo/.AnrTestActivity)
    PID: 5232
    Reason: Input dispatching timed out (f99e8bb com.devnn.demo/com.devnn.demo.AnrTestActivity (server) is not responding. Waited 5008ms for MotionEvent(deviceId=8, source=0x00005002, displayId=0, action=DOWN, actionButton=0x00000000, flags=0x00000000, metaState=0x00000000, buttonState=0x00000000, classification=NONE, edgeFlags=0x00000000, xPrecision=22.8, yPrecision=12.8, xCursorPosition=nan, yCursorPosition=nan, pointers=[0: (804.9, 1173.9)]), policyFlags=0x62000000)
    Parent: com.devnn.demo/.AnrTestActivity
    Load: 0.05 / 0.01 / 0.0
    ----- Output from /proc/pressure/memory -----
    some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    full avg10=0.00 avg60=0.00 avg300=0.00 total=0
    ----- End output from /proc/pressure/memory -----
    
    CPU usage from 158257ms to 0ms ago (2022-10-02 15:35:18.256 to 2022-10-02 15:37:56.513):
      6.2% 279/[email protected]: 0.3% user + 5.9% kernel
      2.2% 292/[email protected]: 0% user + 2.1% kernel
      1.6% 594/system_server: 0.3% user + 1.3% kernel / faults: 1085 minor
      1.4% 300/[email protected]: 0% user + 1.4% kernel
      0.4% 277/android.hardware.audio.service.ranchu: 0% user + 0.4% kernel / faults: 10 minor
      0.2% 371/audioserver: 0% user + 0.2% kernel / faults: 4 minor
      0.2% 5232/com.devnn.demo: 0% user + 0.2% kernel / faults: 272 minor
      0.2% 318/surfaceflinger: 0% user + 0.2% kernel
      0% 16/ksoftirqd/1: 0% user + 0% kernel
      0% 365/adbd: 0% user + 0% kernel
      0% 477/llkd: 0% user + 0% kernel
      0% 872/[email protected]: 0% user + 0% kernel
      0% 10/rcu_preempt: 0% user + 0% kernel
      0% 2014/com.android.systemui: 0% user + 0% kernel / faults: 39 minor
      0% 9/ksoftirqd/0: 0% user + 0% kernel
      0% 1002/com.android.phone: 0% user + 0% kernel / faults: 100 minor
      0% 3645/kworker/0:2-events_power_efficient: 0% user + 0% kernel
      0% 157/logd: 0% user + 0% kernel
      0% 427/libgoldfish-rild: 0% user + 0% kernel / faults: 16 minor
      0% 3270/kworker/1:1-mm_percpu_wq: 0% user + 0% kernel
      0% 159/servicemanager: 0% user + 0% kernel
      0% 160/hwservicemanager: 0% user + 0% kernel
      0% 478/hostapd_nohidl: 0% user + 0% kernel
      0% 5346/kworker/u4:0-events_unbound: 0% user + 0% kernel
      0% 11/migration/0: 0% user + 0% kernel
      0% 15/migration/1: 0% user + 0% kernel
      0% 164/qemu-props: 0% user + 0% kernel
      0% 188/jbd2/dm-5-8: 0% user + 0% kernel
      0% 269/statsd: 0% user + 0% kernel
      0% 342/logcat: 0% user + 0% kernel
      0% 418/media.metrics: 0% user + 0% kernel / faults: 1 minor
      0% 442/[email protected]: 0% user + 0% kernel
      0% 761/wpa_supplicant: 0% user + 0% kernel
      0% 3615/logcat: 0% user + 0% kernel
      0% 5068/kworker/u4:1-phy0: 0% user + 0% kernel
    1.9% TOTAL: 0.1% user + 1.7% kernel + 0% softirq
    CPU usage from 20ms to 335ms later (2022-10-02 15:37:56.533 to 2022-10-02 15:37:56.848):
      22% 594/system_server: 15% user + 7.5% kernel / faults: 161 minor
        22% 5381/AnrConsumer: 7.5% user + 15% kernel
      6.9% 279/[email protected]: 0% user + 6.9% kernel
        6.9% 1215/[email protected]: 0% user + 6.9% kernel
      3.5% 292/[email protected]: 0% user + 3.5% kernel
    18% TOTAL: 8.6% user + 10% kernel

Note that you need to select the system_process process.

The Logcat log shows that the process with id=5232 timed out while handling an input event. This log is printed by the ProcessRecord.appNotResponding method analyzed above.

Next, let's look at what the log under the /data/anr/ directory contains.

Each trace file is the log of a single ANR. Every ANR generates a new trace file, and the file is named after the time it was created.
The trace file is structured: overall, it is printed process by process.

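An ANR is not necessarily caused by the app process itself; it may be caused by another related process. For that reason, the information for all related processes is printed into the same file, basically in the following structure.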

----- pid 5232 at 2022-10-02 15:37:56 -----
detailed log of process 5232
----- end 5232 -----

----- pid 594 at 2022-10-02 15:37:57 -----
detailed log of process 594
----- end 594 -----

----- pid xxx at xxxx-xx-xx xx:xx:xx -----
detailed log of process xxx
----- end xxx -----

The first process is the one in which the ANR occurred, usually the app process.
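The per-process structure above lends itself to simple tooling. The following is a minimal sketch, not part of the original analysis, of how a trace file could be split into per-process sections. The class name `TraceSections` and the sample lines in `main` are made up for illustration; only the `----- pid N at ... -----` and `----- end N -----` markers come from the trace format shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: groups the lines of an ANR trace file by process id.
class TraceSections {
    // Section header, e.g. "----- pid 5232 at 2022-10-02 15:37:56 -----".
    private static final Pattern START =
            Pattern.compile("^----- pid (\\d+) at (.+) -----$");
    // Section footer, e.g. "----- end 5232 -----".
    private static final Pattern END =
            Pattern.compile("^----- end (\\d+) -----$");

    /** Returns pid -> section lines, in the order the sections appear in the file. */
    static Map<Integer, List<String>> split(List<String> lines) {
        Map<Integer, List<String>> sections = new LinkedHashMap<>();
        List<String> current = null;
        for (String line : lines) {
            Matcher start = START.matcher(line);
            if (start.matches()) {
                current = new ArrayList<>();
                sections.put(Integer.parseInt(start.group(1)), current);
            } else if (END.matcher(line).matches()) {
                current = null;        // lines between sections are skipped
            } else if (current != null) {
                current.add(line);
            }
        }
        return sections;
    }

    public static void main(String[] args) {
        // Abridged, made-up sample that follows the structure described above.
        List<String> sample = Arrays.asList(
                "----- pid 5232 at 2022-10-02 15:37:56 -----",
                "Cmd line: com.devnn.demo",
                "----- end 5232 -----",
                "",
                "----- pid 594 at 2022-10-02 15:37:57 -----",
                "Cmd line: system_server",
                "----- end 594 -----");
        // The first key is the process in which the ANR occurred.
        System.out.println(split(sample).keySet()); // prints [5232, 594]
    }
}
```

Because the map keeps insertion order, the first entry corresponds to the process in which the ANR occurred.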

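The whole trace file is more than 700 KB, which is too long to show in full, so only the main information for the app process is excerpted below.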

Each process section starts with summary information, including the process id, the time the ANR occurred, and the process name.

----- pid 5232 at 2022-10-02 15:37:56 -----
Cmd line: com.devnn.demo
Build fingerprint: 'Android/sdk_phone_x86_64/generic_x86_64:11/RSR1.210722.012/7758210:userdebug/test-keys'
ABI: 'x86_64'
Build type: optimized
Zygote loaded classes=15740 post zygote classes=1289
Dumping registered class loaders
#0 dalvik.system.PathClassLoader: [], parent #1
#1 java.lang.BootClassLoader: [], no parent
#2 dalvik.system.PathClassLoader: [/data/app/~~Qnj80NrB3yjtX87JepktGQ==/com.devnn.demo-FWP2tIJA7Ec1qoJefwnc0A==/base.apk:/data/app/~~Qnj80NrB3yjtX87JepktGQ==/com.devnn.demo-FWP2tIJA7Ec1qoJefwnc0A==/base.apk!classes10.dex:/data/app/~~Qnj80NrB3yjtX87JepktGQ==/com.devnn.demo-FWP2tIJA7Ec1qoJefwnc0A==/base.apk!classes11.dex:/data/app/~~Qnj80NrB3yjtX87JepktGQ==/com.devnn.demo-FWP2tIJA7Ec1qoJefwnc0A==/base.apk!classes6.dex:/data/app/~~Qnj80NrB3yjtX87JepktGQ==/com.devnn.demo-FWP2tIJA7Ec1qoJefwnc0A==/base.apk!classes2.dex:/data/app/~~Qnj80NrB3yjtX87JepktGQ==/com.devnn.demo-FWP2tIJA7Ec1qoJefwnc0A==/base.apk!classes3.dex:/data/app/~~Qnj80NrB3yjtX87JepktGQ==/com.devnn.demo-FWP2tIJA7Ec1qoJefwnc0A==/base.apk!classes8.dex], parent #1
Done dumping class loaders
Classes initialized: 526 in 694.025ms
Intern table: 31792 strong; 523 weak
JNI: CheckJNI is on; globals=639 (plus 37 weak)
Libraries: libandroid.so libaudioeffect_jni.so libcompiler_rt.so libicu_jni.so libjavacore.so libjavacrypto.so libjnigraphics.so libmedia_jni.so libopenjdk.so librs_jni.so libsfplugin_ccodec.so libsoundpool.so libstats_jni.so libwebviewchromium_loader.so (14)
Heap: 46% free, 11MB/21MB; 75810 objects
//some content omitted here

The second part lists the state and stack of every thread in the process, and this is the part to focus on:


suspend all histogram:	Sum: 74.854ms 99% C.I. 0.005ms-43.315ms Avg: 3.742ms Max: 44.394ms
DALVIK THREADS (21):
"Signal Catcher" daemon prio=10 tid=4 Runnable
  | group="system" sCount=0 dsCount=0 flags=0 obj=0x12c40b10 self=0x7fada5a4af50
  | sysTid=5242 nice=-20 cgrp=top-app sched=0/0 handle=0x7fac275adcf0
  | state=R schedstat=( 21716542 2041235 2 ) utm=0 stm=2 core=0 HZ=100
  | stack=0x7fac274b6000-0x7fac274b8000 stackSize=995KB
  | held mutexes= "mutator lock"(shared held)
  native: #00 pc 000000000054da9e  /apex/com.android.art/lib64/libart.so (art::DumpNativeStack(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, int, BacktraceMap*, char const*, art::ArtMethod*, void*, bool)+126)
  native: #01 pc 000000000069615c  /apex/com.android.art/lib64/libart.so (art::Thread::DumpStack(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, bool, BacktraceMap*, bool) const+380)
  native: #02 pc 00000000006b7320  /apex/com.android.art/lib64/libart.so (art::DumpCheckpoint::Run(art::Thread*)+1088)
  native: #03 pc 00000000006b064d  /apex/com.android.art/lib64/libart.so (art::ThreadList::RunCheckpoint(art::Closure*, art::Closure*)+557)
  native: #04 pc 00000000006af729  /apex/com.android.art/lib64/libart.so (art::ThreadList::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, bool)+1817)
  native: #05 pc 00000000006aec28  /apex/com.android.art/lib64/libart.so (art::ThreadList::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char> >&)+824)
  native: #06 pc 00000000006470d9  /apex/com.android.art/lib64/libart.so (art::Runtime::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char> >&)+201)
  native: #07 pc 000000000065ceb6  /apex/com.android.art/lib64/libart.so (art::SignalCatcher::HandleSigQuit()+1766)
  native: #08 pc 000000000065bc85  /apex/com.android.art/lib64/libart.so (art::SignalCatcher::Run(void*)+357)
  native: #09 pc 00000000000c7d2a  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+58)
  native: #10 pc 000000000005f0c7  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+55)
  (no managed stack frames)

"main" prio=5 tid=1 Sleeping
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x71fb36a8 self=0x7fada5a477b0
  | sysTid=5232 nice=-10 cgrp=top-app sched=0/0 handle=0x7faecb97d4f8
  | state=S schedstat=( 5775317077 4230577099 871 ) utm=286 stm=291 core=0 HZ=100
  | stack=0x7ffc29566000-0x7ffc29568000 stackSize=8192KB
  | held mutexes=
  at java.lang.Thread.sleep(Native method)
  - sleeping on <0x06059c02> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:442)
  - locked <0x06059c02> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:358)
  at android.os.SystemClock.sleep(SystemClock.java:131)
  at com.devnn.demo.AnrTestActivity.onCreate$lambda-0(AnrTestActivity.kt:17)
  at com.devnn.demo.AnrTestActivity.lambda$UpadNwrDNzrVyNaTI0ysWoH569M(AnrTestActivity.kt:-1)
  at com.devnn.demo.-$$Lambda$AnrTestActivity$UpadNwrDNzrVyNaTI0ysWoH569M.onClick(lambda:-1)
  at android.view.View.performClick(View.java:7448)
  at android.view.View.performClickInternal(View.java:7425)
  at android.view.View.access$3600(View.java:810)
  at android.view.View$PerformClick.run(View.java:28305)
  at android.os.Handler.handleCallback(Handler.java:938)
  at android.os.Handler.dispatchMessage(Handler.java:99)
  at android.os.Looper.loop(Looper.java:223)
  at android.app.ActivityThread.main(ActivityThread.java:7656)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:592)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:947)

"perfetto_hprof_listener" prio=10 tid=5 Native (still starting up)
  | group="" sCount=1 dsCount=0 flags=1 obj=0x0 self=0x7fada5a4cb20
  | sysTid=5243 nice=-20 cgrp=top-app sched=0/0 handle=0x7fac274afcf0
  | state=S schedstat=( 3314219 3983561 6 ) utm=0 stm=0 core=0 HZ=100
  | stack=0x7fac273b8000-0x7fac273ba000 stackSize=995KB
  | held mutexes=
  native: #00 pc 00000000000b1ec5  /apex/com.android.runtime/lib64/bionic/libc.so (read+5)
  native: #01 pc 000000000001cb70  /apex/com.android.art/lib64/libperfetto_hprof.so (void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, ArtPlugin_Initialize::$_29> >(void*)+288)
  native: #02 pc 00000000000c7d2a  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+58)
  native: #03 pc 000000000005f0c7  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+55)
  (no managed stack frames)
  //...other threads omitted

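You can see that the first thread is the Signal Catcher daemon thread, which catches the SIGQUIT signal. This also shows that this thread belongs to the app process. The second thread is our app's main thread: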

"main" prio=5 tid=1 Sleeping
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x71fb36a8 self=0x7fada5a477b0
  | sysTid=5232 nice=-10 cgrp=top-app sched=0/0 handle=0x7faecb97d4f8
  | state=S schedstat=( 5775317077 4230577099 871 ) utm=286 stm=291 core=0 HZ=100
  | stack=0x7ffc29566000-0x7ffc29568000 stackSize=8192KB
  | held mutexes=
  at java.lang.Thread.sleep(Native method)
  - sleeping on <0x06059c02> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:442)
  - locked <0x06059c02> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:358)
  at android.os.SystemClock.sleep(SystemClock.java:131)
  at com.devnn.demo.AnrTestActivity.onCreate$lambda-0(AnrTestActivity.kt:17)
  at com.devnn.demo.AnrTestActivity.lambda$UpadNwrDNzrVyNaTI0ysWoH569M(AnrTestActivity.kt:-1)
  at com.devnn.demo.-$$Lambda$AnrTestActivity$UpadNwrDNzrVyNaTI0ysWoH569M.onClick(lambda:-1)
  at android.view.View.performClick(View.java:7448)
  at android.view.View.performClickInternal(View.java:7425)

You can see that the main thread is unable to respond to input events because it is sleeping.

The first line of each thread's entry has a fixed format:

"main" prio=5 tid=1 Sleeping

The first field is the thread name, the second is its priority, the third is the thread id, and the fourth is the thread state.

The key information here is the thread state. The state alone usually gives a good hint about what caused the ANR. Here the state is Sleeping, so the next step is to read the stack below it to find the exact code location.
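As an illustration of the four fields, here is a small sketch that parses a thread header line with a regular expression. The class `ThreadHeader` and its field names are hypothetical; the pattern is derived only from the header lines shown in this trace (including the optional `daemon` marker and trailing notes such as `(still starting up)`), so other Android versions may need adjustments.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value class for the first line of a thread entry in a trace file.
final class ThreadHeader {
    // Quoted name, optional "daemon", then prio=, tid= and the state word.
    // Anything after the state, such as "(still starting up)", is ignored.
    private static final Pattern HEADER = Pattern.compile(
            "^\"([^\"]*)\"( daemon)? prio=(\\d+) tid=(\\d+) (\\w+).*$");

    final String name;
    final boolean daemon;
    final int priority;
    final int tid;
    final String state;

    private ThreadHeader(String name, boolean daemon, int priority, int tid, String state) {
        this.name = name;
        this.daemon = daemon;
        this.priority = priority;
        this.tid = tid;
        this.state = state;
    }

    /** Returns the parsed header, or null if the line is not a thread header. */
    static ThreadHeader parse(String line) {
        Matcher m = HEADER.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new ThreadHeader(
                m.group(1),
                m.group(2) != null,
                Integer.parseInt(m.group(3)),
                Integer.parseInt(m.group(4)),
                m.group(5));
    }

    public static void main(String[] args) {
        ThreadHeader h = parse("\"main\" prio=5 tid=1 Sleeping");
        System.out.println(h.name + " is " + h.state); // prints "main is Sleeping"
    }
}
```

Filtering the parsed headers for the `main` thread and checking its state first is usually the quickest way into a long trace file.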

Next, let's look at an example of an ANR caused by a deadlock.

Scenario 2: Deadlock leads to ANR

    private fun clickTest() {
        val obj1 = Object()
        val obj2 = Object()

        Thread {
            synchronized(obj1) {
                Thread.sleep(100)
                // The child thread already holds obj1's lock and now wants obj2's lock
                synchronized(obj2) {
                    Log.i("AnrTest", "sub")
                }
            }
        }.start()

        synchronized(obj2) {
            Thread.sleep(100)
            // The main thread already holds obj2's lock and now wants obj1's lock
            synchronized(obj1) {
                Log.i("AnrTest", "main")
            }
        }
    }

The Logcat log is as follows; it again reports that the app failed to respond to an input event.

2022-10-02 16:30:14.001 594-5956/system_process E/ActivityManager: ANR in com.devnn.demo (com.devnn.demo/.AnrTestActivity)
    PID: 5906
    Reason: Input dispatching timed out (1313584 com.devnn.demo/com.devnn.demo.AnrTestActivity (server) is not responding. Waited 5007ms for MotionEvent(deviceId=8, source=0x00005002, displayId=0, action=DOWN, actionButton=0x00000000, flags=0x00000000, metaState=0x00000000, buttonState=0x00000000, classification=NONE, edgeFlags=0x00000000, xPrecision=22.8, yPrecision=12.8, xCursorPosition=nan, yCursorPosition=nan, pointers=[0: (721.0, 1641.9)]), policyFlags=0x62000000)
    Parent: com.devnn.demo/.AnrTestActivity
    Load: 0.8 / 0.67 / 0.39
    ----- Output from /proc/pressure/memory -----
    some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    full avg10=0.00 avg60=0.00 avg300=0.00 total=0
    ----- End output from /proc/pressure/memory -----
    
    CPU usage from 285508ms to 0ms ago (2022-10-02 16:25:25.779 to 2022-10-02 16:30:11.287):
      8.1% 279/[email protected]: 0.6% user + 7.4% kernel
      4.4% 292/[email protected]: 0.3% user + 4.1% kernel
      4.3% 594/system_server: 1.4% user + 2.8% kernel / faults: 19536 minor
      2.8% 318/surfaceflinger: 0.3% user + 2.4% kernel / faults: 871 minor
      2% 300/[email protected]: 0% user + 1.9% kernel
      0.5% 2014/com.android.systemui: 0% user + 0.5% kernel / faults: 4342 minor
      0.4% 365/adbd: 0% user + 0.4% kernel / faults: 946 minor
      0.2% 1152/com.android.launcher3: 0% user + 0.2% kernel / faults: 50 minor
      0.2% 157/logd: 0% user + 0.2% kernel / faults: 13 minor
      0.2% 277/android.hardware.audio.service.ranchu: 0% user + 0.1% kernel / faults: 5 minor
      0.2% 10/rcu_preempt: 0% user + 0.2% kernel
      0.1% 1002/com.android.phone: 0% user + 0% kernel / faults: 1267 minor

The specific cause cannot be seen in Logcat, so we have to look at the trace file.

----- pid 5906 at 2022-10-02 16:30:11 -----
Cmd line: com.devnn.demo
Build fingerprint: 'Android/sdk_phone_x86_64/generic_x86_64:11/RSR1.210722.012/7758210:userdebug/test-keys'
ABI: 'x86_64'
Build type: optimized

...irrelevant content omitted


"main" prio=5 tid=1 Blocked
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x71fb36a8 self=0x7fada5a477b0
  | sysTid=5906 nice=-10 cgrp=top-app sched=0/0 handle=0x7faecb97d4f8
  | state=S schedstat=( 2792813804 2053378730 782 ) utm=161 stm=117 core=0 HZ=100
  | stack=0x7ffc29566000-0x7ffc29568000 stackSize=8192KB
  | held mutexes=
  at com.devnn.demo.AnrTestActivity.clickTest(AnrTestActivity.kt:48)
  - waiting to lock <0x026f6b14> (a java.lang.Object) held by thread 2
  - locked <0x0188dfbd> (a java.lang.Object)
  at com.devnn.demo.AnrTestActivity.onCreate$lambda-1(AnrTestActivity.kt:21)
  at com.devnn.demo.AnrTestActivity.lambda$W1-GSjdjbC-dtyUoueoTRdjL4Es(AnrTestActivity.kt:-1)
  at com.devnn.demo.-$$Lambda$AnrTestActivity$W1-GSjdjbC-dtyUoueoTRdjL4Es.onClick(lambda:-1)
  at android.view.View.performClick(View.java:7448)
  at android.view.View.performClickInternal(View.java:7425)
  at android.view.View.access$3600(View.java:810)
  at android.view.View$PerformClick.run(View.java:28305)
  at android.os.Handler.handleCallback(Handler.java:938)
  at android.os.Handler.dispatchMessage(Handler.java:99)
  at android.os.Looper.loop(Looper.java:223)
  at android.app.ActivityThread.main(ActivityThread.java:7656)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:592)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:947)

You can see that the main thread's state is Blocked.

  - waiting to lock <0x026f6b14> (a java.lang.Object) held by thread 2
  - locked <0x0188dfbd> (a java.lang.Object)

The stack shows that the main thread is trying to acquire the lock on object 0x026f6b14, which is held by thread 2. At the same time, the main thread holds the lock on object 0x0188dfbd.

Then look at thread 2's stack:

"Thread-5" prio=5 tid=2 Blocked
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x12db7fc0 self=0x7fada5a55630
  | sysTid=5953 nice=0 cgrp=top-app sched=0/0 handle=0x7fabdc49fcf0
  | state=S schedstat=( 1560220 17477159 3 ) utm=0 stm=0 core=0 HZ=100
  | stack=0x7fabdc39c000-0x7fabdc39e000 stackSize=1043KB
  | held mutexes=
  at com.devnn.demo.AnrTestActivity.clickTest$lambda-4(AnrTestActivity.kt:39)
  - waiting to lock <0x0188dfbd> (a java.lang.Object) held by thread 1
  - locked <0x026f6b14> (a java.lang.Object)
  at com.devnn.demo.AnrTestActivity.lambda$A4lEoLZVf4n-xUBZSqj2v3ihIqw(AnrTestActivity.kt:-1)
  at com.devnn.demo.-$$Lambda$AnrTestActivity$A4lEoLZVf4n-xUBZSqj2v3ihIqw.run(lambda:-1)
  at java.lang.Thread.run(Thread.java:923)

Thread 2 is also in the Blocked state. It is waiting for the lock on object 0x0188dfbd, which is held by thread 1, while thread 2 itself holds the lock on object 0x026f6b14.

Each thread holds the lock the other one needs, so neither can proceed. This ANR is caused by a deadlock.
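The manual check above (follow each "waiting to lock ... held by thread N" line until you arrive back where you started) can be automated. The sketch below is only an illustration under stated assumptions: the class `DeadlockFinder` and its methods are hypothetical, and the patterns are based solely on the line shapes visible in this trace. It builds a "waits for" map from tid to the tid holding the wanted lock, then looks for a cycle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that looks for a lock cycle in the thread dump of one process.
class DeadlockFinder {
    // Thread header, e.g. "main" prio=5 tid=1 Blocked
    private static final Pattern HEADER =
            Pattern.compile("^\"[^\"]*\".* tid=(\\d+) .*$");
    // e.g. "  - waiting to lock <0x026f6b14> (a java.lang.Object) held by thread 2"
    private static final Pattern WAITING = Pattern.compile(
            "^\\s*- waiting to lock <0x[0-9a-f]+> .* held by thread (\\d+)\\s*$");

    /** Maps each waiting thread's tid to the tid of the thread holding the wanted lock. */
    static Map<Integer, Integer> waitsFor(List<String> lines) {
        Map<Integer, Integer> edges = new LinkedHashMap<>();
        Integer currentTid = null;
        for (String line : lines) {
            Matcher header = HEADER.matcher(line);
            if (header.matches()) {
                currentTid = Integer.parseInt(header.group(1));
                continue;
            }
            Matcher waiting = WAITING.matcher(line);
            if (currentTid != null && waiting.matches()) {
                edges.put(currentTid, Integer.parseInt(waiting.group(1)));
            }
        }
        return edges;
    }

    /** Returns the tids forming a cycle, or an empty list if there is none. */
    static List<Integer> findCycle(Map<Integer, Integer> waitsFor) {
        for (Integer start : waitsFor.keySet()) {
            List<Integer> path = new ArrayList<>();
            Integer cur = start;
            while (cur != null && !path.contains(cur)) {
                path.add(cur);
                cur = waitsFor.get(cur);
            }
            if (cur != null && cur.equals(start)) {
                return path;   // walked back to the starting thread
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        // Abridged lines taken from the deadlock trace above.
        List<String> trace = Arrays.asList(
                "\"main\" prio=5 tid=1 Blocked",
                "  - waiting to lock <0x026f6b14> (a java.lang.Object) held by thread 2",
                "  - locked <0x0188dfbd> (a java.lang.Object)",
                "\"Thread-5\" prio=5 tid=2 Blocked",
                "  - waiting to lock <0x0188dfbd> (a java.lang.Object) held by thread 1",
                "  - locked <0x026f6b14> (a java.lang.Object)");
        System.out.println("Deadlock cycle (tids): " + findCycle(waitsFor(trace)));
        // prints: Deadlock cycle (tids): [1, 2]
    }
}
```

The sketch assumes it is given the thread dump of a single process, since tids are only unique within one process section of the trace file.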

Thread state in trace file

When viewing the thread state in the trace file, you can see that the thread has many states:

"Signal Catcher" daemon prio=10 tid=4 Runnable
"RenderThread" daemon prio=7 tid=21 Native
"DefaultDispatcher-worker-1" daemon prio=5 tid=22 TimedWaiting
"main" prio=5 tid=1 Blocked
"main" prio=5 tid=1 Sleeping
"main" prio=5 tid=1 MONITOR

These are the main states. Several of them are defined in the Thread class, but what are the Native and MONITOR states?

Let's review the thread states defined in the Thread class:

//java.lang.Thread
public class Thread implements Runnable {
    public enum State {
        /**
         * Thread state for a thread which has not yet started.
         */
        NEW,

        /**
         * Thread state for a runnable thread.  A thread in the runnable
         * state is executing in the Java virtual machine but it may
         * be waiting for other resources from the operating system
         * such as processor.
         */
        RUNNABLE,

        /**
         * Thread state for a thread blocked waiting for a monitor lock.
         * A thread in the blocked state is waiting for a monitor lock
         * to enter a synchronized block/method or
         * reenter a synchronized block/method after calling
         * {@link Object#wait() Object.wait}.
         */
        BLOCKED,

        /**
         * Thread state for a waiting thread.
         * A thread is in the waiting state due to calling one of the
         * following methods:
         * <ul>
         *   <li>{@link Object#wait() Object.wait} with no timeout</li>
         *   <li>{@link #join() Thread.join} with no timeout</li>
         *   <li>{@link LockSupport#park() LockSupport.park}</li>
         * </ul>
         *
         * <p>A thread in the waiting state is waiting for another thread to
         * perform a particular action.
         *
         * For example, a thread that has called <tt>Object.wait()</tt>
         * on an object is waiting for another thread to call
         * <tt>Object.notify()</tt> or <tt>Object.notifyAll()</tt> on
         * that object. A thread that has called <tt>Thread.join()</tt>
         * is waiting for a specified thread to terminate.
         */
        WAITING,

        /**
         * Thread state for a waiting thread with a specified waiting time.
         * A thread is in the timed waiting state due to calling one of
         * the following methods with a specified positive waiting time:
         * <ul>
         *   <li>{@link #sleep Thread.sleep}</li>
         *   <li>{@link Object#wait(long) Object.wait} with timeout</li>
         *   <li>{@link #join(long) Thread.join} with timeout</li>
         *   <li>{@link LockSupport#parkNanos LockSupport.parkNanos}</li>
         *   <li>{@link LockSupport#parkUntil LockSupport.parkUntil}</li>
         * </ul>
         */
        TIMED_WAITING,

        /**
         * Thread state for a terminated thread.
         * The thread has completed execution.
         */
        TERMINATED;
    }
}
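To make two of these states concrete, the sketch below observes a thread that is sleeping and a thread that is blocked on a monitor, which are the two situations from the scenarios above. It is a plain JVM illustration rather than Android code: the class `ThreadStateDemo` and its method names are made up, and it reports `java.lang.Thread.State` values, not the state names printed in the trace file.

```java
// Hypothetical demo: observes Thread.State for a sleeping and a blocked thread.
class ThreadStateDemo {
    /** Polls until the thread reaches the wanted state or the timeout expires. */
    static Thread.State awaitState(Thread t, Thread.State wanted, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (t.getState() != wanted && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return t.getState();
    }

    /** A thread inside Thread.sleep, as in the first scenario. */
    static Thread.State stateWhileSleeping() {
        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException ignored) {
                // woken up by the interrupt below
            }
        });
        sleeper.setDaemon(true);
        sleeper.start();
        Thread.State observed = awaitState(sleeper, Thread.State.TIMED_WAITING, 2_000);
        sleeper.interrupt();
        return observed;
    }

    /** A thread waiting to enter a synchronized block, as in the deadlock scenario. */
    static Thread.State stateWhileBlocked() {
        final Object lock = new Object();
        Thread contender = new Thread(() -> {
            synchronized (lock) {
                // nothing to do; the point is having to wait for the lock
            }
        });
        contender.setDaemon(true);
        synchronized (lock) {
            // The caller holds the lock, so the contender cannot enter its block.
            contender.start();
            return awaitState(contender, Thread.State.BLOCKED, 2_000);
        }
    }

    public static void main(String[] args) {
        System.out.println("sleeping thread: " + stateWhileSleeping());
        System.out.println("blocked thread:  " + stateWhileBlocked());
    }
}
```

The polling loop is there because a freshly started thread needs a moment to reach the state of interest.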

The mapping between the native thread states and these Java states is defined in VMThread:

//VMThread.java
    /**
     * Holds a mapping from native Thread statuses to Java one. Required for
     * translating back the result of getStatus().
     */
    static final Thread.State[] STATE_MAP = new Thread.State[] {
        Thread.State.TERMINATED,     // ZOMBIE
        Thread.State.RUNNABLE,       // RUNNING
        Thread.State.TIMED_WAITING,  // TIMED_WAIT
        Thread.State.BLOCKED,        // MONITOR
        Thread.State.WAITING,        // WAIT
        Thread.State.NEW,            // INITIALIZING
        Thread.State.NEW,            // STARTING
        Thread.State.RUNNABLE,       // NATIVE
        Thread.State.WAITING,        // VMWAIT
        Thread.State.RUNNABLE        // SUSPENDED
    };

As you can see, NATIVE corresponds to RUNNABLE, and MONITOR corresponds to BLOCKED.
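The table can be read as a simple lookup by position. The following sketch restates it as a name-based lookup. The class `NativeStateMap` and the method `toJavaState` are hypothetical; the native state names are taken from the comments in the STATE_MAP listing above and are assumed to appear in that order.

```java
// Hypothetical lookup that mirrors the STATE_MAP table from VMThread shown above.
class NativeStateMap {
    // Native state names, in the order of the comments in STATE_MAP.
    private static final String[] NATIVE_STATES = {
        "ZOMBIE", "RUNNING", "TIMED_WAIT", "MONITOR", "WAIT",
        "INITIALIZING", "STARTING", "NATIVE", "VMWAIT", "SUSPENDED"
    };

    // Same entries and order as STATE_MAP.
    private static final Thread.State[] STATE_MAP = {
        Thread.State.TERMINATED,     // ZOMBIE
        Thread.State.RUNNABLE,       // RUNNING
        Thread.State.TIMED_WAITING,  // TIMED_WAIT
        Thread.State.BLOCKED,        // MONITOR
        Thread.State.WAITING,        // WAIT
        Thread.State.NEW,            // INITIALIZING
        Thread.State.NEW,            // STARTING
        Thread.State.RUNNABLE,       // NATIVE
        Thread.State.WAITING,        // VMWAIT
        Thread.State.RUNNABLE        // SUSPENDED
    };

    /** Translates a native state name into the corresponding java.lang.Thread.State. */
    static Thread.State toJavaState(String nativeState) {
        for (int i = 0; i < NATIVE_STATES.length; i++) {
            if (NATIVE_STATES[i].equals(nativeState)) {
                return STATE_MAP[i];
            }
        }
        throw new IllegalArgumentException("Unknown native state: " + nativeState);
    }

    public static void main(String[] args) {
        System.out.println("NATIVE  -> " + toJavaState("NATIVE"));  // NATIVE  -> RUNNABLE
        System.out.println("MONITOR -> " + toJavaState("MONITOR")); // MONITOR -> BLOCKED
    }
}
```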

That concludes this introduction to how ANRs are generated and how to analyze them.


Origin blog.csdn.net/devnn/article/details/127138547