[Android] How Android ANRs Are Generated and How to Analyze Them

Preface

Android ANRs have always been relatively hard to resolve: they are hard to reproduce, and even once reproduced they are not easy to analyze. This article walks through how an ANR is generated and how to locate its cause from the log files. Online ANR monitoring is also fairly complicated; after reading this article, client-side ANR monitoring solutions (such as WeChat Matrix) should be easier to follow.

When an ANR occurs, the system shows a dialog like the one below:
[Image: the ANR dialog]

What is ANR?

ANR (Application Not Responding) means that the application has not responded for a long time, and a dialog pops up on screen (as shown above). It is not a runtime exception and cannot be caught with try-catch. The dialog is shown by the system_server process, so the app process cannot observe it directly (it can be observed at the native layer, as analyzed below). When an ANR occurs, the system_server process prints logs to Logcat and writes more detailed logs as files in the /data/anr directory (usually called ANR trace files).

Causes of ANR

Android ANRs are generally triggered by one of the following:

Service timeout: a foreground service does not finish executing within 20s, or a background service within 200s;
Broadcast queue timeout: a foreground broadcast is not finished within 10s, or a background broadcast within 60s;
Content provider timeout: the content provider publish timeout is 10s;
Input dispatching timeout: an input event (key or touch) is not handled within 5s.

The most common case is the fourth one, the input event response timeout, mostly for touch events. Why does it time out? Usually because the main thread is blocked for some reason, such as a time-consuming task, heavy computation, a deadlock, or sleeping.
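
To make the blocking mechanism concrete, here is a minimal plain-Java sketch (not Android code). A single-thread executor stands in for the main thread's message queue; the class name BlockedLooperDemo, the method measureDispatchDelayMs, and the millisecond values are all assumptions made for illustration. It measures how long an "input event" waits behind a slow task that was queued before it.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BlockedLooperDemo {

    /**
     * Queues a task that blocks for blockMs on a single worker thread, then queues
     * an "input event" behind it and returns how long that event waited, in ms.
     */
    static long measureDispatchDelayMs(long blockMs) {
        // One thread handling tasks in order, like a main thread draining its queue.
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        try {
            mainThread.submit(() -> sleepQuietly(blockMs)); // the time-consuming work
            final long enqueuedAt = System.nanoTime();
            // The event can only run after the blocking task has finished.
            Future<Long> waited = mainThread.submit(
                    () -> (System.nanoTime() - enqueuedAt) / 1_000_000);
            return waited.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            mainThread.shutdown();
        }
    }

    private static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        long timeoutMs = 100; // scaled-down stand-in for the 5s input timeout
        long waitedMs = measureDispatchDelayMs(300);
        System.out.println("event waited " + waitedMs + " ms, timed out: "
                + (waitedMs > timeoutMs));
    }
}
```

The event itself is trivial; it is late only because everything on that thread runs in order. This is why work that may take long is usually moved off the main thread.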

How an ANR is generated

A service timeout means that one of the Service component's lifecycle callbacks, such as onCreate, took too long. The following example shows how the Service lifecycle callback onCreate leads to an ANR.

onCreate is called after startService, so the analysis starts from startService. The following code is based on Android SDK 29.
The call flow is as follows:

Context.startService
ContextImpl.startService
ActivityManagerService.startService
ActiveServices.startServiceLocked
ActiveServices.startServiceInnerLocked
ActiveServices.bringUpServiceLocked
ActiveServices.realStartServiceLocked

Next, focus on the code of ActiveServices.realStartServiceLocked:

    private final void realStartServiceLocked(ServiceRecord r,
            ProcessRecord app, boolean execInFg) throws RemoteException {
        ...
        // This call posts a message delayed by 20 seconds
        bumpServiceExecutingLocked(r, execInFg, "create");
        ...
        try {
            ...
            // Tell the app process to create the Service; this is where onCreate gets called
            app.thread.scheduleCreateService(r, r.serviceInfo,
            ...
        } catch (DeadObjectException e) {
            ...

bumpServiceExecutingLocked posts the delayed message:

    private final void bumpServiceExecutingLocked(ServiceRecord r, boolean fg, String why) {
        ...
        scheduleServiceTimeoutLocked(r.app);
        ...
    }

Next, look at scheduleServiceTimeoutLocked:

    void scheduleServiceTimeoutLocked(ProcessRecord proc) {
        // Obtain the delayed message
        Message msg = mAm.mHandler.obtainMessage(
                ActivityManagerService.SERVICE_TIMEOUT_MSG);
        msg.obj = proc;
        // Post it with a delay: 20 seconds for a foreground service, 200 seconds for a background one
        mAm.mHandler.sendMessageDelayed(msg,
                proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
    }

The constants SERVICE_TIMEOUT and SERVICE_BACKGROUND_TIMEOUT are defined as follows:

    // Path: com.android.server.am.ActiveServices.java

    // How long we wait for a service to finish executing.
    static final int SERVICE_TIMEOUT = 20*1000;

    // How long we wait for a service to finish executing.
    static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;

So the ANR timeout for a foreground service is 20 seconds, and for a background service it is 10 times that, i.e. 200 seconds.

Next, see how ActivityThread in the app process handles the create-service task:

    private void handleCreateService(CreateServiceData data) {
        Service service = null;
        try {
            java.lang.ClassLoader cl = packageInfo.getClassLoader();
            service = packageInfo.getAppFactory()
                    .instantiateService(cl, data.info.name, data.intent);
        } catch (Exception e) {
            ...
        }

        try {
            ...
            ContextImpl context = ContextImpl.createAppContext(this, packageInfo);
            context.setOuterContext(service);

            Application app = packageInfo.makeApplication(false, mInstrumentation);
            service.attach(context, this, data.info.name, data.token, app,
                    ActivityManager.getService());
            // Key point: call the onCreate lifecycle callback
            service.onCreate();
            mServices.put(data.token, service);
            try {
                // Key point: tell AMS the Service has been created; this removes the delayed message from the handler
                ActivityManager.getService().serviceDoneExecuting(
                        data.token, SERVICE_DONE_EXECUTING_ANON, 0, 0);
            } catch (RemoteException e) {
                throw e.rethrowFromSystemServer();
            }
            ...

After service.onCreate returns, AMS is notified that the service has been created.

ActivityManagerService.serviceDoneExecuting goes on to ActiveServices.serviceDoneExecutingLocked:

    private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying, boolean finishing) {
        ...
        // Remove the delayed message posted earlier
        mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
        ...
    }

So if the Service's onCreate lifecycle callback finishes within 20 seconds, the delayed message is removed from the handler and never runs.

If onCreate does not finish within 20 seconds, the delayed message posted earlier is delivered and handled.

That message is what triggers ANR handling.
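
The arm-then-cancel pattern described above can be modelled outside Android. The following is a minimal sketch in plain Java, assuming a ScheduledExecutorService in place of the AMS handler; the class ServiceTimeoutModel and its methods startExecuting and doneExecuting are hypothetical names chosen to mirror bumpServiceExecutingLocked and serviceDoneExecutingLocked, not framework APIs.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class ServiceTimeoutModel {
    // Stand-in for the handler that holds the delayed SERVICE_TIMEOUT_MSG.
    private final ScheduledExecutorService handler =
            Executors.newSingleThreadScheduledExecutor();
    private final Map<String, ScheduledFuture<?>> pending = new ConcurrentHashMap<>();
    private final Consumer<String> onTimeout;

    ServiceTimeoutModel(Consumer<String> onTimeout) {
        this.onTimeout = onTimeout;
    }

    /** Arms the timeout before the (simulated) app process is asked to do the work. */
    void startExecuting(String service, long timeoutMs) {
        ScheduledFuture<?> timeout = handler.schedule(() -> {
            pending.remove(service);
            onTimeout.accept(service); // the point where ANR handling would start
        }, timeoutMs, TimeUnit.MILLISECONDS);
        pending.put(service, timeout);
    }

    /** Called when the work is done; returns true if the timeout was removed in time. */
    boolean doneExecuting(String service) {
        ScheduledFuture<?> timeout = pending.remove(service);
        return timeout != null && timeout.cancel(false);
    }

    void shutdown() {
        handler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> timedOut = new CopyOnWriteArrayList<>();
        ServiceTimeoutModel model = new ServiceTimeoutModel(timedOut::add);

        model.startExecuting("FastService", 200);
        model.doneExecuting("FastService"); // finished in time, timeout removed

        model.startExecuting("SlowService", 50);
        Thread.sleep(200);                  // simulated onCreate that takes too long
        System.out.println("timed out: " + timedOut);
        model.shutdown();
    }
}
```

The important property is that the timeout is armed on the system side before the app is asked to do anything, so a blocked app cannot prevent it from firing.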

The delayed message SERVICE_TIMEOUT_MSG is handled in MainHandler as follows:

    // com.android.server.am.ActivityManagerService
    final class MainHandler extends Handler {
        public MainHandler(Looper looper) {
            super(looper, null, true);
        }

        @Override
        public void handleMessage(Message msg) {
            switch (msg.what) {
                ...
                case SERVICE_TIMEOUT_MSG: {
                    mServices.serviceTimeout((ProcessRecord)msg.obj);
                    ...
                }
                ...
            }
        }
    }

mServices.serviceTimeout is implemented as follows:

    void serviceTimeout(ProcessRecord proc) {
        ...
        proc.appNotResponding(null, null, null, null, false,
        ...
    }

So when a Service ANR occurs, execution reaches ProcessRecord.appNotResponding.

Other kinds of ANR also end up in ProcessRecord.appNotResponding, for example the input event timeout:

    // com.android.server.am.ActivityManagerService
    /**
     * Handle input dispatching timeouts.
     * @return whether input dispatching should be aborted or not.
     */
    boolean inputDispatchingTimedOut(ProcessRecord proc, String activityShortComponentName,
            ApplicationInfo aInfo, String parentShortComponentName,
            WindowProcessController parentProcess, boolean aboveSystem, String reason) {
        if (checkCallingPermission(FILTER_EVENTS) != PackageManager.PERMISSION_GRANTED) {
            throw new SecurityException("Requires permission " + FILTER_EVENTS);
        }

        final String annotation;
        if (reason == null) {
            annotation = "Input dispatching timed out";
        } else {
            annotation = "Input dispatching timed out (" + reason + ")";
        }

        if (proc != null) {
            synchronized (this) {
                if (proc.isDebugging()) {
                    return false;
                }

                if (proc.getActiveInstrumentation() != null) {
                    Bundle info = new Bundle();
                    info.putString("shortMsg", "keyDispatchingTimedOut");
                    info.putString("longMsg", annotation);
                    finishInstrumentationLocked(proc, Activity.RESULT_CANCELED, info);
                    return true;
                }
            }
            // An input event timeout also ends up in ProcessRecord.appNotResponding
            proc.appNotResponding(activityShortComponentName, aInfo,
                    parentShortComponentName, parentProcess, aboveSystem, annotation);
        }

        return true;
    }

The timeout handling for input events, broadcasts and providers is not analyzed one by one here.

All roads lead to ProcessRecord.appNotResponding: every kind of ANR eventually ends up there.

Handling ANR

ANR handling consists of the following steps:

Collect the ids of the processes whose stacks need to be dumped
Notify each of those processes to dump its thread stacks; the output goes to the /data/anr directory
Print the Logcat log
Show the ANR dialog for a foreground process; a background process gets no dialog

The detailed flow is as follows:

    // com.android.server.am.ProcessRecord
    void appNotResponding(String activityShortComponentName, ApplicationInfo aInfo,
            String parentShortComponentName, WindowProcessController parentProcess,
            boolean aboveSystem, String annotation) {
        // Collect the ids of the processes whose stacks need dumping: firstPids, lastPids and nativeProcs
        ArrayList<Integer> firstPids = new ArrayList<>(5);
        SparseArray<Boolean> lastPids = new SparseArray<>(20);

        synchronized (mService) {
            ...
            // In case we come through here for the same app before completing
            // this one, mark as anring now so we will bail out.
            setNotResponding(true);

            // Dump thread traces as quickly as we can, starting with "interesting" processes.
            firstPids.add(pid);

            // Don't dump other PIDs if it's a background ANR
            if (!isSilentAnr()) {
                int parentPid = pid;
                if (parentProcess != null && parentProcess.getPid() > 0) {
                    parentPid = parentProcess.getPid();
                }
                if (parentPid != pid) firstPids.add(parentPid);

                if (MY_PID != pid && MY_PID != parentPid) firstPids.add(MY_PID);

                for (int i = getLruProcessList().size() - 1; i >= 0; i--) {
                    ProcessRecord r = getLruProcessList().get(i);
                    if (r != null && r.thread != null) {
                        int myPid = r.pid;
                        if (myPid > 0 && myPid != pid && myPid != parentPid && myPid != MY_PID) {
                            if (r.isPersistent()) {
                                firstPids.add(myPid);
                                if (DEBUG_ANR) Slog.i(TAG, "Adding persistent proc: " + r);
                            } else if (r.treatLikeActivity) {
                                firstPids.add(myPid);
                                if (DEBUG_ANR) Slog.i(TAG, "Adding likely IME: " + r);
                            } else {
                                lastPids.put(myPid, Boolean.TRUE);
                                if (DEBUG_ANR) Slog.i(TAG, "Adding ANR proc: " + r);
                            }
                        }
                    }
                }
            }
        }
        // Start assembling the Logcat log
        // Log the ANR to the main log.
        StringBuilder info = new StringBuilder();
        info.setLength(0);
        info.append("ANR in ").append(processName);
        if (activityShortComponentName != null) {
            info.append(" (").append(activityShortComponentName).append(")");
        }
        info.append("\n");
        info.append("PID: ").append(pid).append("\n");
        if (annotation != null) {
            info.append("Reason: ").append(annotation).append("\n");
        }
        if (parentShortComponentName != null
                && parentShortComponentName.equals(activityShortComponentName)) {
            info.append("Parent: ").append(parentShortComponentName).append("\n");
        }

        ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);

        // Collect the ids of the native processes that need dumping
        // don't dump native PIDs for background ANRs unless it is the process of interest
        String[] nativeProcs = null;
        if (isSilentAnr()) {
            for (int i = 0; i < NATIVE_STACKS_OF_INTEREST.length; i++) {
                if (NATIVE_STACKS_OF_INTEREST[i].equals(processName)) {
                    nativeProcs = new String[] { processName };
                    break;
                }
            }
        } else {
            nativeProcs = NATIVE_STACKS_OF_INTEREST;
        }

        int[] pids = nativeProcs == null ? null : Process.getPidsForCommands(nativeProcs);
        ArrayList<Integer> nativePids = null;

        if (pids != null) {
            nativePids = new ArrayList<>(pids.length);
            for (int i : pids) {
                nativePids.add(i);
            }
        }
        // Key point: start dumping the stacks
        // For background ANRs, don't pass the ProcessCpuTracker to
        // avoid spending 1/2 second collecting stats to rank lastPids.
        File tracesFile = ActivityManagerService.dumpStackTraces(firstPids,
                (isSilentAnr()) ? null : processCpuTracker, (isSilentAnr()) ? null : lastPids,
                nativePids);

        String cpuInfo = null;
        if (isMonitorCpuUsage()) {
            mService.updateCpuStatsNow();
            synchronized (mService.mProcessCpuTracker) {
                cpuInfo = mService.mProcessCpuTracker.printCurrentState(anrTime);
            }
            info.append(processCpuTracker.printCurrentLoad());
            info.append(cpuInfo);
        }

        info.append(processCpuTracker.printCurrentState(anrTime));

        // Write the log to Logcat
        Slog.e(TAG, info.toString());
        if (tracesFile == null) {
            // There is no trace file, so dump (only) the alleged culprit's threads to the log
            Process.sendSignal(pid, Process.SIGNAL_QUIT);
        }

        synchronized (mService) {
            ...
            // A background process is killed directly, without showing the ANR dialog
            if (isSilentAnr() && !isDebugging()) {
                kill("bg anr", true);
                return;
            }
            // Mark the app process as being in the ANR state
            // Set the app's notResponding state, and look up the errorReportReceiver
            makeAppNotRespondingLocked(activityShortComponentName,
                    annotation != null ? "ANR " + annotation : "ANR", info.toString());

            // mUiHandler can be null if the AMS is constructed with injector only. This will only
            // happen in tests.
            // Show the ANR dialog
            if (mService.mUiHandler != null) {
                // Bring up the infamous App Not Responding dialog
                Message msg = Message.obtain();
                msg.what = ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG;
                msg.obj = new AppNotRespondingDialog.Data(this, aInfo, aboveSystem);

                mService.mUiHandler.sendMessage(msg);
            }
        }
    }
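
The pid selection at the top of appNotResponding is easier to follow in isolation. Below is a minimal plain-Java sketch of that ordering logic only. The classes AnrPidSelector, Proc and Selection are hypothetical stand-ins introduced for illustration (Proc carries just the three ProcessRecord fields the selection reads), and a parentPid of 0 or less is an assumed convention for "no parent process".

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class AnrPidSelector {

    /** Hypothetical stand-in for the ProcessRecord fields used by the selection. */
    static final class Proc {
        final int pid;
        final boolean persistent;
        final boolean treatLikeActivity;

        Proc(int pid, boolean persistent, boolean treatLikeActivity) {
            this.pid = pid;
            this.persistent = persistent;
            this.treatLikeActivity = treatLikeActivity;
        }
    }

    /** Result: pids dumped first, and the remaining candidate pids. */
    static final class Selection {
        final List<Integer> firstPids = new ArrayList<>();
        final Set<Integer> lastPids = new LinkedHashSet<>();
    }

    static Selection select(int anrPid, int parentPid, int systemPid,
                            boolean silentAnr, List<Proc> lru) {
        Selection s = new Selection();
        s.firstPids.add(anrPid);          // the unresponsive process always comes first
        if (silentAnr) {
            return s;                     // background ANR: no other pids are collected
        }
        int parent = parentPid > 0 ? parentPid : anrPid;
        if (parent != anrPid) s.firstPids.add(parent);
        if (systemPid != anrPid && systemPid != parent) s.firstPids.add(systemPid);

        // Walk the LRU list from the end, as the loop in the original does.
        for (int i = lru.size() - 1; i >= 0; i--) {
            Proc r = lru.get(i);
            if (r.pid <= 0 || r.pid == anrPid || r.pid == parent || r.pid == systemPid) {
                continue;
            }
            if (r.persistent || r.treatLikeActivity) {
                s.firstPids.add(r.pid);   // persistent processes and likely IMEs
            } else {
                s.lastPids.add(r.pid);    // everything else is a lower-priority candidate
            }
        }
        return s;
    }

    public static void main(String[] args) {
        List<Proc> lru = List.of(
                new Proc(300, false, false),
                new Proc(200, true, false),
                new Proc(400, false, true));
        Selection s = select(100, -1, 50, false, lru);
        System.out.println("first=" + s.firstPids + " last=" + s.lastPids);
    }
}
```

With these made-up pids, the ANR process and the system process lead the list, followed by the persistent and activity-like processes; the ordinary process lands in lastPids.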

Next, see how ActivityManagerService dumps the stacks:

  File tracesFile = ActivityManagerService.dumpStackTraces(firstPids,
                (isSilentAnr()) ? null : processCpuTracker, (isSilentAnr()) ? null : lastPids,
                nativePids);

The ActivityManagerService.dumpStackTraces function:

    // com.android.server.am.ActivityManagerService
    public static File dumpStackTraces(ArrayList<Integer> firstPids,
            ProcessCpuTracker processCpuTracker, SparseArray<Boolean> lastPids,
            ArrayList<Integer> nativePids) {
        ArrayList<Integer> extraPids = null;

        Slog.i(TAG, "dumpStackTraces pids=" + lastPids + " nativepids=" + nativePids);

        // Measure CPU usage as soon as we're called in order to get a realistic sampling
        // of the top users at the time of the request.
        if (processCpuTracker != null) {
            processCpuTracker.init();
            try {
                Thread.sleep(200);
            } catch (InterruptedException ignored) {
            }

            processCpuTracker.update();
            ...
        }
        // Create the ANR output file: ANR_TRACE_DIR = "/data/anr";
        final File tracesDir = new File(ANR_TRACE_DIR);
        // Each set of ANR traces is written to a separate file and dumpstate will process
        // all such files and add them to a captured bug report if they're recent enough.
        maybePruneOldTraces(tracesDir);

        // NOTE: We should consider creating the file in native code atomically once we've
        // gotten rid of the old scheme of dumping and lot of the code that deals with paths
        // can be removed.
        File tracesFile = createAnrDumpFile(tracesDir);
        if (tracesFile == null) {
            return null;
        }
        // The file has been created; start dumping
        dumpStackTraces(tracesFile.getAbsolutePath(), firstPids, nativePids, extraPids);
        return tracesFile;
    }

ActivityManagerService.dumpStackTraces:

    // com.android.server.am.ActivityManagerService
    public static void dumpStackTraces(String tracesFile, ArrayList<Integer> firstPids,
            ArrayList<Integer> nativePids, ArrayList<Integer> extraPids) {

        Slog.i(TAG, "Dumping to " + tracesFile);

        // We don't need any sort of inotify based monitoring when we're dumping traces via
        // tombstoned. Data is piped to an "intercept" FD installed in tombstoned so we're in full
        // control of all writes to the file in question.

        // We must complete all stack dumps within 20 seconds.
        long remainingTime = 20 * 1000;

        // First collect all of the stacks of the most important pids.
        if (firstPids != null) {
            int num = firstPids.size();
            for (int i = 0; i < num; i++) {
                Slog.i(TAG, "Collecting stacks for pid " + firstPids.get(i));
                final long timeTaken = dumpJavaTracesTombstoned(firstPids.get(i), tracesFile,
                                                                remainingTime);

                remainingTime -= timeTaken;
                if (remainingTime <= 0) {
                    Slog.e(TAG, "Aborting stack trace dump (current firstPid=" + firstPids.get(i) +
                           "); deadline exceeded.");
                    return;
                }

                if (DEBUG_ANR) {
                    Slog.d(TAG, "Done with pid " + firstPids.get(i) + " in " + timeTaken + "ms");
                }
            }
        }

        // Next collect the stacks of the native pids
        if (nativePids != null) {
            for (int pid : nativePids) {
                Slog.i(TAG, "Collecting stacks for native pid " + pid);
                final long nativeDumpTimeoutMs = Math.min(NATIVE_DUMP_TIMEOUT_MS, remainingTime);

                final long start = SystemClock.elapsedRealtime();
                Debug.dumpNativeBacktraceToFileTimeout(
                        pid, tracesFile, (int) (nativeDumpTimeoutMs / 1000));
                final long timeTaken = SystemClock.elapsedRealtime() - start;

                remainingTime -= timeTaken;
                if (remainingTime <= 0) {
                    Slog.e(TAG, "Aborting stack trace dump (current native pid=" + pid +
                        "); deadline exceeded.");
                    return;
                }

                if (DEBUG_ANR) {
                    Slog.d(TAG, "Done with native pid " + pid + " in " + timeTaken + "ms");
                }
            }
        }

        // Lastly, dump stacks for all extra PIDs from the CPU tracker.
        if (extraPids != null) {
            for (int pid : extraPids) {
                Slog.i(TAG, "Collecting stacks for extra pid " + pid);

                final long timeTaken = dumpJavaTracesTombstoned(pid, tracesFile, remainingTime);

                remainingTime -= timeTaken;
                if (remainingTime <= 0) {
                    Slog.e(TAG, "Aborting stack trace dump (current extra pid=" + pid +
                            "); deadline exceeded.");
                    return;
                }

                if (DEBUG_ANR) {
                    Slog.d(TAG, "Done with extra pid " + pid + " in " + timeTaken + "ms");
                }
            }
        }
        Slog.i(TAG, "Done dumping");
    }
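
All three loops above share one 20-second budget: each dump subtracts the time it took, and the whole procedure stops once the budget is used up. The following plain-Java sketch isolates that pattern. The class DumpBudget and the Dumper interface are hypothetical, and the fake dumper in main reports made-up durations instead of measuring real work, so the example is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

class DumpBudget {

    /** Hypothetical dumper: dumps one pid within timeoutMs and returns the time spent, in ms. */
    interface Dumper {
        long dump(int pid, long timeoutMs);
    }

    /**
     * Dumps pids in order while a shared time budget lasts.
     * Returns the pids for which a dump was attempted.
     */
    static List<Integer> dumpAll(List<Integer> pids, long budgetMs, Dumper dumper) {
        List<Integer> attempted = new ArrayList<>();
        long remaining = budgetMs;
        for (int pid : pids) {
            // Each dump gets whatever is left of the budget as its own timeout.
            long taken = dumper.dump(pid, remaining);
            attempted.add(pid);
            remaining -= taken;
            if (remaining <= 0) {
                break; // deadline exceeded: the pids after this one are skipped
            }
        }
        return attempted;
    }

    public static void main(String[] args) {
        // Fake dumper: every dump "takes" 8 seconds, capped at the timeout it was given.
        Dumper fake = (pid, timeoutMs) -> Math.min(8_000, timeoutMs);
        System.out.println(dumpAll(List.of(1, 2, 3, 4), 20_000, fake)); // prints [1, 2, 3]
    }
}
```

This is also why pids are ordered by importance before dumping: when the budget runs out, the processes at the end of the list get no stack trace at all.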

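As shown, dumping traces uses two functions:
dumpJavaTracesTombstoned and Debug.dumpNativeBacktraceToFileTimeout, for the Java layer and the native layer respectively. The native layer is handled by calling the android.os.Debug class directly, while the Java layer goes through dumpJavaTracesTombstoned. Let's look at the Java layer first.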

ActivityManagerService.dumpJavaTracesTombstoned:

    /**
     * Dump java traces for process {@code pid} to the specified file. If java trace dumping
     * fails, a native backtrace is attempted. Note that the timeout {@code timeoutMs} only applies
     * to the java section of the trace, a further {@code NATIVE_DUMP_TIMEOUT_MS} might be spent
     * attempting to obtain native traces in the case of a failure. Returns the total time spent
     * capturing traces.
     */
    private static long dumpJavaTracesTombstoned(int pid, String fileName, long timeoutMs) {
        final long timeStart = SystemClock.elapsedRealtime();
        boolean javaSuccess = Debug.dumpJavaBacktraceToFileTimeout(pid, fileName,
                (int) (timeoutMs / 1000));
        if (javaSuccess) {
            // Check that something is in the file, actually. Try-catch should not be necessary,
            // but better safe than sorry.
            try {
                long size = new File(fileName).length();
                if (size < JAVA_DUMP_MINIMUM_SIZE) {
                    Slog.w(TAG, "Successfully created Java ANR file is empty!");
                    javaSuccess = false;
                }
            } catch (Exception e) {
                Slog.w(TAG, "Unable to get ANR file size", e);
                javaSuccess = false;
            }
        }
        if (!javaSuccess) {
            Slog.w(TAG, "Dumping Java threads failed, initiating native stack dump.");
            if (!Debug.dumpNativeBacktraceToFileTimeout(pid, fileName,
                    (NATIVE_DUMP_TIMEOUT_MS / 1000))) {
                Slog.w(TAG, "Native stack dump failed!");
            }
        }

        return SystemClock.elapsedRealtime() - timeStart;
    }

This in turn calls Debug.dumpJavaBacktraceToFileTimeout to do the dump.

Here is the Debug class:

//android.os.Debug
  /**
     * Append the Java stack traces of a given native process to a specified file.
     *
     * @param pid pid to dump.
     * @param file path of file to append dump to.
     * @param timeoutSecs time to wait in seconds, or 0 to wait forever.
     * @hide
     */
    public static native boolean dumpJavaBacktraceToFileTimeout(int pid, String file,
                                                                int timeoutSecs);

    /**
     * Append the native stack traces of a given process to a specified file.
     *
     * @param pid pid to dump.
     * @param file path of file to append dump to.
     * @param timeoutSecs time to wait in seconds, or 0 to wait forever.
     * @hide
     */
    public static native boolean dumpNativeBacktraceToFileTimeout(int pid, String file,
                                                                  int timeoutSecs);

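So dumping traces ultimately comes down to these two functions of the android.os.Debug class:
dumpJavaBacktraceToFileTimeout and dumpNativeBacktraceToFileTimeout

Both methods are declared native, so the next step is to look at the Android source code.

Note that both methods are marked @hide, so they cannot be called from the app side.

How the native layer dumps traces

Searching the Android source for the C++ code behind dumpJavaBacktraceToFileTimeout leads to frameworks/base/core/jni/android_os_Debug.cpp, which contains the implementation:

frameworks/base/core/jni/android_os_Debug.cpp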

static jboolean android_os_Debug_dumpJavaBacktraceToFileTimeout(JNIEnv* env, jobject clazz,
        jint pid, jstring fileName, jint timeoutSecs) {
    const bool ret = dumpTraces(env, pid, fileName, timeoutSecs, kDebuggerdJavaBacktrace);
    return ret ? JNI_TRUE : JNI_FALSE;
}

Tracing further leads to the debuggerd_trigger_dump function in system/core/debuggerd/client/debuggerd_client.cpp:

bool debuggerd_trigger_dump(pid_t tid, DebuggerdDumpType dump_type, unsigned int timeout_ms,
                            unique_fd output_fd) {
  ...
  // Send the signal.
  const int signal = (dump_type == kDebuggerdJavaBacktrace) ? SIGQUIT : BIONIC_SIGNAL_DEBUGGER;
  sigval val = {.sival_int = (dump_type == kDebuggerdNativeBacktrace) ? 1 : 0};
  if (sigqueue(pid, signal, val) != 0) {
    log_error(output_fd, errno, "failed to send signal to pid %d", pid);
    return false;
  }
  ...
}

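For a Java backtrace, this function uses sigqueue (bionic/libc/bionic/signal.cpp) to send a SIGQUIT signal to the target process.

Next, look at where SIGQUIT is received.

Every app process has a SignalCatcher thread dedicated to handling SIGQUIT; see art/runtime/signal_catcher.cc: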

void* SignalCatcher::Run(void* arg) {
  SignalCatcher* signal_catcher = reinterpret_cast<SignalCatcher*>(arg);
  ...
  // Set up mask with signals we want to handle.
  SignalSet signals;
  signals.Add(SIGQUIT);
  signals.Add(SIGUSR1);

  while (true) {
    int signal_number = signal_catcher->WaitForSignal(self, signals);
    if (signal_catcher->ShouldHalt()) {
      runtime->DetachCurrentThread();
      return nullptr;
    }

    switch (signal_number) {
    case SIGQUIT:
      signal_catcher->HandleSigQuit();
      break;
    case SIGUSR1:
      signal_catcher->HandleSigUsr1();
      break;
    default:
      LOG(ERROR) << "Unexpected signal %d" << signal_number;
      break;
    }
  }
}

Once SIGQUIT is received, it is handed to HandleSigQuit:

void SignalCatcher::HandleSigQuit() {
  Runtime* runtime = Runtime::Current();
  std::ostringstream os;
  os << "\n"
      << "----- pid " << getpid() << " at " << GetIsoDate() << " -----\n";

  DumpCmdLine(os);

  // Note: The strings "Build fingerprint:" and "ABI:" are chosen to match the format used by
  // debuggerd. This allows, for example, the stack tool to work.
  std::string fingerprint = runtime->GetFingerprint();
  os << "Build fingerprint: '" << (fingerprint.empty() ? "unknown" : fingerprint) << "'\n";
  os << "ABI: '" << GetInstructionSetString(runtime->GetInstructionSet()) << "'\n";

  os << "Build type: " << (kIsDebugBuild ? "debug" : "optimized") << "\n";

  runtime->DumpForSigQuit(os);

  if ((false)) {
    std::string maps;
    if (android::base::ReadFileToString("/proc/self/maps", &maps)) {
      os << "/proc/self/maps:\n" << maps;
    }
  }
  os << "----- end " << getpid() << " -----\n";
  Output(os.str());
}

In the middle, it calls the DumpForSigQuit method in art/runtime/runtime.cc, which collects more detailed information, including the thread stacks.

void Runtime::DumpForSigQuit(std::ostream& os) {
  // Print backtraces first since they are important to diagnose ANRs,
  // and ANRs can often be trimmed to limit upload size.
  thread_list_->DumpForSigQuit(os);
  GetClassLinker()->DumpForSigQuit(os);
  GetInternTable()->DumpForSigQuit(os);
  GetJavaVM()->DumpForSigQuit(os);
  GetHeap()->DumpForSigQuit(os);
  oat_file_manager_->DumpForSigQuit(os);
  if (GetJit() != nullptr) {
    GetJit()->DumpForSigQuit(os);
  } else {
    os << "Running non JIT\n";
  }
  DumpDeoptimizations(os);
  TrackedAllocators::Dump(os);
  GetMetrics()->DumpForSigQuit(os);
  os << "\n";

  BaseMutex::DumpAll(os);

  // Inform anyone else who is interested in SigQuit.
  {
    ScopedObjectAccess soa(Thread::Current());
    callbacks_->SigQuit();
  }
}

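An ANR dump prints a lot of information; see the relevant source code for details.

That completes the analysis of the whole flow, from the ANR occurring to the logs being printed.

How to analyze an ANR

Now that we know what an ANR is, let's see how to locate the cause once one has occurred.
As described above, an ANR produces logs in two places: in Logcat, and in a trace file under the /data/anr/ directory.

Below, two scenarios are simulated to reproduce an ANR: one caused by a time-consuming operation and one caused by a deadlock.

Scenario 1: ANR caused by a time-consuming operation

For simplicity, just put the main thread to sleep for 10s.

The Activity has a button; clicking it makes the main thread sleep for 10s, as in the code below, which clearly leads to an ANR.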

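import android.os.Bundle
import android.os.SystemClock
import android.widget.Button
import androidx.appcompat.app.AppCompatActivity

class AnrTestActivity : AppCompatActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_anr_test)
        // Clicking the button blocks the main thread for 10s
        findViewById<Button>(R.id.button).setOnClickListener {
            SystemClock.sleep(10000)
        }
    }
}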

Click the button twice in a row; 5s later the ANR dialog pops up.
[Image: the ANR dialog]
Logcat prints the following:

2022-10-02 15:38:00.505 594-5381/system_process E/ActivityManager: ANR in com.devnn.demo (com.devnn.demo/.AnrTestActivity)
    PID: 5232
    Reason: Input dispatching timed out (f99e8bb com.devnn.demo/com.devnn.demo.AnrTestActivity (server) is not responding. Waited 5008ms for MotionEvent(deviceId=8, source=0x00005002, displayId=0, action=DOWN, actionButton=0x00000000, flags=0x00000000, metaState=0x00000000, buttonState=0x00000000, classification=NONE, edgeFlags=0x00000000, xPrecision=22.8, yPrecision=12.8, xCursorPosition=nan, yCursorPosition=nan, pointers=[0: (804.9, 1173.9)]), policyFlags=0x62000000)
    Parent: com.devnn.demo/.AnrTestActivity
    Load: 0.05 / 0.01 / 0.0
    ----- Output from /proc/pressure/memory -----
    some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    full avg10=0.00 avg60=0.00 avg300=0.00 total=0
    ----- End output from /proc/pressure/memory -----
    
    CPU usage from 158257ms to 0ms ago (2022-10-02 15:35:18.256 to 2022-10-02 15:37:56.513):
      6.2% 279/[email protected]: 0.3% user + 5.9% kernel
      2.2% 292/[email protected]: 0% user + 2.1% kernel
      1.6% 594/system_server: 0.3% user + 1.3% kernel / faults: 1085 minor
      1.4% 300/[email protected]: 0% user + 1.4% kernel
      0.4% 277/android.hardware.audio.service.ranchu: 0% user + 0.4% kernel / faults: 10 minor
      0.2% 371/audioserver: 0% user + 0.2% kernel / faults: 4 minor
      0.2% 5232/com.devnn.demo: 0% user + 0.2% kernel / faults: 272 minor
      0.2% 318/surfaceflinger: 0% user + 0.2% kernel
      0% 16/ksoftirqd/1: 0% user + 0% kernel
      0% 365/adbd: 0% user + 0% kernel
      0% 477/llkd: 0% user + 0% kernel
      0% 872/[email protected]: 0% user + 0% kernel
      0% 10/rcu_preempt: 0% user + 0% kernel
      0% 2014/com.android.systemui: 0% user + 0% kernel / faults: 39 minor
      0% 9/ksoftirqd/0: 0% user + 0% kernel
      0% 1002/com.android.phone: 0% user + 0% kernel / faults: 100 minor
      0% 3645/kworker/0:2-events_power_efficient: 0% user + 0% kernel
      0% 157/logd: 0% user + 0% kernel
      0% 427/libgoldfish-rild: 0% user + 0% kernel / faults: 16 minor
      0% 3270/kworker/1:1-mm_percpu_wq: 0% user + 0% kernel
      0% 159/servicemanager: 0% user + 0% kernel
      0% 160/hwservicemanager: 0% user + 0% kernel
      0% 478/hostapd_nohidl: 0% user + 0% kernel
      0% 5346/kworker/u4:0-events_unbound: 0% user + 0% kernel
      0% 11/migration/0: 0% user + 0% kernel
      0% 15/migration/1: 0% user + 0% kernel
      0% 164/qemu-props: 0% user + 0% kernel
      0% 188/jbd2/dm-5-8: 0% user + 0% kernel
      0% 269/statsd: 0% user + 0% kernel
      0% 342/logcat: 0% user + 0% kernel
      0% 418/media.metrics: 0% user + 0% kernel / faults: 1 minor
      0% 442/[email protected]: 0% user + 0% kernel
      0% 761/wpa_supplicant: 0% user + 0% kernel
      0% 3615/logcat: 0% user + 0% kernel
      0% 5068/kworker/u4:1-phy0: 0% user + 0% kernel
    1.9% TOTAL: 0.1% user + 1.7% kernel + 0% softirq
    CPU usage from 20ms to 335ms later (2022-10-02 15:37:56.533 to 2022-10-02 15:37:56.848):
      22% 594/system_server: 15% user + 7.5% kernel / faults: 161 minor
        22% 5381/AnrConsumer: 7.5% user + 15% kernel
      6.9% 279/[email protected]: 0% user + 6.9% kernel
        6.9% 1215/[email protected]: 0% user + 6.9% kernel
      3.5% 292/[email protected]: 0% user + 3.5% kernel
    18% TOTAL: 8.6% user + 10% kernel

Note that you need to select the system_process process in Logcat.

The Logcat output shows that the process with id=5232 timed out while handling an input event. This log is printed by the ProcessRecord.appNotResponding method analyzed above.

Next, let's look at the logs under the /data/anr/ directory.

Each trace file is the log of a single ANR. Every ANR generates a new trace file, and the file is named after the time at which it was created.
The trace file is structured: overall, it is printed process by process.

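An ANR is not necessarily caused by the app process itself; other related processes may be responsible. For that reason, the information for all related processes is printed in the same file, roughly in the following structure.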

----- pid 5232 at 2022-10-02 15:37:56 -----
detailed log of process 5232
----- end 5232 -----

----- pid 594 at 2022-10-02 15:37:57 -----
detailed log of process 594
----- end 594 -----

----- pid xxx at xxxx-xx-xx xx:xx:xx -----
detailed log of process xxx
----- end xxx -----

The first process is the one in which the ANR occurred, usually the app process.
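The section markers described above are regular enough to split a trace file mechanically. The following is a minimal Java sketch, assuming the markers look exactly like the sample above. `ProcessSection` and `TraceSplitter` are hypothetical names introduced for illustration and are not part of Android.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** One per-process section of an ANR trace file (hypothetical helper type). */
final class ProcessSection {
    final int pid;
    final String timestamp;
    final List<String> lines = new ArrayList<>();

    ProcessSection(int pid, String timestamp) {
        this.pid = pid;
        this.timestamp = timestamp;
    }
}

final class TraceSplitter {
    // Matches "----- pid 5232 at 2022-10-02 15:37:56 -----"
    private static final Pattern START =
            Pattern.compile("^----- pid (\\d+) at (.+?) -----$");
    // Matches "----- end 5232 -----"
    private static final Pattern END =
            Pattern.compile("^----- end (\\d+) -----$");

    /** Splits the lines of a trace file into per-process sections, in file order. */
    static List<ProcessSection> split(List<String> traceLines) {
        List<ProcessSection> sections = new ArrayList<>();
        ProcessSection current = null;
        for (String raw : traceLines) {
            String line = raw.trim();
            Matcher start = START.matcher(line);
            if (start.matches()) {
                // A new section begins; remember its pid and timestamp.
                current = new ProcessSection(
                        Integer.parseInt(start.group(1)), start.group(2));
                continue;
            }
            if (END.matcher(line).matches()) {
                // The section is complete; store it and wait for the next start marker.
                if (current != null) {
                    sections.add(current);
                }
                current = null;
                continue;
            }
            if (current != null) {
                current.lines.add(raw);
            }
        }
        return sections;
    }
}
```

Under these assumptions, the first element of the returned list corresponds to the process in which the ANR occurred, and the remaining elements to the related processes.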

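The whole trace file is more than 700 KB, which is too long to show in full, so only the main information of the app process is excerpted below.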

Each process section begins with a summary that includes the process id, the time of the ANR, and the process name.

----- pid 5232 at 2022-10-02 15:37:56 -----
Cmd line: com.devnn.demo
Build fingerprint: 'Android/sdk_phone_x86_64/generic_x86_64:11/RSR1.210722.012/7758210:userdebug/test-keys'
ABI: 'x86_64'
Build type: optimized
Zygote loaded classes=15740 post zygote classes=1289
Dumping registered class loaders
#0 dalvik.system.PathClassLoader: [], parent #1
#1 java.lang.BootClassLoader: [], no parent
#2 dalvik.system.PathClassLoader: [/data/app/~~Qnj80NrB3yjtX87JepktGQ==/com.devnn.demo-FWP2tIJA7Ec1qoJefwnc0A==/base.apk:/data/app/~~Qnj80NrB3yjtX87JepktGQ==/com.devnn.demo-FWP2tIJA7Ec1qoJefwnc0A==/base.apk!classes10.dex:/data/app/~~Qnj80NrB3yjtX87JepktGQ==/com.devnn.demo-FWP2tIJA7Ec1qoJefwnc0A==/base.apk!classes11.dex:/data/app/~~Qnj80NrB3yjtX87JepktGQ==/com.devnn.demo-FWP2tIJA7Ec1qoJefwnc0A==/base.apk!classes6.dex:/data/app/~~Qnj80NrB3yjtX87JepktGQ==/com.devnn.demo-FWP2tIJA7Ec1qoJefwnc0A==/base.apk!classes2.dex:/data/app/~~Qnj80NrB3yjtX87JepktGQ==/com.devnn.demo-FWP2tIJA7Ec1qoJefwnc0A==/base.apk!classes3.dex:/data/app/~~Qnj80NrB3yjtX87JepktGQ==/com.devnn.demo-FWP2tIJA7Ec1qoJefwnc0A==/base.apk!classes8.dex], parent #1
Done dumping class loaders
Classes initialized: 526 in 694.025ms
Intern table: 31792 strong; 523 weak
JNI: CheckJNI is on; globals=639 (plus 37 weak)
Libraries: libandroid.so libaudioeffect_jni.so libcompiler_rt.so libicu_jni.so libjavacore.so libjavacrypto.so libjnigraphics.so libmedia_jni.so libopenjdk.so librs_jni.so libsfplugin_ccodec.so libsoundpool.so libstats_jni.so libwebviewchromium_loader.so (14)
Heap: 46% free, 11MB/21MB; 75810 objects
//part of the content is omitted here

The second part lists the state and stack of every thread in the process, and this is what we need to focus on:


suspend all histogram:	Sum: 74.854ms 99% C.I. 0.005ms-43.315ms Avg: 3.742ms Max: 44.394ms
DALVIK THREADS (21):
"Signal Catcher" daemon prio=10 tid=4 Runnable
  | group="system" sCount=0 dsCount=0 flags=0 obj=0x12c40b10 self=0x7fada5a4af50
  | sysTid=5242 nice=-20 cgrp=top-app sched=0/0 handle=0x7fac275adcf0
  | state=R schedstat=( 21716542 2041235 2 ) utm=0 stm=2 core=0 HZ=100
  | stack=0x7fac274b6000-0x7fac274b8000 stackSize=995KB
  | held mutexes= "mutator lock"(shared held)
  native: #00 pc 000000000054da9e  /apex/com.android.art/lib64/libart.so (art::DumpNativeStack(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, int, BacktraceMap*, char const*, art::ArtMethod*, void*, bool)+126)
  native: #01 pc 000000000069615c  /apex/com.android.art/lib64/libart.so (art::Thread::DumpStack(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, bool, BacktraceMap*, bool) const+380)
  native: #02 pc 00000000006b7320  /apex/com.android.art/lib64/libart.so (art::DumpCheckpoint::Run(art::Thread*)+1088)
  native: #03 pc 00000000006b064d  /apex/com.android.art/lib64/libart.so (art::ThreadList::RunCheckpoint(art::Closure*, art::Closure*)+557)
  native: #04 pc 00000000006af729  /apex/com.android.art/lib64/libart.so (art::ThreadList::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, bool)+1817)
  native: #05 pc 00000000006aec28  /apex/com.android.art/lib64/libart.so (art::ThreadList::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char> >&)+824)
  native: #06 pc 00000000006470d9  /apex/com.android.art/lib64/libart.so (art::Runtime::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char> >&)+201)
  native: #07 pc 000000000065ceb6  /apex/com.android.art/lib64/libart.so (art::SignalCatcher::HandleSigQuit()+1766)
  native: #08 pc 000000000065bc85  /apex/com.android.art/lib64/libart.so (art::SignalCatcher::Run(void*)+357)
  native: #09 pc 00000000000c7d2a  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+58)
  native: #10 pc 000000000005f0c7  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+55)
  (no managed stack frames)

"main" prio=5 tid=1 Sleeping
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x71fb36a8 self=0x7fada5a477b0
  | sysTid=5232 nice=-10 cgrp=top-app sched=0/0 handle=0x7faecb97d4f8
  | state=S schedstat=( 5775317077 4230577099 871 ) utm=286 stm=291 core=0 HZ=100
  | stack=0x7ffc29566000-0x7ffc29568000 stackSize=8192KB
  | held mutexes=
  at java.lang.Thread.sleep(Native method)
  - sleeping on <0x06059c02> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:442)
  - locked <0x06059c02> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:358)
  at android.os.SystemClock.sleep(SystemClock.java:131)
  at com.devnn.demo.AnrTestActivity.onCreate$lambda-0(AnrTestActivity.kt:17)
  at com.devnn.demo.AnrTestActivity.lambda$UpadNwrDNzrVyNaTI0ysWoH569M(AnrTestActivity.kt:-1)
  at com.devnn.demo.-$$Lambda$AnrTestActivity$UpadNwrDNzrVyNaTI0ysWoH569M.onClick(lambda:-1)
  at android.view.View.performClick(View.java:7448)
  at android.view.View.performClickInternal(View.java:7425)
  at android.view.View.access$3600(View.java:810)
  at android.view.View$PerformClick.run(View.java:28305)
  at android.os.Handler.handleCallback(Handler.java:938)
  at android.os.Handler.dispatchMessage(Handler.java:99)
  at android.os.Looper.loop(Looper.java:223)
  at android.app.ActivityThread.main(ActivityThread.java:7656)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:592)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:947)

"perfetto_hprof_listener" prio=10 tid=5 Native (still starting up)
  | group="" sCount=1 dsCount=0 flags=1 obj=0x0 self=0x7fada5a4cb20
  | sysTid=5243 nice=-20 cgrp=top-app sched=0/0 handle=0x7fac274afcf0
  | state=S schedstat=( 3314219 3983561 6 ) utm=0 stm=0 core=0 HZ=100
  | stack=0x7fac273b8000-0x7fac273ba000 stackSize=995KB
  | held mutexes=
  native: #00 pc 00000000000b1ec5  /apex/com.android.runtime/lib64/bionic/libc.so (read+5)
  native: #01 pc 000000000001cb70  /apex/com.android.art/lib64/libperfetto_hprof.so (void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, ArtPlugin_Initialize::$_29> >(void*)+288)
  native: #02 pc 00000000000c7d2a  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+58)
  native: #03 pc 000000000005f0c7  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+55)
  (no managed stack frames)
  //... other threads omitted

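You can see that the first thread is the Signal Catcher daemon thread, which catches the SIGQUIT signal. This also shows that this thread belongs to the app process. The second thread is our app's main thread: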

"main" prio=5 tid=1 Sleeping
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x71fb36a8 self=0x7fada5a477b0
  | sysTid=5232 nice=-10 cgrp=top-app sched=0/0 handle=0x7faecb97d4f8
  | state=S schedstat=( 5775317077 4230577099 871 ) utm=286 stm=291 core=0 HZ=100
  | stack=0x7ffc29566000-0x7ffc29568000 stackSize=8192KB
  | held mutexes=
  at java.lang.Thread.sleep(Native method)
  - sleeping on <0x06059c02> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:442)
  - locked <0x06059c02> (a java.lang.Object)
  at java.lang.Thread.sleep(Thread.java:358)
  at android.os.SystemClock.sleep(SystemClock.java:131)
  at com.devnn.demo.AnrTestActivity.onCreate$lambda-0(AnrTestActivity.kt:17)
  at com.devnn.demo.AnrTestActivity.lambda$UpadNwrDNzrVyNaTI0ysWoH569M(AnrTestActivity.kt:-1)
  at com.devnn.demo.-$$Lambda$AnrTestActivity$UpadNwrDNzrVyNaTI0ysWoH569M.onClick(lambda:-1)
  at android.view.View.performClick(View.java:7448)
  at android.view.View.performClickInternal(View.java:7425)

You can see that the main thread cannot respond to input events because it is sleeping.

The first line of each thread's information has a fixed format:

"main" prio=5 tid=1 Sleeping

The first field is the thread name, the second is its priority, the third is the thread id, and the fourth is the thread state.

The key information here is the thread state. Usually, the thread state alone gives a good idea of what caused the ANR. Here the thread is Sleeping, so the exact code location can be found by looking at the stack that follows.
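Because the first line of every thread entry has this fixed layout, it can be parsed with a single regular expression. The sketch below is a hedged illustration, assuming the header looks like the samples in this article (an optional `daemon` flag after the name, and an optional parenthesized note after the state). `ThreadHeader` is a hypothetical helper class, not an Android API.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parsed first line of a thread entry in a trace file (hypothetical helper type). */
final class ThreadHeader {
    final String name;
    final boolean daemon;
    final int priority;
    final int tid;
    final String state;

    private ThreadHeader(String name, boolean daemon, int priority, int tid, String state) {
        this.name = name;
        this.daemon = daemon;
        this.priority = priority;
        this.tid = tid;
        this.state = state;
    }

    // "<name>" [daemon] prio=<n> tid=<n> <State> [(note)]
    private static final Pattern HEADER = Pattern.compile(
            "^\"([^\"]*)\"( daemon)? prio=(\\d+) tid=(\\d+) (\\w+)(?: \\(.*\\))?$");

    /** Returns the parsed header, or empty if the line is not a thread header. */
    static Optional<ThreadHeader> parse(String line) {
        Matcher m = HEADER.matcher(line.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new ThreadHeader(
                m.group(1),
                m.group(2) != null,
                Integer.parseInt(m.group(3)),
                Integer.parseInt(m.group(4)),
                m.group(5)));
    }
}
```

Filtering the parsed headers for `tid == 1` is one simple way to jump straight to the main thread's state.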

Now let's look at an example of an ANR caused by a deadlock.

Scenario 2: A deadlock causes an ANR

private fun clickTest() {
    val obj1 = Object()
    val obj2 = Object()

    Thread {
        synchronized(obj1) {
            Thread.sleep(100)
            // The worker thread already holds obj1's lock and now wants obj2's lock
            synchronized(obj2) {
                Log.i("AnrTest", "sub")
            }
        }
    }.start()

    synchronized(obj2) {
        Thread.sleep(100)
        // The main thread already holds obj2's lock and now wants obj1's lock
        synchronized(obj1) {
            Log.i("AnrTest", "main")
        }
    }
}

The Logcat output is as follows. It again reports that the input event could not be handled in time.

2022-10-02 16:30:14.001 594-5956/system_process E/ActivityManager: ANR in com.devnn.demo (com.devnn.demo/.AnrTestActivity)
    PID: 5906
    Reason: Input dispatching timed out (1313584 com.devnn.demo/com.devnn.demo.AnrTestActivity (server) is not responding. Waited 5007ms for MotionEvent(deviceId=8, source=0x00005002, displayId=0, action=DOWN, actionButton=0x00000000, flags=0x00000000, metaState=0x00000000, buttonState=0x00000000, classification=NONE, edgeFlags=0x00000000, xPrecision=22.8, yPrecision=12.8, xCursorPosition=nan, yCursorPosition=nan, pointers=[0: (721.0, 1641.9)]), policyFlags=0x62000000)
    Parent: com.devnn.demo/.AnrTestActivity
    Load: 0.8 / 0.67 / 0.39
    ----- Output from /proc/pressure/memory -----
    some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    full avg10=0.00 avg60=0.00 avg300=0.00 total=0
    ----- End output from /proc/pressure/memory -----
    
    CPU usage from 285508ms to 0ms ago (2022-10-02 16:25:25.779 to 2022-10-02 16:30:11.287):
      8.1% 279/[email protected]: 0.6% user + 7.4% kernel
      4.4% 292/[email protected]: 0.3% user + 4.1% kernel
      4.3% 594/system_server: 1.4% user + 2.8% kernel / faults: 19536 minor
      2.8% 318/surfaceflinger: 0.3% user + 2.4% kernel / faults: 871 minor
      2% 300/[email protected]: 0% user + 1.9% kernel
      0.5% 2014/com.android.systemui: 0% user + 0.5% kernel / faults: 4342 minor
      0.4% 365/adbd: 0% user + 0.4% kernel / faults: 946 minor
      0.2% 1152/com.android.launcher3: 0% user + 0.2% kernel / faults: 50 minor
      0.2% 157/logd: 0% user + 0.2% kernel / faults: 13 minor
      0.2% 277/android.hardware.audio.service.ranchu: 0% user + 0.1% kernel / faults: 5 minor
      0.2% 10/rcu_preempt: 0% user + 0.2% kernel
      0.1% 1002/com.android.phone: 0% user + 0% kernel / faults: 1267 minor

The specific cause cannot be seen in Logcat, so we have to turn to the trace file.
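Although Logcat does not reveal the root cause, its header still tells you which process to look for in the trace file and how long the system waited. As a rough sketch, assuming the header lines have the same shape as the Logcat output quoted in this article, they could be extracted like this. `AnrSummary` is a hypothetical helper class introduced only for this example.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Fields pulled from an "ANR in ..." Logcat block (hypothetical helper type). */
final class AnrSummary {
    String processName;   // e.g. com.devnn.demo; null if no "ANR in" line was found
    int pid = -1;         // -1 if no "PID:" line was found
    String reason;        // full text after "Reason: "; null if missing
    long waitedMs = -1;   // -1 if the reason contains no "Waited <n>ms"

    private static final Pattern ANR_IN = Pattern.compile("ANR in (\\S+)");
    private static final Pattern PID = Pattern.compile("^PID: (\\d+)$");
    private static final Pattern REASON = Pattern.compile("^Reason: (.*)$");
    private static final Pattern WAITED = Pattern.compile("Waited (\\d+)ms");

    /** Scans the lines of one ANR block and collects the header fields. */
    static AnrSummary parse(List<String> logLines) {
        AnrSummary summary = new AnrSummary();
        for (String raw : logLines) {
            String line = raw.trim();
            Matcher m;
            if ((m = ANR_IN.matcher(line)).find()) {
                summary.processName = m.group(1);
            } else if ((m = PID.matcher(line)).matches()) {
                summary.pid = Integer.parseInt(m.group(1));
            } else if ((m = REASON.matcher(line)).matches()) {
                summary.reason = m.group(1);
                Matcher waited = WAITED.matcher(summary.reason);
                if (waited.find()) {
                    summary.waitedMs = Long.parseLong(waited.group(1));
                }
            }
        }
        return summary;
    }
}
```

The extracted pid is the one to search for in the `----- pid ... -----` markers of the trace file.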

----- pid 5906 at 2022-10-02 16:30:11 -----
Cmd line: com.devnn.demo
Build fingerprint: 'Android/sdk_phone_x86_64/generic_x86_64:11/RSR1.210722.012/7758210:userdebug/test-keys'
ABI: 'x86_64'
Build type: optimized

... irrelevant content omitted


"main" prio=5 tid=1 Blocked
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x71fb36a8 self=0x7fada5a477b0
  | sysTid=5906 nice=-10 cgrp=top-app sched=0/0 handle=0x7faecb97d4f8
  | state=S schedstat=( 2792813804 2053378730 782 ) utm=161 stm=117 core=0 HZ=100
  | stack=0x7ffc29566000-0x7ffc29568000 stackSize=8192KB
  | held mutexes=
  at com.devnn.demo.AnrTestActivity.clickTest(AnrTestActivity.kt:48)
  - waiting to lock <0x026f6b14> (a java.lang.Object) held by thread 2
  - locked <0x0188dfbd> (a java.lang.Object)
  at com.devnn.demo.AnrTestActivity.onCreate$lambda-1(AnrTestActivity.kt:21)
  at com.devnn.demo.AnrTestActivity.lambda$W1-GSjdjbC-dtyUoueoTRdjL4Es(AnrTestActivity.kt:-1)
  at com.devnn.demo.-$$Lambda$AnrTestActivity$W1-GSjdjbC-dtyUoueoTRdjL4Es.onClick(lambda:-1)
  at android.view.View.performClick(View.java:7448)
  at android.view.View.performClickInternal(View.java:7425)
  at android.view.View.access$3600(View.java:810)
  at android.view.View$PerformClick.run(View.java:28305)
  at android.os.Handler.handleCallback(Handler.java:938)
  at android.os.Handler.dispatchMessage(Handler.java:99)
  at android.os.Looper.loop(Looper.java:223)
  at android.app.ActivityThread.main(ActivityThread.java:7656)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:592)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:947)

You can see that the main thread's state is Blocked.

  - waiting to lock <0x026f6b14> (a java.lang.Object) held by thread 2
  - locked <0x0188dfbd> (a java.lang.Object)

The stack shows that the main thread is trying to acquire the lock on object 0x026f6b14, which is held by thread 2. At the same time, the main thread holds the lock on object 0x0188dfbd.

Now look at the stack of thread 2:

"Thread-5" prio=5 tid=2 Blocked
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x12db7fc0 self=0x7fada5a55630
  | sysTid=5953 nice=0 cgrp=top-app sched=0/0 handle=0x7fabdc49fcf0
  | state=S schedstat=( 1560220 17477159 3 ) utm=0 stm=0 core=0 HZ=100
  | stack=0x7fabdc39c000-0x7fabdc39e000 stackSize=1043KB
  | held mutexes=
  at com.devnn.demo.AnrTestActivity.clickTest$lambda-4(AnrTestActivity.kt:39)
  - waiting to lock <0x0188dfbd> (a java.lang.Object) held by thread 1
  - locked <0x026f6b14> (a java.lang.Object)
  at com.devnn.demo.AnrTestActivity.lambda$A4lEoLZVf4n-xUBZSqj2v3ihIqw(AnrTestActivity.kt:-1)
  at com.devnn.demo.-$$Lambda$AnrTestActivity$A4lEoLZVf4n-xUBZSqj2v3ihIqw.run(lambda:-1)
  at java.lang.Thread.run(Thread.java:923)

Thread 2 is also in the Blocked state. It is waiting for the lock on object 0x0188dfbd, which is held by thread 1, while thread 2 itself holds the lock on object 0x026f6b14.

Each thread is waiting for a lock that the other one holds, so neither can proceed. This is an ANR caused by a deadlock.
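The manual check above (follow each "waiting to lock ... held by thread N" line until you arrive back at the starting thread) can be automated. The following Java sketch is one possible approach, assuming the thread dump uses the same line formats as the excerpts in this article. `DeadlockFinder` is a hypothetical class name, and the regular expressions are assumptions based on those excerpts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DeadlockFinder {
    // Thread header, e.g. "main" prio=5 tid=1 Blocked
    private static final Pattern HEADER =
            Pattern.compile("^\".*\".* tid=(\\d+) .*$");
    // e.g. - waiting to lock <0x026f6b14> (a java.lang.Object) held by thread 2
    private static final Pattern WAITING =
            Pattern.compile("^- waiting to lock <0x[0-9a-f]+> .* held by thread (\\d+)$");

    /** Builds the wait-for graph: tid of a blocked thread -> tid of the lock owner. */
    static Map<Integer, Integer> waitForGraph(List<String> dumpLines) {
        Map<Integer, Integer> waitsFor = new HashMap<>();
        int currentTid = -1;
        for (String raw : dumpLines) {
            String line = raw.trim();
            Matcher header = HEADER.matcher(line);
            if (header.matches()) {
                currentTid = Integer.parseInt(header.group(1));
                continue;
            }
            Matcher waiting = WAITING.matcher(line);
            if (currentTid != -1 && waiting.matches()) {
                waitsFor.put(currentTid, Integer.parseInt(waiting.group(1)));
            }
        }
        return waitsFor;
    }

    /** Returns the tids that form a wait cycle, or an empty list if there is none. */
    static List<Integer> findCycle(Map<Integer, Integer> waitsFor) {
        for (Integer start : waitsFor.keySet()) {
            List<Integer> path = new ArrayList<>();
            Set<Integer> seen = new HashSet<>();
            Integer node = start;
            // Follow the "held by" edges until they end or revisit a thread.
            while (node != null && seen.add(node)) {
                path.add(node);
                node = waitsFor.get(node);
            }
            if (node != null && node.equals(start)) {
                return path;
            }
        }
        return new ArrayList<>();
    }
}
```

For the dump in this scenario, the graph contains the edges 1 -> 2 and 2 -> 1, so the cycle consists of threads 1 and 2.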

Thread states in the trace file

Looking at the thread states in trace files, you will find that a thread can be in many different states:

"Signal Catcher" daemon prio=10 tid=4 Runnable
"RenderThread" daemon prio=7 tid=21 Native
"DefaultDispatcher-worker-1" daemon prio=5 tid=22 TimedWaiting
"main" prio=5 tid=1 Blocked
"main" prio=5 tid=1 Sleeping
"main" prio=5 tid=1 MONITOR

These are the main states. Some of them are defined in the Thread class, but what do Native and MONITOR mean?

First, review the thread states defined in the Thread class:

//java.lang.Thread
public class Thread implements Runnable {

    public enum State {
        /**
         * Thread state for a thread which has not yet started.
         */
        NEW,

        /**
         * Thread state for a runnable thread.  A thread in the runnable
         * state is executing in the Java virtual machine but it may
         * be waiting for other resources from the operating system
         * such as processor.
         */
        RUNNABLE,

        /**
         * Thread state for a thread blocked waiting for a monitor lock.
         * A thread in the blocked state is waiting for a monitor lock
         * to enter a synchronized block/method or
         * reenter a synchronized block/method after calling
         * {@link Object#wait() Object.wait}.
         */
        BLOCKED,

        /**
         * Thread state for a waiting thread.
         * A thread is in the waiting state due to calling one of the
         * following methods:
         * <ul>
         *   <li>{@link Object#wait() Object.wait} with no timeout</li>
         *   <li>{@link #join() Thread.join} with no timeout</li>
         *   <li>{@link LockSupport#park() LockSupport.park}</li>
         * </ul>
         *
         * <p>A thread in the waiting state is waiting for another thread to
         * perform a particular action.
         *
         * For example, a thread that has called <tt>Object.wait()</tt>
         * on an object is waiting for another thread to call
         * <tt>Object.notify()</tt> or <tt>Object.notifyAll()</tt> on
         * that object. A thread that has called <tt>Thread.join()</tt>
         * is waiting for a specified thread to terminate.
         */
        WAITING,

        /**
         * Thread state for a waiting thread with a specified waiting time.
         * A thread is in the timed waiting state due to calling one of
         * the following methods with a specified positive waiting time:
         * <ul>
         *   <li>{@link #sleep Thread.sleep}</li>
         *   <li>{@link Object#wait(long) Object.wait} with timeout</li>
         *   <li>{@link #join(long) Thread.join} with timeout</li>
         *   <li>{@link LockSupport#parkNanos LockSupport.parkNanos}</li>
         *   <li>{@link LockSupport#parkUntil LockSupport.parkUntil}</li>
         * </ul>
         */
        TIMED_WAITING,

        /**
         * Thread state for a terminated thread.
         * The thread has completed execution.
         */
        TERMINATED;
    }
}

The correspondence between the native thread states and the Java ones is defined in VMThread:

//VMThread.java
    /**
     * Holds a mapping from native Thread statuses to Java one. Required for
     * translating back the result of getStatus().
     */
    static final Thread.State[] STATE_MAP = new Thread.State[] {
        Thread.State.TERMINATED,     // ZOMBIE
        Thread.State.RUNNABLE,       // RUNNING
        Thread.State.TIMED_WAITING,  // TIMED_WAIT
        Thread.State.BLOCKED,        // MONITOR
        Thread.State.WAITING,        // WAIT
        Thread.State.NEW,            // INITIALIZING
        Thread.State.NEW,            // STARTING
        Thread.State.RUNNABLE,       // NATIVE
        Thread.State.WAITING,        // VMWAIT
        Thread.State.RUNNABLE        // SUSPENDED
    };

As you can see, NATIVE corresponds to RUNNABLE, and MONITOR corresponds to BLOCKED.
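To make the table easier to query, the same mapping can be expressed as a lookup by name. This is only a sketch built from the STATE_MAP comments quoted above; `NativeStateMap` is a hypothetical class, and state names that do not appear in that table (for example Sleeping or TimedWaiting from the trace excerpts) are deliberately left unmapped rather than guessed.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

final class NativeStateMap {
    // Native state name -> Java state, copied from the STATE_MAP comments in VMThread.
    private static final Map<String, Thread.State> MAP = new LinkedHashMap<>();

    static {
        MAP.put("ZOMBIE", Thread.State.TERMINATED);
        MAP.put("RUNNING", Thread.State.RUNNABLE);
        MAP.put("TIMED_WAIT", Thread.State.TIMED_WAITING);
        MAP.put("MONITOR", Thread.State.BLOCKED);
        MAP.put("WAIT", Thread.State.WAITING);
        MAP.put("INITIALIZING", Thread.State.NEW);
        MAP.put("STARTING", Thread.State.NEW);
        MAP.put("NATIVE", Thread.State.RUNNABLE);
        MAP.put("VMWAIT", Thread.State.WAITING);
        MAP.put("SUSPENDED", Thread.State.RUNNABLE);
    }

    /** Case-insensitive lookup; empty when the name is not in the table. */
    static Optional<Thread.State> toJavaState(String nativeName) {
        String key = nativeName.trim().toUpperCase(Locale.ROOT);
        return Optional.ofNullable(MAP.get(key));
    }
}
```

With this helper, `NativeStateMap.toJavaState("Native")` yields `RUNNABLE` and `NativeStateMap.toJavaState("MONITOR")` yields `BLOCKED`, matching the conclusion above.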

That concludes this introduction to how ANRs are generated and how to analyze them.


Origin blog.csdn.net/devnn/article/details/127138547