Android ANR Explained

Note: This article is a translation of the ANR page on the official Android website. Some less relevant passages are left untranslated, but the overall meaning follows the original. The original is linked below:

Android ANR

(In this article, "worker thread" means a child thread, that is, any thread other than the main thread.)

First, let's take a look at the official Android definition of ANR:

When the UI thread of an Android app is blocked for too long, an "Application Not Responding" (ANR) error is triggered. If the app is in the foreground, the system displays a dialog to the user.

In other words, if the UI thread stays blocked for too long, the system raises an ANR (Application Not Responding) error, and for a foreground app it shows a dialog to the user. (Anyone who has run into an ANR knows what that dialog looks like, so I will not post a screenshot here.)

ANRs are a serious problem: the app's main thread, which is responsible for updating the UI, cannot process user input events or draw, which frustrates users.

An ANR is triggered when one of the following conditions occurs:

  1. While your activity is in the foreground, your app has not responded to an input event (such as a key press or screen touch) or a BroadcastReceiver within 5 seconds
  2. While you have no activity in the foreground, your BroadcastReceiver has not finished executing after a considerable amount of time

How to diagnose ANR

Some common patterns to look for when diagnosing ANRs are:

  1. The app is doing slow operations involving I/O on the main thread
  2. The app is doing a long calculation on the main thread
  3. The main thread is making a synchronous Binder call to another process, and that process is taking a long time to return
  4. The main thread is blocked waiting for a synchronized block that is held by a long operation running on another thread
  5. The main thread is deadlocked with another thread, either in the same process or through a Binder call. In this case the main thread is not just waiting for a long operation to finish; it is in a deadlock

The following techniques can help you find out which of these causes is behind your ANRs:

(1) Enable StrictMode

While you are developing your app, StrictMode helps you find accidental I/O operations on the main thread. You can enable StrictMode at the application or activity level.

(2) Enable the background ANR dialog

Android shows ANR dialogs for apps that take too long to process a broadcast message only if "Show all ANRs" is enabled in the device's Developer options. For this reason, background ANR dialogs are not always shown to the user, even though the app may still be experiencing performance issues.

(3) Traceview

You can use Traceview to capture a trace of your running app while it goes through a use case, and then identify the places where the main thread is busy. I will cover how to use Traceview in the next post.

(4) Pull a traces file

Android stores trace information when an ANR occurs. On older OS releases, there is a single /data/anr/traces.txt file on the device; on newer releases, there are multiple /data/anr/anr_* files. You can use adb, running as root, to pull these trace files from a device or emulator:

adb root
adb shell ls /data/anr
adb pull /data/anr/<filename>

How to fix ANR issues

(1) Slow code on the main thread

Identify the places in your code where the main thread is busy for more than 5 seconds. Look for suspicious use cases in your app and try to reproduce the ANR. For example, the Traceview timeline in the figure below shows a main thread that is busy for more than 5 seconds:

Figure 2. Traceview timeline showing a busy main thread

The figure above shows that most of the time-consuming code runs in the onClick handler, as in the following sample:

@Override
public void onClick(View view) {
    // This task runs on the main thread.
    BubbleSort.sort(data);
}

In this case, you should move the time-consuming work to a worker thread. The Android framework provides classes that help with this. For example, the following sample code uses the AsyncTask helper class:

@Override
public void onClick(View view) {
   new AsyncTask<Integer[], Integer, Long>() {
       @Override
       protected Long doInBackground(Integer[]... params) {
           // This task runs on a worker thread.
           BubbleSort.sort(params[0]);
           // No result is needed here, but the method must return a Long.
           return null;
       }
   }.execute(data);
}

Traceview now shows that most of the code runs on a worker thread, as in the figure below, so the main thread is free to respond to user events.

Figure 3. Traceview timeline showing the work handled by a worker thread
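AsyncTask has been deprecated since API level 30, so it may help to see the same hand-off written with java.util.concurrent alone. The following is only a minimal sketch in plain Java with no Android classes: OffloadSketch, WORKER and sortInBackground are hypothetical names, and Arrays.sort stands in for the article's BubbleSort.sort.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class OffloadSketch {
    // Hypothetical single worker thread standing in for the article's worker thread.
    private static final ExecutorService WORKER = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "sort-worker");
        t.setDaemon(true); // do not keep the JVM alive for this sketch
        return t;
    });

    // Submits the sort and returns immediately; the calling thread is not blocked.
    static Future<int[]> sortInBackground(int[] data) {
        return WORKER.submit(() -> {
            int[] copy = data.clone(); // work on a copy so the caller's array is untouched
            Arrays.sort(copy);         // stand-in for BubbleSort.sort
            return copy;
        });
    }

    public static void main(String[] args) throws Exception {
        Future<int[]> result = sortInBackground(new int[] {3, 1, 2});
        // get() blocks, which is acceptable here only because main() is not a UI thread.
        System.out.println(Arrays.toString(result.get())); // prints [1, 2, 3]
    }
}
```

In a real click handler you would not call get() on the main thread, because that would reintroduce the blocking this section is trying to remove.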

(2) I/O on the main thread

Performing I/O on the main thread is a common cause of slow main-thread operations, which in turn cause ANRs. As in the previous section, it is recommended to move all I/O operations to a worker thread. Network and storage operations are typical examples of I/O. For more information, see Performing Network Operations and Saving Data.

(3) Contention for locks

In some scenarios, the task that causes ANR is not directly executed on the main thread. If a worker thread acquires a lock on a certain resource, and the main thread needs this resource to complete its task, ANR may occur.

In the Traceview timeline shown in the figure below, most tasks are executed in the worker thread AsyncTask #2:

Figure 4. Traceview timeline that shows the work being executed on a worker thread

However, when ANRs are occurring, you should look at the main thread's status in Android Device Monitor. Usually, the main thread is in the RUNNABLE state if it is ready to update the UI and is generally responsive. If the main thread cannot resume execution, it is in the BLOCKED state and cannot respond to events. Android Device Monitor shows this state as Monitor or Wait, as in the following figure:

Figure 5. Main thread in the Monitor status

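The following trace is taken while the main thread is blocked waiting for a resource. The excerpt shows the worker thread AsyncTask #2, which holds the lock (the "- locked" line) that the main thread needs: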

...
"AsyncTask #2" prio=5 tid=18 Runnable
  | group="main" sCount=0 dsCount=0 obj=0x12c333a0 self=0x94c87100
  | sysTid=25287 nice=10 cgrp=default sched=0/0 handle=0x94b80920
  | state=R schedstat=( 0 0 0 ) utm=757 stm=0 core=3 HZ=100
  | stack=0x94a7e000-0x94a80000 stackSize=1038KB
  | held mutexes= "mutator lock"(shared held)
  at com.android.developer.anrsample.BubbleSort.sort(BubbleSort.java:8)
  at com.android.developer.anrsample.MainActivity$LockTask.doInBackground(MainActivity.java:147)
  - locked <0x083105ee> (a java.lang.Boolean)
  at com.android.developer.anrsample.MainActivity$LockTask.doInBackground(MainActivity.java:135)
  at android.os.AsyncTask$2.call(AsyncTask.java:305)
  at java.util.concurrent.FutureTask.run(FutureTask.java:237)
  at android.os.AsyncTask$SerialExecutor$1.run(AsyncTask.java:243)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1133)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:607)
  at java.lang.Thread.run(Thread.java:761)
...

Analyzing the trace can help you locate the code that blocks the main thread. The following code holds the lock that blocks the main thread:

@Override
public void onClick(View v) {
   // The worker thread acquires the lock on lockedResource.
   new LockTask().execute(data);

   synchronized (lockedResource) {
       // The main thread needs lockedResource here, but it has to wait
       // until LockTask has finished using it.
   }
}

public class LockTask extends AsyncTask<Integer[], Integer, Long> {
   @Override
   protected Long doInBackground(Integer[]... params) {
       synchronized (lockedResource) {
           // This is a long-running operation, so the lock is held for a while.
           BubbleSort.sort(params[0]);
       }
       return null;
   }
}
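When a trace file is long, it can help to list which thread holds which monitor before reading the stacks by hand. The following is a rough sketch in plain Java that relies only on the two line shapes visible in the trace excerpt above (the quoted thread header and the "- locked <address>" line). TraceLockScan and lockHolders are hypothetical names, and the patterns are an assumption about the format rather than a complete parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TraceLockScan {
    // Thread header, e.g.: "AsyncTask #2" prio=5 tid=18 Runnable
    private static final Pattern HEADER = Pattern.compile("^\"(.+)\" .*tid=\\d+ \\w+$");
    // Held monitor, e.g.: - locked <0x083105ee> (a java.lang.Boolean)
    private static final Pattern LOCKED = Pattern.compile("^- locked <(0x[0-9a-f]+)>");

    // Maps each monitor address to the name of the thread whose stack shows it as locked.
    static Map<String, String> lockHolders(String trace) {
        Map<String, String> holders = new LinkedHashMap<>();
        String currentThread = null;
        for (String raw : trace.split("\n")) {
            String line = raw.trim();
            Matcher header = HEADER.matcher(line);
            if (header.find()) {
                currentThread = header.group(1);
                continue;
            }
            Matcher locked = LOCKED.matcher(line);
            if (locked.find() && currentThread != null) {
                holders.put(locked.group(1), currentThread);
            }
        }
        return holders;
    }

    public static void main(String[] args) {
        String trace = String.join("\n",
            "\"AsyncTask #2\" prio=5 tid=18 Runnable",
            "  | group=\"main\" sCount=0 dsCount=0 obj=0x12c333a0 self=0x94c87100",
            "  at com.android.developer.anrsample.BubbleSort.sort(BubbleSort.java:8)",
            "  - locked <0x083105ee> (a java.lang.Boolean)");
        System.out.println(lockHolders(trace)); // prints {0x083105ee=AsyncTask #2}
    }
}
```

Once you know the holder of a monitor, you can search the rest of the file for the same address to find the thread that is waiting on it.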

Another example is a main thread that waits for a result from a worker thread, as shown in the following code:

public void onClick(View v) {
   WaitTask waitTask = new WaitTask();
   synchronized (waitTask) {
       try {
           waitTask.execute(data);
           // Wait for a notification from the worker thread.
           waitTask.wait();
       } catch (InterruptedException e) {
           // Restore the interrupt flag instead of swallowing it.
           Thread.currentThread().interrupt();
       }
   }
}

class WaitTask extends AsyncTask<Integer[], Integer, Long> {
   @Override
   protected Long doInBackground(Integer[]... params) {
       synchronized (this) {
           BubbleSort.sort(params[0]);
           // Finished; notify the main thread.
           notify();
       }
       return null;
   }
}
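One way to avoid parking the calling thread in wait() is to let the worker deliver its result through a callback. The following is a minimal sketch in plain Java, not the article's method: CallbackSketch and sortThenNotify are hypothetical names, and uiExecutor is a stand-in for whatever posts work to the main thread in your app (on Android that role is typically played by a Handler bound to the main Looper).

```java
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

public class CallbackSketch {
    // Sorts on a pool thread and hands the result to onDone via uiExecutor.
    // The calling thread returns at once instead of blocking in wait().
    static CompletableFuture<Void> sortThenNotify(
            int[] data, Executor uiExecutor, Consumer<int[]> onDone) {
        return CompletableFuture
            .supplyAsync(() -> {
                int[] copy = data.clone();
                Arrays.sort(copy); // stand-in for BubbleSort.sort
                return copy;
            })
            .thenAcceptAsync(onDone, uiExecutor);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the main thread's event queue.
        ExecutorService ui = Executors.newSingleThreadExecutor();
        sortThenNotify(new int[] {3, 1, 2}, ui,
                sorted -> System.out.println(Arrays.toString(sorted)))
            .join(); // join() is used only so the demo waits before shutting down
        ui.shutdown();
    }
}
```

The design choice is that the thread that starts the work never waits for it; it only receives the finished result when the worker is done.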

Other situations can block the main thread too, including threads that use Lock or Semaphore, resource pools (such as a pool of database connections), or other mutual exclusion (mutex) mechanisms. You should evaluate the locks your app holds on resources in general, but if you want to avoid ANRs, look first at the locks held on resources that the main thread needs. Make sure those locks are held for the shortest possible time, or better still, check whether the app needs the lock at all. If you are using the lock to decide when to update the UI based on the progress of a worker thread, use mechanisms such as onProgressUpdate() and onPostExecute() to communicate between the worker thread and the main thread.
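The advice to hold locks briefly can be sketched as follows. This is an illustrative example in plain Java under stated assumptions, not code from the original article: ShortLockSketch, sortShared and tryRead are hypothetical names, and Arrays.sort again stands in for BubbleSort.sort. The worker copies the data under the lock and sorts the copy without it, and the reader gives up after a timeout instead of waiting indefinitely.

```java
import java.util.Arrays;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class ShortLockSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private int[] shared = {3, 1, 2};

    // Worker side: hold the lock only to copy in and publish out; sort without it.
    void sortShared() {
        int[] snapshot;
        lock.lock();
        try {
            snapshot = shared.clone();
        } finally {
            lock.unlock();
        }
        Arrays.sort(snapshot); // long-running step, performed without the lock
        lock.lock();
        try {
            shared = snapshot;
        } finally {
            lock.unlock();
        }
    }

    // Reader side: give up after timeoutMs rather than waiting indefinitely.
    // Returns null when the lock could not be acquired in time.
    int[] tryRead(long timeoutMs) throws InterruptedException {
        if (!lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            return null;
        }
        try {
            return shared.clone();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ShortLockSketch sketch = new ShortLockSketch();
        sketch.sortShared();
        System.out.println(Arrays.toString(sketch.tryRead(50))); // prints [1, 2, 3]
    }
}
```

Note the trade-off: a write that lands between the snapshot and the publish step would be overwritten, so this pattern only fits data with a single writer.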

(4) Deadlock

A deadlock occurs when a thread enters a waiting state because a resource it needs is held by another thread, while that other thread is in turn waiting for a resource held by the first. If the main thread is caught in this situation, an ANR is likely. Deadlocks are a well-studied phenomenon in computer science, and deadlock prevention algorithms can help you avoid them. For more details, see Deadlock and Deadlock prevention algorithms on Wikipedia.
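One widely used prevention technique is to acquire locks in a fixed global order. The following is a minimal sketch in plain Java, assuming every resource carries a distinct id that defines that order; LockOrderSketch, Resource and move are hypothetical names introduced only for illustration.

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderSketch {
    static final class Resource {
        final int id; // distinct rank used to order lock acquisition
        int units;
        final ReentrantLock lock = new ReentrantLock();

        Resource(int id, int units) {
            this.id = id;
            this.units = units;
        }
    }

    // Always locks the resource with the smaller id first, so two opposite
    // moves can never each hold one lock while waiting for the other.
    static void move(Resource from, Resource to, int amount) {
        Resource first = from.id < to.id ? from : to;
        Resource second = (first == from) ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.units -= amount;
                to.units += amount;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Resource a = new Resource(1, 100);
        Resource b = new Resource(2, 100);
        Thread t1 = new Thread(() -> { for (int i = 0; i < 10_000; i++) move(a, b, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 10_000; i++) move(b, a, 1); });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(a.units + " " + b.units); // prints 100 100
    }
}
```

If each thread instead locked its own "from" resource first, the two threads above could each take one lock and then wait forever for the other.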

(5) Slow broadcast receivers

Apps can use broadcast receivers to respond to broadcast messages, such as airplane mode being enabled or disabled, or a change in network connectivity. An ANR occurs when the app takes too long to process a broadcast message.

An ANR occurs in the following cases:

  1. A broadcast receiver has not finished executing its onReceive() method after a considerable amount of time
  2. A broadcast receiver calls goAsync() and then fails to call finish() on the PendingResult object

Keep the work in a BroadcastReceiver's onReceive() method short. If your app needs more complex processing in response to a broadcast, hand the task off to an IntentService. You can use a tool such as Traceview to check whether your receiver runs long operations on the main thread. For example, the timeline in the figure below shows a broadcast receiver processing a message on the main thread for approximately 100 seconds:

Figure 6. Traceview timeline showing the BroadcastReceiver work on the main thread

This behavior can be caused by a long-running operation in the BroadcastReceiver's onReceive() method, as in the following sample:

@Override
public void onReceive(Context context, Intent intent) {
    // This is a long-running operation.
    BubbleSort.sort(data);
}

In cases like this, it is recommended to move the long-running operation to an IntentService, because it uses a worker thread to do its work. The following code shows how to use an IntentService to handle a long-running operation:

@Override
public void onReceive(Context context, Intent intent) {
    // The task now runs on a worker thread.
    Intent intentService = new Intent(context, MyIntentService.class);
    context.startService(intentService);
}

public class MyIntentService extends IntentService {
   public MyIntentService() {
       // IntentService requires a name for its worker thread.
       super("MyIntentService");
   }

   @Override
   protected void onHandleIntent(@Nullable Intent intent) {
       BubbleSort.sort(data);
   }
}

With the IntentService in place, the long-running operation runs on a worker thread instead of the main thread, as shown in the following figure:

Figure 7. Traceview timeline showing the broadcast message processed on a worker thread

Your broadcast receiver can call goAsync() to signal to the system that it needs more time to process the message. However, you must then call finish() on the PendingResult object. The following example shows how to call finish() so that the system can recycle the broadcast and avoid an ANR:

final PendingResult pendingResult = goAsync();
new AsyncTask<Integer[], Integer, Long>() {
   @Override
   protected Long doInBackground(Integer[]... params) {
       // This is a long-running operation.
       BubbleSort.sort(params[0]);
       // Tell the system that the broadcast has been handled.
       pendingResult.finish();
       return null;
   }
}.execute(data);

However, if the broadcast is in the background, moving the code from a slow broadcast receiver to another thread and using goAsync() will not fix the ANR. The ANR timeout still applies.

For more information about ANRs, see Keeping your app responsive. For more information about threads, see Threading performance.

Origin blog.csdn.net/Xia_Leon/article/details/82936351