Android ANR Problems (1): Basic Analysis Methods

This article briefly summarizes the general approach to ANR analysis from a system perspective.

1. Introduction to ANR

1.1 Definition of ANR

ANR (Application Not Responding) occurs when an application's main thread fails to finish a specific task (such as handling an input event, running a service or broadcast callback, or publishing a content provider) within its timeout period.

1.2 ANR types

1) KeyDispatchTimeout: the most common type. A key or touch input event is not processed within 5 s.
Log keyword: Reason: Input dispatching timed out xxxx

2) ServiceTimeout: a service's bind, create, start, unbind, etc. callbacks run on the main thread and do not finish within 20 s for a foreground service or 200 s for a background service.
Log keywords: Timeout executing service / executing service XXX

3) BroadcastTimeout: a BroadcastReceiver's onReceive() does not finish within 10 s for a foreground broadcast or 60 s for a background broadcast.
Log keywords: Timeout of broadcast XXX / Receiver during timeout: XXX / Broadcast of XXX

4) ProcessContentProviderPublishTimedOutLocked: a ContentProvider is not published within 10 s.
Log keyword: timeout publishing content providers
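The four types can be kept as a small lookup table. The sketch below is a hypothetical Python helper (the names `AnrType`, `ANR_TYPES`, and `classify_anr` are made up for illustration) that matches a reason string from the log against the keywords listed above. The timeouts are the values given in this section and may differ between Android versions and vendor builds.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AnrType:
    name: str
    keywords: Tuple[str, ...]                  # log keywords listed above
    timeout_s: int                             # foreground (or only) timeout, seconds
    background_timeout_s: Optional[int] = None  # background timeout, if any


# Values taken from section 1.2; check them against the Android version analyzed.
ANR_TYPES = (
    AnrType("KeyDispatchTimeout", ("Input dispatching timed out",), 5),
    AnrType("ServiceTimeout",
            ("Timeout executing service", "executing service"), 20, 200),
    AnrType("BroadcastTimeout",
            ("Timeout of broadcast", "Receiver during timeout", "Broadcast of"),
            10, 60),
    AnrType("ProcessContentProviderPublishTimedOutLocked",
            ("timeout publishing content providers",), 10),
)


def classify_anr(reason: str) -> Optional[AnrType]:
    """Return the first ANR type whose keyword appears in the reason text."""
    for anr_type in ANR_TYPES:
        if any(keyword in reason for keyword in anr_type.keywords):
            return anr_type
    return None
```

Matching is plain substring search in table order, so a reason that happened to contain keywords of two types would be reported as the first one.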

1.3 Common causes of ANR

  • Time-consuming operations on the main thread, such as complex layouts, huge for loops, or I/O
  • The main thread is synchronously blocked by a worker thread (for example, waiting for a lock the worker holds)
  • The main thread is blocked in a Binder call waiting for the remote peer
  • The Binder thread pool is exhausted, so the main thread cannot communicate with system_server
  • The process cannot obtain enough system resources (CPU, memory, I/O)

2. ANR analysis

The analysis described in this article is based on a bugreport.

First, capture a bugreport:

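adb bugreport bugreport.zip

On Android 7.0 and later this saves a zipped bugreport. Devices that do not support zipped bugreports print the text report to stdout instead, so on those use: adb bugreport > bugreport.txt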

ANR analysis roughly breaks down into the following steps:

1. Determine when the ANR occurred. Keywords: am_anr, ANR in
2. Look at the traces dumped when the ANR occurred (file: /data/anr/traces.txt) and the additional system information (keyword on MIUI builds: MIUI-BLOCK-MONITOR)
3. Check the system's slow-operation keywords: binder_sample, dvm_lock_sample, am_lifecycle_sample, binder thread
4. Combine the above information with the source code to analyze the cause
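The keyword search in steps 1 to 3 can be scripted as a first pass over the bugreport text. This is a minimal sketch, not a tool from the original article: `scan_bugreport` and the file name `scan_anr.py` are hypothetical, and "binder thread pool" is an assumed prefix for the binder starvation message. For a zipped bugreport, extract the main text file from the archive first.

```python
import sys
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Keywords from the analysis steps above. MIUI-BLOCK-MONITOR appears only on
# MIUI builds; "binder thread pool" is an assumed prefix of the starvation message.
KEYWORDS = ("am_anr", "ANR in", "MIUI-BLOCK-MONITOR", "binder_sample",
            "dvm_lock_sample", "am_lifecycle_sample", "binder thread pool")


def scan_bugreport(lines: Iterable[str]) -> Dict[str, List[Tuple[int, str]]]:
    """Group (line_number, line) pairs by the keyword each line contains."""
    hits = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        for keyword in KEYWORDS:
            if keyword in line:
                hits[keyword].append((number, line.rstrip("\n")))
    return dict(hits)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_anr.py bugreport.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as report:
        for keyword, matches in scan_bugreport(report).items():
            print(f"{keyword}: {len(matches)} line(s), first at line {matches[0][0]}")
```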

2.1 am_anr

12-17 06:02:14.463 1566 1583 I am_anr : [0,8769,com.android.updater,952680005,Broadcast of Intent

This am_anr line records when the ANR occurred. Its fields are: user ID 0, process pid 8769, process name com.android.updater, flags 952680005, and the reason. Here the reason is Broadcast of Intent { act=android.intent.action.BOOT_COMPLETED flg=0x9000010 cmp=com.android.updater/.BootCompletedReceiver (has extras) }, which identifies a BroadcastTimeout.
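When many ANRs are present, it helps to split am_anr lines into fields. The sketch below assumes the logcat layout shown above (`<date> <time> <pid> <tid> I am_anr : [user,pid,process,flags,reason]`); `AmAnr` and `parse_am_anr` are hypothetical names. Because the reason text can itself contain commas, only the first four commas are treated as separators, and a truncated line without the closing bracket is still accepted.

```python
import re
from typing import NamedTuple, Optional


class AmAnr(NamedTuple):
    time: str
    user: int
    pid: int
    process: str
    flags: int
    reason: str


# Assumed layout: "<date> <time> <pid> <tid> I am_anr : [user,pid,process,flags,reason]"
AM_ANR_RE = re.compile(
    r"^(?P<time>\d\d-\d\d \d\d:\d\d:\d\d\.\d+)\s+\d+\s+\d+\s+I\s+am_anr\s*:\s*\[(?P<body>.*)$")


def parse_am_anr(line: str) -> Optional[AmAnr]:
    match = AM_ANR_RE.match(line.strip())
    if not match:
        return None
    body = match.group("body")
    if body.endswith("]"):
        body = body[:-1]  # drop the closing bracket when the line is complete
    # The reason may contain commas, so split off only the first four fields.
    parts = body.split(",", 4)
    if len(parts) != 5:
        return None
    user, pid, process, flags, reason = parts
    return AmAnr(match.group("time"), int(user), int(pid), process, int(flags), reason)
```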

2.2 ANR in

Use am_anr and ANR in to determine when the ANR occurred, which process it hit, and the reason. The ANR in block also reports CPU usage, so watch for processes with abnormally high CPU usage and for high iowait.
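Since the screenshot of the ANR in block is not available here, the sketch below assumes the usual ActivityManager layout: lines such as `ANR in <process>`, `PID: <pid>`, `Reason: <text>`, and a CPU summary line like `45% TOTAL: 20% user + 15% kernel + 8.5% iowait + ...`. Treat these patterns as assumptions to check against your own logs; `parse_anr_in` is a hypothetical helper that pulls out the process, reason, total CPU, and iowait so abnormal values stand out.

```python
import re
from typing import Dict, Iterable

# Assumed message layout after the "ActivityManager: " logcat tag.
FIELD_PATTERNS = {
    "process": re.compile(r"ActivityManager: ANR in (\S+)"),
    "pid": re.compile(r"ActivityManager: PID: (\d+)"),
    "reason": re.compile(r"ActivityManager: Reason: (.*)"),
    "total_cpu": re.compile(r"ActivityManager:\s+([\d.]+)% TOTAL:"),
    "iowait": re.compile(r"TOTAL:.*?([\d.]+)% iowait"),
}


def parse_anr_in(lines: Iterable[str]) -> Dict[str, object]:
    """Collect the first match of each field from an "ANR in" block."""
    result: Dict[str, object] = {}
    for line in lines:
        for field, pattern in FIELD_PATTERNS.items():
            if field not in result:
                match = pattern.search(line)
                if match:
                    result[field] = match.group(1)
    # Convert numeric fields; fields that never matched stay absent.
    if "pid" in result:
        result["pid"] = int(result["pid"])
    for key in ("total_cpu", "iowait"):
        if key in result:
            result[key] = float(result[key])
    return result
```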

2.3 trace

When an ANR occurs, the thread stacks of the relevant application and system processes are written to /data/anr/traces.txt. We usually focus on the main thread of the application process that hit the ANR: its stack shows what the main thread was doing at that moment and why it was stuck, for example waiting for a lock, blocked in a Binder call, or running a time-consuming operation.

2.4 Several system keywords

binder_sample: Monitors the duration of Binder transactions made by the main thread of each process and, when one exceeds a threshold (for example, 500 ms), logs information about the target call.

Description:

1. The main thread ID is 2754.
2. It called the android.app.IActivityManager interface.
3. The method code is 35 (STOP_SERVICE_TRANSACTION).
4. The call took 2900 ms.
5. The blocking package is android.process.media.
The last parameter is the sampling percentage, which is of little value.
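Because the original screenshot is missing, the sketch below uses an illustrative line assembled from the five fields described above, assuming the layout `[interface,code,time_ms,blocking_package,sample_percent]` after the usual `<pid> <tid>` logcat columns. `BinderSample`, `parse_binder_sample`, and `slow_main_thread_calls` are hypothetical names. A thread whose tid equals its pid is the process's main thread, which is how the filter picks out main-thread calls.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional


class BinderSample(NamedTuple):
    pid: int
    tid: int
    interface: str
    code: int
    time_ms: int
    blocking_package: str
    sample_percent: int


# Assumed layout:
# "<date> <time> <pid> <tid> I binder_sample: [interface,code,time,package,percent]"
BINDER_SAMPLE_RE = re.compile(
    r"\s(?P<pid>\d+)\s+(?P<tid>\d+)\s+I\s+binder_sample\s*:\s*"
    r"\[(?P<iface>[^,\]]+),(?P<code>\d+),(?P<time>\d+),(?P<pkg>[^,\]]*),(?P<pct>\d+)\]")


def parse_binder_sample(line: str) -> Optional[BinderSample]:
    m = BINDER_SAMPLE_RE.search(line)
    if not m:
        return None
    return BinderSample(int(m["pid"]), int(m["tid"]), m["iface"], int(m["code"]),
                        int(m["time"]), m["pkg"], int(m["pct"]))


def slow_main_thread_calls(lines: Iterable[str], threshold_ms: int = 500) -> Iterator[BinderSample]:
    """Yield main-thread samples (pid == tid) that took at least threshold_ms."""
    for line in lines:
        sample = parse_binder_sample(line)
        if sample and sample.pid == sample.tid and sample.time_ms >= threshold_ms:
            yield sample
```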

dvm_lock_sample: When a thread has been blocked waiting for a lock for longer than a threshold (for example, 500 ms), the current lock-holding information is logged.

Description: in system_server, thread Binder_9 reached line 6403 of ActivityManagerService.java and kept waiting for the AMS lock. The "-" means the lock holder is in the same file.

That is, the lock was held by the code at line 1448 of ActivityManagerService.java, which blocked Binder_9 for 1500 ms.
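The sketch below decodes such a line, including the "-" shorthand for "same file". It assumes the older nine-field layout `[process,main,thread,time_ms,file,line,owner_file,owner_line,sample_percent]`; newer Android versions add method names to this event, so lines with a different field count are skipped rather than guessed at. The example line in the test is assembled from the description above, and `LockSample` and `parse_dvm_lock_sample` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class LockSample(NamedTuple):
    process: str
    thread: str
    wait_ms: int
    file: str
    line: int
    owner_file: str
    owner_line: int

    def describe(self) -> str:
        return (f"{self.process}/{self.thread} waited {self.wait_ms} ms at "
                f"{self.file}:{self.line}; lock held at {self.owner_file}:{self.owner_line}")


DVM_RE = re.compile(r"dvm_lock_sample\s*:\s*\[(?P<body>[^\]]*)\]")


def parse_dvm_lock_sample(text: str) -> Optional[LockSample]:
    m = DVM_RE.search(text)
    if not m:
        return None
    fields = m["body"].split(",")
    if len(fields) != 9:
        return None  # other layouts (e.g. with method names) are not handled here
    process, _main, thread, wait_ms, file, line, owner_file, owner_line, _pct = fields
    if owner_file == "-":
        owner_file = file  # "-" means the owner is in the same file as the waiter
    return LockSample(process, thread, int(wait_ms), file, int(line),
                      owner_file, int(owner_line))
```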

am_lifecycle_sample: When an app lifecycle callback running on the main thread takes longer than a threshold (for example, 3000 ms), the corresponding information is logged.

Description: pid=8203, processName=com.android.systemui, MessageCode=114 (CREATE_SERVICE), took 3.827 s.

Note: MessageCode=200 denotes a slow onReceive() of a parallel broadcast; for the other codes, see the ActivityThread.H class.
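Decoding the message code is the first step in reading these samples. The table below is a partial sketch: the 11x and 12x entries are constants from AOSP's ActivityThread.H (verify them against the source of the Android version you are analyzing), while 200 comes from the note above and is presumably vendor-specific. `describe_lifecycle_sample` is a hypothetical helper, and the 3000 ms default mirrors the example threshold in the text.

```python
# Partial map of ActivityThread.H message codes (AOSP); 200 follows the note above.
MESSAGE_CODES = {
    110: "BIND_APPLICATION",
    113: "RECEIVER",
    114: "CREATE_SERVICE",
    115: "SERVICE_ARGS",
    116: "STOP_SERVICE",
    121: "BIND_SERVICE",
    122: "UNBIND_SERVICE",
    200: "parallel broadcast onReceive",
}


def describe_lifecycle_sample(pid: int, process: str, code: int,
                              duration_ms: int, threshold_ms: int = 3000) -> str:
    """Format one lifecycle sample and flag it if it reaches the threshold."""
    name = MESSAGE_CODES.get(code, f"unknown({code})")
    flag = "SLOW" if duration_ms >= threshold_ms else "ok"
    return f"[{flag}] pid={pid} {process} {name} took {duration_ms / 1000:.3f}s"
```

For the sample above, `describe_lifecycle_sample(8203, "com.android.systemui", 114, 3827)` returns `[SLOW] pid=8203 com.android.systemui CREATE_SERVICE took 3.827s`.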

binder thread: When the Binder thread pool of a process such as system_server is used up and no thread is idle, Binder communication is starved; this information is logged when the starvation lasts longer than a threshold.

![Binder thread pool starvation log from system_server](https://img-blog.csdnimg.cn/cb115676a99d42218fb05e0f991cfd22.png)

Description: the Binder thread pool of the system_server process was exhausted (starved) for 100 ms.
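To find the worst starvation period for a process, the sketch below assumes the message text `binder thread pool (N threads) starved for X ms` (the wording used by libbinder's IPCThreadState in AOSP, as far as this sketch assumes) and the logcat threadtime layout, where the first numeric column after the timestamp is the pid. `Starvation`, `parse_starvation`, and `longest_starvation` are hypothetical names.

```python
import re
from typing import Iterable, NamedTuple, Optional


class Starvation(NamedTuple):
    pid: int
    max_threads: int
    starved_ms: int


# Assumed layout: "<date> <time> <pid> <tid> <level> <tag>: binder thread pool (N threads) starved for X ms"
STARVED_RE = re.compile(
    r"\s(?P<pid>\d+)\s+\d+\s+\w\s+.*binder thread pool "
    r"\((?P<threads>\d+) threads\) starved for (?P<ms>\d+) ms")


def parse_starvation(line: str) -> Optional[Starvation]:
    m = STARVED_RE.search(line)
    if not m:
        return None
    return Starvation(int(m["pid"]), int(m["threads"]), int(m["ms"]))


def longest_starvation(lines: Iterable[str], pid: int) -> int:
    """Longest starvation (ms) logged by process `pid`, or 0 if none was logged."""
    durations = [s.starved_ms for s in map(parse_starvation, lines)
                 if s is not None and s.pid == pid]
    return max(durations, default=0)
```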

For slow Binder calls like the ones above, look for the related lock information in the log:

The lock required at line 3537 of PackageManagerService.java is held by the code at line 3380 of UserManagerService.java. Next, read line 3380 of UserManagerService.java in the source to find out why the lock is held for so long.
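Following "waiting at A, held at B" links by hand gets tedious when the holder is itself waiting on another lock. The sketch below is a heuristic, not a feature of any Android tool: it assumes you have already built a map from waiting code site to holding code site (for example from dvm_lock_sample records), follows it from a starting site, and reports a cycle, which would suggest a possible deadlock. `follow_lock_chain` and the `Site` alias are hypothetical.

```python
from typing import Dict, List, Tuple

Site = Tuple[str, int]  # (source file, line number)


def follow_lock_chain(waits: Dict[Site, Site], start: Site) -> Tuple[List[Site], bool]:
    """Follow waiter -> holder links from `start`.

    Returns the chain of code sites visited and whether a cycle was found.
    """
    chain = [start]
    seen = {start}
    current = start
    while current in waits:
        current = waits[current]
        chain.append(current)
        if current in seen:
            return chain, True  # revisited a site: possible deadlock
        seen.add(current)
    return chain, False
```

With the example above, `{("PackageManagerService.java", 3537): ("UserManagerService.java", 3380)}` yields a two-site chain ending at UserManagerService.java line 3380, which is where the source reading should start.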

Note: A slow Binder call may be caused by Binder being busy during communication, or by the remote end holding a lock or performing a time-consuming operation. Binder call information is printed only after the call to the remote end has finished, so its appearance does not by itself mean that the framework has a problem; pinpoint the cause by analyzing the logs.

To summarize, the basic analysis process is:

  1. From the logs, identify the process that hit the ANR, when it happened, and roughly what operation was in progress, and check the CPU, memory, and I/O conditions at that time.
  2. Analyze the trace. First check that its timestamp matches the ANR, so that it really captures the scene; then check whether the main thread is running a time-consuming operation, is deadlocked, or is waiting for a lock held elsewhere. This usually tells you whether it is an app problem or a system problem.
  3. If the system is at fault, use binder_sample to locate slow Binder calls and dvm_lock_sample to locate long system lock holds.
  4. Analyze the problem points in detail against the app code or the platform source.
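Step 2 of this process can be turned into a rough first-pass classifier over the main thread's state and stack. The rules below are assumptions distilled from the causes listed in section 1.3, not an exhaustive method; `triage_main_thread` is a hypothetical helper, and the frame names it looks for (`BinderProxy.transact`, `MessageQueue.nativePollOnce`) are the framework methods usually seen in such stacks.

```python
from typing import List


def triage_main_thread(state: str, stack: List[str]) -> str:
    """Heuristic first guess at why the main thread was stuck."""
    text = "\n".join(stack)
    if state == "Blocked" or "waiting to lock" in text:
        return "lock wait: find the holder thread and what it is doing"
    if "BinderProxy.transact" in text:
        return "binder call: check binder_sample and the peer process"
    if "MessageQueue.nativePollOnce" in text:
        # An idle main thread suggests the trace missed the moment of the ANR.
        return "idle in the message loop: the trace may not capture the ANR scene"
    if state in ("Runnable", "Native"):
        return "busy on the main thread: look for slow app code (layout, loops, I/O)"
    return "unclear: combine with CPU, memory, and I/O data"
```

Checks run from most to least specific, so a Native-state thread sitting in a Binder call or in the idle message loop is not mislabeled as busy.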

Reprinted from: https://www.jianshu.com/p/082045769443

Origin: blog.csdn.net/gqg_guan/article/details/130526234