Android crash problem study notes

5b05f3cf74e42cf9b04a8d46655ac572.gif

Learn with you for lifeXi, this is Programmer Android

Recommended classic articles. By reading this article, you will gain the following knowledge points:

1. Simplified diagram of the crashed system
2. Possible reasons for the crash
3. What data needs to be analyzed for the crash problem 4.
Java Backtrace analysis
5. Common Java backtrace examples
6. Native Backtrace
7. Kernel Backtrace
8. Several typical abnormal situations

1. Simplified diagram of crash system

When the user operates the mobile phone, the corresponding data flow will be the following general flow chart.

7e30668cc43e52ff6b1edcbdd46611ee.jpeg

  • After HW, such as sensors, touch screens (TP), physical buttons (KP), etc., sense user operations, they trigger related interrupts (ISR) and pass them to the Kernel. The Kernel-related driver processes these ISRs and converts them into standard InputEvents.

  • The Input System in the System Server of the User Space continues to monitor the original InputEvent passed by the Kernel. After further processing, it becomes an Input Event that can be directly processed by the upper-layer APP, such as button click, long press, sliding, etc.

  • After processing the relevant events, the APP requests to update the relevant logical interface, and this is the responsibility of WMS in System Server.

  • After the relevant logical interface is updated (Z-Window), SurfaceFlinger will be requested to generate FrameBuffer data, and SurfaceFlinger will use the GPU to calculate and generate it.

  • Display System/Driver will update and display the data in the FrameBuffer, so that the user can perceive his operation behavior.

2. Possible causes of crash

In principle, in the above process, if there is a problem at every step, it may cause a crash problem. Generally speaking, it can be divided into two levels: hardware HW and software SW. Hardware HW is not within our scope.

1. On software SW, the reasons for crash can be divided into two types:

Logic misbehavior

  • Logical judgment error

  • Logical design errors

Logic block

  • Deadloop

  • Deadlock

2. Based on the specific reasons, they can be further divided into:

(1). Input Driver

  • Unable to receive HW's middle ISR, generate original InputEvent, or generate InputEvent exception.

(2). Input System

  • Unable to monitor the original InputEvent passed by Kernel, or conversion and transmission exceptions.

(3). System Logic

  • Unable to respond normally to the InputEvent passed by the Input System, or an error occurred in the response.

(4). WMS/Surfaceflinger behaves abnormally

  • WMS/SF cannot correctly perform overlay conversion on Z-Window

(5). Display System3

  • Unable to update Framebuffer data, or the filled data is wrong

(6). LCM Driver

  • Unable to display Framebuffer data on LCM

(7). Kernel supporting modules such as CPU Scheduler/File System/Memory Manager.

  • System support modules cannot function properly, causing the entire process to fail to proceed normally. For example, the CPU Scheduler cannot execute normally, causing threads to be unable to be scheduled. Memory exhaustion causes the system to be unable to apply for memory quickly. The file system is stuck and cannot be read or written. Operation etc.

3. Corresponding to hardware HW hang, common situations include:

  • Power exception

  • Clock/26M/32K invalid

  • CPU Core cannot start, execution abnormality, etc.

  • Memory & Memory Controller.

  • Fail IC.

3. What data needs to be analyzed for the crash problem?

1. Crash analysis data

As the saying goes, a good woman can't make a meal without straw, and crash analysis also requires first-hand information to analyze the problem. So what data can be used to analyze crashes.
Roughly speaking, it can be divided into spatial data and time data. Spatial data refers to the on-site environment at that time, such as which processes are running, the execution status of the CPU, the utilization of memory, and the memory data of specific processes, etc. Time data is continuous behavioral data, such as what operations a certain Process performs within a certain period of time, changes in CPU utilization during a certain period of time, etc. Usually time and space are integrated, and the same is often true when we capture information.

2. What data can be used for analysis.

Backtrace

Backtrace is divided into Java backtrace, Native Backtrace, and Kernel Backtrace. It is a very important means to analyze the crash. We can quickly know what actions the corresponding process/thread is performing at that time, where it is stuck, etc. It can analyze the crash scene very intuitively.

There is another type of trace called ftrace/systrace. Unless it is specially enabled and added at a specific process point, this type of trace is often not very detailed, but it also has a good reference effect.

System operating environment

Objectively reflects the execution environment of the system, usually including CPU utilization, Memory usage, Storage remaining status, etc. This information is also very important. For example, you can quickly know whether a Process was executing crazily at that time, whether it was in a serious low memory situation, whether Storage was exhausted, etc.

program execution environment

Objectively reflects the execution scene of a certain program (Kernel can also be regarded as a program) at that time. Such information usually includes coredump of process, java heap prof, memory dump of kernel, etc. With the complete execution environment, we can quickly know the specific variable values, register values, etc. at that time, and analyze the problem in detail.

Some other information

This information is relatively scattered, such as the usual LOG, the results of some debug commands, JTAG & CVD data, etc.

4. Backtrace analysis

1. Java Backtrace

From Java Backtrace, we can know the execution status of the virtual machine of Process at that time. Java Backtrace relies on SignalCatcher to capture.
Google default: SignalCatcher catches SIGQUIT(3), and then print the java backtrace to /data/anr/trace.txt
MTK Enhance : SignalCatcher catches SIGSTKFLT(16), and then print the java backtrace to /data/anr/mtktrace.txt (only Android ICS 4.0 <-> Android M 6.0 version)
You can update system properties dalvik.vm.stack-trace-file to Change the address, default is /data/anr/traces.txt

1.1 How to capture

  • In ENG Build

adb remount
adb shell chmod 0777 data/anr
adb shell kill -3 pid
adb pull /data/anr
  • In User Build

Without root permissions, you can only directly pull the existing backtrace.

adb pull /data/anr
  • You can try to directly use the following script to grab it all at once

adb remount
adb shell chmod 0777 data/anr
adb shell ps
@echo off
set processid=
set /p processid=Please Input process id:
@echo on
adb shell kill -3 %processid%
@echo off
ping -n 8 127.0.0.1>nul
@echo on
adb pull data/anr/traces.txt trace-%processid%.txt
pause

1.2 JavaBacktrace analysis

Android's relatively new version of java backtrace, in addition to direct thread backtrace, also prints out some basic status of ART, which is more convenient to observe the basic status of ART, such as:

----- pid 1051 at 2018-09-29 23:23:52 -----
Cmd line: system_server
Build fingerprint: 'XXXXX/A73/A73:8.1.0/O11019/1537977601:user/release-keys'
ABI: 'arm64'
Build type: optimized
Zygote loaded classes=5413 post zygote classes=6049
Intern table: 61702 strong; 12632 weak
JNI: CheckJNI is off; globals=10050 (plus 1015 weak)
Libraries: /system/lib64/libandroid.so /system/lib64/libandroid_servers.so /system/lib64/libcompiler_rt.so /system/lib64/libdcfdecoderjni.so /system/lib64/libjavacrypto.so /system/lib64/libjnigraphics.so /system/lib64/libmedia_jni.so /system/lib64/libmediatek_exceptionlog.so /system/lib64/libperfframeinfo_jni.so /system/lib64/librutils.so /system/lib64/libsoundpool.so /system/lib64/libwebviewchromium_loader.so /system/lib64/libwifi-service.so /vendor/lib64/libnativecheck-jni.so libjavacore.so libopenjdk.so (17)
Heap: 9% free, 80MB/89MB; 1893251 objects
Dumping cumulative Gc timings
Start Dumping histograms for 12601 iterations for concurrent copying
ProcessMarkStack: Sum: 3760.617s 99% C.I. 35.990ms-1218.784ms Avg: 298.438ms Max: 14617.568ms
ScanImmuneSpaces: Sum: 216.241s 99% C.I. 2.719ms-110.340ms Avg: 17.160ms Max: 1644.161ms
FlipOtherThreads: Sum: 136.969s 99% C.I. 0.517ms-174.411ms Avg: 10.869ms Max: 5585.050ms
VisitConcurrentRoots: Sum: 119.937s 99% C.I. 1.928ms-50.934ms Avg: 9.518ms Max: 1356.509ms
ClearFromSpace: Sum: 66.775s 99% C.I. 0.168ms-21.473ms Avg: 5.299ms Max: 118.068ms
SweepSystemWeaks: Sum: 58.215s 99% C.I. 0.196ms-19.387ms Avg: 4.619ms Max: 342.553ms
GrayAllDirtyImmuneObjects: Sum: 57.570s 99% C.I. 0.133ms-123.041ms Avg: 4.568ms Max: 2010.547ms
ForwardSoftReferences: Sum: 37.474s 99% C.I. 0.037ms-13.834ms Avg: 2.973ms Max: 118.213ms
EnqueueFinalizerReferences: Sum: 30.905s 99% C.I. 0.094ms-22.239ms Avg: 2.452ms Max: 248.969ms
MarkingPhase: Sum: 24.277s 99% C.I. 0.257ms-51.155ms Avg: 1.926ms Max: 3743.053ms
VisitNonThreadRoots: Sum: 23.507s 99% C.I. 0.065ms-15.260ms Avg: 1.865ms Max: 190.694ms
ProcessReferences: Sum: 17.221s 99% C.I. 9.019us-4689.544us Avg: 683.321us Max: 94215us
EmptyRBMarkBitStack: Sum: 15.343s 99% C.I. 0.026ms-13.119ms Avg: 1.217ms Max: 145.672ms
ThreadListFlip: Sum: 10.730s 99% C.I. 128.387us-25549.046us Avg: 851.524us Max: 1555533us
InitializePhase: Sum: 7.689s 99% C.I. 125us-9536.627us Avg: 610.256us Max: 434902us
FlipThreadRoots: Sum: 4.346s 99% C.I. 16.387us-13773.217us Avg: 344.930us Max: 241943us
RecordFree: Sum: 4.106s 99% C.I. 95us-2474.875us Avg: 325.895us Max: 31656us
SweepAllocSpace: Sum: 2.746s 99% C.I. 32.035us-6375.082us Avg: 217.946us Max: 397272us
SweepLargeObjects: Sum: 2.611s 99% C.I. 10us-1976.352us Avg: 207.248us Max: 36205us
ResumeOtherThreads: Sum: 2.453s 99% C.I. 8.115us-8079.599us Avg: 194.712us Max: 86018us
ReclaimPhase: Sum: 2.086s 99% C.I. 9us-4266.285us Avg: 165.567us Max: 155093us
ResumeRunnableThreads: Sum: 1.759s 99% C.I. 8.063us-2676.218us Avg: 139.607us Max: 87029us
MarkStackAsLive: Sum: 782.752ms 99% C.I. 5us-799.362us Avg: 62.118us Max: 41297us
MarkZygoteLargeObjects: Sum: 741.028ms 99% C.I. 12us-899.899us Avg: 58.807us Max: 24679us
(Paused)GrayAllNewlyDirtyImmuneObjects: Sum: 651.153ms 99% C.I. 12us-373.754us Avg: 51.674us Max: 14328us
ClearRegionSpaceCards: Sum: 412.842ms 99% C.I. 8us-199.633us Avg: 32.762us Max: 16297us
(Paused)SetFromSpace: Sum: 357.553ms 99% C.I. 3us-286.340us Avg: 28.374us Max: 14500us
SwapBitmaps: Sum: 151.902ms 99% C.I. 4us-99.729us Avg: 12.054us Max: 5447us
Sweep: Sum: 110.635ms 99% C.I. 2us-49.924us Avg: 8.779us Max: 1087us
(Paused)FlipCallback: Sum: 105.455ms 99% C.I. 2us-99.800us Avg: 8.368us Max: 8433us
(Paused)ClearCards: Sum: 89.377ms 99% C.I. 1000ns-199016ns Avg: 295ns Max: 17545000ns
UnBindBitmaps: Sum: 17.838ms 99% C.I. 0.250us-49.765us Avg: 1.415us Max: 1516us
Done Dumping histograms
concurrent copying paused: Sum: 10.999s 99% C.I. 320.432us-63766.026us Avg: 872.888us Max: 1214342us
concurrent copying total time: 4607.007s mean time: 365.606ms
concurrent copying freed: 4075114399 objects with total size 152GB
concurrent copying throughput: 884547/s / 33MB/s
Cumulative bytes moved 115136403208
Cumulative objects moved 2160415247
Total time spent in GC: 4607.007s
Mean GC size throughput: 32MB/s
Mean GC object throughput: 884504 objects/s
Total number of allocations 4076810306
Total bytes allocated 145GB
Total bytes freed 145GB
Free memory 8MB
Free memory until GC 8MB
Free memory until OOME 431MB
Total memory 89MB
Max memory 512MB
Zygote space size 652KB
Total mutator paused time: 10.999s
Total time waiting for GC to complete: 71.458s
Total GC count: 12601
Total GC time: 4607.007s
Total blocking GC count: 3305
Total blocking GC time: 1302.402s
Histogram of GC count per 10000 ms: 0:11034,1:7128,2:1807,3:417,4:87,5:21,6:6,7:3,8:2,9:2,11:2,12:2,17:1
Histogram of blocking GC count per 10000 ms: 0:17706,1:2336,2:443,3:26,4:1
Registered native bytes allocated: 57354982
/system/framework/oat/arm64/com.android.location.provider.odex: speed
/system/priv-app/FusedLocation/oat/arm64/FusedLocation.odex: speed
/system/priv-app/Telecom/oat/arm64/Telecom.odex: speed
/system/framework/oat/arm64/mediatek-framework-net.odex: quicken
/system/framework/oat/arm64/mediatek-framework-net.odex: quicken
/system/framework/oat/arm64/mediatek-framework-net.odex: quicken
/system/framework/oat/arm64/mediatek-framework-net.odex: quicken
/system/framework/oat/arm64/com.coloros.statistics.odex: quicken
/system/priv-app/SettingsProvider/oat/arm64/SettingsProvider.odex: speed
/system/framework/oat/arm64/services.odex: speed
/system/framework/oat/arm64/ethernet-service.odex: speed
/system/framework/oat/arm64/wifi-service.odex: speed
/system/framework/oat/arm64/com.android.location.provider.odex: speed
/system/framework/oat/arm64/mediatek-services.odex: quicken
/system/framework/oat/arm64/mediatek-framework-net.odex: quicken
/system/framework/oat/arm64/mediatek-framework-net.odex: quicken
/system/framework/oat/arm64/mediatek-framework-net.odex: quicken
/system/framework/oat/arm64/mediatek-framework-net.odex: quicken
/data/dalvik-cache/arm64/vendor@app@[email protected]@classes.dex: quicken
/system/framework/oat/arm64/mediatek-framework-net.odex: quicken
/data/dalvik-cache/arm64/vendor@app@[email protected]@classes.dex: quicken
Running non JIT

For crash situations, we pay more attention to the memory usage and GC situation of java heap:

Heap: 9% free, 80MB/89MB; 1893251 objects 
Free memory 8MB
Free memory until GC 8MB
Free memory until OOME 431MB
Total memory 89MB
Max memory 512MB

==》You can simply know whether there are object leaks in the Java layer and the situations that trigger GC.

After removing these, let's look at the specific information of java backtrace.

The following is the beginning of a short java backtrace of system server

----- pid 682 at 2014-07-30 18:04:53 -----
Cmd line: system_server
JNI: CheckJNI is off; workarounds are off; pins=4; globals=1484 (plus 50 weak)
DALVIK THREADS:
(mutexes: tll=0 tsl=0 tscl=0 ghl=0)  <== 只有老版本有.
"main" prio=5 tid=1 NATIVE
  | group="main" sCount=1 dsCount=0 obj=0x4193fde0 self=0x418538f8
  | sysTid=682 nice=-2 sched=0/0 cgrp=apps handle=1074835940
  | state=S schedstat=( 47858718206 26265263191 44902 ) utm=4074 stm=711 core=0
  at android.os.MessageQueue.nativePollOnce(Native Method)
  at android.os.MessageQueue.next(MessageQueue.java:138)
  at android.os.Looper.loop(Looper.java:150)
  at com.android.server.ServerThread.initAndLoop(SystemServer.java:1468)
  at com.android.server.SystemServer.main(SystemServer.java:1563)
  at java.lang.reflect.Method.invokeNative(Native Method)
  at java.lang.reflect.Method.invoke(Method.java:515)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:829)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:645)
  at dalvik.system.NativeStart.main(Native Method)

Let's analyze it line by line.

  • 0# The beginning is -----PID at Time and then Cmd line: process name

  • 1# the backtrace header: dvm thread: "DALVIK THREADS:"
    The new version includes thread quantities, such as: "DALVIK THREADS (214):" You can observe whether there are java thread leaks.

  • 2# Global DVM mutex value: if 0 unlock, else lock
    tll: threadListLock,
    tsl: threadSuspendLock,
    tscl: threadSuspendCountLock
    ghl: gcHeapLock

(mutexes: tll=0 tsl=0 tscl=0 ghl=0)

  • 3# thread name, java thread Priority, [daemon], DVM thread id, DVM thread status.
    "main" -> main thread -> activity thread
    prio -> java thread priority default is 5, (nice =0, linux thread priority 120), domain is [1,10]
    DVM thread id, NOT linux thread id
    DVM thread Status:
    ZOMBIE, RUNNABLE, TIMED_WAIT, MONITOR, WAIT, INITALIZING,STARTING, NATIVE, VMWAIT, SUSPENDED,UNKNOWN

"main" prio=5 tid=1 NATIVE

  • 4# DVM thread status
    group: default is “main”
    Compiler,JDWP,Signal Catcher,GC,FinalizerWatchdogDaemon,FinalizerDaemon,ReferenceQueueDaemon are system group
    sCount: thread suspend count
    dsCount: thread dbg suspend count
    obj: thread obj address
    Sef: thread point address

group="main" sCount=1 dsCount=0 obj=0x4193fde0 self=0x418538f8

  • 5 Linux thread status

sysTId: linux thread tid
Nice: linux thread nice value
sched: cgroup policy/gourp id
cgrp: c group
handle: handle address
sysTid=682 nice=-2 sched=0/0 cgrp=apps handle=1074835940

  • 6 CPU Sched stat

Schedstat (Run CPU Clock/ns, Wait CPU Clock/ns, Slice times)
utm: utime, user space time(jiffies)
stm: stime, kernel space time(jiffies)
Core now running in cpu.
state=S schedstat=( 47858718206 26265263191 44902 ) utm=4074 stm=711 core=0

5. Common Java backtrace examples

1.ActivityThread Normal State/ActivityThread Normal Case

Message Queue is empty, and thread wait for next message.
 "main" prio=5 tid=1 NATIVE
   | group="main" sCount=1 dsCount=0 obj=0x4193fde0 self=0x418538f8
   | sysTid=11559 nice=0 sched=0/0 cgrp=apps/bg_non_interactive handle=1074835940
   | state=S schedstat=( 2397315020 9177261498 7975 ) utm=100 stm=139 core=1
   at android.os.MessageQueue.nativePollOnce(Native Method)
   at android.os.MessageQueue.next(MessageQueue.java:138)
   at android.os.Looper.loop(Looper.java:150)
   at android.app.ActivityThread.main(ActivityThread.java:5299)
   at java.lang.reflect.Method.invokeNative(Native Method)
   at java.lang.reflect.Method.invoke(Method.java:515)
   at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:829)
   at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:645)
   at dalvik.system.NativeStart.main(Native Method)

2.Java Backtrace Monitor case

Synchronized Lock: 等待同步锁时的backtrace.
 "AnrMonitorThread" prio=5 tid=24 MONITOR
   | group="main" sCount=1 dsCount=0 obj=0x41fd80c8 self=0x551ac808
   | sysTid=711 nice=0 sched=0/0 cgrp=apps handle=1356369328
   | state=S schedstat=( 8265377638 4744771625 6892 ) utm=160 stm=666 core=0
   at com.android.server.am.ANRManager$AnrDumpMgr.dumpAnrDebugInfoLocked(SourceFile:~832)
   - waiting to lock <0x42838968> (a com.android.server.am.ANRManager$AnrDumpRecord) held by tid=20 (ActivityManager)
   at com.android.server.am.ANRManager$AnrDumpMgr.dumpAnrDebugInfo(SourceFile:824)
   at com.android.server.am.ANRManager$AnrMonitorHandler.handleMessage(SourceFile:220)
   at android.os.Handler.dispatchMessage(Handler.java:110)
   at android.os.Looper.loop(Looper.java:193)
   at android.os.HandlerThread.run(HandlerThread.java:61)

3. The execution of JNI code does not return and the status is native.

"WifiP2pService" prio=5 tid=37 NATIVE
   | group="main" sCount=1 dsCount=0 obj=0x427a9910 self=0x55f088d8
   | sysTid=734 nice=0 sched=0/0 cgrp=apps handle=1443230288
   | state=S schedstat=( 91121772 135245305 170 ) utm=7 stm=2 core=1
   #00  pc 00032700  /system/lib/libc.so (epoll_wait+12)
   #01  pc 000105e3  /system/lib/libutils.so (android::Looper::pollInner(int)+94)
   #02  pc 00010811  /system/lib/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+92)
   #03  pc 0006c96d  /system/lib/libandroid_runtime.so (android::NativeMessageQueue::pollOnce(_JNIEnv*, int)+22)
   #04  pc 0001eacc  /system/lib/libdvm.so (dvmPlatformInvoke+112)
   #05  pc 0004fed9  /system/lib/libdvm.so (dvmCallJNIMethod(unsigned int const*, JValue*, Method const*, Thread*)+484)
   #06  pc 00027ea8  /system/lib/libdvm.so
   #07  pc 0002f4b0  /system/lib/libdvm.so (dvmMterpStd(Thread*)+76)
   #08  pc 0002c994  /system/lib/libdvm.so (dvmInterpret(Thread*, Method const*, JValue*)+188)
   #09  pc 000632a5  /system/lib/libdvm.so (dvmCallMethodV(Thread*, Method const*, Object*, bool, JValue*, std::__va_list)+340)
   #10  pc 000632c9  /system/lib/libdvm.so (dvmCallMethod(Thread*, Method const*, Object*, JValue*, ...)+20)
   #11  pc 00057961  /system/lib/libdvm.so
   #12  pc 0000dd40  /system/lib/libc.so (__thread_entry+72)
   at android.os.MessageQueue.nativePollOnce(Native Method)
   at android.os.MessageQueue.next(MessageQueue.java:138)
   at android.os.Looper.loop(Looper.java:150)
   at android.os.HandlerThread.run(HandlerThread.java:61)

4. Execute object.wait waiting state

"AsyncTask #1" prio=5 tid=33 WAIT
   | group="main" sCount=1 dsCount=0 obj=0x427a8480 self=0x56036b40
   | sysTid=733 nice=10 sched=0/0 cgrp=apps/bg_non_interactive handle=1443076000
   | state=S schedstat=( 1941480839 10140523154 4229 ) utm=119 stm=75 core=0
   at java.lang.Object.wait(Native Method)
   - waiting on <0x427a8618> (a java.lang.VMThread) held by tid=33 (AsyncTask #1)
   at java.lang.Thread.parkFor(Thread.java:1212)
   at sun.misc.Unsafe.park(Unsafe.java:325)
   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:157)
   at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2017)
   at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:410)
   at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1035)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1097)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587)
   at java.lang.Thread.run(Thread.java:848)

5. Suspend status, usually indicates that when the backtrace is captured, the java code is still executing and is forced to suspend.

"FileObserver" prio=5 tid=23 SUSPENDED
   | group="main" sCount=1 dsCount=0 obj=0x41fd1dc8 self=0x551abda0
   | sysTid=710 nice=0 sched=0/0 cgrp=apps handle=1427817920
   | state=S schedstat=( 130152222 399783851 383 ) utm=9 stm=4 core=0
   #00  pc 000329f8  /system/lib/libc.so (__futex_syscall3+8)
   #01  pc 000108cc  /system/lib/libc.so (__pthread_cond_timedwait_relative+48)
   #02  pc 0001092c  /system/lib/libc.so (__pthread_cond_timedwait+64)
   #03  pc 00055a93  /system/lib/libdvm.so
   #04  pc 0005614d  /system/lib/libdvm.so (dvmChangeStatus(Thread*, ThreadStatus)+34)
   #05  pc 0004ae7f  /system/lib/libdvm.so
   #06  pc 0004e353  /system/lib/libdvm.so
   #07  pc 000518d5  /system/lib/libandroid_runtime.so
   #08  pc 0008af9f  /system/lib/libandroid_runtime.so
   #09  pc 0001eacc  /system/lib/libdvm.so (dvmPlatformInvoke+112)
   #10  pc 0004fed9  /system/lib/libdvm.so (dvmCallJNIMethod(unsigned int const*, JValue*, Method const*, Thread*)+484)
   #11  pc 00027ea8  /system/lib/libdvm.so
   #12  pc 0002f4b0  /system/lib/libdvm.so (dvmMterpStd(Thread*)+76)
   #13  pc 0002c994  /system/lib/libdvm.so (dvmInterpret(Thread*, Method const*, JValue*)+188)
   #14  pc 000632a5  /system/lib/libdvm.so (dvmCallMethodV(Thread*, Method const*, Object*, bool, JValue*, std::__va_list)+340)
   #15  pc 000632c9  /system/lib/libdvm.so (dvmCallMethod(Thread*, Method const*, Object*, JValue*, ...)+20)
   #16  pc 00057961  /system/lib/libdvm.so
   #17  pc 0000dd40  /system/lib/libc.so (__thread_entry+72)
   at android.os.FileObserver$ObserverThread.observe(Native Method)
   at android.os.FileObserver$ObserverThread.run(FileObserver.java:88)

6. Native Backtrace

1. Native Backtrace capture method

1. Use debuggerd to capture

MTK has produced a tool RTT (Runtime Trace) that uses debuggerd to capture Native backtrace. The corresponding execution command is:

rtt built timestamp (Apr 18 2014 15:36:21)
USAGE : rtt [-h] -f function -p pid [-t tid]
  -f funcion : current support functions:
                 bt  (Backtrace function)
  -p pid     : pid to trace
  -t tid     : tid to trace
  -n name    : process name to trace
  -h         : help menu

This rtt is currently only available in eng/userdebug; in addition, you can use Google's default debuggerd to crawl:

adb shell debuggerd -b pid

Note that these are dependent on the normal process of debuggerd and the normal behavior of debuggerd_signal_handler corresponding to the crawling process, otherwise it will not be captured.

2. Add code to capture directly

Google provides CallStack API by default, please refer to

system/core/include/libutils/CallStack.h 
system/core/libutils/CallStack.cpp

Quickly print backtrace of a single thread.

2.Analysis Native Backtrace

You can use tools such as GDB or addr2line to parse the captured Native Backtrace to know the native code being executed at that time.
For example, addr2line executes

arm-linux-androideabi-addr2line  -f -C -e symbols address

7. Kernel Backtrace

1.Kernel Backtrace capture method

1 Fetch at runtime

  • AEE/RTT tools

  • Proc System

cat proc/pid/task/tid/stack
  • Sysrq-trigger

adb shell cat proc/kmsg > kmsg.txt
 adb shell "echo 8 > proc/sys/kernel/printk“ //修改printk loglevel
 adb shell "echo t > /proc/sysrq-trigger“ //打印所有的backtrace
 adb shell "echo w > /proc/sysrq-trigger“//打印'-D' status 'D' 的 process
  • KDB

Long press volume UP and DOWN more then 10s
 btp             <pid>                
 Display stack for process <pid>
 bta             [DRSTCZEUIMA]        
 Display stack all processes
 btc                                  
 Backtrace current process on each cpu
 btt             <vaddr>              
 Backtrace process given its struct task add

2. Add code to capture directly

#include <linux/sched.h>
 当前thread:  dump_stack();
 其他thread:  show_stack(task, NULL);

3. Process/Thread status

"R (running)", /* 0 /
"S (sleeping)", /
 1 /
"D (disk sleep)", /
 2 /
"T (stopped)", /
 4 /
"t (tracing stop)", /
 8 /
"Z (zombie)", /
 16 /
"X (dead)", /
 32 /
"x (dead)", /
 64 /
"K (wakekill)", /
 128 /
"W (waking)", /
 256 */

Usually, the general process state is S (sleeping), and if it is found to be in a state such as D (disk sleep), T (stopped), Z (zombie), etc., it must be carefully reviewed.

8. Several typical abnormal situations

1. Deadlock

In the following case, you can see that PowerManagerService, ActivityManager, and WindowManager deadlock each other.

"PowerManagerService" prio=5 tid=25 MONITOR
  | group="main" sCount=1 dsCount=0 obj=0x42bae270 self=0x6525d5c0
  | sysTid=913 nice=0 sched=0/0 cgrp=apps handle=1696964440
  | state=S schedstat=( 5088939411 10237027338 34016 ) utm=232 stm=276 core=2
  at com.android.server.am.ActivityManagerService.bindService(ActivityManagerService.java:~14079)
  - waiting to lock <0x42aa0f78> (a com.android.server.am.ActivityManagerService) held by tid=16 (ActivityManager)
  at android.app.ContextImpl.bindServiceCommon(ContextImpl.java:1665)
  at android.app.ContextImpl.bindService(ContextImpl.java:1648)
  at com.android.server.power.PowerManagerService.bindSmartStandByService(PowerManagerService.java:4090)
  at com.android.server.power.PowerManagerService.handleSmartStandBySettingChangedLocked(PowerManagerService.java:4144)
  at com.android.server.power.PowerManagerService.access$5600(PowerManagerService.java:102)
  at com.android.server.power.PowerManagerService$SmartStandBySettingObserver.onChange(PowerManagerService.java:4132)
  at android.database.ContentObserver$NotificationRunnable.run(ContentObserver.java:181)
  at android.os.Handler.handleCallback(Handler.java:809)
  at android.os.Handler.dispatchMessage(Handler.java:102)
  at android.os.Looper.loop(Looper.java:139)
  at android.os.HandlerThread.run(HandlerThread.java:58)
  
  "ActivityManager" prio=5 tid=16 MONITOR
  | group="main" sCount=1 dsCount=0 obj=0x42aa0d08 self=0x649166b0
  | sysTid=902 nice=-2 sched=0/0 cgrp=apps handle=1687251744
  | state=S schedstat=( 39360881460 25703061063 69675 ) utm=1544 stm=2392 core=2
  at com.android.server.wm.WindowManagerService.setAppVisibility(WindowManagerService.java:~4783)
  - waiting to lock <0x42d17590> (a java.util.HashMap) held by tid=12 (WindowManager)
  at com.android.server.am.ActivityStack.stopActivityLocked(ActivityStack.java:2432)
  at com.android.server.am.ActivityStackSupervisor.activityIdleInternalLocked(ActivityStackSupervisor.java:2103)
  at com.android.server.am.ActivityStackSupervisor$ActivityStackSupervisorHandler.activityIdleInternal(ActivityStackSupervisor.java:2914)
  at com.android.server.am.ActivityStackSupervisor$ActivityStackSupervisorHandler.handleMessage(ActivityStackSupervisor.java:2921)
  at android.os.Handler.dispatchMessage(Handler.java:110)
  at android.os.Looper.loop(Looper.java:147)
  at com.android.server.am.ActivityManagerService$AThread.run(ActivityManagerService.java:2112)
  
  "WindowManager" prio=5 tid=12 MONITOR
  | group="main" sCount=1 dsCount=0 obj=0x42a92550 self=0x6491dd48
  | sysTid=898 nice=-4 sched=0/0 cgrp=apps handle=1687201104
  | state=S schedstat=( 60734070955 41987172579 219755 ) utm=4659 stm=1414 core=1
  at com.android.server.power.PowerManagerService.setScreenBrightnessOverrideFromWindowManagerInternal(PowerManagerService.java:~3207)
  - waiting to lock <0x42a95140> (a java.lang.Object) held by tid=25 (PowerManagerService)
  at com.android.server.power.PowerManagerService.setScreenBrightnessOverrideFromWindowManager(PowerManagerService.java:3196)
  at com.android.server.wm.WindowManagerService.performLayoutAndPlaceSurfacesLockedInner(WindowManagerService.java:9686)
  at com.android.server.wm.WindowManagerService.performLayoutAndPlaceSurfacesLockedLoop(WindowManagerService.java:8923)
  at com.android.server.wm.WindowManagerService.performLayoutAndPlaceSurfacesLocked(WindowManagerService.java:8879)
  at com.android.server.wm.WindowManagerService.access$500(WindowManagerService.java:170)
  at com.android.server.wm.WindowManagerService$H.handleMessage(WindowManagerService.java:7795)
  at android.os.Handler.dispatchMessage(Handler.java:110)
  at android.os.Looper.loop(Looper.java:147)
  at android.os.HandlerThread.run(HandlerThread.java:58)

2. Execute JNI native code and never return.

"main" prio=5 tid=1 NATIVE
  | group="main" sCount=1 dsCount=0 obj=0x41bb3d98 self=0x41ba2878
  | sysTid=814 nice=-2 sched=0/0 cgrp=apps handle=1074389380
  | state=D schedstat=( 22048890928 19526803112 32612 ) utm=1670 stm=534 core=0
  (native backtrace unavailable)
  at android.hardware.SystemSensorManager$BaseEventQueue.nativeDisableSensor(Native Method)
  at android.hardware.SystemSensorManager$BaseEventQueue.disableSensor(SystemSensorManager.java:399)
  at android.hardware.SystemSensorManager$BaseEventQueue.removeAllSensors(SystemSensorManager.java:325)
  at android.hardware.SystemSensorManager.unregisterListenerImpl(SystemSensorManager.java:194)
  at android.hardware.SensorManager.unregisterListener(SensorManager.java:560)
  at com.android.internal.policy.impl.WindowOrientationListener.disable(WindowOrientationListener.java:139)
  at com.android.internal.policy.impl.PhoneWindowManager.updateOrientationListenerLp(PhoneWindowManager.java:774)
  at com.android.internal.policy.impl.PhoneWindowManager.screenTurnedOff(PhoneWindowManager.java:4897)
  at com.android.server.power.Notifier.sendGoToSleepBroadcast(Notifier.java:518)
  at com.android.server.power.Notifier.sendNextBroadcast(Notifier.java:434)
  at com.android.server.power.Notifier.access$500(Notifier.java:63)
  at com.android.server.power.Notifier$NotifierHandler.handleMessage(Notifier.java:584)
  at android.os.Handler.dispatchMessage(Handler.java:110)
  at android.os.Looper.loop(Looper.java:193)
  at com.android.server.ServerThread.initAndLoop(SystemServer.java:1436)
  at com.android.server.SystemServer.main(SystemServer.java:1531)
  at java.lang.reflect.Method.invokeNative(Native Method)
  at java.lang.reflect.Method.invoke(Method.java:515)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:824)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:640)
  at dalvik.system.NativeStart.main(Native Method)

===>
KERNEL SPACE BACKTRACE, sysTid=814

[<ffffffff>] 0xffffffff from [<c07e5140>] __schedule+0x3fc/0x950
 [<c07e4d50>] __schedule+0xc/0x950 from [<c07e57e4>] schedule+0x40/0x80
 [<c07e57b0>] schedule+0xc/0x80 from [<c07e5ae4>] schedule_preempt_disabled+0x20/0x2c
 [<c07e5ad0>] schedule_preempt_disabled+0xc/0x2c from [<c07e3c3c>] mutex_lock_nested+0x1b8/0x560
 [<c07e3a90>] mutex_lock_nested+0xc/0x560 from [<c05667d8>] gsensor_operate+0x1bc/0x2c0
 [<c0566628>] gsensor_operate+0xc/0x2c0 from [<c0495fa0>] hwmsen_enable+0xa8/0x30c
 [<c0495f04>] hwmsen_enable+0xc/0x30c from [<c0496500>] hwmsen_unlocked_ioctl+0x2fc/0x528
 [<c0496210>] hwmsen_unlocked_ioctl+0xc/0x528 from [<c018ad98>] do_vfs_ioctl+0x94/0x5bc
 [<c018ad10>] do_vfs_ioctl+0xc/0x5bc from [<c018b33c>] sys_ioctl+0x7c/0x8c
 [<c018b2cc>] sys_ioctl+0xc/0x8c from [<c000e480>] ret_fast_syscall+0x0/0x40
 [<ffffffff>]  from [<ffffffff>]

references:

[Tencent Documentation] Android Framework Knowledge Base
https://docs.qq.com/doc/DSXBmSG9VbEROUXF5

Friendly recommendation:

Collection of useful information on Android development

At this point, this article has ended. The editor thinks the article is reprinted from the Internet and is excellent. You are welcome to click to read the original article and support the original author. If there is any infringement, please contact the editor to delete it. Your suggestions and corrections are welcome. We look forward to your attention and thank you for reading, thank you!

4c0e1f6f731caed8cee5e2dc1c78e582.jpeg

Click to read the original article and like the boss!

Guess you like

Origin blog.csdn.net/wjky2014/article/details/132061997