Analyzing a confusing backtrace from an Android app native crash

Does perfect code logic always produce a perfect program? The answer is no. From the software's perspective, perhaps only the binary never lies to you.

Symptoms

Recently, the business side reported a strange crash, and the information available was not enough to resolve it.

Signal: 11 (SIGSEGV), Code: 1 (SEGV_MAPERR)

r0  993ff520  r1  dc3170c4  r2  00000000  r3  dabe3e08
r4  993ff520  r5  00000005  r6  00000290  r7  000007ac
r8  e83253a0  r9  00006aba  r10 bf921e39  r11 e83253a0
ip  bfa3a9e0  sp  993ff494  lr  bf88a71d  pc  bf96c31c

#00 pc 001a731c  /data/data/com.package.name/files/download/libmcto_media_player.so
#01 pc 0020b7e5  /data/data/com.package.name/files/download/libmcto_media_player.so

#00  993ff494  0000022c
     993ff498  adcfd000  [anon:libc_malloc]
     993ff49c  bf88a71d  /data/data/com.package.name/files/download/libmcto_media_player.so
     993ff4a0  ffffffff
     993ff4a4  ffffffff
     993ff4a8  bf9d07e7  /data/data/com.package.name/files/download/libmcto_media_player.so
#01  993ff4ac  00000000
     993ff4b0  00000000
     993ff4b4  00000000
     993ff4b8  00000000
     993ff4bc  00000000
     993ff4c0  adcfd234  [anon:libc_malloc]
     993ff4c4  00000000
     993ff4c8  0000006e
     993ff4cc  00000000
     993ff4d0  adcfdf6c  [anon:libc_malloc]
     993ff4d4  00000000
     993ff4d8  00000000
     993ff4dc  00000000
     993ff4e0  00000000
     993ff4e4  00000000
     993ff4e8  00000000

The first impression is certainly that the business logic in the dynamic library has a bug that caused a segmentation fault. The backtrace is indeed incomplete, but we could not see why it was incomplete, so we really needed help analyzing it.

Analysis

Common causes of incomplete backtraces

Incomplete backtraces have come up before. The common causes are:

  • The stack memory was heavily overwritten at the time of the crash. If the logic near the crash site processes a lot of random external input, the situation is even worse, and we tend to see a large number of scattered, incomplete backtraces. For example:
#00 pc 00000ffb  <anonymous:c34fe000>
#01 pc 0009a885  /data/app/com.package.name-1/lib/arm/libjsc.so
#02 pc 0003ff93  /data/app/com.package.name-1/lib/arm/libjsc.so
#03 pc 0011f60f  /data/app/com.package.name-1/lib/arm/libjsc.so
#04 pc fffffffb  <unknown>

#00 pc 000092fe  <anonymous:c15d0000>
#01 pc 00099ec3  /data/app/com.package.name-1/lib/arm/libjsc.so
#02 pc 00003ffe  <anonymous:bf140000>

#00 pc 00000ffb  <anonymous:ef304000>
  • The unwind tables of some ELF files on the call path are incomplete. Some system odex/oat files, as well as the Chromium library of the system WebView, fall into this category. For example:
#00 pc 00d12bcc  /system/lib/libwebviewchromium.so

#00 pc 01a0cf72  /system/app/WebViewGoogle/WebViewGoogle.apk!libwebviewchromium.so (offset 0x46da000)

#00 pc 00006fde  /data/app/com.package.name-1/lib/arm/libcros.so
#01 pc 00007007  /data/app/com.package.name-1/lib/arm/libcros.so
#02 pc 00007023  /data/app/com.package.name-1/lib/arm/libcros.so
#03 pc 00007037  /data/app/com.package.name-1/lib/arm/libcros.so
#04 pc 000070d1  /data/app/com.package.name-1/lib/arm/libcros.so
#05 pc 000049bf  /data/app/com.package.name-1/lib/arm/libcros.so
#06 pc 000092e3  /data/app/com.package.name-1/oat/arm/base.odex

#00 pc 00013792  /system/lib/libc.so (__futex_wait_ex+49)
#01 pc 00013b21  /system/lib/libc.so (pthread_mutex_lock+310)
#02 pc 00028351  /system/lib/libc.so (dlfree+48)
#03 pc 0000ef33  /system/lib/libc.so (free+10)
#04 pc 0000a367  /system/lib/libjavacrypto.so
#05 pc 0000bc4d  /system/lib/libjavacrypto.so
#06 pc 022fd081  /system/framework/arm/boot.oat
  • Some ELF file on the call path is itself damaged or has been deleted. Moreover, if the crash site is located inside the damaged ELF itself, the signal received may be SIGBUS. For example:
#00 pc 5d9840f2  <unknown>
#01 pc 4008ab6c  <unknown>

#00 pc 00392fd0  /system/lib/egl/libGLES_mali.so
#01 pc 0002ab7b  /system/lib/libgui.so (_ZN7android10GLConsumer22bindTextureImageLockedEv+182)
#02 pc 0002b3a9  /system/lib/libgui.so (_ZN7android10GLConsumer14updateTexImageEv+208)
#03 pc b3317c6c  <unknown>
  • The instructions being executed are located in shared memory. In this case the ELF content that is read may be unreliable, so to avoid producing misleading results, the unwinder usually stops unwinding on purpose. For example:
#00 pc 0007a010  /dev/ashmem/dalvik-jit-code-cache (deleted)

#00 pc 00019e64  /system/lib/libssl.so (SSL_clear+19)
#01 pc 000103b5  /system/lib/libjavacrypto.so (_ZL25NativeCrypto_SSL_shutdownP7_JNIEnvP7_jclassxP8_jobjectS4_+156)
#02 pc 00027a7d  /system/framework/arm/boot-conscrypt.oat (com.android.org.conscrypt.NativeCrypto.SSL_shutdown+156)
#03 pc 00032a03  /system/framework/arm/boot-conscrypt.oat (com.android.org.conscrypt.OpenSSLSocketImpl.shutdownAndFreeSslNative+138)
#04 pc 0003330b  /system/framework/arm/boot-conscrypt.oat (com.android.org.conscrypt.OpenSSLSocketImpl.close+434)
#05 pc 003e0931  /system/lib/libart.so (art_quick_invoke_stub_internal+64)
#06 pc 003e4ea3  /system/lib/libart.so (art_quick_invoke_stub+226)
#07 pc 000ac2d9  /system/lib/libart.so (_ZN3art9ArtMethod6InvokeEPNS_6ThreadEPjjPNS_6JValueEPKc+140)
#08 pc 001f27fb  /system/lib/libart.so (_ZN3art11interpreter34ArtInterpreterToCompiledCodeBridgeEPNS_6ThreadEPNS_9ArtMethodEPKNS_7DexFile8CodeItemEPNS_11ShadowFrameEPNS_6JValueE+238)
#09 pc 001edd71  /system/lib/libart.so (_ZN3art11interpreter6DoCallILb0ELb0EEEbPNS_9ArtMethodEPNS_6ThreadERNS_11ShadowFrameEPKNS_11InstructionEtPNS_6JValueE+576)
#10 pc 003cce3d  /system/lib/libart.so (MterpInvokeVirtualQuick+504)
#11 pc 003d6994  /system/lib/libart.so (ExecuteMterpImpl+29972)
#12 pc 001d5351  /system/lib/libart.so (_ZN3art11interpreterL7ExecuteEPNS_6ThreadEPKNS_7DexFile8CodeItemERNS_11ShadowFrameENS_6JValueEb+340)
#13 pc 001da6a3  /system/lib/libart.so (_ZN3art11interpreter33ArtInterpreterToInterpreterBridgeEPNS_6ThreadEPKNS_7DexFile8CodeItemEPNS_11ShadowFrameEPNS_6JValueE+142)
#14 pc 001edd5b  /system/lib/libart.so (_ZN3art11interpreter6DoCallILb0ELb0EEEbPNS_9ArtMethodEPNS_6ThreadERNS_11ShadowFrameEPKNS_11InstructionEtPNS_6JValueE+554)
#15 pc 003cb927  /system/lib/libart.so (MterpInvokeStatic+322)
#16 pc 003d2d94  /system/lib/libart.so (ExecuteMterpImpl+14612)
#17 pc 001d5351  /system/lib/libart.so (_ZN3art11interpreterL7ExecuteEPNS_6ThreadEPKNS_7DexFile8CodeItemERNS_11ShadowFrameENS_6JValueEb+340)
#18 pc 001da6a3  /system/lib/libart.so (_ZN3art11interpreter33ArtInterpreterToInterpreterBridgeEPNS_6ThreadEPKNS_7DexFile8CodeItemEPNS_11ShadowFrameEPNS_6JValueE+142)
#19 pc 001ee931  /system/lib/libart.so (_ZN3art11interpreter6DoCallILb1ELb0EEEbPNS_9ArtMethodEPNS_6ThreadERNS_11ShadowFrameEPKNS_11InstructionEtPNS_6JValueE+420)
#20 pc 003cc9eb  /system/lib/libart.so (MterpInvokeDirectRange+294)
#21 pc 003d3014  /system/lib/libart.so (ExecuteMterpImpl+15252)
#22 pc 001d5351  /system/lib/libart.so (_ZN3art11interpreterL7ExecuteEPNS_6ThreadEPKNS_7DexFile8CodeItemERNS_11ShadowFrameENS_6JValueEb+340)
#23 pc 001da6a3  /system/lib/libart.so (_ZN3art11interpreter33ArtInterpreterToInterpreterBridgeEPNS_6ThreadEPKNS_7DexFile8CodeItemEPNS_11ShadowFrameEPNS_6JValueE+142)
#24 pc 001ee931  /system/lib/libart.so (_ZN3art11interpreter6DoCallILb1ELb0EEEbPNS_9ArtMethodEPNS_6ThreadERNS_11ShadowFrameEPKNS_11InstructionEtPNS_6JValueE+420)
#25 pc 003cc9eb  /system/lib/libart.so (MterpInvokeDirectRange+294)
#26 pc 003d3014  /system/lib/libart.so (ExecuteMterpImpl+15252)
#27 pc 001d5351  /system/lib/libart.so (_ZN3art11interpreterL7ExecuteEPNS_6ThreadEPKNS_7DexFile8CodeItemERNS_11ShadowFrameENS_6JValueEb+340)
#28 pc 001da6a3  /system/lib/libart.so (_ZN3art11interpreter33ArtInterpreterToInterpreterBridgeEPNS_6ThreadEPKNS_7DexFile8CodeItemEPNS_11ShadowFrameEPNS_6JValueE+142)
#29 pc 001edd5b  /system/lib/libart.so (_ZN3art11interpreter6DoCallILb0ELb0EEEbPNS_9ArtMethodEPNS_6ThreadERNS_11ShadowFrameEPKNS_11InstructionEtPNS_6JValueE+554)
#30 pc 003cce3d  /system/lib/libart.so (MterpInvokeVirtualQuick+504)
#31 pc 003d6994  /system/lib/libart.so (ExecuteMterpImpl+29972)
#32 pc 001d5351  /system/lib/libart.so (_ZN3art11interpreterL7ExecuteEPNS_6ThreadEPKNS_7DexFile8CodeItemERNS_11ShadowFrameENS_6JValueEb+340)
#33 pc 001da5f1  /system/lib/libart.so (_ZN3art11interpreter30EnterInterpreterFromEntryPointEPNS_6ThreadEPKNS_7DexFile8CodeItemEPNS_11ShadowFrameE+92)
#34 pc 003c0fbd  /system/lib/libart.so (artQuickToInterpreterBridge+944)
#35 pc 003e46f1  /system/lib/libart.so (art_quick_to_interpreter_bridge+32)
#36 pc 000a5511  /dev/ashmem/dalvik-jit-code-cache (deleted)

Preliminary analysis of the crash location

Back to the problem itself. Let's look at the crash location, 001a731c:

.text:001A7310  STMFD   SP!, {R4,R5,LR}
.text:001A7314  LDR     R5, [R1]
.text:001A7318  MOV     R4, R0
.text:001A731C  LDR     R3, [R5,#-4] ; the crash happens here
.text:001A7320  SUB     SP, SP, #0xC
.text:001A7324  CMP     R3, #0
.text:001A7328  SUB     R0, R5, #0xC
.text:001A732C  BLT     loc_1A7350
.text:001A7330  LDR     R3, =(dword_2759D4 - 0x1A733C)
.text:001A7334  ADD     R3, PC, R3 ; dword_2759D4
.text:001A7338  CMP     R0, R3
.text:001A733C  BNE     loc_1A7364
.text:001A7340 loc_1A7340
.text:001A7340  STR     R5, [R4]
.text:001A7344  MOV     R0, R4
.text:001A7348  ADD     SP, SP, #0xC
.text:001A734C  LDMFD   SP!, {R4,R5,PC}
.text:001A7350  ADD     R1, SP, #0x18+var_14
.text:001A7354  MOV     R2, #0
.text:001A7358  BL      sub_1A6EA8
.text:001A735C  MOV     R5, R0
.text:001A7360  B       loc_1A7340
.text:001A7364  MOV     R1, #1
.text:001A7368  ADD     R0, R0, #8
.text:001A736C  BL      sub_1C2CAC
.text:001A7370  B       loc_1A7340

This is a short, complete function. It first pushes R4, R5 and LR, then starts executing. The fault happens at LDR R3, [R5,#-4]. The current value of R5 is 0x5, so the instruction reads virtual address 0x1 (0x5 - 0x4 = 0x1). You hardly need to look at maps to know that this address is invalid, so the fault itself is expected. The signal code is SEGV_MAPERR, which is also as expected.
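The address arithmetic can be checked with a few lines of Python. This is only a sketch: ldr_effective_address is a hypothetical helper that wraps at 32 bits like an ARM register, and the 4 KiB page size is an assumption.

```python
MASK32 = 0xFFFFFFFF
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def ldr_effective_address(base: int, imm: int) -> int:
    """Address read by LDR Rt, [Rn, #imm] (offset addressing), wrapped to 32 bits."""
    return (base + imm) & MASK32


r5 = 0x00000005                       # R5 from the register dump
fault_addr = ldr_effective_address(r5, -4)
print(hex(fault_addr))                # 0x1
print(fault_addr // PAGE_SIZE)        # 0: the access falls in the first (null) page
```

On Linux the lowest page is normally left unmapped precisely so that accesses like this fault, which fits the SEGV_MAPERR code.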

Since the backtrace has only two lines, let's continue with the next one, at 0020b7e5:

............
.rodata:0020B795              DCB "try_count_=%d",0
.rodata:0020B7E3 asc_20B7E3   DCB "://",0
.rodata:0020B7E7 aCdn         DCB "CDN",0
............

Surprisingly, 0020b7e5 is located in .rodata. This also helps explain why the unwind process stopped there (why the backtrace is incomplete).

Something suspicious

Looking again at the instructions near the crash site, we do find something suspicious:

.text:001A7310  STMFD   SP!, {R4,R5,LR}
............
.text:001A731C  LDR     R3, [R5,#-4] ; the crash happens here
.text:001A7320  SUB     SP, SP, #0xC
............
.text:001A7348  ADD     SP, SP, #0xC
.text:001A734C  LDMFD   SP!, {R4,R5,PC}

This short function uses only 24 bytes of stack in total, yet SP is not moved to its final position in a single step, which is very unusual.
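To make the irregularity concrete, here is a minimal Python sketch that walks the first few instructions and records how many bytes of stack the function has already allocated when each instruction starts. It understands only the two SP-changing forms that appear in this function (STMFD SP! with an explicit register list, and SUB SP, SP, #imm); it is not a general ARM decoder, and the helper names are hypothetical.

```python
import re

# Instructions from the disassembly above, as "address mnemonic operands".
LISTING = """\
001A7310 STMFD SP!, {R4,R5,LR}
001A7314 LDR R5, [R1]
001A7318 MOV R4, R0
001A731C LDR R3, [R5,#-4]
001A7320 SUB SP, SP, #0xC
001A7324 CMP R3, #0
"""


def sp_delta(insn: str) -> int:
    """Bytes by which one instruction lowers SP (only the two forms used here)."""
    m = re.match(r"STMFD\s+SP!,\s*\{([^}]*)\}", insn)
    if m:  # each register in an explicit list takes 4 bytes
        return 4 * len(m.group(1).split(","))
    m = re.match(r"SUB\s+SP,\s*SP,\s*#(0x[0-9A-Fa-f]+|\d+)", insn)
    if m:
        return int(m.group(1), 0)
    return 0  # everything else leaves SP alone in this sketch


def frame_size_before(listing: str) -> dict[int, int]:
    """Map each address to the bytes already allocated when that instruction starts."""
    depth, result = 0, {}
    for line in listing.splitlines():
        addr, insn = line.split(maxsplit=1)
        result[int(addr, 16)] = depth
        depth += sp_delta(insn)
    return result


depths = frame_size_before(LISTING)
print(depths[0x1A731C])  # 12: only STMFD has run when the crash happens
print(depths[0x1A7324])  # 24: the full frame exists only after SUB SP
```

Any unwind description that assumes the full 24-byte frame is therefore only correct from 0x1A7324 onward.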

The unwind table

Let's look at the unwind table:

$ arm-linux-androideabi-readelf -u ./libmcto_media_player.so

............

0x1a7268: 0x80b108ab
  Compact model index: 0
  0xb1 0x08 pop {r3}
  0xab      pop {r4, r5, r6, r7, r14}

0x1a7310: 0x8002a9b0
  Compact model index: 0
  0x02      vsp = vsp + 12
  0xa9      pop {r4, r5, r14}
  0xb0      finish

0x1a7424: 0x8001a8b0
  Compact model index: 0
  0x01      vsp = vsp + 8
  0xa8      pop {r4, r14}
  0xb0      finish

............

The crash position 001a731c falls in the entry that starts at offset 1a7310, whose unwind code is 0x8002a9b0. According to this entry, unwinding must add a total of 24 bytes to SP (24 bytes of data are popped from the stack). But as the assembly above shows, when the crash occurs (while executing 001a731c), SP has only been decreased by 12 bytes (by STMFD SP!, {R4,R5,LR}). That is the problem.
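The compact entry can be decoded by hand following the ARM EHABI opcode encoding. Below is a minimal Python sketch that handles only the opcodes appearing in the readelf output above (00xxxxxx, 1010Lnnn, 10110000 and 10110001 0000iiii); decode_compact_model0 is a hypothetical helper, not part of any toolchain.

```python
def decode_compact_model0(entry: int) -> tuple[list[str], int]:
    """Decode an ARM EHABI compact model 0 word (0x80XXXXXX).

    Returns readable operations and the total bytes the unwinder adds to SP.
    """
    if entry >> 24 != 0x80:
        raise ValueError("not a compact model 0 entry")
    ops = [(entry >> 16) & 0xFF, (entry >> 8) & 0xFF, entry & 0xFF]
    text, sp_bytes, i = [], 0, 0
    while i < len(ops):
        op = ops[i]
        if op <= 0x3F:                   # 00xxxxxx: vsp += (xxxxxx << 2) + 4
            inc = (op << 2) + 4
            text.append(f"vsp = vsp + {inc}")
            sp_bytes += inc
        elif 0xA0 <= op <= 0xAF:         # 1010Lnnn: pop r4-r[4+nnn], plus r14 if L
            regs = [f"r{r}" for r in range(4, 4 + (op & 0x7) + 1)]
            if op & 0x8:
                regs.append("r14")
            text.append("pop {" + ", ".join(regs) + "}")
            sp_bytes += 4 * len(regs)
        elif op == 0xB0:                 # 10110000: finish
            text.append("finish")
            break
        elif op == 0xB1:                 # 10110001 0000iiii: pop r0-r3 under mask
            i += 1
            mask = ops[i] & 0x0F
            regs = [f"r{r}" for r in range(4) if mask & (1 << r)]
            text.append("pop {" + ", ".join(regs) + "}")
            sp_bytes += 4 * len(regs)
        else:
            raise NotImplementedError(f"opcode {op:#04x} not handled in this sketch")
        i += 1
    return text, sp_bytes


ops, total = decode_compact_model0(0x8002A9B0)
for op in ops:
    print(op)
PUSHED_AT_CRASH = 12   # only STMFD SP!, {R4,R5,LR} has run at 0x1a731c
print(total - PUSHED_AT_CRASH)  # 12 bytes more than were actually pushed
```

The decoded operations match the readelf listing, and the difference of 12 bytes is exactly the SUB SP, SP, #0xC that had not yet executed.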

Looking at the stack

The stack data:

#00  993ff494  0000022c
     993ff498  adcfd000  [anon:libc_malloc]
     993ff49c  bf88a71d  /data/data/com.package.name/files/download/libmcto_media_player.so
     993ff4a0  ffffffff
     993ff4a4  ffffffff
     993ff4a8  bf9d07e7  /data/data/com.package.name/files/download/libmcto_media_player.so
#01  993ff4ac  00000000
     993ff4b0  00000000
     993ff4b4  00000000
............

In fact, the unwinder strictly followed the information in the unwind table, and so it was misled: it moved SP 12 bytes further than it should have. The real LR is stored at memory address 993ff49c, and its value is bf88a71d. Using the maps information, we can compute the offset of this absolute address relative to the current ELF. Unfortunately, the business side's dynamic library has complex logic and a deep call chain, and because the unwind process terminated prematurely, the existing registers, stack, and memory information alone are not enough to help the business side locate the problem.
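The misstep can be reproduced with a small Python model of the unwinder applied to the stack dump. It is a sketch under stated assumptions: only the six words shown for frame #00 are modelled, unwind_frame is a hypothetical helper that applies "vsp = vsp + skip; pop {r4, r5, r14}", and the library's load base is derived from frame #00 (absolute pc bf96c31c, relative pc 001a731c), assuming every relative address in this library is the absolute address minus that base.

```python
# Stack words from frame #00 of the dump (address -> 32-bit value).
STACK = {
    0x993FF494: 0x0000022C,
    0x993FF498: 0xADCFD000,
    0x993FF49C: 0xBF88A71D,
    0x993FF4A0: 0xFFFFFFFF,
    0x993FF4A4: 0xFFFFFFFF,
    0x993FF4A8: 0xBF9D07E7,
}
SP_AT_CRASH = 0x993FF494
# Assumption: relative pc = absolute pc - LOAD_BASE for this library.
LOAD_BASE = 0xBF96C31C - 0x001A731C  # 0xbf7c5000


def unwind_frame(sp: int, vsp_skip: int) -> tuple[int, int]:
    """Apply 'vsp = vsp + vsp_skip; pop {r4, r5, r14}'; return (r14, new sp)."""
    vsp = sp + vsp_skip
    popped = []
    for _reg in ("r4", "r5", "r14"):
        popped.append(STACK[vsp])
        vsp += 4
    return popped[-1], vsp


# What the unwinder did: it trusted the table and skipped 12 extra bytes.
bad_lr, bad_sp = unwind_frame(SP_AT_CRASH, 12)
# What matches the code at 0x1a731c: SUB SP has not run, so nothing to skip.
good_lr, good_sp = unwind_frame(SP_AT_CRASH, 0)

print(hex(bad_lr), hex(bad_lr - LOAD_BASE), hex(bad_sp))  # 0xbf9d07e7 0x20b7e7 0x993ff4ac
print(hex(good_lr), hex(good_lr - LOAD_BASE))             # 0xbf88a71d 0xc571d
```

The bogus return address lands in .rodata at 0x20b7e7 (the "CDN" string), and the resulting SP, 993ff4ac, is exactly where frame #01 of the stack dump begins. The backtrace's 0020b7e5 is 2 less than 0x20b7e7, which is consistent with stepping an odd (Thumb) return address back toward the call instruction. The correct return address also equals the lr register in the dump, bf88a71d, since the function had not yet called anything.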

What exactly is at 001a731c?

A conflict like this one, between the unwind table information and the corresponding assembly instruction sequence, is uncommon. Which function is 001a731c in? Why does it have such an instruction sequence?

We obtained the dynamic library with debugging symbols from the business side:

$ arm-linux-androideabi-addr2line -f -e ./libmcto_media_player.so 001a731c
_ZNSsC1ERKSs
libgcc2.c:?

$ arm-linux-androideabi-c++filt -n _ZNSsC1ERKSs
std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)

It turns out to be a constructor of std::basic_string (the copy constructor).

This problem is most likely caused by an NDK bug. We learned from the business side that they were using NDK r9d.

The NDK you thought you knew

Developing and maintaining a cross-platform cross-compilation toolchain is not easy. Compared with C, the compiler has to put in a lot of extra effort to make the various language features of C++ work as expected at runtime, and different versions of the C++ standard library have coexisted for a long time. The NDK also has to keep up with changes in Android's lower layers while maintaining backward compatibility. The NDK is actually not as reliable as we might expect; the issues on the NDK's official GitHub repository give a good picture of the current state.

Only starting with r11 did the NDK begin explicitly listing important Known Issues in its Changelog.

The Known Issues section of the r11 Changelog states plainly:

Exception handling will often fail when using c++_shared on ARM32. The root cause is incompatibility between the LLVM unwinder used by libc++abi for ARM32 and libgcc. This is not a regression from r10e.

The Known Issues section of the r12 Changelog says:

Exception unwinding with c++_shared still does not work for ARM on Gingerbread or Ice Cream Sandwich.

We know that C++ exception handling depends on unwinding at runtime. The problem most likely lies here.

Conclusion

The business side recompiled the dynamic library with a newer NDK. We examined the assembly for std::basic_string and found that SP is now moved to its final position in a single step at the start of the function, so this should no longer be a problem.

After the business side shipped the recompiled dynamic library, they obtained a complete backtrace for the crash, then located and fixed the bug that caused the crash itself.

So the reason the backtrace was incomplete is that a bug in the older NDK caused the generated dynamic library to be unwound incorrectly in some cases.

According to the Known Issues quoted above, not only can backtraces after a crash be affected; any place where the business logic uses C++ exceptions may be affected as well. Concretely, after an exception is thrown, the code may not run the exception-catching logic as expected. If such hidden problems really exist, we hope this NDK upgrade fixes them as well.

About the crash capture tool

Finally, a brief plug.

All of the crash information above was captured in production by xCrash, the Android app crash capture tool we developed ourselves.


Origin www.cnblogs.com/qwangxiao/p/10962305.html