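A Monkey test run triggered a framework crash. The root cause was a deadlock that produced an ANR: when the Watchdog (WD) checked the locks, it found them stuck and killed the System Server process.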
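Roughly, the AOSP Watchdog in system_server periodically checks two things. First, that key handler threads keep making progress. Second, that certain service locks can still be acquired. If a check stays blocked past its timeout, the Watchdog dumps stacks and kills system_server. The sketch below models only the "can this lock be acquired in time?" part, and it is heavily simplified. `MonitorProbe` and `canAcquire` are hypothetical names, and this is not the AOSP `Watchdog` implementation.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical, heavily simplified lock probe in the spirit of a watchdog
// "monitor check". Not the AOSP Watchdog implementation.
class MonitorProbe {
    /**
     * Returns true if a helper thread can enter {@code lock}'s monitor within
     * {@code timeoutMs}; false if the monitor appears to be stuck.
     */
    static boolean canAcquire(Object lock, long timeoutMs) {
        CountDownLatch entered = new CountDownLatch(1);
        Thread checker = new Thread(() -> {
            synchronized (lock) {        // blocks here while another thread holds lock
                entered.countDown();
            }
        }, "monitor-checker");
        checker.setDaemon(true);         // a stuck checker must not keep the JVM alive
        checker.start();
        try {
            return entered.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }
}
```

A real watchdog would run such checks periodically. On a timeout it would escalate by dumping stacks and killing the process instead of just returning `false`.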
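Below, we analyze the ANR thread traces to find the cause of the deadlock.
The main thread's stack shows that it is Blocked. It is waiting for lock <0x3fd06119>, which is held by thread 80: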
    DALVIK THREADS(89):
    "main" prio=5 tid=1 Blocked
      | group="main" sCount=2 dsCount=0 obj=0x73d2c050 self=0xb8ab27f8
      | sysTid=556 nice=-2 cgrp=default sched=1/32 handle=0xb6f5bbec
      | state=S schedstat=( 0 0 0 ) utm=16794 stm=14151 core=1 HZ=100
      | stack=0xbe4e0000-0xbe4e2000 stackSize=8MB
      | held mutexes=
      at android.view.inputmethod.InputMethodManager.windowDismissed(InputMethodManager.java:1296)
      - waiting to lock <0x3fd06119> (a android.view.inputmethod.InputMethodManager$H) held by thread 80
      at android.view.WindowManagerGlobal.removeViewLocked(WindowManagerGlobal.java:366)
      at android.view.WindowManagerGlobal.removeView(WindowManagerGlobal.java:324)
      - locked <0x2231a2ab> (a java.lang.Object)
      at android.view.WindowManagerImpl.removeViewImmediate(WindowManagerImpl.java:116)
      at android.app.Dialog.dismissDialog(Dialog.java:341)
      at android.app.Dialog.dismiss(Dialog.java:324)
The stack of thread 80 shows that it is waiting for lock <0x1ba525de>:

    "Binder_D" prio=5 tid=80 Blocked
      | group="main" sCount=2 dsCount=0 obj=0x13375760 self=0xb8df9018
      | sysTid=2574 nice=0 cgrp=default sched=0/0 handle=0xb8de15d0
      | state=S schedstat=( 0 0 0 ) utm=11936 stm=14541 core=1 HZ=100
      | stack=0xa0b28000-0xa0b2a000 stackSize=1012KB
      | held mutexes=
      at com.android.server.InputMethodManagerService.getCurrentInputMethodSubtype(InputMethodManagerService.java:3238)
      - waiting to lock <0x1ba525de> (a java.util.HashMap) held by thread 1
      at android.view.inputmethod.InputMethodManager.getCurrentInputMethodSubtype(InputMethodManager.java:1948)
      - locked <0x3fd06119> (a android.view.inputmethod.InputMethodManager$H)
      at com.android.server.TextServicesManagerService.getCurrentSpellCheckerSubtype(TextServicesManagerService.java:413)
      - locked <0x1aaf2990> (a java.util.HashMap)
      at com.android.internal.textservice.ITextServicesManager$Stub.onTransact(ITextServicesManager.java:72)
      at android.os.Binder.execTransact(Binder.java:469)
The stack of thread tid=1 shows that it needs lock <0x3fd06119>, the monitor of the handler mH (InputMethodManager$H):
    public void windowDismissed(IBinder appWindowToken) {
        checkFocus();
        synchronized (mH) {
            if (mServedView != null &&
                    mServedView.getWindowToken() == appWindowToken) {
                finishInputLocked();
            }
        }
    }
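Its other calling methods show that at this point it already holds lock <0x1ba525de>, i.e. mMethodMap. Thread 80's trace confirms this ("held by thread 1"):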
    void hideInputMethodMenu() {
        synchronized (mMethodMap) {
            hideInputMethodMenuLocked();
        }
    }
Thread tid=80, for its part, needs lock <0x1ba525de>, i.e. mMethodMap in InputMethodManagerService:
    public InputMethodSubtype getCurrentInputMethodSubtype() {
        // TODO: Make this work even for non-current users?
        if (!calledFromValidUser()) {
            return null;
        }
        synchronized (mMethodMap) {
            return getCurrentInputMethodSubtypeLocked();
        }
    }
That lock is already held by thread 1, which has not released it.
Moreover, thread tid=80 itself has already locked <0x3fd06119>, the mH handler lock that thread 1 needs:
    public InputMethodSubtype getCurrentInputMethodSubtype() {
        synchronized (mH) {
            try {
                return mService.getCurrentInputMethodSubtype();
            } catch (RemoteException e) {
                Log.w(TAG, "IME died: " + mCurId, e);
                return null;
            }
        }
    }
    ...
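This closes the cycle, and the two threads are deadlocked. Other threads that handle binder calls also need mMethodMap, which thread 1 holds, so they block too. The result is the ANR.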
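The cycle described above is a classic AB-BA deadlock: two threads take the same two locks in opposite order. The minimal sketch below reproduces that shape on a plain JVM and lets the JVM detect it with `ThreadMXBean.findMonitorDeadlockedThreads()`. The objects `methodMap` and `h` are only stand-ins for mMethodMap and mH. The class name and thread names are hypothetical, and none of this is Android code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Minimal AB-BA deadlock with the same shape as the ANR trace. Hypothetical model.
class AbBaDeadlockDemo {
    /**
     * Starts two threads that take the same two locks in opposite order and
     * returns how many threads the JVM reports as monitor-deadlocked
     * (0 if none were found within about five seconds).
     */
    static int reproduce() {
        Object methodMap = new Object();   // stands in for mMethodMap
        Object h = new Object();           // stands in for mH
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread mainLike = new Thread(() -> {
            synchronized (methodMap) {                 // like hideInputMethodMenu()
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (h) { }                   // like windowDismissed(): never succeeds
            }
        }, "main-like");
        Thread binderLike = new Thread(() -> {
            synchronized (h) {                         // like client-side getCurrentInputMethodSubtype()
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (methodMap) { }           // like server-side getCurrentInputMethodSubtype(): never succeeds
            }
        }, "binder-like");
        mainLike.setDaemon(true);                      // let the JVM exit despite the stuck threads
        binderLike.setDaemon(true);
        mainLike.start();
        binderLike.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 50; i++) {                 // poll for up to ~5 s
            long[] ids = mx.findMonitorDeadlockedThreads(); // null when no deadlock exists
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return 0;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The latch guarantees that both threads hold their first lock before either requests its second one, so the deadlock forms on every run instead of only under unlucky timing. In the Monkey run, that timing had to occur by chance.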
Deadlock summary:
| Thread | Holds | Needs |
|---|---|---|
| Thread 1 (main) | 0x1ba525de (mMethodMap) | 0x3fd06119 (mH) |
| Thread 80 (Binder_D) | 0x3fd06119 (mH) | 0x1ba525de (mMethodMap) |
| Other threads | - | 0x1ba525de (mMethodMap) |
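The table above is effectively a wait-for graph. Each blocked thread points to the lock it needs, and each lock points to the thread that holds it. Following those edges until a thread repeats finds the deadlock cycle. The sketch below does this for data keyed by the lock addresses from the traces. The input format and the names `WaitForGraph` and `findCycle` are hypothetical, and the method is not part of any Android tool.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: finds a cycle in "thread waits for lock held by thread" data.
class WaitForGraph {
    /**
     * @param holders lock address -> id of the thread holding it
     * @param waits   thread id -> lock address it is blocked on
     * @return thread ids of the first cycle found, in wait order, or an empty list
     */
    static List<Integer> findCycle(Map<String, Integer> holders, Map<Integer, String> waits) {
        for (Integer start : waits.keySet()) {
            List<Integer> path = new ArrayList<>();
            Integer t = start;
            // Each blocked thread waits on exactly one lock, so the chain from
            // any thread either ends (lock free or owner not blocked) or loops back.
            while (t != null && !path.contains(t)) {
                path.add(t);
                String lock = waits.get(t);
                t = (lock == null) ? null : holders.get(lock);
            }
            if (t != null) {   // revisited a thread: the cycle starts at its first visit
                return new ArrayList<>(path.subList(path.indexOf(t), path.size()));
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        Map<String, Integer> holders = new HashMap<>();
        holders.put("0x1ba525de", 1);    // mMethodMap held by thread 1
        holders.put("0x3fd06119", 80);   // mH held by thread 80
        Map<Integer, String> waits = new LinkedHashMap<>();
        waits.put(1, "0x3fd06119");      // thread 1 needs mH
        waits.put(80, "0x1ba525de");     // thread 80 needs mMethodMap
        System.out.println(findCycle(holders, waits)); // prints [1, 80]
    }
}
```

The "other threads" row also waits on mMethodMap, but nothing waits on them. They are blocked behind the cycle rather than being part of it, so adding them does not change the result.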
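Solution: this problem was triggered in a Monkey test environment, which is fairly complex, and it occurs with extremely low probability. We therefore added a flag in InputMethodManager that prevents concurrent access to the mMethodMap lock and so avoids the deadlock.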
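The flag-based change itself is not shown here, so it is not reproduced. For comparison, a common general remedy for AB-BA deadlocks is to make sure no thread waits for one lock while holding the other in the opposite order. The sketch below models that idea: the binder-side path releases its mH-like lock before entering the mMethodMap-like lock. This is an illustration of the technique under hypothetical names (`ReleaseBeforeCallDemo`, `run`), not the fix that was actually applied.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical model of "release your own lock before calling out". Not Android code.
class ReleaseBeforeCallDemo {
    /**
     * Runs the two code paths concurrently for {@code rounds} iterations each
     * and returns true if both finish within {@code timeoutMs}.
     */
    static boolean run(int rounds, long timeoutMs) {
        Object methodMap = new Object();   // stands in for mMethodMap
        Object h = new Object();           // stands in for mH
        CountDownLatch done = new CountDownLatch(2);

        Thread mainLike = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                synchronized (methodMap) {     // unchanged order: mMethodMap, then mH
                    synchronized (h) { }
                }
            }
            done.countDown();
        }, "main-like");

        Thread binderLike = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                synchronized (h) {
                    // read whatever client-side state is needed, then release h
                }
                synchronized (methodMap) {
                    // the "service call" runs with h already released, so this
                    // thread never waits for methodMap while holding h
                }
            }
            done.countDown();
        }, "binder-like");

        mainLike.setDaemon(true);
        binderLike.setDaemon(true);
        mainLike.start();
        binderLike.start();
        try {
            return done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The trade-off is that the work done under mH and the service call are no longer atomic with respect to mH. This is acceptable only if callers do not rely on that atomicity.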