Android WatchDog (3) - HandlerChecker in Detail (Android 12)

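HandlerChecker is a core inner class of WatchDog. Once you understand the logic of this class, you have essentially understood how WatchDog works.

We will break the class into sections and analyze each in turn.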

    /**
     * Used for checking status of handle threads and scheduling monitor callbacks.
     */
    public final class HandlerChecker implements Runnable {
        // Handler passed in through the constructor
        private final Handler mHandler;
        // Name passed in through the constructor
        private final String mName;
        // Maximum wait time passed in through the constructor; the default
        // (DEFAULT_TIMEOUT) is 60 seconds on non-debug builds
        private final long mWaitMax;
        // Monitors that WatchDog checks. This is the list that is
        // traversed during a schedule round (described below).
        private final ArrayList<Monitor> mMonitors = new ArrayList<Monitor>();
        // Monitors waiting to be checked. addMonitor ultimately appends
        // to this list. When a schedule round starts and no check is in
        // flight, its elements are copied into mMonitors and the queue
        // is cleared (described below).
        private final ArrayList<Monitor> mMonitorQueue = new ArrayList<Monitor>();
        // Whether the current schedule round has completed
        private boolean mCompleted;
        // The monitor currently being checked
        private Monitor mCurrentMonitor;
        // Start time of the current round, used later to detect a timeout
        private long mStartTime;
        // Number of outstanding pause calls; while it is greater than 0,
        // schedule does no real work
        private int mPauseCount;

HandlerChecker implements the Runnable interface. The comments above briefly describe each field; their roles become clearer as we go through the methods below.

        HandlerChecker(Handler handler, String name, long waitMaxMillis) {
            mHandler = handler;
            mName = name;
            mWaitMax = waitMaxMillis;
            mCompleted = true;
        }

This is the constructor. There is little to analyze here; the fields it sets are described in the declarations above.

        void addMonitorLocked(Monitor monitor) {
            // We don't want to update mMonitors when the Handler is in the middle of checking
            // all monitors. We will update mMonitors on the next schedule if it is safe
            mMonitorQueue.add(monitor);
        }

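Calling WatchDog's addMonitor method eventually ends up here. As the comment says, mMonitors is not updated while the Handler is in the middle of checking all monitors; it is updated on the next schedule. In other words, if something calls addMonitor during a schedule round, the new monitor is first added to mMonitorQueue and is moved into mMonitors only when the current round has finished and the next one starts.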
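
The deferred update described above can be sketched in plain Java, outside the Android framework. This is a minimal illustration, not the AOSP code: the class MonitorQueueSketch, its method names, and the use of strings in place of Monitor objects are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the mMonitors / mMonitorQueue pair.
final class MonitorQueueSketch {
    private final List<String> active = new ArrayList<>();   // plays the role of mMonitors
    private final List<String> pending = new ArrayList<>();  // plays the role of mMonitorQueue
    private boolean completed = true;                        // plays the role of mCompleted

    // Like addMonitorLocked: never touches the active list directly.
    public synchronized void add(String monitor) {
        pending.add(monitor);
    }

    // Like the start of scheduleCheckLocked: merge only when no round is in flight.
    public synchronized void beginRound() {
        if (completed) {
            active.addAll(pending);
            pending.clear();
        }
        completed = false;
    }

    // Like the end of run(): mark the round as finished.
    public synchronized void endRound() {
        completed = true;
    }

    public synchronized List<String> activeSnapshot() {
        return new ArrayList<>(active);
    }

    public static void main(String[] args) {
        MonitorQueueSketch s = new MonitorQueueSketch();
        s.add("A");
        s.beginRound();
        s.add("B"); // arrives while the round is in flight
        System.out.println("during round 1: " + s.activeSnapshot());
        s.endRound();
        s.beginRound();
        System.out.println("during round 2: " + s.activeSnapshot());
    }
}
```

The point of the two lists is that the list being iterated never changes size while a round is running, so the loop over it needs no extra protection against concurrent additions.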

        public void scheduleCheckLocked() {
            // mCompleted is true initially and again after a round
            // has finished; otherwise it is false.
            if (mCompleted) {
                // Safe to update monitors in queue, Handler is not in the middle of work
                // As described above, when a round starts with no check in flight,
                // mMonitorQueue is copied into mMonitors and then cleared.
                mMonitors.addAll(mMonitorQueue);
                mMonitorQueue.clear();
            }
            // mMonitors.size() == 0 means there are no monitored objects.
            // isPolling() returning true means the thread's message loop is
            // polling normally and is not blocked.
            // mPauseCount > 0 means pauseLocked has been called without a
            // matching resumeLocked.
            if ((mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling())
                    || (mPauseCount > 0)) {
                // Don't schedule until after resume OR
                // If the target looper has recently been polling, then
                // there is no reason to enqueue our checker on it since that
                // is as good as it not being deadlocked.  This avoid having
                // to do a context switch to check the thread. Note that we
                // only do this if we have no monitors since those would need to
                // be executed at this point.
                // End this schedule round.
                mCompleted = true;
                return;
            }
            // If the current round has not finished, do not start a new one.
            if (!mCompleted) {
                // we already have a check in flight, so no need
                return;
            }

            // All pre-checks passed; the real check starts here.
            mCompleted = false;
            mCurrentMonitor = null;
            // Record the start time of this check, used later to detect a timeout.
            mStartTime = SystemClock.uptimeMillis();
            // Post this checker to the front of the handler's message queue;
            // run() then executes on the handler's thread.
            mHandler.postAtFrontOfQueue(this);
        }

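This method starts a check of the handler's thread and of the objects in mMonitors. It first performs a few pre-checks and then posts the checker so that run executes on the handler's thread; whether the check has timed out is determined later from mStartTime.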

        @Override
        public void run() {
            // Once we get here, we ensure that mMonitors does not change even if we call
            // #addMonitorLocked because we first add the new monitors to mMonitorQueue and
            // move them to mMonitors on the next schedule when mCompleted is true, at which
            // point we have completed execution of this method.
            // Iterate over the mMonitors list.
            final int size = mMonitors.size();
            for (int i = 0 ; i < size ; i++) {
                synchronized (mLock) {
                    // Record the monitor that is about to be checked.
                    mCurrentMonitor = mMonitors.get(i);
                }
                // Call back the monitor method of the monitored object.
                mCurrentMonitor.monitor();
            }

            // Reaching this point means the loop above finished and no
            // monitor blocked, so this schedule round is complete.
            synchronized (mLock) {
                mCompleted = true;
                mCurrentMonitor = null;
            }
        }

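As you can see, run mainly calls back the monitor method of each monitored object. But why does a successful callback to monitor mean that nothing is wrong and this round of checking can end? To answer that, we need to look at how a monitored object implements monitor. Taking WindowManagerService as an example, its monitor method is: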

    // Called by the heartbeat to ensure locks are not held indefinitely (for deadlock detection).
    @Override
    public void monitor() {
        synchronized (mGlobalLock) {
        }
    }

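monitor is an otherwise empty method that merely acquires mGlobalLock. If it runs to completion, the lock could be acquired, so no thread is holding mGlobalLock indefinitely (for example, in a deadlock). That is the whole principle: to have WatchDog watch a lock, implement a monitor method that acquires the lock you want watched.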
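
The same idea can be tried in plain Java without the Android framework. The sketch below is only an illustration under stated assumptions: LockProbeSketch, probe, and probeWhileHeld are hypothetical names, a dedicated probe thread stands in for the handler thread, and a CountDownLatch with a timeout stands in for the mCompleted flag plus the WatchDog timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demo of "an empty synchronized block as a liveness probe".
final class LockProbeSketch {

    // Returns true if a probe thread managed to acquire and release the lock in time.
    public static boolean probe(Object lock, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        Thread checker = new Thread(() -> {
            synchronized (lock) {
                // Intentionally empty, like monitor(): getting in is the whole test.
            }
            done.countDown();
        }, "probe");
        checker.setDaemon(true);
        checker.start();
        try {
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Runs the same probe while another thread holds the lock for the whole window.
    public static boolean probeWhileHeld(Object lock, long timeoutMillis) {
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                held.countDown();
                try {
                    release.await(); // simulate a thread stuck while holding the lock
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "holder");
        holder.setDaemon(true);
        holder.start();
        try {
            held.await(); // make sure the lock is really taken before probing
            return probe(lock, timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown(); // let the holder go so the demo can exit cleanly
        }
    }

    public static void main(String[] args) {
        System.out.println("free lock probed in time: " + probe(new Object(), 2000));
        System.out.println("held lock probed in time: " + probeWhileHeld(new Object(), 200));
    }
}
```

A probe that does not return in time says only that the lock was unavailable for the whole window; it cannot tell a true deadlock from a very long critical section, which is why the timeout value matters.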

        boolean isOverdueLocked() {
            return (!mCompleted) && (SystemClock.uptimeMillis() > mStartTime + mWaitMax);
        }

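This checks whether the current schedule round has timed out, that is, whether it is still incomplete more than mWaitMax milliseconds after it started (60 seconds with the default timeout on non-debug builds).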

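        public int getCompletionStateLocked() {
            if (mCompleted) {
                // This schedule round completed successfully.
                return COMPLETED;
            } else {
                // Time elapsed since the check started.
                long latency = SystemClock.uptimeMillis() - mStartTime;
                // Less than mWaitMax/2 (30 seconds with the default 60-second
                // timeout): the check is still in progress, return WAITING.
                if (latency < mWaitMax/2) {
                    return WAITING;
                // At least mWaitMax/2 but less than mWaitMax: return WAITED_HALF.
                } else if (latency < mWaitMax) {
                    return WAITED_HALF;
                }
            }
            // Reaching this point means the check has run for at least
            // mWaitMax and has timed out.
            return OVERDUE;
        }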

This function returns the completion state of the current schedule round. Depending on the conditions, the state is COMPLETED, WAITING, WAITED_HALF, or OVERDUE (timed out).
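
To make the thresholds concrete, the state logic can be restated as a pure function of the elapsed time. This is a sketch rather than the framework code: CompletionStateSketch, the State enum, and the arbitrary waitMax of 1000 ms are assumptions for illustration, and isOverdue mirrors the comparison used in isOverdueLocked.

```java
// Hypothetical pure-function restatement of the state thresholds.
final class CompletionStateSketch {

    enum State { COMPLETED, WAITING, WAITED_HALF, OVERDUE }

    // Same branching as getCompletionStateLocked, with latency passed in explicitly.
    public static State stateOf(boolean completed, long latencyMillis, long waitMaxMillis) {
        if (completed) {
            return State.COMPLETED;
        }
        if (latencyMillis < waitMaxMillis / 2) {
            return State.WAITING;
        }
        if (latencyMillis < waitMaxMillis) {
            return State.WAITED_HALF;
        }
        return State.OVERDUE;
    }

    // Same comparison as isOverdueLocked: strictly greater than the maximum wait.
    public static boolean isOverdue(boolean completed, long latencyMillis, long waitMaxMillis) {
        return !completed && latencyMillis > waitMaxMillis;
    }

    public static void main(String[] args) {
        long waitMax = 1000; // arbitrary example value, not the framework default
        long[] samples = {0, 499, 500, 999, 1000, 1001};
        for (long latency : samples) {
            System.out.println(latency + " ms -> " + stateOf(false, latency, waitMax)
                    + ", isOverdue=" + isOverdue(false, latency, waitMax));
        }
    }
}
```

One detail that falls out of the two comparisons as written: at a latency exactly equal to the maximum wait, the state function already reports OVERDUE while the strict comparison in isOverdue is still false.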

        public Thread getThread() {
            return mHandler.getLooper().getThread();
        }

        public String getName() {
            return mName;
        }

These return the thread being checked (the thread of the Handler's Looper) and the name of this checker.

        String describeBlockedStateLocked() {
            if (mCurrentMonitor == null) {
                return "Blocked in handler on " + mName + " (" + getThread().getName() + ")";
            } else {
                return "Blocked in monitor " + mCurrentMonitor.getClass().getName()
                        + " on " + mName + " (" + getThread().getName() + ")";
            }
        }

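Taken on its own, this function returns the checker name and thread name (when mCurrentMonitor == null), or those plus the class name of the monitored object (when mCurrentMonitor != null). In practice it is called only after a timeout has been detected, so its purpose is to report which thread, or which monitor class, was blocked when the check timed out. Where it is called from is covered in detail in a later chapter.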

        /** Pause the HandlerChecker. */
        public void pauseLocked(String reason) {
            mPauseCount++;
            // Mark as completed, because there's a chance we called this after the watchdog
            // thread loop called Object#wait after 'WAITED_HALF'. In that case we want to ensure
            // the next call to #getCompletionStateLocked for this checker returns 'COMPLETED'
            mCompleted = true;
            Slog.i(TAG, "Pausing HandlerChecker: " + mName + " for reason: "
                    + reason + ". Pause count: " + mPauseCount);
        }

        /** Resume the HandlerChecker from the last {@link #pauseLocked}. */
        public void resumeLocked(String reason) {
            if (mPauseCount > 0) {
                mPauseCount--;
                Slog.i(TAG, "Resuming HandlerChecker: " + mName + " for reason: "
                        + reason + ". Pause count: " + mPauseCount);
            } else {
                Slog.wtf(TAG, "Already resumed HandlerChecker: " + mName);
            }
        }

These two complementary methods increment and decrement mPauseCount. As analyzed earlier, no real check is performed while mPauseCount is greater than 0.
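
Because the pause state is a counter rather than a boolean, pauses nest: checking resumes only after every pause has been matched by a resume. A minimal sketch of that behavior follows, assuming hypothetical names (PauseCountSketch, wouldSchedule) and ignoring the monitor and isPolling conditions that scheduleCheckLocked also evaluates.

```java
// Hypothetical model of the nested pause/resume counter.
final class PauseCountSketch {
    private int pauseCount;           // plays the role of mPauseCount
    private boolean completed = true; // plays the role of mCompleted

    // Like pauseLocked: bump the counter and mark any in-flight round as completed.
    public void pause() {
        pauseCount++;
        completed = true;
    }

    // Like resumeLocked: returns false for an unmatched resume (where the source logs via Slog.wtf).
    public boolean resume() {
        if (pauseCount > 0) {
            pauseCount--;
            return true;
        }
        return false;
    }

    // Simplified version of the pause test in scheduleCheckLocked.
    public boolean wouldSchedule() {
        return pauseCount == 0;
    }

    public static void main(String[] args) {
        PauseCountSketch checker = new PauseCountSketch();
        checker.pause();
        checker.pause();
        checker.resume();
        System.out.println("after 2 pauses, 1 resume: wouldSchedule=" + checker.wouldSchedule());
        checker.resume();
        System.out.println("after 2 pauses, 2 resumes: wouldSchedule=" + checker.wouldSchedule());
        System.out.println("extra resume accepted: " + checker.resume());
    }
}
```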

That completes the analysis of the HandlerChecker class.

End of this chapter.
