Android ANR analysis in 300 seconds: how ANRs are monitored and reported

ANR overview

1) First, ANR (Application Not Responding) means that an application has stopped responding. The Android system requires certain events to be completed within a fixed time window. If no effective response arrives within the predetermined time, or the response takes too long, an ANR results. ANR detection is built on the message processing mechanism: Android implements a sophisticated set of mechanisms at the system layer to discover ANRs, and the core principle is message scheduling plus timeout handling.

2) Second, the main body of the ANR mechanism is implemented at the system layer. All ANR-related messages are scheduled by the system process (system_server) and then dispatched to the application process, which does the actual processing. At the same time, the system process sets different timeout limits to track how each message is handled. If the application mishandles a message and the timeout expires, the system collects some state, such as CPU/IO usage and the function call stacks of processes, and reports to the user that the process is not responding (the ANR dialog).

3) Third, an ANR is essentially a performance problem. The ANR mechanism constrains the application's main thread: it requires the main thread to finish the most common operations (starting services, processing broadcasts, processing input) within a limited time. If processing times out, the main thread is considered to have lost the ability to respond to other operations. Time-consuming work on the main thread, such as intensive CPU computation, heavy IO, or complex layouts, reduces the responsiveness of the application.

What scenarios will cause ANR?

1. When an ANR occurs, AppNotRespondingDialog.show() is called to pop up a dialog box that prompts the user.

2. AppErrors.appNotResponding() is the only entry point that eventually pops up the ANR dialog box, so the ANR prompt is displayed only when this method is called. If this method is not called, there is no ANR prompt and there are no ANR-related logs or reports. Following its callers shows which scenarios cause an ANR. There are four:

(1) Service Timeout: a Service does not finish processing within a specific time

(2) BroadcastQueue Timeout: a BroadcastReceiver does not finish processing within a specific time

(3) ContentProvider Timeout: a content provider operation times out

(4) InputDispatching Timeout: a key or touch event gets no response within a specific time

ANR mechanism

The ANR mechanism can be divided into two parts:

**ANR monitoring mechanism:** Android has a separate monitoring mechanism for each ANR type (Broadcast, Service, InputEvent).

**ANR reporting mechanism:** After an ANR is detected, the ANR dialog box must be displayed and logs must be output (the process function call stacks at the time of the ANR, CPU usage, etc.).

The code of the ANR mechanism also spans several layers of Android:

App layer: the processing logic of the application's main thread;

Framework layer: the core of the ANR mechanism, mainly AMS, BroadcastQueue, ActiveServices, InputManagerService, InputMonitor, InputChannel, ProcessCpuTracker, etc.;

Native layer: InputDispatcher.cpp.

Provider timeouts are encountered relatively rarely and are not analyzed for now. For Broadcast, there are two points to make at present:

First: whether a broadcast is regular or ordered, the onReceive() calls of the final broadcast receivers are executed serially, which can be verified with a demo.

Second: by using a demo and adding logs to the framework, it has been verified that regular broadcasts are also covered by an ANR monitoring mechanism, whereas the article "ANR mechanism and problem analysis" holds that only serial (ordered) broadcasts are monitored. A later article will explain the broadcast sending and receiving process in detail and will also cover the Broadcast ANR monitoring mechanism. This article mainly discusses ANR monitoring using Service processing timeouts and input event dispatch timeouts as examples.

Service timeout monitoring mechanism

A Service runs on the main thread of the application. If a Service lifecycle callback runs for more than 20 seconds (200 seconds when the service executes in the background), an ANR is triggered.

When a Service ANR occurs, first check whether the Service lifecycle functions (onCreate(), onStartCommand(), etc.) contain time-consuming operations (complex computation, IO, and so on). If no problem can be found in the application's code logic, check the system state at the time (CPU usage, system service status, etc.) to determine whether the process that hit the ANR was affected by abnormal system behavior.
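One common way to keep lifecycle callbacks short is to hand the slow work to a worker thread. The sketch below is plain Java rather than Android code, and every name in it (OffloadSketch, onStartCommandLike, the gate latch) is hypothetical. It only illustrates that a callback can return immediately while the time-consuming part runs elsewhere.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a Service; not an Android class.
class OffloadSketch {
    // Single daemon worker thread so the JVM can exit even if shutdown() is never called.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "service-worker");
        t.setDaemon(true);
        return t;
    });

    // Plays the role of a lifecycle callback such as onStartCommand():
    // it queues the slow work and returns without waiting for it.
    CompletableFuture<String> onStartCommandLike(CountDownLatch gate) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                gate.await(); // stands in for slow IO or heavy computation
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return "interrupted";
            }
            return "done";
        }, worker);
    }

    void shutdown() {
        worker.shutdown();
    }

    public static void main(String[] args) {
        OffloadSketch service = new OffloadSketch();
        CountDownLatch gate = new CountDownLatch(1);
        CompletableFuture<String> result = service.onStartCommandLike(gate);
        System.out.println("callback returned, work finished: " + result.isDone());
        gate.countDown(); // let the slow work complete
        System.out.println(result.join());
        service.shutdown();
    }
}
```

The calling thread is never blocked by the slow step, which is the property the main thread needs in order to stay inside the timeout.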

How is a Service timeout detected? Android does it by posting a timed (delayed) message. The timed message is processed by the AMS message queue (the ActivityManager thread of system_server). AMS holds the context information of the running Service, so it is reasonable to place the timeout detection mechanism in AMS. Let's first pose two questions: **What is the Service startup process?** **How is a Service timeout monitored?**

The service monitoring mechanism is explained through these two questions. Once the service startup process is understood, the timeout monitoring mechanism is easier to analyze by following that process.

1. The service startup process is shown in the following figure:

(1) ActiveServices.realStartServiceLocked() creates the Service object through app.thread's scheduleCreateService(), which calls Service.onCreate(), and then calls sendServiceArgsLocked() to invoke other Service methods such as onStartCommand(). Both steps are inter-process communication. For the cross-process communication between the application and AMS, refer to material on communication between application processes and the system process.
(2) The above covers only the key steps of the Service startup process. What each method does in detail has to be checked in the code and is skipped here. Interested readers can refer to "Android development art exploration" and other related materials.

2. Service timeout monitoring mechanism: the monitoring mechanism can be found by following the service startup process.

(1) The main work of ActiveServices.realStartServiceLocked() is:

    private final void realStartServiceLocked(ServiceRecord r,
            ProcessRecord app, boolean execInFg) throws RemoteException {
        ...
        // Mainly sets up the ANR timeout; note that ANR monitoring starts
        // before the Service is actually started.
        bumpServiceExecutingLocked(r, execInFg, "create");
        // The start path calls scheduleCreateService(), which eventually
        // calls Service.onCreate().
        app.thread.scheduleCreateService(r, r.serviceInfo,
                ...);  // remaining arguments are truncated in the original excerpt
        // For binding, this method calls app.thread.scheduleBindService().
        requestServiceBindingsLocked(r, execInFg);
        // Invokes other Service methods such as onStartCommand(); also IPC.
        sendServiceArgsLocked(r, execInFg, true);
    }

(2) bumpServiceExecutingLocked() calls the scheduleServiceTimeoutLocked() method:

    void scheduleServiceTimeoutLocked(ProcessRecord proc) {
        if (proc.executingServices.size() == 0 || proc.thread == null) {
            return;
        }
        Message msg = mAm.mHandler.obtainMessage(
                ActivityManagerService.SERVICE_TIMEOUT_MSG);
        msg.obj = proc;
        // serviceDoneExecutingLocked() removes this SERVICE_TIMEOUT_MSG message.
        // If the message has still not been removed when the timeout expires,
        // ActiveServices.serviceTimeout() is executed.
        mAm.mHandler.sendMessageDelayed(msg,
                proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
        // Service executing in the foreground: SERVICE_TIMEOUT = 20s;
        // service executing in the background: SERVICE_BACKGROUND_TIMEOUT = 200s.
    }

(3) If serviceDoneExecutingLocked() has not removed the message within the specified time, the ActiveServices.serviceTimeout() method is called:

    void serviceTimeout(ProcessRecord proc) {
        ...
        final long maxTime =  now -
                  (proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
        ...
        // Look for a Service whose execution has timed out
        for (int i=proc.executingServices.size()-1; i>=0; i--) {
            ServiceRecord sr = proc.executingServices.valueAt(i);
            if (sr.executingStart < maxTime) {
                timeout = sr;
                break;
            }
           ...
        }
        ...
        // Check whether the process executing the timed-out Service is in the
        // list of recently run processes (mLruProcesses); if not, ignore this ANR
        if (timeout != null && mAm.mLruProcesses.contains(proc)) {
            anrMessage = "executing service " + timeout.shortName;
        }
        ...
        if (anrMessage != null) {
            // A timed-out service exists, so call appNotResponding() to report the ANR
            mAm.appNotResponding(proc, null, null, false, anrMessage);
        }
    }
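The cutoff arithmetic in serviceTimeout() can be tried in isolation. The following is a simplified sketch, not the framework code: the class TimedOutServiceSketch, the map of service names to start times, and the sample values are assumptions made for illustration, and it scans in map order rather than from the end of the list.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified, hypothetical model of the check in serviceTimeout().
class TimedOutServiceSketch {
    // Returns the first service whose callback started before (now - timeout),
    // or null if none has been executing for longer than the timeout.
    static String findTimedOut(Map<String, Long> executingStartMs, long nowMs, long timeoutMs) {
        final long maxTime = nowMs - timeoutMs; // anything started earlier has timed out
        for (Map.Entry<String, Long> entry : executingStartMs.entrySet()) {
            if (entry.getValue() < maxTime) {
                return entry.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Long> executing = new LinkedHashMap<>();
        executing.put(".FastService", 25_000L); // started 5s before "now"
        executing.put(".SlowService", 1_000L);  // started 29s before "now"
        // With now = 30s and the 20s foreground timeout, the cutoff is 10s.
        System.out.println(findTimedOut(executing, 30_000L, 20_000L));
    }
}
```

Note the strict comparison: a service that started exactly at the cutoff is not yet treated as timed out, matching the `sr.executingStart < maxTime` test in the excerpt.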

(4) The overall process of Service onCreate timeout monitoring is shown in the figure below

Timeout monitoring starts before the onCreate() lifecycle callback begins. If onCreate() does not complete within the specified time (because it performs time-consuming work), ActiveServices.serviceTimeout() is called to report the ANR. If onCreate() completes within the specified time, serviceDoneExecutingLocked() is called to remove the SERVICE_TIMEOUT_MSG message, which means Service.onCreate() did not cause an ANR. The Service is scheduled by AMS; using Handler and Looper, a timeout message is set up and handled by the AMS thread, and the whole timeout mechanism is implemented in the Java layer. That is the overall process of Service timeout monitoring.
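The "arm a delayed message, remove it on completion" pattern can be modeled without Android. The sketch below is an assumption-laden illustration: ServiceTimeoutSketch and its methods are hypothetical names, the message queue is a plain list, and time is a virtual clock that only moves when advanceTo() is called. Only the two timeout values come from the excerpt above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of "arm a delayed timeout message, remove it on completion".
class ServiceTimeoutSketch {
    static final long SERVICE_TIMEOUT_MS = 20_000;             // foreground value quoted in the article
    static final long SERVICE_BACKGROUND_TIMEOUT_MS = 200_000; // background value quoted in the article

    // One pending timeout message: which process it watches and when it fires.
    private static final class Pending {
        final String process;
        final long deadlineMs;

        Pending(String process, long deadlineMs) {
            this.process = process;
            this.deadlineMs = deadlineMs;
        }
    }

    private final List<Pending> queue = new ArrayList<>();
    private final List<String> reportedAnrs = new ArrayList<>();
    private long nowMs = 0; // virtual clock, advanced only by advanceTo()

    // Role of bumpServiceExecutingLocked(): arm the timeout before the callback is dispatched.
    void startExecuting(String process, boolean foreground) {
        long timeout = foreground ? SERVICE_TIMEOUT_MS : SERVICE_BACKGROUND_TIMEOUT_MS;
        queue.add(new Pending(process, nowMs + timeout));
    }

    // Role of serviceDoneExecutingLocked(): the callback finished, so remove the message.
    void doneExecuting(String process) {
        queue.removeIf(p -> p.process.equals(process));
    }

    // Advance the virtual clock and deliver every timeout message that is now due.
    void advanceTo(long timeMs) {
        nowMs = timeMs;
        Iterator<Pending> it = queue.iterator();
        while (it.hasNext()) {
            Pending p = it.next();
            if (p.deadlineMs <= nowMs) {
                it.remove();
                reportedAnrs.add("executing service in " + p.process); // role of serviceTimeout()
            }
        }
    }

    List<String> reportedAnrs() {
        return reportedAnrs;
    }
}
```

A process that calls doneExecuting() before its deadline never shows up in reportedAnrs(), while one that stays silent is reported once the clock reaches the deadline.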

Input event timeout monitoring

The application can receive input events (key presses, touches, trackball, etc.), and an ANR is triggered when processing is not completed within 5 seconds.

Again, two questions first: what path does an input event travel before it is dispatched to the application's interface? How is an input event processing timeout detected?

1.  Introduction to the Android input system: the overall process and the participants of the Android input system are shown in the figure below.

Simply put, the kernel writes raw events to the device node. InputReader, in its thread loop, continuously reads raw input events from EventHub, processes them, and puts the processed events into InputDispatcher's dispatch queue. InputDispatcher, in its own thread loop, takes events out of the dispatch queue, finds the appropriate window, and writes the event to that window's event receiving pipe.

The Looper of the window's event receiving thread takes the event out of the pipe and hands it to the window's event handler, which responds to the event. The key stages are: reading and processing raw input events; dispatching input events; and sending, receiving, and acknowledging input events. Input event dispatching is the process by which InputDispatcher continuously takes events from the dispatch queue and finds a suitable window for each. Input event sending is the process by which InputDispatcher sends events to the window through a Connection object.

Cross-process communication between InputDispatcher and a window is done mainly through InputChannel. After InputDispatcher and the window establish a connection through the InputChannel, events can be sent, received, and acknowledged. The main process of sending and receiving input events is shown in the figure:

After an input event is injected into the dispatch queue, the dispatch thread is woken up; each cycle of the dispatch thread is performed by InputDispatcher::dispatchOnce(). After InputDispatcher writes the event to the InputChannel as an InputMessage, the Looper on the window side is woken up, NativeInputReceiver::handleEvent() starts receiving the input event, and the event is dispatched to the user interface from InputEventReceiver. This is only the general flow of input events; for more detail, refer to related materials. With the general flow of the input system understood, we can analyze the timeout monitoring mechanism for input events.
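The send, receive, and acknowledge cycle described above can be sketched with three in-memory queues. This is a deliberately simplified, single-threaded model: InputConnectionSketch and its methods are hypothetical names, the "channel" is a plain queue rather than a real InputChannel, and the outbound/wait queue terminology is borrowed from the dispatcher log messages quoted later in the article.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical single-threaded model of one dispatcher-to-window connection.
class InputConnectionSketch {
    private final Deque<String> outboundQueue = new ArrayDeque<>(); // queued, not yet written to the channel
    private final Deque<String> channel = new ArrayDeque<>();       // stands in for the InputChannel
    private final Deque<String> waitQueue = new ArrayDeque<>();     // written, awaiting the window's "finished" reply

    // Dispatcher side: an event has been assigned to this window.
    void enqueue(String event) {
        outboundQueue.addLast(event);
    }

    // Dispatcher side: write every outbound event to the channel and
    // remember it as unacknowledged.
    void publishAll() {
        while (!outboundQueue.isEmpty()) {
            String event = outboundQueue.removeFirst();
            channel.addLast(event);
            waitQueue.addLast(event);
        }
    }

    // Window side: take one event from the channel, handle it, and acknowledge it.
    // Returns the handled event, or null if the channel is empty.
    String receiveAndFinish() {
        String event = channel.pollFirst();
        if (event != null) {
            waitQueue.remove(event); // the "finished" feedback clears the wait queue entry
        }
        return event;
    }

    int waitQueueLength() {
        return waitQueue.size();
    }

    public static void main(String[] args) {
        InputConnectionSketch connection = new InputConnectionSketch();
        connection.enqueue("key-down");
        connection.enqueue("key-up");
        connection.publishAll();
        System.out.println("unacknowledged: " + connection.waitQueueLength());
        System.out.println("handled: " + connection.receiveAndFinish());
        System.out.println("unacknowledged: " + connection.waitQueueLength());
    }
}
```

If the window side never calls receiveAndFinish(), for example because its main thread is blocked, the wait queue simply stops shrinking, which is the condition the timeout monitoring looks for.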

2.  Input event timeout monitoring: the overall process of key event timeout monitoring is shown in the figure below.

(1) InputDispatcher::dispatchOnceInnerLocked():
Selects a different handler according to the event type: InputDispatcher::dispatchKeyLocked() or InputDispatcher::dispatchMotionLocked(). We use key event timeout monitoring as the example.
(2) findFocusedWindowTargetsLocked() calls checkWindowReadyForMoreInputLocked(), which checks whether the window is able to receive new input events. A series of scenarios can prevent an event from being dispatched further:

Scenario 1: The window is paused and cannot process input events: "Waiting because the [targetType] window is paused."

Scenario 2: The window has not been registered with InputDispatcher, so events cannot be dispatched to it: "Waiting because the [targetType] window's input channel is not registered with the input dispatcher. The window may be in the process of being removed."

Scenario 3: The connection between the window and InputDispatcher has been interrupted, that is, the InputChannel is not working properly: "Waiting because the [targetType] window's input connection is [status]. The window may be in the process of being removed."

Scenario 4: The InputChannel is saturated and cannot take new events: "Waiting because the [targetType] window's input channel is full. Outbound queue length: %d. Wait queue length: %d."

Scenario 5: For key events (KeyEvent), dispatch must wait until the previous events have been processed: "Waiting to send key event because the [targetType] window has not finished processing all of the input events that were previously delivered to it. Outbound queue length: %d. Wait queue length: %d."

Scenario 6: Touch events (TouchEvent) can normally be dispatched to the current window immediately, because they occur in the window currently visible to the user. There is one exception: if the application has too many input events still waiting to be processed, which can lead to an ANR, touch events have to queue up as well: "Waiting to send non-key event because the %s window has not finished processing certain input events that were delivered to it over %0.1fms ago. Wait queue length: %d. Wait queue head age: %0.1fms."

The messages above are the ANR reasons that often appear in logs.
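A reduced model of this readiness check helps show how key and touch events are treated differently. The sketch below covers scenarios 1, 5, and 6 only. WindowReadinessSketch, its parameters, and the short reason strings are hypothetical, and the maximum age of the wait queue head is passed in as a parameter because the article does not state the threshold.

```java
import java.util.Deque;

// Hypothetical, reduced model of the readiness check; covers scenarios 1, 5 and 6 only.
class WindowReadinessSketch {
    // Returns null when the event may be dispatched now, otherwise a short reason to wait.
    static String reasonToWait(boolean windowPaused, boolean isKeyEvent,
            int outboundQueueLength, Deque<Long> waitQueueDeliveryTimesMs,
            long nowMs, long maxHeadAgeMs) {
        if (windowPaused) {
            return "window is paused"; // scenario 1
        }
        if (isKeyEvent) {
            // Keys wait until everything delivered earlier has been handled.
            if (outboundQueueLength > 0 || !waitQueueDeliveryTimesMs.isEmpty()) {
                return "previous events not finished"; // scenario 5
            }
        } else {
            // Touch events go through unless the oldest unacknowledged event is too old.
            Long headDeliveryTimeMs = waitQueueDeliveryTimesMs.peekFirst();
            if (headDeliveryTimeMs != null && nowMs - headDeliveryTimeMs >= maxHeadAgeMs) {
                return "wait queue head too old"; // scenario 6
            }
        }
        return null;
    }
}
```

With one unacknowledged event delivered 100 ms ago and a 500 ms threshold, a key event has to wait while a touch event is let through; once that event is 600 ms old, the touch event waits too.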

(3) The 5s limit on event dispatch is defined in InputDispatcher.cpp. In InputDispatcher::handleTargetsNotReadyLocked(), if the event has not been dispatched within 5s, InputDispatcher::onANRLocked() is called to tell the user that the application has an ANR:

    // The default dispatching timeout is 5s
    const nsecs_t DEFAULT_INPUT_DISPATCHING_TIMEOUT = 5000 * 1000000LL;
    int32_t InputDispatcher::handleTargetsNotReadyLocked(nsecs_t currentTime,
            const EventEntry* entry,
            const sp<InputApplicationHandle>& applicationHandle,
            const sp<InputWindowHandle>& windowHandle,
            nsecs_t* nextWakeupTime, const char* reason) {
        // 1. There is neither a focused window nor a focused application
        if (applicationHandle == NULL && windowHandle == NULL) {
            ...
        } else {
            // 2. There is a focused window or a focused application
            if (mInputTargetWaitCause != INPUT_TARGET_WAIT_CAUSE_APPLICATION_NOT_READY) {
                // Get the timeout to wait for
                if (windowHandle != NULL) {
                    // A focused window exists; DEFAULT_INPUT_DISPATCHING_TIMEOUT is 5s
                    timeout = windowHandle->getDispatchingTimeout(DEFAULT_INPUT_DISPATCHING_TIMEOUT);
                } else if (applicationHandle != NULL) {
                    // A focused application exists, so use its dispatching timeout
                    timeout = applicationHandle->getDispatchingTimeout(
                            DEFAULT_INPUT_DISPATCHING_TIMEOUT);
                } else {
                    // Otherwise use the default dispatching timeout of 5s
                    timeout = DEFAULT_INPUT_DISPATCHING_TIMEOUT;
                }
                ...
            }
        }
        // If the current time has reached the input target wait deadline, i.e. the
        // 5s timeout has elapsed, enter the ANR handling path.
        // currentTime is the current system time; mInputTargetWaitTimeoutTime is a
        // member variable of InputDispatcher that holds the deadline.
        if (currentTime >= mInputTargetWaitTimeoutTime) {
            // Invoke the ANR handling path
            onANRLocked(currentTime, applicationHandle, windowHandle,
                    entry->eventTime, mInputTargetWaitStartTime, reason);
            // Report that the event is still pending
            return INPUT_EVENT_INJECTION_PENDING;
        }
        ...
    }
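The deadline comparison is easier to follow with the elided bookkeeping written out. The sketch below is an assumption, not the native code: the excerpt does not show where mInputTargetWaitTimeoutTime is set, so the model fixes the deadline as "time of the first wait plus the timeout". InputWaitSketch and its method names are hypothetical; times are in nanoseconds, as in the excerpt.

```java
// Hypothetical model of the wait-deadline check; times are in nanoseconds as in the excerpt.
class InputWaitSketch {
    // 5000 ms * 1,000,000 ns/ms = 5,000,000,000 ns = 5 s
    static final long DEFAULT_INPUT_DISPATCHING_TIMEOUT_NS = 5000L * 1_000_000L;

    private boolean waiting = false;
    private long waitStartTimeNs;   // role of mInputTargetWaitStartTime
    private long waitTimeoutTimeNs; // role of mInputTargetWaitTimeoutTime

    // Called each time the target is found not ready.
    // Returns true when the ANR path should run.
    boolean targetsNotReady(long currentTimeNs, long timeoutNs) {
        if (!waiting) {
            // Assumption: the deadline is fixed once, when the wait begins.
            waiting = true;
            waitStartTimeNs = currentTimeNs;
            waitTimeoutTimeNs = currentTimeNs + timeoutNs;
        }
        return currentTimeNs >= waitTimeoutTimeNs;
    }

    // Called when the target becomes ready again; the next wait gets a fresh deadline.
    void resetWait() {
        waiting = false;
    }

    // How long the current wait has lasted so far.
    long waitedNs(long currentTimeNs) {
        return currentTimeNs - waitStartTimeNs;
    }
}
```

Repeated calls during the same wait do not push the deadline back, so a window that stays unready is reported once 5 s have passed since the first failed attempt.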

(4) When the application's main thread is stuck, tapping other components of the application also gets no response, because event dispatch is serial: until the previous event has been processed, the next event is not processed.

(5) If Activity.onCreate() performs a time-consuming operation, no ANR occurs no matter what the user does, because the input-event monitoring mechanism has not been set up yet: the InputChannel has not been established at this point, the window cannot respond to input events, and InputDispatcher cannot yet send events to the application window. Since the ANR monitoring mechanism is not in place, no ANR is reported.

(6) Input events are scheduled by InputDispatcher. Pending input events all enter a queue and wait, and a wait-timeout check is applied to them. This timeout mechanism is implemented in the native layer. That is the input event ANR monitoring mechanism; for the specific logic, refer to the relevant source code.

ANR reporting mechanism

No matter which type of ANR occurs, AppErrors.appNotResponding() is eventually called: all roads lead to the same place. The job of this method is to report to the user or developer that an ANR has occurred. The final result is twofold: a dialog box pops up to tell the user that a program is currently not responding, and a large amount of ANR-related log output is produced to help developers solve the problem.

    final void appNotResponding(ProcessRecord app, ActivityRecord activity,
            ActivityRecord parent, boolean aboveSystem, final String annotation) {
        ...
        if (ActivityManagerService.MONITOR_CPU_USAGE) {
            // 1. Update CPU usage. This is the first CPU sample for the ANR;
            // the sampled data is stored in mProcessStats.
            mService.updateCpuStatsNow();
        }
        // Record the ANR in the EventLog
        EventLog.writeEvent(EventLogTags.AM_ANR, app.userId, app.pid,
                app.processName, app.info.flags, annotation);
        // Output the ANR to the main log.
        StringBuilder info = new StringBuilder();
        info.setLength(0);
        info.append("ANR in ").append(app.processName);
        if (activity != null && activity.shortComponentName != null) {
            info.append(" (").append(activity.shortComponentName).append(")");
        }
        info.append("\n");
        info.append("PID: ").append(app.pid).append("\n");
        if (annotation != null) {
            info.append("Reason: ").append(annotation).append("\n");
        }
        if (parent != null && parent != activity) {
            info.append("Parent: ").append(parent.shortComponentName).append("\n");
        }
        // 3. Print the call stacks; implemented by dumpStackTraces()
        File tracesFile = ActivityManagerService.dumpStackTraces(
                true, firstPids,
                (isSilentANR) ? null : processCpuTracker,
                (isSilentANR) ? null : lastPids,
                nativePids);

        String cpuInfo = null;
        // MONITOR_CPU_USAGE is true by default
        if (ActivityManagerService.MONITOR_CPU_USAGE) {
            // 4. Update CPU usage. This is the second CPU sample for the ANR;
            // the two samples correspond to CPU usage before and after the ANR.
            mService.updateCpuStatsNow();
            synchronized (mService.mProcessCpuTracker) {
                // Output each process's CPU usage for the period before the ANR
                cpuInfo = mService.mProcessCpuTracker.printCurrentState(anrTime);
            }
            // Output the CPU load
            info.append(processCpuTracker.printCurrentLoad());
            info.append(cpuInfo);
        }

        // Output each process's CPU usage for the period after the ANR
        info.append(processCpuTracker.printCurrentState(anrTime));
        // This prints the reason for the ANR, e.g. one of the input-event scenarios
        Slog.e(TAG, info.toString());
        if (tracesFile == null) {
            // There is no trace file, so dump (only) the alleged culprit's threads to the log
            // Send signal 3 (SIGNAL_QUIT) to dump the stack information
            Process.sendSignal(app.pid, Process.SIGNAL_QUIT);
        }

        // Also write the ANR information to DropBox
        mService.addErrorToDropBox("anr", app, app.processName, activity, parent, annotation,
                cpuInfo, tracesFile, null);
        // Bring up the infamous App Not Responding dialog
        // 5. Show the ANR dialog. Post a SHOW_NOT_RESPONDING_UI_MSG message;
        // the UI handler (mUiHandler) processes it and shows AppNotRespondingDialog
        // to tell the user that an ANR occurred.
        Message msg = Message.obtain();
        HashMap<String, Object> map = new HashMap<String, Object>();
        msg.what = ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG;
        msg.obj = map;
        msg.arg1 = aboveSystem ? 1 : 0;
        map.put("app", app);
        if (activity != null) {
            map.put("activity", activity);
        }

        mService.mUiHandler.sendMessage(msg);
    }

In addition to the main logic, several kinds of logs are output when an ANR occurs:

event log: search for the "am_anr" keyword to find the application in which the ANR occurred;

main log: search for the "ANR in" keyword to find the ANR information; the surrounding log context contains the CPU usage;

dropbox: search for the "anr" type to find the ANR information;

traces: the function call stacks of each process at the time of the ANR.

At this point the ANR has been reported; what remains is to analyze it. ANR analysis usually starts from the CPU usage in the main log and the function call stacks in traces. Therefore the updateCpuStatsNow() method, which updates CPU usage information, and the dumpStackTraces() method, which prints the function stacks, are the key to how the system reports an ANR. For the analysis of specific ANR problems, refer to related materials.
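Since analysis usually begins with the "ANR in" block of the main log, it can help to pull its fields out programmatically. The sketch below rebuilds the header the same way the StringBuilder code in appNotResponding() does and then parses it back. AnrLogHeaderSketch, its field keys, and the sample values are hypothetical, and the parser assumes that any logcat prefix (timestamp, tag) has already been stripped from each line.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for the "ANR in ..." header of the main log.
class AnrLogHeaderSketch {
    // Builds the header in the same order as the excerpt: process, PID, reason.
    static String build(String processName, String component, int pid, String reason) {
        StringBuilder info = new StringBuilder();
        info.append("ANR in ").append(processName);
        if (component != null) {
            info.append(" (").append(component).append(")");
        }
        info.append("\n");
        info.append("PID: ").append(pid).append("\n");
        if (reason != null) {
            info.append("Reason: ").append(reason).append("\n");
        }
        return info.toString();
    }

    // Extracts process, component, pid and reason from a header; absent fields are omitted.
    static Map<String, String> parse(String header) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : header.split("\n")) {
            if (line.startsWith("ANR in ")) {
                String rest = line.substring("ANR in ".length());
                int paren = rest.indexOf(" (");
                if (paren >= 0 && rest.endsWith(")")) {
                    fields.put("process", rest.substring(0, paren));
                    fields.put("component", rest.substring(paren + 2, rest.length() - 1));
                } else {
                    fields.put("process", rest);
                }
            } else if (line.startsWith("PID: ")) {
                fields.put("pid", line.substring("PID: ".length()));
            } else if (line.startsWith("Reason: ")) {
                fields.put("reason", line.substring("Reason: ".length()));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String header = build("com.example.app", "com.example.app/.MainActivity", 1234,
                "executing service com.example.app/.SlowService");
        System.out.println(parse(header));
    }
}
```

The "Reason" field is the annotation passed to appNotResponding(), so for a Service timeout it carries the "executing service ..." message built in serviceTimeout().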

Summary

1. ANR monitoring mechanism: we first walked through the general workflow of Service and input events, then used the source code of the two monitoring mechanisms, Service and InputEvent, to analyze how Android discovers each type of ANR. Timeout detection is planted at the points where services are started and input events are dispatched.

2. ANR reporting mechanism: we analyzed how Android outputs ANR logs. When an ANR is discovered, the two most important log outputs are the CPU usage and the function call stacks of the processes. These two kinds of logs are the main tools for solving ANR problems.

3. The core principle of ANR monitoring is message scheduling and timeout handling.

4. Only scenarios covered by ANR monitoring produce an ANR report and an ANR prompt dialog.

References

ANR mechanism and problem analysis

Understanding the triggering principle of Android ANR

In-depth understanding of Android, volume three (Android input system)

Android development art exploration

Android source code

Finally, thanks to the authors of the articles referenced here.

Original link: https://www.jianshu.com/p/ad1a84b6ec69


Origin blog.csdn.net/Android725/article/details/105534876