Android uncaught exception mechanism

1. Uncaught exceptions in the Framework layer
2. Avoiding the crash dialog for Framework-layer uncaught exceptions
3. Uncaught exceptions in the Native layer
4. How Native-layer crash collection works
5. Avoiding the crash dialog for Native-layer uncaught exceptions

1. Uncaught exceptions in the Framework layer

Two conclusions up front:
① Once the exception is passed to the system, the process and its process group are killed, no matter which thread threw it.
② If the exception is not passed to the system, an uncaught exception on the main thread still kills the process, but one on a child thread does not.

When a process starts through ZygoteInit.main, RuntimeInit.commonInit is called, and that method contains this line:

Thread.setDefaultUncaughtExceptionHandler(new UncaughtHandler());

In other words, the system installs a default UncaughtExceptionHandler for us when the process starts. Here is how that class is implemented:
[-> RuntimeInit.java]

private static class UncaughtHandler implements Thread.UncaughtExceptionHandler {
    public void uncaughtException(Thread t, Throwable e) {
        try {
            // Make sure crash handling is never re-entered
            if (mCrashing) return;
            mCrashing = true;
            ...
            // Log the exception and its stack trace
            StringBuilder message = new StringBuilder();
            message.append("FATAL EXCEPTION: ").append(t.getName()).append("\n");
            final String processName = ActivityThread.currentProcessName();
            if (processName != null) {
                message.append("Process: ").append(processName).append(", ");
            }
            message.append("PID: ").append(Process.myPid());
            Clog_e(TAG, message.toString(), e);

            // Pass the exception to the system service, i.e. hand it over to AMS
            ActivityManagerNative.getDefault().handleApplicationCrash(
                    mApplicationObject, new ApplicationErrorReport.CrashInfo(e));
        } catch (Throwable t2) {
            ...
        } finally {
            // Kill our own process and exit
            Process.killProcess(Process.myPid());
            System.exit(10);
        }
    }
}

handleApplicationCrash eventually reaches handleApplicationCrashInner. Here is how AMS handles the exception:
[-> ActivityManagerService.java]

void handleApplicationCrashInner(String eventType, ProcessRecord r, String processName,
        ApplicationErrorReport.CrashInfo crashInfo) {
    // Write the crash to the event log
    EventLog.writeEvent(EventLogTags.AM_CRASH,...);
    // Add the error report to DropBox
    addErrorToDropBox(eventType, r, processName, null, null, null, null, null, crashInfo);
    // Enter the actual crash-handling flow
    crashApplication(r, crashInfo);
}

[-> ActivityManagerService.java]

private void crashApplication(ProcessRecord r, ApplicationErrorReport.CrashInfo crashInfo) {
    ...
    // makeAppCrashingLocked kills the process and its process group and removes
    // the process's services, windows and so on.
    if (r == null || !makeAppCrashingLocked(r, shortMsg, longMsg, stackTrace)) {
        Binder.restoreCallingIdentity(origId);
        return;
    }
    Message msg = Message.obtain();
    msg.what = SHOW_ERROR_MSG;
    HashMap data = new HashMap();
    data.put("result", result);
    data.put("app", r);
    msg.obj = data;
    // Post SHOW_ERROR_MSG: this shows the crash dialog and waits for the user's choice
    mUiHandler.sendMessage(msg);
    ...
}

This is the path by which an app crash, i.e. an uncaught exception, ends in the system crash dialog. The full flow is fairly involved; later steps also check, for example, whether the crashing app is a system app and apply different policies accordingly. For details, see: http://gityuan.com/2016/06/24/app-crash/

2. Avoiding the crash dialog for Framework-layer uncaught exceptions

There are two approaches. The first is to install our own UncaughtExceptionHandler and never pass the exception on to the system. However, to avoid overriding handlers installed by others, third-party SDKs generally forward the exception to the previous handler by default, so it may reach the system anyway. The second approach targets this line in the default handler that the system installs for us:

ActivityManagerNative.getDefault().handleApplicationCrash(
                    mApplicationObject, new ApplicationErrorReport.CrashInfo(e));

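So we can obtain AMN (ActivityManagerNative), hook its handleApplicationCrash method, and simply not send the exception to the system service; then no dialog appears. Note that this relies on hidden members (ActivityManagerNative.gDefault and android.util.Singleton.mInstance): from Android 8.0 gDefault is gone and ActivityManagerNative.getDefault() delegates to ActivityManager.getService(), so the code below only applies to earlier versions.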

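import android.util.Log;

import java.lang.reflect.Field;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Example tag; android.util.Log stands in for the original LoggerUtilsKt.logD helper
private static final String TAG = "AmsHook";

public static void hookAms() throws Exception {
    Class<?> amnClass = Class.forName("android.app.ActivityManagerNative");
    // getDefault() makes the singleton create and cache the real IActivityManager
    Method getDefaultMethod = amnClass.getMethod("getDefault");
    final Object realActivityManager = getDefaultMethod.invoke(null);

    // gDefault is the Singleton<IActivityManager> holding that cached instance
    Field gDefaultField = amnClass.getDeclaredField("gDefault");
    gDefaultField.setAccessible(true);
    Object gDefault = gDefaultField.get(null);

    Class<?> singletonClass = Class.forName("android.util.Singleton");
    Field mInstanceField = singletonClass.getDeclaredField("mInstance");
    mInstanceField.setAccessible(true);

    Object proxyInstance = Proxy.newProxyInstance(Thread.currentThread().getContextClassLoader(),
            realActivityManager.getClass().getInterfaces(),
            new InvocationHandler() {
                @Override
                public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                    if ("handleApplicationCrash".equals(method.getName())) {
                        // Swallow the report: AMS never sees the crash, so no dialog
                        Log.d(TAG, "handleApplicationCrash invoke");
                        return null;
                    }
                    try {
                        // Forward every other call to the real IActivityManager
                        return method.invoke(realActivityManager, args);
                    } catch (InvocationTargetException e) {
                        // Rethrow the real exception (e.g. RemoteException), not the wrapper
                        throw e.getCause();
                    }
                }
            });
    // Replace the cached AMS proxy with ours
    mInstanceField.set(gDefault, proxyInstance);
    Log.d(TAG, "hook finish");
}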

Some device manufacturers also modify this code to suppress the dialog, and starting with Android P the dialog is no longer shown by default.

3. Uncaught exceptions in the Native layer

Native exception handling flow:
exception occurs -> the kernel sends a signal -> the process's signal handler catches it -> the crash information is sent to a system service (debuggerd) -> debuggerd processes it -> the result is forwarded to AMS.
As in the Framework layer, the system installs a default signal-handling mechanism in native code for us:
[-> linker/debugger.cpp]

__LIBC_HIDDEN__ void debuggerd_init() {
  struct sigaction action;
  memset(&action, 0, sizeof(action));
  sigemptyset(&action.sa_mask);
  // The function that receives the signal
  action.sa_sigaction = debuggerd_signal_handler;
  action.sa_flags = SA_RESTART | SA_SIGINFO;
  // Use the alternate signal stack (if one is available) so that stack overflows can be caught
  action.sa_flags |= SA_ONSTACK;
  sigaction(SIGABRT, &action, nullptr);
  sigaction(SIGBUS, &action, nullptr);
  sigaction(SIGFPE, &action, nullptr);
  sigaction(SIGILL, &action, nullptr);
  sigaction(SIGPIPE, &action, nullptr);
  sigaction(SIGSEGV, &action, nullptr);
#if defined(SIGSTKFLT)
  sigaction(SIGSTKFLT, &action, nullptr);
#endif
  sigaction(SIGTRAP, &action, nullptr);
}
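A note on SA_ONSTACK: the flag only helps if the thread has an alternate signal stack registered with sigaltstack(). After a stack overflow the normal stack is exhausted, so a handler running on it would fault again. The following is a minimal sketch, assuming Linux/POSIX, of registering such a stack and installing one SA_SIGINFO handler for a list of signals in the same way debuggerd_init does. All names (demo_handler, g_last_signal, install_alt_stack, install_handlers) are hypothetical, and the sketch is exercised with SIGUSR1 rather than a real crash signal.

```cpp
#include <signal.h>
#include <stdlib.h>
#include <string.h>

// Set by the hypothetical demo handler so we can observe that it ran.
static volatile sig_atomic_t g_last_signal = 0;

static void demo_handler(int sig, siginfo_t* info, void* ucontext) {
  (void)info;
  (void)ucontext;
  g_last_signal = sig;  // a crash handler would log/report here instead
}

// Register an alternate signal stack for the *calling thread*; without one,
// SA_ONSTACK has no effect and a stack overflow cannot be handled.
bool install_alt_stack() {
  stack_t ss;
  memset(&ss, 0, sizeof(ss));
  ss.ss_size = SIGSTKSZ;          // handlers that unwind stacks usually want more
  ss.ss_sp = malloc(ss.ss_size);  // must stay alive as long as the thread does
  if (ss.ss_sp == nullptr) return false;
  ss.ss_flags = 0;
  if (sigaltstack(&ss, nullptr) != 0) {
    free(ss.ss_sp);
    return false;
  }
  return true;
}

// Install demo_handler for every signal in the list, with the same flags
// debuggerd_init uses (SA_SIGINFO so the handler receives siginfo_t).
bool install_handlers(const int* signals, int count) {
  struct sigaction action;
  memset(&action, 0, sizeof(action));
  sigemptyset(&action.sa_mask);
  action.sa_sigaction = demo_handler;
  action.sa_flags = SA_RESTART | SA_SIGINFO | SA_ONSTACK;
  for (int i = 0; i < count; ++i) {
    if (sigaction(signals[i], &action, nullptr) != 0) return false;
  }
  return true;
}
```

In a real crash handler the list would be SIGSEGV, SIGABRT and the rest, and install_alt_stack() would have to run on every thread whose crashes you want to catch, because alternate signal stacks are per thread.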

When the kernel delivers one of these signals, this function handles it:
[-> linker/debugger.cpp]

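static void debuggerd_signal_handler(int signal_number, siginfo_t* info, void*) {
  if (!have_siginfo(signal_number)) {
    info = nullptr; // SA_SIGINFO was unexpectedly cleared, so info is undefined
  }
  // Log a brief summary of the signal
  log_signal_summary(signal_number, info);
  // Connect to debuggerd over a socket. This is the key step: it is what
  // sends the crash information to the debuggerd system service.
  send_debuggerd_packet(info);
  // Reset the handler to SIG_DFL (the default action)
  signal(signal_number, SIG_DFL);

  switch (signal_number) {
    // These may have been sent explicitly (abort(), kill()), so they will not
    // necessarily recur when the handler returns; re-send them to this thread
    // so that the default action takes effect.
    case SIGABRT:
    case SIGFPE:
    case SIGPIPE:
#if defined(SIGSTKFLT)
    case SIGSTKFLT:
#endif
    case SIGTRAP:
      tgkill(getpid(), gettid(), signal_number);
      break;
    default:    // SIGILL, SIGBUS, SIGSEGV
      // Hardware faults: returning re-executes the faulting instruction,
      // which faults again and is now handled by SIG_DFL.
      break;
  }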
}
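The switch at the end decides how the default action gets to run once SIG_DFL is back in place. Below is a minimal sketch of the same decision, assuming Linux/POSIX; needs_explicit_reraise and finish_with_default_action are hypothetical names, and raise() is used as a portable stand-in for tgkill(getpid(), gettid(), sig), since both target the calling thread.

```cpp
#include <signal.h>

// Mirrors the switch in debuggerd_signal_handler: true for signals that
// must be sent again, false for faults that recur on their own.
bool needs_explicit_reraise(int sig) {
  switch (sig) {
    case SIGABRT:
    case SIGFPE:
    case SIGPIPE:
#if defined(SIGSTKFLT)
    case SIGSTKFLT:
#endif
    case SIGTRAP:
      return true;
    default:  // SIGILL, SIGBUS, SIGSEGV, and anything else
      return false;
  }
}

// Call at the end of a handler, after the crash has been reported:
// restore the default action, then make sure it actually runs.
void finish_with_default_action(int sig) {
  signal(sig, SIG_DFL);
  if (needs_explicit_reraise(sig)) {
    raise(sig);  // pending until the handler returns, then SIG_DFL applies
  }
  // Otherwise just return: the faulting instruction runs again and traps.
}
```

The signal being handled is blocked while its handler runs (unless SA_NODEFER is set), so the re-raised signal stays pending and takes effect as soon as the handler returns.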

[-> linker/debugger.cpp]

static void send_debuggerd_packet(siginfo_t* info) {
  ...
  // Open a socket connection to debuggerd
  int s = socket_abstract_client(DEBUGGER_SOCKET_NAME, SOCK_STREAM | SOCK_CLOEXEC);
  ...
  debugger_msg_t msg;
  msg.action = DEBUGGER_ACTION_CRASH;
  msg.tid = gettid();
  msg.abort_msg_address = reinterpret_cast<uintptr_t>(g_abort_message);
  msg.original_si_code = (info != nullptr) ? info->si_code : 0;
  // Send the DEBUGGER_ACTION_CRASH message to the debuggerd server
  ret = TEMP_FAILURE_RETRY(write(s, &msg, sizeof(msg)));
  if (ret == sizeof(msg)) {
    char debuggerd_ack;
    // Block until the debuggerd server replies
    ret = TEMP_FAILURE_RETRY(read(s, &debuggerd_ack, 1));
    int saved_errno = errno;
    notify_gdb_of_libraries();
    errno = saved_errno;
  }
  close(s);
}
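send_debuggerd_packet follows a simple request/acknowledge protocol: write one fixed-size message, then block on a one-byte read until the server replies. The sketch below shows that handshake on an AF_UNIX socket, assuming Linux/POSIX; crash_msg is a simplified, hypothetical stand-in for debugger_msg_t, and retry_on_eintr plays the role of TEMP_FAILURE_RETRY. Like the original, it assumes the small message arrives in a single read.

```cpp
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical, simplified message; the real debugger_msg_t has more fields.
struct crash_msg {
  int action;
  pid_t tid;
};

// Same idea as TEMP_FAILURE_RETRY: repeat a call interrupted by a signal.
template <typename F>
static ssize_t retry_on_eintr(F op) {
  ssize_t r;
  do {
    r = op();
  } while (r == -1 && errno == EINTR);
  return r;
}

// Client (crashing process): send the whole request in one write.
bool send_request(int fd, const crash_msg& msg) {
  return retry_on_eintr([&] { return write(fd, &msg, sizeof(msg)); }) ==
         static_cast<ssize_t>(sizeof(msg));
}

// Client: block until the server answers with a single ack byte.
bool wait_for_ack(int fd) {
  char ack;
  return retry_on_eintr([&] { return read(fd, &ack, 1); }) == 1;
}

// Server (debuggerd side): read one request, then acknowledge it.
bool receive_and_ack(int fd, crash_msg* out) {
  if (retry_on_eintr([&] { return read(fd, out, sizeof(*out)); }) !=
      static_cast<ssize_t>(sizeof(*out))) {
    return false;
  }
  const char ack = 0;
  return retry_on_eintr([&] { return write(fd, &ack, 1); }) == 1;
}
```

With socketpair() both ends can be driven from one thread: call send_request on one end and receive_and_ack on the other, after which wait_for_ack returns once the ack byte has arrived.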

Once the data reaches the debuggerd server, it goes through a fairly involved series of steps that are skipped here for space. For details, see: http://gityuan.com/2016/06/25/android-native-crash/
When debuggerd has finished, it forwards the information to AMS, again over a socket. AMS starts a thread that listens for native crashes via startObservingNativeCrashes, and that thread waits for the messages debuggerd sends:
[-> NativeCrashListener.java]

public void run() {
    final byte[] ackSignal = new byte[1];
    {
        // DEBUGGERD_SOCKET_PATH = "/data/system/ndebugsocket"
        File socketFile = new File(DEBUGGERD_SOCKET_PATH);
        // Remove a stale socket file left over from a previous run
        if (socketFile.exists()) {
            socketFile.delete();
        }
    }

    try {
        FileDescriptor serverFd = Os.socket(AF_UNIX, SOCK_STREAM, 0);
        // Create the server-side socket
        final UnixSocketAddress sockAddr = UnixSocketAddress.createFileSystem(
                DEBUGGERD_SOCKET_PATH);
        Os.bind(serverFd, sockAddr);
        Os.listen(serverFd, 1);

        while (true) {
            FileDescriptor peerFd = null;
            try {
                // Wait for debuggerd to connect; peerFd is the connection to debuggerd
                peerFd = Os.accept(serverFd, null /* peerAddress */);
                if (peerFd != null) {
                    // Only root (uid 0) is allowed to talk over this socket
                    StructUcred credentials =
                            Os.getsockoptUcred(peerFd, SOL_SOCKET, SO_PEERCRED);
                    if (credentials.uid == 0) {
                        // This also ends up calling handleApplicationCrashInner, entering the
                        // same Framework-layer flow, which is what brings up the dialog.
                        consumeNativeCrashData(peerFd);
                    }
                }
            } catch (Exception e) {
                Slog.w(TAG, "Error handling connection", e);
            } finally {
                // Acknowledge to debuggerd that the connection was handled
                if (peerFd != null) {
                    Os.write(peerFd, ackSignal, 0, 1); // write the ack byte
                    Os.close(peerFd);                  // close the connection
                    ...
                }
            }
        }
    } catch (Exception e) {
        Slog.e(TAG, "Unable to init native debug socket!", e);
    }
}
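The uid == 0 check relies on SO_PEERCRED, which lets the server ask the kernel for the credentials of the process on the other end of a connected AF_UNIX socket, so the sender cannot simply claim to be root. Here is the same query in native code, as a Linux-specific sketch; get_peer_uid and peer_is_root are hypothetical names, and struct ucred requires _GNU_SOURCE (which g++ defines by default).

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Ask the kernel which uid is on the other end of a connected AF_UNIX socket.
bool get_peer_uid(int fd, uid_t* uid) {
  struct ucred cred;
  socklen_t len = sizeof(cred);
  if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cred, &len) != 0) return false;
  *uid = cred.uid;
  return true;
}

// The same policy NativeCrashListener applies: only accept reports from root.
bool peer_is_root(int fd) {
  uid_t uid;
  return get_peer_uid(fd, &uid) && uid == 0;
}
```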

Both articles linked above cover these flows in great detail and are worth reading.

4. How Native-layer crash collection works

As mentioned above, when an uncaught native crash occurs, the kernel sends a signal, and that signal is delivered to our own process. So we can install our own handler for it as well:

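#include <android/log.h>
#include <signal.h>
#include <string.h>

// Assumed definition of the LOGD macro; the tag "NativeCrash" is only an example.
#define LOGD(...) __android_log_print(ANDROID_LOG_DEBUG, "NativeCrash", __VA_ARGS__)

const int handledSignals[] = {
    // The default action of each of these terminates the process.
    SIGSEGV, // signal 11: invalid memory reference
    SIGABRT, // signal 6: abort signal from abort()
    SIGFPE,  // signal 8: floating-point exception
    SIGILL,  // signal 4: illegal instruction
    SIGBUS,  // signal 7: bus error
    SIGALRM  // signal 14: timer signal from alarm()
};

const int handledSignalsNum = sizeof(handledSignals) / sizeof(handledSignals[0]);

// The previous handlers; each signal can have a different one.
struct sigaction old_handlers[handledSignalsNum];

// Called when a native crash raises one of the signals above. info describes the
// signal (si_code, fault address si_addr, ...); the third argument is the ucontext
// with the register state, from which a stack trace can be unwound.
void mySigaction(int code, siginfo_t *info, void *reserved) {
    LOGD("Received signal %d", code);
    int index = 0;
    switch (code) {
        case SIGSEGV: index = 0; break;
        case SIGABRT: index = 1; break;
        case SIGFPE:  index = 2; break;
        case SIGILL:  index = 3; break;
        case SIGBUS:  index = 4; break;
        case SIGALRM: index = 5; break;
        default: return; // not a signal we registered for
    }
    // Then hand the signal to the previous handler. It is not always an
    // SA_SIGINFO function: it may be a plain handler, SIG_DFL or SIG_IGN.
    const struct sigaction &old = old_handlers[index];
    if (old.sa_flags & SA_SIGINFO) {
        old.sa_sigaction(code, info, reserved);
    } else if (old.sa_handler == SIG_DFL) {
        signal(code, SIG_DFL); // restore the default action and re-raise
        raise(code);
    } else if (old.sa_handler != SIG_IGN) {
        old.sa_handler(code);
    }
}

// Call this once at startup to install the new signal handlers.
void setSigaction() {
    struct sigaction handler;
    memset(&handler, 0, sizeof(struct sigaction));
    sigemptyset(&handler.sa_mask);
    handler.sa_sigaction = mySigaction;
    handler.sa_flags = SA_RESTART | SA_SIGINFO;
    // sigaction is the key call: the first argument is the signal, the second the
    // new handler, and the third receives the previous handler.
    for (int i = 0; i < handledSignalsNum; ++i) {
        int result = sigaction(handledSignals[i], &handler, &old_handlers[i]);
        if (result == 0) {
            LOGD("Installed handler for signal %d", handledSignals[i]);
        }
    }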
}
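The delicate part of chaining is that the saved handler is not always a function taking siginfo_t: it can be a plain sa_handler, or SIG_DFL / SIG_IGN, which are not functions at all. Here is a self-contained sketch, assuming Linux/POSIX, that covers all three cases for a single signal so it can be tried safely with SIGUSR1/SIGUSR2; g_previous, g_collected, chain_to_previous, collecting_handler and install_collector are hypothetical names.

```cpp
#include <signal.h>
#include <string.h>

// The handler we replaced (one signal only, to keep the sketch short),
// and a counter showing that our own handler ran.
static struct sigaction g_previous;
static volatile sig_atomic_t g_collected = 0;

// Forward a signal to whatever was installed before us.
static void chain_to_previous(int sig, siginfo_t* info, void* ucontext) {
  if (g_previous.sa_flags & SA_SIGINFO) {
    g_previous.sa_sigaction(sig, info, ucontext);  // e.g. the system's handler
  } else if (g_previous.sa_handler == SIG_DFL) {
    signal(sig, SIG_DFL);  // restore the default action...
    raise(sig);            // ...which takes effect once this handler returns
  } else if (g_previous.sa_handler != SIG_IGN) {
    g_previous.sa_handler(sig);  // plain one-argument handler
  }
  // SIG_IGN: nothing to forward to
}

static void collecting_handler(int sig, siginfo_t* info, void* ucontext) {
  g_collected = g_collected + 1;  // a real collector would write a report here
  chain_to_previous(sig, info, ucontext);
}

// Install collecting_handler for sig and remember the previous action.
bool install_collector(int sig) {
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sigemptyset(&sa.sa_mask);
  sa.sa_sigaction = collecting_handler;
  sa.sa_flags = SA_RESTART | SA_SIGINFO;
  return sigaction(sig, &sa, &g_previous) == 0;
}
```

For a real crash signal the previous action is usually the system's SA_SIGINFO handler, so chaining keeps the normal debuggerd reporting intact; if it is SIG_DFL, the re-raised signal stays blocked until collecting_handler returns and then terminates the process.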

For more about signals, see: https://juejin.im/entry/5962e439f265da6c2810c8aa

5. Avoiding the crash dialog for Native-layer uncaught exceptions

Based on the above, there are two options: don't hand the signal over to the system's handler at all, or, as in the Framework layer, hook some function so that the crash information is never sent to the debuggerd server. Let's look at this function again:

static void send_debuggerd_packet(siginfo_t* info) {
  ...
  // Open a socket connection to debuggerd
  int s = socket_abstract_client(DEBUGGER_SOCKET_NAME, SOCK_STREAM | SOCK_CLOEXEC);
  ...
  debugger_msg_t msg;
  msg.action = DEBUGGER_ACTION_CRASH;
  msg.tid = gettid();
  msg.abort_msg_address = reinterpret_cast<uintptr_t>(g_abort_message);
  msg.original_si_code = (info != nullptr) ? info->si_code : 0;
  // Send the DEBUGGER_ACTION_CRASH message to the debuggerd server
  ret = TEMP_FAILURE_RETRY(write(s, &msg, sizeof(msg)));
  if (ret == sizeof(msg)) {
    char debuggerd_ack;
    // Block until the debuggerd server replies
    ret = TEMP_FAILURE_RETRY(read(s, &debuggerd_ack, 1));
    int saved_errno = errno;
    notify_gdb_of_libraries();
    errno = saved_errno;
  }
  close(s);
}

This function is self-contained: its only job is to send the crash information to the system service. So can we hook it to keep that information from being sent? I tried, and it does not work. First, hooking it requires looking up its symbol. I expected it in libc.so (although the source path linker/debugger.cpp suggests it is built into the dynamic linker), and the lookup fails. The function is declared static, so it has no exported symbol, and the surrounding code is hidden as well: like the function that sets up the default signal handlers, it is marked with __LIBC_HIDDEN__:

__LIBC_HIDDEN__ void debuggerd_init() {
    ...
}

Second, this code is security-sensitive. If send_debuggerd_packet could be hooked, a malicious app could tamper with the data it sends when it fires, for example the package name, to impersonate another app, or even trigger it manually to make the system show crash dialogs over and over. So for now there is no good way to keep native crashes from showing the dialog. What we can do is improve the code itself, for example by using try/catch in C++ wherever possible (bearing in mind that try/catch only catches C++ exceptions, not signals such as SIGSEGV).


Origin blog.csdn.net/aa642531/article/details/90110618