Why getting a stack has never been easy

Small talk

To keep this article from being too dry, let me start with a bit of chat. My previous article, "Black technology! Let Native Crash and ANR have nowhere to vent!", was well received by readers (though somehow it has more favorites than likes, hehe). The original purpose of Signal was to act like an airbag: make sure the app can restart and recover right after a crash, and so improve application stability. As I kept writing, I found that many crash-monitoring platforms rely on the same core principles (most of them are not open source yet) and differ only in what they use them for. So why not make Signal a common building block? Airbag or monitoring, the only difference is how the upper layer applies it. With that in mind, adding some log-monitoring logic to Signal makes it more complete, hence this article, which is a supplement to the previous one. If you are new here and have not read "Black technology! Let Native Crash and ANR have nowhere to vent!", please read it first. (It does not matter if you have no NDK development experience, and no very complicated C knowledge is involved.)

Getting the stack

Getting the stack: many newcomers may wonder what is so hard about that. Can't you just create a new Throwable, or read Thread.currentThread().stackTrace (Kotlin)? Yes, you can. In the Java layer we have a well-established way of obtaining stacks, thanks to the design of the Java virtual machine and of the Java language. Because the platform differences underneath are hidden from us, we can use a fairly uniform API to get the current stack. Note that this stack is specifically the Java virtual machine stack.

For the native stack, though, problems arise. The native layer depends on many factors, such as the linker, the compiler, library versions, and the various ABIs, so obtaining a stack trace is not that simple. With so many factors interfering, this is also a historical burden. On Android in particular, the officially supported way of capturing the stack has changed over time.

On Android 4.1.1 and above but below 5.0, native code uses the libcorkscrew.so that ships with the system. Starting with 5.0, libcorkscrew.so is no longer present in the system, and later versions of the Android source use libunwind as its optimized replacement. On the NDK side, the compiler has also kept changing, from gcc as the default to clang (NDK >= 13). With so many versions and factors involved, finding one unified approach is really not easy. That said, as of 2022 Google has its breakpad library, intended as a unified solution. Whether it will become a standard is not yet settled, but it is progress for the ecosystem.

Signal's Choice

With so many solutions introduced above, is breakpad the first choice for Signal? breakpad is good, but it carries build support for many other systems, such as macOS and Windows, and as an open source library Signal would rather keep such imports to a minimum. So, like most mainstream solutions, Signal uses unwind.h to print the stack, because it is already part of the default toolchain and works on Android. Let's look at the implementation below, namely the unwind-utils of the Signal project, and at what we need to consider.

Stack size

The log needs a limit on how many frames of the backtrace to record. Too many frames is bad (bloated and hard to troubleshoot), and too few is also bad (the key crash frames are likely to be missed). Signal defaults to 30, which you can change to suit your own project.

std::string backtraceToLogcat() {
    // Default depth: 30 frames
    const size_t max = 30;
    void *buffer[max];
    // ostringstream makes it easy to produce a std::string
    std::ostringstream oss;
    dumpBacktrace(oss, buffer, captureBacktrace(buffer, max));
    return oss.str();
}

_Unwind_Backtrace

_Unwind_Backtrace is the stack backtrace function that unwind.h provides:

_Unwind_Reason_Code _Unwind_Backtrace(_Unwind_Trace_Fn, void *);

So what is this _Unwind_Trace_Fn? Jump to its definition and you will see:

typedef _Unwind_Reason_Code (*_Unwind_Trace_Fn)(struct _Unwind_Context *,
                                                void *);

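This is simply a function pointer type, which is not very friendly to those of us who mostly write Java. In Java terms, it means you can pass any function, whatever its name, with the shape xxx(_Unwind_Context *, void *). It is a callback: each time the unwinder obtains the address information for a frame, it calls the function passed as the first argument. The second argument of _Unwind_Backtrace is the value that gets handed to that callback. This is a bit convoluted, so how should we understand it? The first argument is a reference to a function; if that function needs data of its own, you supply it through the second argument.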
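For readers coming from Java, the callback-plus-void* pattern may be easier to see in isolation. The sketch below does not use unwind.h at all: walkItems, VisitFn, SumState, addToSum and sumUntil are hypothetical names invented for illustration, and they only mimic the shape of _Unwind_Backtrace and _Unwind_Trace_Fn.

```cpp
#include <cstddef>

// Hypothetical callback type with the same shape as _Unwind_Trace_Fn,
// except that it receives an int item instead of an _Unwind_Context *.
// Returns true to continue, false to stop early.
typedef bool (*VisitFn)(int item, void *args);

// Hypothetical driver playing the role of _Unwind_Backtrace:
// it calls fn once per item and forwards the opaque args pointer untouched.
void walkItems(const int *items, size_t count, VisitFn fn, void *args) {
    for (size_t i = 0; i < count; ++i) {
        if (!fn(items[i], args)) {
            return;
        }
    }
}

// State owned by the caller; the driver never looks inside it.
struct SumState {
    int sum;
    int limit;  // stop once sum reaches this value
};

// The callback casts void * back to the type the caller passed in.
bool addToSum(int item, void *args) {
    SumState *state = static_cast<SumState *>(args);
    state->sum += item;
    return state->sum < state->limit;
}

// Usage: adds items until the running sum reaches the limit.
int sumUntil(const int *items, size_t count, int limit) {
    SumState state = {0, limit};
    walkItems(items, count, addToSum, &state);
    return state.sum;
}
```

The driver knows nothing about SumState; only the caller and the callback agree on what the void * really points to. That is the same contract the unwinder has with the state you pass as the second argument.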

Here is an example, which is also in Signal:

#include <unwind.h>
#include <cstddef>
#include <cstdint>

// Declared before its first use: a write cursor plus the end of the buffer.
struct BacktraceState {
    void **current;
    void **end;
};

static _Unwind_Reason_Code unwindCallback(struct _Unwind_Context *context, void *args) {
    BacktraceState *state = static_cast<BacktraceState *>(args);
    uintptr_t pc = _Unwind_GetIP(context);
    if (pc) {
        if (state->current == state->end) {
            // Buffer is full: stop unwinding.
            return _URC_END_OF_STACK;
        } else {
            *state->current++ = reinterpret_cast<void *>(pc);
        }
    }
    return _URC_NO_REASON;
}

size_t captureBacktrace(void **buffer, size_t max) {
    BacktraceState state = {buffer, buffer + max};
    _Unwind_Backtrace(unwindCallback, &state);
    // Number of frames actually captured
    return state.current - buffer;
}

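We define a struct, BacktraceState, to record the function addresses. It has two fields: end marks the limit on how many entries the log may hold, and current tracks how many entries have actually been written (the stack may have fewer frames than the limit).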

_Unwind_GetIP

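In unwindCallback we receive the arguments the system passes back to us, and the key one is the _Unwind_Context struct. Its job is to be passed to _Unwind_GetIP, which returns the current execution address, that is, the pc value. What is the pc value good for? It is the key to obtaining the stack: unlike Java, a native stack has to be resolved from addresses. Keep this idea in mind for now; we will come back to it below.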

dladdr

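With the pc value from _Unwind_GetIP in hand, we now use dladdr to resolve it. This function is provided by the dynamic linker (declared in dlfcn.h), not by the kernel, and it exists specifically to map an address to a symbol. Its man page says: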

The function dladdr() determines whether the address specified in
       addr is located in one of the shared objects loaded by the
       calling application.  If it is, then dladdr() returns information
       about the shared object and symbol that overlaps addr.  This
       information is returned in a Dl_info structure:

           typedef struct {
               const char *dli_fname;  /* Pathname of shared object that
                                          contains address */
               void       *dli_fbase;  /* Base address at which shared
                                          object is loaded */
               const char *dli_sname;  /* Name of symbol whose definition
                                          overlaps addr */
               void       *dli_saddr;  /* Exact address of symbol named
                                          in dli_sname */
           } Dl_info;

       If no symbol matching addr could be found, then dli_sname and
       dli_saddr are set to NULL.

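As you can see, the information resolved for each address is stored in a Dl_info. If a runtime symbol matches, dli_sname and dli_saddr are set to that symbol's name and address. dli_fbase is the base address: the location at which a .so is loaded into the process is not fixed, so code inside it is usually located at runtime as an offset from the base, which is why dli_fbase is provided.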

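void dumpBacktrace(std::ostream &os, void **buffer, size_t count) {
    for (size_t idx = 0; idx < count; ++idx) {
        const void *addr = buffer[idx];
        const char *symbol = "";
        Dl_info info;
        // Use the symbol name only when dladdr found a matching symbol.
        if (dladdr(addr, &info) && info.dli_sname) {
            symbol = info.dli_sname;
        }
        os << " #" << idx << ": " << addr << "   " << symbol << "\n";
    }
}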

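In the end, dladdr lets us resolve the saved addresses one by one and print them to the native log, as in the demo crash output from Signal. (If you also want to print the .so name, you can get it from dli_fname; that is not shown here.)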
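As a rough sketch of the dli_fname case mentioned above, the helper below formats one address together with its library path and, when available, its symbol. describeAddress is a hypothetical name, not part of Signal, and the output format is only an assumption chosen for illustration.

```cpp
#include <dlfcn.h>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: formats an address as "<library> (<symbol>+<offset>)".
// Falls back to "<unknown> <address>" when dladdr cannot place the address.
std::string describeAddress(const void *addr) {
    Dl_info info;
    std::ostringstream os;
    if (dladdr(addr, &info) == 0) {
        // dladdr returns 0 when the address is not inside any loaded object.
        os << "<unknown> " << addr;
        return os.str();
    }
    os << (info.dli_fname ? info.dli_fname : "<unknown>");
    if (info.dli_sname) {
        // Distance from the start of the symbol to the address.
        uintptr_t offset = reinterpret_cast<uintptr_t>(addr) -
                           reinterpret_cast<uintptr_t>(info.dli_saddr);
        os << " (" << info.dli_sname << "+" << offset << ")";
    }
    return os.str();
}
```

Symbols that are not exported from a library will usually leave dli_sname as NULL, in which case only the library path is printed.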

[Image: backtrace log from the Signal demo crash]

How the native stack is produced

Based on the log above (it is best to run the crash demo in the demo app), note that MainActivity defines a crash function:

private external fun throwNativeCrash()

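Going by the stack log, the call symbol only shows up at frame #16. Isn't that very different from everyday Java development? In the Java layer, the most recent frame usually carries the error, so you would expect frame #0 to be the cause of the crash. In the demo, however, the frame that really crashed is buried in a sea of log lines. Many of you probably feel the same when reading native crash logs and do not know where to start, because the first line is often not the real cause. Let's look at what actually happens as the program goes from its normal state to a crash.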

[Image: the steps from normal execution to the crash]

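As you can see, many steps come before the real dump_stack. Why so many? This comes down to how interrupts work in the Linux kernel. Here is a rough diagram.

[Image: rough diagram of the interrupt flow]

After a crash occurs, the program typically traps from user mode into kernel mode through an interrupt, with its interrupt number (to be clear, this is not one of the signals from signal.h) placed in the eax register (in most cases; other registers can be used too, this is only an example).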

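The kernel then uses that interrupt number to look up its table, finds the corresponding handler, and hands control back to user mode. Only at this point does the sigaction logic run.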
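The user-mode end of that flow can be sketched with sigaction. The example below is a minimal illustration, not Signal's actual handler: onSignal, installHandler, raiseAndObserve and g_lastSignal are hypothetical names, and it deliberately uses a harmless signal such as SIGUSR1 instead of a real crash signal.

```cpp
#include <csignal>
#include <cstring>
#include <signal.h>

// Written by the handler; sig_atomic_t is safe to assign from a signal handler.
static volatile sig_atomic_t g_lastSignal = 0;

// Hypothetical handler. A real crash handler would capture the backtrace here.
static void onSignal(int signo, siginfo_t *info, void *ucontext) {
    (void) info;
    (void) ucontext;
    g_lastSignal = signo;
}

// Installs onSignal for signo; returns true on success.
bool installHandler(int signo) {
    struct sigaction action;
    std::memset(&action, 0, sizeof(action));
    sigemptyset(&action.sa_mask);
    action.sa_sigaction = onSignal;
    action.sa_flags = SA_SIGINFO;  // select the three-argument handler form
    return sigaction(signo, &action, nullptr) == 0;
}

// Usage: install the handler, raise the signal in this thread,
// and report which signal number the handler observed.
int raiseAndObserve(int signo) {
    g_lastSignal = 0;
    if (!installHandler(signo)) {
        return -1;
    }
    raise(signo);
    return g_lastSignal;
}
```

For real crash signals such as SIGSEGV, handlers are commonly run on an alternate stack (sigaltstack with SA_ONSTACK), because the crashing thread's own stack may be unusable.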

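So there is a whole process between the crash happening and the log actually being dumped, and it varies further depending on how sigaction is configured. The point to take away is that the real crash information is often hidden in a sea of stack frames, and we have to resolve it step by step, for example by analyzing addresses with tools such as addr2line, before we find the real cause. A typical Android project also depends on third-party .so files, which makes troubleshooting harder. But as long as we can identify the specific .so (the dli_fname field tells us), we can pass the blame along, right?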
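To feed a captured address to addr2line, the absolute pc usually has to be converted into an offset inside its library first. The sketch below shows that arithmetic under the assumption that the library's first loadable segment starts at virtual address 0, which is typical for shared objects; relativePc and addr2lineCommand are hypothetical helpers, and the library path stands for whatever unstripped copy of the .so you have locally.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: converts an absolute pc into an offset inside its
// library, given the library's load base (dli_fbase from dladdr).
uintptr_t relativePc(uintptr_t pc, uintptr_t loadBase) {
    return pc - loadBase;
}

// Builds an addr2line command line for one frame.
// -C demangles C++ names, -f prints function names, -e selects the binary.
std::string addr2lineCommand(const std::string &libraryPath,
                             uintptr_t pc, uintptr_t loadBase) {
    std::ostringstream os;
    os << "addr2line -C -f -e " << libraryPath
       << " 0x" << std::hex << relativePc(pc, loadBase);
    return os.str();
}
```

For example, a pc of 0x70001234 in a library loaded at 0x70000000 yields the offset 0x1234, which is the value addr2line expects.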

Finally

Having read this far, you should have a rough model of the native stack, so there is no need to be afraid of it. The Signal project includes the unwind-utils helper described here, which you can use directly. The information it prints is still fairly basic, and you can add more fields to suit your own needs. The code is all in the Signal repository; stars and PRs are welcome. And of course, after reading this article, don't forget to leave a like and a comment!

Origin juejin.im/post/7118609781832548383