Summary of Linux kernel debugging methods: core dump

What is a core dump?

Analyzing a core dump is an effective way to debug Linux applications. Like kernel debugging or grabbing a RAM dump, a core dump is mainly used to capture the state of an application at the moment it crashes: memory contents, register state, the stack pointer, memory management information, the function call stack, and so on.

Linux implements core dumps on top of signals. A signal is an asynchronous event-handling mechanism, and each signal has a default action: ignore the signal (Ign), stop the process (Stop), continue a stopped process (Cont), terminate the process (Term), or terminate the process and generate a core dump (Core).

Signal     Value      Action  Comment
SIGHUP     1          Term    Hangup detected on controlling terminal or death of controlling process
SIGINT     2          Term    Interrupt from keyboard
SIGQUIT    3          Core    Quit from keyboard
SIGILL     4          Core    Illegal instruction
SIGTRAP    5          Core    Trace/breakpoint trap
SIGABRT    6          Core    Abort signal from abort(3)
SIGIOT     6          Core    IOT trap. A synonym for SIGABRT
SIGEMT     7          Term    Emulator trap
SIGFPE     8          Core    Floating point exception
SIGKILL    9          Term    Kill signal; cannot be caught, blocked or ignored
SIGBUS     10,7,10    Core    Bus error (bad memory access)
SIGSEGV    11         Core    Invalid memory reference
SIGPIPE    13         Term    Broken pipe: write to pipe with no readers
SIGALRM    14         Term    Timer signal from alarm(2)
SIGTERM    15         Term    Termination signal
SIGUSR1    30,10,16   Term    User-defined signal 1
SIGUSR2    31,12,17   Term    User-defined signal 2
SIGCHLD    20,17,18   Ign     Child stopped or terminated
SIGCONT    19,18,25   Cont    Continue if stopped
SIGSTOP    17,19,23   Stop    Stop process; cannot be caught, blocked or ignored
SIGTSTP    18,20,24   Stop    Stop typed at terminal
SIGTTIN    21,21,26   Stop    Terminal input for background process
SIGTTOU    22,22,27   Stop    Terminal output for background process
SIGIO      23,29,22   Term    I/O now possible (4.2BSD)
SIGPOLL               Term    Pollable event (Sys V). Synonym for SIGIO
SIGPROF    27,27,29   Term    Profiling timer expired
SIGSYS     12,31,12   Core    Bad argument to routine (SVr4)
SIGURG     16,23,21   Ign     Urgent condition on socket (4.2BSD)
SIGVTALRM  26,26,28   Term    Virtual alarm clock (4.2BSD)
SIGXCPU    24,24,30   Core    CPU time limit exceeded (4.2BSD)
SIGXFSZ    25,25,31   Core    File size limit exceeded (4.2BSD)
SIGSTKFLT  16         Term    Stack fault on coprocessor (unused)
SIGCLD     18         Ign     A synonym for SIGCHLD
SIGPWR     29,30,19   Term    Power failure (System V)
SIGINFO    29                 A synonym for SIGPWR, on an alpha
SIGLOST    29         Term    File lock lost (unused), on a sparc
SIGWINCH   28,28,20   Ign     Window resize signal (4.3BSD, Sun)
SIGUNUSED  31         Core    Synonymous with SIGSYS

Where three values are listed, the first usually applies to alpha and sparc, the middle one to x86, arm and most other architectures, and the last one to mips.
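
The difference between the Term and Core actions can be observed from a parent process. The following is a minimal Python sketch, not part of the original article: the helper name default_action_outcome is hypothetical, and it relies only on the standard os and signal modules. It forks a child that sends a signal to itself, then reads the termination status that the kernel reports to the parent.

```python
import contextlib
import os
import signal


def default_action_outcome(sig):
    """Fork a child that sends `sig` to itself and report how it ended.

    Returns (terminating_signal, core_flag); terminating_signal is None
    when the child survived the signal and exited normally. Stop signals
    are out of scope here: the child would stay stopped and waitpid()
    would block.
    """
    pid = os.fork()
    if pid == 0:
        try:
            # Restore the default action in case a handler was inherited.
            # SIGKILL cannot be changed, hence the suppress().
            with contextlib.suppress(ValueError, OSError):
                signal.signal(sig, signal.SIG_DFL)
            os.kill(os.getpid(), sig)
        finally:
            os._exit(0)  # reached only if the signal did not kill the child
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        # WCOREDUMP is the kernel's "core dumped" bit for this child.
        return os.WTERMSIG(status), os.WCOREDUMP(status)
    return None, False
```

For a Term signal such as SIGTERM the core flag is never set. For a Core signal such as SIGQUIT, whether the flag is set depends on the core size limit discussed in the following sections; with a limit of 0 it typically stays False.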

 

Under what circumstances will a core dump be generated?

The following situations can crash an application and produce a core dump:

  1. Out-of-bounds memory access (array index out of range, a string missing its \0 terminator, out-of-bounds string reads and writes)
  2. A multithreaded program calls functions that are not thread-safe, such as non-reentrant functions
  3. Data shared between threads is read and written without lock protection (critical-section resources require exclusive access)
  4. Invalid pointer use (such as dereferencing a null pointer or accessing an illegal address)
  5. Stack overflow
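
Cause 4 is easy to reproduce in isolation. As an illustrative sketch (the helper name and the child script are assumptions, not from the original article), the following Python code starts a child interpreter that reads from address 0 through ctypes and reports which signal killed it. The child sets its core size limit to 0 so that the experiment does not write a core file into the working directory.

```python
import signal
import subprocess
import sys

# Script run in the child interpreter: disable core files, then read a
# C string from address 0, which is a null pointer dereference.
CHILD_SCRIPT = (
    "import ctypes, resource\n"
    "resource.setrlimit(resource.RLIMIT_CORE, (0, 0))\n"
    "ctypes.string_at(0)\n"
)


def crash_with_null_pointer():
    """Run the child and return the signal that killed it, or None."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # subprocess reports death by signal N as a return code of -N.
    if result.returncode < 0:
        return signal.Signals(-result.returncode)
    return None
```

On a typical Linux system the returned value is SIGSEGV, one of the signals whose default action in the table above is Core.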

 

How to get a core dump?

Linux provides a set of commands to configure core dump behavior:

1. ulimit -c shows whether core dumps are enabled. A value of 0 means no core dump is generated; run ulimit -c unlimited to enable core dumps for the current shell session and the processes it starts.

    

2. cat /proc/sys/kernel/core_pattern shows where core files are saved. By default the file is written to the current working directory of the application; if the application changes its working directory by calling chdir(), the core file is saved in the new working directory.

3. echo "/data/xxx/<core_file>" > /proc/sys/kernel/core_pattern sets the storage path and file name of the core file (writing to this file requires root privileges). The following format specifiers can be used in core_file:

%% a single % character

%p process ID of the dumped process

%u real user ID of the dumped process

%g real group ID of the dumped process

%s number of the signal that caused the core dump

%t time of the core dump (seconds since January 1, 1970)

%h hostname

%e executable file name

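4. ulimit -c [size] sets the maximum size of the core file. With unlimited the size is not capped; for a custom value, size must be greater than 4, and the unit is blocks (1 block = 512 bytes in POSIX shells; bash outside POSIX mode counts in 1024-byte units).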
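
The same settings can be inspected from code. The sketch below is a hedged illustration using Python's standard resource module; the function names and the example pattern /tmp/core-%e-%p-%s-%t are hypothetical. enable_core_dumps() plays the role of ulimit -c unlimited for the current process (within its hard limit, and in bytes rather than blocks), and expand_core_pattern() mimics how the specifiers listed above are substituted. It covers only those specifiers and is a simplification, not the kernel's implementation.

```python
import resource

CORE_PATTERN_FILE = "/proc/sys/kernel/core_pattern"


def enable_core_dumps():
    """Raise the soft core size limit to the hard limit; return (soft, hard)."""
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def read_core_pattern(path=CORE_PATTERN_FILE):
    """Return the current core_pattern string (Linux only)."""
    with open(path) as f:
        return f.read().strip()


def expand_core_pattern(pattern, *, pid, uid, gid, signum, timestamp,
                        hostname, exe):
    """Substitute the specifiers from the list above into `pattern`."""
    values = {
        "%": "%",
        "p": str(pid),
        "u": str(uid),
        "g": str(gid),
        "s": str(signum),
        "t": str(int(timestamp)),
        "h": hostname,
        "e": exe,
    }
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch != "%":
            out.append(ch)
            i += 1
            continue
        # "%" plus one specifier character. In this sketch an unknown
        # specifier or a trailing "%" contributes nothing to the name.
        if i + 1 < len(pattern):
            out.append(values.get(pattern[i + 1], ""))
        i += 2
    return "".join(out)
```

For example, expand_core_pattern("/tmp/core-%e-%p-%s-%t", pid=1234, uid=0, gid=0, signum=8, timestamp=1600000000, hostname="box", exe="test") yields "/tmp/core-test-1234-8-1600000000", a name that records which program crashed, its process ID, the signal, and the time.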

 

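How to analyze a core dump?

First, we write a program that deliberately triggers a core dump, and collect the resulting core file.

The program, test.c, triggers the core dump with a division by zero.

Compiling and running the program raises a floating-point exception (SIGFPE), which triggers the core dump. (Note: compile with the -g option to include debugging information; without it gdb cannot map the crash back to source lines.)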

 

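A core file appears in the current directory. Use the file command to check its type.

The core file turns out to be in ELF format. Use readelf to view the ELF header.

The Type field of the header shows that the file is a core file.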
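
What file and readelf report can also be read directly from the file. As a rough sketch (the function names are hypothetical; the field layout follows the ELF header, where the 16-bit e_type field comes right after the 16-byte e_ident array and the value 4 denotes a core file), the following Python code extracts the type:

```python
import struct

# e_type values defined by the ELF specification.
ET_NAMES = {0: "NONE", 1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}


def elf_type(header):
    """Return the ELF type name given at least the first 18 bytes of a file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[5] (EI_DATA): 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_type is the 16-bit field at offset 16, right after e_ident.
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return ET_NAMES.get(e_type, hex(e_type))


def elf_type_of_file(path):
    """Read the start of `path` and return its ELF type name."""
    with open(path, "rb") as f:
        return elf_type(f.read(18))
```

Calling elf_type_of_file("core") on a core file should return "CORE", matching the Type field that readelf prints.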

 

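As mentioned earlier, a core dump preserves the state of the application at the moment of the crash. To inspect it we need gdb: run gdb test core (that is, the test executable followed by the core file).

The message "Program terminated with signal 8, Arithmetic exception" means that the application was terminated because it received signal 8 from the Linux kernel. Signal 8 is SIGFPE, nominally the floating-point exception, which is also raised for an integer division by zero. gdb also prints the offending line of code, result = a/b.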
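
The mapping from the number in gdb's message to a signal name can be checked quickly. This small sketch uses Python's standard signal module (the helper name describe_signal is made up for illustration, and signal.strsignal requires Python 3.8 or later). The description string comes from the C library and may be worded differently from gdb's message.

```python
import signal


def describe_signal(number):
    """Return (symbolic_name, libc_description) for a signal number."""
    sig = signal.Signals(number)
    return sig.name, signal.strsignal(number)
```

On Linux, describe_signal(8) returns a tuple whose first element is "SIGFPE".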

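The bt (backtrace) command prints the function call stack. An optional count limits the number of frames: bt n shows the innermost n frames and bt -n the outermost n; without a count the full stack is printed. Because the test.c program makes no function calls, only the frame of main is visible; had the crash occurred in a function called from main, more frames would be listed.

The disassemble command prints the assembly around the crash site. The arrow marks the faulting instruction, that is, the address held in the PC register. This confuses many people: during normal execution the PC points to the next instruction to be executed, so they doubt whether gdb has located the faulting instruction correctly. In fact the CPU did start executing this instruction, detected the exception, and entered exception handling. For a fault such as division by zero, the register state saved at that point, and recorded in the core dump, has the PC pointing back at the faulting instruction itself, so the PC value shown by gdb can be trusted.

The disassembly shows a div instruction performing the division. Its operand, the divisor, is -0x8(%ebp), that is, the value in the memory cell 8 bytes below the current stack base address; EBP is the stack base pointer register. Earlier in the listing, movl $0x0, -0x8(%ebp) stores 0 into that memory cell, which confirms that the divisor is 0.

gdb prints assembly in AT&T syntax by default; run set disassembly-flavor intel to switch to Intel syntax.

The list command shows the source code around the current line, provided gdb can find the source files.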

Origin blog.csdn.net/daocaokafei/article/details/114967949