[Code debugging] Linux coredump analysis



1 Introduction

  In the previous article, I described how to use the Valgrind tool to check for memory-related issues, including memory leaks, null pointer use, wild pointer use, and double frees. In practice, Valgrind is most useful for finding memory leaks, because accessing a null or wild pointer usually triggers a segmentation fault that terminates the program on the spot. In that case, you can use the coredump file recorded by the Linux system; combined with the gdb tool, it quickly pinpoints where the problem occurred. Wild pointer and null pointer accesses are only some of the causes of a crash that produces a coredump file: stack overflow, out-of-bounds memory access, and others will also cause a coredump. Making good use of coredump files helps us solve hard-to-reproduce problems in real projects.


2 coredump

2.1 What is coredump

   A coredump is the file the operating system records when an application terminates abnormally; it captures the state of the application at the moment of the exception. A coredump file mainly contains the application's memory contents, register state, stack addresses, and function call context. The developer analyzes this information to determine where in the call chain the exception occurred. For a stack overflow, the multi-level function call information also has to be analyzed.

  In layman's terms, coredump is the operating system recording the information of abnormal termination of the application, leaving us the basis for troubleshooting.


2.2 coredump meaning

  The value of coredump for analyzing program exceptions is self-evident. Take our earlier experience learning a 32-bit ARM MCU (STM32) as an example. As beginners, our code quality was uneven, which often triggered the hardware fault interrupt (Hard Fault). Faced with this, we were helpless. On the one hand, no meaningful information is recorded after the error occurs (the stack information can of course be obtained in real time through an emulator, but that is not realistic for a shipped product). On the other hand, the probability of recurrence is relatively low and the conditions for reproducing the problem are uncertain. Linux is a more "thoughtful" operating system: if an application terminates abnormally, it records key information that makes analysis convenient. This is the value of coredump:

  • Analyze the cause of the program exception based on the recorded information
  • Infer the conditions that trigger the problem from the recorded information, then reproduce the problem to verify

2.3 Scenarios that generate a coredump

  When an application terminates abnormally, a coredump file is recorded. Almost all of these exceptions are related to memory. They fall into the following categories.

[1] Memory access out of bounds

  • Array subscript out of bounds
  • Access beyond the bounds of dynamically allocated (malloc/new) memory
  • Strings without a terminator, passed to functions that depend on the terminator, such as strcpy, strcmp, sprintf

[2] Access illegal pointer

  • Null pointer (memory never allocated)
  • Wild pointer (memory already released)
  • Releasing the same pointer (memory) more than once
  • Forced pointer conversion (casts), which requires special care because alignment, starting address, and similar issues can cause memory access errors

[3] Stack overflow: allocating a large number of local variables, deeply nested function calls, deep recursion, etc. may overflow the stack

[4] Multithreaded access

  • Calling non-reentrant functions
  • Accessing shared data without mutual exclusion
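
To make the first scenarios concrete, here is a minimal sketch of how such a crash looks from the outside. It assumes a Linux system; the helper names `null_write` and `run_and_get_signal` are made up for this illustration. A child process writes through a null pointer, and the parent inspects the wait status to see which signal killed it and whether the kernel reports a core dump.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Faulty operation: write through a null pointer. The pointer variable is
 * volatile so the compiler cannot see the NULL and optimize the store away. */
static void null_write(void)
{
	int *volatile p = NULL;

	*p = 1;
}

/* Run fn in a child process and return the number of the signal that
 * killed the child, 0 if it exited normally, or -1 on fork/wait failure. */
static int run_and_get_signal(void (*fn)(void))
{
	int status = 0;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		fn();
		_exit(0);	/* reached only if fn did not crash */
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (!WIFSIGNALED(status))
		return 0;
#ifdef WCOREDUMP
	printf("child killed by signal %d, core dumped: %s\n",
	       WTERMSIG(status), WCOREDUMP(status) ? "yes" : "no");
#endif
	return WTERMSIG(status);
}
```

Calling `run_and_get_signal(null_write)` from `main` should return `SIGSEGV`. Whether the "core dumped" flag is set depends on the core file size limit in effect for the process.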

2.4 Enable coredump

  By default, the system does not enable coredump recording. Run "ulimit -c" to check whether it is enabled; a return value of 0 means coredump recording is not enabled.

  • Check whether coredump recording is enabled
acuity@ubuntu:~$ ulimit -c
1024

  You can use the "ulimit -c [size]" command to specify the maximum size of the coredump file, which also enables coredump recording. Note that the unit is blocks, and the block size depends on the shell: 512 bytes in POSIX mode, while bash by default counts in 1024-byte units.

  • Enable coredump
acuity@ubuntu:~$ ulimit -c 1024

  What if the crashed program needs more space than the specified limit, so the coredump file is truncated or incomplete? The once-and-for-all solution is not to limit the coredump file size at all, by running "ulimit -c unlimited". Root privileges are required when this raises the hard limit.

  • No limit on coredump file size
root@ubuntu:/home/acuity# ulimit -c unlimited
root@ubuntu:/home/acuity# ulimit -c 
unlimited

  The methods above enable coredump recording only temporarily, for the current terminal session; the setting is lost when the session ends or the system restarts. Obviously this is not ideal. The better approach is to modify a configuration file so that the system always enables coredump recording, at least during the development and testing phases of a project. In principle, coredumps should also be recorded after the software is released, so that there is a basis for tracing and analyzing problems when they occur.

  • Enable via configuration file

Add "ulimit -c unlimited" to the "/etc/profile" file.

Note:

The ulimit command sets resource limits. In addition to coredump, it can set other resource limits:

  • -a: view the current resource limits
  • -c <core maximum>: the maximum size of the core file, in blocks
  • -d <data segment size>: the maximum size of the process data segment, in KB
  • -f <file size>: the maximum size of a file the process can create, in blocks
  • -H: set the hard limit of a resource; once set, a non-root user cannot raise it
  • -l <memory size>: the maximum size of lockable memory, in KB
  • -m <memory size>: the upper limit of available memory, in KB
  • -n <number of files>: the maximum number of files the process can open (number of file descriptors)
  • -p <buffer size>: the size of the pipe buffer, in 512-byte blocks
  • -s <stack size>: the maximum stack size of a thread, in KB
  • -S: set the soft limit of a resource, which cannot exceed the hard limit
  • -t <cpu time>: the maximum CPU time, in seconds
  • -u <number of processes>: the maximum number of processes the user can create
  • -v <virtual memory size>: the maximum virtual memory available to the process, in KB
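
Several of these flags map onto resources that a program can query itself with `getrlimit()`. The following is a rough sketch of that mapping for a handful of flags; the helper names `format_limit`, `print_limit`, and `print_common_limits` are invented for this example. Note that `getrlimit()` reports sizes in bytes, not in the scaled units the shell prints.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Format one limit value the way ulimit prints it: "unlimited" or a number. */
static void format_limit(rlim_t value, char *buf, size_t size)
{
	if (value == RLIM_INFINITY)
		snprintf(buf, size, "unlimited");
	else
		snprintf(buf, size, "%llu", (unsigned long long)value);
}

/* Print the soft and hard limit of one resource; returns 0 on success. */
static int print_limit(const char *flag, int resource)
{
	struct rlimit rl;
	char soft[32], hard[32];

	if (getrlimit(resource, &rl) == -1) {
		perror("getrlimit");
		return -1;
	}
	format_limit(rl.rlim_cur, soft, sizeof(soft));
	format_limit(rl.rlim_max, hard, sizeof(hard));
	printf("%-3s soft=%s hard=%s\n", flag, soft, hard);
	return 0;
}

/* Walk the resources behind a few common ulimit flags. */
static int print_common_limits(void)
{
	int rc = 0;

	rc |= print_limit("-c", RLIMIT_CORE);	/* core file size, bytes */
	rc |= print_limit("-n", RLIMIT_NOFILE);	/* open file descriptors */
	rc |= print_limit("-s", RLIMIT_STACK);	/* stack size, bytes */
	rc |= print_limit("-t", RLIMIT_CPU);	/* CPU time, seconds */
	rc |= print_limit("-v", RLIMIT_AS);	/* address space, bytes */
	return rc;
}
```

Calling `print_common_limits()` from `main` prints one line per resource with its soft and hard value, which is a quick way to compare what the process actually sees against the shell's "ulimit -a" output.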

  In addition, coredump can be turned on from within the code. However, this method is generally not recommended: if the enabling call has not been added to the code, or the application crashes before the call runs, the system cannot record a coredump. Enabling it in the system configuration file is the recommended approach.

Access interface:

#include <sys/resource.h>

int getrlimit(int resource, struct rlimit *rlim);	/* get the coredump file size limit */
int setrlimit(int resource, const struct rlimit *rlim);	/* set the coredump file size limit */
  • Function: gets (sets) a system resource limit. Coredump is only one kind of system resource; others include virtual memory size, process stack, and maximum number of processes.

  • resource: the system resource identifier; for coredump it is RLIMIT_CORE

  • rlim: the resource limit data structure, which holds the limit values

    struct rlimit
    {
    	rlim_t rlim_cur;	/* soft limit */
    	rlim_t rlim_max;	/* hard limit */
    };

  • Return value: 0 on success, -1 on failure, with the error code stored in errno

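Example:

#include <stdio.h>
#include <sys/resource.h>

int main(int argc, char *argv[])
{
	struct rlimit rlmt;

	/* setrlimit() takes the limit in bytes, unlike the shell's ulimit */
	rlmt.rlim_cur = (rlim_t)1024;	/* soft limit */
	rlmt.rlim_max = (rlim_t)1024;	/* hard limit */

	if (-1 == setrlimit(RLIMIT_CORE, &rlmt))
	{
		perror("setrlimit error");
		return -1;
	}

	return 0;
}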
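
Hard-coding both limits, as in the example above, also lowers the hard limit, which an unprivileged process cannot raise again afterwards. As an alternative sketch (the function name `enable_core_dumps` is an assumption for this example, not part of any library), a program can read the current limits first and lift only the soft limit up to whatever the hard limit already allows, which needs no special privileges.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Raise the soft core-file limit to the current hard limit.
 * Returns 0 on success, -1 on failure (errno is set by the failing call). */
static int enable_core_dumps(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_CORE, &rl) == -1) {
		perror("getrlimit");
		return -1;
	}
	rl.rlim_cur = rl.rlim_max;	/* soft limit may go up to the hard limit */
	if (setrlimit(RLIMIT_CORE, &rl) == -1) {
		perror("setrlimit");
		return -1;
	}
	return 0;
}
```

If the hard limit itself is 0, this call succeeds but core dumps remain disabled, so the system-level configuration is still the more reliable route.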

2.5 coredump storage location and naming

  By default, the coredump file is stored in the directory the application is running in (its current working directory), and the file name is "core". Using the default file name is obviously not a good idea. If multiple applications terminate abnormally, the core file will be overwritten; likewise, if an application is restarted by a daemon after an abnormal termination and then crashes again, the earlier core file will be overwritten.

  • File name with process id (PID)

  By modifying the "/proc/sys/kernel/core_uses_pid" file, you can use the process id as the file extension. A file content of 1 enables the extension; the default is 0. With the process id extension enabled, the generated core file is named "core.xxx", where xxx is the process id.

  • More detailed name and storage location

  By modifying the "/proc/sys/kernel/core_pattern" file, you can set the storage location of the coredump file and a more detailed file name. The default location and name information is as follows:

root@ubuntu:/home/acuity# cat /proc/sys/kernel/core_pattern
|/usr/share/apport/apport %p %s %c %d %P %E
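
The pattern shown above begins with "|". In that case the kernel pipes the core image to the named program (here apport) instead of writing a file from the pattern. The sketch below is one way a program might read the current setting and check for this; the function names `read_core_pattern` and `is_pipe_pattern` are invented for the illustration.

```c
#include <stdio.h>
#include <string.h>

/* Read the current core_pattern into buf; returns 0 on success, -1 on error. */
static int read_core_pattern(char *buf, size_t size)
{
	FILE *fp = fopen("/proc/sys/kernel/core_pattern", "r");

	if (fp == NULL)
		return -1;
	if (fgets(buf, (int)size, fp) == NULL) {
		fclose(fp);
		return -1;
	}
	fclose(fp);
	buf[strcspn(buf, "\n")] = '\0';	/* strip the trailing newline */
	return 0;
}

/* A pattern whose first character is '|' is piped to a program. */
static int is_pipe_pattern(const char *pattern)
{
	return pattern[0] == '|';
}
```

A test harness could call `read_core_pattern()` at startup and warn when `is_pipe_pattern()` is true, since core files will then not show up in the application directory.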

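  Meaning of the format specifiers:

%p - expands to the process id (pid)
%P - expands to the process id as seen in the initial PID namespace (same as %p outside containers)
%u - expands to the user id (uid)
%g - expands to the group id (gid)
%s - expands to the number of the signal that caused the dump
%t - expands to the dump time, in seconds since 1970-01-01 00:00:00
%h - expands to the host name
%e - expands to the application file name
%E - expands to the application file name including its absolute path, with "/" replaced by "!"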

  A common basic setup keeps the coredump storage directory unchanged (the application's current directory) and extends the file name with the application file name, process id, and current time. This covers the vast majority of cases. You can open the file directly with vi for editing, or use echo to modify its content; either way, root privileges are required.

  • Generate "core.name.pid.time" files in the current directory of the application
echo ./core.%e.%p.%t > /proc/sys/kernel/core_pattern

  If you need to specify another storage path, you can modify the path part.

  • Generate "core-name-pid-time" files in the "/home" directory
echo /home/core-%e-%p-%t > /proc/sys/kernel/core_pattern
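
To see what file name a pattern produces, the expansion can be imitated in user space. The function below is only an illustration, not the kernel's implementation: `expand_core_pattern` is a hypothetical name, it handles just %e, %p, %t and %%, and it rejects unknown specifiers.

```c
#include <stdio.h>
#include <string.h>

/* Expand %e, %p, %t and %% in pattern into out.
 * Returns 0 on success, -1 if out is too small or a specifier is unknown. */
static int expand_core_pattern(const char *pattern, const char *exe,
			       long pid, long long epoch,
			       char *out, size_t size)
{
	size_t used = 0;

	if (size == 0)
		return -1;
	for (const char *c = pattern; *c != '\0'; c++) {
		char piece[64];
		int n;

		if (*c != '%') {
			n = snprintf(piece, sizeof(piece), "%c", *c);
		} else {
			c++;
			switch (*c) {
			case 'e': n = snprintf(piece, sizeof(piece), "%s", exe); break;
			case 'p': n = snprintf(piece, sizeof(piece), "%ld", pid); break;
			case 't': n = snprintf(piece, sizeof(piece), "%lld", epoch); break;
			case '%': n = snprintf(piece, sizeof(piece), "%%"); break;
			default:  return -1;	/* includes a trailing lone '%' */
			}
		}
		if (n < 0 || (size_t)n >= sizeof(piece) || used + (size_t)n >= size)
			return -1;
		memcpy(out + used, piece, (size_t)n);
		used += (size_t)n;
	}
	out[used] = '\0';
	return 0;
}
```

For instance, expanding "core.%e.%p.%t" for an executable named coredump with pid 2046 at time 1591860958 yields "core.coredump.2046.1591860958".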

Note:

With certain directories, a coredump file is generated but its content is empty. This may be a permission problem.


3 Use coredump

  Write an "illegal" program, let the system record a coredump, and analyze it with gdb. The "-g" option must be added at compile time to retain debugging information.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
	int *p = NULL;

	p = malloc(4);
	if (p == NULL)
	{
		perror("malloc failed");
		return 1;
	}
	printf("address [%p]\n", (void *)p);
	free(p);
	free(p);	/* double free: triggers the crash */

	return 0;
}

  Compile and execute the program. It exits abnormally because of the double free, and a coredump file is generated.

  • View coredump file
root@ubuntu:/usr# file core.coredump.2046.1591860958 
core.coredump.2046.1591860958: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from './coredump'

Note:

Sometimes only an empty coredump file is generated; this can be checked with the "file" command.

  • Start gdb on the core file
gdb exe-file core-file

  • View coredump information
After gdb starts, type "bt"

  • Result

[Image: gdb "bt" output]

  The backtrace shows that the exception originates at line 17. In the source code, line 17 frees dynamically allocated memory that has already been freed.
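
One common defensive idiom against this kind of bug is to reset the pointer right after freeing it. This is a small sketch and not something the original program uses; `free_and_clear` is a hypothetical helper name. Because `free(NULL)` is defined to do nothing, a second call on the same pointer becomes harmless.

```c
#include <stdlib.h>

/* Free the block *pp points to and reset the caller's pointer to NULL.
 * free(NULL) is a no-op, so calling this twice on the same pointer is safe. */
static void free_and_clear(int **pp)
{
	free(*pp);
	*pp = NULL;
}
```

This only protects the one pointer variable passed in; other copies of the same address still dangle, so tools such as Valgrind and coredump analysis remain necessary.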




Origin blog.csdn.net/qq_20553613/article/details/106672319