core dump管理在linux中的前世今生

一、什么是core dump？

二、coredump是怎么来的？

三、怎么限制coredump文件的产生？

ulimit

半永久限制

永久限制

四、从源码分析如何对coredump文件的名字和路径管理

命名

管理

一些问题的答案

1、为什么新的ubuntu不能产生coredump了，需要手动管理？

2、systemd是什么时候出现在ubuntu中的？

3、为什么使用脚本管理coredump后ulimit就不能限制coredump文件的大小了。

4、为什么ulimit限制的大小并不准确？

5、使用脚本就没办法限制coredump的大小了嘛？

6、ubuntu现在对coredump文件的管理方式

一、什么是core dump？

core dump是Linux应用程序调试的一种有效方式，core dump又称为“核心转储”，是该进程实际使用的物理内存的“快照”。分析core dump文件可以获取应用程序崩溃时的现场信息，如程序运行时的CPU寄存器值、堆栈指针、栈数据、函数调用栈等信息。在嵌入式系统开发中是开发人员调试所需必备材料之一。每当程序异常行为产生coredump时我们就可以利用gdb工具去查看程序崩溃的原因。

二、coredump是怎么来的？

Core dump是Linux基于信号实现的。Linux中信号是一种异步事件处理机制，每种信号都对应有默认的异常处理操作，默认操作包括忽略该信号（Ignore）、暂停进程（Stop）、终止进程（Terminate）、终止并产生core dump（Core）等。

Signal	Value	Action	Comment
SIGHUP	1	Term	Hangup detected on controlling terminal or death of controlling process
SIGINT	2	Term	Interrupt from keyboard
SIGQUIT	3	Core	Quit from keyboard
SIGILL	4	Core	Illegal Instruction
SIGTRAP	5	Core	Trace/breakpoint trap
SIGABRT	6	Core	Abort signal from abort(3)
SIGIOT	6	Core	IOT trap. A synonym for SIGABRT
SIGEMT	7	Term
SIGFPE	8	Core	Floating point exception
SIGKILL	9	Term	Kill signal, cannot be caught, blocked or ignored.
SIGBUS	10,7,10	Core	Bus error (bad memory access)
SIGSEGV	11	Core	Invalid memory reference
SIGPIPE	13	Term	Broken pipe: write to pipe with no readers
SIGALRM	14	Term	Timer signal from alarm(2)
SIGTERM	15	Term	Termination signal
SIGUSR1	30,10,16	Term	User-defined signal 1
SIGUSR2	31,12,17	Term	User-defined signal 2
SIGCHLD	20,17,18	Ign	Child stopped or terminated
SIGCONT	19,18,25	Cont	Continue if stopped
SIGSTOP	17,19,23	Stop	Stop process, cannot be caught, blocked or ignored.
SIGTSTP	18,20,24	Stop	Stop typed at terminal
SIGTTIN	21,21,26	Stop	Terminal input for background process
SIGTTOU	22,22,27	Stop	Terminal output for background process
SIGIO	23,29,22	Term	I/O now possible (4.2BSD)
SIGPOLL		Term	Pollable event (Sys V). Synonym for SIGIO
SIGPROF	27,27,29	Term	Profiling timer expired
SIGSYS	12,31,12	Core	Bad argument to routine (SVr4)
SIGURG	16,23,21	Ign	Urgent condition on socket (4.2BSD)
SIGVTALRM	26,26,28	Term	Virtual alarm clock (4.2BSD)
SIGXCPU	24,24,30	Core	CPU time limit exceeded (4.2BSD)
SIGXFSZ	25,25,31	Core	File size limit exceeded (4.2BSD)
SIGSTKFLT	16	Term	Stack fault on coprocessor (unused)
SIGCLD	18	Ign	A synonym for SIGCHLD
SIGPWR	29,30,19	Term	Power failure (System V)
SIGINFO	29		A synonym for SIGPWR, on an alpha
SIGLOST	29	Term	File lock lost (unused), on a sparc
SIGWINCH	28,28,20	Ign	Window resize signal (4.3BSD, Sun)
SIGUNUSED	31	Core	Synonymous with SIGSYS

以下情况会出现应用程序崩溃导致产生core dump：

内存访问越界（数组越界、字符串无\n结束符、字符串读写越界）
多线程程序中使用了线程不安全的函数，如不可重入函数
多线程读写的数据未加锁保护（临界区资源需要互斥访问）
非法指针（如空指针异常或者非法地址访问）
堆栈溢出

三、怎么限制coredump文件的产生？

ulimit

说到限制就不得不提到ulimit工具了，这个工具其实有个坑，他并不是linux自带的工具，而是shell工具，也就是说他的级别是shell层的。

当我们新开启一个终端后之前终端的全部配置都会消失，并且他还看人下菜碟，每个linux用设置的都是自己的ulimit，比如你有两个用户，在同一个终端下对当前终端配置

一个 ulimit -c 1024

一个ulimit -c 2048

他们的限制都作用在各自的范围

即便同一个目录的同一个文件，本来这个coredump可以产生4096k个字节那么a的coredump就是1024，b的就是2048当然，不会很准确的限制，后面会从源码上给大家说明为什么。

还有就是可以通过配置文件，但是这个优先级虽然高但是可以改，这个文件只在重启时被读取一次，后面其它人再来改这个限制你之前的限制也就无效了。

半永久限制

vim /etc/profile
相当于每开一个shell时，自动执行ulimit -c unlimited（表示无限制其它具体数字通常表示多少k）

永久限制

编辑/etc/security/limits.conf，需要重新登录，或者重新打开ssh客户端连接，永久生效

下面就是软限制和硬限制

四、从源码分析如何对coredump文件的名字和路径管理

命名

在/etc/sysctrl.conf中保存的是用户给内核传递的参数，这里有个core_pattern参数是告诉内核怎么管理coredump的。

后面直接写等号就表示这个coredump文件的命名

%% 单个%字符

%p 所dump进程的进程ID

%u 所dump进程的实际用户ID

%g 所dump进程的实际组ID

%s 导致本次core dump的信号

%t core dump的时间 (由1970年1月1日计起的秒数)

%h 主机名

%e 程序文件名

管理

kernel.core_pattern=|/usr/bin/coredump_helper.sh core_%e_%I_%p_sig_%s_time_%t.gz

像上面那样配置core_pattern后内核就可以使用我们指定的脚本去管理coredump文件了。

这个路径必须是绝对路径，前面必须是|前后都不能有空格，coredump文件会作为参数传入脚本。

这里我直接将文件以压缩格式传递给了脚本

#!/bin/sh

if [ ! -d "/var/coredump" ];then
    mkdir -p /var/coredump
fi
gzip > "/var/coredump/$1"

上面的代码就是一个简单的示例可以把coredump都放到var下的coredump目录中，我们还可以添加一些遍历判断操作限制产生coredump的数量的单个coredump的大小。

#!/bin/bash  
  
# Set the maximum number of files allowed in /var/coredump  
max_files=1000  
  
# Check if /var/coredump directory exists  
if [ ! -d "/var/coredump" ]; then  
  mkdir -p /var/coredump  
fi  
  
# Get the current number of files in /var/coredump  
current_files=$(ls -1 /var/coredump | wc -l)  
  
# Check if the number of files exceeds the maximum limit  
if [ $current_files -ge $max_files ]; then  
  # Sort the files based on creation time (oldest first)  
  sorted_files=$(ls -1t /var/coredump | head -n 1)  
    
  # Remove the oldest files to reduce the number of files to below the limit  
  rm $sorted_files  
fi  
  
# Compress the file and store it in /var/coredump using gzip  
gzip > "/var/coredump/$1"

代码很简陋正常还要有出错管理，异常情况打印到日志备份等等。

一些问题的答案

1、为什么新的ubuntu不能产生coredump了，需要手动管理？

因为现在的ubuntu使用的是systemd，在这里有些服务会对内核管理coredump的接口做配置，导致coredump文件被服务写到接口的脚本管理了。

2、systemd是什么时候出现在ubuntu中的？

systemd是在Ubuntu 15.04版本中首次出现的。在此之前，Ubuntu使用Upstart作为默认的init系统。然而，从Ubuntu 15.04开始，Ubuntu采用了systemd作为默认的init系统和服务管理器。systemd是一个用于启动、停止和管理系统进程的软件套件，它提供了更快的启动速度、更好的系统资源管理和更强大的服务控制能力。

而在Upstart之前用的才是SysVinit。

3、为什么使用脚本管理coredump后ulimit就不能限制coredump文件的大小了。

这个问题就要看源码了

void do_coredump(const siginfo_t *siginfo)
{
	struct core_state core_state;
	struct core_name cn;
	struct mm_struct *mm = current->mm;
	struct linux_binfmt * binfmt;
	const struct cred *old_cred;
	struct cred *cred;
	int retval = 0;
	int flag = 0;
	int ispipe;
	struct files_struct *displaced;
	bool need_nonrelative = false;
	bool core_dumped = false;
	static atomic_t core_dump_count = ATOMIC_INIT(0);
	struct coredump_params cprm = {
		.siginfo = siginfo,
		.regs = signal_pt_regs(),
		.limit = rlimit(RLIMIT_CORE),
		/*
		 * We must use the same mm->flags while dumping core to avoid
		 * inconsistency of bit flags, since this flag is not protected
		 * by any locks.
		 */
		.mm_flags = mm->flags,
	};

	audit_core_dumps(siginfo->si_signo);

	binfmt = mm->binfmt;
	if (!binfmt || !binfmt->core_dump)
		goto fail;
	if (!__get_dumpable(cprm.mm_flags))
		goto fail;

	cred = prepare_creds();
	if (!cred)
		goto fail;
	/*
	 * We cannot trust fsuid as being the "true" uid of the process
	 * nor do we know its entire history. We only know it was tainted
	 * so we dump it as root in mode 2, and only into a controlled
	 * environment (pipe handler or fully qualified path).
	 */
	if (__get_dumpable(cprm.mm_flags) == SUID_DUMP_ROOT) {
		/* Setuid core dump mode */
		flag = O_EXCL;		/* Stop rewrite attacks */
		cred->fsuid = GLOBAL_ROOT_UID;	/* Dump root private */
		need_nonrelative = true;
	}

	retval = coredump_wait(siginfo->si_signo, &core_state);
	if (retval < 0)
		goto fail_creds;

	old_cred = override_creds(cred);

	ispipe = format_corename(&cn, &cprm);

	if (ispipe) {
		int dump_count;
		char **helper_argv;
		struct subprocess_info *sub_info;

		if (ispipe < 0) {
			printk(KERN_WARNING "format_corename failed\n");
			printk(KERN_WARNING "Aborting core\n");
			goto fail_unlock;
		}

		if (cprm.limit == 1) {
			/* See umh_pipe_setup() which sets RLIMIT_CORE = 1.
			 *
			 * Normally core limits are irrelevant to pipes, since
			 * we're not writing to the file system, but we use
			 * cprm.limit of 1 here as a speacial value, this is a
			 * consistent way to catch recursive crashes.
			 * We can still crash if the core_pattern binary sets
			 * RLIM_CORE = !1, but it runs as root, and can do
			 * lots of stupid things.
			 *
			 * Note that we use task_tgid_vnr here to grab the pid
			 * of the process group leader.  That way we get the
			 * right pid if a thread in a multi-threaded
			 * core_pattern process dies.
			 */
			printk(KERN_WARNING
				"Process %d(%s) has RLIMIT_CORE set to 1\n",
				task_tgid_vnr(current), current->comm);
			printk(KERN_WARNING "Aborting core\n");
			goto fail_unlock;
		}
		cprm.limit = RLIM_INFINITY;

		dump_count = atomic_inc_return(&core_dump_count);
		if (core_pipe_limit && (core_pipe_limit < dump_count)) {
			printk(KERN_WARNING "Pid %d(%s) over core_pipe_limit\n",
			       task_tgid_vnr(current), current->comm);
			printk(KERN_WARNING "Skipping core dump\n");
			goto fail_dropcount;
		}

		helper_argv = argv_split(GFP_KERNEL, cn.corename, NULL);
		if (!helper_argv) {
			printk(KERN_WARNING "%s failed to allocate memory\n",
			       __func__);
			goto fail_dropcount;
		}

		retval = -ENOMEM;
		sub_info = call_usermodehelper_setup(helper_argv[0],
						helper_argv, NULL, GFP_KERNEL,
						umh_pipe_setup, NULL, &cprm);
		if (sub_info)
			retval = call_usermodehelper_exec(sub_info,
							  UMH_WAIT_EXEC);

		argv_free(helper_argv);
		if (retval) {
			printk(KERN_INFO "Core dump to |%s pipe failed\n",
			       cn.corename);
			goto close_fail;
		}
	} else {
		struct inode *inode;

		if (cprm.limit < binfmt->min_coredump)
			goto fail_unlock;

		if (need_nonrelative && cn.corename[0] != '/') {
			printk(KERN_WARNING "Pid %d(%s) can only dump core "\
				"to fully qualified path!\n",
				task_tgid_vnr(current), current->comm);
			printk(KERN_WARNING "Skipping core dump\n");
			goto fail_unlock;
		}

		cprm.file = filp_open(cn.corename,
				 O_CREAT | 2 | O_NOFOLLOW | O_LARGEFILE | flag,
				 0600);
		if (IS_ERR(cprm.file))
			goto fail_unlock;

		inode = file_inode(cprm.file);
		if (inode->i_nlink > 1)
			goto close_fail;
		if (d_unhashed(cprm.file->f_path.dentry))
			goto close_fail;
		/*
		 * AK: actually i see no reason to not allow this for named
		 * pipes etc, but keep the previous behaviour for now.
		 */
		if (!S_ISREG(inode->i_mode))
			goto close_fail;
		/*
		 * Dont allow local users get cute and trick others to coredump
		 * into their pre-created files.
		 */
		if (!uid_eq(inode->i_uid, current_fsuid()))
			goto close_fail;
		if (!cprm.file->f_op->write)
			goto close_fail;
		if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file))
			goto close_fail;
	}

	/* get us an unshared descriptor table; almost always a no-op */
	retval = unshare_files(&displaced);
	if (retval)
		goto close_fail;
	if (displaced)
		put_files_struct(displaced);
	if (!dump_interrupted()) {
		file_start_write(cprm.file);
		core_dumped = binfmt->core_dump(&cprm);
		file_end_write(cprm.file);
	}
	if (ispipe && core_pipe_limit)
		wait_for_dump_helpers(cprm.file);
close_fail:
	if (cprm.file)
		filp_close(cprm.file, NULL);
fail_dropcount:
	if (ispipe)
		atomic_dec(&core_dump_count);
fail_unlock:
	kfree(cn.corename);
	coredump_finish(mm, core_dumped);
	revert_creds(old_cred);
fail_creds:
	put_cred(cred);
fail:
	return;
}

从源码中可以看到在产生coredump的源码中会对上面的core_pattern进行判断，如果传入了脚本就会走上面的分支，没传入走下面的分支，上面只判断了limit是不是1，1表示不产生coredump，如果不是1就赋值为0，0就是没有限制。

而下面有这个判断，当已经产生的coredump文件超过ulimit -c的限制时就会停止产生。有一个点需要注意，ulimit-c和这里的limit不是完全对应的，是在shell源码中有转化的，不过大体上还是对应的。

4、为什么ulimit限制的大小并不准确？

还是要看这张图，它每次写入的都是内核设计好的，就写这些。

在外面有个for循环调用do_coredump文件.

int get_signal_to_deliver(siginfo_t *info, struct k_sigaction *return_ka,
			  struct pt_regs *regs, void *cookie)
{
	struct sighand_struct *sighand = current->sighand;
	struct signal_struct *signal = current->signal;
	int signr;

	if (unlikely(current->task_works))
		task_work_run();

	if (unlikely(uprobe_deny_signal()))
		return 0;

	/*
	 * Do this once, we can't return to user-mode if freezing() == T.
	 * do_signal_stop() and ptrace_stop() do freezable_schedule() and
	 * thus do not need another check after return.
	 */
	try_to_freeze();

relock:
	spin_lock_irq(&sighand->siglock);
	/*
	 * Every stopped thread goes here after wakeup. Check to see if
	 * we should notify the parent, prepare_signal(SIGCONT) encodes
	 * the CLD_ si_code into SIGNAL_CLD_MASK bits.
	 */
	if (unlikely(signal->flags & SIGNAL_CLD_MASK)) {
		int why;

		if (signal->flags & SIGNAL_CLD_CONTINUED)
			why = CLD_CONTINUED;
		else
			why = CLD_STOPPED;

		signal->flags &= ~SIGNAL_CLD_MASK;

		spin_unlock_irq(&sighand->siglock);

		/*
		 * Notify the parent that we're continuing.  This event is
		 * always per-process and doesn't make whole lot of sense
		 * for ptracers, who shouldn't consume the state via
		 * wait(2) either, but, for backward compatibility, notify
		 * the ptracer of the group leader too unless it's gonna be
		 * a duplicate.
		 */
		read_lock(&tasklist_lock);
		do_notify_parent_cldstop(current, false, why);

		if (ptrace_reparented(current->group_leader))
			do_notify_parent_cldstop(current->group_leader,
						true, why);
		read_unlock(&tasklist_lock);

		goto relock;
	}

	for (;;) {
		struct k_sigaction *ka;

		if (unlikely(current->jobctl & JOBCTL_STOP_PENDING) &&
		    do_signal_stop(0))
			goto relock;

		if (unlikely(current->jobctl & JOBCTL_TRAP_MASK)) {
			do_jobctl_trap();
			spin_unlock_irq(&sighand->siglock);
			goto relock;
		}

		signr = dequeue_signal(current, &current->blocked, info);

		if (!signr)
			break; /* will return 0 */

		if (unlikely(current->ptrace) && signr != SIGKILL) {
			signr = ptrace_signal(signr, info);
			if (!signr)
				continue;
		}

		ka = &sighand->action[signr-1];

		/* Trace actually delivered signals. */
		trace_signal_deliver(signr, info, ka);

		if (ka->sa.sa_handler == SIG_IGN) /* Do nothing.  */
			continue;
		if (ka->sa.sa_handler != SIG_DFL) {
			/* Run the handler.  */
			*return_ka = *ka;

			if (ka->sa.sa_flags & SA_ONESHOT)
				ka->sa.sa_handler = SIG_DFL;

			break; /* will return non-zero "signr" value */
		}

		/*
		 * Now we are doing the default action for this signal.
		 */
		if (sig_kernel_ignore(signr)) /* Default is nothing. */
			continue;

		/*
		 * Global init gets no signals it doesn't want.
		 * Container-init gets no signals it doesn't want from same
		 * container.
		 *
		 * Note that if global/container-init sees a sig_kernel_only()
		 * signal here, the signal must have been generated internally
		 * or must have come from an ancestor namespace. In either
		 * case, the signal cannot be dropped.
		 */
		if (unlikely(signal->flags & SIGNAL_UNKILLABLE) &&
				!sig_kernel_only(signr))
			continue;

		if (sig_kernel_stop(signr)) {
			/*
			 * The default action is to stop all threads in
			 * the thread group.  The job control signals
			 * do nothing in an orphaned pgrp, but SIGSTOP
			 * always works.  Note that siglock needs to be
			 * dropped during the call to is_orphaned_pgrp()
			 * because of lock ordering with tasklist_lock.
			 * This allows an intervening SIGCONT to be posted.
			 * We need to check for that and bail out if necessary.
			 */
			if (signr != SIGSTOP) {
				spin_unlock_irq(&sighand->siglock);

				/* signals can be posted during this window */

				if (is_current_pgrp_orphaned())
					goto relock;

				spin_lock_irq(&sighand->siglock);
			}

			if (likely(do_signal_stop(info->si_signo))) {
				/* It released the siglock.  */
				goto relock;
			}

			/*
			 * We didn't actually stop, due to a race
			 * with SIGCONT or something like that.
			 */
			continue;
		}

		spin_unlock_irq(&sighand->siglock);

		/*
		 * Anything else is fatal, maybe with a core dump.
		 */
		current->flags |= PF_SIGNALED;

		if (sig_kernel_coredump(signr)) {
			if (print_fatal_signals)
				print_fatal_signal(info->si_signo);
			proc_coredump_connector(current);
			/*
			 * If it was able to dump core, this kills all
			 * other threads in the group and synchronizes with
			 * their demise.  If we lost the race with another
			 * thread getting here, it set group_exit_code
			 * first and our do_group_exit call below will use
			 * that value and ignore the one we pass it.
			 */
			do_coredump(info);
		}

		/*
		 * Death signals, no core dump.
		 */
		do_group_exit(info->si_signo);
		/* NOTREACHED */
	}
	spin_unlock_irq(&sighand->siglock);
	return signr;
}

哪次判断超出了就会停止生成.

5、使用脚本就没办法限制coredump的大小了嘛？

可以的，但是limit这个系统调用肯定是不行了，源码中我们就看到了人家都不鸟你。但是我们可以ulimit -f，这个脚本coredump文件从内核态到用户态是通过这个传入的脚本，所以直接ulimit -f限制脚本生成的全部文件大小就ok了，但是这里要写成正常两倍大小因为是先获取在写入，我们一个字节计算两次，这次限制的大小就完全准确了，也有个坏处就是文件会损坏限制住了也没法去看这个coredump。

6、ubuntu现在对coredump文件的管理方式

由于使用systemd机制，所以现在的coredump是以服务的方式存在的，为了可以更好的保存和规范化管理，coredump内容被放到了日志中，使用coredump@和socket服务共同实现，将coredump文件从内核态转移到用户态后又用了socket机制做过滤和写入日志。完美限制大小和内容。只要日志管理的好可以想看任何时候的有用的coredump。

---------------------------------------------------------------------------------------------------------------------------------

这是我工作中遇到的第一个做了两周的问题的一部分，当时贼痛苦各种查资料翻源码，好在最后想到了还算不错的解决方案，工作的提升是真的大，不止是技术，主要是团队写作能力，做事情的条例，规范化做事才会少出错。这些很重要。还有表达能力，不然你写的永远都是垃圾。下周有时间更新另一半，日志的管理。