Using kdump to examine Linux kernel crashes

kdump is a way to obtain a dump of a crashed Linux kernel, but documentation that explains its use and internals can be hard to find. In this article, I will cover the basic usage of kdump and how kdump/kexec is implemented in the kernel.

kexec is a Linux kernel-to-kernel boot loader that boots a second kernel from the context of the first kernel. kexec shuts down the first kernel, bypasses the BIOS or firmware stage, and jumps to the second kernel. Because the BIOS stage is skipped, the reboot is faster.

kdump can be used together with the kexec application. For example, when the first kernel panics, a second kernel is booted and used to copy the memory dump of the first kernel, which can then be analyzed with tools such as gdb and crash to determine the reason for the panic. (In this article, I will use the term "first kernel" for the currently running kernel, "second kernel" for a kernel run using kexec, and "capture kernel" for the kernel that runs when the current kernel crashes.)

The kexec mechanism has components in the kernel as well as in user space. The kernel provides several system calls for the kexec reboot functionality. A user-space tool called kexec-tools uses these calls and provides an executable to load and boot the second kernel. Some distributions also add wrappers around kexec-tools that help capture and save the dump for various dump target configurations. In this article, I will use the name distro-kexec-tools for these wrappers to avoid confusion between upstream kexec-tools and the distribution-specific kexec-tools code. My examples use the Fedora Linux distribution.

Fedora kexec-tools

Use the dnf install kexec-tools command to install fedora-kexec-tools on a Fedora machine. After installation, run systemctl start kdump to start the kdump service. When the service starts, it creates a root file system (initramfs) that contains the resources needed to mount the target where the vmcore will be saved, along with a command to copy or dump the vmcore to that target. The service then loads the kernel and initramfs at a suitable location within the crash kernel region so that they can be executed when the kernel panics.

The Fedora wrapper provides two user configuration files:

  • /etc/kdump.conf specifies configuration parameters whose modification requires rebuilding the initramfs. For example, if you change the dump target from a local disk to an NFS-mounted disk, the capture kernel needs to load the NFS-related kernel modules.
  • /etc/sysconfig/kdump specifies configuration parameters whose modification does not require rebuilding the initramfs. For example, if you only need to modify the command-line parameters passed to the capture kernel, you do not need to rebuild the initramfs.

If the kernel panics after the kdump service has started, the capture kernel is executed; it runs the vmcore save process from the initramfs and then reboots into a stable kernel.

kexec-tools

Compiling the source code of kexec-tools results in an executable file named kexec. This eponymous executable can be used to load and execute a "second kernel", or load a "capture kernel", which can be executed when the kernel crashes.

Command to load the "second kernel":

# kexec -l kernel.img --initrd=initramfs-image.img --reuse-cmdline

The --reuse-cmdline option means to use the same command line as the first kernel. Use --initrd to pass the initramfs. -l indicates that you are loading a second kernel, which can then be executed by the kexec application itself (kexec -e). A kernel loaded with -l cannot be executed on a kernel panic. To load a capture kernel that can be executed on a kernel panic, pass -p instead of -l.

Command to load the capture kernel:

# kexec -p kernel.img --initrd=initramfs-image.img --reuse-cmdline
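The two load commands differ only in the -l or -p option. As a small illustration, the sketch below assembles either command line in Python; the helper name build_kexec_load_command is my own and is not part of kexec-tools, and the file names are the placeholders used above.

```python
import shlex


def build_kexec_load_command(kernel, initrd=None, on_crash=False, reuse_cmdline=True):
    """Return the argv list for loading a kernel with the kexec executable.

    Hypothetical helper: it only assembles the options discussed in the text.
    """
    # -p loads a capture kernel (executed on panic); -l loads a second
    # kernel that is started explicitly with `kexec -e`.
    argv = ["kexec", "-p" if on_crash else "-l", kernel]
    if initrd is not None:
        argv.append(f"--initrd={initrd}")
    if reuse_cmdline:
        argv.append("--reuse-cmdline")
    return argv


if __name__ == "__main__":
    # Print the capture-kernel variant as a single shell command line.
    argv = build_kexec_load_command("kernel.img", "initramfs-image.img", on_crash=True)
    print(shlex.join(argv))
```

The resulting list could be handed to subprocess.run() by a script running as root; the sketch itself only builds the arguments and does not load anything.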

echo c > /proc/sysrq-trigger can be used to crash the kernel for testing. For more information about the options provided by kexec-tools, see man kexec.

kdump: end-to-end flow

The figure below shows the flow chart. Memory for the capture kernel (the crash kernel region) must be reserved while the first kernel boots. You can pass crashkernel=Y@X on the kernel command line, where @X is optional. crashkernel=256M works on most x86_64 systems; however, the right amount of memory for the crash kernel depends on many factors, such as the size of the kernel and initramfs and the runtime memory requirements of the modules and applications included in the initramfs. See the kernel-parameters documentation for more ways to pass the crashkernel parameter.

[Figure: kdump end-to-end flow chart (pratyush_f1.png)]
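To make the crashkernel=Y@X syntax concrete, here is a minimal Python sketch that extracts the size and the optional offset from a kernel command line. It assumes only the simple form with decimal numbers and K/M/G suffixes; the other forms described in the kernel-parameters documentation are not handled, and the function names are my own.

```python
import re

_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def _to_bytes(text):
    """Convert a decimal size such as '256M' into bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.upper())
    if match is None:
        raise ValueError(f"not a simple size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def parse_crashkernel(cmdline):
    """Return (size, offset) in bytes for crashkernel=Y[@X].

    offset is None when @X is omitted; the result is None when the
    parameter is absent from the command line.
    """
    for word in cmdline.split():
        if word.startswith("crashkernel="):
            value = word[len("crashkernel="):]
            size, _, offset = value.partition("@")
            return _to_bytes(size), (_to_bytes(offset) if offset else None)
    return None
```

On a running system, the input would typically be the contents of /proc/cmdline.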

You can pass the kernel and initramfs images to the kexec executable as shown in the commands in the kexec-tools section. The capture kernel can be the same as the first kernel, or it can be different; usually the same kernel will do. The initramfs is optional; for example, you do not need it when the kernel is compiled with CONFIG_INITRAMFS_SOURCE. Typically, the capture initramfs differs from the first initramfs, because it is better to automate the copying of the vmcore in the capture initramfs. When kexec is executed, it also loads the elfcorehdr data and the purgatory executable. (LCTT translator's note: purgatory is a small boot loader tailored for kdump; the odd name "purgatory" is presumably just a joke.) elfcorehdr holds information about the system's memory organization, while purgatory runs before the capture kernel and verifies that the second-stage binaries and data have the correct SHA. purgatory is also optional.

When the first kernel panics, it performs the minimal necessary exit procedures and switches to purgatory (if present). purgatory verifies the SHA256 of the loaded binaries and, if it is correct, passes control to the capture kernel. The capture kernel creates the vmcore from the system memory information received through elfcorehdr. Therefore, after the capture kernel boots, you will see a dump of the first kernel in /proc/vmcore. Depending on the initramfs you are using, you can now analyze the dump, copy it to any disk, or have it copied automatically before rebooting into a stable kernel.
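The integrity check that purgatory performs can be pictured with a few lines of Python. This is only an illustration of the idea (hash the loaded segments, compare with a digest recorded at load time); the real purgatory is freestanding code, and which segments are hashed and where the digest is stored are not modeled here. The function names are hypothetical.

```python
import hashlib


def digest_segments(segments):
    """Return the SHA-256 digest over the given byte strings, in order."""
    sha = hashlib.sha256()
    for segment in segments:
        sha.update(segment)
    return sha.digest()


def verify_segments(segments, expected_digest):
    """Return True only if the segments still match the recorded digest."""
    return digest_segments(segments) == expected_digest
```

If even one byte of a loaded segment was overwritten while the first kernel was crashing, the comparison fails, which is the situation in which control should not be handed to the capture kernel.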

Kernel system calls

The kernel provides two system calls: kexec_load() and kexec_file_load(), which can be used to load the "second kernel" when executing kexec -l. It also provides an additional flag to the reboot() system call, which can be used to boot into a "second kernel" using kexec -e.

kexec_load(): The kexec_load() system call loads a new kernel that can later be executed via reboot(). Its prototype is defined as follows:

long kexec_load(unsigned long entry, unsigned long nr_segments,
                struct kexec_segment *segments, unsigned long flags);

User space needs to pass different segments for different components, such as kernel, initramfs, etc. Therefore, the kexec executable helps prepare these segments. The structure of kexec_segment is as follows:

struct kexec_segment {
    void   *buf;    /* Buffer in user space */
    size_t  bufsz;  /* Buffer length in user space */
    void   *mem;    /* Physical address of kernel */
    size_t  memsz;  /* Physical address length */
};
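For readers who prefer to experiment from a scripting language, the same layout can be mirrored with Python's ctypes module. This is a sketch for illustration only: the KexecSegment class and make_segment helper are my own names, and the destination address used in any experiment is made up. Nothing here calls kexec_load().

```python
import ctypes


class KexecSegment(ctypes.Structure):
    """ctypes mirror of struct kexec_segment as shown above."""
    _fields_ = [
        ("buf", ctypes.c_void_p),    # buffer in user space
        ("bufsz", ctypes.c_size_t),  # buffer length in user space
        ("mem", ctypes.c_void_p),    # physical destination address
        ("memsz", ctypes.c_size_t),  # length of the destination range
    ]


def make_segment(data, mem, memsz):
    """Build one segment for `data`; return (segment, backing_buffer).

    The caller must keep the backing buffer alive for as long as the
    segment is in use, because the segment stores only its address.
    """
    if len(data) > memsz:
        raise ValueError("bufsz may not exceed memsz")
    backing = (ctypes.c_char * len(data)).from_buffer_copy(data)
    segment = KexecSegment(buf=ctypes.addressof(backing), bufsz=len(data),
                           mem=mem, memsz=memsz)
    return segment, backing
```

An array of such structures, one per component (kernel, initramfs, and so on), corresponds to the segments argument of kexec_load().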

When reboot() is called with LINUX_REBOOT_CMD_KEXEC, it boots into the kernel loaded by kexec_load(). If the KEXEC_ON_CRASH flag is passed to kexec_load(), the loaded kernel is not booted by reboot(LINUX_REBOOT_CMD_KEXEC); instead, it is executed on a kernel panic. CONFIG_KEXEC must be defined to use kexec, and CONFIG_CRASH_DUMP must be defined for kdump.
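The relationship between the load flag and the event that starts the loaded kernel can be summarized in a tiny sketch. The two constants carry the values found in the Linux UAPI headers (linux/kexec.h and linux/reboot.h); the function started_by is a hypothetical name used only for this illustration.

```python
KEXEC_ON_CRASH = 0x00000001          # flag for kexec_load()
LINUX_REBOOT_CMD_KEXEC = 0x45584543  # command for reboot()


def started_by(load_flags):
    """Say which event starts a kernel loaded with these kexec_load() flags."""
    if load_flags & KEXEC_ON_CRASH:
        return "kernel panic"
    return "reboot(LINUX_REBOOT_CMD_KEXEC)"
```

As a mnemonic, the bytes of LINUX_REBOOT_CMD_KEXEC spell "EXEC" in ASCII.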

kexec_file_load(): As a user, you pass only two arguments (the kernel and the initramfs) to the kexec executable. With kexec_load(), kexec then reads data from sysfs or other kernel information sources and creates all of the segments itself. kexec_file_load() simplifies user space: only the file descriptors of the kernel and initramfs are passed, and the kernel does the rest. CONFIG_KEXEC_FILE must be enabled to use this system call. Its prototype is as follows:

long kexec_file_load(int kernel_fd, int initrd_fd,
                     unsigned long cmdline_len, const char __user *cmdline_ptr,
                     unsigned long flags);

Note that kexec_file_load() also accepts the command line, while kexec_load() does not. How the kernel accepts and processes the command line is architecture-specific. Therefore, in the case of kexec_load(), kexec-tools passes the command line through one of the segments (for example, in the dtb or in the ELF boot notes).
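The sketch below shows how the five arguments of kexec_file_load() could be prepared, following the prototype above and the kexec_file_load(2) manual page: the command-line buffer must end with a NUL byte, and cmdline_len counts that byte. The flag values are those from linux/kexec.h; the helper name is my own, and the system call itself (which needs root privileges) is deliberately not invoked.

```python
KEXEC_FILE_UNLOAD = 0x00000001        # unload the currently loaded kernel
KEXEC_FILE_ON_CRASH = 0x00000002      # load a capture kernel
KEXEC_FILE_NO_INITRAMFS = 0x00000004  # no initramfs; initrd_fd is ignored


def prepare_file_load_args(kernel_fd, initrd_fd, cmdline, on_crash=False):
    """Return (kernel_fd, initrd_fd, cmdline_len, cmdline_buf, flags)."""
    # The last byte of the buffer must be NUL, and the length includes it.
    buf = cmdline.encode("ascii") + b"\0"
    flags = KEXEC_FILE_ON_CRASH if on_crash else 0
    if initrd_fd is None:
        initrd_fd = -1
        flags |= KEXEC_FILE_NO_INITRAMFS
    return kernel_fd, initrd_fd, len(buf), buf, flags
```

Compared with kexec_load(), there is no segment list to build: user space passes two open file descriptors and a string.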

At the time of writing, kexec_file_load() was supported only on x86 and PowerPC.

What happens when the kernel crashes

When the first kernel panics, the following actions are performed before control is passed to purgatory or the capture kernel:

  • Prepare the CPU registers (see crash_setup_regs() in the kernel code);
  • Update the vmcoreinfo note (see crash_save_vmcoreinfo());
  • Shut down the non-crashing CPUs and save the prepared registers (see machine_crash_shutdown() and crash_save_cpu());
  • Disable the interrupt controller here, if necessary;
  • Finally, perform the kexec reboot (see machine_kexec()), which loads or flushes the kexec segments to memory and passes control to the entry segment. The entry segment can be purgatory or the start address of the next kernel.

ELF program header

Most of the dump files involved in kdump are in ELF format. It is therefore important to understand the ELF program header, especially if you want to track down problems with vmcore preparation. Every ELF file has a program header, which:

  • Is read by the system loader;
  • Describes how the program should be loaded into memory;
  • Can be viewed with objdump -p elf_file.
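If objdump is not at hand, the program headers can also be read directly. The following Python sketch decodes the headers of a little-endian ELF64 image using the standard Elf64_Ehdr and Elf64_Phdr layouts. It is a simplified reader written for this article: it recognizes only the PT_LOAD and PT_NOTE types and ignores the extended numbering used when a file has a very large number of program headers.

```python
import struct

# Elf64_Ehdr: e_ident, e_type, e_machine, e_version, e_entry, e_phoff,
# e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize,
# e_shnum, e_shstrndx (64 bytes in total).
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
# Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz,
# p_memsz, p_align (56 bytes in total).
ELF64_PHDR = struct.Struct("<IIQQQQQQ")
PT_NAMES = {1: "LOAD", 4: "NOTE"}


def read_program_headers(data):
    """Return a list of dicts, one per program header found in `data`."""
    ehdr = ELF64_EHDR.unpack_from(data, 0)
    ident = ehdr[0]
    # ident[4] == 2 means 64-bit, ident[5] == 1 means little-endian.
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    phoff, phentsize, phnum = ehdr[5], ehdr[9], ehdr[10]
    headers = []
    for index in range(phnum):
        (p_type, p_flags, off, vaddr, paddr,
         filesz, memsz, align) = ELF64_PHDR.unpack_from(data, phoff + index * phentsize)
        headers.append({
            "type": PT_NAMES.get(p_type, hex(p_type)),
            "off": off, "vaddr": vaddr, "paddr": paddr,
            "filesz": filesz, "memsz": memsz, "flags": p_flags,
        })
    return headers
```

Only the ELF header and the program header table need to be in `data`, so for a large vmcore it is enough to read the beginning of the file rather than the whole dump.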

An example of vmcore's ELF program header is as follows:

# objdump -p vmcore
vmcore:     file format elf64-littleaarch64
Program Header:
    NOTE off    0x0000000000010000 vaddr 0x0000000000000000 paddr 0x0000000000000000 align 2**0
         filesz 0x00000000000013e8 memsz 0x00000000000013e8 flags ---
    LOAD off    0x0000000000020000 vaddr 0xffff000008080000 paddr 0x0000004000280000 align 2**0
         filesz 0x0000000001460000 memsz 0x0000000001460000 flags rwx
    LOAD off    0x0000000001480000 vaddr 0xffff800000200000 paddr 0x0000004000200000 align 2**0
         filesz 0x000000007fc00000 memsz 0x000000007fc00000 flags rwx
    LOAD off    0x0000000081080000 vaddr 0xffff8000ffe00000 paddr 0x00000040ffe00000 align 2**0
         filesz 0x00000002fa7a0000 memsz 0x00000002fa7a0000 flags rwx
    LOAD off    0x000000037b820000 vaddr 0xffff8003fa9e0000 paddr 0x00000043fa9e0000 align 2**0
         filesz 0x0000000004fc0000 memsz 0x0000000004fc0000 flags rwx
    LOAD off    0x00000003807e0000 vaddr 0xffff8003ff9b0000 paddr 0x00000043ff9b0000 align 2**0
         filesz 0x0000000000010000 memsz 0x0000000000010000 flags rwx
    LOAD off    0x00000003807f0000 vaddr 0xffff8003ff9f0000 paddr 0x00000043ff9f0000 align 2**0
         filesz 0x0000000000610000 memsz 0x0000000000610000 flags rwx

In this example, there is one note segment and the rest are load segments. The note segment provides information about the CPUs, and the load segments provide information about the copied system memory components.

vmcore starts with elfcorehdr, which has the same structure as an ELF program header. The figure below shows the layout of elfcorehdr:

[Figure: layout of elfcorehdr (pratyush_f2.png)]

kexec-tools reads /sys/devices/system/cpu/cpu%d/crash_notes and prepares the PT_NOTE headers for the CPUs. Likewise, it reads /sys/kernel/vmcoreinfo and prepares the PT_NOTE header for vmcoreinfo, and it reads the system RAM ranges from /proc/iomem and prepares the PT_LOAD headers for memory. When the capture kernel receives the elfcorehdr, it appends the data from the addresses mentioned in the headers and prepares the vmcore.
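As a rough picture of the /proc/iomem step, the sketch below collects the top-level "System RAM" ranges from iomem-style text as (start, size) pairs, which is the kind of information a PT_LOAD header carries. It is an assumption-laden simplification, not the kexec-tools implementation: the real tool applies further adjustments that are not modeled here, and the function name is my own.

```python
def system_ram_ranges(iomem_text):
    """Return (start, size) for each top-level 'System RAM' line."""
    ranges = []
    for line in iomem_text.splitlines():
        if line.startswith(" "):
            # Indented lines are resources nested inside another range.
            continue
        span, sep, name = line.partition(" : ")
        if not sep or name.strip() != "System RAM":
            continue
        start, end = (int(part, 16) for part in span.split("-"))
        ranges.append((start, end - start + 1))  # the end address is inclusive
    return ranges
```

Note that /proc/iomem reports zeroed addresses to unprivileged readers, so the file has to be read as root to obtain meaningful ranges.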

Crash notes

Crash notes are a per-CPU area used to store the CPU state in the event of a system crash; they hold information about the current PID and the CPU registers.

vmcoreinfo

This note segment carries various kernel debugging information, such as structure sizes, symbol values, and the page size. These values are parsed by the capture kernel and embedded into /proc/vmcore. vmcoreinfo is used primarily by the makedumpfile application. include/linux/kexec.h in the Linux kernel provides macros for defining new vmcoreinfo entries. Some example macros are as follows:

  • VMCOREINFO_PAGESIZE()
  • VMCOREINFO_SYMBOL()
  • VMCOREINFO_SIZE()
  • VMCOREINFO_STRUCT_SIZE()
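The entries produced by these macros are plain KEY=VALUE text lines, for example PAGESIZE=..., SYMBOL(name)=... and SIZE(name)=.... A minimal Python sketch for turning such text into a dictionary could look like this; the function names are my own, and any concrete values fed to it in an experiment are illustrative rather than taken from a real dump.

```python
def parse_vmcoreinfo(text):
    """Parse vmcoreinfo-style KEY=VALUE lines into a dict of strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that do not contain '='
            info[key] = value
    return info


def symbol_address(info, name):
    """Return the address recorded by a SYMBOL(name) entry as an int."""
    # Symbol values are written in hexadecimal without a 0x prefix.
    return int(info[f"SYMBOL({name})"], 16)
```

Values are kept as strings because their encoding depends on the entry: the page size is decimal, while symbol values are hexadecimal.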

makedumpfile

Much of the information in a vmcore, such as free pages, is not useful. makedumpfile is an application that excludes unnecessary pages such as:

  • Pages filled with zeros;
  • Cache pages without the private flag (non-private cache);
  • Cache pages with the private flag (private cache);
  • User process data pages;
  • Free pages.

Additionally, makedumpfile compresses the data of /proc/vmcore while copying it. It can also erase sensitive symbol information from the dump; to do so, however, it first needs the kernel's debugging information. This debugging information comes from either vmlinux or vmcoreinfo. The output of makedumpfile can be in ELF format or in kdump-compressed format.

Typical usage:

# makedumpfile -l --message-level 1 -d 31 /proc/vmcore makedumpfilecore
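The value given to -d is a bit mask in which each bit excludes one of the page types listed earlier, as documented in the makedumpfile manual page; 31 sets all five bits. The short Python sketch below decodes a dump level into the page types it excludes (the names DUMP_LEVEL_BITS and excluded_pages are my own).

```python
DUMP_LEVEL_BITS = [
    (1, "zero pages"),
    (2, "non-private cache pages"),
    (4, "private cache pages"),
    (8, "user process data pages"),
    (16, "free pages"),
]


def excluded_pages(dump_level):
    """Return the page types that makedumpfile -d <dump_level> excludes."""
    if not 0 <= dump_level <= 31:
        raise ValueError("dump level must be between 0 and 31")
    return [name for bit, name in DUMP_LEVEL_BITS if dump_level & bit]
```

For example, a level of 17 (1 + 16) drops only zero pages and free pages and keeps cache and user data in the dump.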

See man makedumpfile for details.

Debugging kdump

Problems novices may encounter when using kdump:

kexec -p kernel_image does not succeed

Check whether the crash memory is allocated.

  • cat /sys/kernel/kexec_crash_size should not print zero.
  • cat /proc/iomem | grep "Crash kernel" should show an allocated range.
  • If it is not allocated, pass a proper crashkernel= parameter on the kernel command line.
  • If nothing turns up from these checks, pass the -d option to the kexec command and share the debug output with the kexec-tools mailing list.
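The first two checks in this list are easy to script. The sketch below takes the text of /sys/kernel/kexec_crash_size and /proc/iomem and reports which check fails; it is a convenience written for this article (the function name is hypothetical), not something kexec-tools provides.

```python
def diagnose_crash_reservation(crash_size_text, iomem_text):
    """Return a list of problems found with the crash kernel reservation."""
    problems = []
    if int(crash_size_text.strip()) == 0:
        problems.append("kexec_crash_size is 0")
    if "Crash kernel" not in iomem_text:
        problems.append("no 'Crash kernel' range in /proc/iomem")
    return problems


if __name__ == "__main__":
    # Reads the two files named in the checklist above.
    with open("/sys/kernel/kexec_crash_size") as size_file:
        size_text = size_file.read()
    with open("/proc/iomem") as iomem_file:
        iomem = iomem_file.read()
    for problem in diagnose_crash_reservation(size_text, iomem):
        print(problem)
```

An empty result means that memory is reserved, so the failure of kexec -p has to be investigated elsewhere, for example with the -d output.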

Nothing appears on the console after the last message from the first kernel (such as "bye")

  • Check whether kexec -l kernel_image followed by kexec -e works.
  • An architecture- or machine-specific option may be missing.
  • The SHA verification in purgatory may have failed. If your architecture does not support a console in purgatory, this can be difficult to debug.
  • The second kernel may have crashed early.
  • Pass the earlycon or earlyprintk option for your system on the second kernel's command line.
  • Share the dmesg logs of the first kernel and the capture kernel with the kexec-tools mailing list.

Resources

fedora-kexec-tools

  • Git repository: git://pkgs.fedoraproject.org/kexec-tools
  • Mailing list: [email protected]
  • Description: Spec files and scripts that provide user-friendly commands and services so that kexec-tools can be automated in different user scenarios.

kexec-tools

  • Git repository: git://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
  • Mailing list: [email protected]
  • Description: Uses the kernel system calls and provides the user command kexec.

Linux kernel

  • Git repository: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
  • Mailing list: [email protected]
  • Description: Implements the kexec_load(), kexec_file_load(), reboot() system calls and architecture-specific code such as machine_kexec() and machine_crash_shutdown().

Makedumpfile

  • Git repository: git://git.code.sf.net/p/makedumpfile/code
  • Mailing list: [email protected]
  • Description: Compresses dump files and filters unnecessary components out of them.

(Lead image: Penguin, Boot. Modified by Opensource.com. CC BY-SA 4.0)

About the Author:

Pratyush Anand - Pratyush is working with Red Hat as a Linux kernel expert. He is mainly responsible for several kexec/kdump issues faced by Red Hat products and upstream. He also handles other kernel debugging, tracing, and performance issues surrounding the ARM64 platform supported by Red Hat. In addition to the Linux kernel, he has contributed to the upstream kexec-tools and makedumpfile projects. He is an open source enthusiast and promotes FOSS by giving volunteer lectures at educational institutions.

Origin blog.csdn.net/yaxuan88521/article/details/133499624