Environment
- VMware® Workstation 16 Pro (Version: 16.1.2 build-17966106)
- ubuntu-22.04.2-desktop-amd64
Problem
- When I ran a server program under a load of millions of concurrent connections, it exited abnormally with the message: Segmentation fault (core dumped)
Approach
- First determine where core dump files are written and what size limit applies to them. Then load the core dump into a debugger such as GDB and use the stack trace to find and fix the code that caused the segmentation fault.
Cause Analysis
1. What is a segmentation fault?
- A segmentation fault (segfault) is a common program error that occurs when a program accesses memory it is not allowed to access, for example an address that is not mapped into its address space or a read-only page it tries to write to. The operating system responds by sending the program the SIGSEGV signal, whose default action is to terminate the process and, if enabled, write a core dump.
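A minimal sketch of how this looks from the shell that started the program. It does not use the server program from this article: a child shell simply sends SIGSEGV to itself as a stand-in for a crashing process, and the variable names status and signal are chosen only for this illustration.

```shell
#!/bin/sh
# Stand-in for a crashing program: a child shell sends SIGSEGV to itself.
status=0
sh -c 'kill -SEGV $$' || status=$?

# A shell reports a child killed by signal N as exit status 128 + N.
signal=$((status - 128))
echo "exit status: $status (signal $signal)"
```

On Linux, where SIGSEGV is signal 11, the reported exit status is 139, which is a quick way to recognize a segfault in scripts and logs.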
2. Possible causes of a segmentation fault
- Memory access errors: the most common cause. The program dereferences an invalid address or an uninitialized pointer, typically because of a coding error, a buffer overflow, or an out-of-bounds access.
- Invalid operations: the program jumps to or executes from a bad address, for example through a corrupted function pointer or return address. This can stem from faulty code logic, a miscompiled binary, or an architecture incompatibility. (A truly illegal instruction normally raises SIGILL rather than SIGSEGV.)
- Dynamic memory allocation problems: with dynamic allocation such as malloc or new, freeing the same memory twice or accessing memory after it has been freed can cause a segfault. A memory leak does not crash the program by itself, but once memory is exhausted, an unchecked allocation failure leads to a null pointer dereference.
- Stack overflow: if the program's stack grows beyond its allowed size, for example through infinite recursion or very large local variables, a segmentation fault occurs.
- Library or dependency problems: segmentation faults can also be caused by broken libraries, incompatible library versions, or missing dependencies, as well as by misusing a library or misconfiguring it.
- Hardware problems: although rare, hardware failures such as faulty memory can also cause a program to report a segmentation fault and generate a core dump.
3. Where is the core dumped?
- When a program segfaults, the operating system can generate a core dump file named core or core.<PID>, which contains the program's memory image and other state at the moment of the crash. By default this file is written to the current working directory of the crashed process.
- In my case, however, no core file appeared in the program's working directory. See the solution below.
Solution
1. Check the operating system's core dump settings
- Use the command ulimit -a to view the current core file size limit along with the other resource limits.
- Look for the "core file size" field in the output. A value of 0 means that core file generation is currently disabled. Raising this limit enables core dumps.
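The same check can be scripted. The following is a small sketch that reads only the core file size limit (ulimit -c) instead of scanning the full ulimit -a output; the variable name limit is chosen for this example.

```shell
#!/bin/sh
# Read the soft core file size limit of the current shell
# (the result is "0", a number, or "unlimited").
limit=$(ulimit -c)

if [ "$limit" = "0" ]; then
    echo "core dumps are disabled (core file size = 0)"
else
    echo "core dumps are enabled (core file size = $limit)"
fi
```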
2. Raise the "core file size" limit to enable core dump generation
- The core file size limit can be set to unlimited with the command ulimit -c unlimited (not recommended). A limit set with ulimit applies only to the current shell process and the processes it starts, that is, the current session. Once the terminal window is closed the setting is lost, so this change is not permanent.
- To change the core file size limit permanently at the system level, edit the file /etc/security/limits.conf and add or modify the following two lines:
- * soft core unlimited
- * hard core unlimited
- Restart the virtual machine so that the new limits are loaded (restart command: sudo reboot)
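To confirm which limits actually apply in a session, the soft and hard values can be printed separately. This is a sketch only; it also raises the soft limit to the hard limit for the current session, which an unprivileged user is allowed to do. The variable names soft and hard are chosen for this example.

```shell
#!/bin/sh
# Show the soft and hard core file size limits of the current session.
soft=$(ulimit -S -c)
hard=$(ulimit -H -c)
echo "soft limit: $soft, hard limit: $hard"

# Raising the soft limit up to the hard limit needs no special privileges.
if [ "$soft" != "$hard" ]; then
    ulimit -S -c "$hard"
    echo "soft limit raised to: $(ulimit -S -c)"
fi
```

If the two lines in /etc/security/limits.conf have taken effect, both values should read unlimited in a fresh login session.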
3. Test it
- Rerunning the server program is too time-consuming, so I wrote a small test example instead. The code is as follows:
- After running the test example, no core file was generated in the project directory.
4. Determine where the core file is written
- The Linux kernel has a parameter, kernel.core_pattern, that specifies the file name and path pattern used when a core dump is generated. It is exposed as the file /proc/sys/kernel/core_pattern and can be inspected and changed with the sysctl command.
- Use the command sysctl kernel.core_pattern to view the current pattern. It outputs the following line:
- kernel.core_pattern = |/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E
- What does this line mean?
- |/usr/share/apport/apport: the leading | tells the kernel to pipe the core dump to the standard input of the named program instead of writing a file. When a process is killed by SIGSEGV or a similar signal, the kernel starts this handler with the arguments that follow.
- %p: process ID of the dumped process.
- %s: number of the signal that caused the dump.
- %c: core file size soft limit of the crashing process.
- %d: dump mode (the value returned by prctl(PR_GET_DUMPABLE)).
- %P: process ID of the dumped process as seen in the initial PID namespace.
- %u: numeric real user ID of the dumped process.
- %g: numeric real group ID of the dumped process.
- %E: path of the executable, with slashes (/) replaced by exclamation marks (!).
- /usr/share/apport/apport is Ubuntu's crash reporting tool. It collects information about crashes and generates error reports, which is why no core file appears in the working directory.
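As an illustration of how the pattern decides where a dump ends up, the sketch below classifies a core_pattern value into the three cases: piped to a program, written to an absolute path, or written relative to the working directory of the crashing process. The function name classify_core_pattern is hypothetical and exists only for this example.

```shell
#!/bin/sh
# Classify a core_pattern value: piped to a program, absolute path,
# or relative to the working directory of the crashing process.
classify_core_pattern() {
    case "$1" in
        "|"*) echo "pipe" ;;
        /*)   echo "absolute" ;;
        *)    echo "relative" ;;
    esac
}

# Read the live value; fall back to the kernel's built-in default "core"
# if the file cannot be read.
pattern=$(cat /proc/sys/kernel/core_pattern 2>/dev/null) || pattern="core"
echo "core_pattern: $pattern"
echo "kind: $(classify_core_pattern "$pattern")"
```

With the apport line shown above, the result is pipe, so the kernel hands the dump to apport and writes nothing to the program's directory.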
5. Change where the core file is written
- The pattern can be changed with the command sudo sysctl -w kernel.core_pattern=<path_to_directory>/core. Make sure <path_to_directory> is an existing directory that the crashing process can write to. A pattern without a directory part is relative to the working directory of the crashing process, for example:
- sudo sysctl -w kernel.core_pattern=core
- Recompile and run the test program again. A core dump file named core.<PID> is now generated in the current directory. (The .<PID> suffix is appended when kernel.core_uses_pid is nonzero and the pattern contains no %p.)
- Note: a kernel.core_pattern value changed this way takes effect only for the running system and is not permanent. After a reboot it is reset to the default. See the last section, "Extra knowledge", for details.
6. Use the debugger gdb to load and analyze the core file
- Generate the core file: remember to compile with the gcc option -g so that the executable contains debugging information.
- Load the core file: start gdb from the command line with the executable and the core file as arguments.
- gdb <path to executable file> <path to core file>
- View the stack trace: inside gdb, use the bt command (or backtrace) to show the function call chain at the time of the crash.
- (gdb) bt
- Check variable values: use the print command.
- (gdb) print variable_name
- Jump to a specific frame: use the frame command to move between stack frames and inspect a particular one. Frames are numbered from 0 starting at the innermost frame, that is, the function that was executing when the program crashed. The outermost frame (usually main) has the highest number.
- (gdb) frame frame_number
- Analyze the cause: the backtrace and the variable values together help locate the cause of the crash. Typically frame 0, or the first frame that belongs to your own code, gives the location of the crash.
- In my test, the analysis showed that the pointer P was dereferenced without having been initialized.
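The interactive steps can also be run in one go. The sketch below wraps gdb's batch mode (-batch together with -ex) in a small helper. The function name core_backtrace and the file names in the example call are hypothetical, and gdb must be installed for the last step to run.

```shell
#!/bin/sh
# core_backtrace EXECUTABLE COREFILE
# Print the backtrace stored in COREFILE without an interactive gdb session.
core_backtrace() {
    exe=$1
    core=$2
    if [ ! -f "$exe" ] || [ ! -f "$core" ]; then
        echo "usage: core_backtrace <executable> <core file>" >&2
        return 1
    fi
    if ! command -v gdb >/dev/null 2>&1; then
        echo "gdb is not installed" >&2
        return 2
    fi
    # -batch makes gdb exit after running the commands given with -ex.
    gdb -batch -ex "bt" "$exe" "$core"
}

# Example call with hypothetical file names:
# core_backtrace ./test ./core.12345
```

Batch mode is convenient when many core files have to be triaged, because the backtrace can be redirected to a log file.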
Extra knowledge
1. Can write permission be added to /proc/sys/kernel/core_pattern?
- The answer is no. By default the owner (root) has read and write permission, while the group and other users have read permission only.
- Write permission for other users cannot be added to /proc/sys/kernel/core_pattern directly. /proc and the files under it belong to the procfs virtual file system, which provides access to kernel and process information. Their permissions and ownership are controlled by the kernel rather than stored on disk like those of regular files.
- Under /proc/sys, the kernel fixes the permissions of each file and does not allow users to change them. This protects the integrity and consistency of the information and prevents unauthorized changes to kernel and process state.
- Therefore the permissions of /proc/sys/kernel/core_pattern cannot be changed with chmod or by other regular means. Trying a command such as the following fails:
- sudo chmod +w /proc/sys/kernel/core_pattern
- You will get an error message like "Operation not permitted".
2. Why are changes to /proc/sys/kernel/core_pattern reset to the default after a system reboot?
- The files under /proc/sys/ are not stored on disk. The kernel creates them at startup, and their values come from kernel defaults or from settings applied during boot. On every restart they are therefore initialized again with the default values or with the values specified in configuration files.
3. How can changes to /proc/sys/kernel/core_pattern be made permanent? (This part has unresolved problems)
- Method 1: edit the file /etc/sysctl.conf. (This did not work well for me: after each system restart I still had to run sudo sysctl -p to apply the value to /proc/sys/kernel/core_pattern, possibly because the apport service sets the pattern again when it starts.)
- Add the core dump pattern to /etc/sysctl.conf as follows:
- kernel.core_pattern = core
- After saving and closing the file, reload the configuration so that the new pattern takes effect:
- sudo sysctl -p
- Method 2: create a startup script that sets the core dump pattern to the desired value at boot. Place the script in an appropriate location, such as the /etc/init.d/ directory, and register it to run at system startup. (Tested, but it had no effect.)
- Create the startup script file in the chosen directory:
- sudo vim /etc/init.d/my_startup_script.sh
- Write the startup script: add the following content, then save and exit.
- #!/bin/bash
- echo "core" > /proc/sys/kernel/core_pattern
- exit 0
- Make the script executable:
- sudo chmod +x /etc/init.d/my_startup_script.sh
- Register the script so that it runs when the system starts:
- sudo update-rc.d my_startup_script.sh defaults
- To unregister the script again, use the following command:
- sudo update-rc.d -f my_startup_script.sh remove
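When a setting does not seem to survive a reboot, it helps to compare the configured value with the one the kernel is actually using. The following sketch does this for /etc/sysctl.conf. The helper name configured_core_pattern is hypothetical, and the script only reads files, so it needs no root privileges.

```shell
#!/bin/sh
# Print the value of the last kernel.core_pattern assignment
# in a sysctl-style configuration file.
configured_core_pattern() {
    sed -n 's/^[[:space:]]*kernel\.core_pattern[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

conf=/etc/sysctl.conf
if [ -r "$conf" ]; then
    wanted=$(configured_core_pattern "$conf")
    running=$(cat /proc/sys/kernel/core_pattern 2>/dev/null || true)
    if [ -z "$wanted" ]; then
        echo "$conf does not set kernel.core_pattern"
    elif [ "$wanted" = "$running" ]; then
        echo "running value matches $conf: $running"
    else
        echo "mismatch: $conf says '$wanted', kernel uses '$running'"
    fi
fi
```

A mismatch right after boot suggests that something else wrote to core_pattern after the sysctl configuration was applied.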
Note
- Changing the content of /proc/sys/kernel/core_pattern is a sensitive operation that can affect the stability and security of the system. Always be careful and make sure you understand the impact of the changes you make.