Encountering a zombie process that cannot be killed




First, get to know the zombie process


1. How does a zombie process come into being?

A running process can create child processes and thereby becomes their parent;
when a child finishes executing, it exits, the kernel sends SIGCHLD to the parent, and the parent is expected to call wait / waitpid to read the child's exit status.
Once the status has been read, the child is removed from the process table;
until then its entry cannot be removed, and the child remains in the table as a zombie process.
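
The life cycle described above can be reproduced in a few lines. The following is a minimal sketch, assuming Linux with /proc mounted; the helper names proc_state and zombie_lifecycle are made up for this illustration. It forks a child that exits at once, looks at the child's state before anyone has read its exit status, and then reaps it with waitpid.

```python
import os


def proc_state(pid):
    """Return the one-letter state of `pid` from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ")", so split on the last ")" before reading the state field.
    return data.rsplit(")", 1)[1].split()[0]


def zombie_lifecycle():
    """Fork a child, observe it as a zombie, then reap it with waitpid."""
    pid = os.fork()
    if pid == 0:
        os._exit(7)  # child: terminate immediately with exit status 7

    # Block until the child has exited, but leave it unreaped (WNOWAIT).
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    state_before = proc_state(pid)

    # Reading the exit status is what removes the process-table entry.
    _, status = os.waitpid(pid, 0)
    exit_code = os.WEXITSTATUS(status)
    still_listed = os.path.exists(f"/proc/{pid}")
    return state_before, exit_code, still_listed


if __name__ == "__main__":
    print(zombie_lifecycle())
```

The first element of the result is the state seen while the exit status was still unread, the second is the status that waitpid returned, and the third tells whether the entry was still visible after the reap.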

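When you inspect processes with the ps command, zombies are shown with state Z and are tagged <defunct>.
A plain grep for Z would also match any unrelated line that contains the letter, so it is more reliable to filter on the state column:
ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'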


2. Is the zombie process harmful?

A zombie no longer runs and has already released almost all of its resources, but its entry still occupies a slot in the process table and holds on to its PID.
If a large number of zombies accumulate and persist for a long time, these entries waste kernel memory and, in the worst case, use up the available PIDs so that new processes can no longer be created.
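
To gauge how many such entries are lingering and which parent is failing to collect them, the process table can be scanned directly. This is a rough sketch, assuming Linux with /proc mounted; read_stat and zombies_by_parent are names invented for this example.

```python
import os
from collections import defaultdict


def read_stat(pid):
    """Return (comm, state, ppid) for `pid`, parsed from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # comm sits between the first "(" and the last ")"; state and ppid follow.
    head, tail = data.rsplit(")", 1)
    comm = head.split("(", 1)[1]
    fields = tail.split()
    return comm, fields[0], int(fields[1])


def zombies_by_parent():
    """Map each parent PID to the (pid, comm) zombies it has not reaped yet."""
    result = defaultdict(list)
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            comm, state, ppid = read_stat(int(entry))
        except OSError:
            continue  # the process vanished or is not readable
        if state == "Z":
            result[ppid].append((int(entry), comm))
    return dict(result)


if __name__ == "__main__":
    for ppid, zombies in sorted(zombies_by_parent().items()):
        print(f"parent {ppid}: {len(zombies)} zombie(s) {zombies}")
```

A parent that shows up with a steadily growing list is the process to investigate.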




Second, the conventional way to get rid of a zombie process


1. kill -9 the parent process

Since a zombie process is already dead (only its task_struct structure is retained) and a dead process cannot be
killed directly, a zombie is generally removed indirectly by killing its parent process.

Once the parent is killed, the zombie becomes an orphan and is adopted by the init process (PID 1); init then checks the children it has inherited and reaps those in the Z state. For example, for zombie 66046 whose parent is 12321:

ps -ef  | grep 66046
qtest    66046      12321  99 Apr07 ?        992-23:20:31 [kvsvr] <defunct>

kill -9  12321
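
The effect of killing the parent can be tried out safely on throwaway processes. The sketch below assumes Linux with /proc mounted; stat_fields and kill_parent_of_zombie are hypothetical names. It builds a parent that never waits for its child, confirms that the child is a zombie, sends SIGKILL to the parent, and then checks whether the zombie has already been reaped by its new parent or has at least been re-parented.

```python
import os
import signal
import time


def stat_fields(pid):
    """Return (state, ppid) of `pid` from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        fields = f.read().rsplit(")", 1)[1].split()
    return fields[0], int(fields[1])


def kill_parent_of_zombie():
    """Create parent -> zombie child, SIGKILL the parent, report the outcome."""
    r, w = os.pipe()
    parent = os.fork()
    if parent == 0:
        # Stand-in for the negligent parent: fork a child, never wait for it.
        try:
            os.close(r)
            child = os.fork()
            if child == 0:
                os._exit(0)
            os.write(w, str(child).encode())
            while True:
                time.sleep(60)
        finally:
            os._exit(1)

    os.close(w)
    zombie = int(os.read(r, 32))
    os.close(r)

    # Give the grandchild a moment to finish exiting.
    deadline = time.monotonic() + 5
    while stat_fields(zombie)[0] != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
    before = stat_fields(zombie)

    os.kill(parent, signal.SIGKILL)
    os.waitpid(parent, 0)  # reap the killed parent ourselves
    try:
        after = stat_fields(zombie)  # still listed: show its new parent
    except (FileNotFoundError, ProcessLookupError):
        after = None  # already reaped by whoever adopted it
    return parent, before, after


if __name__ == "__main__":
    print(kill_parent_of_zombie())
```

On a system whose PID 1 reaps adopted children, the last element is normally None or becomes None shortly afterwards; if it keeps showing state Z with a new parent PID, the adopting process is not reaping.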

2. Operational risk reminder

Before killing the parent process, assess the operational risk: check which other processes depend on the parent and whether they can tolerate it being killed.
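
One way to support that assessment is to list everything else that currently hangs off the parent. This is a minimal sketch, assuming Linux with /proc mounted; children_of is a name invented here, and it only covers direct children, not other kinds of dependency such as clients connected over a socket.

```python
import os


def children_of(ppid):
    """Return {pid: (comm, state)} for every process whose parent is `ppid`."""
    found = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(f"/proc/{entry}/stat") as f:
                data = f.read()
        except OSError:
            continue  # the process vanished or is not readable
        # Layout: pid (comm) state ppid ...; comm may contain spaces.
        head, tail = data.rsplit(")", 1)
        fields = tail.split()
        if int(fields[1]) == ppid:
            found[int(entry)] = (head.split("(", 1)[1], fields[0])
    return found


if __name__ == "__main__":
    # 12321 is the parent PID from the example above.
    for pid, (comm, state) in sorted(children_of(12321).items()):
        print(pid, comm, state)
```

Children in a state other than Z are still doing work and will be orphaned when the parent is killed.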




Third, the brute-force way to get rid of a zombie process


1. Restart

If the parent of the zombie process is already process 1 (PPID=1), for example:

ps -ef  | grep 66046
qtest    66046      1  99 Apr07 ?        992-23:20:31 [kvsvr] <defunct>

then kill no longer helps, and the only remedy is to restart the server.


2. Operational risk reminder

Restarting is simple and crude, but it is effective. Whether it is
an acceptable fix depends on whether the services running on the server can tolerate the impact of a restart.




Fourth, why can the PPID of a zombie process be 1?


1. In theory, a process adopted by init does not stay a zombie

Whenever a process ends, the system checks the processes currently running to
see whether any of them is a child of the process that just ended.
If so, the init process adopts that child and becomes its new parent, which ensures that every process always has a parent.


Generally speaking,
once the init process has adopted a process in the Z state, it calls wait to reap it.
Therefore, in theory, no process adopted by init remains a zombie.
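
The reaping that init is expected to perform boils down to a loop around waitpid. The following is only an illustrative sketch, not the code of any actual init implementation, and reap_children is a hypothetical name; a real reaper would typically run such a loop whenever SIGCHLD arrives.

```python
import os


def reap_children():
    """Reap every child that has already exited; return {pid: wait_status}."""
    reaped = {}
    while True:
        try:
            # -1 means "any child"; WNOHANG returns at once if none has exited.
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none is waiting to be reaped
        reaped[pid] = status
    return reaped
```

Note that the loop can only collect children that waitpid is willing to hand over, which matters for the multi-threaded case discussed next.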


So why are there zombie processes with PPID 1?



2. One possible explanation

Going back to where every zombie originates, namely process exit, here is an attempt at one possible explanation:

The kernel function do_exit is called when a process ends. It contains two key pieces of logic:

do_exit()
  ->exit_notify()
     -> do_notify_parent()

2.1 As a parent process: find a new parent for your own child processes (if there are any)

If the exiting task belongs to a multi-threaded process, its children can be handed over to a sibling thread in the same thread group;
if there is no such thread, they are handed over to the init process. In
short, init is the fallback of last resort.


2.2 As a child process: notify your parent process so that it releases your task_struct

For single-threaded processes, this step is relatively simple;
for multi-threaded processes, it is slightly more complicated:
only the main thread of a thread group is eligible to notify the parent process,
so when any other thread of the group terminates, the parent is not notified. Such a thread does not even need to keep resources around in order to enter the zombie state; release_task is called directly to free all of its resources.

Since the parent process only recognizes the main thread of its child, if the main thread terminates while other threads in the thread group are still running, the parent is not notified to release the main thread's task_struct. The notification, and with it the release, happens only when the last thread in the group exits.

Therefore, in user space you can call pthread_exit to let the main thread exit first, but in the kernel the main thread's task_struct has to be retained because other threads in the thread group are still running. In this case the main thread enters the Z (zombie) state, and that does not change even if init adopts it.
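
This behaviour can be observed with a small experiment. The sketch below is hedged in several ways: it assumes Linux with /proc mounted, it uses the raw exit system call through ctypes as a stand-in for pthread_exit (both end only the calling thread), and the syscall numbers in SYS_EXIT are assumptions that cover only x86_64 and aarch64. The names proc_state and leader_zombie_demo are invented for this example.

```python
import ctypes
import os
import platform
import threading
import time

# Assumption: raw syscall numbers for exit(2), which ends only the calling
# thread (unlike exit_group). Extend the table for other architectures.
SYS_EXIT = {"x86_64": 60, "aarch64": 93}


def proc_state(pid):
    """Return the one-letter state of `pid` from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        return f.read().rsplit(")", 1)[1].split()[0]


def leader_zombie_demo(worker_seconds=1.0):
    """Show a main thread in state Z that cannot be reaped while a sibling runs."""
    nr = SYS_EXIT[platform.machine()]
    libc = ctypes.CDLL(None)
    libc.syscall.argtypes = [ctypes.c_long, ctypes.c_long]
    libc.syscall.restype = ctypes.c_long

    pid = os.fork()
    if pid == 0:
        try:
            def worker():
                time.sleep(worker_seconds)
                os._exit(0)  # exit_group: ends the whole thread group

            threading.Thread(target=worker).start()
            libc.syscall(nr, 0)  # main thread exits alone; does not return
        finally:
            os._exit(1)  # only reached if something above failed

    # Wait until the main thread of the child has become a zombie.
    deadline = time.monotonic() + 5
    while proc_state(pid) != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
    state = proc_state(pid)
    # Threads of the group other than the (dead) main thread.
    live = [int(t) for t in os.listdir(f"/proc/{pid}/task") if int(t) != pid]
    # A non-blocking wait reports that there is nothing to reap yet.
    early = os.waitpid(pid, os.WNOHANG)
    # A blocking wait returns only after the worker has ended the group.
    os.waitpid(pid, 0)
    return state, live, early


if __name__ == "__main__":
    print(leader_zombie_demo())
```

While the worker thread is alive, the process shows up as a zombie, yet waitpid with WNOHANG has nothing to collect; a parent, including init, that calls wait in this window cannot remove the entry.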


This would explain the phenomenon of a "zombie process" whose PPID is 1.

Origin blog.csdn.net/weixin_44648216/article/details/111877287