Why does the parent process in Linux need to know the cause of death of the child process?

When the white-haired send off the black-haired

It is common knowledge that in Linux "the white-haired send off the black-haired": the child process dies first, and the parent process waits for it with wait() and reaps the zombie it leaves behind. In doing so, the parent can also learn the child's cause of death.

As the Master said: "Talk is cheap. Show me the code." So let's look at some actual code:

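A minimal sketch of the kind of program being described, with the child waiting via pause() and the parent waiting via waitpid() and inspecting status (not necessarily the author's exact listing):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
    int status;
    pid_t pid = fork();

    if (pid == 0) {
        /* child: wait here until a signal arrives */
        pause();
        _exit(0);
    }

    printf("child process id: %d\n", (int)pid);

    /* parent: block until the child dies; the cause of death lands in status */
    waitpid(pid, &status, 0);

    if (WIFEXITED(status))
        printf("child process exits, status=%d\n", WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        printf("child process is killed by signal %d\n", WTERMSIG(status));

    return 0;
}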

In this code, the child process waits for a signal with pause(), while the parent process waits for the child to terminate with waitpid(). The status argument is an output parameter through which the parent learns the child's cause of death.

For example, we now run the above program:

./a.out

child process id: 3320

Then we kill child process 3320 with signal 2 (SIGINT):

kill -2 3320

The parent's waitpid() returns with the cause of death recorded in status, and the parent prints:

child process is killed by signal 2

If we remove the pause() in the child process and have it call _exit(1) directly instead:

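The change amounts to the child branch becoming something like:

if (pid == 0) {
    /* child: exit immediately with code 1 instead of waiting for a signal */
    _exit(1);
}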

After the parent process detects the death of the child process, it can print its exit status:

$ ./a.out

child process id: 3362

child process exits, status=1

As you can see, the parent process knows both that the child died and why.


This can also be seen in the kernel source code: in wait_task_zombie(), the parent reads the child's exit_code out of the zombie task and assembles it into the status word that waitpid() returns to userspace.
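Without reproducing the kernel function here: the status word assembled for waitpid() follows the conventional Linux encoding that the <sys/wait.h> macros decode. For illustration, hand-written equivalents of those macros (not the kernel code itself) behave roughly like this:

/* Conventional layout of the status word filled in by waitpid():
 *   normal exit:      (exit code & 0xff) << 8, low 7 bits are 0
 *   killed by signal: signal number in the low 7 bits
 * These mirror the standard WIFEXITED()/WEXITSTATUS()/WIFSIGNALED()/WTERMSIG(). */
#define MY_WIFEXITED(s)    (((s) & 0x7f) == 0)
#define MY_WEXITSTATUS(s)  (((s) >> 8) & 0xff)
#define MY_WIFSIGNALED(s)  (((s) & 0x7f) != 0 && ((s) & 0x7f) != 0x7f)
#define MY_WTERMSIG(s)     ((s) & 0x7f)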


Everything happens for a reason

So why does the parent process need to know that the child has died at all? And why must it go to the trouble of finding out the cause of death?

The first question is easy to answer. Suppose the init process starts an httpd service so customers can visit our website, and httpd crashes in the middle of the night. First, the company's administrator probably won't even notice that httpd has died; second, even if he does notice, he can hardly drive to the office at that hour just to retype the httpd command. This recovery should therefore be handled automatically by some mechanism in Linux: if init knows that httpd has died, it can restart a new httpd process on its own.

The second question is a bit more involved, and is best answered with a real init implementation. Let's take systemd as an example: systemd is the init system used by today's mainstream Linux distributions. For example, on my Ubuntu 18.10:

$ ls -l /sbin/init

… /sbin/init -> /lib/systemd/systemd

/sbin/init is a symbolic link to systemd.

In systemd, if we want to add a background service that starts at boot, we can add a service file in the /lib/systemd/system/ directory. For example, here I added a very simple service file:

/lib/systemd/system/simple-server.service

Its content is as follows:

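A minimal unit file consistent with this description might look like the following (the ExecStart path here is an assumption):

[Unit]
Description=a very simple hello-world server

[Service]
ExecStart=/usr/local/bin/simple-server

[Install]
WantedBy=multi-user.target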

simple-server is an extremely simple hello-world service I wrote:

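Something as small as this would do (a sketch, assuming it just prints in a loop so the service keeps running):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* print forever so the service has something to do and stays running */
    for (;;) {
        printf("hello world\n");
        fflush(stdout);
        sleep(1);
    }
}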

We enable this service in Ubuntu:

$ sudo systemctl enable simple-server

Created symlink /etc/systemd/system/multi-user.target.wants/simple-server.service → /lib/systemd/system/simple-server.service.

Then start the service right away:

$ sudo systemctl start simple-server

Next, we check the status and find that it is active:

$ systemctl status simple-server

At this point we can see the simple-server process in the system; it is a child of the top-level init process, systemd (PID 1).


At this time, we kill the simple-server process:

$ sudo killall simple-server

Check the status again:

$ systemctl status simple-server

At this time, we see that systemd has detected that the process corresponding to simple-server has been killed by the TERM signal, and the service status is inactive.

We found that the simple-server process no longer exists:

$ pidof simple-server

pidof finds nothing!

pidof finds nothing!

pidof finds nothing!

Didn't I just say that once init detects a dead service, it "can" restart it automatically, like init restarting httpd? So why, after I killed simple-server, didn't systemd restart it?

Note that I said "can", not "must".

Adapting to the circumstances

In fact, in systemd, whether a service should be restarted after it dies, and under what circumstances, can be customized by the user.

This is specified with the Restart= field in the [Service] section of the .service file, which controls under what circumstances the dead child process is restarted. For example, we can add a line to the .service file:

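With that line added, the [Service] section would read roughly as follows (path assumed as before):

[Service]
ExecStart=/usr/local/bin/simple-server
Restart=always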

Restart=always means that no matter why simple-server dies, systemd will restart it unconditionally.

There is a table in the systemd documentation (in the systemd.service man page, under Restart=) that spells out, for each setting of Restart= (no, always, on-success, on-failure, on-abnormal, on-abort, on-watchdog), whether systemd restarts the service. To make that decision, systemd distinguishes several different causes of death: a clean exit code or signal, an unclean exit code, an unclean signal (including a core dump), a timeout, and the watchdog firing. The documentation on whether the service is restarted reads:

If set to no (the default), the service will not be restarted. If set to on-success, it will be restarted only when the service process exits cleanly. In this context, a clean exit means an exit code of 0, or one of the signals SIGHUP, SIGINT, SIGTERM or SIGPIPE, and additionally, exit statuses and signals specified in SuccessExitStatus=. If set to on-failure, the service will be restarted when the process exits with a non-zero exit code, is terminated by a signal (including on core dump, but excluding the aforementioned four signals), when an operation (such as service reload) times out, and when the configured watchdog timeout is triggered. If set to on-abnormal, the service will be restarted when the process is terminated by a signal (including on core dump, excluding the aforementioned four signals), when an operation times out, or when the watchdog timeout is triggered. If set to on-abort, the service will be restarted only if the service process exits due to an uncaught signal not specified as a clean exit status. If set to on-watchdog, the service will be restarted only if the watchdog timeout for the service expires. If set to always, the service will be restarted regardless of whether it exited cleanly or not, got terminated abnormally by a signal, or hit a timeout.

As the parent process, systemd can decide on further countermeasures based on the cause of the child's death. For example, if we set Restart=on-failure, the service is restarted when the process dies abnormally: when it exits with a non-zero code, or when it is killed by a signal that produces a core dump, such as a segmentation fault.
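For instance, restarting only on abnormal death would look roughly like this in the unit file (same assumed path):

[Service]
ExecStart=/usr/local/bin/simple-server
Restart=on-failure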

This can be tailored to the nature of the actual service. Take a oneshot service, for example: a service that only needs to run once at boot, such as a process that applies some setting or performs a filesystem check and then exits on its own. For such a service (sketched after this list) we should not set:

Restart=always

or

Restart=on-success

because once this oneshot service has run successfully, there is no need to start it again.
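A sketch of such a oneshot unit (names and path are hypothetical):

[Unit]
Description=one-off setup task run at boot

[Service]
Type=oneshot
ExecStart=/usr/local/bin/do-initial-setup
# No Restart=always or Restart=on-success here: once the task has
# completed successfully there is nothing to run again.

[Install]
WantedBy=multi-user.target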



Origin blog.csdn.net/qq_40989769/article/details/110819782