The white-haired send off the black-haired
It is common knowledge that in Linux, "the white-haired send off the black-haired": when a child process dies, the parent process reaps the zombie it leaves behind via wait(), cleaning it up. In the process, the parent can also learn the child's cause of death.
As the Master said: "Talk is cheap. Show me the code." So let's look at some actual code:
In the code above, the child process waits for a signal via pause(), while the parent process waits for the child to terminate via waitpid(). The status parameter is an output parameter from which the cause of the child's death can be read.
For example, we now run the above program:
./a.out
child process id: 3320
Then send signal 2 (SIGINT) to the child process 3320:
kill -2 3320
The parent's waitpid() returns, the cause of death is recovered from status, and the parent prints:
child process is killed by signal 2
If instead we remove the pause() in the child process and have it exit directly with _exit(1):
After the parent process detects the death of the child process, it can print its exit status:
$ ./a.out
child process id: 3362
child process exits, status=1
As you can see, the parent process knows full well both that the child died and why.
This can also be seen from the kernel source code:
In wait_task_zombie(), the parent reads the child's exit_code out of the zombie task and assembles it into the status value that waitpid() reports back to user space.
Everything happens for a reason
So why does the parent process need to know that the child died at all? And why must it go to the trouble of learning the cause of death?
The first question is easy to answer. Suppose the init process starts an httpd service that serves our website, and httpd dies in the middle of the night. First, the company's administrator has no way of knowing right away that httpd is dead; second, even if they did know, nobody wants to drive to the office at 3 a.m. just to retype the httpd command. So this should be handled automatically by some mechanism in Linux: if init knows that httpd died, it can simply start a new httpd process itself.
The second question is a bit more complicated; to answer it we need a real init project as an example. Here we take systemd, the init implementation used by today's mainstream Linux distributions. For example, on my Ubuntu 18.10:
$ ls -l /sbin/init
… /sbin/init -> /lib/systemd/systemd
/sbin/init is a symbolic link to systemd.
In systemd, if we want to add a background service that starts at boot, we can add a service file in the /lib/systemd/system/ directory. For example, here I added a very simple service file:
/lib/systemd/system/simple-server.service
Its content is as follows:
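The original file content was lost in transcription; a plausible minimal version follows (the binary path /usr/local/bin/simple-server is an assumption for this sketch):

```ini
[Unit]
Description=an extremely simple hello-world service

[Service]
# the binary path is an assumption for this sketch
ExecStart=/usr/local/bin/simple-server

[Install]
WantedBy=multi-user.target
```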
simple-server is an extremely simple print hello world service I wrote:
We enable this service in Ubuntu:
$ sudo systemctl enable simple-server
Created symlink
/etc/systemd/system/multi-user.target.wants/simple-server.service
→ /lib/systemd/system/simple-server.service.
Start this service on the spot:
$ sudo systemctl start simple-server
Next, we check the status and find that it is active:
At this point we can see the simple-server process in the system; it is a child of the top-level systemd, i.e. the init process (PID 1):
Now we kill the simple-server process:
$ sudo killall simple-server
Check the status again:
This time we see that systemd has detected that the process behind simple-server was killed by the TERM signal, and the service's status is now inactive.
We find that the simple-server process no longer exists:
pidof finds nothing!!!
pidof finds nothing!!!
pidof finds nothing!!!
But didn't I just say that once init detects a service has died, it "can" restart the service automatically, like init restarting httpd? So now that I've killed simple-server, why didn't systemd restart it?
Note that I said "can", not "must".
Adapt to local conditions
In fact, in systemd, whether a service should be restarted after it dies, and under which circumstances, can be configured by the user.
Via the Restart field in the [Service] section of the .service file, we can specify under which circumstances the dead child process should be restarted. For example, we can add a line to the .service file:
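A sketch of the amended [Service] section (the ExecStart path is an assumption carried over from the earlier sketch):

```ini
[Service]
ExecStart=/usr/local/bin/simple-server
Restart=always
```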
Restart=always means that no matter why simple-server dies, systemd restarts it unconditionally.
There is a table in the systemd documentation that explains in detail whether systemd restarts the service when Restart is set to no, always, on-success, on-failure, and so on. systemd actually distinguishes five different causes of death. The documentation reads:
Whether the service is restarted (from systemd.service(5)):
- no (the default): the service will not be restarted.
- on-success: restarted only when the service process exits cleanly. In this context, a clean exit means an exit code of 0, or one of the signals SIGHUP, SIGINT, SIGTERM or SIGPIPE, and additionally, exit statuses and signals specified in SuccessExitStatus=.
- on-failure: restarted when the process exits with a non-zero exit code, is terminated by a signal (including on core dump, but excluding the aforementioned four signals), when an operation (such as service reload) times out, and when the configured watchdog timeout is triggered.
- on-abnormal: restarted when the process is terminated by a signal (including on core dump, excluding the aforementioned four signals), when an operation times out, or when the watchdog timeout is triggered.
- on-abort: restarted only if the service process exits due to an uncaught signal not specified as a clean exit status.
- on-watchdog: restarted only if the watchdog timeout for the service expires.
- always: restarted regardless of whether it exited cleanly or not, got terminated abnormally by a signal, or hit a timeout.
As the parent process, systemd can decide its countermeasures based on the child's cause of death. For example, setting Restart=on-failure means the service is restarted only when it dies abnormally, e.g. its exit code is non-zero, or it was killed by a core-dumping signal such as the one raised by a segmentation fault.
This can be tailored to the nature of the actual service. Take a oneshot service, one that only needs to run once at boot, such as a process that applies some setting or completes a filesystem check and exits when done. For such a service we should not set:
Restart=always
or
Restart=on-success
Because once the oneshot service has run successfully, there is no need to start it again.
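For contrast, a hedged sketch of what such a oneshot unit might look like (the script path is hypothetical):

```ini
[Unit]
Description=a one-time boot task

[Service]
Type=oneshot
# hypothetical path to the one-time setup script
ExecStart=/usr/local/bin/one-time-setup
# keep the unit reported as "active" after the process has exited
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
```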