Linux C System Programming (07): Process Management and Process Control

1 Process identifier

1.1 Actual users / user groups; effective users / user groups

A Linux / Unix process is associated with several user IDs and user group IDs, including:

Actual user ID and actual user group ID (called the real user ID and real group ID in the man pages): these identify who I am (reputedly a question that has stumped many a philosopher). They are the uid and gid of the logged-in user. For example, I log in to my Linux system as taskiller, so the actual user ID of every command I run is the uid of taskiller, and the actual user group ID is the gid of taskiller (you can view both with the id command).
Effective user ID and effective user group ID: the kernel uses these to determine the process's access to resources. In general, the effective user ID is equal to the actual user ID, and the effective user group ID is equal to the actual user group ID. When the set-user-ID (SUID) bit is set on the program file being executed, the effective user ID is equal to the uid of the file owner, not the actual user ID; similarly, if the set-group-ID (SGID) bit is set, the effective user group ID is equal to the gid of the file's group, not the actual user group ID.

1.2 Process ID

The process ID is a basic attribute of a process, much like a person's ID number. A process ID identifies exactly one process, while one program can correspond to several process IDs (one for each running instance).

1.3 Important ID values of a process

Each process has 6 important ID values: process ID, parent process ID, effective user ID, effective user group ID, actual user ID, and actual user group ID. These 6 IDs are stored in kernel data structures, and user programs occasionally need them. Under Linux, use the getpid and getppid functions to get the process ID and parent process ID of the process. The function prototypes are as follows:

#include <sys/types.h>
#include <unistd.h>
pid_t getpid(void);
pid_t getppid(void);

See the Linux function reference manual for details. Under Linux, use the getuid and geteuid functions to get the actual user ID and effective user ID of the process. The function prototypes are as follows:

#include <unistd.h>
#include <sys/types.h>
uid_t getuid(void);
uid_t geteuid(void);

See the Linux function reference manual for details. Under Linux, use the getgid and getegid functions to get the actual user group ID and effective user group ID of the process. The function prototypes are as follows:

#include <unistd.h>
#include <sys/types.h>
gid_t getgid(void);
gid_t getegid(void);

See the Linux function reference manual for details. Notes:

A process cannot change its own process ID or parent process ID; the other 4 IDs can be changed under appropriate circumstances.
For an ordinary process, the actual user ID and the effective user ID are the same; they differ only in special cases, such as while running a program with the SUID bit set.

2 Process operation

2.1 The fork function creates a process

The process is the basic unit of execution in the system. Linux allows any user process to create a child process. Once created, the child exists in the system independently of its parent: it is scheduled by the system, can be allocated system resources, is visible to the system, and has the same rights as the parent process. (Note: under Linux, every process except process 0 is created by another process.)
Under Linux, use the fork function to create a new process. The prototype of the fork function:

#include <unistd.h>
pid_t fork(void);
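On success the function returns twice: 0 in the child process, and a positive value (the child's process ID) in the parent process. On failure it returns -1.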

See the Linux function reference manual for details. Notes:

In general there is no guarantee whether the parent or the child runs first after fork; if the order matters, the two processes must synchronize explicitly.
After fork, the parent and the child share the code segment, while the child gets its own copy of the other resources such as the data segment and the stack.
The child does not inherit the parent's file locks, pending alarms, or pending signals.
Current Linux kernels do not actually copy these resources at the time of the fork. Parent and child initially share the same memory; only when one of them modifies some content does the kernel allocate new memory for that process, copy the affected content, and let the operation continue. This is the copy-on-write technique.

The fork function fails in the following situations:

The number of processes in the system exceeds the system-wide limit.
The user calling fork already has too many processes, so the per-user limit is exceeded.

2.2 vfork creates a process

Linux also provides the vfork function, which is similar to fork. The differences between them are:

With fork, the child process gets its own copy of the parent's data segment; with vfork, the child shares the data segment with the parent.
With fork, the execution order of the parent and child is not defined; vfork guarantees that the child runs first. Until the child calls exec or exit, it shares its data with the parent, and the parent is suspended; the parent can be scheduled again only after the child has called exec or exit.
Because the parent does not run until the child calls exec or exit, a child that depends on some further action of the parent before calling one of these two functions causes a deadlock.

Prototype of vfork function:

#include <sys/types.h>
#include <unistd.h>
pid_t vfork(void);
On success the function returns twice: 0 in the child process, and a positive value (the child's process ID) in the parent process. On failure it returns -1.

See the Linux function reference manual for details. Note: it is generally advised not to call vfork in functions other than main. The child runs before the parent and on the parent's stack, so if the child returns from the function that called vfork, the stack frame is overwritten and the parent is likely to hit a segmentation fault when it resumes. In other words, whatever the child does has a direct effect on the parent.

2.3 Exit a process

Exiting a process under linux generally uses the exit function. The prototype of the exit function:

#include <stdlib.h>
void exit(int status);
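Parameter status: the exit status of the process, an integer. The exit status can be checked in the shell. For a normal exit the argument to exit is 0; for an abnormal exit it is nonzero.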

See the Linux function reference manual for details. You can pass the errno variable as the argument to exit, so that the reason for the failure can be checked in the shell after the program has exited (for example with echo $?). Note that only the low 8 bits of the status reach the shell.

The exit function is a wrapper around the _exit system call. The main difference between the two is that exit first does some cleanup work in user space, such as flushing the stdio buffers so that buffered output reaches the files, and only then enters the kernel to release the address space of the process. The _exit function enters the kernel directly to release the address space, so anything still held in user-space buffers is lost.

2.4 Set Process Owner

Each process has two user IDs, the actual user ID and the effective user ID. Use setuid under linux to change the actual user ID and effective user ID of a process. Prototype of setuid function:

#include <sys/types.h>
#include <unistd.h>
int setuid(uid_t uid);
Parameter uid: the new user ID. The function returns 0 on success and -1 on failure.

See the Linux function reference manual for details. Only two kinds of users can change the user IDs of a process with setuid: the root user (superuser), and a user whose ID equals the actual user ID of the process. For root, setuid changes both the actual and the effective user ID; for an ordinary user, it can only set the effective user ID back to the actual user ID (or to the saved set-user-ID).
The typical pattern is: when a process needs a certain authority, it sets its effective user ID to a user ID that has it; when the authority is no longer needed, the process sets its effective user ID back, giving the authority up again. The seteuid(uid_t euid) function changes only the effective user ID. The same family also contains the setgid and setegid functions, which work like setuid and seteuid respectively, except that they affect the group ID.

2.5 Debugging multiple processes

There are two ways to debug multi-process programs with gdb:
@ 1 Set the fork follow mode. The setting is made as follows:
set follow-fork-mode [parent | child]
gdb follows the selected process across the fork; the other process is not affected and runs freely. To debug the child, select child and then set a breakpoint in the child's code.
To control whether gdb detaches from the other process after the fork function, use the command:

    set detach-on-fork [on | off]
    # on (the default): gdb detaches from the process that follow-fork-mode did not select, and that process runs on independently.
    # off: gdb keeps control of both the parent process and the child process.

@ 2 Use the attach command. The attach command in gdb debugs a program that is already running. After the process has called fork, you can attach to the child process. This requires knowing the process ID of the child, and the child has to wait until the debugger has attached, so some auxiliary code must be added to the program when using this method.

3 Executing programs

3.1 exec family functions

Under Linux, use the exec functions to execute a new program. An exec function looks up the file at the specified path and loads its content into the address space of the process that called exec, replacing the content of the original process. The process keeps its identity and attributes such as the process ID; what is replaced is the program it runs, that is, its code segment, data segment, heap and stack. (Note: exec does not create a new process. The content of the process changes, but the process ID does not, and it is still the same process.)
There are six functions in the exec family. Their names differ in the following suffixes:

The suffix l stands for list: the command-line arguments of the new program are passed as a list of separate function arguments, terminated by NULL; the number of arguments is not limited.
The suffix v stands for vector: the command-line arguments are passed as an array of strings (char *argv[]) whose last element is NULL.
The suffix e stands for environment: an array of environment variables is passed to the new program. It is an array of strings terminated by NULL, one environment variable per element.
The suffix p stands for PATH: the first parameter need not be a complete path name, only a program name. The directories listed in the PATH environment variable are searched to build the complete path.

The prototype of the exec family function is as follows:

#include <unistd.h>
extern char **environ;
int execl(const char *path, const char *arg, ...);
int execlp(const char *file, const char *arg, ...);
int execle(const char *path, const char *arg, ..., char * const envp[]);
int execv(const char *path, char *const argv[]);
int execvp(const char *file, char *const argv[]);
int execve(const char *filename, char *const argv[], char *const envp[]);

See the Linux function reference manual for details.

3.2 Execute shell commands in the program
Use the system function to call shell commands under Linux. The prototype of the system function is as follows:

#include <stdlib.h>
int system(const char *command);

See the Linux function reference manual for details. The parameter command is the command to be executed. The return value is more complicated: system is built on the three system calls fork, exec, and waitpid, so the return value depends on how each of these fares:

If fork or waitpid fails, system returns -1.
If exec fails in the child, so that the shell cannot be executed, the return value is the same as if the shell had exited with status 127.
If all three calls succeed, system returns the termination status of the shell that ran the command, in the format used by waitpid.
If command is NULL, system returns a nonzero value when a shell is available and 0 when it is not; this can be used to test whether the system supports the system function.

Whether the system function is the right tool depends on the requirements. Compared with calling fork and exec yourself, system has the following advantages:

The system function includes error handling.
The system function includes signal handling.
The system function waits for the child itself, which ensures that no zombie processes are left behind.

4 Relational operations

The parent process can obtain the state in which a child process exited. The operation of obtaining this exit information is called a relational operation. For each terminated child process the Linux kernel keeps a certain amount of information, including the process ID, the termination status, and resource usage statistics. The parent process retrieves this information and handles it accordingly.

4.1 Two functions for waiting for a process to exit

Under Linux, use the wait and waitpid functions to obtain the termination status of a child process. The prototypes of the wait and waitpid functions:

#include <sys/types.h>
#include <sys/wait.h>
pid_t wait(int *status);
pid_t waitpid(pid_t pid, int *status, int options);

See the Linux function reference manual for details.
Compared with the wait function, waitpid differs in the following three points:

The waitpid function can wait for one specific child process.
The waitpid function can check on a process without blocking (option WNOHANG).
The waitpid function supports job control (option WUNTRACED).

4.2 Zombie process

When a child process exits, its exit status information is kept in the kernel. If the parent process has not yet called wait to collect it, the process ID of the child also stays in the system process list. A process in this state is called a zombie process. Zombie processes are harmful to the system: they occupy system resources (most importantly a slot in the process table) but do nothing. To create a zombie process, call fork and have the parent not call wait after the child exits. A zombie process is shown with state Z when viewed (for example with ps).

Ways to deal with zombie processes:

The parent process waits for the child to end with functions such as wait and waitpid. The drawback is that the parent blocks while waiting.
If the parent process is busy, it can use the signal function to install a handler for SIGCHLD. The parent receives this signal when a child ends, and the handler can call wait to reap the child.
If the parent process does not care when the child ends, it can call signal(SIGCHLD, SIG_IGN) to tell the kernel that it is not interested. When the child ends, the kernel then reaps it directly and does not send the signal to the parent.
Fork twice: the parent forks a child and goes on with its work; the child forks a grandchild and exits at once. The grandchild is then adopted by init, which reaps it when it ends. The parent still has to reap the child itself.

4.3 Output process statistics

The wait3 and wait4 functions are basically equivalent to wait and waitpid respectively; the difference is that wait3 and wait4 also return more detailed information. Most of it is resource usage data kept by the kernel, which is returned to the caller in a structure. The prototypes of the wait3 and wait4 functions are as follows:

#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
pid_t wait3(int *status, int options, struct rusage *rusage);
pid_t wait4(pid_t pid, int *status, int options, struct rusage *rusage);

See the Linux function reference manual for details. The detailed information is returned through a value-result parameter of type struct rusage.

AGS-wangdsh
