Linux System Programming (3): Process

1. Process related concepts

1.1 Programs and processes

  • A program is a compiled binary file stored on disk. It does not occupy system resources (CPU, memory, open files, devices, locks...)
    • Program → script (on paper)
  • A process is a program in execution: it is loaded into memory and occupies system resources (running a program creates a process)
    • Process → play (stage, actors, lighting, props...)

The same script (program) can be performed on several stages (processes) at the same time. Likewise, the same program can be loaded into different processes without their affecting each other. For example, open two terminals at the same time: each runs its own bash, but the two process IDs differ.

1.2 Concurrency

  • Concurrency: within a given period of time, several processes are somewhere between having started and having run to completion. On a single CPU, however, only one process is running at any instant.
    • For example, we can listen to music, chat, and browse the web on a computer at once. If each of these is a separate process, why can they all run at the same time? Because of concurrency.
  • Time-sharing multiplexing of the CPU
    • Processes obtain the CPU in turn. The CPU is not given to one process until it finishes (clock interrupts guarantee this); instead, each process receives a short time slice on every turn.

1.3 Uniprogramming and multiprogramming

1.3.1 Uniprogramming
  • All processes are executed one by one. If A blocks, B can only wait, even if the CPU is idle. Blocking is unavoidable during human-computer interaction, so this model uses system resources very poorly. Most such systems disappeared early in the history of computing.
    • For example: Microsoft's DOS
1.3.2 Multiprogramming
  • Several independent programs reside in memory at the same time and run interleaved under the control of a supervisor program. Multiprogramming depends on hardware support.
  • Clock interrupt: this is the foundation of the multiprogramming model. Under concurrency, no process gives up the CPU voluntarily while it is executing, so the system needs a way to force it to. Clock interrupts are generated by hardware and a process cannot resist them. The interrupt handler in the operating system is responsible for scheduling which program runs next.
  • In the multiprogramming model, multiple processes take turns using the CPU (time-sharing multiplexing of CPU resources). Today's common CPUs work at the nanosecond scale and can execute on the order of 1 billion instructions per second, while human perception works at the millisecond scale, so the processes appear to run simultaneously.

1.4 CPU and MMU

  • Storage media
    • The further down the storage pyramid, the larger the capacity but the slower the access.
    • Reading a hard disk is a mechanical operation, while reading memory is purely electrical, so memory is much faster than disk
    • The cache sits between memory and the registers
    • A general-purpose register holds 4 bytes on a 32-bit system

  • MMU (memory management unit)
    • Maintains the mapping between virtual memory and physical memory

1.5 Process control block PCB

  • Each process has a process control block in the kernel to maintain process-related information. The process control block of the Linux kernel is the task_struct structure.
  • You can view the struct task_struct {} structure definition in the /usr/src/linux-headers-5.4.0-152-generic/include/linux/sched.h file. There are many internal members. Just focus on the following parts.
    • process id
      • Each process in the system has a unique ID, represented in C by the type pid_t; it is a non-negative integer.
      • View all processes: ps aux; look for a specific one: ps aux | grep xxx
    • process status
    • Some CPU registers that need to be saved and restored when switching processes
    • Information describing the virtual address space
    • Information describing the control terminal
    • current working directory
    • umask mask
    • file descriptor table
      • Contains many pointers to file structures
    • Information related to signals
    • user id and group id
    • Sessions and process groups
    • The upper limit of resources that a process can use

1.6 Process status

  • A process has 5 basic states
    • They are the initial, ready, running, suspended, and terminated states.
    • The initial state is the process's preparation stage and is often merged with the ready state.

1.7 Environment variables

  • Environment variables are parameters that describe the environment in which a process runs. They usually have the following characteristics
    • They are strings (in essence)
    • They share a uniform format: name=value[:value]
    • The value describes some aspect of the process environment
    • Storage format: similar to command-line arguments. A char *[] array named environ holds the strings, with NULL as the terminating sentinel
    • Usage: similar to command-line arguments
    • Location: in the user area of the address space, above the start of the stack
    • Accessing the environment table: the variable must be declared first: extern char **environ;
Common environment variables
  • Environment variable strings have the form name=value. Most names consist of uppercase letters and underscores. The name part is generally called the environment variable, and the value part its value. Environment variables define the running environment of a process. Some of the more important ones are described below:

  • PATH

    • The search path for executable files. The ls command is also a program, yet you can run it without giving its full path name /bin/ls. To run a.out in the current directory, however, you normally have to give the path ./a.out. The reason is the PATH environment variable: its value contains /bin, the directory where ls lives, but not the directory that holds a.out.
    • The value of PATH can contain several directories, separated by colons (:)
    • You can view the value of this variable with the echo command in the shell.
    $ echo $PATH
    /opt/ros/melodic/bin:/opt/gcc-arm-none-eabi-9-2020-q2-update/bin:/home/yue/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
    
  • SHELL

    • The current shell, its value is usually /bin/bash
  • TERM

    • The current terminal type. In a graphical interface terminal, its value is usually xterm. The terminal type determines the output display mode of some programs. For example, a graphical interface terminal can display Chinese characters, but a character terminal generally cannot.
  • LANG

    • The language and locale, which determine the character encoding and the display format of time, currency, and similar information.
  • HOME

    • The path to the current user's home directory. Many programs need to save configuration files in the home directory, so that each user has his own set of configurations when running the program.

2. Process environment

2.1 main function

  • main function prototype
    • argc is the number of command-line arguments; argv is an array of pointers to the arguments
    int main(int argc, char *argv[]);
    
  • When the kernel executes a C program (through one of the exec functions), a special startup routine is called before main
    • The executable file specifies this startup routine as the program's starting address; this is set up by the link editor (linker), which is invoked by the C compiler
    • The startup routine obtains the command-line arguments and environment variable values from the kernel and then sets things up so that main is called as shown above

2.2 Process termination

  • There are 8 ways to terminate a process
    • Five of them are normal terminations:
      • (1) Return from main
      • (2) Call exit
      • (3) Call _exit or _Exit
      • (4) The last thread returns from its startup routine
      • (5) Call pthread_exit from the last thread
    • Three of them are abnormal terminations:
      • (6) Call abort
      • (7) Receive a signal
      • (8) The last thread responds to the cancellation request
2.2.1 Exit functions
  • 3 functions terminate a program normally

    • _exit and _Exit return to the kernel immediately
    • exit first performs some cleanup processing and then returns to the kernel
    #include <stdlib.h>
    void exit(int status);
    void _Exit(int status);
    
    #include <unistd.h>
    void _exit(int status);
    
  • All three exit functions take a single integer argument, called the termination status (or exit status). Most UNIX system shells provide a way to examine the termination status of a process

    • If (a) any of these functions is called without a termination status, (b) main executes a return statement without a return value, or (c) main is not declared to return an integer, the termination status of the process is undefined.
    • However, if the return type of main is an integer and main falls off the end (an implicit return), the termination status of the process is 0
  • Returning an integer value from main is equivalent to calling exit with that value.

    // Inside main, the next two lines are equivalent
    exit(0);
    return(0);
    
  • Compile a program, run it, and print the termination status

    $ gcc hello.c
    $ ./a.out
    hello world
    $ echo $?    # print the termination status
    0
    
2.2.2 Function atexit
#include <stdlib.h>

int atexit(void (*function)(void));
  • function return value

    • If successful, return 0
    • If an error occurs, return non-zero
  • ISO C requires that a process be able to register at least 32 functions that are called automatically by exit. These functions are called exit handlers (termination handlers) and are registered with the atexit function

  • The argument to atexit is the address of a function. When that function is called, it is passed no arguments and is not expected to return a value. exit calls the registered functions in the reverse order of their registration. A function that is registered several times is called once per registration.

  • How a C program is started and terminated

    • The only way for the kernel to execute a program is through one of the exec functions. The only way for a process to terminate voluntarily is to call _exit or _Exit, either explicitly or implicitly (by calling exit). A process can also be terminated involuntarily by a signal

2.3 Command line parameters

  • When a program is executed, the process calling exec can pass command line arguments to the new program
    #include <stdio.h>
    #include <stdlib.h>
    
    int main(int argc, char* argv[]) {
        int i;
    
        /* argv[argc] is guaranteed to be NULL, so it can terminate the loop */
        for (i = 0; argv[i] != NULL; ++i) {
            printf("argv[%d] : %s\n", i, argv[i]);
        }
    
        exit(0);
    }
    
    $ gcc exec.c -o exec
    $ ./exec arg1 TEST foo
    argv[0] : ./exec
    argv[1] : arg1
    argv[2] : TEST
    argv[3] : foo
    

2.4 Environment table

  • Each program receives an environment table. Like the argument list, the environment table is an array of character pointers, each of which holds the address of a null-terminated C string. The global variable environ contains the address of this pointer array:
    extern char **environ;
    
  • For example, in an environment that consists of 5 strings:
    • Each string ends with an explicit null byte
    • environ is called the environment pointer, the array of pointers the environment table, and the strings they point to the environment strings.

2.5 Storage space layout of C program

  • Historically, a C program has been composed of the following parts:
    • Text segment
      • The machine instructions executed by the CPU
    • Initialized data segment
      • Usually just called the data segment; it contains the variables that are explicitly initialized in the program.
    • Uninitialized data segment
      • Usually called the bss segment, from "block started by symbol". Before the program starts executing, the kernel initializes the data in this segment to 0 or null pointers.
    • Stack
      • Automatic variables, along with the information saved on every function call, are stored here
      • Each time a function is called, its return address and information about the caller's environment (such as the values of some machine registers) are saved on the stack.
      • The newly called function then allocates room on the stack for its automatic and temporary variables.
    • Heap
      • Dynamic storage allocation usually takes place on the heap
      • The heap is located between the uninitialized data segment and the stack

  • The size command reports the length (in bytes) of the text segment, data segment, and bss segment.
    $ size /usr/bin/cc /bin/sh
       text	   data	    bss	    dec	    hex	filename
    1025621	  15120	  10600	1051341	 100acd	/usr/bin/cc
     110609	   4816	  11312	 126737	  1ef11	/bin/sh
    

2.6 Shared libraries

  • Different systems provide different ways for a program to say whether it wants to use shared libraries
    • Options for the cc and ld commands are typical.
$ gcc -static hello.c    # prevent gcc from using shared libraries
$ gcc hello.c            # gcc uses shared libraries by default

2.7 Storage space allocation

  • ISO C specifies 3 functions for dynamic allocation of storage space

    • (1) malloc allocates a storage area of the specified number of bytes. The initial contents of this area are indeterminate
    • (2) calloc allocates space for a specified number of objects of a specified size. Every bit in this space is initialized to 0
    • (3) realloc increases or decreases the size of a previously allocated area
      • When the size increases, the contents of the previously allocated area may have to be moved to another area that is large enough to provide the extra room at the end; the initial contents of the added part are indeterminate
    #include <stdlib.h>
    
    void *malloc(size_t size);
    void *calloc(size_t nmemb, size_t size);
    void *realloc(void *ptr, size_t size);
    
    void free(void *ptr);
    
  • function return value

    • If successful, return a non-null pointer
    • If an error occurs, NULL is returned
  • The function free releases the storage space pointed to by ptr. The freed space is usually sent to the available storage pool, and can be re-allocated later when the above three allocation functions are called.

  • Possible fatal errors

    • Free an already freed block
    • The pointer used when calling free is not the return value of the three alloc functions.

    If a process calls malloc but forgets to call free, the memory it occupies keeps growing; this is called a memory leak. If space that is no longer used is never released with free, the process address space slowly grows until no free space remains, and performance degrades because of excessive paging overhead.

2.8 Environment variables

2.8.1 Get the environment variable value getenv
#include <stdlib.h>

char* getenv(const char* name);
  • function return value
    • pointer to the value associated with name
    • If not found, return NULL
  • Note that this function returns a pointer to the value in the name = value string
    • You should use getenv to get the value of a specified environment variable from the environment instead of directly accessing environ
2.8.2 Set environment variables: putenv, setenv, unsetenv
#include <stdlib.h>

int putenv(char *string);
  • function return value
    • If successful, return 0
    • If an error occurs, return non-zero
  • putenv takes a string of the form name=value and places it in the environment table. If name already exists, its old definition is removed first
#include <stdlib.h>

int setenv(const char *name, const char *value, int overwrite);
int unsetenv(const char *name);
  • function return value

    • If successful, return 0
    • If an error occurs, -1 is returned
  • setenv sets name to value. If name already exists in the environment, then

    • (a) If overwrite is non-zero, the existing definition is removed first.
    • (b) If overwrite is 0, the existing definition is kept (name is not set to the new value, and no error occurs)
  • unsetenv removes the definition of name. It is not an error if no such definition exists

3. Process control

3.1 Process identification

  • Each process has a unique process ID, a non-negative integer

    • Because the process ID is always unique, it is often used as part of other identifiers to guarantee their uniqueness.
  • Although unique, process IDs are reused

    • When a process terminates, its process ID becomes a candidate for reuse. Most UNIX systems implement a delayed reuse algorithm such that newly created processes are given a different ID than the ID used by the most recently terminated process. This prevents the new process from being mistaken for some terminated previous process using the same ID
  • There are some dedicated processes in the system, but the details vary by implementation

    • Process ID 0 is usually the scheduler process, often known as the swapper. It is part of the kernel and does not run any program on disk, so it is also called a system process.
    • Process ID 1 is usually the init process, invoked by the kernel at the end of the bootstrap procedure
      • init becomes the parent process of every orphaned process
    • Each UNIX system implementation has its own set of kernel processes that provide operating system services. For example, on some virtual memory implementations of the UNIX System, process ID 2 is the pagedaemon, which is responsible for supporting the paging of the virtual memory system
  • In addition to the process ID, each process has some other identifiers, and the following functions return these identifiers

    #include <unistd.h>
    
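    pid_t getpid(void);     // Returns: process ID of the calling process
    pid_t getppid(void);    // Returns: parent process ID of the calling process
    
    uid_t getuid(void);     // Returns: real user ID of the calling process
    uid_t geteuid(void);    // Returns: effective user ID of the calling process
    
    gid_t getgid(void);     // Returns: real group ID of the calling process
    gid_t getegid(void);    // Returns: effective group ID of the calling process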
    

3.2 Function fork

#include <unistd.h>

pid_t fork(void);
  • An existing process can call fork to create a new process (called a child process)
    • fork is called once but returns twice
      • The two returns differ only in their value: fork returns 0 in the child and the process ID of the new child in the parent.
    • Why the child's process ID is returned to the parent
      • A process can have many children, and no function lets a process obtain the process IDs of all its children.
    • Why fork returns 0 to the child
      • A process has exactly one parent, so the child can always call getppid to obtain its parent's process ID. (Process ID 0 is reserved for the kernel's swapper process, so 0 can never be the process ID of a child.)
    • Both the child and the parent continue executing with the instruction that follows the call to fork. The child is a copy of the parent.
      • For example, the child gets a copy of the parent's data space, heap, and stack. This is a copy for the child: parent and child do not share these portions of memory
  • function return value
    • 0 in the child, the child's process ID in the parent.
    • -1 on error

  • After fork, file descriptors are typically handled in one of two ways:

    • (1) The parent waits for the child to complete. The parent then needs to do nothing with its descriptors: when the child terminates, the file offset of every shared descriptor that the child read from or wrote to has been updated accordingly.
    • (2) The parent and the child each go their own way. After fork, each closes the descriptors it does not need, so that neither interferes with the other's open descriptors. Network servers often work this way
  • Comparison between parent process and child process

    • Differences
      • The return value of fork
      • The process ID
      • The parent process ID
        • The child's parent process ID is the ID of the process that created it; the parent's own parent process ID does not change.
      • The child's tms_utime, tms_stime, tms_cutime, and tms_cstime values are set to 0
      • The child does not inherit file locks set by the parent
      • Pending alarms are cleared for the child
      • The child's set of pending signals is set to the empty set
    • Similarities (just after fork)
      • Data segment, text segment
      • Heap, stack
      • Environment variables, global variables
      • Home directory, current working directory
      • Signal dispositions
  • Two main reasons why fork fails

    • There are already too many processes in the system (usually means there is a problem somewhere)
    • The total number of processes for this real user ID exceeds the system limit
  • fork has the following two uses:

    • A parent process wants to duplicate itself so that the parent process and the child process execute different code segments at the same time
      • This is common in network service processes: the parent process waits for service requests from clients. When such a request arrives, the parent process calls fork to allow the child process to handle the request, while the parent process continues to wait for the next service request.
    • A process wants to execute a different program
      • This is a common situation with shells. In this case, the child process calls exec immediately after returning from fork
Case 1
  • Changes made to a variable by the child process do not affect the value of the variable in the parent process
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int globvar = 6;                      /* external variable in the initialized data segment */
char buf[] = "a write to stdout\n";

int main(int argc, char* argv[]) {
    int var;                          /* automatic variable on the stack */
    pid_t pid;

    var = 88;
    if (write(STDOUT_FILENO, buf, sizeof(buf) - 1) != (ssize_t)(sizeof(buf) - 1)) {
        perror("write error");
        exit(1);
    }
    printf("before fork\n");

    if ((pid = fork()) < 0) {
        perror("fork error");
        exit(1);
    } else if (pid == 0) {
        /* child: modifies its own copies of the variables */
        globvar++;
        var++;
    } else {
        sleep(2);  /* the parent sleeps for 2 s so that the child runs first */
    }

    printf("pid = %ld, glob = %d, var = %d\n", (long)getpid(), globvar, var);

    return 0;
}
$ gcc fork.c -o fork
$ ./fork
a write to stdout
before fork
pid = 2244, glob = 7, var = 89   # the child's variables changed
pid = 2243, glob = 6, var = 88   # the parent's variables did not change
  • In general, it is not known whether the parent or the child runs first after fork; this depends on the scheduling algorithm used by the kernel. If parent and child must synchronize with each other, some form of inter-process communication is required
Case 2
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char* argv[]) {
    printf("before fork-1-\n");
    printf("before fork-2-\n");
    printf("before fork-3-\n");
    printf("before fork-4-\n");

    pid_t pid = fork();
    if (pid == -1) {
        perror("fork error");
        exit(1);
    } else if (pid == 0) {
        printf("---child is created, pid = %d, parent-pid : %d\n", getpid(), getppid());
    } else if (pid > 0) {
        sleep(1);  // the parent waits so that it is still running when the child finishes; the child is then not orphaned
        printf("---parent process : my child is %d, my pid : %d, my parent pid : %d\n", pid, getpid(), getppid());
    }

    printf("------end of file\n");

    return 0;
}
$ gcc fork2.c -o fork2
$ ./fork2
before fork-1-
before fork-2-
before fork-3-
before fork-4-
---child is created, pid = 2475, parent-pid : 2474
------end of file
---parent process : my child is 2475, my pid : 2474, my parent pid : 1887
------end of file

# every process we start from the terminal is a child of bash
$ ps aux | grep 1887
yue       1887  0.0  0.0  25124  6048 pts/0    Ss   08:41   0:00 bash
yue       2477  0.0  0.0  16180  1088 pts/0    S+   09:39   0:00 grep --color=auto 1887
Case 3
  • Create multiple child processes in a loop
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/stat.h>
#include <dirent.h>

int main(int argc, char* argv[]) {
    int i;
    pid_t pid;

    for (i = 0; i < 5; i++) {
        if (fork() == 0) {
            break;
        }
    }

    if (5 == i) {
        sleep(5);
        printf("I'm parent \n ");
    } else {
        sleep(i);
        printf("I'm %dth child\n", i + 1);
    }

    return 0;
}
$ gcc mulfork.c -o mulfork
$ ./mulfork
I'm 1th child
I'm 2th child
I'm 3th child
I'm 4th child
I'm 5th child
I'm parent
Case 4
  • What parent and child share: memory is shared while it is only read and copied when it is written (copy-on-write); this matters mainly for global variables
    • Two things remain shared: file descriptors and mmap mapping areas
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int var = 100;            // lives in .data

int main(void) {
    pid_t pid;
    pid = fork();

    if (pid == -1) {
        perror("fork error");
        exit(1);
    } else if (pid > 0) {
        // parent: this write changes only the parent's copy
        var = 288;
        printf("parent, var = %d\n", var);
        printf("I'm parent pid = %d, getppid = %d\n", getpid(), getppid());
    } else if (pid == 0) {
        // child: this write changes only the child's copy
        var = 200;
        printf("I'm child pid = %d, ppid = %d\n", getpid(), getppid());
        printf("child, var = %d\n", var);
    }

    printf("------finish------\n");

    return 0;
}
$ gcc shared.c -o shared
$ ./shared
parent, var = 288
I'm parent pid = 2702, getppid = 1887
------finish------
I'm child pid = 2703, ppid = 2702
child, var = 200
------finish------
Case 5
  • Debugging parent and child processes with gdb
    • gdb can track only one process at a time. Before the program calls fork, you can tell gdb whether to follow the parent or the child. By default it follows the parent.
    • set follow-fork-mode child makes gdb follow the child process after fork.
    • set follow-fork-mode parent makes gdb follow the parent process

    Note that the setting takes effect only if it is made before fork is called.

$ gcc mulfork.c -o mulfork -g
$ gdb mulfork
(gdb) list
1	#include <stdio.h>
2	#include <stdlib.h>
3	#include <string.h>
4	#include <unistd.h>
5	#include <pthread.h>
6	#include <sys/stat.h>
7	#include <dirent.h>
8	
9	int main(int argc, char* argv[]) {
10		int i;
(gdb) l
11		pid_t pid;
12	
13		for (i = 0; i < 5; i++) {
14			if (fork() == 0) {
15				break;
16			}
17		}
18	
19		if (5 == i) {
20			sleep(5);
(gdb) b 13
Breakpoint 1 at 0x6e9: file mulfork.c, line 13.
(gdb) r
Starting program: /home/yue/test/mulfork 

Breakpoint 1, main (argc=1, argv=0x7fffffffdbe8) at mulfork.c:13
13		for (i = 0; i < 5; i++) {
(gdb) n
14			if (fork() == 0) {
(gdb) set follow-fork-mode child 
(gdb) n
[New process 2831]
[Switching to process 2831]
main (argc=1, argv=0x7fffffffdbe8) at mulfork.c:15
15				break;
(gdb) I'm 2th child
I'm 3th child
I'm 4th child
I'm 5th child
I'm parent 
 n
19		if (5 == i) {
(gdb) n
23			sleep(i);
(gdb) n
24			printf("I'm %dth child\n", i + 1);
(gdb) n
I'm 1th child
27		return 0;
(gdb) 

3.3 Function exit

  • There are 8 ways to terminate a process

    • Five of them are normal terminations:
      • (1) Return from main
      • (2) Call exit
      • (3) Call _exit or _Exit
      • (4) The last thread returns from its startup routine
      • (5) Call pthread_exit from the last thread
    • Three of them are abnormal terminations:
      • (6) Call abort
      • (7) Receive a signal
      • (8) The last thread responds to the cancellation request
  • No matter how the process terminates, the same piece of code in the kernel will eventually be executed.

    • This code closes all open descriptors for the corresponding process, releases the memory used by it, etc.
  • However a process terminates, we want it to be able to tell its parent how it terminated.

    • For the 3 exit functions (exit, _exit, and _Exit), this is done by passing an exit status as the argument to the function
    • On abnormal termination, the kernel (not the process itself) generates a termination status that indicates the reason. In either case, the parent of the terminated process can obtain the termination status with the wait or waitpid function.
  • Orphan process

    • If the parent process ends before the child process, the child process becomes an orphan process, and the parent process of the child process becomes the init process, which is called the init process adopting the orphan process.
  • Zombie process (zombie)

    • In UNIX terminology, a process has terminated but its parent process has not yet dealt with it (obtaining information about the terminated child process and releasing the resources it still occupies). The child process residual resources (PCB) are stored in the kernel .
    • The ps(1) command prints the status of the zombie process as Z
    • If you write a long-running program that forks many child processes, then unless the parent process waits to obtain the termination status of the child processes, these child processes will become zombie processes after termination.
  • Will a process adopted by the init process become a zombie process when it terminates?

    • No. init is written so that whenever one of its child processes terminates, it calls a wait function to obtain the termination status. This also prevents the system from filling up with zombie processes.
Case 1: Orphan process
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(int argc, char* argv[]) {
    pid_t pid;
    pid = fork();

    if (pid == 0) {
        // Child: print the parent's PID once per second, forever
        while (1) {
            printf("I am child, my parent pid = %d\n", getppid());
            sleep(1);
        }
    } else if (pid > 0) {
        // Parent: exit after 9 seconds, leaving the child an orphan
        printf("I am parent, my pid is = %d\n", getpid());
        sleep(9);
        printf("------parent going to die------\n");
    } else {
        perror("fork");
        return 1;
    }

    return 0;
}
$ gcc orphan.c -o orphan
$ ./orphan
I am parent, my pid is = 4464
I am child, my parent pid = 4464
I am child, my parent pid = 4464
I am child, my parent pid = 4464
I am child, my parent pid = 4464
I am child, my parent pid = 4464
I am child, my parent pid = 4464
I am child, my parent pid = 4464
I am child, my parent pid = 4464
I am child, my parent pid = 4464
------parent going to die------
I am child, my parent pid = 1112
I am child, my parent pid = 1112
I am child, my parent pid = 1112
...
# Before the parent process dies
$ ps ajx
4231  4383  4383  4231 pts/0     4383 S+    1000   0:00 ./orphan
4383  4384  4383  4231 pts/0     4383 S+    1000   0:00 ./orphan

# After the parent process dies
$ ps ajx
1112  4384  4383  4231 pts/0     4231 S     1000   0:00 ./orphan
Case 2: Zombie process
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(int argc, char* argv[]) {
    pid_t pid;
    pid = fork();

    if (pid == 0) {
        // Child: terminate after 10 seconds
        printf("------child, my parent = %d, going to sleep 10s\n", getppid());
        sleep(10);
        printf("------child die------\n");
    } else if (pid > 0) {
        // Parent: keeps running and never calls wait, so the dead child stays a zombie
        while (1) {
            printf("I am parent, pid = %d, myson = %d\n", getpid(), pid);
            sleep(1);
        }
    } else {
        perror("fork");
        return 1;
    }

    return 0;
}
$ gcc zoom.c -o zoom
$ ./zoom
I am parent, pid = 4660, myson = 4661
------child, my parent = 4660, going to sleep 10s
I am parent, pid = 4660, myson = 4661
I am parent, pid = 4660, myson = 4661
I am parent, pid = 4660, myson = 4661
I am parent, pid = 4660, myson = 4661
I am parent, pid = 4660, myson = 4661
I am parent, pid = 4660, myson = 4661
I am parent, pid = 4660, myson = 4661
I am parent, pid = 4660, myson = 4661
I am parent, pid = 4660, myson = 4661
------child die------
I am parent, pid = 4660, myson = 4661
I am parent, pid = 4660, myson = 4661
I am parent, pid = 4660, myson = 4661
...
# Before the child process dies
$ ps ajx
4505  4660  4660  4505 pts/3     4660 S+    1000   0:00 ./zoom
4660  4661  4660  4505 pts/3     4660 S+    1000   0:00 ./zoom

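# After the child process dies
$ ps ajx
4505  4660  4660  4505 pts/3     4660 S+    1000   0:00 ./zoom
4660  4661  4660  4505 pts/3     4660 Z+    1000   0:00 [zoom] <defunct> # defunct means the process is dead

# Every process passes through the zombie state when it ends; only the duration differs
# To get rid of this zombie, kill its parent process so that init (the "orphanage") adopts and recycles it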
$ kill -9 4660

3.4 Functions wait and waitpid

  • When a process terminates normally or abnormally, the kernel sends the SIGCHLD signal to its parent process. Because the termination of a child is an asynchronous event (it can occur at any time while the parent is running), this signal is an asynchronous notification from the kernel to the parent.
    • The parent process can choose to ignore the signal, or provide a function (signal handler) that is called when the signal occurs.
  • What happens to a process that calls wait or waitpid
    • If all its child processes are still running, it blocks
    • If a child process has terminated and is waiting for the parent to obtain its termination status, the call returns immediately with that status
    • If it has no child processes, the call returns an error immediately
  • A call therefore serves three purposes: it blocks until a child process exits, recycles the remaining resources of the child process, and obtains the child's end status (the reason it exited)

    A wait/waitpid function call can only recycle one process.

#include <sys/wait.h>

pid_t wait(int* status);
pid_t waitpid(pid_t pid, int* status, int options);
  • function return value

    • If successful, return the process ID of the child process
    • waitpid returns 0 if WNOHANG was specified and no child has terminated yet
    • If an error occurs, return -1
  • The difference between these two functions

    • wait blocks the caller until a child process terminates, while waitpid has an option that prevents the caller from blocking
    • waitpid does not have to wait for the first child that terminates after the call. It has several options to control which process it waits for
    • If a child process has already terminated and is a zombie process, wait returns immediately with the status of that child; otherwise wait blocks its caller until a child process terminates
    • If the caller is blocked and has multiple child processes, wait returns as soon as any one of them terminates. Because wait returns the process ID of the terminated child, the caller always knows which child terminated
  • The parameter status of these two functions is an integer pointer

    • If status is not a null pointer, the termination status of the terminating process is stored in the unit it points to.
    • If you do not care about the termination status, you can specify this parameter as a null pointer NULL
  • Macros that check the termination status returned by wait and waitpid

[Figure: macros for checking the termination status returned by wait and waitpid]

  • If you want to wait for a specified process to terminate (assuming you know the ID of the process you want to wait for), what should you do?

    • In earlier versions of UNIX, you had to call wait and then compare the process ID it returned with the expected process ID.
      • If the terminated process is not the expected one, save its process ID and termination status and call wait again, repeating until the desired process terminates. The next time you want to wait for a specific process, first check the list of already terminated processes. If the process is in the list, take its saved information; otherwise call wait
    • What is really needed is a function that waits for a specific process. POSIX defines the waitpid function to provide this. The pid parameter of waitpid is interpreted as follows:
      • pid == -1 recycles any child process. In this case, waitpid is equivalent to wait
      • pid > 0 recycles the child process whose process ID equals pid
      • pid == 0 recycles any child process whose process group ID equals that of the calling process
      • pid < -1 recycles any child process whose process group ID equals the absolute value of pid
  • The waitpid function returns the process ID of the terminated child process and stores the child's termination status in the location pointed to by status. For wait, the only real error is that the calling process has no child processes (the call can also return an error when it is interrupted by a signal). For waitpid, an error also occurs if the specified process or process group does not exist, or if the process specified by pid is not a child of the calling process.

  • The options constants of waitpid

[Figure: options constants of waitpid]

  • The waitpid function provides 3 functions not provided by the wait function.
    • (1) waitpid can wait for a specified/specific process , while wait returns the status of any terminated child process
    • (2) waitpid provides a non-blocking version of wait. Sometimes you want to get the status of a child process but do not want to block
    • (3) waitpid supports job control through the WUNTRACED and WCONTINUED options
wait case
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(int argc, char* argv[]) {
    pid_t pid, wpid;
    int status;

    pid = fork();
    // A return value of 0 means the current process is the child
    if (pid == 0) {
        printf("---child, my id = %d, going to sleep 10s\n", getpid());
        sleep(10);
        printf("------child die------\n");

        // The child returns 66 when it finishes, i.e. it terminates normally with exit status 66
        return 66;
    } else if (pid > 0) {
     // wpid = wait(NULL);     // use this form if the reason for termination is of no interest
        wpid = wait(&status);  // wait() blocks the caller until a child process terminates
        if (wpid == -1) {
            perror("wait error");
            exit(1);
        }
        if (WIFEXITED(status)) {    // did the child terminate normally?
            printf("child exit with %d\n", WEXITSTATUS(status));
        }
        if (WIFSIGNALED(status)) {  // was the child terminated by a signal?
            printf("child kill with signal %d\n", WTERMSIG(status));
        }

        printf("------parent wait finish: %d\n", wpid);
    } else {
        perror("fork");
        return 1;
    }

    return 0;
}
$ gcc zoom_test.c -o zoom_test
$ ./zoom_test
---child, my id = 2774, going to sleep 10s
------child die------
child exit with 66
------parent wait finish: 2774

# Test: the child process is terminated by a signal
$ ./zoom_test
---child, my id = 2864, going to sleep 10s
child kill with signal 9
------parent wait finish: 2864
# Open another terminal and enter the following command
$ kill -9 2864
waitpid case 1
  • Specify a child process to recycle
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(int argc, char *argv[]) {
    int i;
    pid_t pid, wpid, tmpid;

    for (i = 0; i < 5; i++) {
        pid = fork();
        if (pid == 0) {     // a child leaves the loop so that it does not fork again
            break;
        }
        if (i == 2) {       // parent: remember the PID of the 3rd child
            tmpid = pid;
            printf("--------pid = %d\n", tmpid);
        }
    }

    if (5 == i) {           // parent: left the loop through the loop condition
     // sleep(5);

        //wait(NULL);                            // one wait/waitpid call recycles only one child process
        //wpid = waitpid(-1, NULL, WNOHANG);     // recycle any child; returns 0 at once if no child has terminated
        //wpid = waitpid(tmpid, NULL, 0);        // recycle the specified child, blocking
        printf("i am parent , before waitpid, pid = %d\n", tmpid);

        //wpid = waitpid(tmpid, NULL, WNOHANG);  // recycle the specified child, non-blocking
        wpid = waitpid(tmpid, NULL, 0);          // recycle the specified child, blocking
        if (wpid == -1) {
            perror("waitpid error");
            exit(1);
        }
        printf("I'm parent, wait a child finish : %d \n", wpid);

    } else {                // child: left the loop through break
        sleep(i);
        printf("I'm %dth child, pid= %d\n", i+1, getpid());
    }

    return 0;
}
$ gcc waitpid_test.c -o waitpid_test
$ ./waitpid_test
--------pid = 3133
i am parent , before waitpid, pid = 3133
I'm 1th child, pid= 3131
I'm 2th child, pid= 3132
I'm 3th child, pid= 3133
I'm parent, wait a child finish : 3133 
$ I'm 4th child, pid= 3134
I'm 5th child, pid= 3135
waitpid case 2
  • Recycle multiple child processes
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(int argc, char *argv[]) {
    int i;
    pid_t pid, wpid;

    for (i = 0; i < 5; i++) {
        pid = fork();
        if (pid == 0) {     // a child leaves the loop so that it does not fork again
            break;
        }
    }

    if (5 == i) {
        // Parent: poll without blocking until waitpid returns -1 (no children left)
        while ((wpid = waitpid(-1, NULL, WNOHANG)) != -1) {
            if (wpid > 0) {             // one child was recycled
                printf("wait child %d\n", wpid);
            } else if (wpid == 0) {     // children exist, but none has terminated yet
                sleep(1);
                continue;
            }
        }
    } else {
        sleep(i);
        printf("I'm %dth child, pid = %d\n", i+1, getpid());
    }

    return 0;
}
$ gcc waitpid_while.c -o waitpid_while
$ ./waitpid_while
I'm 1th child, pid = 3360
wait child 3360
I'm 2th child, pid = 3361
wait child 3361
I'm 3th child, pid = 3362
I'm 4th child, pid = 3363
wait child 3362
wait child 3363
I'm 5th child, pid = 3364
wait child 3364

3.5 Function exec


  • After using the fork function to create a new child process, the child process often calls an exec function to execute another program.
    • When a process calls an exec function, the program executed by the process is completely replaced with a new program, and the new program starts executing from its main function
    • Because calling exec does not create a new process, the process ID is the same before and after the call
    • exec simply replaces the text segment, data segment, heap, and stack of the current process with those of a new program from disk
    • In other words, the .text and .data of the current process are replaced with the .text and .data of the program being loaded, and the process starts executing from the first instruction of the new .text while its process ID stays unchanged: the core is swapped, the shell is kept
  • fork creates a new process, exec starts executing a new program, exit handles termination, and wait handles waiting for termination
#include <unistd.h>

extern char **environ;

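// The letter p (path) means the function takes a filename argument and uses the PATH environment variable to find the executable
// The letter l (list) means the function takes a list of arguments; it is mutually exclusive with the letter v
// The letter v (vector) means the function takes an argv[] vector
// The letter e (environment) means the function takes an envp[] array instead of using the current environment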
int execl(const char *path, const char *arg, ... /* (char  *) NULL */);
int execlp(const char *file, const char *arg, ... /* (char  *) NULL */);

int execle(const char *path, const char *arg, ... /*, (char *) NULL, char * const envp[] */);
int execv(const char *path, char *const argv[]);
int execvp(const char *file, char *const argv[]);
int execve(const char *path, char *const argv[], char *const envp[]);
int fexecve(int fd, char *const argv[], char *const envp[]);
  • function return value

    • If successful, do not return
    • If an error occurs, -1 is returned
  • In many UNIX implementations, only execve among these seven functions is a kernel system call. The other 6 are library functions that eventually call this system call. The figure below shows the relationship between these seven functions

    • The library functions execlp and execvp use the PATH environment variable to find the first pathname prefix that contains an executable file named filename. The fexecve library function uses /proc to convert the file descriptor parameter into a path name, and execve uses the path name to execute the program.

[Figure: relationship between the seven exec functions]

Case 1
  • The execlp function is usually used to run system programs such as ls, date, cp and cat
  • The execl function loads a program by (path + program name)
  • The execvp function takes the arguments as an argv[] array and, like execlp, searches PATH for the program. A custom environment can only be passed with the e variants (execle, execve)
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char* argv[]) {
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork error");
        exit(1);
    } else if (pid == 0) {
    //  execlp("ls", "-l", "-h", NULL); // wrong: the variadic arguments start at argv[0]
    //  execlp("ls", "ls", "-l", "-h", NULL);
    //  execlp("date", "date", NULL);
    //  execl("/bin/ls", "ls", "-l", "-h", NULL);

        // The trailing NULL is mandatory; it is the "sentinel" that ends the argument list
        char* args[] = {"ls", "-l", "-h", NULL};
        execvp("ls", args);

        // Reached only if exec failed
        perror("exec error");
        exit(1);
    } else if (pid > 0) {
        sleep(1);  // delay the parent by 1 second so that the shell prompt does not mix with the output
        printf("I'm parent : %d\n", getpid());
    }

    return 0;
}
$ gcc fork_exec.c -o fork_exec
$ ./fork_exec
total 152K
-rwxrwxr-x 1 yue yue 8.6K 9月  16 15:42 a.out
-rwxrwxr-x 1 yue yue 8.2K 9月  16 16:15 exec
-rw-rw-r-- 1 yue yue  282 9月  16 16:15 exec.c
-rwxrwxr-x 1 yue yue 8.2K 9月  15 16:55 fcntl
-rwxrwxr-x 1 yue yue 8.3K 9月  15 18:46 fcntl2
-rwxrwxr-x 1 yue yue 8.5K 9月  17 08:52 fork
-rwxrwxr-x 1 yue yue 8.5K 9月  17 09:38 fork2
-rw-rw-r-- 1 yue yue  672 9月  17 09:37 fork2.c
-rw-rw-r-- 1 yue yue  627 9月  17 08:52 fork.c
-rwxrwxr-x 1 yue yue 8.4K 9月  17 16:33 fork_exec
-rw-rw-r-- 1 yue yue  447 9月  17 16:33 fork_exec.c
-rwxrwxr-x 1 yue yue 8.6K 9月  15 19:33 ls-R
-rw-rw-r-- 1 yue yue  943 9月  15 19:31 ls-R.c
-rwxrwxr-x 1 yue yue  12K 9月  17 14:18 mulfork
-rw-rw-r-- 1 yue yue  398 9月  17 11:30 mulfork.c
-rw-r--r-- 1 yue yue  262 9月  15 18:46 mycat.c
-rwxrwxr-x 1 yue yue 8.4K 9月  17 11:32 shared
-rw-rw-r-- 1 yue yue  572 9月  17 11:32 shared.c
I'm parent : 3964
Case 2
  • Use execlp to run ps and write its output to a file
    • Equivalent to running ps aux > ps.out in the shell
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

int main(int argc, char* argv[]) {
    int fd;
    fd = open("ps.out", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open error");
        exit(1);
    }

    // Make standard output refer to ps.out
    if (dup2(fd, STDOUT_FILENO) == -1) {
        perror("dup2 error");
        exit(1);
    }
    close(fd);  // the original descriptor is no longer needed

    execlp("ps", "ps", "aux", NULL);

    // Reached only if exec failed
    perror("execlp error");
    return 1;
}
$ gcc exec_ps.c -o exec_ps
$ ./exec_ps
$ cat ps.out


Origin blog.csdn.net/qq_42994487/article/details/133051080