[Linux] Basic knowledge of process control

1. Process creation

fork is a very important function in Linux: it creates a new process from an existing one. The new process is called the child process, and the original process is the parent process.

#include <unistd.h>

pid_t fork(void);

Return value: fork returns 0 in the child process, returns the child's PID in the parent process, and returns -1 on error (in which case no child is created).

When a process calls fork and control passes to the fork code in the kernel, the kernel:

Allocates new memory blocks and kernel data structures for the child process;

Copies part of the parent's data structures into the child;

Adds the child to the system process list;

Returns from fork, after which the scheduler decides which process runs.

fork has already been covered earlier, so I will not explain it in detail here.

In the earlier address space chapter ([Linux] Address Space Concept, CSDN Blog), we saw that when a child process is created, the system uses copy-on-write. Think about it in reverse: what if the system simply gave the child a full copy? A process is kernel data structures plus code and data. The code is read-only once the process is loaded, so it can be shared safely, but the data may be written. So why not copy all the data to the child up front? Because the system does not know which data the child will actually modify, and copying data that is rarely or never written wastes memory.

So why does the OS use copy-on-write to separate parent and child?

1. Memory is allocated to the child only when a write actually happens, which uses memory efficiently.

2. The OS cannot know in advance which memory the code will write to, so deferring the copy avoids copying pages that are never modified.

Before and after data modification:

Conclusion: in Linux, a child created by fork starts out sharing the parent's code and data. The code is read-only anyway, and the kernel marks the shared data pages read-only in both processes' page tables. When either process (parent or child) later writes to such a page, the write triggers a page fault; the kernel then copies that page, maps the copy into the writing process with write permission restored, and lets the write proceed. Only the pages that are actually modified get copied.

2. Process termination

1. What does the operating system do when a process terminates?

It releases the kernel data structures, code, and data the process acquired; in essence, it frees memory. (The PCB of an exited child is kept until its parent collects the exit status; see Process waiting below.)

2. Common ways to terminate a process

1. Returning from main with a return code

When a C/C++ program runs, one of the following happens:

(a) The code runs to completion and the result is correct;

(b) The code runs to completion but the result is incorrect;

The value main returns (usually 0, but not always) is the process exit code, which is reported to the parent process. By convention, 0 means the program succeeded and the result is correct; a non-zero value means the result is incorrect, and different non-zero values can identify different errors.

Supplement: the shell command echo $? prints the exit code of the most recently executed process.

So what is the significance of the return value of the main() function?

When a shell script runs a program, it can check the program's exit code (the value main returned) to decide whether the program succeeded: 0 means success, and any other value means failure or an error. The script can then act on that code, which only becomes available once the program has finished, for example by printing a message or handling the error. Inside a C program, strerror() turns an errno value into a readable description of the error. (from chatgpt)

(c) The code does not run to completion because the program crashes. (This is explained in the signals section.)

2. The exit() function

Calling exit() anywhere in the program terminates the process. There is also a system-level interface, _exit(): _exit() is a system call, while exit() is a C library function.

Next let's experiment with the difference between the two:

Line buffering: when stdout is a terminal it is line-buffered, so output without a newline stays in the buffer and is flushed to the screen only when the buffer fills, when fflush is called, or when the process exits normally.

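    #include <iostream>
    #include <cstdlib>   // exit
    #include <unistd.h>  // sleep, _exit
    using namespace std;

    int main()
    {
      cout << "lisan";    // no newline: the text stays in the buffer
      sleep(3);
      exit(11);           // replace with _exit(11) to compare
      return 0;
    }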

Try both functions: with exit(11), lisan is printed when the process exits; with _exit(11), it is never printed. The following diagram shows why.

_exit() terminates the process immediately without flushing the data left in the buffer, whereas exit() first flushes the stdio buffers (and runs cleanup handlers) and then terminates the process via _exit(). So where does the buffer live? _exit() is an operating system interface and exit() is a library function; since only the library function flushes it, the buffer must be managed by the C standard library in user space, not by the operating system.

3. Process waiting

Why does a parent need to wait for its child? Often the parent creates a child to do a piece of work and needs the child's result before it can take the next step. Also, if the child exits and the parent never waits for it, the child becomes a zombie process, which leaks memory. (If instead the parent exits first, the child becomes an orphan; this case is mentioned below.)

In short:

As mentioned before, if the child exits and the parent ignores it, the child becomes a 'zombie process', which in turn causes a memory leak.
In addition, once a process is a zombie it cannot be killed: even the "kill without blinking" kill -9 has no effect, because no one can kill a process that is already dead.
Finally, the parent needs to know how well the child carried out the task it was given: whether the child finished, whether its result was correct, and whether it exited normally.
The parent process recycles the resources of the child process and obtains the exit information of the child process through process waiting.

Run the following program:

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/types.h>

    int main()
    {
      pid_t pd = fork();
      if (pd < 0)
      {
        // fork failed
        perror("fork ");
      }else if(pd == 0)
      {
        // child process: print 5 times, then return from main and exit
        int a = 5;
        while(a--)
        {
          printf("child: getpid:%d, getppid:%d\n", getpid(), getppid());
          sleep(1);
        }
      }else{
        // parent process: loops forever and never waits,
        // so the exited child remains a zombie (Z state in ps)
        while(1)
        {
          printf("parent: getpid:%d, getppid:%d\n", getpid(), getppid());
          sleep(1);
        }
      }
      return 0;
    }

So how does the parent reap the child? (If the parent exits first, the child becomes an orphan: it is adopted, typically by the init process (PID 1), which reaps it. Orphan processes will be covered later.)

1. Ways to reap a child process

(1) The wait function

#include<sys/types.h>

#include<sys/wait.h>

pid_t wait(int* status);

Return value: on success, the PID of the child that was reaped; -1 on failure.

Parameter: status is an output parameter that receives the child's exit status. Pass NULL if you don't care about it.

Additional understanding: the difference between zombie processes and memory leaks

Once a child becomes a zombie, its code and data are released, but its PCB (task_struct, a kernel data structure) is retained. If the parent never reaps it, that memory is leaked. In the programs we write, memory obtained from the heap with new or malloc must also be released after use, otherwise it leaks. The difference:

The zombie case is at the operating system level, while the heap case is inside a process. When a process exits, the system reclaims all of its memory, so a heap leak does not outlive the process. A zombie's PCB, however, is only reclaimed when someone waits for it, so a long-running parent that never waits leaks it for good.

(2) The waitpid function

pid_t waitpid(pid_t pid, int* status, int options);

Return value:

On success, waitpid returns the PID of the child it collected;

If the WNOHANG option is set and no child has exited yet at the time of the call, it returns 0;

If an error occurs, it returns -1 and sets errno to indicate the error.

a. Parameter pid

pid == -1: wait for any child process, equivalent to wait.

pid > 0: wait for the child whose process ID equals pid.

A bit more about status: it records how the child ended. Recall that a program can end in three ways: it runs to completion with a correct result, runs to completion with an incorrect result, or crashes before finishing.

How does status express these different cases?

b. Parameter status

Both wait and waitpid take a status parameter. It is an output parameter, filled in by the operating system.

If NULL is passed, the parent does not care about the child's exit status. Otherwise, the operating system writes the child's exit information into it. status should not be treated as a plain integer but as a bitmap; only its low 16 bits are used:

status (32-bit int, only the low 16 bits matter)
  normal exit:      bits 15..8 = exit code,  bits 7..0 = 0
  killed by signal: bits 15..8 unused,       bit 7 = core dump flag, bits 6..0 = signal number

So how do we get the exit code?

(status >> 8) & 0xFF // 0xFF = 0000 ... 1111 1111 keeps only the low 8 bits

That is the normal case. What about abnormal termination? A process that terminates abnormally was actually killed by a signal, sent either by the operating system (for example, after an invalid memory access) or by another process. Once a process exits abnormally, its exit code is meaningless.

So how do we get this signal?

status & 0x7F // 0x7F = 111 1111 keeps only the low 7 bits: the signal number

A process can end abnormally not only because of a bug in its own code but also for external reasons: for example, if it is killed with kill -9, the signal number is 9 (SIGKILL).

Still, you need to know the layout of status and do the bit operations yourself. That is fine for understanding, but inconvenient in practice, so the following macros are provided:

Commonly used macros for the exit status (recommended)

WIFEXITED(status): true if the child terminated normally. (Checks whether the process exited normally.)

WEXITSTATUS(status): if WIFEXITED is true, extracts the child's exit code. (Gets the process's exit code.)

Replenish:

c. Parameter options

options defaults to 0, which means the parent blocks while the child is still running. WNOHANG is a macro that makes the call non-blocking. (About the name: "hang" is jargon for a process that appears stuck because it is sitting in a wait queue or waiting to be scheduled. NOHANG therefore means the wait does not block.)

Linux is written in C, and waiting is ultimately implemented by a function inside the kernel. We can understand it through the following pseudocode:

Does non-blocking waiting mean the parent does not wait for the child? No. Non-blocking waiting is a polling scheme built on non-blocking calls. In everyday terms: I ask Zhang San for help, and he says he is busy. So I do my own work first and call him every minute to see whether he has finished.

4. Process replacement

1. Concepts and principles

After fork, the child runs the same program as the parent (though it may take a different code branch). Often the child then calls one of the exec functions to run another program. When a process calls exec, its user-space code and data are completely replaced by those of the new program, and execution starts at the new program's startup routine. exec does not create a new process, so the process ID is the same before and after the call.

2. Process replacement method

Method: the execl function

Let's consult the man page (man 3 exec).

Today we start with the simplest one, execl.

int execl(const char* path, const char* arg, ...); // the arguments are written just as you would type them on the command line

path: the path of the target program (directory + file name)

arg: the first argument, argv[0], by convention the program name

... : a variable argument list for the remaining arguments. Note: the list must end with NULL, which marks where argument extraction stops.

Here are examples:

From the above observations:

1. After the replacement, "Process End" is not printed. This shows that once execl succeeds, all code and data of the original program are replaced by the new program, so the code after the execl call never runs.

2. If execl fails, it returns -1 and the original program keeps running; at that point you would normally terminate the process directly.

3. Try fork + execl function

Look at the code below:

    #include <iostream>
    #include <cstdlib>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    using namespace std;

    int main()
    {
       pid_t pd = fork();
       if (pd == 0)
       {
         // child process
         cout << "child starts, pid:" << getpid() << endl;
         execl("/usr/bin/ls", "ls", "-l", "-a", "--color=auto", (char*)NULL);
         exit(-1); // only reached if execl fails
       }
       else if (pd > 0)
       {
         // parent process
         int status = 0;
         cout << "parent starts" << endl;
         pid_t ret = waitpid(-1, &status, 0);
         if (ret > 0 && WIFEXITED(status))
         {
           cout << "child exited, exit code: " << WEXITSTATUS(status) << endl;
         }
         else
         {
           cout << "child did not exit normally" << endl;
         }
       }
       else
       {
         cout << "failed to create child process" << endl;
       }
       return 0;
    }

result:

Q: Why create a child process to do the replacement?

A: So that the parent can keep reading and analyzing data while assigning a specific task to a child; the replacement affects only the child, not the parent.

Q: Parent and child share code, and data is copied on write. What happens when the child calls execl to replace itself? Is the code copied on write too?

A: In effect the code is separated as well, so the parent is not affected. But exec does not copy the old code: it discards the child's mappings of the shared pages and maps the new program's code and data into the child instead, so the pages the parent uses are never modified.

Other exec functions:

There are several process replacement interfaces:

1. execv: like execl, but the arguments are passed as a NULL-terminated array of strings instead of a list. Its flow is shown in the following diagram:

2. execlp: like execl, but the program is looked up in PATH, so a bare name such as "ls" is enough.

3. execvp: very convenient. The arguments are stored in an array (v, for vector), and p means the file path can be omitted because the PATH environment variable is searched automatically.

4. execle: "e" stands for environment variables. The new program receives the environment variables you pass, so you can use them to influence its behavior or to hand it configuration it needs.

Here is an example showing how to pass environment variables using the execle function:

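#include <stdio.h>
#include <unistd.h>

int main() {
    // each entry is "NAME=value"; the array must end with NULL
    char *envp[] = {"MYVAR=Hello", "OTHERVAR=World", NULL};
    // /path/to/program is a placeholder; the argument list still ends with NULL
    execle("/path/to/program", "/path/to/program", (char *)NULL, envp);
    perror("execle");   // only reached if execle fails
    return 1;
}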

In this example, we define two environment variables, MYVAR and OTHERVAR, and pass them to the new program, which can read them with getenv.

Note that the envp array passed to execle becomes the new program's entire environment: variables that are not in it (such as PATH or HOME) are not inherited. To hand over the current environment, pass the global variable environ (declared as extern char **environ). execve is the actual system call; the other exec* functions are library wrappers around it.

Also note: the environment is not part of the code and data that exec replaces. The exec functions without "e" (execl, execv, execlp, execvp) pass the calling process's current environment, which the child inherited from its parent, on to the new program.

Naming summary:

These function names look easy to confuse, but they are easy to remember once you know the rules:

l (list): the arguments are passed as a list

v (vector): the arguments are passed in an array

p (path): functions with p search the PATH environment variable automatically

e (env): you supply the environment variables yourself

4. How to use the execl function to run other executable programs

Not only system commands: below, my Test program calls my own program, mypro.

Makefile: builds several programs in one go.

The program on the far right uses command-line parameters; see the command-line parameters section of [Linux] Basic Concepts of Processes [Part 2] (CSDN Blog).

This shows what the exec* functions really are: the interface to the underlying loader, which loads a program into a process.

5. Exercise - Make a simple shell command interpreter

Target:

Make a shell that can read and execute instructions.

shell execution command:

1. Making the framework:

We need to create an infinite loop to continuously receive instructions.

    // an infinite loop
    while (1)
    {
      // first print the prompt
      cout << "[afeng@_myshell]$ ";
      fflush(stdout); // the prompt has no newline, so flush it by hand
    }

We print a shell prompt, but it must not end with a newline. Without a newline the prompt stays in the buffer, so we flush it with fflush.

2. Receive and process instructions

We cannot use cin >> or scanf("%s") here, because commands contain spaces and these stop reading at the first space. Instead we use functions that also read spaces, such as getline or fgets. We first read the whole command line into a character buffer. Since this is a simple shell, it runs each command as a separate program, which means process replacement. (Note that only the child is replaced, just to start the other program; the parent, the shell itself, is not modified.) To pass the command to an exec* function, we then split it into words stored in an array of pointers.

    // read a command line
    char instruct[NUM];
    memset(instruct, '\0', sizeof instruct);
    if (fgets(instruct, sizeof instruct, stdin) == NULL)
    {
      break; // end of input (Ctrl-D): leave the loop
    }
    // Enter is stored as '\n' at the end of the line, so strip it
    instruct[strcspn(instruct, "\n")] = '\0';

    // split the command into words
    char* argv[100] = {0};
    argv[0] = strtok(instruct, " ");
    if (argv[0] == NULL) // empty line: prompt again
    {
      continue;
    }
    int i = 1;
    while ((argv[i++] = strtok(NULL, " "))) // extra parentheses: the assignment is intended
      ;

3. Child process replacement, parent process waits

Next we write the child and parent branches and replace the child. We can run commands in Linux without typing their path because their directories are listed in the PATH environment variable, and execvp searches PATH automatically.

    // Built-in commands: ordinary commands run in a child via exec, so the parent is not affected.
    // But cd must change the shell's own working directory, so the parent handles it itself.
    if (strcmp(argv[0], "cd") == 0)
    {
        if (argv[1] != NULL && chdir(argv[1]) != 0)
          perror("cd");
        continue;
    }

    pid_t pd = fork();
    if (pd < 0) // fork failed
    {
      perror("fork");
      continue;
    }
    if (pd == 0) // child
    {
      execvp(argv[0], argv);
      exit(-1); // only reached if execvp fails
    }
    else
    {
      // parent
      int status = 0;
      pid_t ret = waitpid(pd, &status, 0);
      if (ret > 0 && WIFEXITED(status))
      {
        cout << "child exited, exit code: " << WEXITSTATUS(status) << endl;
      }
      else
      {
        cout << "child did not exit normally" << endl;
      }
    }
  }
  return 0;
}

Conclusion

That's it for this section. Thank you for reading! If you have any suggestions, please leave them in the comments. If this helped you, please leave a like; your likes and follows are my main motivation to keep writing.