Linux | Process

Table of contents

Preface

1. What is process

1. Process under Window

2. Deeply understand the process 

3. Meet the processes under Linux

2. Creation of process

1. Initial fork 

2. The return value of fork

3. Deeply understand the fork function

4. Remaining issues 

3. Process status

1. Process status in the operating system

2. Process status in Linux

3. Status demonstration

(1) Running and waiting state demonstration 

(2) Stop state demonstration 

(3) Zombie status demonstration

4. Orphan process

4. Process priority

1. What is process priority?

2. Why is there process priority?

3. How to modify the priority

5. Other concepts

1. Competitiveness

2. Independence

3. Concurrency and parallelism

4. Process switching

6. Environment variables

1. What are environment variables?

2. How to obtain environment variables

(1) Command line acquisition

(2) Main function parameter acquisition

(3) getenv gets environment variables (most commonly used)

3. How to modify our environment variables

(1) Modify environment variables on the command line

(2) Code to modify environment variables

4. Global nature of environment variables

(1) We use the command line to add a special environment variable to our bash process

(2) Let’s write another piece of code to check this special environment variable

(3) Experimental principle


Preface

        This article mainly explains the process and related knowledge. It is the editor’s knowledge summary of this part of the study. This article is designed to be easy to understand and improve your interest in learning system knowledge. If there are any errors, please contact the editor in time;

1. What is process

Process: is the execution process of a program on a certain data set, and is the basic unit of resource allocation;

        The above are concepts from books. If we understand the process in this way, I think it is a bit general and not particularly profound. Next, we will understand the process from the perspective of the operating system;

1. Process under Window

        Have you ever used Task Manager? Let's click on the process page, as shown above. I use CCtalk as a process, the pft editor is a process, and Google Chrome is also a process; for example, our Google Chrome, we have this software on the desktop. For the shortcut, we right-click and view the properties. We can see that we are actually installed in the path shown below, and we find this directory;

        Sure enough, we found the file in the path, and we can run the software by double-clicking it. In fact, we load the executable program into the memory before it can be run, because in order for the executable program to run, it must first Loaded into memory, this is determined by the von Neumann architecture. If you still don’t understand this explanation, you can click on the link below to view the following article;

Understanding von Neumann in simple terms

        The CPU only deals with our memory, so after the program is loaded into the memory, can the CPU take the executable program from the memory and execute it? Before answering this question, we have to answer the following question;

2. Deeply understand the process 

Question 1:Can we only have one executable program in the computer memory?

        The answer is definitely no. We can answer it without even thinking about it. If there is only one program, then wouldn’t we be able to listen to music and play games at the same time? And our computer can have a large number of executable programs stored in the memory waiting to be executed at the same time;

Question 2:Can our computer execute multiple executable programs on one CPU at the same time?

        The answer is also no. In our computer, at a certain moment, a certain CPU can only execute a certain program. It just executes multiple programs in a certain period of time through time slice rotation. This is also concept;Concurrency

Question 3:Since there are a large number of executable programs in the memory and our CPU resources are limited, our operating system must manage the execution of these executable programs. So how does our operating system manage these executable programs?

        The answer is to describe first and then organize. We first describe the executable program loaded into memory. The description here can be understood as using a structure to record information about this program, such as identifier, status, priority, program counter, memory pointer, etc., and then These structure objects are organized through some kind of data structure, such as linked list, sequence list, etc.; among them, the structure we describe is called PCB control block. In Linux, this structure is called task_struct;

        Our process is composed of the PCB control block and the code and data of the executable program. The concept mentions that the process is the execution process of a certain data set. The execution process here is a certain moment, because the data in the PCB will change. For example, the address of the next execution instruction is recorded, and it will be updated every time the current instruction is executed;

        At this point, we can answer the questions we left in the first summary. After our executable program is loaded into the memory, we must first create a corresponding PCB control block. We record this through this PCB control block. segment executable program information, and then use a queue to maintain these PCB control blocks to queue up to obtain CPU resources. We call this queue ready queue;

3. Meet the processes under Linux

        Earlier we introduced the basic concept of process and developed a deep understanding of it. Now let's take a look at the so-called process. Before that, we first learn the first attribute of the process, its identifier. This identifier is An integer number that is unique on the current machine, we call it pid. We can get the pid of the current process through the getpid system call; by the way, we introduce another system call interface---getppid, to get the pid of the parent process of the current process. pid;

        We query manual No. 2 through man to query the above system call-related information. As shown in the figure above, we need to include two header files. We can write the following code;

        We can view the information of all processes through the ps -axj command, and use the grep statement to filter out the current process information; it is not difficult to see that the pid of the process we are running is 13354;

        In fact, a directory named 13354 will be automatically generated in our /proc directory to record the current process information; as shown in the figure below;

        For example, the following two files, CWD is referred to as Current Work Directory, the current working directory is the directory of the executable program for the current process. The EXE file is the path of the executable program;

2. Creation of process

1. Initial fork 

        Under Linux, if we want to create a process in our own program, we must use the fork function to complete the task. Let's introduce the use of the fork function;

        The above is the result of our query through the man command. Before we use it, we need to add the header file unistd.h, and the function has no parameters and has a pid return value. Let us ignore these and directly call the function to see the effect. We run the following code ;

        We compile and run, and the result is as shown in the figure below;

        We were surprised to find that our exit function was printed twice; yes, the other one was the result printed by the subprocess we created;

2. The return value of fork

        We continue to query the use of our fork function through the man command, as shown in the figure below;

        Roughly translated, if the function call is successful, the pid of the child process will be returned to the parent process, and 0 will be returned to the child process; if the function call fails, -1 will be returned to the parent process, the child process will not be created, and the error code will be set up;

        So it seems that if our function call is successful, two return values ​​will be returned? From the perspective of C language and C++, this is impossible. Our grammar stipulates that only one return value can be returned, but this function has two return values. This is awesome; we will not introduce the specific principle yet. , let’s practice it directly to see if it’s as we expected;

        The running results are shown in the figure below;

        We have split the execution flow through if, and we have executed two branches in the if statement at the same time; what we see now is the phenomenon of two execution flows after the fork function, and we use the if statement to make it look at the same code. , and execute different statements, this is our preliminary understanding;

Question:Why is the pid of the child process returned to the parent process and 0 to the child process, but not the other way around?

        First of all, we must clearly understand that a child process can only have one parent process, and a parent process can have multiple child processes. So as a parent process, do we need to manage the child processes? So how do we manage it? If we receive 0, the parent process cannot distinguish between multiple child processes.

3. Deeply understand the fork function

        How does the fork function return two return values? Can the return statement only return one value? Next, let’s explore this issue together;

pid_t fork()
{
    // 1、子进程的创建
    ......


    // 2、返回返回值
    return pid;
}

        Although we don’t know how the fork function creates a child process, we can know that the fork function does two specific things. One is to create a child process, and the other is to return a return value. These two steps are required, so here comes the question. ;

Question:After the first step is executed, is the main logic of the fork function completed? ?

        The answer is undoubtedly yes. After we finish creating the child process, there are already two execution streams! ! ! The phenomenon we saw before is that there are two execution streams after the fork function is executed. This understanding is wrong! ! Since there are two execution streams after the child process is created in our fork function, the code after creation should be shared! So these two execution flows will return two values! ! Therefore, the fork function has two return values;

4. Remaining issues 

        We found that in our previous code, we used id to receive the fork return value, and id actually had two values. Here we suspected that the two ids were not the same id variable, so we tried to print the addresses of the two ids. The code as follows;

        We run the above code and the results are as follows;

        A magical scene happened. We found that the two id values ​​were the same, as we expected, and their addresses were also the same. This was completely contrary to what we had learned before. Regarding this part, we must first Understand the address space of the process, so we will explain it in a later article;

Linux | Process Address Space-CSDN Blog

3. Process status

        The life cycle of a process can be divided into a set of states, which describe the entire process; this is a concept that appears in textbooks;

1. Process status in the operating system

        In the operating system course, we divide the process status into five states, and there are also seven states. Among them, two suspended states are added to the original five states; as shown in the figure below;

        I believe you have often seen the picture above. The seven ovals in the picture above are the seven states of the process, and each arrow above is the switching between states. Let me use this to analyze the mutual conversion between the following states.

NULL->New state:After the program is loaded into the memory, the relevant PCB control blocks, etc. are created;

New state->Ready state:After the process is created, it will enter the ready state queue;

Ready state->Running state:The process at the head of the team will be scheduled, enter the CPU to prepare for running, and enter the running state;

Ready state->Suspended ready state:Due to tight memory, the ready process is moved to the swap area of ​​the disk for temporary storage;

Running state->Ready state:Because the time slice is up, the CPU is forced to give up and re-enters the ready queue;

Running state->Blocking state:Due to an IO event encountered during operation, the CPU actively gives up and enters the blocking state;

Running state->Termination state:Exit the program and enter the termination state when it reaches the end of the program;

Running state->Suspend ready state:The suspension event of a higher priority program ends and the CPU needs to be preempted. At this time, the memory happens to be tight. A process in the running state may directly enter the suspended ready state from the running state;

Blocking state->Ready state:The IO events required by the process have been completed. At this time, the blocking state will enter the waiting queue of the ready state for queuing;

Blocked state->Suspend blocked state: Due to memory shortage, the operating system suspends the blocked process in the memory to the swap area in the disk to relieve the memory Nervous question;

Suspend blocking state->Blocking state:When the memory shortage problem is alleviated and the priority of our process in the blocking and suspending state is higher, the operating system may This process will be put into memory and wait for IO processing;

Suspend blocking state ->Suspend ready state:Waiting for the occurrence of IO events does not actually require transferring code and data into memory, but when we wait for the event to be satisfied Finally, we will transition from the suspended blocking state to the suspended ready state;

Pending ready state ->Ready state:When there is no process in the ready queue or the priority of the pending ready state process is higher than the priority of the ready state process, the operating system Will convert the process from suspended ready state to ready state;

2. Process status in Linux

        It is said that the operating system is a philosophy in the computer world and the theoretical knowledge of computers. In Linux, the process state is not designed as mentioned above. In the implementation of operating system code, there may be some subtle differences from the above theoretical knowledge; in total There are the following 5 states;

R:Running state, the running state here includes the state of waiting in the ready queue, we collectively call it the running state;

S:Sleep state, the sleep state here refers to interruptible sleep, which is waiting for the completion of IO events;

D:Disk sleep state. The sleep state here refers to uninterruptible sleep. In this state, it usually waits for the disk IO to be completed;

T:Stop state, we can usually send the SIGSTOP signal to stop the process, or we can send the SIGCONT signal to continue running;

t:Stop state, the stop state here is generally to stop the program during the debugging phase;

X:Death state, this state generally refers to a state after the process has finished running;

Z: Zombie state. The zombie state refers to the fact that after the current process exits, the parent process does not wait for it. At this time, the process will enter the zombie state after running. If the parent process does not recycle the child process, the child process will always be in a zombie state;

3. Status demonstration

        Next, I will show the following part of the status. First, we wrote the following code;

(1) Running and waiting state demonstration 

        ​ ​ ​ Next we monitor the status of this process through the ps command. The specific shell script is written as follows;


while :; do ps -axj | head -1 && ps -axj | grep test | grep -v grep; sleep 1; echo "----------------"; done;

        We first run the code, which will continuously print hello world. Then we open an ssh session and enter our shell script. The result is as shown in the figure below;

        We found that most of them are in the waiting state (S), and only a small part are in the running state (R). Why is this? Isn’t our code constantly printing? Shouldn't it be always running?

        In fact, in our above code, most of the events are in IO, because the time consumed by IO is much greater than the running time, and IO generally waits for the completion of IO events, so most of our process is in a waiting state;

Replenish:

        In the above state, what we see is R+ and S+. This plus sign means that the process is a foreground process. The so-called foreground process means that we run it in the foreground mode on the command line. At this time, we cannot continue to enter commands until the program The run is over, and if we run it in the background, we can still enter commands on the command line and it will respond; (adding & after the runtime means running the program as a background process)

(2) Stop state demonstration 

        ​​​​​ Let’s demonstrate the stop state again; we can use kill -l to see what signals there are, as shown in the figure below;

        We use signal No. 19 (pause) and signal No. 18 (continue) to test our T status. First, we run the program and monitor the status, and then send signal No. 19, as shown below;

        After we send signal No. 19 to the specified process, our program stops and the process state is converted from S+ to T state; then we try to send signal No. 18 again;

        After we send signal No. 18, it is not difficult to find that our program is running and the status has changed accordingly. Here is a detail. The program that is re-run here is no longer a foreground program, but a background program! This also means that we can enter commands. For example, if we enter to send signal No. 9 to terminate the process, you can try it;

        Because the screen swipe was too fast, the command I entered was swiped up, but the command I entered could be executed and the process was successfully killed;

        Let’s try the program stop caused by the gdb breakpoint, as shown in the figure below, we use gdb for debugging, after encountering the breakpoint, the status becomes t;

(3) Zombie status demonstration

        First, we write the following code, mainly to let the child process run for 3 seconds and then exit, and the parent process keeps executing;

        Run the code and observe its running status diagram, as shown below;

        We found that in the first three seconds, when the child process did not exit, it was in the S state. After 3 seconds, the child process exited and the parent process did not recycle it. At this time, it became a zombie state;

        The zombie process will cause resource leakage caused by the long-term non-release of resources. We must let the parent process recycle resources, which will be introduced later;

4. Orphan process

        We discussed earlier that if the child process exits while the parent process is still running and does not recycle the child process, the child process will be called a zombie process. So what happens when the parent process exits and the child process is still running? Yes, at this time the child process will become an orphan process. The orphan process will be adopted by process No. 1, and of course it will be recycled by process No. 1. We modify the above code and we can get the following code;

        We found that after running for 3 seconds, the parent process exited and the child process was managed by process No. 1;

4. Process priority

1. What is process priority?

        The so-called process priority is the order in which CPU resources are allocated to processes. Processes with higher priority get CPU resources first, and vice versa; they lag behind other processes in getting CPU resources;

2. Why is there process priority?

        We all know that there may be a large number of processes on the computer, and our CPU resources are limited. Because of the large number of processes and a small amount of CPU resources, competition for resources occurs, so there is the concept of priority;

3. How to modify the priority

New priority = old priority + nice value

We can check the process priority through the ps -al command; as shown in the figure below;

UID: the identity of the executor;

PID: process pid;

PPID: pid of the parent process;

PRI: old priority (initially always 80);

NI: nice value (the old priority [80] plus the nice value is the new priority, the value range is -20 ~ 19);

        We can modify the priority through the top command; for example, we modify the process test priority to 89;

1. Enter the top command

2. Enter r and enter the pid value of the process to be modified.

3. Enter the nice value to be modified (because you want to change the priority value to 89, enter 10)

4. Check the priority again

        As we expected, the priority value was modified to 89; similarly, we can enter a negative number to make the priority value smaller, but it is modified on the basis of 80;

Notice:

1. The smaller the priority value, the higher the priority, and vice versa;

2. The nice value range is -20 ~ 19, so the priority value range is 60 ~ 99;

5. Other concepts

1. Competitiveness

Competition:Due to the limited number of CPUs, there is competition between processes;

2. Independence

Independence:Each process is independent of each other and does not hinder each other's execution, so the process hasindependence ;

3. Concurrency and parallelism

Parallel:Multiple CPUsAt the same time executing multiple processes is called Parallel;

Concurrency:A CPUa period of timeexecuting multiple processes is called Concurrency;

4. Process switching

        Most of our systems now use a time-sharing system; each process is executed in the CPU for a period of time and then exits, usually this period of time is very short; so how do we ensure that the next time we switch to the program, it will end from the last run? What about the next operation of the location?

        Our CPU is usually equipped with a set of registers. These registers are used to save temporary data generated when running the program. If we save the data in the register when the program leaves, we can directly use the last saved data the next time it runs. Resume the operation, and then you can continue running with the results of the last run;

6. Environment variables

1. What are environment variables?

        Environment variables refer to some variables in the operating system that are used to specify the operating environment of the operating system; this is a basic concept, so you may still be a little confused. Next, we will introduce our environment with a question. variable;

Question:We all know that the commands we usually run under Linux, such as ls, etc., are all executable programs written in C language, so why do we not call the system commands? A path is required, and when we call the executable program we wrote, we need to specify the path?

        Actually, this is because we added the path of the ls command to the environment variable PATH, so we can run the system command without specifying the path; we can passecho $PATH View the PATH variable;

        We checked the path of the ls command, and then we checked the environment variable PATH, and we found that there is also the path of ls; in fact, when we execute the command, we will first check whether there is an ls command in all paths of PATH, and if so, execute it. , if not, an error will be reported and the specified path cannot be found;

2. How to obtain environment variables

(1) Command line acquisition

        We can directly enter env on the command line to obtain the environment variables of the current process (usually bash);

        We can see that our environment variables exist in the form of key-value pairs. The key is on the left side of the equal sign, and the value is on the right side of the equal sign. Here is a brief introduction to several environment variables;

PATH:Specify command search path

HOME:The current user’s home working directory

SHELL:SHELL path diameter

USER:Current use

PWD:Current path

(2) Main function parameter acquisition

        In fact, our main function has three parameters. The first parameter is the number of command line parameters, the second parameter is the command line parameter string array, and the third parameter is the environment variable string array. Here we mainly explain the third parameter. The three parameters are as follows;

        Our environment variable array always ends with NULL, so we can use traversal to determine whether it reaches NULL. If it reaches NULL, it will stop. At this time, we can take out all the environment variables;

        We can also use the global variable environ to obtain our environment variables. The global variable environ is in the header file unistd.h. The following is the man manual query result;

        Therefore, the above code can also be changed to this;

        We still output all environment variables;

(3) getenv gets environment variables (most commonly used)

        We can directly obtain the value of an environment variable through the environment variable; we usually use this method, the following are the man manual query results;

        Therefore, we can query the PATH environment variable like this, as shown in the following code;

        We run the above code and the result is as shown in the figure below;

3. How to modify our environment variables

(1) Modify environment variables on the command line

        We have just viewed our environment variables through the command line through echo. Similarly, we can also modify our environment variables through the command line. We modify the environment variables through export; suppose that we want to make what we just The written executable program test can also be run without a path. We can do the following operations;

        At this point, we check our environment variables again. At this time, our environment variables contain our current path;

        Similarly, at this time we can execute our myproc executable program without a path;

Note:The environment variables we modified here will be reset the next time we log in. Unless we change the relevant configuration files, the changes will only be effective this time; 

(2) Code to modify environment variables

        We can modify environment variables through the function putenv. First, we introduce putenv, which is declared as follows;

int putenv(char* str);

        The str parameter is an environment variable key-value pair, that is, a string of the type key=value. If our key already exists in the original environment variable, change the value directly. If it does not exist, add this new environment variable; About Return value, if 0 is returned, the call is successful, if non-0 is returned, the call fails;

        We can type the above code to add environment variables; the results after running are as follows;

        Some friends found that if we use env, we have not added this environment variable; this is because our environment variable settings are only set in the current process, and our command line is in the bash process, and our newly run program When it is a sub-process of bash, the way to modify the environment variables on the command line is to modify the environment variables of the bash process;

4. Global nature of environment variables

        Are our environment variables global? The so-called globality is usually reflected in the fact that it will be inherited by child processes. We can verify this conjecture through the following experiments;

Experiment: 

(1) We use the command line to add a special environment variable to our bash process

(2) Let’s write another piece of code to check this special environment variable

        We ran the above code and sure enough we found this environment variable;

(3) Experimental principle

        We modified the environment variables of the Bash process, and the programs we wrote are all run under Bash, so the executable programs we run are all sub-processes of Bash; if we find them in the sub-process, we add them in the Bash process special environment variables, indicating that the environment variables will be inherited to the child process;

Guess you like

Origin blog.csdn.net/Nice_W/article/details/133977895