[Linux] The concept of a process and basic process operations

1. What is a process

Background:

From the von Neumann architecture we know that a program must be loaded from disk into memory before it can run; the CPU then fetches its instructions from memory and executes them.

With this background in mind, we arrive at a first understanding of a process:

  • A process is a program that has been loaded into memory; in other words, a running program is called a process.

Many textbooks put it this way, but the statement is incomplete and somewhat one-sided.

Once a computer is up and running, many programs are active at the same time, both the software we open ourselves and the programs running in the background. If the operating system managed these programs directly, the burden would be heavy. Instead, the operating system describes each program by its attributes, packages those attributes into an object (more abstractly, a variable), and manages only these objects, which is much easier.

For example, in a school, each student has his or her own information, such as name, age, gender, and grades in each subject. The students here correspond to the programs in the operating system. To manage them, we put the student attributes (name, age, and so on) into a structure, create one object of that structure per student, and then manage the objects. The concept of a process comes from the same idea.

The program we write becomes an executable after compiling and linking. That executable is essentially a file stored on disk. Double-clicking it in Windows, or running it with ./ in Linux, loads it into memory.

After a program is loaded into memory, the operating system must manage it. It abstracts the attributes of each program into a structure, creates one structure object per process, and finally organizes these objects with an efficient data structure (Linux uses a doubly linked list).

When the code of a particular process is needed, it can be located in memory through the corresponding structure object.

In this way, every application runs as a process, and each process has its own independent address space, so the addresses of different processes are isolated from each other. The CPU is allocated by the operating system, and each process gets its turn on the CPU according to its priority.

So it is not accurate to say that a process is just a program that is running or about to run. A better definition is:

Process = kernel data structures related to the process + the process's code and data in memory

The data structure that the operating system uses to describe and organize a process (in the abstract, a struct) is called the process control block (PCB).

2. Describe the process - PCB

From the above we know that each process is described by a specific structure, the process control block, which can be understood as the collection of a process's attributes.

Textbooks call the process control block the PCB (process control block), and different operating systems give their PCB different names. In Linux, the PCB is task_struct.

That is to say, after a program is loaded into memory, Linux creates a task_struct object for it, associates that object with the code and data loaded into memory, and links all task_struct objects with pointers into a doubly linked list (at this point the process is managed by the operating system). Managing processes therefore comes down to inserting, deleting, looking up, and modifying nodes of that list. (This is a simplified picture; the remaining details are explained later.)

The PCB describes a process. In C++ we would describe a thing with a class; in C we use a struct. Linux is written in C, so it implements its process control block as the struct task_struct.

  • task_struct is a data structure of the Linux kernel. It resides in memory (RAM) and contains the information about a process.

With that understood, let's look at the kinds of process attributes that task_struct contains:

task_struct content classification

  • Identifier: the unique identifier of the process, used to distinguish it from other processes.
  • Status: task status, exit code, exit signal, etc.
  • Priority: priority relative to other processes.
  • Program counter: the address of the next instruction to be executed in the program.
  • Memory pointers: pointers to the program code and process-related data, as well as pointers to memory blocks shared with other processes.
  • Context data: the data in the processor's registers while the process is executing.
  • I/O status information: outstanding I/O requests, the I/O devices allocated to the process, and the list of files used by the process.
  • Accounting information: may include total processor time, total clock time used, time limits, account numbers, etc.
  • Other information.

Abstracted, it can be represented by the following structure (taking Linux as an example):

struct task_struct {
    // all attributes of the process
    ....
    // addresses of the code and data that belong to the process
    ....
    // address of the next process
    struct task_struct* next;
};

For a detailed description of the task_struct structure, the PCB of a Linux process, see the blog post "Process control block PCB: task_struct structure in Linux".

Knowing this, we can state the concept of a process more concretely:

Process = kernel data structures related to the process + the process's code and data in memory

Summary:

  1. What is a PCB?

    It is a struct that contains most of the attributes of a process.

  2. Why use a PCB in process management?

    So that the operating system can manage processes by managing their PCBs.

3. The specific operation of the process

Now that we know what a process is, let's look at how to work with processes in Linux.

3.1 The relationship between process attributes and file attributes

We know that on disk, file = content + attributes.

When a file is loaded from disk into memory, is it the content of the file or its attributes that get loaded?

Answer: content

We know that each process has its own set of attributes, the PCB. So are the attributes of the file on disk related to the attributes of the process?

Answer: Yes, but only loosely.

On disk, the attributes of a file generally include its permissions, owner, group, creation and modification times, file name, and so on. They determine whether a user may operate on the file and record basic information about it.

The PCB is a kernel data structure that the operating system creates and maintains dynamically, and it is largely independent of the attributes stored on disk. In other words, the attributes of a process are created and maintained by the operating system itself for the code and data it has loaded, and they differ from the file attributes on disk.

For example, a file has no PID (process ID), while the operating system can still find out the file name or file size of the program behind a given process if it wants to. That is why we say the two are related, but only loosely.

3.2 View process

Preparation

  • We will use vim, gcc, and a makefile. If you are not familiar with them, see the corresponding blog posts.

We create a new file named myprocess.c in Linux and write the following code:

[Figure: source code of myprocess.c]

What the code does: once started, the program enters an infinite loop and keeps printing "hello process". While it is running, the corresponding process must exist, so we can look it up.

So we can draw the following conclusions:

  1. In Windows, double-clicking (opening) a program runs it as a process.
  2. In Linux, running an executable with ./ turns the program into a process.

The following figure shows the makefile:

[Figure: makefile that builds MyProcess from myprocess.c]

The makefile builds the code in myprocess.c into an executable named MyProcess. From here on we run the MyProcess executable directly.

Tips

A process exists only while its program is running, so we have to observe the process information while the program runs. This requires several terminal windows working together. I use Xshell; if you do too, you can open additional windows as follows.

  1. Right-click the current session tab and choose Copy Session

  2. The copied session is equivalent to the current one: it is in the same directory and on the same operating system, so changes made in one are visible in the other

  3. Drag the copied session to a suitable location

    • Copying a session creates a new bash process, which runs on the same operating system as the original bash process (discussed later)
  4. Repeat this to open as many sessions as needed, so that you can run several programs at once and inspect their processes

Use a command to find the corresponding process:

ps axj | head -1 && ps axj | grep <executable name> | grep -v grep

Note: adapt this command to your specific needs.

Explanation:

Command: ps axj lists all processes currently in the system.

|: the pipe symbol. It passes the output of the command on its left as input to the command on its right.

Command: head -1 keeps only the first line of the ps axj output, which is the header row that names each column. For details of the command, see the blog post on common commands.

The header row mainly includes the following column names:

 PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND

Only two of them are used in this article; the rest are not covered here:

  • PID: process ID, the number that identifies the process
  • PPID: parent process ID, that is, the PID of the parent of the process

&&: logical AND. The command on the right runs only if the command on the left succeeds, so the header row is printed first and the matching lines follow it.

Command: grep <executable name> searches for the given string and prints the lines that contain it. For details of the command, see the blog post on common commands.

The output contains one line that we do not need: the grep command itself, whose command line also contains the search string. The last command in the pipeline removes it.

Command: grep -v grep. The -v option inverts the match, so lines containing the given string (here, grep) are not displayed. For details, see the blog post on common commands.

View processes in /proc

Processes can also be viewed under the /proc directory (this method is less commonly used).

  • Each directory under /proc whose name is a number corresponds to the process with that PID

Example:

We first run the MyProcess executable, use the command above to find its process and PID, and then look at the directory under /proc named after that PID. (The first process was accidentally closed during the experiment, so a new one was started.)

When we list the contents of the directory for that PID, the exe entry is a symbolic link that indicates the location of the program corresponding to the process.

3.3 Close the process

ctrl+c

When a program is running in the foreground, we can press the hotkey ctrl + c to end the program (end the process).

kill

We can also end a target process with the kill command and the -9 option, followed by the PID of that process:

kill -9 PID

3.4 Some characteristics of the process

  1. A program can form multiple processes at the same time

    Running the same executable twice at the same time and looking at the processes shows two separate processes.

  2. The PID changes each time the process is created

    If we start and stop the same executable several times and check the PID of the corresponding process after each start, the PID changes. Every run creates a new process, and a new process is not guaranteed to get the same PID as the previous one.

  3. When a program is run several times from the same session, the PID differs each time but the PPID (parent process ID) is the same

    Although the PID of each process is different, the PPID is the same: the parent is the bash process of the session

  4. The parent of a process started from the command line in Linux is bash (there are exceptions, but they are special cases)

    After the operating system has started, a command-line interpreter is needed to interpret our commands. The bash command-line interpreter is itself a process with its own PID. Every program started from the command line becomes a process, and the parent of that process is bash.

    As mentioned in the blog post on the shell and how it works, the shell protects itself by creating child processes to execute user commands.

    To confirm this, we can look up the PPID from the previous experiment under /proc and check that it is bash.

    Note: commands such as ls and touch are also programs written in C, and they also become processes when run.

  5. After a process ends, its /proc/PID directory is deleted and can no longer be opened

    1. After the process ends, we try to open its /proc/PID directory

      The directory cannot be opened, and the corresponding entry can no longer be found under /proc

    2. Enter the /proc/PID directory first, then end the process, and observe what happens to the directory afterwards

  6. bash is also a process, and it is the parent of the processes started in its session. If we kill the bash process with the kill command, the corresponding session can no longer be used and has to be recreated (different systems display this differently, but the meaning is the same)

    In this experiment we created two sessions. Although they share the same Linux operating system, their bash processes are different. The process we were looking at belongs to the right-hand session, so its parent is the right-hand bash process. Killing that bash does not affect the left-hand session.

3.5 Obtain process identifier through system call

System call: a computer system is layered into user programs, the operating system, drivers, hardware, and so on. Each layer connects to the adjacent layer through an interface, and the interfaces that the operating system provides to the layer above are collectively called system calls. (See the blog post on the operating system for details.)

To display a program's own PID and PPID while it is running, we need system calls.

We use the system call interfaces getpid() and getppid() to get the PID and PPID of the process. Both functions can be looked up in the manual with man (press q to exit):

man getpid

Note: the return type of these functions is pid_t, a signed integer type (an int on Linux), so the value can be printed with %d.

We modify the code in myprocess.c as follows:

[Figure: myprocess.c modified to print its PID and PPID]

After running the program, check with ps whether the printed PID and PPID values are correct.

3.6 Create a child process through a system call

From the above we know that in Linux the parent process bash creates child processes. So how does a parent create a child? How can we create a new child process directly in code? This requires a third system call interface, fork().

The manual page of fork can be viewed with man:

man fork

Let's use fork() and look at the result.

First, we modify myprocess.c as follows:

[Figure: myprocess.c modified to call fork()]

The result after running is as follows:

[Figure: output of the program]

In this run, two processes are formed: the PID of the parent process is 10130 and the PID of the child process is 10131 (23286 is the PID of the session's bash process).

First the parent prints the string "AAA". After fork, the parent prints "BBB", and finally the child prints "BBB".

This is because after fork the single execution flow becomes two execution flows, that is, two processes. Both processes share the code after fork. The parent does not necessarily run before the child: the order is decided by the scheduler and can differ from system to system.

The entire call chain is: when we run the program, bash creates a child process for it, and the program then creates its own child process through fork.

Summary: for parent-child processes, the child process is created on the basis of the parent process.

Then the question arises: how do we tell which process is the parent and which is the child?

Answer: Through the return value of fork.

Let's look at the fork manual page with man and find what it says about the return value:

On success, the PID of the child process is returned to the parent and 0 is returned to the child. On failure, -1 is returned to the parent, no child process is created, and errno is set to indicate the error.

Note: the return type is pid_t.

We can see this with the modified code and its output:

[Figure: modified code and its output]

In this run, fork returns the PID of the child process, 17881, to the parent and returns 0 to the child, which is how the parent and the child are told apart.

At this point you will probably have questions: why does one variable receive two return values? Why does the same address hold different values? The first question is discussed below. The second involves the process address space; this article only covers the concept of a process, so it is left for a later blog post.

With these phenomena in mind, let's look at how fork is generally used: the code branches on the return value with if and else if.

The result of the run is as follows:

[Figure: output of the program]

It is still one variable with two values, so both the if branch and the else if branch are executed. This again raises the question: how does fork return two values?

Next, we will answer this question step by step:

  1. What does fork do?

    Answer: It creates a child process.

    We know that process = kernel data structures related to the process + the code and data of the process

    When the program above runs, a process with its own code and data is created. After fork, that process has created its own child process, and the child still uses the code and data of the parent.

  2. After fork creates a child process, do the parent and the child affect each other?

    One concept first: running processes are independent of each other!

    For example, if we open QQ, WeChat, iQiyi, and other software on Windows, closing QQ has no effect on the others.

    The program above forms a parent and a child process, and the same holds for them. We run the program and then kill the parent process to see whether the child can keep running.

    After the parent process is killed, the child keeps running normally. The PPID of the child becomes 1 and the child becomes an orphan process (the orphan process is mentioned here only for understanding; kill is needed to end the child).

  3. How does fork treat code and data?

    First, the code we write does not change while it runs. For example, a line written as if will not turn into while at run time.

    Second, when a running process modifies data, the data is first copied to another location and the modification is made on the copy.

    Summary:

    Code: the code is read-only and cannot be changed

    Data: when an execution flow tries to modify data, the operating system (OS) automatically performs copy-on-write for the current process

    Therefore, after fork the two processes share the same code and data. The code is read-only and cannot be modified, while data is copied on write: once either process writes, each ends up with its own private copy, so the two processes do not interfere with each other. (The details belong to the topic of the process address space.)

  4. How should we understand fork returning two values?

    First, fork is a system call interface, that is, a function, and by the time a function reaches its final return, its main work is already done. The parent process is the one that calls fork, so the parent certainly executes fork, and its return statement returns a value.

    The job of fork is to create the PCB of the child process, make the child runnable, and finally return a value.

    By the time fork reaches its return, its main work is done, that is, the child process already exists, and the child executes the code that follows its creation. So the return at the end of fork is executed once by the parent and once by the child, which yields two return values.

    When the two values are stored into the receiving variable, copy-on-write occurs again, so they end up in different physical memory, even though they appear to be stored in the same variable at the same address.

Origin blog.csdn.net/m0_52094687/article/details/128884591