[Linux] Process Concept (Part 1)

1. Von Neumann system

Most computers we use, whether common ones such as laptops or less common ones such as servers, follow the von Neumann architecture.

Insert image description here

Data flows from input to output in the order of the numbers in the figure above.

About the von Neumann system:

  • The "memory" here refers to main memory (RAM)
  • Setting caches aside, the CPU can read and write only memory; it cannot access peripherals (input or output devices) directly
  • When peripherals (input or output devices) input or output data, they can only write to or read from memory
  • In short, all devices interact directly only with memory

Why these rules? The CPU processes data very quickly, memory is slower, and peripherals such as disks are slower still. At the data level, the CPU generally does not interact directly with peripherals, because accessing a disk is so slow that it would drag down the efficiency of the whole machine. Storage also gets more expensive as it gets faster: with the CPU at the center, the closer a storage device is to the CPU, the higher its efficiency and the higher its cost. A computer built on the von Neumann architecture is therefore essentially an efficient machine built at relatively low cost.

2. Operating system

Any computer system contains a basic collection of programs called the operating system (OS). What is an operating system? First of all, the operating system is a piece of software, and it is software that manages software and hardware resources.

Taking a macro view, the interaction from the user down to the underlying hardware is shown below:

Insert image description here

To describe the process above briefly: the commands a user enters are first received by the shell (bash in Linux), which acts as an intermediary. The shell then invokes the system call interface, and the operating system performs the corresponding operations.

So why do we need an operating system? Because by managing software and hardware resources well, the operating system provides users with a good, stable, efficient, and safe environment.

Since the operating system manages software and hardware resources, it must hold a large number of data objects and data structures internally. Because it has to manage so much data, it structures the data and then manages the structured data. Here the operating system can be compared to a manager, while the drivers and the underlying hardware are the things being managed.

So how does the operating system manage them? To manage anything, we must first model it: describe first, then organize. Describing means analyzing its attributes, and the set of important attributes stands for the thing itself. So if the operating system wants to manage a resource, it first describes it (analyzes its attributes and models it), then organizes the descriptions, and finally manages the organized result. For example, if the descriptions of a certain resource are organized as a linked list, managing that resource becomes managing the linked list.

In summary, to manage hardware and other resources, the operating system:

  1. Describes each resource with a struct
  2. Organizes the structs using linked lists or other efficient data structures

  • System call and library function concepts

From a development perspective, the operating system appears to the outside world as a single whole, but it exposes some of its interfaces for upper-layer development. The interfaces provided by the operating system are called system calls.

In terms of use, system calls provide fairly basic functionality and place fairly high demands on the user. Thoughtful developers therefore wrap some system calls appropriately to form libraries. Libraries make it much easier for higher-level users or developers to do secondary development.
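To make the difference concrete, here is a minimal sketch that prints a string in two ways: once with the write() system call and once with the C library's fputs(), which buffers the text and eventually calls write() itself. The function names say_with_syscall and say_with_library are made up for this illustration.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* System call: write(2) hands the bytes straight to the kernel.
 * Returns the number of bytes written, or -1 on error. */
ssize_t say_with_syscall(const char *msg)
{
    return write(STDOUT_FILENO, msg, strlen(msg));
}

/* Library function: fputs(3) stores the text in a user-space buffer and
 * the library calls write(2) for us when the buffer is flushed.
 * Returns 0 on success, -1 on error. */
int say_with_library(const char *msg)
{
    if (fputs(msg, stdout) == EOF)
        return -1;
    return fflush(stdout) == 0 ? 0 : -1;
}
```

Calling both from main() with the same string produces the same visible output; the library version is simply more convenient (formatting, buffering) and hides the file descriptor and byte count that the system call requires.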

3. Process

1. Basic concepts

Process concept: put simply, when an executable program on disk is loaded (copied) into memory, it forms a process. A process is a program that is being executed.

2. Describe the process - PCB

Process information is kept in a data structure called the process control block (PCB), which can be understood as a collection of process attributes. In the Linux operating system specifically, the PCB is task_struct.

The structure that describes a process in Linux is called task_struct. task_struct is a data structure of the Linux kernel; it is loaded into RAM (memory) and contains the process information.

3. Organizing processes

So how does the operating system manage and organize processes? First, the executable program on disk is loaded (copied) into memory. Because the operating system was loaded into memory earlier, it creates a PCB for that executable program, and this PCB contains all the attributes of the process.

For each process, the operating system creates a corresponding PCB and then organizes these PCBs by linking them together, for example as a linked list. Managing processes is thus modeled as managing PCBs, that is, inserting, deleting, searching, and modifying nodes of the linked list, as shown in the figure below:

Insert image description here

So when we hear that processes are queuing, what is actually queuing in some queue is their PCBs. For example, there is a run queue:

Insert image description here

So to be precise, process = executable program + kernel data structure (PCB); the PCB is what lets the operating system manage the process.

In addition, a process is not limited to one queue or linked list; it may be in several queues or linked lists at the same time.

In fact, when the task_struct structure is defined in Linux, a doubly linked list structure is defined first:

			struct dlist
			{
				struct dlist* next;
				struct dlist* prev;
			};

The pointers of this doubly linked list node point directly at the corresponding list or queue member inside the next task_struct, as shown below:

Insert image description here

So the structure of this task_struct should be designed like this:

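			struct task_struct
			{
				struct dlist list;	// node in the list of all processes in the system
				struct dlist queue;	// the same process can also sit in a queue
				// and in various other structures as well
			};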
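Because the list node sits inside the structure, code that walks a list holds pointers to nodes and has to step back to the enclosing structure. The sketch below shows one common way to do that with offsetof. It is a simplified illustration, not the kernel's actual code: struct task and its pid field are stand-ins for task_struct, and the helper names are assumptions.

```c
#include <stddef.h>

/* Same node type as above: it carries no payload, only links. */
struct dlist {
    struct dlist *next;
    struct dlist *prev;
};

/* Simplified stand-in for task_struct (hypothetical fields). */
struct task {
    int pid;
    struct dlist list;   /* node in the list of all processes */
    struct dlist queue;  /* node in some queue, e.g. a run queue */
};

/* Recover the enclosing struct from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* An empty circular list is a head node that points to itself. */
void dlist_init(struct dlist *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert node just before head, i.e. at the tail of the list. */
void dlist_add_tail(struct dlist *head, struct dlist *node)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Walk a run queue and add up the pids of the tasks on it. */
int sum_pids_on_queue(struct dlist *run_queue)
{
    int sum = 0;
    for (struct dlist *n = run_queue->next; n != run_queue; n = n->next)
        sum += container_of(n, struct task, queue)->pid;
    return sum;
}
```

One task can be linked into the all-process list through its list member and into a run queue through its queue member at the same time, which is how a single PCB can appear in several lists or queues.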

4. Viewing processes

(1) View through the system call interface

The command to view all processes is: ps axj.

For example, we first write a program and then run it:

Insert image description here

Now we need to find and inspect this process, but ps axj on its own prints too much information. We need to filter it, for example:

		ps axj | head -1   # show only the first line, i.e. the header
		ps axj | grep xxx  # search for lines containing the string xxx

Putting the two together makes it easy to find our process. The command is: ps axj | head -1 && ps axj | grep mytest:

Insert image description here

We were looking for the mytest process, so why does a grep process show up as well?

Insert image description here

Because grep itself becomes a process while it runs, and its command line contains the string mytest, so it matches its own filter and is not filtered out.

We can also see the PID and PPID columns in the header; they are the process id and the parent process id of the process.

Insert image description here

We can also let the process print its own pid / ppid. The system interfaces used are getpid() / getppid(), and the header files they need are as follows:

Insert image description here

Each returns the corresponding process id, of type pid_t:

Insert image description here

We can write a program to verify it:

Insert image description here

Then we use the ps command to check the pid:

Insert image description here

As shown in the two figures above, both methods can view the process's pid / ppid.
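The program in the screenshots is not reproduced in the text, so here is a minimal sketch of what such a check could look like. The function name print_ids is an assumption made for this example.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Print this process's pid and its parent's pid; returns the pid. */
pid_t print_ids(void)
{
    pid_t self = getpid();     /* id of the calling process */
    pid_t parent = getppid();  /* id of the process that started it */
    printf("pid: %d, ppid: %d\n", (int)self, (int)parent);
    return self;
}
```

Calling print_ids() from main() in a program named mytest and comparing the output with ps axj | grep mytest should show the same two numbers.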

When we want to terminate a process, we can use the command kill -9 followed by its pid.

We can also start and end a process several times and observe its pid and ppid, as shown below:

Insert image description here

We can observe that the pid of this process is different every time, but its ppid is the same, why is this? We can check its ppid:

Insert image description here

We can see that this is actually bash, the command line interpreter in Linux. So we conclude that the processes we start from the command line are all child processes of bash.

(2) View through the /proc system folder

The second way to view process information is through a system folder. The first way is used more often, but this one is worth knowing.

We can view the system folder through the command ls /proc/:

Insert image description here

Appending the pid we are looking for to /proc/ shows that process's directory, for example:

Insert image description here

The -dl options show only this process's folder itself and its attributes.

If we pass only the -l option, we can see many attributes of this process. Two of them are worth understanding: cwd and exe:

Insert image description here

Here, exe is the executable program corresponding to the process. It is followed by the path where the executable is located, which shows that a process can find its own executable program.

cwd is the current working directory. If our process contains file-handling code that creates a file, the file is created in the current working directory. By default, the directory in which the process was started is its current working directory, and the current working directory can be changed.
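As a small illustration that the working directory can be changed from inside the process, the sketch below uses chdir() to change it and getcwd() to read it back. The helper name change_cwd is made up for this example.

```c
#include <stddef.h>
#include <unistd.h>

/* Change the calling process's working directory to dir, then store the
 * new working directory in buf. Returns 0 on success, -1 on error. */
int change_cwd(const char *dir, char *buf, size_t len)
{
    if (chdir(dir) != 0)
        return -1;
    return getcwd(buf, len) != NULL ? 0 : -1;
}
```

After a successful call, the cwd entry under /proc for this process points at the new directory, and files created with a relative path land there.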

5. Create a process through system call - fork

(1) A first look at fork

fork is a system call that creates a process from code.

First, we use man fork to get to know fork:

Insert image description here

We can see that fork() creates a child process. Let's continue to look at its return value:

Insert image description here

We can see that if successful, the pid of the child process will be returned to the parent process, and 0 will be returned to the child process.

Let’s take a brief look at the usage of fork(); we will slightly modify the original code:

		#include <stdio.h>
		#include <unistd.h>
		#include <sys/types.h>
		
		int main()
		{
		    printf("I am the parent process, my pid:%d, ppid:%d\n", getpid(), getppid());
		
		    pid_t id = fork();
		
		    // after fork, use if to split the two execution flows
		    if(id < 0) return 1;
		    
		    // child
		    else if(id == 0)
		    {
		        while(1)
		        {
		            printf("I am a process, pid:%d, ppid:%d, ret:%d, I am running the download task\n", getpid(), getppid(), id);
		            sleep(1);
		        }
		    }
		    
		    // parent
		    else
		    {
		        while(1)
		        {
		            printf("I am a process, pid:%d, ppid:%d, ret:%d, I am running the playback task\n", getpid(), getppid(), id);
		            sleep(1);
		            printf("\n");
		        }
		    }
		}

Let’s observe the results of running the code:

Insert image description here

We can see that the parent process's pid is 15264 and its ppid is 28553, which is bash; the value fork() returned to it is the child's pid, 15265.

The child process's pid is 15265 and its ppid is 15264; the value fork() returned to it is 0.

From this we can draw a small conclusion: only the parent process executes the code before fork(), while the code after fork() is executed by both the parent and the child.

So why do we use fork() to create a child process? Because we want the child process to help the parent process with tasks that a single execution flow cannot handle. In the code above, for example, the parent process runs the playback task while the child process runs the download task.

(2) fork principle

Having seen how fork() is used, we are left with several questions: why does fork() have two return values? Why does the same variable hold different values? And so on.

  • What is fork() doing?
    fork() creates a child process, so there is one more process in the system. The OS uses the parent process as a template to create a PCB for the child, and the parent shares its code and data with the child. That is why, after fork(), the parent and child execute the same code.

  • After fork(), which of the parent and child processes runs first?
    Creating the child is only the beginning. Next, the other processes in the system, the parent, and the child are all scheduled for execution. Once the PCBs of both parent and child have been created and placed in the run queue, whichever PCB is picked first runs first. We cannot determine which one that will be; it is decided by the operating system.

  • Why does fork() return the child's pid to the parent and 0 to the child?
    A parent may have many children, and each child's pid is unique, so returning the pid lets the parent tell its children apart and manage them. A child has exactly one parent, so it does not need an id for it; the child only needs to know that the call succeeded, hence 0.

  • Why does fork() have two return values?
    When a function reaches return, its core work is already done. By the time fork() reaches its return, the child process already exists. return is itself code, and code is shared between parent and child. So when the parent is scheduled it executes return, and when the child is scheduled it executes return too. That is why fork() returns twice.

  • How can the same variable hold different values?
    Suppose we start QQ, WeChat, and a browser. These are all processes. If we terminate QQ or WeChat, is the browser still running? Yes. What about a parent and a child: if the parent is terminated, is the child still there? And the other way around? Also yes. So processes are independent while they run, no matter how they are related. This independence shows first in the fact that each process has its own PCB, so processes do not affect each other. Parent and child do share code, but code is read-only, so sharing it causes no interference. Data, however, can be modified by either side, so although the code is shared, each process must end up with a private copy of the data it changes. For now we only need to know that this is done with copy-on-write: when the parent or the child modifies data, the operating system copies that data for the writer. When fork() returns in the child, the return value has to be stored in the variable id. Returning a value into a variable is a write, and id is a variable that was defined by the parent. So copy-on-write takes place at that moment, and the same variable ends up with different values in the two processes.
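The effect of private data after fork() can be checked with a short experiment. This is a sketch under the assumptions stated in the comments: the function name child_write_is_private is made up, and it uses waitpid() (covered later in this series) only to collect the child's result.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, let the child overwrite a variable, and compare what each side
 * sees. Returns 1 if the child saw its own new value while the parent's
 * copy stayed unchanged, 0 otherwise, -1 on error. */
int child_write_is_private(void)
{
    int value = 100;

    pid_t id = fork();
    if (id < 0)
        return -1;

    if (id == 0) {
        value = 200;                  /* the child writes to its own copy */
        _exit(value == 200 ? 0 : 1);  /* report the result via exit code */
    }

    int status = 0;
    if (waitpid(id, &status, 0) != id)  /* wait for the child to finish */
        return -1;

    int child_ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;
    return child_ok && value == 100;    /* the parent still sees 100 */
}
```

The variable has the same name and the same address in both processes, yet the child's write never shows up in the parent, which is the behavior described above for id.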

4. Process status

Process status, put bluntly, is a field in the PCB, that is, a variable inside it. Say this variable is called status. A process state change then comes down to two things: 1. changing the status variable in the PCB; 2. linking the PCB into a different queue. Everything said about states concerns only the process's PCB and has nothing to do with the process's code and data.

The transition diagrams of the main states of the process are as follows:

Insert image description here

Below we analyze various states in detail and extend other states.

1. Running status

A process in the running state can be scheduled at any time. The created, ready, and executing states in the figure above all belong to the running state. As long as the process is in the run queue, its state is running.

2. Blocked state

In our code, we must more or less access certain resources in the system, such as various hardware devices such as disks, keyboards, and network cards.

For example, when scanf(), cin, and the like appear in our code, we are essentially reading data from the keyboard. If we simply do not type anything, the keyboard data is not ready. In other words, the resource our process wants to access is not ready, the access condition is not met, and the process's code cannot continue. The process is then in the blocked state.

When a process is blocked, what we should see is:

  1. The process is stuck;
  2. Its PCB is not in the run queue and its status is not R (running), so the CPU will not schedule it.

From the user's point of view, the program simply appears to hang.

3. Suspended state

If a process is blocked, it cannot be scheduled until the resource it is waiting for is ready. Suppose that at this moment memory in the operating system is seriously short. The OS will then swap the in-memory data of blocked processes out to a peripheral (the swap partition). We do not worry about how slow this is, because it is unavoidable; the main thing is to let the OS keep running. A process whose data has been swapped out is in the suspended state.

The swap partition should generally be neither too large nor too small. If it is too small, it will not be enough; if it is too large, the system will rely on the swap partition too heavily.

When the process is scheduled by the OS again, its swapped-out code and data are loaded back into memory.

4. The specific status of processes in Linux

A process can have several states (in the Linux kernel, processes are sometimes called tasks). The following states are defined in the kernel source code:

		static const char * const task_state_array[] = 
		{
			"R (running)", /* 0 */
			"S (sleeping)", /* 1 */
			"D (disk sleep)", /* 2 */
			"T (stopped)", /* 4 */
			"t (tracing stop)", /* 8 */
			"X (dead)", /* 16 */
			"Z (zombie)", /* 32 */
		};

(1) R running status

R running status (running): It does not mean that the process is definitely running. It indicates that the process is either running or in the run queue.

The R running status here is the running state we learned about above.

We run a program and then view its process status, as follows, the program has been run:

Insert image description here

Check its running status:

Insert image description here

As shown in the picture above, we see that the column STAT means status. So why is the status here S+?

First of all, our program calls printf() and performs a lot of I/O, so the cpu needs to access a peripheral, in this case the display. The display is not necessarily ready, so most of the time the process is in the S state (introduced below), that is, blocked. With some luck the process happens to be scheduled at the moment we look, and once in a while we see it in the R state. If we want to see the R state all the time, we can comment out the printf call so that there is no heavy I/O; the status we then see is R.

Then why is there a + after S? Here we need to learn about foreground and background processes. A foreground process is one that, once started, keeps our command line (bash) from accepting further commands, and that can be terminated directly with ctrl + c. For example, the process above is a foreground process, so we cannot run other commands:

Insert image description here

Because there can be only one foreground process, other commands are not executed; we can terminate the process directly with ctrl + c:

Insert image description here

The foreground process will have a + number after its process status;

Of course, we can also start it as a background process. Starting it with ./mytest makes it a foreground process by default; adding & after it, that is, ./mytest &, starts it as a background process, for example:

Insert image description here

At this point we check the process status:

Insert image description here

We find that there is no + after S; this is a background process.

(2) Blocked state

  • S sleeping state (sleeping): the process is waiting for an event to complete. Sleep here is sometimes called interruptible sleep (light sleep). A process in light sleep can respond to external signals and be terminated. When Linux is truly out of options, it frees resources by terminating processes, and processes in the S state can be terminated this way.

  • D disk sleep state (disk sleep): also called uninterruptible sleep (deep sleep). A process in this state is usually waiting for I/O to finish. The state was designed with disks in mind, and a process in this state cannot be terminated, not even by the operating system. Because disk access is very slow, the process has to wait for the disk to respond. We rarely see this state in practice; when we do, it usually means the machine is close to hanging.

  • T stopped state (stopped): a process can be stopped (T) by sending it the SIGSTOP signal, and a stopped process can be allowed to continue running by sending it the SIGCONT signal. Why stop a process? When a process accesses a software resource, it may temporarily not be allowed to, so the process is set to STOP.

  • t stop state (tracing stop): when a program is traced under a debugger and hits a breakpoint, the process is paused.

The S, D, T, t states above can all be called the blocking states we have learned.
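The SIGSTOP / SIGCONT behavior described for the T state can be observed from a parent process. The sketch below is an illustration only: the function name is an assumption, and it relies on waitpid() with the WUNTRACED and WCONTINUED options to be told when the child stops and resumes.

```c
#define _XOPEN_SOURCE 700   /* request the POSIX/XSI declarations used below */
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, stop it, continue it, then kill and reap it.
 * Returns 1 if the child was observed stopped and then continued,
 * 0 if not, -1 if fork failed. */
int stop_and_continue_child(void)
{
    pid_t id = fork();
    if (id < 0)
        return -1;
    if (id == 0) {
        for (;;)
            pause();                 /* child: sleep until a signal arrives */
    }

    int status = 0;
    int stopped = 0, continued = 0;

    kill(id, SIGSTOP);               /* child enters the T state */
    if (waitpid(id, &status, WUNTRACED) == id)
        stopped = WIFSTOPPED(status);

    kill(id, SIGCONT);               /* child leaves the T state */
    if (waitpid(id, &status, WCONTINUED) == id)
        continued = WIFCONTINUED(status);

    kill(id, SIGKILL);               /* clean up: terminate the child */
    waitpid(id, &status, 0);         /* and reap it */
    return stopped && continued;
}
```

Between the two kill() calls, running ps on the child's pid from another terminal would show T in the STAT column.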

(3) Death state

  • X dead status (dead): this status is only a return status; we will not see it in the task list.

We create a process because we need it to complete some task, but how do we know how well it did? When a process exits, it must leave behind exit information that indicates how well it completed the task. This information is written into the PCB of the exiting process and is read by its parent process (or the OS), which thereby learns why the child exited. So when a process exits, the OS may release the process's code and data, but it must not release the PCB immediately, because the exit information in it still has to be read.

If a process has exited but has not yet been read by its parent or the OS, the OS must keep the PCB of the exiting process around. The process does not yet count as fully gone; it is in the Z zombie state.

  • Z (zombie) - zombie process
  1. Zombie is a relatively special state. A zombie process is produced when a process exits and its parent has not read the child's exit code (with the wait() system call, discussed later).
  2. The zombie process stays in the process table in a terminated state, waiting for the parent process to read its exit status code.
  3. Therefore, as long as the child has exited while the parent is still running and has not read the child's status, the child is in the Z state.

If a process enters the Z state and the parent never reclaims it, its PCB stays around forever, which can cause a memory leak!

After the exit information is read by the parent process or the OS, the state in the PCB is first changed to X, and then the PCB is completely released.

Let's simulate creating a zombie process:

		#include <stdio.h>
		#include <stdlib.h>
		#include <unistd.h>
		
		int main()
		{
		    pid_t id = fork();
		    if(id < 0) return 1;
		    else if(id == 0) // child process
		    {
		        int cnt = 3;
		        while(cnt--)
		        {
		            printf("i am a child, run times: %d\n", cnt);
		            sleep(1);
		        }
		        printf("i am a zombie now\n");
		        exit(-1);
		    }
		    else // parent process
		    {
		        while(1)
		        {
		            printf("i am a father, running any times!\n");
		            sleep(1);
		        }
		    }
		
		    return 0;
		}

In the code above, we use fork to split into two processes. The child process exits after running 3 times, while the parent process keeps running and does not exit. Now we check the status of the two processes:

Insert image description here

We can see that the child process has entered the Z zombie state.
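For contrast, here is a minimal sketch of a parent that does read the child's exit information, so the child does not stay in the Z state. It uses waitpid(), which this series only covers later, so treat it as a preview; the function name reap_child_exit_code is made up.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with the given code, then reap it.
 * Returns the exit code the parent read back, or -1 on error. */
int reap_child_exit_code(int code)
{
    pid_t id = fork();
    if (id < 0)
        return -1;
    if (id == 0)
        _exit(code);                     /* child: exit immediately */

    int status = 0;
    if (waitpid(id, &status, 0) != id)   /* blocks until the child exits */
        return -1;

    /* Only the low 8 bits of the exit code are reported to the parent. */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that a child calling exit(-1), as in the zombie example, is seen by the parent as exit code 255, because only the low 8 bits are kept.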

Now that all the process states worthy of attention have been explained, let’s get to know another kind of process: Orphan process.

  • Orphan process
  1. If the parent process exits early and the child exits later and enters Z, what should be done?
  2. When the parent process exits first, the child process is called an "orphan process".
  3. The orphan process is adopted by process No. 1, the init process or systemd process, and is of course also reclaimed by the init or systemd process.

Let's simulate an orphan process. We only need to swap the code of the parent process and the child process in the zombie example above:

		#include <stdio.h>
		#include <stdlib.h>
		#include <unistd.h>
		
		int main()
		{
		    pid_t id = fork();
		    if(id < 0) return 1;
		    else if(id > 0) // parent process
		    {
		        int cnt = 3;
		        while(cnt--)
		        {
		            printf("i am a father, run times: %d\n", cnt);
		            sleep(1);
		        }
		        exit(-1);
		    }
		    else // child process
		    {
		        while(1)
		        {
		            printf("i am a child, running any times!\n");
		            sleep(1);
		        }
		    }
		
		    return 0;
		}

After running, we check the status of the two processes:

Insert image description here

Let’s check again after a while:

Insert image description here

Only the child process is left now, and its ppid has become 1, that is, it has been adopted by process No. 1. Process No. 1 is init / systemd, which can be thought of as the operating system itself.

But when the parent process exited, it did not turn into the Z state; it simply disappeared. Why? Because the original parent process has a parent of its own: bash. When the parent process exits, bash reclaims and releases it directly. The remaining child process becomes an orphan process and is adopted by process No. 1.

5. Process priority

(1) PRI & NI

We already know that processes may queue in a queue, and queuing is essentially a matter of establishing priority. So what is priority? It is the order in which processes are scheduled to get the cpu. The fundamental reason priority exists is that cpu resources are scarce: a machine typically has only one cpu or a few, while far more processes need to be scheduled on it, which is where priority comes in.

Priority is actually determined by an int field in the PCB, called PRI. The smaller the value, the higher the priority. We can check the priority of processes with the command ps -la:

Insert image description here

The PRI column is the priority of the process. In Linux, the process priority range is 60~99, and the default priority is 80.

But Linux supports dynamic priority adjustment. The PCB of a Linux process also holds a nice value, which is a correction applied to the process priority: PRI(new) = PRI(old) + nice, where PRI(old) starts from 80 on every modification!

We can use the top command to modify the nice value of an existing process, but because of permission issues there are processes we cannot modify; in that case we can use sudo top. The steps are: run top, press r and enter the pid of the process, press Enter, then enter the nice value to complete the modification, for example:

Insert image description here

We try to modify the priority of a process:

Insert image description here

Note that the minimum nice value is -20, and anything lower is treated as -20; the maximum is 19, and anything higher is treated as 19. Why is it specified this way? Because the OS tries to schedule all processes in a fairly balanced way. If the priority range were not limited, lower-priority processes could easily go without cpu resources for a long time. This situation is called process starvation.
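The nice value can also be read and changed from inside a program. The sketch below uses getpriority() to read it and nice() to raise it, which lowers the priority and needs no special permission. The function names are assumptions made for this example, and the relation to the PRI column (80 plus nice) is the one given in the text above.

```c
#define _XOPEN_SOURCE 700   /* request the POSIX/XSI declarations used below */
#include <errno.h>
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

/* Current nice value of the calling process (-20..19 on Linux). */
int current_nice(void)
{
    errno = 0;   /* -1 is a legal nice value, so errors are flagged via errno */
    return getpriority(PRIO_PROCESS, 0);
}

/* Add inc (>= 0) to our nice value, which lowers our priority.
 * Values past the upper limit are clamped by the system.
 * Returns the resulting nice value. */
int lower_priority_by(int inc)
{
    errno = 0;
    if (nice(inc) == -1 && errno != 0)
        perror("nice");
    return current_nice();
}
```

For a process started with the default nice value of 0, calling lower_priority_by(5) should make ps -la show NI as 5 for it.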

(2) Other concepts

  • Competition: there are many processes in the system but only a few CPUs, or even just one, so processes compete with each other. Priority exists so that tasks are completed efficiently and resources are competed for in a more reasonable way.
  • Independence: processes running at the same time each need exclusive use of their own resources and do not interfere with one another while running.
  • Parallelism: multiple processes running at the same moment on multiple CPUs is called parallelism.
  • Concurrency: multiple processes sharing one CPU through process switching, so that all of them make progress within a period of time, is called concurrency.

No process occupies the CPU and runs forever. Every so often a process is automatically taken off the CPU; this period is called a time slice. But the Linux kernel has more than time slices, because time slices alone would be too rigid: if every process came off the CPU after running 1ms, all processes would be treated the same on average. Therefore the Linux kernel also supports preemption of cpu resources between processes; it is a preemptive kernel based on time-sliced round-robin. Put simply, if a process with priority 80 has run for 0.5ms when a higher-priority process arrives, the priority-80 process is forcibly taken off the CPU and the higher-priority one is put on.

(3) Switching between processes

In our program/process, how do we know where we are currently running? Or where we got to last time? How do we jump between processes? Suppose we have a program with 10000 lines of code that has run 1000 of them before being switched out; how does the cpu know where it got to last time?

In the cpu there are many registers. One of them is called eip (the pc pointer, or program counter), and it holds the address of the next instruction of the currently running process. With this address we know which statement to execute next, so execution can jump between functions/processes.

While a process runs, it uses these registers. The process generates all kinds of data, and this data is temporarily held in the registers. If there are several processes, the temporary data each one leaves in the CPU registers is different. This data is called the hardware context of the process. The CPU has only one set of register hardware, but 10 processes need 10 sets of context data, because the registers are not the same thing as the contents of the registers!

So when switching processes, we must first save the contents of the CPU registers for the outgoing process and then load the register contents of the incoming process into the CPU, otherwise the two would affect each other. The original contents of the registers are not cleared; the next process simply uses the registers and overwrites the old data. So where are the register contents saved? In the PCB of the corresponding process, which essentially means the contents of the CPU registers are saved into memory.
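The save-and-restore idea can be modeled in a few lines. This is a toy model, not how a real kernel is written: the register set, the struct layouts, and the function name context_switch are all assumptions chosen to mirror the description above.

```c
/* Toy register set: names and sizes are illustrative only. */
struct registers {
    unsigned long pc;     /* program counter: address of the next instruction */
    unsigned long sp;     /* stack pointer */
    unsigned long gp[4];  /* a few general-purpose registers */
};

/* The CPU has exactly one physical set of registers. */
struct cpu {
    struct registers regs;
};

/* Each PCB keeps its own saved copy of the register contents in memory. */
struct pcb {
    int pid;
    struct registers context;
};

/* Switch the CPU from process prev to process next. */
void context_switch(struct cpu *cpu, struct pcb *prev, struct pcb *next)
{
    prev->context = cpu->regs;  /* save the outgoing process's context */
    cpu->regs = next->context;  /* load the incoming one, overwriting the registers */
}
```

When prev is switched back in later, its saved pc is loaded again, which is how the CPU resumes exactly where that process left off.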

Origin blog.csdn.net/YoungMLet/article/details/133140013