Understanding of processes and related concepts under Linux

Table of contents

1. Process concept

2. Describe the process PCB

3. Check the process

3.1 View through the system directory

3.2 View by ps command

4. Process status

Running state R

Sleep state S

Disk sleep state D

Paused state T

Zombie state Z

Dead state X

Blocking (extended)

5. Zombie process and orphan process

5.1 Zombie process

5.1.1 The concept of zombie process

5.1.2 Harm of Zombie Process

5.2 Orphan Processes

6. Process address space

6.1 Verification of process address space

6.2 Perceive process address space

6.3 Detailed cognition

7. Process priority

7.1 Priority concept

7.2 Why priority exists

7.3 PRI and NI

7.4 Modify the process nice value

7.4.1 top command

7.4.2 renice command

8. Environment variables

8.1 Concept

8.2 Common environment variables

8.3 Environment variable related commands

8.4 Organization of environment variables

8.5 Methods of obtaining environment variables

8.5.1 main function parameters

8.5.2 Third-party variable environ

8.5.3 getenv function


1. Process concept

Textbook concept : an execution instance of a program, an executing program, etc.
Kernel point of view : an entity that allocates system resources (CPU time, memory)

When source code is compiled and linked, an executable program is generated. This executable is essentially a file stored on disk. Running it loads the program into memory, because the CPU can only execute instructions that are in memory. Once the program has been loaded and is executing, it should strictly be called a process rather than a program.

Moreover, processes and programs do not correspond one to one. The same program can be run several times at once, and each run is a separate process.

Competitiveness: There are many processes in the system but only a few CPUs, or even just one, so processes compete for CPU time. Priorities exist so that tasks complete efficiently and resources are contended for in a reasonable way.

Independence: Each process uses its own resources exclusively, and processes do not interfere with one another while they run.

Parallelism: Multiple processes running at the same instant on multiple CPUs is called parallelism.

Concurrency: Multiple processes making progress over a period of time on a single CPU, by means of process switching, is called concurrency.

2. Describe the process PCB

A computer runs a large number of processes, and the operating system has to manage them. How does it manage them? Describe first, then organize.

The operating system describes each process with a process control block (PCB, essentially a structure) and organizes these PCBs in a doubly linked list.

PCB is actually a general term for process control blocks. The process control block in Linux is task_struct, which mainly contains the following information:

Identifier: a unique identifier that distinguishes this process from all others.
Status: task state, exit code, exit signal, etc.
Priority: priority relative to other processes.
Program counter (PC): the address of the next instruction to be executed in the program.
Memory pointers: pointers to the program code and process-related data, as well as pointers to memory blocks shared with other processes.
Context data: the contents of the processor registers while the process executes.
I/O status information: outstanding I/O requests, I/O devices assigned to the process, and the list of files used by the process.
Accounting information: may include total processor time, total clock time used, time limits, accounting number, etc.
Additional information: ...

3. Check the process

3.1 View through the system directory

Under the root directory there is a system directory named /proc that exposes a great deal of process information. Some of its subdirectories have purely numeric names; each number is the PID of a process, and the directory holds information about that process. For example, to view information about the process with PID 1, look in /proc/1.

3.2 View by ps command

For details on using the ps command, see its manual page with man 1 ps.

Running ps -l on Linux prints, among others, the following columns:

  • UID: the user ID of the process owner.
  • PID: the ID of this process.
  • PPID: the ID of the parent process, that is, the process from which this one was derived.
  • PRI: the priority of this process; the smaller the value, the earlier it is executed.
  • NI: the nice value of this process.

4. Process status

The Linux kernel source code defines the process states as follows:

static const char *task_state_array[] = {
    "R (running)",       /*  0 */
    "S (sleeping)",      /*  1 */
    "D (disk sleep)",    /*  2 */
    "T (stopped)",       /*  4 */
    "T (tracing stop)",  /*  8 */
    "Z (zombie)",        /* 16 */
    "X (dead)"           /* 32 */
};

Running status R

All runnable processes (that is, processes that can be scheduled) are placed in the run queue. When the operating system needs to switch processes, it picks the next one to run directly from the run queue. A process in the running state (R) is therefore not necessarily executing on a CPU right now: the state means the process is either executing or waiting in the run queue. This is why several processes can be in the R state at the same time.

Sleep state S

The sleep state means that the process is waiting for an event to complete. This state is also called interruptible sleep.

For example, consider a process that prints to the screen in a loop. The CPU is extremely fast and the display is comparatively slow, so the process spends most of its time waiting for the display (while the CPU runs other processes). The process keeps switching between the running state and the sleep state, but because the CPU is so fast, we will most likely see it in the sleep state when we observe it.

#include <stdio.h>
int main()
{
    while(1){
        printf("handsome boy!\n");
    }
    return 0;
}

A + sign after the state indicates that the process is a foreground process; without it, the process is a background process.

A process in this sleep state can be killed, for example by sending it a signal with the kill command.

Disk sleep state D

A process in the disk sleep state cannot be killed, not even by the operating system; it can only be killed after it wakes up on its own. This state is also called uninterruptible sleep, and a process in this state is usually waiting for I/O to finish.

For example, when a process asks to write to the disk, it is in deep sleep while the write is in progress and cannot be killed, because it has to wait for the disk's reply (whether the write succeeded) in order to respond accordingly.

The dd command can be used to simulate disk sleep.

Paused state T

In Linux, sending the SIGSTOP signal to a process puts it in the paused (stopped) state, and sending the SIGCONT signal to a paused process lets it continue running.

Zombie state Z

When a process exits, the system does not release everything belonging to it immediately. Its exit information is kept for a while so that the operating system or its parent process can read it; as long as that information has not been read, it is not released. A process that is waiting for its exit information to be read is said to be in the zombie state. (The exit information of the process is stored in its task_struct.)

The zombie state is necessary. A process exists to complete some task, and when the task ends, the caller should be able to find out how it went. The zombie state preserves that completion status so the caller can read it and carry out the appropriate follow-up operations.

Dead state X

The dead state is only a return state. Once the exit information of a process has been read, the resources it held are released immediately and the process no longer exists, so it is almost impossible to see the dead state in the task list.

blocking (extended)

A running process is scheduled onto a CPU; in other words, it needs CPU resources in order to run. Each CPU has a run queue (runqueue), from which processes are picked for execution.

Processes in the run queue are essentially waiting for CPU resources. Processes can also wait for other resources, such as locks, disks, and network cards, and each of these resources has its own wait queue.

In terms of Linux states, blocking corresponds to the sleep state S and the disk sleep state D.

5. Zombie process and orphan process

5.1 Zombie process

5.1.1 The concept of zombie process

A process that is waiting for its exit information to be read is in the zombie state, and a process in the zombie state is called a zombie process.

In the following code, the child process created by fork exits after printing its message 5 times, while the parent keeps printing forever. The child has exited and the parent is still running, but the parent never reads the child's exit information, so the child enters the zombie state.

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main()
{
    pid_t id = fork();
    if(id == 0){ //child: print 5 times, then exit
        int count = 5;
        while(count){
            printf("I am child...PID:%d, PPID:%d, count:%d\n", getpid(), getppid(), count);
            sleep(1);
            count--;
        }
        printf("child quit...\n");
        exit(1);
    }
    else if(id > 0){ //father: never waits, so the exited child stays a zombie
        while(1){
            printf("I am father...PID:%d, PPID:%d\n", getpid(), getppid());
            sleep(1);
        }
    }
    else{ //fork error
        exit(-1);
    }
    return 0;
}

5.1.2 Harm of Zombie Process

  1. If the parent process never reads the exit information of the child, the child remains in the zombie state indefinitely.
  2. The exit information of a zombie process is stored in its task_struct, so as long as the zombie has not been reaped, its PCB has to be maintained.
  3. If a parent process creates many child processes and never reaps them, resources are wasted.
  4. The resources still held by zombie processes cannot be reclaimed, so the more zombie processes there are, the fewer resources are actually available; in effect, zombie processes cause a memory leak.

5.2 Orphan Processes

If the parent process exits before its child, the child is left without a parent and is called an orphan process. When the orphan later exits and becomes a zombie, its original parent is no longer there to read its exit information; if that information were never read, the orphan would keep occupying resources and cause a memory leak. Therefore, when a process is orphaned it is adopted by the init process (PID 1), and when it later enters the zombie state it is reaped by the init process.

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main()
{
     pid_t id = fork();
     if(id == 0){ //child: keeps running after the father quits
         while(1){
             printf("I am child...PID:%d, PPID:%d\n", getpid(), getppid());
             sleep(1);
         }
     }
     else if(id > 0){ //father: print 5 times, then exit first
         int count = 5;
         while(count){
             printf("I am father...PID:%d, PPID:%d, count:%d\n", getpid(), getppid(), count);
             sleep(1);
             count--;
         }
         printf("father quit...\n");
         exit(0);
     }
     else{ //fork error
         exit(-1);
     }
     return 0;
}

Since the orphan process is adopted by the init process (PID 1), it does no harm.

6. Process address space

6.1 Verification of process address space

The following code prints an address from each region, which lets us check that the process address space is laid out as in the figure above.

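#include <stdio.h>
#include <stdlib.h>

int un_val;
int init_val = 100;

int main(int argc, char* argv[], char* env[])
{
     // print the addresses of at most the first 5 environment strings
     for(int i = 0; env[i] != NULL && i < 5; ++i){
         printf("environment variable address: %p\n", (void*)env[i]);
     }

     for(int i = 0; i < argc; ++i){
         printf("command-line argument address: %p\n", (void*)argv[i]);
     }

     char* p1 = (char*)malloc(10);
     char* p2 = (char*)malloc(10);
     char* p3 = (char*)malloc(10);

     // the pointer variables themselves live on the stack
     printf("stack address: %p\n", (void*)&p3);
     printf("stack address: %p\n", (void*)&p2);
     printf("stack address: %p\n", (void*)&p1);

     // the blocks they point to live on the heap
     printf("heap address: %p\n", (void*)p3);
     printf("heap address: %p\n", (void*)p2);
     printf("heap address: %p\n", (void*)p1);

     printf("uninitialized data address: %p\n", (void*)&un_val);
     printf("initialized data address: %p\n", (void*)&init_val);

     printf("code address: %p\n", (void*)main);

     free(p1);
     free(p2);
     free(p3);
     return 0;
}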

The stack area grows toward lower addresses, and the heap area grows toward higher addresses.

6.2 Perceive process address space

The following code reveals a puzzle:

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <stdlib.h>                                                                  

int val = 100;
int main()
{
    pid_t id = fork();
    if(id < 0)//err
    {
        exit(-1);
    }
    else if(id > 0)//father
    {
        sleep(3);
        printf("PID:%d PPID:%d val:%d &val:%p\n",getpid(),getppid(),val,&val);
    }
    else//id == 0
    {
        val = 200;
        printf("PID:%d PPID:%d val:%d &val:%p\n",getpid(),getppid(),val,&val);
    }
    return 0;
}

The code creates a child process with fork. The child changes the global variable val from 100 to 200 and prints it, while the parent sleeps for 3 seconds and then prints the same global variable. One might expect both to print 200: the child prints after its own assignment, and the parent prints after the child has already changed the variable. But the parent prints 100, even though the address of val is identical in both processes. Why are the printed values different?

If both processes were reading the same physical address, they would have to see the same data. Since they read different values at the same address, the address we printed cannot be a physical address.

In fact, the addresses we see at the language level are virtual addresses, not physical ones. Physical addresses are invisible to user programs and are managed entirely by the operating system. So even though the (virtual) address of the global variable is the same in the parent and the child, the two processes can hold different values there.

6.3 Detailed cognition

On a 32-bit platform, the process address space runs from 0x00000000 to 0xffffffff and is divided into areas such as the code area, heap area, and stack area. The boundary addresses of each area are recorded in the structure mm_struct. Because virtual addresses run linearly from 0x00000000 to 0xffffffff, a virtual address is also called a linear address.

When the heap grows upward or the stack grows downward, what actually changes is the boundary addresses of the heap and stack recorded in mm_struct.

The executable we generate is itself already divided into areas (such as the initialized and uninitialized data areas) and uses the same addressing scheme as the Linux kernel. When the executable runs, the operating system can load each part into the corresponding region of memory, which greatly improves the operating system's efficiency. This partitioning of the executable is done by the compiler, so the compiler also has the final say over how the code is optimized.

When a process is created, its process control block (task_struct) and its process address space (mm_struct) are created as well. The operating system can find the mm_struct of a process through its task_struct, because task_struct contains a pointer to the mm_struct.

Right after the child process is created, the parent and the child share code and data: the page tables of both processes map them to the same physical memory. Only when the parent or the child needs to modify some data is that data copied in memory, and the modification is then made on the writer's own copy. This preserves the independence of processes. Copying only when a modification is required is called copy-on-write.

Why not copy the data when the child process is created?

The child process will most likely not use all of the parent's data, and data that the child never writes does not need to be copied at all. Allocating only when data is actually modified (delayed allocation) lets memory be used efficiently.

Will the code be copy-on-write?

In most cases it is not, but that does not mean code can never be copied on write. For example, process replacement requires copy-on-write of the code.

Why is there a process address space?

  1. The process address space and page table are created and managed by the OS. Any illegal access or mapping is terminated by the operating system, which protects all legitimate data in physical memory (the valid data of every process and of the kernel) and prevents out-of-bounds access at the system level.
  2. With a process address space, every process sees the same range of addresses, with the same composition and the same ordering of internal areas. When writing programs we therefore only need to think about virtual addresses, not about where the data actually resides in physical memory. Processes view memory from a unified perspective, all executables can be compiled and loaded in a uniform way, and the design and implementation of the process itself becomes simpler.
  3. With a process address space, every process believes it has memory to itself. This strengthens process independence, allows memory to be used sensibly (memory is allocated only when it is actually needed), and decouples process scheduling from memory management.

How are processes created?

Creating a process involves creating its process control block (task_struct), its process address space (mm_struct), and its page table.

7. Process priority

7.1 Priority concept

Priority is the order in which some resource is obtained. Process priority is the order in which processes are allocated CPU resources; a process with a higher priority gets to execute first.

7.2 Why priority exists

Priorities exist because resources are limited, and process priorities exist because CPU resources are limited. A CPU can run only one process at a time, yet there can be many processes, so process priorities are needed to determine the order in which processes obtain CPU resources.

7.3 PRI and NI

  • PRI is the priority of the process, that is, the order in which processes are executed by the CPU. The smaller the value, the higher the priority of the process.
  • NI is the nice value, a correction applied to the priority at which the process is executed.
  • The smaller the PRI value, the sooner the process is executed. With the nice value applied, PRI becomes: PRI(new) = PRI(old) + NI.
  • If the NI value is negative, the PRI of the process becomes smaller, that is, its priority becomes higher.
  • Adjusting the priority of a process under Linux means adjusting its nice value.
  • The value range of NI is -20 to 19, a total of 40 levels.

Note: In the Linux operating system, PRI(old) defaults to 80, that is, PRI = 80 + NI.

7.4 Modify the process nice value

7.4.1 top command

The top command is comparable to the Task Manager in Windows: it dynamically monitors the resource usage of the processes in the system.

After starting top, press the "r" key. You are prompted for the PID of the process whose nice value is to be adjusted; after you enter the PID and press Enter, you are prompted for the new nice value. Press q to exit.

7.4.2 renice command

Usage: renice <new nice value> -p <PID>, for example renice 10 -p 1234 (where 1234 stands for the PID of the target process).

Setting a negative NI value with renice requires root privileges.

8. Environment variables

8.1 Concept

Environment variables generally refer to parameters that the operating system uses to describe its running environment. For example, when we write C/C++ code and link the object files, we never say where the dynamic and static libraries live, yet linking still succeeds and an executable is produced. The reason is that relevant environment variables help the compiler find them.

Environment variables usually serve a special purpose and usually have global scope in the system.

8.2 Common environment variables

  • PATH: the search path for commands (a system command is essentially an executable program, but no path has to be given to start it)
  • HOME: the user's home directory (that is, the default directory after the user logs in to the Linux system)
  • SHELL: the current shell; its value is usually /bin/bash

8.3 Environment variable related commands

echo : display the value of an environment variable

export : set a new environment variable

env : display all environment variables

set : Display locally defined shell variables and environment variables

unset : Clear environment variables

8.4 Organization of environment variables

In the Linux system, environment variables are organized as follows:

Every program receives an environment table. The table is an array of character pointers, each pointing to an environment string terminated by '\0', and the last pointer in the array is NULL.

8.5 Methods of obtaining environment variables

8.5.1 main function parameters

The main function can actually take three parameters; they are often left out because they are not needed. (The third parameter is a common extension on Unix-like systems rather than part of standard C.)

int main(int argc, char* argv[],char* env[])
{ …… }

The second parameter of main is an array of character pointers. Its first element points to the name of the executable, the remaining elements point to the options given on the command line, and the last pointer is NULL. The first parameter of main is the number of valid elements in that array. The third parameter receives the environment table, so the environment variables can be read through it.

#include <stdio.h>
int main(int argc, char* argv[], char* env[])
{
    for(int i = 0; env[i] != NULL; ++i){
         printf("%s\n",env[i]);
    }
    return 0;
}

8.5.2 Third-party variable environ

The C library provides a global variable environ that points to the environment table. It is not declared by the headers included here, so it has to be declared with extern before use.

#include <stdio.h>
int main()
{
    extern char** environ;
    for(int i = 0;environ[i] != NULL; ++i){
        printf("%s\n",environ[i]);
    }
    return 0;
}

8.5.3 getenv function

Environment variables can also be obtained with the getenv function. getenv searches the environment table for the given variable name and returns a pointer to the corresponding value string, or NULL if the variable is not set.

#include <stdio.h>
#include <stdlib.h>
int main()
{
    const char* path = getenv("PATH");
    if(path != NULL){ //getenv returns NULL when the variable is not set
        printf("%s\n", path);
    }
    return 0;
}

Origin blog.csdn.net/GG_Bruse/article/details/128746932