【Linux】--Process concept

Table of contents

1. Process concept

2. PCB

1.What is PCB 

2.What is task_struct

3.task_struct contains content

 3. Detailed explanation of task_struct content

1. Check the process

(1) View through the system directory

(2) View through ps command 

(3) View through the top command 

(4) Obtain the process PID and parent process PPID through system calls

        Get the process ID functions getpid and getppid

        Get the current process ID

        Get parent process ID

2. Status 

3.Priority

4. Program counter 

5. Contextual data

6.I/O status information

7. Accounting information

4. Create a process through system calls

1. Use fork to create a child process

2. Understand fork to create child processes

3. Data modification after fork 

4.The return value of fork

(1) Meaning of fork return value 

(2) Let the parent and child processes perform different functions based on the fork return value

5. Process status

1. Process status definition

2. Process status classification

(1) R-Running status

(2) S-light sleep state

(3) D-deep sleep state

(4) T-stop state

(5) Z-zombie state

(6)X-death state

 3. Hazards of zombie processes

6. Orphan process

7. Process priority

1. Concept

2. Why should we have process priority?

3. View system processes

4. PRI and NI

5. Use the top command to change the process priority

(1) Change NI value 

(2) The value range of NI 

(3) The reason why the value range of NI is small

8. Environment variables

1. Concept

2. Common environment variables

3. How to view environment variables 

4. Commands related to environment variables

5. How environment variables are organized

(1) Environmental table 

(2) Get environment variables

6. Global properties of environment variables

7. Local variables

9. Program address space

1. Program address space distribution 

2. The program address space is a virtual address

3.Virtual address

(1)mm_struct 

(2) Page table and MMU 

(3) Reasons for the existence of process address space 

4. Copy on write


1. Process concept

Textbook concept: A process is an execution instance of a program, that is, a program that is being executed.

Kernel perspective: A process is the entity to which system resources (CPU time, memory) are allocated.

After we finish writing the code, it is compiled and linked into an executable program (an .exe file on Windows, for example), which is essentially a binary file stored on disk. Double-clicking the executable to run it means loading the program from disk into memory, after which the CPU can execute its code. Once a program has been loaded into memory, it is called a process. Every way of starting a program comes down to creating a process in the system, and double-clicking an .exe file is no exception.

2. PCB

1.What is PCB 

An operating system manages things by describing them first and then organizing them. Process management works the same way: the information about each process is described first, and then that information is organized and managed with data structures. So what information does a process have? Use the

ps axj

command to view the processes in the system, that is, the programs that are running:

You can see that the attributes of the process are at least PPID, PID, PGID, SID, TTY, TPGID, STAT, UID, TIME, and COMMAND.

Process information is placed in a data structure called the Process Control Block (PCB), which is a collection of process attributes.

When the operating system creates a process, in addition to loading the code and data on the disk into the memory, it also creates a task_struct for the process inside the system, which is a struct.

2.What is task_struct

The PCB under the Linux operating system is task_struct, so task_struct is a type of PCB. The PCB in other operating systems is not necessarily called task_struct.

Creating a process not only loads code and data into memory, but also creates a task_struct for the process. Therefore, a process is not just a running program. More precisely, a process is the content of the program file plus the process-related data structures that the operating system creates automatically. A process includes other things as well, but these two are the focus here.

The operating system describes every process, so there is one PCB per process. The PCB in Linux is task_struct. This struct has next and prev pointers (in the kernel source they sit inside an embedded struct list_head), so the processes can be linked into a doubly linked list. Other pointers in task_struct point to the code and data of the process:

All processes running in the system are stored in the kernel as task_struct nodes of a linked list, which turns process management into adding, deleting, modifying and looking up nodes in a linked list.

Add: When an executable program is built, the file is stored on disk. When you run it, the operating system loads its code and data into memory and creates a process: it builds a task_struct that describes the process and inserts it into the doubly linked list.

Delete: Process exit means deleting the task_struct node of the process from the doubly linked list, and the operating system releases the code and data of the process in the memory.
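The add, delete and lookup operations described above can be sketched with a heavily simplified PCB. This is only an illustration: struct pcb and the functions pcb_insert, pcb_remove and pcb_find are hypothetical names, and the real task_struct has far more fields and uses the kernel's own list helpers.

```c
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical, heavily simplified PCB; not the real task_struct. */
struct pcb {
    pid_t pid;          /* identifier */
    char state;         /* state letter, e.g. 'R', 'S', 'Z' */
    int priority;       /* scheduling priority */
    struct pcb *prev;   /* previous process in the list */
    struct pcb *next;   /* next process in the list */
};

/* "Add": allocate a PCB for a new process and link it at the head of the list. */
struct pcb *pcb_insert(struct pcb **head, pid_t pid)
{
    struct pcb *p = calloc(1, sizeof(*p));
    if (p == NULL)
        return NULL;
    p->pid = pid;
    p->state = 'R';
    p->priority = 80;
    p->next = *head;
    if (*head != NULL)
        (*head)->prev = p;
    *head = p;
    return p;
}

/* "Delete": unlink the PCB of an exiting process and free it. */
void pcb_remove(struct pcb **head, struct pcb *p)
{
    if (p->prev != NULL)
        p->prev->next = p->next;
    else
        *head = p->next;
    if (p->next != NULL)
        p->next->prev = p->prev;
    free(p);
}

/* "Check": walk the list to find a process by PID. */
struct pcb *pcb_find(struct pcb *head, pid_t pid)
{
    for (struct pcb *p = head; p != NULL; p = p->next)
        if (p->pid == pid)
            return p;
    return NULL;
}
```

A caller would keep a struct pcb *head initialized to NULL, call pcb_insert when a process is created and pcb_remove when it exits.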

3.task_struct contains content

task_struct is a data structure of the Linux kernel, which is loaded into RAM (memory) and contains process information. So what specific information does task_struct contain?

Identifier : A unique identifier that describes this process and is used to distinguish other processes.
Status : Task status, exit code, exit signal, etc.
Priority : Priority relative to other processes.
Program Counter : The address of the next instruction to be executed in the program.
Memory pointers : including pointers to program code and process-related data, as well as pointers to memory blocks shared with other processes.
Context data : Data in the processor's registers while the process is executing.
I/O status information : includes outstanding I/O requests, I/O devices assigned to the process and a list of files used by the process.
Accounting information : may include total processor time, total clock time used, time limits, account numbers, etc.

There is other information as well. The following explains the specific meaning of the content contained in task_struct.

 3. Detailed explanation of task_struct content

1. Check the process

(1) View through the system directory

/proc is a system directory; you can see it by running ls in the root directory:

Use the

ls /proc

command to view process information. The numeric directory names are PIDs:

To view the information of a single process, for example the process with PID 989, use the command

ls /proc/PID

with PID replaced by the process ID:
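The same check can be made from a program. The following is a minimal sketch, assuming a Linux system with /proc mounted; proc_dir_exists and current_process_has_proc_dir are hypothetical helper names.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the directory /proc/<pid> exists, 0 otherwise. */
int proc_dir_exists(pid_t pid)
{
    char path[64];
    struct stat st;

    snprintf(path, sizeof(path), "/proc/%ld", (long)pid);
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/* The calling process is alive, so its own /proc entry should exist. */
int current_process_has_proc_dir(void)
{
    return proc_dir_exists(getpid());
}
```

Calling proc_dir_exists with the PID of a process that has already exited and been reclaimed is expected to return 0, which matches the folder disappearing from /proc.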

(2) View through ps command 

Use the

ps aux

command to view all processes:

Combined with grep, you can view a specific process:

For example, to view the processes whose entry contains proc, use the following command:

ps aux | head -1 && ps aux | grep proc | grep -v grep

 

(3) View through the top command 

You can also use the

top

command to view processes:

(4) Obtain the process PID and parent process PPID through system calls

  • The process ID functions getpid and getppid

The current process ID and the parent process ID can be obtained with the following functions, where pid_t is a signed integer type (an int on Linux):

#include <sys/types.h>
#include <unistd.h>

pid_t getpid(void);//get the ID of the current process
pid_t getppid(void);//get the ID of the current process's parent
  • Get the current process ID

Get the current process ID, process.c:

#include<sys/types.h>
#include<stdio.h>
#include<unistd.h>

int main()
{
    while(1)
    {
        printf("hello linux!:pid:%d\n",getpid());//get the ID of the current process
        sleep(1);
    }

    return 0;
}

 Makefile:

process:process.c
    gcc -o $@ $?
.PHONY:clean
clean:
    rm -f process

After running, the PID of the current process, which is the process number, is obtained: 

To stop the process, press Ctrl+C. Before doing so, open another window and view the process through ps:

 This also verifies that what getpid obtains is the PID.

  • Get parent process ID

#include<sys/types.h>
#include<stdio.h>
#include<unistd.h>

int main()
{
    while(1)
    {
        printf("hello linux!:pid:%d,ppid:%d\n",getpid(),getppid());
        sleep(1);
    }

    return 0;
}

Checking with the ps command shows that the parent process ID is 11081, and 11081 is bash, so the program is a child process of bash:

This is because running a command is risky: if every command ran inside the command line interpreter itself, an error in the command could bring the interpreter down. bash therefore creates a child process to run each command, which is why the parent process of a command run on the command line is basically always bash.

 Use the following command to view all attribute information inside the process:

ls /proc/<current process ID> -al

When the process exits, its folder under /proc (here /proc/18448) disappears. After pressing Ctrl+C, check again and the folder no longer exists:

2. Status 

The programs written earlier return 0 from main. This 0 is the exit code of the process. The exit code is meant to be read by the parent process, which obtains it through the system. For example, the exit code of the following code is 0:

#include<stdio.h>
int main()
{
    printf("hello linux!\n");
    return 0;
}

Then run

echo $?

and you can see that the process exit code is 0:

If you change the exit code to 99:

Then the exit code after the program runs also becomes 99:

Therefore, $? holds the exit code of the most recently executed command, and the status information in task_struct is where that exit code is kept for the parent process to read.
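How a parent process reads the exit code can be sketched in C. The source does not show this step; the sketch below assumes the standard waitpid call, and child_exit_code is a hypothetical helper name.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits with `code` and returns the exit code
 * the parent reads back, or -1 on error. */
int child_exit_code(int code)
{
    int status = 0;
    pid_t id = fork();

    if (id == -1)
        return -1;
    if (id == 0)
        _exit(code);                /* child: terminate with the given exit code */

    if (waitpid(id, &status, 0) != id)
        return -1;                  /* parent: could not read the child's status */
    if (!WIFEXITED(status))
        return -1;                  /* child did not exit normally */
    return WEXITSTATUS(status);     /* the exit code, as echo $? would show it */
}
```

The shell does essentially the same thing for every command it runs, which is how $? ends up holding the exit code.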

3.Priority

Permission determines whether something may be done at all; priority determines, once it is allowed, when it gets done. It is like ordering food at a restaurant: once you have paid and received a receipt, you are entitled to the meal, but you still have to queue to pick it up. Having the receipt is the permission; your place in the pickup queue is the priority.

4. Program counter 

When the CPU is executing the current instruction of a program, how does it know which instruction comes next? The program counter (pc) stores the address of the next instruction. When the CPU finishes executing the current instruction, the pc advances automatically and the CPU goes straight on to the next instruction.

The code and data of the process can be found through the PCB through the memory pointer in task_struct.

5. Contextual data

The operating system maintains a queue of processes. A process may need a long time to finish, and if the operating system made every other process wait until the current one completed, the others could be left waiting indefinitely, which is unreasonable. So when the operating system actually schedules processes, it allocates execution time in time slices. When a time slice is used up, it switches to the next process. A time slice is the maximum time a process can run in one turn.

For example, with 4 processes and a 10ms time slice, the first process runs for 10ms. When the time is up, even if it has not finished, it is moved from the head of the queue to the tail, then the second process runs for 10ms, and so on. After 40ms, the user perceives that all four processes have advanced. In essence this is achieved by the CPU switching quickly between them.
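The 4-process example can be simulated in a few lines. This is a toy model, not the Linux scheduler: round_robin is a hypothetical function, and remaining[i] stands for the CPU time (in ms) that process i still needs.

```c
/* Runs a round-robin schedule for at most total_ms milliseconds.
 * Each turn gives the current process at most slice_ms of CPU time.
 * Returns the number of time slices handed out. */
int round_robin(int remaining[], int n, int slice_ms, int total_ms)
{
    int elapsed = 0;
    int slices = 0;
    int current = 0;
    int finished_in_a_row = 0;   /* consecutive processes with nothing left to run */

    while (elapsed < total_ms && finished_in_a_row < n) {
        if (remaining[current] > 0) {
            int run = remaining[current] < slice_ms ? remaining[current] : slice_ms;
            if (run > total_ms - elapsed)
                run = total_ms - elapsed;
            remaining[current] -= run;   /* the process advances by `run` ms */
            elapsed += run;
            slices++;
            finished_in_a_row = 0;
        } else {
            finished_in_a_row++;
        }
        current = (current + 1) % n;     /* move on to the next process in the queue */
    }
    return slices;
}
```

With four processes that each need 100ms, a 10ms slice and a 40ms window, every process advances by 10ms and four slices are handed out.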

A process may be scheduled hundreds or thousands of times during its life cycle. Suppose the CPU has 5 registers. When process A is running and its time slice runs out, the temporary data belonging to process A in the CPU registers is saved as A is switched out. Process B is then scheduled. When process A is scheduled again, its saved data is restored to the CPU registers and A continues from the state it was in when it was last switched out. Saving and restoring the context in this way is what lets multiple processes share the CPU by switching.
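Saving and restoring context can be modelled with plain structs. The sketch below is only an illustration of the idea using the 5-register example from the text; struct cpu, struct task and the three functions are hypothetical, and a real kernel does this in architecture-specific code.

```c
#include <string.h>

#define NUM_REGS 5   /* the text's example of a CPU with 5 registers */

struct cpu  { long regs[NUM_REGS]; };        /* the live registers */
struct task { long saved_regs[NUM_REGS]; };  /* per-process saved context */

/* Copy the live registers into the task's saved context. */
void save_context(const struct cpu *cpu, struct task *t)
{
    memcpy(t->saved_regs, cpu->regs, sizeof(cpu->regs));
}

/* Load the task's saved context back into the live registers. */
void restore_context(struct cpu *cpu, const struct task *t)
{
    memcpy(cpu->regs, t->saved_regs, sizeof(cpu->regs));
}

/* Switch from one task to another: save the old context, restore the new one. */
void context_switch(struct cpu *cpu, struct task *from, const struct task *to)
{
    save_context(cpu, from);
    restore_context(cpu, to);
}
```

Switching from A to B and later back to A leaves the registers exactly as A left them, so A cannot tell that it was interrupted.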

6.I/O status information

File operations include functions such as fopen, fclose, fread and fwrite. It is actually the process that operates on the file: the code only does something once the program runs as a process, and it is the process that opens the file and performs the IO. Since IO is performed by processes, the operating system needs to maintain the IO information of each process.

7. Accounting information

Records the software and hardware resources that the process has used over its lifetime.

4. Create a process through system calls

1. Use fork to create a child process

fork is used to create child processes:

#include <unistd.h>
pid_t fork(void);//creates a new process by duplicating the calling process; the new process is the child, the calling process is the parent

Let’s look at a strange piece of code first:

forkProcess_getpid.c 

#include<unistd.h>
#include<stdio.h>

int main()
{
    int ret = fork();

    if(ret > 0)
    {
            printf("I am here\n");
    }
    else
    {
            printf("I am here,too\n");
    }

    sleep(1);
    return 10;
}                    

Logically, the program should print either I am here or I am here,too. But the execution results show that both sentences were printed, that is, both the if branch and the else branch were executed:

 Look at the code again:

#include<stdio.h>
#include<unistd.h>

int main()
{
    int ret = fork();

    while(1)
    {
            printf("I am here,pid = %d,ppid = %d\n",getpid(),getppid());
            sleep(1);
    }

    return 10;
}

The output shows two different pid and ppid pairs:

This shows that the while loop is being executed by two execution flows rather than one. The two IDs in each row are in a parent-child relationship. This is because after fork there are two execution flows, both running the while loop at the same time.

You can see that bash 16202 created child process 16705, and the child process created child process 16706:

2. Understand fork to create child processes

Let’s talk about why both if and else are executed.

Whether a process is started with ./executable, with a command on the command line, or with fork, the operating system creates it in the same way: each one is simply one more process in the system. What differs with fork is where the code and data come from. The parent process has an executable program on disk; when it is run, the corresponding code and data are loaded into memory for execution.

The child process, however, is only created and has no code and data of its own. By default it inherits the code and data of the parent process, and the parent's task_struct is used as a template to initialize the child's task_struct. Therefore, the child process executes the code that follows the fork call in the parent and accesses the parent's data.

 

Summary: When fork creates a child process, there is one more process in the system: one more task_struct describing a process, built with the parent's as the template, plus code and data taken from the parent. Both processes continue after fork, each with its own return value, which is why the code in both if and else is executed. If task_struct is compared to genes, and code and data are compared to a career, then the child process inherits both the genes and the career of the parent process.

3. Data modification after fork 

The code cannot be modified, but what about the data? If the child process and the parent process share data and the parent modifies it, the child sees the modified data, so the parent affects the child. Are the two processes still independent?

When the parent and child only read the data, it is shared. If either process modifies the data, however, the change would affect the other. At that point the operating system, which manages memory as well as processes, steps in. On the write, it allocates a new region of memory, copies that part of the data into it, and lets the writer modify the copy instead of the original. This is called copy-on-write.

Copy-on-write maintains process independence and prevents processes from interfering with each other while running. When a child process is created, it does not copy all of the parent's data, because much of that data may never be written. This avoids making fork slower and wasting memory. It is therefore reasonable to allocate space only when data is written.
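The visible effect of copy-on-write, namely that a write in the child never reaches the parent, can be checked with a short sketch. It assumes the standard waitpid call to wait for the child; shared_value and parent_value_after_child_write are hypothetical names. The sketch shows the independence of the two processes, not the kernel's page copying itself.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_value = 100;   /* global data, inherited by the child at fork */

/* The child overwrites shared_value. Returns the value the parent
 * still sees afterwards, or -1 on error. */
int parent_value_after_child_write(void)
{
    int status = 0;
    pid_t id = fork();

    if (id == -1)
        return -1;
    if (id == 0) {
        shared_value = 200;      /* this write goes to the child's private copy */
        _exit(shared_value == 200 ? 0 : 1);
    }

    if (waitpid(id, &status, 0) != id)
        return -1;
    return shared_value;         /* unchanged in the parent */
}
```

The parent still reads 100 after the child has written 200 and exited, because the write landed in a copy that only the child could see.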

4.The return value of fork

(1) Meaning of fork return value 

After fork creates the child process, the child and the parent are usually made to do different things. How can the two be told apart? By the return value of fork: on success, fork returns the PID of the child process to the parent process and returns 0 to the child process. On failure, it returns -1 to the parent process and no child process is created.

Print the return value of fork:

 forkProcess_getpid.c

#include<stdio.h>
#include<unistd.h>

int main()
{
    pid_t ret = fork();

    while(1)
    {
        printf("Hello forkProcess,pid = %d,ppid = %d,ret = %d\n",getpid(),getppid(),ret);
        sleep(1);
    }

    return 10;
}

 The print result is as follows:

This means:

  • By the time fork is about to return, the child process has already been created.
  • fork has two return values. A function's return value is passed back in a register and then written to the variable that stores it. Both the parent and the child execution flow return from fork, so there are two returns with two different values, and each is written to ret. Whichever flow returns first writes first, and the write triggers copy-on-write, so each process ends up with its own ret.
  • The pid of the child process is returned to the parent because a parent process may have multiple child processes and must identify each one by pid in order to control it. If the child process wants to know the pid of its parent, it can call getppid(). In this way the parent-child relationship is maintained on both sides.
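The points above can be checked in code. The following sketch assumes the standard waitpid call, which the source has not introduced; check_fork_return_values is a hypothetical name.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the two return values of fork behave as described,
 * 0 if not, and -1 if fork itself fails. */
int check_fork_return_values(void)
{
    int status = 0;
    pid_t reaped;
    pid_t parent_pid = getpid();
    pid_t ret = fork();

    if (ret == -1)
        return -1;                  /* fork failed: no child was created */

    if (ret == 0) {
        /* child: fork returned 0, and getppid() names the parent */
        _exit(getppid() == parent_pid ? 0 : 1);
    }

    /* parent: fork returned the child's PID, the same PID waitpid reports */
    reaped = waitpid(ret, &status, 0);
    return reaped == ret && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The child reports its check through its exit code, and the parent combines that with its own check on the PID it got back from fork.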

(2) Let the parent and child processes perform different functions based on the fork return value

The return value is used to separate the parent and the child so that they perform different functions:

#include<stdio.h>
#include<unistd.h>

int main()
{
    pid_t ret = fork();

    //branch on the return value with if else
    if(ret == 0)//child
    {
        while(1)
        {
            printf("I am child, pid = %d,ppid = %d\n",getpid(),getppid());
            sleep(1);
        }
    }
    else if(ret > 0)//parent
    {
        while(1)
        {
            printf("I am parent, pid = %d,ppid = %d\n",getpid(),getppid());
            sleep(3);
        }
    }
    else//fork error
    {
        perror("fork");
        return 1;
    }

    return 0;
}

This allows the parent and child processes to perform different functions. In the code above, the parent process prints every 3 seconds and the child process prints every 1 second:

 You can view the parent process and child process:

 The process is created through fork, and then separated through if else, so that the parent and the child can each execute different code segments to achieve different functions. As for which of the parent and child processes runs first, it is decided by the scheduler.

5. Process status

1. Process status definition

During the entire life of a process, from creation to termination, sometimes it occupies the processor and executes, sometimes it is runnable but has not been allocated a processor, and sometimes a processor is idle but the process cannot execute because it is waiting for an event. This shows that a process, unlike a program, is active and changes state. Its life cycle can be described by a set of states:

 

State definition in kernel source code:

/*
* The task state array is a strange "bitmap" of
* reasons to sleep. Thus "running" is zero, and
* you can test for combinations of others with
* simple bit tests.
*/
static const char * const task_state_array[] = {// a process is also called a task
    "R (running)", /* 0 */
    "S (sleeping)", /* 1 */
    "D (disk sleep)", /* 2 */
    "T (stopped)", /* 4 */
    "t (tracing stop)", /* 8 */
    "X (dead)", /* 16 */
    "Z (zombie)", /* 32 */
};

Processes are classified by their states. The status information of a Linux process is stored in its task_struct.
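The state letter that the kernel keeps for a process can also be read from a program. This is a minimal sketch, assuming a Linux system where /proc/<pid>/stat has the form "pid (comm) state ..."; read_proc_state and current_state are hypothetical helper names.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the state letter of `pid` from /proc/<pid>/stat, or '?' on error. */
char read_proc_state(pid_t pid)
{
    char path[64];
    char buf[512];
    char *p;
    FILE *fp;

    snprintf(path, sizeof(path), "/proc/%ld/stat", (long)pid);
    fp = fopen(path, "r");
    if (fp == NULL)
        return '?';
    if (fgets(buf, sizeof(buf), fp) == NULL) {
        fclose(fp);
        return '?';
    }
    fclose(fp);

    /* The command name may itself contain ')', so search from the right. */
    p = strrchr(buf, ')');
    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];                 /* one of the letters from task_state_array */
}

/* A process that is reading its own state is, by definition, running. */
char current_state(void)
{
    return read_proc_state(getpid());
}
```

Reading the state of the calling process itself gives R, since the process has to be running to perform the read.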

2. Process status classification

 You can use the following two commands to view the current status of the process:

ps aux
ps axj

 Viewed process status:

(1) R-Running status

R (Running) : The process is either running or in the run queue. The R state therefore does not mean that the process is actually executing, and several processes can be in the R state at the same time.

The following code statusType.c:

#include<stdio.h>
int main()
{
    while(1);
    return 0;
}

After running it, it will always be in the running state, and you will find that it is in the R+ state, where + means running in the foreground:

If you add & at the end when running, it will run in the background and become R state:

A process running in the background cannot be stopped with Ctrl+C; it can be killed with kill -9 process ID :

Processes in the running state can be scheduled by the CPU. When the operating system switches processes, it will directly select the R state process in the run queue.

(2) S-light sleep state

S (Sleeping)  : The process is waiting for the completion of an event and can be awakened or killed. The light sleep state is also called interruptible sleep.

For example, the following code:

status.c 

#include<stdio.h>
#include<unistd.h>

int main()
{
    printf("hello linux\n");
    sleep(20);

    return 0;
}

Check the status of the status process within 20 seconds after running and find that it is S+. After executing the kill command, the process is killed:

(3) D-deep sleep state

D (Disk sleep) : The process is waiting for IO and cannot be killed. It recovers only when the IO it is waiting for completes and wakes it up. This is also called the uninterruptible sleep state.

When a process is waiting for IO, such as a write to disk, it is in the deep sleep state: it has to wait for the disk to report whether the write succeeded, so the process will not be killed during that time.

(4) T-stop state

T (Stopped) : A process can be stopped (T) by sending it the SIGSTOP signal. The stopped process can continue running when it is sent the SIGCONT signal.

 The running status process was suspended through the SIGSTOP signal, and the status changed from S+ to T:

It was restored through the SIGCONT signal, and the state changed from T to S:

The kill -l command can list all signals in the operating system, among which 18 is the SIGCONT signal and 19 is the SIGSTOP signal:

Therefore, kill -SIGCONT process number can also be written as kill -18 process number , and kill -SIGSTOP process number can also be written as kill -19 process number .

(5) Z-zombie state

When a process exits, the resources it occupies are not released immediately. Instead, the exit information of the process is kept for a while so that the cause of its death can be identified (a bug in the code, being killed by the operating system, and so on). This data is kept in task_struct for the parent process or the system to read, which is why the zombie state exists.

A zombie process occurs when a child process exits and the parent process does not read the child's exit code. The zombie stays in the process table in a terminated state, waiting for the parent process to read its exit status.

The following code, statusZombie.cc:

#include<iostream>
#include<unistd.h>
using namespace std;

int main()
{
    pid_t id = fork();
    if(id == 0)
    {
        while(1)
        {
            cout << "child is running" << endl;
            sleep(20);
        }
    }
    else
    {
        cout  << "father" << endl;
        sleep(50);
    }
    return 0;
}

 Makefile:

statusZombie:statusZombie.cc
        g++ -o $@ $^
.PHONY:clean
clean:
        rm -f statusZombie

Use the following monitoring script, where process_name is the name of the process to watch:

while :; do ps axj | head -1 && ps axj | grep process_name | grep -v grep;sleep 1;  echo "####################"; done

Use it to monitor the process status. After the program starts, both the parent process and the child process are in the S state:

After the child process is killed, its state changes to Z:

 

Therefore, as long as the child process exits and the parent process is still running, but the parent process does not read the child process status, the child process will enter the Z state.

(6)X-death state

This state exists only momentarily as a return status and cannot be seen in the task list. When a process is finally released, the resources it occupies are freed in an instant, so the dead state cannot be observed.

 3. Hazards of zombie processes

From the zombie state, we know that when a process exits, it waits for the parent process or the system to read its exit code, which identifies the cause of the process's death. This is like the return value 0 of the main function when we write code:

#include<stdio.h>
int main()
{
	//code
	return 0;
}

The return value 0 is to tell the operating system that the code execution has ended successfully. You can use echo $? to get the exit code when the process last exited:
 

 When the child process exits and the parent process is still running, but the parent process does not read the child process's exit information, the child process enters the zombie state.

For example, in the following code zombieProcess.c, the child process exits after printing 5 times. The parent process does not read the exit information of the child process. At this time, the child process becomes a zombie state:

#include<stdio.h>
#include<stdlib.h>
#include<unistd.h>
int main()
{
    pid_t id = fork();
    if(id == 0)//child
    {
        int count = 5;
        while(count)
        {
            printf("child PID:%d,PPID:%d,count:%d\n",getpid(),getppid(),count);
            sleep(1);
            count--;
        }
        printf("child is quiting\n");
        exit(1);
    }
    else if(id >0)//father
    {
        while(1)
        {
            printf("father PID:%d,PPID:%d\n",getpid(),getppid());
            sleep(1);
        }
    }
    else//fork error
    {
        //do nothing
    }
    return 0;
}

 Using the monitoring script, you can see that the status of the child process has become a zombie state:

Hazards of zombie processes:

(1) The exit status of the process must be maintained because it has to tell the parent process the exit information. If the parent process never reads it, then the child process will always be in a zombie state.

(2) The basic information of a process is stored in its task_struct. As long as the parent process does not read the child's exit information, the child stays in the zombie state and its PCB has to be maintained the whole time.

(3) If a parent process creates multiple child processes and does not recycle them, multiple task_struct data structures must be maintained, which will cause a waste of memory resources.

(4) The resources requested by zombie processes cannot be recycled. The more zombie processes there are, the fewer resources are actually available. In other words, zombie processes will cause memory leaks.

6. Orphan process

 In a zombie process, the child process exits first, but the parent process does not read the child process's exit information.

If instead the parent process exits first while the child process is still running, the child no longer has its original parent to read its exit information when it later exits. Such a child process is called an orphan process.

In the following code orphanProcess.c, the parent process terminates and exits after 5 seconds, but the child process does not exit:

#include<stdio.h>
#include<unistd.h>
#include<stdlib.h>
int main()
{
    pid_t id = fork();
    if(id ==0)//child
    {
        while(1)
        {
            printf("child\n");
            sleep(2);
        }
    }
    else//father
    {
        sleep(5);
        printf("father is quiting\n");
        exit(1);//the parent process terminates after 5 seconds
    }
    return 0;
}

Start the monitoring script. After the parent process exits, the child process becomes an orphan process and its PPID becomes 1, that is, process No. 1 becomes the parent of the child process:

What process is process No. 1?

Process No. 1 is the init process, also called the operating system process. When a process becomes an orphan, it is adopted by init process No. 1. When the orphan process later exits and enters the zombie state, it is reclaimed by init process No. 1.

Why is the orphan process adopted by process No. 1?

An orphan process also has to be reclaimed when it exits, so some process must do the reclaiming. Once the orphan is adopted by init process No. 1, init can reclaim it.
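Adoption can be observed from code by letting the orphan report its new parent. The sketch below uses a pipe purely as a way to pass the result back, which the source does not cover; adopter_of_orphan is a hypothetical name. On some systems the adopter is a per-session reaper process rather than PID 1, so the sketch returns whatever PID the orphan sees.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Creates an orphan and returns the PID of the process that adopted it,
 * or -1 on error. */
pid_t adopter_of_orphan(void)
{
    int fds[2];
    pid_t new_ppid = -1;
    pid_t mid;

    if (pipe(fds) == -1)
        return -1;

    mid = fork();
    if (mid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (mid == 0) {                      /* intermediate parent */
        pid_t me = getpid();
        pid_t child = fork();
        if (child == 0) {                /* grandchild: about to be orphaned */
            pid_t p;
            close(fds[0]);
            while (getppid() == me)      /* wait until the parent is gone */
                usleep(1000);
            p = getppid();               /* the adopter */
            if (write(fds[1], &p, sizeof(p)) != (ssize_t)sizeof(p))
                _exit(1);
            _exit(0);
        }
        _exit(0);                        /* the parent exits first */
    }

    close(fds[1]);
    waitpid(mid, NULL, 0);               /* reclaim the intermediate parent */
    if (read(fds[0], &new_ppid, sizeof(new_ppid)) != (ssize_t)sizeof(new_ppid))
        new_ppid = -1;
    close(fds[0]);
    return new_ppid;
}
```

The orphan itself is never waited for by the caller; when it exits, its adopter reclaims it, which is exactly the reason for adoption given above.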

7. Process priority

1. Concept

The priority of a process determines the order in which CPU resources are allocated to it. Processes with higher priority get to execute first.

There are some other concepts:

  • Competition: There are many processes in the system but only a few CPUs, or even just one, so processes compete with each other. Priorities exist so that tasks complete efficiently and the competition for resources is resolved reasonably.
  • Independence: Each process running in a multi-process system has its own resources, and the processes do not interfere with each other.
  • Parallelism: Multiple processes run on multiple CPUs at the same time. This is called parallelism.
  • Concurrency: Multiple processes advance within the same period of time by switching on a single CPU. This is called concurrency.

2. Why should we have process priority?

 Because CPU resources are limited, a CPU can only run one process at the same time. When there are multiple processes in the system, process priority is needed to determine the process's ability to obtain CPU resources.

In addition, configuring process priorities is very useful on Linux in a multitasking environment and can improve system performance. A process can also be bound to a specified CPU; arranging unimportant processes on one particular CPU in this way can greatly improve the overall performance of the system.

3. View system processes

Use the

ps -l

command to view system processes:

The output includes:

  • UID: the identity of the executor, indicating who started the process
  • PID: the ID of this process
  • PPID: the process this process was derived from, that is, the ID of the parent process
  • PRI: the priority at which this process can be executed. The smaller the value, the earlier it is executed.
  • NI: the nice value of this process

4. PRI and NI

  • PRI is the priority of the process, that is, the order in which processes are executed by the CPU. The smaller the value, the higher the priority of the process.
  • NI is the nice value, the correction applied to the priority at which the process can be executed.
  • The smaller the PRI value, the sooner the process is executed. With the nice value added, PRI becomes: PRI(new)=PRI(old)+nice
  • When the nice value is negative, the PRI value of the process becomes smaller, that is, its priority becomes higher and it is executed sooner.
  • Adjusting the process priority under Linux means adjusting the nice value of the process.
  • The value range of nice is -20 to 19, a total of 40 levels.

Note:  The nice value is not the priority of the process. It is the correction data of the process priority, which will affect the priority change of the process.

5. Use the top command to change the process priority

(1) Change NI value 

First run a process, then use

ps -l

to check its PID, priority, and NI value. For example, run ./forkProcess_getpid:

You can see that the priority is 80 and the NI value is 0:

Run the top command and press r. top prompts "PID to renice". Enter the PID, 5255 here, and then enter the new NI value, 10 here:

Check the priority and NI value of the process again. The priority has become 90 and the NI value has become 10:

Both the priority and the NI value have changed, which confirms the formula:

PRI(new) = PRI(old)+nice

PRI(old) is generally 80, which is why every process shown by the ps -al command has a PRI of 80 before its NI value is modified.

(2) The value range of NI 

Now verify the value range of NI (nice). Set the NI value to 100:

Check the priority and NI value of the process again. The NI value has become 19 and the priority value has increased by 19:

This shows that the upper limit of NI is 19. What about the lower limit? At this point the PID of the process is 12452.

Change the NI value to -100:

The NI value has become -20 and the priority value has decreased by 20:

This shows that the value range of NI is -20 to 19, 40 levels in total.

(3) The reason why the value range of NI is small

Because however priorities are set, a priority can only be relative, never absolute. Otherwise serious process "starvation" would occur, in which a process cannot obtain CPU resources for a long time. The scheduler needs to let every process share CPU resources fairly evenly.

8. Environment variables

1. Concept

Environment variables are parameters that specify the operating environment of the operating system. For example, when linking C/C++ code we never say where the dynamic and static libraries are located, yet linking still succeeds and produces an executable program. The reason is that relevant environment variables help the compiler find them.

2. Common environment variables

  • PATH: Specify the search path for the command
  • HOME: Specify the user's home working directory (that is, the default directory when the user logs in to the Linux system)
  • SHELL: The current Shell, its value is usually /bin/bash.

 

3. How to view environment variables 

When we run an executable program, we need to add ./ in front of the executable program to execute it:

But when executing system commands, why don't we need to add ./ in front? 

Commands, programs, and tools are all essentially executable files. The purpose of ./ is to tell the system where the program is located. System commands do not need ./ because the PATH environment variable tells the system where to look for them.

How to view environment variables:

echo $PATH

The system searches for commands through PATH. The rule is: look in the first path in PATH; if the command is not there, look in the second path; if it is still not found, look in the third path, and so on. As soon as the command is found, the search stops and the program in that path is run. In other words, when a command is executed, the corresponding executable program is located through the PATH environment variable.

There are two ways to run forkProcess without ./, just like a system command:

  • Copy forkProcess to any of the 6 paths above. This approach is not recommended because it pollutes the command directories.
  • Add the directory that contains the program to the PATH environment variable

Usually, installing software means copying it into one of the command paths listed in PATH. Installation is essentially a copying process.

You cannot simply assign the current path to PATH, otherwise the 6 paths above would be lost. Instead, append to PATH with export:

export PATH=$PATH:/path/to/program

Find the path to forkProcess:

 Add environment variables:

Now the executable program can be executed in other paths, such as in the home directory:

4. Commands related to environment variables

In essence, an environment variable is space that the operating system sets aside in memory or on disk to hold system-related data. Defining a variable in a language likewise means setting aside memory to store a key and a value, that is, the variable name and its data.

  • echo: Display the value of an environment variable
  • export: Set a new environment variable
  • env: display all environment variables
  • set: Display locally defined shell variables and environment variables
  • unset: clear environment variables

Use echo to display the value of a variable:

export sets a new environment variable, as was done above:

env displays all environment variables:

set displays locally defined shell variables and environment variables:

 unset clears environment variables:

5. How environment variables are organized

(1) Environmental table 

Each process receives an environment table when it starts. The environment table is the collection of environment variables that records the environment information related to the current process.

The environment table is stored as an array of character pointers. The global variable char **environ holds the address of the first element of the table, and a NULL pointer marks the end of the table:

When writing C code in the past, the main function could take 2 parameters:

#include<stdio.h>

int main(int argc,char *argv[])
{
    return 0;
}

The second parameter argv is an array of pointers with argc elements. Each element points to one string from the command line, and argc is the number of those strings. You can print the command line arguments one by one:

#include<stdio.h>

int main(int argc,char *argv[])
{
	int i = 0;
	for(i = 0;i<argc;i++)
	{
		printf("argv[%d] = %s\n",i,argv[i]);
	}
	return 0;
}

 Run the command line with parameters:

The number of elements in the argument array changes dynamically: it matches the number of arguments given on the command line:

The items typed on the command line are passed to the main function, stored one by one in argv, and counted by argc.

The array ends with NULL, so could we do without argc? No, for two reasons:

  • When an array is passed as a parameter, it is generally recommended to pass its length as well
  • The user types the arguments on the command line. If you want to restrict the number of arguments the user enters, you need argc, for example:
    if(argc != 5)
    {
        //TODO
    }

Command line arguments let the same program behave differently depending on the arguments it is given, for example:

 

Implement a program that prints hello linux when the argument is -l or -n:

inputPara.c

#include<stdio.h>
#include<string.h>
#include<unistd.h>
int main(int argc,char *argv[])
{

    if(argc != 2)//argc is not 2: the program name plus exactly one argument is required
    {
        printf("Usage: %s -[l|n]\n",argv[0]);
        return 1;
    }
    if(strcmp(argv[1],"-l") == 0)//the argument is -l
    {
        printf("hello linux! -l\n");
    }
    else if(strcmp(argv[1],"-n") == 0)//the argument is -n
    {
        printf("hello linux -n\n");
    }
    else
    {
        printf("hello\n");
    }

    return 0;
}

Entering different parameters will result in different execution results:

Command line arguments matter because a command can offer many options, each selecting a different sub-function of the same command. Those options are implemented with command line arguments.

 If the function has no parameters, you can use the variable parameter list to obtain them.

(2) Get environment variables

  • Use getenv to get environment variables
#include <stdlib.h>

char *getenv(const char *name);

Get the three environment variables PATH, HOME, and SHELL:

#include<stdio.h>
#include<stdlib.h>

int main()
{
    printf("PATH:%s\n",getenv("PATH"));
    printf("HOME:%s\n",getenv("HOME"));
    printf("SHELL:%s\n",getenv("SHELL"));

    return 0;
}

 as follows:

  • Use the third parameter of main to get environment variables

The main function can take a third parameter, env, through which the environment variables can be obtained:

env1.c 

#include<stdio.h>

int main(int argc,char *argv[],char *env[])
{
    int i = 0;
    for(; env[i];i++)
    {
        printf("%s\n",env[i]);
    }

    return 0;
}

 The result is as follows:

  • Use the global variable environ

The global variable environ defined in libc points to the environment table. environ is generally not declared by a header file by default, so it must be declared with extern before use.

#include <stdio.h>
int main(int argc, char *argv[])
{
	extern char **environ;
	int i = 0;
	for(; environ[i]; i++){
		printf("%s\n", environ[i]);
	}
	return 0;
}

 The result is as follows:

6. Global properties of environment variables

Environment variables usually have global properties and can be inherited by child processes.

The following code:

geteEnvironment.c

#include<stdio.h>
#include<sys/types.h>
#include<unistd.h>

int main()
{
    printf("pid = %d,ppid = %d\n",getpid(),getppid());
    return 0;
}

Each time the program runs, its own PID is different, but its PPID stays the same.

The parent of every process started from the command line is bash. bash reads its environment variables from the system configuration: when a user logs in, bash imports the system configuration into its own context. The environment variables of a child process therefore come from its parent process, bash. Once a variable is exported, it can affect child processes.

Environment variables have global properties because they can be inherited. For example, bash creates a child process, and that child creates further child processes; an environment variable set in bash is therefore visible to every process descended from it, and any of them can use it for lookup tasks. gcc and gdb can find the various libraries because they are commands run as child processes of bash, so they inherit the settings bash holds for library paths, header file paths, and so on. In essence, because environment variables guide the compilation tools in their searches, you do not need to pass many options when compiling a program: the defaults are found automatically, so the program can be compiled and debugged quickly.

7. Local variables

In contrast to environment variables, there are local variables. They are valid only for the current process of the current user; they are temporary and disappear after logging out.

As shown below, the variable value is set to 5 and echo $value prints 5 before logging out. Then log out with Ctrl+D and log in again.

Run echo $value again: the variable no longer exists.

Can local variables be inherited by child processes? Checking with env shows that value is not in the environment of the shell:

This means that local variables cannot be inherited and can be used only by bash itself.

Now try to read this local variable with getenv:

getLocalValue.c

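#include<stdio.h>
#include<stdlib.h>

int main()
{
    // getenv returns a string, or NULL when the variable is not in the environment
    const char *value = getenv("value");
    printf("value = %s\n",value ? value : "(null)");
    return 0;
}

After running, the program prints (null): getenv finds nothing, which shows that the value variable just defined is a local variable.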

Use export to turn value into an environment variable. It is actually exported into the environment table of the parent process, bash:

Checking with env now shows that value is in the environment of the shell:

This shows that the variable has been added to the environment of the parent process, bash. When ./getLocalValue runs, it inherits its environment variables from the parent process, which now has one more variable, so the lookup succeeds.

9. Program address space

1. Program address space distribution 

C/C++ program address space:

 

So is the C/C++ program address space actual memory? The following code helps check:

printfProcessAddress.c

#include<stdio.h>
#include<string.h>
#include<stdlib.h>

int g_UnValue;
int g_Value = 1;

int main()
{
    const char *string = "hello world";
    char *heap = (char*)malloc(10);
    int a = 5;

    printf("code address:%p\n",main);//code area

    printf("read only string:%p\n",string);//string constant area
    printf("stack address:%p\n",&string);//stack area

    printf("uninit address:%p\n",&g_UnValue);//uninitialized global variable area
    printf("Init address:%p\n",&g_Value);//initialized global variable area

    printf("heap address:%p\n",heap);//heap area
    printf("stack address:%p\n",&heap);//stack area
    
    printf("stack a:%p\n",&a);//stack area

    return 0;
}

After running it, the output shows:

(1) The code area address 0x40057d is the smallest, so the code area is at the bottom of the program address space;

(2) The string constant area, 0x400710, comes next;

(3) The initialized global variable area, 0x60103c, comes after that;

(4) The uninitialized global variable area, 0x601044, follows;

(5) The heap area is at 0x17e4010, followed by 0x17e4030. The two addresses increase in sequence, which shows that the heap grows upward;

(6) The stack addresses are the largest, and the three stack addresses decrease in sequence:

The high address is printed first and the low address last, which shows that the stack grows downward.

The above reproduces the address distribution of the program address space.

2. The program address space is a virtual address

Let’s first look at the following piece of code. The child process modifies the value of the global variable during operation:

printfFork.c 

#include<stdio.h>
#include<string.h>
#include<unistd.h>

int g_Value = 1;

int main()
{
    //after copy-on-write, the parent and the child each have a private copy of the data
    if(fork() == 0)//child process
    {
        int count = 5;
        while(count)
        {
            printf("child,times:%d,g_Value = %d,&g_Value = %p\n",count,g_Value,&g_Value);
            count--;
            sleep(1);
            if(count == 3)
            {
                printf("############child starts modifying data############\n");
                g_Value = 5;
                printf("############child finished modifying data############\n");
            }
        }
    }
    else//parent process
    {
        while(1)
        {
            printf("father:g_Value = %d,&g_Value = %p\n",g_Value,&g_Value);
            sleep(1);
        }
    }

    return 0;
}

But the output shows that the same address holds different g_Value values:

If the parent and the child were accessing the same physical address, how could the g_Value they read be different? Therefore, the addresses in the program address space are not physical addresses but virtual addresses.

All addresses used in C/C++ are virtual addresses. The operating system does not expose physical memory to users. Physical addresses are managed by the operating system, which is responsible for converting virtual addresses into physical addresses. When the computer first starts, the operating system is not yet loaded, so the computer can only access physical memory. After the operating system starts, the CPU runs normally and works with virtual addresses.

Therefore, the program address space distribution diagram drawn above is not a physical address, but a process virtual address space. 

3.Virtual address

The process address space is essentially a data structure inside the operating system. The operating system lets each process believe that it has exclusive use of the system's memory: each process thinks it owns the whole 4GB of space (the size of a 32-bit address space).

(1)mm_struct 

When a process is created, the task_struct structure of the process contains a pointer to the mm_struct structure, which is used to describe the process virtual address space, that is, the space seen by the user. mm_struct contains the loaded executable image information and the page table directory pointer pgd of the process. The virtual address is mapped to the actual physical address through the page table:

 

Each process thinks that mm_struct represents the entire memory address space. The address space is divided into regions, each described by a start and an end address. Because the addresses within a region are linearly contiguous, every address between start and end belongs to that region. start and end work like array subscripts, and each subscript corresponds to a virtual address. The addresses seen through task_struct are not physical addresses but virtual addresses.

Each process has only one virtual space, and this virtual space can be shared by other processes.

So what is the role of virtual addresses?

Virtual addresses are essentially an illusion that the operating system creates in software, letting each process believe that it has exclusive use of resources. Whatever the illusion, the process must ultimately be able to access the data at an address and read and execute its code.

(2) Page table and MMU 

The page table is a data structure that records the correspondence between pages and page frames. It is essentially a mapping table, which adds permission management, isolates the address space, and can convert virtual addresses into physical addresses. The operating system maintains a page table for each process.

The MMU (Memory Management Unit) is the hardware that translates virtual addresses into physical addresses with the help of the page table. The MMU is generally integrated into the CPU.

Therefore, the addresses in every area of the process, including the code area, initialized area, uninitialized area, heap area, stack area, and shared area, are all virtual addresses. They are mapped to the corresponding physical addresses through the page table and MMU, and only then can the process access its code and data.

 

 

(3) Reasons for the existence of process address space 

Is it okay if the process directly accesses the memory? Why do we need to map it in the middle?

This is because adding an intermediate layer is beneficial to management and prevents illegal behavior of the process. If a process is allowed to directly access physical memory, it can access its own code and data, but this process may also access and modify the code and data of other processes. Some malicious processes may even access and manipulate the code and data of other processes through illegal pointers. This can cause serious problems and may threaten system security.

 

 

Reasons for the existence of process address space: 

① Adding a software layer makes it possible to manage the risk of processes operating on memory (permission management). The essential purpose is to protect physical memory and the data of each process.

Adding a page table between the virtual address and physical memory is equivalent to adding a software layer. This layer acts on behalf of the operating system: when the page table and MMU perform a mapping, it is really the operating system that is mapping. Whether an address can be mapped is decided by the operating system, and this is what makes permission management possible. For example:

const char *str = "spring";
*str = "summer";//error: not allowed

Modifying the data that str points to is not allowed. str itself is a local variable on the stack, but "spring" is in the character constant area and cannot be modified, because the operating system grants only r (read) permission there. This is why neither the contents of the code area nor the contents of the character constant area can be modified: the page table carries permissions, and the permission assigned to the code area and the character constant area is r. str holds a virtual address, so a write through *str also goes to a virtual address, which the operating system must translate into a physical address. Since the permission on that page is r, the write is refused and the process crashes.

 

② Requesting memory and using memory are clearly separated in time. The virtual address space hides the underlying process of requesting memory, so that a process's reads and writes of memory are separated at the software level from the operating system's memory management, decoupling applications from memory management.

If a process requests 1000 bytes, will it use these 1000 bytes immediately? Not necessarily: it may use only part of them for now, or none at all. From the perspective of the operating system, handing 1000 bytes of physical memory to the process immediately means that space which others could have used right away sits idle. Therefore, the operating system does not allocate the 1000 bytes of physical memory at once; it only approves the request in the process's virtual address space. When the process actually starts to access the 1000 bytes, the operating system allocates the physical memory and establishes the mapping between that physical memory and the virtual space the process requested. All of this is transparent to the process.

 

③ From the perspective of the CPU and the application layer, every process can be regarded as using the same 4GB space, with the relative position of each region fixed. The purpose is to let each process believe that it has exclusive use of system resources.

The code and data of a program must be loaded into physical memory, and the operating system needs to know where the main function is. The physical starting address of main may differ from process to process; if the CPU had to look for main at a different physical address for each process, that would be very troublesome. With the process address space, the physical starting address of main can be mapped through the page table and MMU to the same virtual address in every process. To run another process, the physical address of that process's main is mapped to that same virtual address. In this way the CPU always starts reading from the same virtual starting position and finds the entry of main for every process.

 

In addition, code and data may be scattered in physical memory. When the page table maps the code area, the initialized global data area, the uninitialized global data area, and so on into the virtual address space, they can be mapped to contiguous regions, forming a linear address space.

4. Copy on write

In printfFork.c, the child process runs for 5 seconds and changes the value of g_Value at the 3rd second. The child therefore prints 1 twice and then prints 5 from then on, always at the same address, while the parent process keeps printing 1. This is because each process has its own page table, and the address is a virtual address rather than a physical address.

When the program first starts running, there is only one process, the parent. The PCB of the parent process points to the parent's address space. At the point where the global variable is defined, fork has not yet been called, so there is no child process. g_Value is the global variable defined in the initialized area, and the page table maps it to g_Value in physical memory.

When fork creates a child process, a new PCB, address space, and page table are created for the child using the parent as a template. The child inherits most of the parent's contents, including the address space, so the child's address space and page table are initially the same as the parent's. Right after creation, the child therefore also points to the parent's g_Value:

At the 3rd second, the child process modifies g_Value. The operating system does not let the child change the shared value directly, because processes are independent and must not interfere with each other. Copy-on-write takes place on modification: a new piece of physical memory is allocated for the child, the value of g_Value is copied into it, and the child's mapping from the virtual address to the physical address is re-established:

Therefore, the child and the parent print the same address because their virtual addresses are the same. The values differ because they are different variables in physical memory.


Origin blog.csdn.net/gx714433461/article/details/128102804