Linux process summary

 


Preface

Confucius said that one who reviews the old and thereby learns the new is fit to be a teacher. This article connects the fragments of knowledge I have picked up about Linux processes into a systematic summary. On the one hand, this deepens my own understanding; on the other, it may be of some help to others. If you spot mistakes or omissions, please point them out. Thank you very much!



1. Definition of process:

1. A typical process is defined as follows:

1) A process is an execution of a program.

2) A process is an activity that occurs when a program and its data are executed sequentially on the processor.

3) A process is one run of a program with independent functionality over a data set. It is an independent unit of resource allocation and scheduling in the operating system (the basic unit of resource allocation).

2. Process definition in a traditional OS: A process is the execution of a process entity, and an independent unit to which the system allocates resources and which it schedules.

3. Expand the concept:

1) Process Control Block (PCB): To let each program run independently, the operating system maintains a dedicated data structure for it, called the process control block. The system uses the PCB to describe the basic state and activity of the process, and thereby to control and manage it. (In Linux the PCB is the structure task_struct, declared in the kernel header sched.h.)

2) Process entity: consists of three parts: the program segment, the related data segment, and the PCB. It is also known as the process image. Creating a process essentially means creating the PCB of the process entity; terminating a process essentially means destroying its PCB.

3) Code segment: stores code, global constants (const), and string constants.

4) Data segment: stores global variables and static variables (both global and local statics), whether initialized or not. Uninitialized globals and statics are placed in the BSS segment, which is treated here as part of the data segment.

5) Heap: the area used for dynamically allocated memory.

6) Stack: stores local variables (initialized or not, but excluding static variables) and local constants (const).

4. In the Linux kernel, tasks and processes are the same concept. The four elements of a process are summarized as follows:

(1) There is a program to execute. It need not be exclusive to one process and can be shared with other processes;

(2) The process has a dedicated system stack space;

(3) The process control block, the task_struct structure, is registered in the kernel, so that the process becomes a basic unit the kernel can schedule; it also records the resources held by the process;

(4) In addition to the dedicated system stack space, the process must have its own independent storage space, that is, a dedicated user space. Note that within system space the process can change only its own system stack, not the rest of system space, which is shared rather than private. The structures used for virtual memory management, such as mm_struct with its subordinate vm_area structures, the page directory, and the page tables, are resources subordinate to task_struct;

2. Characteristics of the process:

1) Dynamic: A process is in essence the execution of a process entity, so dynamism is its most basic feature. It comes into being through creation, executes when scheduled, and ends with termination.

2) Concurrency: Multiple process entities coexist in memory and can run during the same time period. The purpose of introducing processes is precisely to let a process entity execute concurrently with other process entities.

3) Independence: It means that the process entity is a basic unit that can run independently, obtain resources independently, and accept scheduling independently.

4) Asynchrony: Refers to the process running in an asynchronous manner, that is, advancing at an independent and unpredictable speed. In order to ensure that the results of concurrent execution of processes are predictable, the OS has introduced a process synchronization mechanism.

3. Basic process states and transitions:

1. Three basic states of the process:

1) Ready state: refers to the state where the process is ready to run, that is, after the process is allocated all necessary resources except the CPU, it can be executed immediately as long as it obtains the CPU. If there are many processes in the ready state in the system, they are usually arranged in a queue according to a certain strategy (such as a priority strategy), which is called the ready queue.

2) Running state: refers to the state in which the process has obtained the CPU and its program is executing.

3) Blocked state: the state of a running process that temporarily cannot continue because of some event (such as an I/O request or a failed buffer allocation). This triggers process scheduling: the OS gives the CPU to another ready process and leaves the blocked process suspended. This suspended state is generally called the blocked state. The system arranges blocked processes into a queue, called the blocking queue. In larger systems, to reduce the overhead of queue operations and improve efficiency, several blocking queues are set up, one per blocking reason.

2. Three state transitions:

    Processes often change state while they run. As shown in the figure below, a process in the ready state can execute once the scheduler allocates the CPU to it, so its state changes from ready to running. If the running process uses up its time slice and is deprived of the CPU, its execution is suspended and its state changes from running to ready. If an event prevents the running process from continuing (for example, it tries to access a critical resource that another process is using), its state changes from running to blocked. When the process obtains the critical resource it was waiting for, its state changes from blocked to ready.

       Supplement: Two other common states are the created state and the terminated state. They are not discussed here.

 

4. Process control:

    Process control is the most basic function of process management. It mainly covers creating new processes, terminating finished processes, moving a process that cannot continue because of some exceptional event into the blocked state, and handling state transitions while processes run.

1. Process creation:

       Process creation process:

       1) Apply for PID for the new process and request a blank PCB from the PCB collection;

       2) Allocate the resources needed for the new process to run, such as memory, files, I/O devices, CPU time, etc.

       3) Initialize the PCB;

       4) If the process ready queue can accommodate the new process, insert the new process into the ready queue.

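       There are three ways to create a process in Linux: fork, vfork, and clone. In the kernel, the corresponding entry points sys_fork(), sys_vfork(), and sys_clone() all call do_fork() to do the actual creation work, differing only in the parameters they pass. (Newer kernels have renamed do_fork(); since Linux 5.10 the function is called kernel_clone().)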

2. fork()

      1) Function prototype: pid_t fork(void);

      2) Role: Create process;

      3) Header file: #include <unistd.h>

      4) Return value: return the process ID in the parent process, return 0 in the child process, return -1 on error;

      Features of fork() function:

      1) The new process created by fork() is called the child process, and the call returns twice, the process ID is returned in the parent process, and 0 is returned in the child process;

      2) The child process is a copy of the parent process, that is, the child process obtains a copy of the data space, heap, and stack of the parent process. The parent and child processes share the text segment (the part of the machine instructions executed by the CPU).

      3) The child process starts execution from the point where fork() returns, not from the beginning of the program;

      4) Most current operating systems use a copy-on-write (COW) strategy: the data space, heap, and stack of the parent are not copied in full when the child is created. Instead, parent and child share these regions, and the kernel marks them read-only. If either process tries to modify such a region, the kernel copies only the affected piece of memory, usually one "page" of the virtual memory system.

      5) Examples:

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
        pid_t pid;
        int count = 0;

        pid = fork();

        /* Both processes execute this line, each on its own copy of count. */
        count++;

        if (pid > 0)
        {
                printf("This is father process pid = %d, count is: %d (%p).\n", (int)getpid(), count, (void *)&count);
        }
        else if (pid == 0)
        {
                printf("This is child process pid = %d, count is: %d (%p).\n", (int)getpid(), ++count, (void *)&count);
        }
        else
        {
                perror("fork");
                return 1;
        }

        return 0;
}

       Output:

       /media/ext/home$ ./fork
      This is father process pid = 12105, count is: 1 (0x7fffdbfaae68).
      This is child process pid = 12106, count is: 2 (0x7fffdbfaae68).

  • The output shows that the parent and child processes have different PIDs, and that the child received its own copy of the stack and data.
  • The child changes the value of count, but count in the parent is unaffected.
  • The (virtual) address of count is the same in the child and the parent; note that the kernel maps it to different physical addresses in the two processes.

      Two application scenarios of fork() function:

     1) A parent process wants to duplicate itself so that parent and child can execute different sections of code at the same time. This is common in network servers: the parent waits for a service request from a client; when a request arrives, the parent calls fork and lets the child handle the request, while the parent goes back to waiting for the next request.

     2) A process needs to execute a different program. This is a common situation for shells. In this case, the child process calls exec immediately after returning from fork.

3. vfork()

      1) Function prototype: pid_t vfork(void);

      2) Role: Create a new process and suspend the parent process until the child calls exec or exits;

      3) Header file: #include <sys/types.h> #include <unistd.h>

      4) Return value: the process ID is returned in the parent process, 0 is returned in the child process, and -1 is returned on error; pid_t is a signed integer type;

      Features of vfork() function:

      1) vfork() is used to create a new process, and the purpose of the process is to exec a new program;

      2) vfork guarantees that the child runs first; the parent is scheduled again only after the child calls exec or exit. (If the child depends on further action by the parent before calling one of these two functions, the result is a deadlock.)

      3) Until it calls exec or exit, the child created by vfork runs in the address space of the parent. The space is shared directly rather than copied, so copy-on-write does not apply and the child's writes are visible to the parent.

      4) The child created by vfork must not use return to return to the caller. It should leave with _exit(); exit() is risky because it flushes and closes standard I/O streams whose state is shared with the parent.

4. Examples:

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Demonstration only: changing variables and calling printf() in a vfork()
 * child is undefined behavior under POSIX. It is done here to show that the
 * child runs in the parent's address space. */
int main(void)
{
        pid_t pid;
        int count = 0;

        pid = vfork();

        count++;

        if (pid > 0)
        {
                printf("This is father process pid = %d, count is: %d (%p).\n", (int)getpid(), ++count, (void *)&count);
        }
        else if (pid == 0)
        {
                printf("This is child process pid = %d, count is: %d (%p).\n", (int)getpid(), ++count, (void *)&count);

                /* Call an exec function; on success it does not return. */
                execl("/bin/ls", "ls", "/media/ext/home/fork", (char *)NULL);

                /* Reached only if execl() fails. */
                printf("This is child process pid = %d!\n", (int)pid);
                printf("This is child process pid = %d, count is: %d (%p).\n", (int)getpid(), ++count, (void *)&count);
                _exit(127);
        }
        else
        {
                perror("vfork");
                return 1;
        }

        return 0;
}

      Output:

       /media/ext/home/fork$ ./myvfork
      This is child process pid = 12285, count is: 2 (0x7ffeececf2c8).
      This is father process pid = 12284, count is: 4 (0x7ffeececf2c8).
       /media/ext/home/fork$ fork  fork.c  myvfork  vfork1.c

      /media/ext/home/fork$

      1) vfork suspends the parent process until the child calls exit or an exec function such as execl; only then does the parent resume;

      2) The output shows that the child created by vfork shares the parent's count variable: both refer to the same memory, so when the child modifies count, the parent's count changes too. The child increments count twice and prints 2; when the child calls execl, the parent wakes up, increments count twice more, and prints 4;

     3) After the child calls execl, the two printf calls that follow it print nothing, which shows that the program's code segment has been replaced entirely: the original code after execl is overwritten. The exec family of functions will be analyzed later.

     4) Note: vfork is intended for starting a new program. After vfork() returns, the child runs directly on the parent's stack and uses the parent's memory and data, so the child can corrupt the parent's data structures or stack and cause a failure. To avoid this, once vfork() has been called the child must not return from the current stack frame, and it must not call exit() after changing the parent's data structures. The child must also avoid changing global data structures or global variables, because such changes may leave the parent unable to continue. In general, if an application does not call exec() immediately after fork(), check it carefully before replacing fork() with vfork().

 

5. exit()

       1) Definition:

       Header file: #include <stdlib.h>

       Function prototype: void exit(int status);

       Function: used to terminate a process normally.

       Parameter: status is an integer parameter, indicating the termination status;

      The exit family has three functions that terminate a process normally: _exit and _Exit enter the kernel immediately, while exit first performs cleanup (it calls the registered termination handlers, closes all standard I/O streams, and so on) and then enters the kernel.

      #include <stdlib.h>

      void exit(int status);

      void _Exit(int status);

      #include <unistd.h>

      void _exit(int status);

      2) exit and _exit:

      exit() performs the following steps before ending the calling process:
      a) Call the functions registered with atexit() (the exit handlers), in the reverse order of their registration. This lets us specify our own cleanup actions to run when the program terminates, for example saving program state to a file or releasing a lock on a shared database.

      b) Clean up standard I/O: flush and close all open streams, so that all buffered output is written, and delete all temporary files created with the tmpfile function.

      c) Finally, call the _exit() function to terminate the process.

      _exit() function:

      Stop the process directly, clear the memory space it uses, and destroy various data structures in the kernel;

      About the data in the buffer:

  •        The standard C library on Linux includes a set of functions known as standard I/O, also called buffered I/O. The familiar printf(), fopen(), fread(), and fwrite() all belong to it. Its defining feature is that each open file has a buffer in memory. A read fetches more data than was asked for, so that the next read can be served directly from the buffer. A write goes only to the buffer, and the buffer is written to the file in one go when some condition is met (the buffer fills up, or a particular character such as a newline is written). This greatly speeds up file reads and writes, but it also complicates programming a little: data that we believe has been written to the file may in fact still be sitting in the buffer because the condition has not been met. If we then end the process with _exit(), the data in the buffer is lost. To preserve the data, use exit() instead.

      In the child branch after fork(), calling exit() is normally incorrect, because it can cause the stdio (standard input/output) buffers to be flushed twice and temporary files to be deleted unexpectedly (temporary files are created by the tmpfile function in the system temporary directory, with file names generated by the system).

  •       A basic rule that fits most situations: exit() is called only once for each entry into main.

      PS: The above paragraph is a paragraph from another blogger's blog post. The link is as follows: https://blog.csdn.net/drdairen/article/details/51896141

      Using exit() ensures that the data buffered by the calling process is written out to its target files, and it should not disturb the standard I/O buffers of other processes. Whether deleting the temporary files created with tmpfile can affect other processes is something I have not yet worked out; advice from experts is welcome.

      Advanced Programming in the UNIX Environment discusses process termination at length, and it is a daunting read. Functions we use every day turn out to be far from simple once studied in depth. Thanks again to everyone online for their guidance.

      3) exit and return:

  •  Returning an integer value from main is equivalent to calling exit with that value. So in main, exit(0); is equivalent to return 0;

  •  exit is a function that takes a parameter: void exit(int status). After exit runs, control passes to the system. return ends the current function and passes control back to the calling function. Inside main, return naturally ends the current process; elsewhere it returns to the caller one level up.

  • return works at the language level and unwinds one frame of the call stack; exit works at the process level (it ends in the _exit system call) and terminates the process.

  • After exit() has done its cleanup (running the termination handlers, flushing output streams, and closing all open streams), it calls _exit and leaves directly, without unwinding the stack. return pops the stack frame and goes back to the calling function. This difference is critical when using vfork.

      4) Examples:

      

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
        pid_t pid;
        int count = 0;

        pid = fork();

        count++;

        if (pid > 0)
        {
                printf("This is father process pid = %d, count is: %d (%p).\n", (int)getpid(), count, (void *)&count);
        }
        else if (pid == 0)
        {
                printf("This is child process pid = %d, count is: %d (%p).\n", (int)getpid(), ++count, (void *)&count);
                //_exit(0);   // With _exit(0) and no '\n' in the printf above, the child prints nothing; with the '\n', the line is printed.
                exit(0);
        }
        else
        {
                perror("fork");
                return 1;
        }

        return 0;
}

      With exit(0) in the child process, the output is:

       /media/ext/home$ ./myfork
       This is father process pid = 17420, count is: 1 (0x7ffcef3643f8).
       This is child process pid = 17421, count is: 2 (0x7ffcef3643f8).
       /media/ext/home$

      The above output is normal.

      

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
        pid_t pid;
        int count = 0;

        pid = fork();

        count++;

        if (pid > 0)
        {
                printf("This is father process pid = %d, count is: %d (%p).\n", (int)getpid(), count, (void *)&count);
        }
        else if (pid == 0)
        {
                /* No '\n' here, so the text stays in the stdio buffer. */
                printf("This is child process pid = %d, count is: %d (%p).", (int)getpid(), ++count, (void *)&count);
                _exit(0);   // With _exit(0) and no '\n' in the printf above, the child prints nothing; with the '\n', the line is printed.
                //exit(0);
        }
        else
        {
                perror("fork");
                return 1;
        }

        return 0;
}

      With _exit(0) in the child process, the output is:

      /media/ext/home$ ./myfork
      This is father process pid = 17429, count is: 1 (0x7ffc50217e78).
      /media/ext/home$

      The child's printf output does not appear. Reason: to make output efficient, printf first stores the text in a stdio buffer in user space and only later writes it to standard output. _exit terminates the process without flushing that buffer, so the buffered text is never written.

 

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
        pid_t pid;
        int count = 0;

        pid = fork();

        count++;

        if (pid > 0)
        {
                printf("This is father process pid = %d, count is: %d (%p).\n", (int)getpid(), count, (void *)&count);
        }
        else if (pid == 0)
        {
                /* The trailing '\n' flushes the buffer when stdout is a terminal. */
                printf("This is child process pid = %d, count is: %d (%p).\n", (int)getpid(), ++count, (void *)&count);
                _exit(0);   // With _exit(0) and no '\n' in the printf above, the child prints nothing; with the '\n', the line is printed.
                //exit(0);
        }
        else
        {
                perror("fork");
                return 1;
        }

        return 0;
}

      With _exit(0) in the child process and "\n" added to the printf, the output is:

      /media/ext/home$ ./myfork
      This is father process pid = 17442, count is: 1 (0x7ffd4685be68).
      This is child process pid = 17443, count is: 2 (0x7ffd4685be68).
      /media/ext/home$

     The printf in the child process now outputs normally. Reason: when standard output is a terminal it is line-buffered, so the '\n' at the end of the string flushes the buffer before _exit is called. If standard output is redirected to a file or a pipe, it is fully buffered and the text would still be lost.

      Reference: https://www.cnblogs.com/chilumanxi/p/5136105.html (thanks to the author!)

 

6. wait() and waitpid()

      Reference: https://blog.csdn.net/dangzhangjing97/article/details/79745880 (Linux: wait() & waitpid() for process waiting). A very helpful article; thanks for sharing!

 

References:

https://blog.csdn.net/qq_32095699/article/details/88601494

https://blog.csdn.net/sykpour/article/details/25643861

https://blog.csdn.net/drdairen/article/details/51896141

Advanced Programming in the UNIX Environment

Computer Operating System


Summary

To be continued.


Origin blog.csdn.net/the_wan/article/details/108170789