Process summary under Linux

1. Overview

Definition: A running program together with the resources it occupies (CPU, memory, system resources, etc.) is called a process.

A C file as written is called source code (highly readable for humans, but not executable by the computer). Compiling it generates a binary executable program that the CPU can run, which is stored on a storage medium (usually external storage such as a disk). Note that at this point there is only a program, not a process. Once the program starts running, the running program and the resources it occupies are called a process.

The concept of a process is for the system, not for the user.

Processes occupy four types of resources:

  • CPU (central processing unit): the lscpu command shows detailed CPU information
  • Memory: the free -h command shows the system memory size
  • Disk
  • Network

In a traditional operating system, a program cannot run on its own; the basic unit of resource allocation and independent execution is the process.

A program can be executed multiple times, which also means that multiple processes can execute the same program.

2. Process space memory distribution

The object of Linux process memory management is virtual memory.

Each process has its own virtual address space that does not interfere with those of other processes; on a 32-bit system it spans 0-4 GB. The lower 0-3 GB is user space, where the user's own code runs. The upper 1 GB is kernel space, which is entered when the process makes a Linux system call; the code of the entire kernel and of all kernel modules resides there. The addresses the user sees and works with are virtual addresses, not actual physical memory addresses.

Under Linux, a process has three parts of data in memory: the "code segment", the "stack segment" and the "data segment". CPUs commonly provide segment registers corresponding to these three parts, which makes them convenient for the operating system to manage. Together, these three parts form a complete execution sequence.

"Code segment": stores the program's code. If several processes on the machine run the same program, they can share the same code segment.

"Stack segment": stores subroutine return addresses, subroutine parameters, and the program's local variables. A pointer returned by malloc() is typically kept in such a local variable, while the memory block it points to lives on the heap.

"Data segment": the memory space that stores the program's global variables, static variables and constants.

  • Stack: automatic storage. An uninitialized variable on the stack holds an indeterminate value. The layout of each stack frame is worked out by the compiler at compile time. The stack of a process is located at the top of the process's user space and grows downward. Every call of a function opens up its own stack frame in the stack space, and the function's parameters, local variables, return address, etc. are pushed into that frame in turn. After the function returns, its stack frame disappears, so returning the address of a local variable from a function is illegal.
  • Heap: dynamic storage. Heap memory is allocated while the program runs and holds variables that are allocated dynamically during execution, so its size is not fixed. The heap lies between the uninitialized data segment and the stack, and grows toward the stack as it is used. When a process calls a function such as malloc to allocate memory, the new memory is not in the calling function's stack frame but is added to the heap, which expands toward higher addresses; when a function such as free releases the memory, it is returned to the heap, and the heap may shrink. Because dynamically allocated memory is not part of a function's stack frame, it does not disappear when the function that allocated it returns.
  • Uninitialized data segment (.bss): stores uninitialized global variables and static variables. Before the program starts to execute, that is, before main() runs, the kernel initializes the data in this segment to 0 or null pointers.
  • Initialized data segment (.data): stores initialized global variables and static variables.
  • Text constant area (.rodata): stores string literals and other constants, such as "hello" in char *str = "hello";
  • Text segment (code segment): the machine instructions of the executable file that are executed by the CPU. The text segment is usually read-only to prevent a program from accidentally modifying its own instructions.

PS: memory allocation in Linux:

The basic idea of Linux memory management is to establish the physical mapping for an address only when that address is actually accessed.

In C/C++ programs on Linux, memory is allocated in three ways:

(1) Allocation from the static storage area. This is the memory of the data segment. Its layout is fixed at compile time and it exists for the entire run of the program; global variables and static variables live here.

(2) Creation on the stack. When a function is executed, the storage for its local variables is created on the stack, and this storage is released automatically when the function returns. Stack allocation is built into the processor's instruction set and is very efficient, but the capacity of the stack is limited. For example, a very large local array can overflow the stack and cause a segmentation fault.

(3) Allocation from the heap, also known as dynamic memory allocation. While the program is running, it uses malloc or new to request any amount of memory, and the programmer is responsible for releasing it with free or delete. The lifetime of dynamic memory is decided by the programmer, which makes it very flexible but also the most error-prone. For example, if the pointer to a memory block is overwritten and no other pointer refers to that block, the block can no longer be accessed or freed, and a memory leak occurs.

3. Characteristics of the process


1. Structural features
Programs usually cannot be executed concurrently. To let a program run independently, it must be given a process control block (PCB). The process entity consists of the program segment, the data segment, and the PCB. What we usually call a process is the process entity, and creating a process essentially means creating the PCB of the process entity.

2. Dynamics
A process is, in essence, one execution of the process entity. Dynamics is therefore the most basic feature of a process, and shows in the fact that a process is "generated by creation, executed by scheduling, and destroyed by cancellation".

3. Concurrency
Concurrency means that multiple process entities coexist in memory and can run during the same period of time. It is an important feature of both the process and the operating system; the process was introduced precisely to make concurrency possible.

4. Independence
A process entity is a basic unit that runs independently, allocates resources independently, and accepts scheduling independently.

5. Asynchrony
Process entities run asynchronously: each one advances at its own independent, unpredictable speed.

 

4. The basic state of the process

A process has five basic states: the creation, ready, execution, blocked, and terminated states.

The most common ones are ready, execution, and blocked.

  • Creation: when a process is created, it applies for a blank PCB, fills in the information needed to control and manage the process, and completes resource allocation.
  • Ready: the process has been allocated all the resources it needs and can run as soon as it is given the CPU.
  • Execution: once a ready process is chosen by the scheduler, it enters the execution state.
  • Blocking: an executing process that temporarily cannot continue because it is waiting for some event becomes blocked.
  • Termination: the process has finished, hit an error, or been terminated by the system; it enters the terminated state and can no longer execute.

The life cycle of a process can be divided into a set of states that together describe its entire course; the state of a process reflects where it currently is in that life cycle.

5. Process termination

There are 6 ways to terminate a process under Linux:

1. Normal return:

(1) Return from the main function;

(2) Call exit, _exit, _Exit functions;

(3) The last thread returns from its start routine or calls the pthread_exit function;

2. Abnormal return:

(1) Call the abort function;

(2) Receive a signal;

(3) The last thread responds to the cancel request.

Regardless of how the process terminates, the kernel closes all open file descriptors, freeing the resources they were using.

6. View the current process of the system

In Linux, enter ps aux in the terminal to view a snapshot of the processes at the current moment.

In addition, a command such as ps aux | grep xiao finds the processes that belong to the user xiao.

ps -le and ps -ef can also be used (-l displays detailed information, -e displays all processes).

  • USER: the user who started the process
  • PID: the ID number of the process
  • %CPU: the percentage of CPU resources used by the process
  • %MEM: the percentage of physical memory occupied by the process
  • VSZ: the size of the virtual memory of the process, in KB (all memory the process has mapped, whether or not it currently resides in physical memory)
  • RSS: the size of the actual physical memory occupied by the process, in KB
  • TTY: the terminal the process is running on (tty1 to tty6 are the local console terminals; typically tty1 is the graphical terminal and tty2 to tty6 are local text-mode terminals. pts/0 to pts/255 are pseudo-terminals.)
  • STAT: process status. R: running or runnable, S: sleeping, T: stopped, s: session leader, +: in the foreground process group
  • START: the start time of the process
  • TIME: the cumulative CPU time the process has used (note that this is not wall-clock time)
  • COMMAND: the command that started this process

In addition, enter top to enter monitoring mode. In monitoring mode, the following keys are available:

  • h: display help;
  • P: sort by CPU usage;
  • M: sort by memory usage;
  • N: sort by PID;
  • q: quit;

This command can dynamically display the information changes of the process.

7. Some important knowledge points related to the process

1. init process

In the final stage of startup, the Linux kernel creates the init process to execute the program /sbin/init. This is the first process run by the system; its process number is 1, and it is called the initialization process of the Linux system. It creates other child processes to start the various system services, and each service may in turn create child processes to execute other programs. The init process is therefore the "ancestor" of all other processes. It is created by the Linux kernel, runs with root privileges, and cannot be killed. Linux maintains a data structure called the process table, which stores information about all processes currently loaded in memory, including each process's PID (process ID), status, command string, etc. The operating system manages processes through their PIDs, which serve as indexes into the process table.

2. What the child process inherits from the parent process

A child process gets a copy of the parent process's resources, not the resources themselves.

(1) What the child process inherits from the parent process:

  • Process credentials (real/effective/saved user IDs (UIDs) and group IDs (GIDs))
  • Environment variables
  • The stack
  • Memory
  • Open file descriptors (note that the file offset of each such file is shared between parent and child, which can lead to surprising results when both read or write)
  • Close-on-exec flags (for details, see "APUE", W. R. Stevens, 1993, translated by You Jinyuan et al., hereinafter referred to as "Advanced Programming", Sections 3.13 and 8.9)
  • Signal handling settings
  • nice value (set by the nice function; the value indicates the priority of the process, and the smaller the value, the higher the priority)
  • Process scheduling class (scheduler class) (Translator's Note: the scheduling class is the class a process belongs to when it is scheduled by the system. Different classes have different priorities. From the scheduling class and the nice value, the process scheduler calculates the global priority of each process, and processes with higher priority are executed first)
  • Process group ID
  • Session ID (Translator's Note: the term is taken from "Advanced Programming" and refers to the ID of the session the process belongs to. A session includes one or more process groups. For more details, see "APUE" Section 9.5)
  • Current working directory
  • Root directory (the root directory is not necessarily "/"; it can be changed by the chroot function)
  • File mode creation mask (umask)
  • Resource limits
  • Controlling terminal

(2) Unique to the child process

  • Its own process ID
  • A different parent process ID
  • Its own copies of the file descriptors and directory streams
  • Memory locks: the child does not inherit the parent's process, text, data, and other memory locks (Translator's Note: locked memory refers to virtual memory pages that are locked so that they cannot be paged out; see "The GNU C Library Reference Manual", version 2.2, 1999, Section 3.4.2 for details)
  • The process times in the tms structure (Translator's Note: the tms structure is filled in by the times function and holds four values recording the CPU time used by the process: user time, system time, total user time of its child processes, and total system time of its child processes)
  • Resource utilizations, which are set to 0
  • The set of pending signals, which is initialized to the empty set
  • Timers created by the timer_create function are not inherited
  • Asynchronous input and output operations are not inherited
  • File locks set by the parent process are not inherited (an exclusive lock held by two processes at once would be a contradiction)
     

3. System limits

A server cannot provide services to an unlimited number of clients. Under Linux, each resource has an associated soft limit and hard limit. For example, there is a limit on the maximum number of child processes a single user can create and on the maximum number of file descriptors a single process can open. These limits bound the number of clients the server can serve concurrently.

Under Linux, you can use the getrlimit() and setrlimit() functions to get or set these limits:

#include <sys/resource.h>

int getrlimit(int resource, struct rlimit *rlim);       // get limit
int setrlimit(int resource, const struct rlimit *rlim); // set limit


(1) Parameter resource description:

  • RLIMIT_AS //The maximum virtual memory space of the process, in bytes.
  • RLIMIT_CORE //The maximum length of the core dump file.
  • RLIMIT_CPU //The maximum allowed CPU time, in seconds. When the process reaches the soft limit, the kernel sends it the SIGXCPU signal, whose default action is to terminate the process.
  • RLIMIT_DATA //The maximum size of the process data segment.
  • RLIMIT_FSIZE //The maximum length of a file that the process can create. If the process tries to exceed this limit, the kernel sends it the SIGXFSZ signal, whose default action is to terminate the process.
  • RLIMIT_LOCKS //The maximum number of locks and leases that a process can establish.
  • RLIMIT_MEMLOCK //The maximum amount of data that a process can lock in memory, in bytes.
  • RLIMIT_MSGQUEUE //The maximum number of bytes that a process can allocate to POSIX message queues.
  • RLIMIT_NICE //The ceiling on the nice value that the process can set through setpriority() or nice().
  • RLIMIT_NOFILE //A value one larger than the maximum file descriptor number that the process can open. Exceeding it produces an EMFILE error.
  • RLIMIT_NPROC //The maximum number of processes a user can have.
  • RLIMIT_RTPRIO //The maximum real-time priority that the process can set through sched_setscheduler and sched_setparam.
  • RLIMIT_SIGPENDING //The maximum number of pending signals that a user can have.
  • RLIMIT_STACK //The maximum size of the process stack, in bytes.

(2) rlim: a structure describing the soft and hard limits of a resource

        struct rlimit {
                rlim_t rlim_cur;  // soft limit (the value actually enforced)
                rlim_t rlim_max;  // hard limit (ceiling for rlim_cur)
        };

In addition to hardware limitations (CPU, memory, bandwidth), a server program is also constrained by the resource limits of the Linux system. Therefore, to increase the number of clients a Linux server can serve concurrently, the server program needs to adjust these limits by calling setrlimit(). An unprivileged process may raise a soft limit only up to the hard limit; raising the hard limit requires privilege.

8. Some commonly used functions related to the process

1. There are two basic system calls under Linux that can be used to create child processes: fork() and vfork().

(1) The fork() system call

Because the fork() system call creates a new process, a single call returns twice. One return is in the parent process, where the return value is the PID (process ID) of the child; the other return is in the child process, where the return value is 0.

After we call fork(), we need to use its return value to determine whether the current code is running in the parent process or the child process.

The return value is 0: child process

Return value > 0: parent process

Return value < 0: fork() system call error

There are two main reasons why the fork function call fails:

① There are already too many processes in the system;

② The total number of processes for this real user ID exceeds the system limit.

Each child process has exactly one parent process, and every process can obtain its own PID through getpid() and the PID of its parent through getppid(), so it is sufficient for fork() to return 0 to the child. A process can create multiple child processes, however, and the parent has no API function for obtaining the process IDs of its children, so when the parent creates a child through fork(), the return value must tell the parent the PID of the child it created. This is the reason behind the design of fork()'s two return values.

Note:

The fork() system call creates a new child process that is a copy of the parent process. After the system has successfully created the child, the child has a copy of the parent's text segment, data segment, and stack in its own independent address space, so modifications made by the child do not affect the corresponding memory in the parent. At this point there are two essentially identical processes (parent and child) in the system. Their execution order is not fixed: which one runs first depends on the system's process scheduling policy. If the parent or the child must run first, the programmer has to enforce that order in the code through an inter-process communication mechanism.

(2) The vfork() system call
vfork() is another function that can be used to create a process. It is called in the same way as fork().

vfork() does not copy the address space of the parent process to the child, on the assumption that the child will call exec or exit() immediately and therefore never reference that address space. Until the child calls exec (or exit()), it runs in the address space of the parent. If the child modifies any data there (data segment, heap, stack), the results are undefined, because the changes affect the parent's memory and may cause the parent to misbehave.

vfork() guarantees that the child runs first; the parent is scheduled again only after the child has called exec or exit(). If the child depends on further actions of the parent before that point, a deadlock results.

Modern implementations of fork() also avoid a full copy by using copy-on-write (COW): the data areas are shared by the parent and the child, and the kernel changes their access permissions to read-only. If either process tries to modify one of these areas, the kernel makes a copy of just that block of memory for it.

2. exec*() executes another program

3. wait() and waitpid(): reaping terminated child processes

4. system() and popen() functions

Origin blog.csdn.net/qq_51368339/article/details/126964687