Linux Processes, Threads and Scheduling: A Detailed Explanation

1 Process concepts

1.1 Definition of process and thread
The classic operating-system definitions are:
Process: the unit of resource allocation.
Thread: the unit of scheduling.
The operating system describes each process with a PCB (Process Control Block). In Linux, the PCB is the task_struct structure.

1.2 Process life cycle
1.2.1 Process state
R, TASK_RUNNING: ready or running. The process is runnable but is not necessarily on a CPU at this moment.
S, TASK_INTERRUPTIBLE: light sleep. The process waits for a resource and can be woken by signals; this is usually the state a process enters when it sleeps voluntarily.
D, TASK_UNINTERRUPTIBLE: deep sleep. The process waits for a resource and does not respond to signals; a typical case is a process blocked while acquiring a semaphore.
Z, TASK_ZOMBIE (EXIT_ZOMBIE in current kernels): zombie. The process has exited, but its parent has not yet collected (reaped) its exit status.
T, TASK_STOPPED: stopped (also used for debugging). The process is suspended after receiving SIGSTOP.
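To see these state letters on a live system, a program can read the third field of /proc/<pid>/stat, which is where ps gets them. A minimal sketch, assuming a Linux /proc file system; proc_state() is a hypothetical helper name:

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the one-letter state (R, S, D, Z, T, ...)
 * of process `pid` as reported by /proc/<pid>/stat, or '?' on error. */
char proc_state(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);

    FILE *f = fopen(path, "r");
    if (!f)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Line format: "pid (comm) state ...". comm may itself contain
     * spaces or ')', so find the LAST ')' and skip the space after it. */
    char *rp = strrchr(buf, ')');
    if (!rp || rp[1] != ' ' || rp[2] == '\0')
        return '?';
    return rp[2];
}
```

Called as proc_state(getpid()), it returns 'R', since the caller is running while it reads the file.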

1.2.2 APIs for process creation and termination
1) system()
Starts a new process by invoking the shell.
2) exec()
Starts a new program by replacing the current process image.
3) fork()
Creates a new process by copying the current process image. fork() returns 0 in the child process and the child's process ID in the parent.
4) wait()
The parent process blocks until a child process terminates.
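These calls are usually combined: fork() a child, replace the child's image with exec(), and wait() in the parent. A minimal sketch using waitpid() and execvp() (one member of the exec family, which searches PATH); run_and_wait() is a hypothetical name:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with arguments argv in a child process
 * and return its exit code (127 if exec failed), or -1 on error. */
int run_and_wait(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                  /* fork failed */

    if (pid == 0) {                 /* child: fork() returned 0 */
        execvp(argv[0], argv);      /* on success this never returns */
        _exit(127);                 /* reached only if exec failed */
    }

    int status;                     /* parent: fork() returned the child's PID */
    if (waitpid(pid, &status, 0) < 0)   /* block until the child ends */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

system("cmd") works roughly the same way, except that the program it execs is the shell (/bin/sh -c cmd).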
5) Orphan processes and zombie processes
Orphan process: if a parent exits while one or more of its children are still running, those children become orphans. Orphans are adopted by the init process (PID 1), which collects their exit status when they terminate, so orphans waste no resources.
Zombie process: if a child created with fork() exits and the parent does not call wait() or waitpid() to collect its status, the child's process descriptor remains in the system. Such a process is called a zombie. A zombie keeps its process descriptor (task_struct) and its PID, but the other resources it used (memory, open files) have already been released, so there is no memory leak and a single zombie wastes very little (refer to Song Baohua's course). Large numbers of zombies can still exhaust the available PIDs.
Avoiding zombie processes
Causes of zombie processes:
1. When a child exits, the kernel sends SIGCHLD to its parent, which ignores the signal by default;
2. The parent does not call wait() or waitpid() to collect the child.
Ways to avoid zombie processes:
1. The parent calls wait() or waitpid() for the child. The drawback is that the parent usually blocks in wait() and cannot do anything else.
2. Catch SIGCHLD and call a wait function in the signal handler. This avoids the blocking problem of method 1.
3. Fork twice: the parent creates a child, the child creates a grandchild and exits immediately (the parent reaps this short-lived child with waitpid()), so the grandchild becomes an orphan that is adopted, and later reaped, by init.
1.3 Inter-process communication
1) Signals
A signal notifies a process of an event. For example, pressing CTRL-C sends SIGINT, which the process can catch and handle.
2) Pipe (PIPE)
A pipe is used much like a file.
popen() is similar to fopen(): it returns a FILE * stream.
pipe() is similar to open(): it returns file descriptors (a read end and a write end).
An anonymous pipe can only carry data between related processes (for example, a parent and its child, or children of the same parent).
3) Named pipe (FIFO)
A named pipe can be used for communication between processes that are not related.
mkfifo()/mknod() creates a special file with a path name in the file system. Processes open and use it like an ordinary file, which gives them a communication channel.
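A minimal FIFO sketch, assuming POSIX mkfifo(); fifo_demo() and the path passed to it are hypothetical. fork() is used only to get two processes into one example: any unrelated process that opens the same path would work the same way:

```c
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: one process writes into the FIFO at `path`,
 * another reads from it. */
int fifo_demo(const char *path, char *out, size_t outlen)
{
    unlink(path);                           /* remove a stale FIFO, if any */
    if (mkfifo(path, 0600) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {                         /* writer */
        int wfd = open(path, O_WRONLY);     /* blocks until a reader opens */
        const char *msg = "via fifo";
        if (wfd < 0 || write(wfd, msg, strlen(msg)) < 0)
            _exit(1);
        close(wfd);
        _exit(0);
    }

    int rc = -1;
    int rfd = open(path, O_RDONLY);         /* blocks until a writer opens */
    if (rfd >= 0) {
        size_t total = 0;
        ssize_t n;
        while (total + 1 < outlen &&
               (n = read(rfd, out + total, outlen - 1 - total)) > 0)
            total += (size_t)n;
        out[total] = '\0';
        close(rfd);
        rc = 0;
    } else {
        kill(pid, SIGKILL);                 /* writer would block forever */
    }
    waitpid(pid, NULL, 0);
    unlink(path);                           /* the FIFO file persists until removed */
    return rc;
}
```

open() on a FIFO blocks until the other end is opened too, which is how the writer and reader meet.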
4) Semaphores
Semaphores, message queues and shared memory are System V IPC mechanisms.
Critical section: a region of code that only one process may execute at any time.
Semaphore: most uses only need binary semaphores, so only those are discussed here. Before entering the critical section, a process performs the P operation: if the semaphore is greater than 0, decrement it and enter; otherwise the process is suspended. On leaving the critical section it performs the V operation: if any process is suspended waiting on the semaphore, wake it up; otherwise increment the semaphore.
Mutex: a mutex can be viewed as a special case of a binary semaphore.
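The P/V operations map onto System V semop() with -1 and +1. A sketch under these assumptions: a private semaphore set holding one semaphore, initialised to 1 (free); bin_sem_create(), bin_sem_P(), bin_sem_V() and the other helper names are hypothetical:

```c
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* glibc leaves it to the caller to define union semun for semctl(). */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Hypothetical helper: create a private binary semaphore set to 1 (free). */
int bin_sem_create(void)
{
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (id < 0)
        return -1;
    union semun arg;
    arg.val = 1;
    if (semctl(id, 0, SETVAL, arg) < 0) {
        semctl(id, 0, IPC_RMID);
        return -1;
    }
    return id;
}

static int sem_change(int id, short delta)
{
    /* SEM_UNDO: the kernel reverts this change if the process dies,
     * so a crash inside the critical section does not leave it locked. */
    struct sembuf op = { .sem_num = 0, .sem_op = delta, .sem_flg = SEM_UNDO };
    return semop(id, &op, 1);
}

/* P: wait until the value is > 0, then decrement (enter critical section). */
int bin_sem_P(int id) { return sem_change(id, -1); }

/* V: increment and wake a waiter, if any (leave critical section). */
int bin_sem_V(int id) { return sem_change(id, +1); }

int bin_sem_value(int id)   { return semctl(id, 0, GETVAL); }
int bin_sem_destroy(int id) { return semctl(id, 0, IPC_RMID); }
```

A System V semaphore is really a counting semaphore; it stays binary here only because every bin_sem_P() is paired with exactly one bin_sem_V().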
5) Message queue
Similar to a named pipe, but without the work of opening and closing a pipe. A message queue exists independently of the processes that use it: it stays in the kernel until it is explicitly removed.
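The sketch below, assuming System V msgget()/msgsnd()/msgrcv(), shows that a queue outlives its users: the child sends a message and exits, and only then does the parent receive it. mq_demo() and struct demo_msg are hypothetical names; the first member of a message must be a long type greater than 0:

```c
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct demo_msg {
    long mtype;             /* message type, must be > 0 */
    char mtext[64];
};

/* Hypothetical demo: message sent by a child, received after it exited. */
int mq_demo(char *out, size_t outlen)
{
    int q = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (q < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        msgctl(q, IPC_RMID, NULL);
        return -1;
    }
    if (pid == 0) {                     /* sender: queue a message, then exit */
        struct demo_msg m = { .mtype = 1 };
        strncpy(m.mtext, "queued by child", sizeof m.mtext - 1);
        _exit(msgsnd(q, &m, sizeof m.mtext, 0) == 0 ? 0 : 1);
    }

    waitpid(pid, NULL, 0);              /* the sender is gone... */
    struct demo_msg r;
    int rc = -1;
    if (msgrcv(q, &r, sizeof r.mtext, 1, IPC_NOWAIT) >= 0) {
        snprintf(out, outlen, "%s", r.mtext);   /* ...but its message remains */
        rc = 0;
    }
    msgctl(q, IPC_RMID, NULL);          /* queues persist until removed */
    return rc;
}
```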
6) Shared memory
The processes that need to communicate share a piece of memory for data exchange.
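A System V shared-memory sketch, assuming shmget()/shmat(); shm_demo() is a hypothetical name. The child writes into the segment and the parent reads the same bytes without any copying. Shared memory provides no synchronization by itself; waiting for the child to exit stands in for it here, and real code would typically pair it with a semaphore:

```c
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: a child writes into shared memory, the parent reads it. */
int shm_demo(char *out, size_t outlen)
{
    int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (id < 0)
        return -1;
    char *mem = shmat(id, NULL, 0);
    shmctl(id, IPC_RMID, NULL);      /* free it once the last process detaches */
    if (mem == (char *)-1)
        return -1;
    mem[0] = '\0';

    pid_t pid = fork();              /* the child inherits the attachment */
    if (pid < 0) {
        shmdt(mem);
        return -1;
    }
    if (pid == 0) {
        strcpy(mem, "written by child");   /* lands in the shared pages */
        _exit(0);
    }

    waitpid(pid, NULL, 0);           /* crude synchronization for this sketch */
    snprintf(out, outlen, "%s", mem);
    shmdt(mem);
    return 0;
}
```

Marking the segment with IPC_RMID right after attaching means the kernel frees it automatically once the last attached process detaches or exits.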


2 How processes and threads are implemented

The Linux scheduler actually schedules task_struct objects.
Whether it is a process or a thread, underneath it is one task_struct. The difference between processes and threads is how much they share: two processes each have their own copy of the resources, while all threads of a process share the same resources.

2.1 fork()
When fork() executes, the parent's task_struct is copied to the child. Initially the parent's and child's resources are identical, but they are two separate copies, so any later change makes the two diverge.

Parent and child manage memory (mm) with COW (Copy-On-Write):

1. Before fork(), a memory region maps a virtual address to a physical address, and its permission is RW;

2. After fork(), the parent and child see the same virtual address for the region, and both actually use the same physical memory. No memory is copied; the kernel changes the region's permission to RO (read-only);

3. When either the parent or the child writes to the region, a page fault is triggered and the kernel copies the page at that moment. Both processes still see the same virtual address, but it now maps to different physical pages. Each process's virtual-to-physical mapping is translated by the MMU (Memory Management Unit).
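User space cannot see physical addresses directly, but the observable half of COW is easy to show. A minimal sketch, assuming Linux and POSIX fork()/pipe(); cow_demo() and struct cow_report are hypothetical names. The child writes to a variable and reports its value and virtual address back over a pipe:

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct cow_report {
    int child_value;    /* x as seen by the child after writing it */
    int parent_value;   /* x as seen by the parent afterwards */
    int same_vaddr;     /* 1 if &x was the same virtual address in both */
};

/* Hypothetical demo: fork, let the child write to x, compare both views. */
int cow_demo(struct cow_report *r)
{
    int x = 1;
    int fd[2];
    if (pipe(fd) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {
        /* First write to the shared read-only page: page fault, the
         * kernel copies the page, and the child gets its own copy. */
        x = 2;
        uintptr_t addr = (uintptr_t)&x;
        if (write(fd[1], &x, sizeof x) != (ssize_t)sizeof x ||
            write(fd[1], &addr, sizeof addr) != (ssize_t)sizeof addr)
            _exit(1);
        _exit(0);
    }

    close(fd[1]);
    uintptr_t child_addr = 0;
    int ok = read(fd[0], &r->child_value, sizeof r->child_value) ==
                 (ssize_t)sizeof r->child_value &&
             read(fd[0], &child_addr, sizeof child_addr) ==
                 (ssize_t)sizeof child_addr;
    close(fd[0]);
    waitpid(pid, NULL, 0);

    r->parent_value = x;                      /* untouched by the child's write */
    r->same_vaddr = (child_addr == (uintptr_t)&x);
    return ok ? 0 : -1;
}
```

The result is child_value == 2, parent_value == 1 and same_vaddr == 1: the same virtual address holds different values in the two processes, because after the write it maps to different physical pages.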


fork() therefore requires a CPU with an MMU.

2.2 vfork()

On CPUs without an MMU, COW cannot be implemented, so fork() cannot be supported.
Such CPUs create processes with vfork(): the parent blocks until the child calls exit (_exit()) or exec.
The essential difference between vfork() and fork() is that with vfork() the parent and child share the same memory instead of getting copies.
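On Linux, vfork() is also available on MMU systems, where it avoids duplicating the parent's address space. A minimal sketch of the only safe usage pattern, assuming POSIX-style vfork(); vfork_run() is a hypothetical name. Because the child borrows the parent's memory, it may only call an exec function or _exit(); modifying variables or returning from the function is undefined behaviour:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program at `path` via vfork() + execv().
 * Returns its exit code (127 if exec failed), or -1 on error. */
int vfork_run(const char *path, char *const argv[])
{
    pid_t pid = vfork();        /* parent is suspended until exec/_exit */
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execv(path, argv);      /* child gets its own image; parent resumes */
        _exit(127);             /* _exit, never exit(): no stdio flushing */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```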

2.3 pthread_create()


A Linux thread is essentially a process; the difference lies in resource sharing: the threads of one process share all of its resources.
Each thread has its own task_struct, so each thread can be scheduled on a CPU independently, while all threads share the resources of the same process. These two properties are exactly the definition of a thread.
This is how Linux implements threads with processes, which is why threads are also called lightweight processes.
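Sharing can be seen with one global variable: a thread created with pthread_create() writes it, and the creating thread sees the new value, unlike a forked child whose writes go to its own copy. A minimal sketch; thread_share_demo() and shared_counter are hypothetical names (compile with -pthread):

```c
#include <pthread.h>

/* Global data lives in the one address space all threads share. */
static int shared_counter = 0;

static void *writer(void *arg)
{
    (void)arg;
    shared_counter = 42;          /* written by the new thread... */
    return NULL;
}

/* Hypothetical demo: returns the value the creating thread sees. */
int thread_share_demo(void)
{
    pthread_t t;
    if (pthread_create(&t, NULL, writer, NULL) != 0)
        return -1;
    pthread_join(t, NULL);        /* join also orders the write before our read */
    return shared_counter;        /* ...and visible here: no copy was made */
}
```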

2.4 PID and TGID


POSIX requires that all threads of the same process get the same, unique process ID from getpid().
In Linux, from the kernel's point of view each thread of a process has its own PID, yet in user space getpid() must return one common value. Linux uses a small trick for this and introduces the TGID (thread group ID): getpid() returns the TGID, which equals the PID of the main thread.
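A sketch of both IDs, assuming Linux; the kernel PID of a thread is obtained through syscall(SYS_gettid), which works even where glibc has no gettid() wrapper. tgid_demo() and struct ids are hypothetical names (compile with -pthread):

```c
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

struct ids {
    pid_t pid;   /* getpid(): the TGID, identical in every thread */
    pid_t tid;   /* kernel PID of this particular thread */
};

static void *record_ids(void *arg)
{
    struct ids *out = arg;
    out->pid = getpid();
    out->tid = (pid_t)syscall(SYS_gettid);
    return NULL;
}

/* Hypothetical demo: record both IDs in the calling (main) thread and in
 * a second thread. Returns 0 on success. */
int tgid_demo(struct ids *main_ids, struct ids *thread_ids)
{
    pthread_t t;
    record_ids(main_ids);
    if (pthread_create(&t, NULL, record_ids, thread_ids) != 0)
        return -1;
    return pthread_join(t, NULL);
}
```

Called from a program's main thread, both threads get the same getpid() value, the main thread's kernel PID equals that value (TGID == PID of the main thread), and the second thread's kernel PID differs.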
The top command from the process perspective:

top without arguments (the default) shows each process's CPU usage relative to a single core. For example, a process has three threads: the main thread creates thread 1 and thread 2, and both run a while(1) loop. On a dual-core CPU, threads 1 and 2 each occupy one core at 100%, so top shows the process at 200% CPU, and the process ID it shows is the PID of the main thread (that is, the TGID).

The top command from the thread perspective: top -H displays CPU usage per thread. In the example above it shows 100% for thread 1 and 100% for thread 2.
In user space, "the PID of a thread" usually means the process ID, whose value is the TGID; when the kernel view is meant explicitly, it is the unique PID stored in the thread's own task_struct.

3 Process scheduling


3.1 Real-time process scheduling
SCHED_FIFO: across different priorities, a higher-priority task runs until it sleeps, and only then do lower-priority tasks run; among equal priorities, tasks run first in, first out.
SCHED_RR: across different priorities, the same as SCHED_FIFO; among equal priorities, tasks take turns in time slices (round robin).
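Switching a process to one of these policies is roughly what chrt does. A minimal sketch, assuming POSIX sched_setscheduler(); try_set_policy() is a hypothetical name. On Linux, RT priorities range from 1 to 99, and switching to SCHED_FIFO or SCHED_RR normally requires root or CAP_SYS_NICE, so EPERM is the expected result for an ordinary user:

```c
#include <errno.h>
#include <sched.h>

/* Hypothetical helper: set the calling process's policy and priority.
 * Returns 0 on success or the errno value (e.g. EPERM) on failure.
 * SCHED_OTHER requires priority 0; SCHED_FIFO/SCHED_RR take 1..99. */
int try_set_policy(int policy, int prio)
{
    struct sched_param sp = { .sched_priority = prio };
    return sched_setscheduler(0, policy, &sp) == 0 ? 0 : errno;
}
```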
RT throttling: the following two parameters
/proc/sys/kernel/sched_rt_period_us
/proc/sys/kernel/sched_rt_runtime_us
mean that, in every sched_rt_period_us period, RT tasks may run for at most sched_rt_runtime_us.
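The resulting RT share is runtime / period. With the common defaults (period 1000000 us, runtime 950000 us), RT tasks may use at most 95% of each period, leaving 5% for ordinary tasks; a runtime of -1 disables the limit. A minimal sketch; rt_share_percent() and current_rt_share() are hypothetical names:

```c
#include <stdio.h>

/* Percentage of each period that RT tasks may use.
 * runtime_us == -1 means throttling is disabled, i.e. no limit. */
double rt_share_percent(long runtime_us, long period_us)
{
    if (runtime_us < 0)
        return 100.0;
    return 100.0 * (double)runtime_us / (double)period_us;
}

static int read_long(const char *path, long *out)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Hypothetical helper: read both sysctls; returns -1.0 if unavailable. */
double current_rt_share(void)
{
    long period, runtime;
    if (read_long("/proc/sys/kernel/sched_rt_period_us", &period) < 0 ||
        read_long("/proc/sys/kernel/sched_rt_runtime_us", &runtime) < 0 ||
        period <= 0)
        return -1.0;
    return rt_share_percent(runtime, period);
}
```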
3.2 Ordinary process scheduling
SCHED_OTHER (the default policy for normal processes):

3.2.1 Dynamic priority (early 2.6)
Processes are characterized along two dimensions: I/O consumption and CPU consumption.
A higher priority means: 1) the process gets longer time slices, and 2) it can preempt lower-priority processes when it wakes up. Time slices are used in rotation.
The kernel stores a static priority, which the user can change with nice.
The dynamic priority is computed at run time from the static priority. The scheduling algorithm rewards I/O-bound processes (raising their priority to improve responsiveness) and penalizes CPU-bound processes (lowering their priority).
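The nice value can be read with getpriority() and raised with nice(); inside the kernel, the static priority is 120 + nice. A minimal sketch; current_nice() and nice_to_static_prio() are hypothetical names, and lowering the nice value (raising priority) needs privilege:

```c
#include <errno.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: read the calling process's nice value (-20..19).
 * getpriority() may legitimately return -1, so errno is cleared first. */
int current_nice(int *out)
{
    errno = 0;
    int v = getpriority(PRIO_PROCESS, 0);
    if (v == -1 && errno != 0)
        return -1;
    *out = v;
    return 0;
}

/* Kernel-internal static priority for a nice value: 120 + nice, so
 * nice -20..19 maps to 100..139 (0..99 are used for RT tasks). */
int nice_to_static_prio(int nice_val)
{
    return 120 + nice_val;
}
```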

3.2.2 CFS: Completely Fair Scheduler (newer kernels)
CFS keeps runnable tasks in a red-black tree keyed by vruntime; a left node's value is smaller than a right node's, so the leftmost node has the smallest vruntime.
The scheduler always runs the task with the smallest vruntime so far. CPU/IO behaviour and nice are both folded into vruntime, so this one rule accounts for all of them.
vruntime = pruntime × 1024 / weight
vruntime is the virtual run time, pruntime is the physical run time, and weight is determined by the nice value (the lower the nice value, the higher the weight; 1024 is the weight of nice 0). A thread that has run for less time, or has a lower nice value, therefore has a smaller vruntime and is scheduled first. vruntime changes dynamically as tasks run.
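The pick-the-smallest-vruntime rule can be tried in a toy simulation. Assumptions: the weights are the kernel's nice-to-weight table (sched_prio_to_weight), every tick lasts a fixed time, and a linear scan stands in for the red-black tree; the kernel also uses fixed-point inverse weights rather than plain division. cfs_simulate(), vruntime_delta() and struct sim_task are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

/* CFS load weights for nice -20..19 (kernel table sched_prio_to_weight). */
static const unsigned int nice_weight[40] = {
    /* -20 */ 88761, 71755, 56483, 46273, 36291,
    /* -15 */ 29154, 23254, 18705, 14949, 11916,
    /* -10 */  9548,  7620,  6100,  4904,  3906,
    /*  -5 */  3121,  2501,  1991,  1586,  1277,
    /*   0 */  1024,   820,   655,   526,   423,
    /*   5 */   335,   272,   215,   172,   137,
    /*  10 */   110,    87,    70,    56,    45,
    /*  15 */    36,    29,    23,    18,    15,
};

#define NICE_0_WEIGHT 1024u

/* vruntime = pruntime * 1024 / weight, applied to one slice of run time. */
uint64_t vruntime_delta(uint64_t pruntime_ns, int nice)
{
    return pruntime_ns * NICE_0_WEIGHT / nice_weight[nice + 20];
}

struct sim_task {
    int nice;
    uint64_t vruntime;   /* virtual run time */
    uint64_t runtime;    /* physical run time actually received */
};

/* Toy CFS: each tick, run the task with the smallest vruntime. */
void cfs_simulate(struct sim_task *t, size_t n, int ticks, uint64_t tick_ns)
{
    for (int i = 0; i < ticks; i++) {
        size_t pick = 0;
        for (size_t j = 1; j < n; j++)      /* "leftmost node" by linear scan */
            if (t[j].vruntime < t[pick].vruntime)
                pick = j;
        t[pick].runtime  += tick_ns;
        t[pick].vruntime += vruntime_delta(tick_ns, t[pick].nice);
    }
}
```

With two nice-0 tasks, each gets exactly half the ticks. With nice 0 against nice 5 (weights 1024 and 335), the nice-0 task receives about 3 times as much CPU time, because its vruntime grows about 3 times more slowly per unit of physical time.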


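Tools chrt and renice:

Set the SCHED_FIFO policy with RT priority 50 for all threads (-a) of process 10576:

 # chrt -f -a -p 50 10576
Set nice values (renice -g applies to process group 9394; nice starts ./a.out with nice value 5):
 # renice -n -5 -g 9394
 # nice -n 5 ./a.out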


Origin: blog.csdn.net/lingshengxueyuan/article/details/112615517