Linux - Thread Concept / Thread Control

1. Thread concept
        1. A flow of execution in a program is called a thread. More precisely, a thread is a flow of execution that runs inside a process, at a finer granularity than the process itself. Linux has no true thread as a separate kernel object: because the structures a thread and a process need are so similar, Linux simulates threads with processes.

         In Linux, the current thread implementation is called the Native POSIX Thread Library, NPTL for short. In this implementation, each user-level thread corresponds to a scheduling entity in the kernel, that is, a kernel-level thread, which has its own process descriptor (task_struct structure).
         From the CPU's point of view, one PCB (process descriptor) represents one process, so processes under Linux are all lightweight processes: lightweight in that each contains only a PCB and a small amount of resources.

        The CPU schedules according to PCBs, so one PCB can be regarded as one flow of execution; ignoring resources, a PCB can therefore be understood as a thread.


        2. A process must have a PCB when it is created, so any process has at least one flow of execution plus resources (an address space, etc.). How many resources a process has can be seen through its address space.
         Consider, for example, a process that contains 4 PCBs together with a virtual address space, page tables, and the physical memory they map: that whole collection is the process. A thread, by contrast, corresponds to just one PCB and a small share of the process's resources. Such a process has 4 PCBs, hence 4 flows of execution, hence 4 threads.
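The "each thread is a kernel scheduling entity with its own task_struct" point above can be observed directly: under Linux/NPTL, every thread has its own kernel thread id while sharing the process id. A minimal sketch (assumes Linux and glibc, compiled with -pthread; the function names are illustrative):

```c
/* Sketch (Linux + NPTL assumed): each thread corresponds to its own kernel
 * scheduling entity, so gettid() differs per thread while getpid() is the
 * same for the whole process. */
#define _GNU_SOURCE
#include <unistd.h>
#include <pthread.h>
#include <sys/syscall.h>

/* Thread body: record the kernel's id for this thread's task_struct. */
static void *report_tid(void *arg)
{
    *(long *)arg = syscall(SYS_gettid);
    return NULL;
}

/* Returns 1 if the new thread got its own kernel tid, distinct from ours. */
int thread_has_own_tid(void)
{
    pthread_t t;
    long child_tid = 0;
    long my_tid = syscall(SYS_gettid);

    pthread_create(&t, NULL, report_tid, &child_tid);
    pthread_join(t, NULL);
    return child_tid != 0 && child_tid != my_tid;
}
```

Inside the thread, getpid() would still return the same value as in main, which is exactly the "lightweight process" picture described above.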

2. Implementation of the thread
        This part is adapted from another blog post (link in the original).

        Threads can be implemented in two ways: as user-level threads or as kernel-level threads.

1. Kernel-level threads
(1) Definition

        In the operating system, both system processes and user processes run with the support of the kernel, and kernel-level threads are no exception: their creation, switching, and destruction are all implemented in kernel space. To control and manage kernel-level threads, the kernel sets up a thread control block for each thread (in Linux, simulated by a PCB), and the kernel perceives a thread's existence through that control block.
(2) Advantage: If a thread in a process is blocked, the kernel can schedule other kernel-level threads in the process to run.
(3) Disadvantage: a user process's threads run in user space, while the scheduling and management of kernel-level threads are implemented in the kernel. Therefore, switching from one thread to another requires a transition from user mode to kernel mode, and the system overhead is relatively large.

2. User-level threads
(1) Definition
        User-level threads are implemented in user space; their creation, switching, and destruction do not require kernel support, that is, user-level threads are independent of the kernel. The kernel is completely unaware of their existence. In a system that provides only user-level threads, scheduling is still performed in units of processes.
(2) Advantages
        1) Thread switching does not require a switch into kernel mode;
        2) The scheduling algorithm can be process-specific;
        3) The implementation of user-level threads is independent of the platform of the operating system.
(3) Disadvantages
        1) When a user-level thread makes a blocking system call, not only that thread but all threads in its process are blocked, because the kernel blocks the whole process; with kernel-level threads, the other threads in the process can keep running;
        2) CPU time is allocated in units of processes, so a process with many user-level threads receives no more CPU than a single-threaded one; with kernel-level threads, scheduling is performed in units of threads.

3. Thread resources

        Most of the resources of the threads in a process are shared, but each thread also has some data of its own.
1. Shared resources:
(1) Address space: by definition, all threads in the same process see the same address space. For example, a global variable defined in the process is visible to every thread in it;
(2) File descriptor table: it is allocated per process, for the same reason as (1);
(3) Signal dispositions (how each signal is handled);
(4) The current working directory;
(5) User ID and Group ID.
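The shared address space in (1) can be demonstrated with a global counter that every thread increments; because the data is shared, access must be serialized with a mutex. A minimal sketch (POSIX threads, compiled with -pthread; g_counter and run_threads are illustrative names):

```c
/* Sketch: all threads in a process see the same globals (shared address
 * space), so concurrent updates need mutual exclusion. */
#include <pthread.h>

static int g_counter = 0;                    /* shared by every thread */
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;

static void *add_one(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&g_lock);   /* serialize access to shared data */
    ++g_counter;
    pthread_mutex_unlock(&g_lock);
    return NULL;
}

/* Start n threads that each increment the shared counter once. */
int run_threads(int n)
{
    pthread_t tids[64];
    if (n > 64)
        n = 64;
    for (int i = 0; i < n; ++i)
        pthread_create(&tids[i], NULL, add_one, NULL);
    for (int i = 0; i < n; ++i)
        pthread_join(tids[i], NULL);
    return g_counter;              /* every thread's write is visible here */
}
```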

2. Private resources:
(1) Thread ID: uniquely identifies the thread;
(2) A set of registers (private context): because the thread is the entity that gets scheduled, its current register state must be saved when it is switched out, so it can resume later;
(3) A private stack: holds temporary data (local variables created in function calls, etc.);
(4) errno;
(5) Signal mask;
(6) Scheduling priority.
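The per-thread errno in (4) can be checked directly: under NPTL, errno is thread-local, so a failing call in one thread does not disturb another thread's errno. A minimal sketch (Linux assumed, compiled with -pthread; the path and function names are illustrative):

```c
/* Sketch: errno is thread-local under NPTL; one thread's failed system
 * call does not change another thread's errno. */
#include <errno.h>
#include <pthread.h>
#include <fcntl.h>

static void *fail_open(void *arg)
{
    (void)arg;
    errno = 0;
    open("/nonexistent-path-for-demo", O_RDONLY); /* sets THIS thread's errno */
    return (void *)(long)errno;                   /* expected: ENOENT         */
}

/* Returns the errno the child thread observed; the caller's errno is
 * untouched by the child's failed open(). */
long errno_in_other_thread(void)
{
    pthread_t t;
    void *ret;
    pthread_create(&t, NULL, fail_open, NULL);
    pthread_join(t, &ret);
    return (long)ret;
}
```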

4. The difference between thread and process
1. A process is the basic unit to which the system allocates resources; a thread is the basic unit of scheduling.

2. A process is the basic resource-owning unit in the system; a thread does not own resources itself, only the few resources essential to run independently (stack, registers, etc.).


3. Creating a new thread costs much less than creating a new process. Creating a process requires building a PCB, an address space, and page tables, and allocating physical memory for its data; creating a thread only requires a new PCB and a small amount of resources.

4. Switching between threads requires much less work from the operating system than switching between processes; for example, a thread switch does not switch address spaces, while a process switch does.

5. Destroying a thread is much cheaper than destroying a process. Destroying a thread only requires reclaiming its PCB and a small amount of resources, while destroying a process requires reclaiming every PCB in it along with all the resources the process owns.

6. If one thread in a running program crashes, the whole process it belongs to (including its other threads) dies and the system reclaims the process's resources; an error in one process, however, does not affect other processes. So processes are safer, while threads are lighter and more convenient.

7. Most resources are shared between threads, whereas processes are independent of each other; mutual exclusion and synchronization over critical resources are therefore needed more between threads than between processes.
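The resource contrast in points 2 and 7 can be made concrete: a new thread writes into the same address space as its creator, while a forked child writes only into its own copy. A minimal sketch (Linux assumed, compiled with -pthread; g_value and the function names are illustrative):

```c
/* Sketch: a thread shares the process's memory, while fork() gives the
 * child its own (copy-on-write) copy of the address space. */
#include <pthread.h>
#include <unistd.h>
#include <sys/wait.h>

static int g_value = 0;

static void *thread_body(void *arg)
{
    (void)arg;
    g_value = 1;               /* writes into the SAME address space */
    return NULL;
}

int value_after_thread(void)
{
    pthread_t t;
    g_value = 0;
    pthread_create(&t, NULL, thread_body, NULL);
    pthread_join(t, NULL);
    return g_value;            /* 1: the thread's write is visible */
}

int value_after_fork(void)
{
    g_value = 0;
    pid_t pid = fork();
    if (pid == 0) {
        g_value = 1;           /* writes into the CHILD's copy only */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return g_value;            /* 0: the parent's copy is untouched */
}
```

This is also why threads need mutexes for shared data while separate processes do not interfere with each other's memory by default.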
