Linux multithreading (1): What is a thread?

1. Introduction

What is a thread? Operating-systems textbooks typically define it like this:

  • A thread is a flow of execution that runs inside a process
  • Threads execute at a finer granularity than processes, and thread scheduling costs less
  • A thread is the basic unit of CPU scheduling

 These statements are generic and apply to any operating system. They are so general that hearing them is like hearing nothing at all, and they still leave us without a clear picture of what a thread is. In this post we won't discuss anything lofty. Instead, I will use the simplest language possible to show what threads look like under Linux and how we should create and control multiple threads.

2. What is a thread?

 Before learning about threads, you have almost certainly studied processes. Within a process, we can call fork() to create a child process, as in the code below. The parent and child share the same code and data, and data stays shared only until one of them modifies it. An if statement on fork()'s return value then makes the parent and child execute different code. This gives us a first intuition: one address space can be divided among several execution flows, each running different code.

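#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");              // fork failed, no child was created
        return 1;
    }
    if (pid == 0) {
        // Child process code: fork() returned 0
        printf("child : pid=%d\n", (int)getpid());
    } else {
        // Parent process code: fork() returned the child's PID
        printf("parent: pid=%d, child=%d\n", (int)getpid(), (int)pid);
        waitpid(pid, NULL, 0);       // reap the child so it does not linger as a zombie
    }
    return 0;
}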

 This naturally raises a question: can we divide up a single process's address space in the same way? That is essentially what a thread is. A thread is a "lightweight process": it executes only part of the process's code and uses only part of the process's address space and physical memory.

3. How are threads implemented?

 Unlike Windows, Linux has no dedicated thread structure in the kernel. Linux threads reuse the process data structure: creating a thread still creates a task_struct, but no separate address space or page table is allocated for it. Every thread therefore sees the same process address space, while each one executes only part of the code and uses only part of the physical memory.
 The basic unit the CPU schedules is the PCB (task_struct in Linux). When the CPU schedules a process, it sees a PCB. When it schedules a thread, it still sees a PCB, one that shares its address space with other PCBs. Does that make any difference from the CPU's point of view? None at all: at this level Linux does not distinguish between threads and processes, there are only execution flows. In terms of the "lightweight process" description above, a thread's task_struct points to the same address space and page table as the rest of the process. What makes it lighter is that each thread executes only a portion of the code and touches only a portion of the data.
 The CPU can schedule each of these task_structs in turn, so code that would otherwise run serially can make progress concurrently. This scheme is what we call a thread.
 Linux reuses the PCB (process control block) to emulate a TCB (thread control block), so in Linux the TCB simply is the PCB. This is a very elegant design. Here are a few of its advantages:

  • No need to design a separate TCB
  • No need to maintain a mapping between TCBs and PCBs
  • No need to write a separate TCB scheduling algorithm
  • ……

With this in mind, let's restate what a process is and what a thread is:

  • From the kernel's perspective, a process is the basic entity to which system resources are allocated
  • A thread is the basic unit of CPU scheduling

4. Basic concepts

Now let's revisit the textbook definitions quoted at the beginning:

  1. A thread is an execution flow that runs inside a process
    [Understanding]: A thread runs within a process's address space, so it executes inside that process
  2. Threads execute at a finer granularity than processes
    [Understanding]: A thread executes only part of the process's code and accesses only part of the process's data
  3. Thread scheduling costs less
    [Understanding]: Threads share one process address space, so switching between threads does not require switching the address space or page table; only the thread's own context data (registers and so on) has to be switched
  4. A thread is the basic unit of CPU scheduling
    [Understanding]: When scheduling, the CPU looks only at PCBs, and each PCB corresponds to one thread

5. Postscript

 What are the advantages and disadvantages of threads? Theory on its own is hard to digest, so these will be presented gradually in later posts. How to create and use threads will be analyzed in depth in the next post.


Origin blog.csdn.net/whc18858/article/details/128208886