[Linux] The concept of thread and the difference from process

Table of contents

Background knowledge

What are threads?   

The difference between process and thread 

Advantages and disadvantages of threads


Background knowledge

        Before looking at threads, we first need to know that the OS manages the resources of a process at a much finer granularity than the process as a whole!

For example, take the heap area in the process address space. The mm_struct referenced from the process PCB records the start and end positions of the entire heap area. At a finer level, when the space we request with malloc has to be obtained from the kernel, the resulting region is described by a structure named vm_area_struct. This structure also contains a start and an end, but they mark the boundaries of that one region only. After several regions have been set up, the kernel links their vm_area_struct instances together (classically in a doubly linked list). The accompanying figure illustrates this layout.


        We also need to know that the executable program we usually run is essentially a file, and its contents follow these rules:

       The executable program is compiled according to the layout of the process address space, as shown in the right part of the above figure. That is, the (virtual) address of each piece of code is already determined at compile time.

        Executable programs are internally divided into blocks of 4KB in size, and each block is called a page.

        Physical memory is also divided into blocks of 4KB in size. Each of these blocks is called a page frame, and the attributes of each page frame are described by the struct page structure.

        In virtual memory management, programs access data through virtual addresses. The virtual address space is divided into fixed-size pages (usually 4KB), which are loaded from disk into physical page frames only when they are needed. When a program accesses a virtual page, the address is translated through the page table; if the page is not currently in physical memory, a page fault is triggered.

       At this point, the operating system allocates a page frame in physical memory, reads the data from disk into that page frame, and then updates the page table with the new mapping between the virtual page and the physical page frame. The program can then continue to execute, and the operating system can perform page replacement and memory management as needed to keep the system performant and available.

        The detailed process can be seen in the figure below:


What are threads?   

        Process: A process is a running instance of a program. It is an independent execution environment with its own memory space, including the code, data and execution status of the program. Processes do not directly share memory with each other.

        Thread: A thread is an execution unit within a process. It shares the memory space and resources of the process, including code segments, data segments, and open files.

        

         The Linux kernel has no dedicated thread structure: a thread is represented with the same PCB structure (task_struct) that describes a process!

        From the perspective of the CPU, a PCB in Linux may be only one execution flow of a process, and a process may have multiple execution flows, that is, multiple threads (multiple PCBs). Because these PCBs share the resources of the process instead of each owning a separate copy, they are more lightweight than a traditional process. Many other OSes define a separate kernel structure for threads, while Linux reuses task_struct and lets several of them share the same resources.

        Therefore, under Linux every such execution flow, whether it is the only one in its process or one of many, is referred to as a lightweight process (LWP).

        Although they are all referred to as lightweight processes, one thread must exist before any other thread can be created. This thread is called the main thread, and it can be understood as the process itself.

        The main thread is a special thread in a process, usually the first thread, created automatically by the operating system when the process is created. It runs the entry point of the program and is responsible for the initialization of the process. The context information of the main thread is recorded in the PCB (task_struct) corresponding to the process. Every other thread has its own task_struct that records its own context, and all of these task_structs refer to the same address space.

        From the perspective of us users, we only see the process (main thread), and each process has its own address space, code, data, and so on.

        But from the perspective of the kernel, a process contains one or more execution flows (threads, each a task_struct). The kernel only recognizes task_struct, so task_struct is the basic unit of OS scheduling. The code we wrote before has only one execution flow (one task_struct), so from the kernel's perspective a single-threaded process is just a special case.

        Under Linux, a process with only one execution flow is equivalent to a process under other OSes.

        A process with multiple execution flows is equivalent to multi-threading under other OSes.


In summary:

        Threads are part of a process and share its resources, but each thread also has some data of its own (such as its stack and its context).

        A process is the basic unit of resource allocation, and a thread is the basic unit of OS execution and scheduling.


        This is very abstract and not easy to explain. Below are some differences I found on the Internet, which I feel are more helpful for understanding the distinction between threads and processes.

        So in Linux, the distinction between processes and threads is based on their execution environment in user space and the resources they share.

  1. Execution environment: A process is an independent execution environment with its own address space and other resources, including global variables, a file descriptor table, etc. A new process is created with a system call such as fork() (exec() then replaces the program running in it). The new process is described by its own task_struct and has its own address space.

            A thread is an execution flow created within the same process, which shares the same address space and most of the resources with the main thread. Threads are created by invoking the clone() system call. Each thread gets its own task_struct, but these task_structs point to the same address space and resources as the main thread. Each thread has its own stack space and scheduling information, but they share code segments, global variables, the file descriptor table, etc.

  2. Resource sharing: Threads share the same address space and most resources, including global variables, heap memory, open files, etc. This also means that communication between threads is more efficient than communication between processes, because threads can access the shared memory directly without copying and passing data through an inter-process communication mechanism.


The difference between process and thread 

Let me summarize the difference between processes and threads:

  • A process is the basic unit of resource allocation, and a thread is the basic unit of operating system execution and scheduling.
  • A process has an independent address space, and a thread does not have a separate address space (threads in the same process share the address space of the process).
  • Communication between threads is more convenient. Since the data space is shared between threads, the data of one thread can be used directly by other threads. Processes each have an independent space, so data has to be exchanged through some inter-process communication mechanism.

Advantages and disadvantages of threads

Advantages:

  • Creation: The cost of creating a new thread is much lower than creating a new process, because there is no need to create a separate memory space.
  • Occupying resources: Threads occupy far fewer resources than processes.
  • Efficiency: Threads in the same process share the same resources, including memory space, global variables, etc. This avoids allocating the same resources repeatedly and improves resource utilization.
  • Concurrency: Multiple threads can work on several tasks at the same time, which improves the responsiveness and performance of applications.

Disadvantages:

  • Lack of access control: The process is the basic granularity of access control, so calling certain OS functions in one thread affects the entire process.
  • Performance penalty: A compute-bound thread that is rarely blocked by external events often cannot share a processor efficiently with other threads. If there are more such threads than available processors, additional synchronization and scheduling overhead is added while the available resources remain unchanged.
  • Programming difficulty: Writing and debugging a multi-threaded program is much more difficult than writing and debugging a single-threaded program.

This article only gives a general explanation of the concept of a thread and the difference between a process and a thread. Specific usage, thread control, and so on will be explained in the next chapter.

Origin blog.csdn.net/weixin_47257473/article/details/132215643