A comparison of processes and threads under Linux

        After studying a book on Unix environment programming and some articles on the Internet, I wrote up this summary. The pictures are from the Internet (they will be removed on request if they infringe), and links to the related articles are at the end.

 

Table of contents

1. Process

1. Definition and simple understanding

2. Some Notes

3. Advantages and disadvantages

4. Some commonly used related functions

2. Thread

1. Definition and simple understanding

2. Some Notes

3. Advantages and disadvantages

4. Some commonly used related functions (in typical order of use)

3. Comparative summary

4. Suggestions for use (with multiplexing and coroutines added)


1. Process

1. Definition and simple understanding

  • Definition: A running program, together with the resources it occupies (CPU, memory, system resources, etc.), is called a process.
  • Understanding: You can play NetEase Cloud Music and listen to songs while logged in to QQ and chatting with others. Here, QQ and NetEase Cloud Music are two processes. Closing one of them does not affect the other.

2. Some Notes

(1) The difference between a process and a program: a program is a collection of data and instructions. It is a static concept, a body of code that can be stored in the system for a long time. A process is one run of a program and is a dynamic concept. A process has a life cycle: it is destroyed when the program terminates and does not exist permanently in the system.

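(2) A process is uniquely identified in the system by its PID, a positive integer. On Linux the upper limit is set by /proc/sys/kernel/pid_max, which traditionally defaults to 32768, so by default PIDs range from 1 to 32767. PID 1 is generally the special process init, and other processes are numbered sequentially after it. When the limit is reached, allocation wraps around and reuses low numbers that have become free again.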

(3) Watch out for zombie processes. A zombie is a process that has terminated but whose entry has not yet been removed from the process table, because its parent has not collected its exit status with wait() or waitpid(). A zombie no longer uses memory or CPU, but it still holds a process table entry and a PID. If too many zombies accumulate, the process table fills up and the system can no longer create new processes.

(4) Pay attention to the process pool as well. A process pool is an application of resource pooling: a technique for managing a group of processes as a shared resource.

Define a pool and put a fixed number of processes in it. When a task arrives, one of the processes in the pool handles it.

When the task is finished, the process does not exit. It goes back into the pool and waits for the next task.

If there are many tasks and not enough processes in the pool, a task has to wait until a process finishes its current task and becomes idle before it can run.

In other words, the number of processes in the pool is fixed, so at most that many processes run at the same time. This does not make scheduling harder for the operating system, it saves the cost of repeatedly creating and destroying processes, and it still achieves a degree of concurrency.

 

3. Advantages and disadvantages

(1) Advantages:

  • Programming is simple and very easy to understand.
  • Since each process's address space is isolated from the others, a crash in one process does not affect other processes.
  • Multiple processes can make full use of multi-core resources.

(2) Disadvantages:

  • The address spaces of processes are isolated from each other, and this advantage is also a disadvantage: communication between processes becomes harder and has to go through an interprocess communication (IPC) mechanism. Think about which IPC mechanisms you know, and then whether you could implement each of them in code. Clearly, IPC programming is relatively complex, and performance can also be a problem.
  • Creating a process costs more than creating a thread, so frequently creating and destroying processes adds to the load on the system.

4. Some commonly used related functions

  • fork(), vfork(): Create a child process. With fork(), the child gets its own copy of the parent's address space (data segment, heap and stack; modern systems copy the pages lazily, on write), while a vfork() child shares the parent's address space without copying it. After fork(), the order in which parent and child run is not defined. vfork() guarantees that the child runs first: the parent is suspended until the child calls one of the exec*() functions or _exit(), and until then the child runs in the parent's address space. If the child depends on some further action by the parent before making one of these calls, the result is a deadlock. Also note that a child created by vfork() must never return from the function that called vfork(). It may only call _exit() (rather than exit(), which would flush the stdio buffers it shares with the parent) or an exec*() function; otherwise the behaviour is undefined and typically ends in a segmentation fault.
  • getpid(), getppid(): Get a PID. getpid() returns the ID of the calling process itself, whether that is the parent or the child, and getppid() returns the ID of its parent. There is no separate function for a parent to look up the PID of its child; the parent gets it from the return value of fork().
  • exec*(): A family of functions that replace the current process image with another program.
  • wait(), waitpid(): Wait for a child process to terminate and collect its exit status, which is how zombie processes are cleaned up.
  • system(): Executes a shell command; often used together with snprintf(), which builds the command string.
  • popen(): Executes a shell command and returns a pipe-based file stream, so that we can read the command's output line by line and extract the data we need.

2. Thread

 

1. Definition and simple understanding

  • Definition: A thread is one execution path within a process. Under Linux, threads are often called lightweight processes.
  • Understanding: Log in to QQ and open several contacts from the list to chat with several people at the same time. QQ is a process (its initial thread is the main thread), and each chat window is like a thread of that process. Closing one chat window does not affect the others, but if you exit QQ, all of the windows close with it.

2. Some Notes

(1) Since threads share the address space of their process, communication between threads needs no special mechanism: they simply read and write the same memory. Creating and destroying threads is also cheaper. Threads are like hermit crabs: the house (the address space) belongs to the process, and a thread is only a tenant. That is why threads are lightweight and cheap to create and destroy.

(2) Sharing the process address space makes communication between threads convenient, but it also brings endless trouble. Because the address space is shared, a crash in one thread brings down the entire process. Communication between threads is also almost too simple: it only takes reading and writing memory directly, and that is exactly why problems appear so easily. Deadlocks, synchronization and mutual exclusion between threads are all extremely prone to bugs, and a considerable part of the precious time of countless programmers goes into solving the problems caused by multi-threading.

Despite these disadvantages, threads have more advantages than multiple processes. Even so, it is impractical to solve high-concurrency problems with multi-threading alone. Creating a thread is cheaper than creating a process, but it is not free. For a high-concurrency server with tens or hundreds of thousands of connections, creating that many threads causes performance problems, including memory usage and the cost of switching between threads, that is, scheduling overhead.

 

3. Advantages and disadvantages

(1) Advantages:

  • Thread execution overhead is small
  • Threads are created quickly, communicate quickly and switch quickly, because they all live in the same address space.
  • Threads also use resources more efficiently, again because they share one address space.

(2) Disadvantages:

  • With multiple processes, each process has its own address space; threads share one address space, so a crash in one thread brings down the whole process.
  • Threads must use a synchronization mechanism when they access shared variables or memory, so you need to learn how to use mutexes.

4. Some commonly used related functions (in typical order of use)

  • pthread_attr_init(&thread_attr); //Initialize the attribute structure; the attributes are set afterwards
  • pthread_attr_setstacksize(&thread_attr, 120*1024); //Set the stack size (local variables live on the stack). The default stack size is generally sufficient. If a thread has a great many local variables and the stack is too small, use this function to change it.
  • pthread_attr_setdetachstate(&thread_attr, PTHREAD_CREATE_DETACHED); //A newly created child thread is joinable by default. Use this function with PTHREAD_CREATE_DETACHED to put it in the detached state: the main thread then does not need to join the child thread, and as soon as the child thread finishes running it terminates and its resources are released automatically.
  • pthread_create(&tid, &thread_attr, thread_worker1, &shared_var); //Create the child thread
  • After creation, the ID of the child thread is stored in tid. thread_attr holds the attributes for the child thread, thread_worker1 is the function the child thread executes (usually a custom function), and shared_var is the argument passed to that function. To pass several arguments, pass a pointer to a structure; if no argument is needed, pass NULL.
  • pthread_attr_destroy(&thread_attr); //Once the child thread has been created, call this function to destroy the attribute structure; it is the counterpart of pthread_attr_init().
  • pthread_exit(NULL); //Ends the calling thread. Do not call exit() in a child thread, because that makes the entire process exit.

3. Comparative summary

        A process is the smallest unit of resource management, and a thread is the smallest unit of program execution. Processes are like cells, and threads are like the elements that make them up. Creating either a process or a thread takes up resources and takes time.

        A process can have many threads, each performing a different task concurrently. Threads share the resources of the process they belong to (address space, global variables, open files), but each thread also has resources of its own, such as its stack and registers, which are independent of those of the other threads.

        There is no fixed order of execution between a parent process and a child process, or between the main thread and a child thread. Which one runs first depends on the scheduling policy of the system. If the parent or the child must run first, the programmer has to enforce that in code through an inter-process communication (IPC) mechanism; for threads, the tools are synchronization primitives such as mutexes.

        Compared with a process, a thread is a concept closer to the unit of execution. It can share data with other threads of the same process, but has its own stack space and an independent execution sequence. Both processes and threads can increase the concurrency of a program and improve its efficiency and response time.

        In a multi-core environment, threads can improve the performance of applications that deal with blocking operations such as file I/O or socket I/O. In a Unix system, a process consists of many things, including the executable program and a lot of resources such as file descriptors and an address space. In many cases, data needs to be exchanged between different pieces of code that perform related tasks. With a multi-process approach, creating a process takes longer than creating a thread. In addition, communication between processes is more troublesome and requires frequent switching between user space and kernel space, which is expensive. With multi-threading, shared global variables can be used, so communication (data exchange) between threads is very efficient.

        Multi-threading is like an at-grade intersection: it is cheap to build, but there are many traffic lights and frequent traffic jams. Multi-processing is like an overpass: it costs more to build, and the climbs and descents burn more fuel, but there are no traffic jams.

4. Suggestions for use (with multiplexing and coroutines added)

To serve multiple clients at the same time, there are three main techniques:

1. Multi-process: each process has its own address space, so processes have no significant effect on each other. It is usually only worth starting about as many processes as there are CPUs. This is the approach you must use to execute another program from within a program. It suits computing tasks and has the largest overhead.

2. If true parallelism is not necessary (for example, the main concern is blocking, and there are not many shared resources to look after), consider using threads for concurrency; the per-unit overhead is much smaller than that of processes.

    Threads: concurrent (round-robin scheduling, switching when one blocks)

    Any network operation involves delay, and delay means blocking, so multiple threads do better than a single thread of execution.

3. If round-robin scheduling is not necessary, consider whether it is enough to switch only when a task would block.

    In that case, I/O multiplexing plus coroutines can switch on blocking; this consumes the fewest resources and gives the highest concurrency. I will also write a summary article on multiplexing later.

Links to related articles:

The love-hate entanglement between Linux system processes and threads

There is also a related article from the public account "Code Farmer's Desert Island Survival"; links to it tend to go dead, so it is not linked here, sorry.

Origin blog.csdn.net/qq_51368339/article/details/129391738