Python Processes and Threads: Process vs. Thread

These are my Python study notes, written up to share with you. I hope they are helpful to everyone.

Process vs. Thread

We have introduced multi-processing and multi-threading, the two most common ways to achieve multitasking. Now let's discuss the advantages and disadvantages of each.

First of all, to achieve multitasking we usually design a Master-Worker model: the Master is responsible for allocating tasks, and the Workers are responsible for executing them. Therefore, in a multitasking environment there is usually one Master and multiple Workers.

If you use multiple processes to implement Master-Worker, the main process is the Master, and the other processes are the Workers.

If you use multiple threads to implement Master-Worker, the main thread is the Master, and the other threads are the Workers.
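The thread-based variant can be sketched as follows. This is only a minimal illustration, not a definitive design: the function names `master` and `worker`, the squaring task, and the use of `None` as a stop signal are all assumptions made for the example.

```python
import queue
import threading

def worker(tasks, results):
    """Worker: keep taking numbers from the task queue until a None sentinel arrives."""
    while True:
        n = tasks.get()
        if n is None:              # sentinel: the Master has no more work
            break
        results.put((n, n * n))    # the "task" here is just squaring a number

def master(numbers, worker_count=3):
    """Master: start the Workers, hand out tasks, then collect the results."""
    tasks, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(worker_count)]
    for t in workers:
        t.start()
    for n in numbers:
        tasks.put(n)
    for _ in workers:
        tasks.put(None)            # one sentinel per Worker
    for t in workers:
        t.join()
    # One result was produced per input number.
    return dict(results.get() for _ in numbers)

if __name__ == "__main__":
    print(master([1, 2, 3, 4]))
```

The main thread only distributes work and gathers results, which matches the Master role described above; the actual work happens in the Worker threads.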

The biggest advantage of the multi-process mode is its high stability: if a child process crashes, the main process and the other child processes are not affected. (Of course, if the main process dies, all processes die with it, but the Master process is only responsible for assigning tasks, so the probability of it crashing is low.) The famous Apache server originally adopted the multi-process mode.
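The isolation between processes can be observed with a small experiment. The sketch below assumes a hypothetical child program that aborts abnormally (`os.abort()`); the parent starts it through `subprocess`, sees a non-zero exit status, and keeps running.

```python
import subprocess
import sys

# Hypothetical child program: it aborts abnormally instead of finishing its work.
CRASHING_CHILD = "import os; os.abort()"

def run_child(code):
    """Run a snippet in a separate Python process and return its exit status."""
    completed = subprocess.run([sys.executable, "-c", code],
                               stderr=subprocess.DEVNULL)
    return completed.returncode

if __name__ == "__main__":
    status = run_child(CRASHING_CHILD)
    print("child exit status:", status)   # non-zero because the child crashed
    print("parent is still running")
```

The exact exit status differs between operating systems, so the example only relies on it being non-zero.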

The disadvantage of the multi-process mode is the high cost of creating a process. On Unix/Linux systems, creating a process with fork is acceptable, but creating a process on Windows is expensive. In addition, the number of processes that the operating system can run at the same time is limited. Given the limits of memory and CPU, if thousands of processes run at the same time, even scheduling them becomes a problem for the operating system.

The multi-threaded mode is usually a little faster than the multi-process mode, but not by much. Moreover, the fatal disadvantage of the multi-threaded mode is that a crash in any one thread may bring down the entire process, because all threads share the memory of the process. On Windows, if the code executed by a thread has a problem, you can often see this prompt: "This program has performed an illegal operation and will be shut down". In fact, it is often a single thread that has the problem, but the operating system forcibly ends the whole process.
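What "all threads share the memory of the process" means can be shown with a small sketch. It only demonstrates the sharing, not a crash; the function name `collect_with_threads` and the list `shared_log` are hypothetical names chosen for the example.

```python
import threading

def collect_with_threads(thread_count=4):
    """Start several threads that all append to the same list object."""
    shared_log = []                      # one object in the process's memory
    lock = threading.Lock()              # guards access to the shared list

    def worker(tag):
        with lock:
            shared_log.append(tag)       # every thread writes to the same list

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(shared_log)

if __name__ == "__main__":
    print(collect_with_threads())        # prints [0, 1, 2, 3]
```

Because every thread can reach the same objects, a thread that corrupts memory at a low level can damage data that all the other threads depend on.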

On Windows, multi-threading is more efficient than multi-processing, so Microsoft's IIS server uses the multi-threaded mode by default. Because of the stability problem of multi-threading, IIS is less stable than Apache. To alleviate this problem, both IIS and Apache now offer a mixed multi-process + multi-threaded mode, which makes the problem even more complicated.

Thread switching

Whether you use multiple processes or multiple threads, once the number grows large, efficiency will certainly suffer. Why?

Let's make an analogy. Suppose you are unlucky enough to be preparing for the high school entrance exam, and every night you have homework in five subjects: Chinese, mathematics, English, physics, and chemistry. Each subject takes 1 hour.

If you spend 1 hour on Chinese homework first and, after finishing it, spend 1 hour on math homework, and so on through the subjects one by one, it takes 5 hours in total. This approach is called the single-task model, or the batch-processing model.

Suppose you plan to switch to the multitasking model. You can do 1 minute of Chinese, then switch to math for 1 minute, then switch to English, and so on. As long as the switching is fast enough, this is the same as a single-core CPU performing multitasking. From the perspective of a kindergartener, you are doing the homework for all 5 subjects at the same time.

However, switching homework comes at a price. For example, when switching from Chinese to math, you must first clear the Chinese books and pens off the desk (this is called saving the scene), then open the math textbook and find the compass and straightedge (this is called preparing the new environment) before you can start doing math homework. The operating system does the same when switching processes or threads. It needs to save the current execution environment (CPU register state, memory pages, etc.) and then prepare the execution environment of the new task (restore its last register state, switch memory pages, etc.) before execution can begin. Although this switching is fast, it still takes time. If thousands of tasks are in progress at the same time, the operating system may be busy mainly with switching tasks, leaving little time to actually run them. The most common symptom is that the hard disk thrashes, windows stop responding, and the system appears frozen.

Therefore, once multitasking goes beyond a certain limit, it consumes all the resources of the system. As a result, efficiency drops sharply and none of the tasks gets done well.
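The homework analogy can be turned into a rough calculation. The model below is a simplification made up for illustration: every switch is assumed to cost the same fixed time (`switch_cost`, here half a minute), and the function name `total_minutes` is hypothetical.

```python
import math

def total_minutes(tasks, minutes_per_task, slice_minutes, switch_cost):
    """Total time to finish equal-sized tasks in round-robin order.

    Each task is worked on for slice_minutes at a time, and every change
    from one task to another costs switch_cost minutes.
    """
    slices_per_task = math.ceil(minutes_per_task / slice_minutes)
    total_slices = tasks * slices_per_task
    # No switch is needed before the first slice, or when there is only one task.
    switches = total_slices - 1 if tasks > 1 else 0
    return tasks * minutes_per_task + switches * switch_cost

if __name__ == "__main__":
    # One subject at a time: 5 slices of 60 minutes, 4 switches.
    print(total_minutes(5, 60, 60, 0.5))   # prints 302.0
    # Switching every minute: 300 slices, 299 switches.
    print(total_minutes(5, 60, 1, 0.5))    # prints 449.5
```

Under these assumptions the useful work is 300 minutes in both cases; the extra time comes entirely from switching, and it grows with the number of switches.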

Computationally intensive vs. IO intensive

The second consideration for whether to use multitasking is the type of task. We can divide tasks into computationally intensive and IO-intensive.

Computationally intensive tasks are characterized by performing a large amount of calculation and consuming CPU resources, such as calculating pi or decoding high-definition video, all of which rely on the computing power of the CPU. Although such tasks can also be completed with multitasking, the more tasks there are, the more time is spent on task switching and the lower the efficiency of the CPU at doing real work. Therefore, to use the CPU most efficiently, the number of computationally intensive tasks running simultaneously should be equal to the number of CPU cores.
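In Python, one way to follow the "one task per core" rule is to size a process pool from `os.cpu_count()`. The sketch below is an illustration under assumptions: the prime-counting task and the names `count_primes` and `cpu_worker_count` are invented for the example, and it is meant to be run as a script.

```python
import os
from concurrent.futures import ProcessPoolExecutor

def count_primes(limit):
    """CPU-bound toy task: count the primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def cpu_worker_count():
    """One worker per CPU core; fall back to 1 if the count is unknown."""
    return os.cpu_count() or 1

def main():
    limits = [20_000, 30_000, 40_000, 50_000]
    # The pool never runs more CPU-bound tasks at once than there are cores.
    with ProcessPoolExecutor(max_workers=cpu_worker_count()) as pool:
        for limit, primes in zip(limits, pool.map(count_primes, limits)):
            print(limit, primes)

if __name__ == "__main__":
    main()
```

Processes are used here rather than threads because the work is pure computation; the pool queues any extra tasks instead of running them all at once.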

Because computationally intensive tasks mainly consume CPU resources, the running efficiency of the code is very important. Scripting languages such as Python run very slowly and are completely unsuitable for computationally intensive tasks. For computationally intensive tasks, it is best to write in C.

The second type of task is IO-intensive. Tasks involving network and disk IO are all IO-intensive. This type of task is characterized by low CPU consumption: most of the time is spent waiting for IO operations to complete (because IO is much slower than the CPU and memory). For IO-intensive tasks, the more tasks there are, the higher the CPU efficiency, but there is a limit. Most common tasks are IO-intensive, such as web applications.

During the execution of an IO-intensive task, 99% of the time is spent on IO and very little on the CPU. Therefore, replacing a slow scripting language such as Python with very fast C code does almost nothing to improve running efficiency. For IO-intensive tasks, the most suitable language is the one with the highest development efficiency (the least amount of code): a scripting language is the first choice, and C is the worst.
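The effect of running several IO-intensive tasks at once can be imitated with `time.sleep`, which stands in for waiting on a network or disk request. This is a sketch under that assumption; `fake_io` and `run_concurrently` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(delay):
    """Stand-in for a network or disk request: the thread only waits."""
    time.sleep(delay)
    return delay

def run_concurrently(delays):
    """Run all the waits in separate threads; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        results = list(pool.map(fake_io, delays))
    return results, time.perf_counter() - start

if __name__ == "__main__":
    results, elapsed = run_concurrently([0.2] * 5)
    # Done one after another, the five waits would take about 1 second in total.
    print(results, round(elapsed, 2))
```

Because the threads spend their time waiting rather than computing, the waits overlap and the total elapsed time stays close to the longest single wait.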

Asynchronous IO

Given the huge speed difference between CPU and IO, a task spends most of its execution time waiting for IO operations. In the single-process single-threaded model, this waiting prevents other tasks from being executed in parallel. Therefore, we need a multi-process or multi-threaded model to support the concurrent execution of multiple tasks.

Modern operating systems have made huge improvements to IO operations, and the biggest one is support for asynchronous IO. If you make full use of the asynchronous IO support provided by the operating system, you can perform multitasking with the single-process single-threaded model. This brand-new model is called the event-driven model. Nginx is a web server that supports asynchronous IO: on a single-core CPU, its single-process model can efficiently support multitasking. On a multi-core CPU, you can run multiple processes (as many as there are CPU cores) to make full use of the cores. Since the total number of processes in the system is very limited, operating system scheduling is very efficient. Using the asynchronous IO programming model to achieve multitasking is a major trend.

In Python, the single-threaded asynchronous programming model is called a coroutine. With coroutine support, you can write efficient event-driven multitasking programs. We will discuss how to write coroutines later.
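As a small preview, here is a sketch of coroutines using the standard `asyncio` module. The coroutine `fetch` and its delays are hypothetical; `asyncio.sleep` stands in for waiting on real IO.

```python
import asyncio

async def fetch(name, delay):
    """Hypothetical IO task: the await stands in for waiting on the network."""
    await asyncio.sleep(delay)
    return f"{name} done"

async def main():
    # All three coroutines wait at the same time on a single thread.
    return await asyncio.gather(fetch("a", 0.3), fetch("b", 0.2), fetch("c", 0.1))

if __name__ == "__main__":
    print(asyncio.run(main()))   # prints ['a done', 'b done', 'c done']
```

`asyncio.gather` returns the results in the order the coroutines were passed in, regardless of which one finishes first.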

You are welcome to follow the public account "Web Development", where you can get Python test demos and learning resources. Let's learn Python together, gathering methods from everywhere to make development easier for you and me.

I hope this helps. If you have any questions, you can join the QQ technical exchange group: 668562416
If anything here is wrong or incomplete, I hope readers will offer comments or suggestions.
If you would like to reprint this article, please contact me; reprinting is allowed with authorization. Thank you.



Origin blog.csdn.net/qq_36478920/article/details/101759093