Python Web Study Notes: Introduction to Multithreading, Multiprocessing, and Coroutines in Python

What exactly are processes and threads? How are they used, and in which scenarios are they needed? What is a coroutine? How do coroutines relate to threads, and how do they differ?

 

Program switching - allocation of CPU time

First, every program we write runs on top of an operating system, such as Windows XP, RedHat Linux, FreeBSD, or AIX.

Second, the operating system does not run just one program but hundreds of them with different functions, such as keyboard drivers, display drivers, HTTP services, games, chat clients, and web browsers.

Finally, resources such as the CPU are limited. With hundreds of programs running, it is impossible to give each program a CPU of its own, and programs cannot be expected to finish in a single short run either.

So how is a limited supply of CPU, memory, and other resources allocated among applications?

 

Implemented by program switching

The operating system automatically grants each program a slice of time on the CPU, memory, disk, keyboard, display, and other resources, and automatically switches to the next program when that slice expires.

If the program being switched out has not finished executing, its state is saved so that it can resume where it left off on its next turn.

In practice the switching is so fast (on the order of milliseconds) that we do not notice it, and the computer appears to run many pieces of software at the same time.
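The idea of time slices with saved state can be illustrated with a toy simulation. This is only a sketch of the concept, not how a real operating system scheduler is implemented: the names `program` and `round_robin` are made up for illustration, and Python generators stand in for programs whose state is saved between turns.

```python
from collections import deque


def program(name, steps):
    """A toy 'program' that does one unit of work per time slice."""
    for step in range(1, steps + 1):
        # Yielding hands control back to the scheduler; the generator
        # keeps its local state (step) until it is resumed.
        yield f"{name}: step {step}/{steps}"


def round_robin(programs):
    """Give each program one time slice in turn until all have finished."""
    ready = deque(programs)
    log = []
    while ready:
        current = ready.popleft()
        try:
            log.append(next(current))  # run one time slice
            ready.append(current)      # not finished: back of the queue
        except StopIteration:
            pass                       # finished: drop it from the queue
    return log


if __name__ == "__main__":
    for line in round_robin([program("A", 2), program("B", 3)]):
        print(line)
```

Running it prints the steps of A and B interleaved (A 1/2, B 1/3, A 2/2, B 2/3, B 3/3): each program resumes exactly where it stopped, which is the "saved state" described above.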

 

process

The process is the first form of this "program switching".

definition

A process is a computer program in execution. In other words, whenever a piece of code is run, it first runs as a process.

A process moves through states such as ready, running, interrupted, dead, and terminated (the exact set differs between operating systems).

use

  1. The user writes the code (when run, the code itself executes as a process)

  2. The program is started and the process enters the "ready" state

  3. The operating system schedules resources and performs "program switching", moving the process into the "running" state

  4. End or interrupt

    1. When the program finishes executing, the process enters the "end" state

    2. If the program has not finished but the operating system's conditions for "program switching" are met, the process enters the "interrupted" state and waits to be scheduled again

characteristic

  • Every program is, first and foremost, a process

  • Each running process has its own address space, memory, data stack, and other resources

  • The operating system manages all processes automatically (without intervention from user code) and allocates execution time among them

  • A process can fork new processes to perform other tasks, but each process still has its own memory and data stack

  • Processes exchange messages and data through inter-process communication (IPC)
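The last two points (separate memory, and IPC for exchanging data) can be sketched with Python's standard `multiprocessing` module. The names `counter`, `child`, and `run_demo` are illustrative only, and a `multiprocessing.Queue` is just one of several IPC mechanisms one might choose.

```python
import multiprocessing as mp

counter = 0  # module-level state; every process works on its own copy


def child(queue):
    """Runs in a new process: change the local copy and report it via IPC."""
    global counter
    counter += 100
    queue.put(counter)  # send the value back to the parent process


def run_demo():
    """Start one child process and compare its view with the parent's."""
    queue = mp.Queue()
    proc = mp.Process(target=child, args=(queue,))
    proc.start()
    value_from_child = queue.get(timeout=10)  # receive the message
    proc.join()
    return value_from_child, counter


if __name__ == "__main__":
    child_value, parent_value = run_demo()
    print("child saw:", child_value)     # 100
    print("parent sees:", parent_value)  # 0
```

The child's change to `counter` never reaches the parent, because each process has its own memory; the only way the parent learns the value is through the queue.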

illustrate

  • Multiple processes can run on different CPUs without interfering with each other

  • Multiple processes can also run on the same CPU, with time slices allocated automatically by the operating system

  • Because processes do not share resources with each other, they need inter-process communication to send data, receive messages, and so on

Multiprocessing is what is usually meant by "parallel" execution.

More information

    Process acquisition under Linux

 

thread

Threads are also a form of "program switching".

definition

A thread is a flow of execution that runs inside a process.

A process can run multiple threads, and these threads share the operating system resources that the main process has obtained.

When multiple threads are started in a process, they take turns executing. Current operating systems also support thread preemption: a thread waiting to run can, through priority, signals, and similar mechanisms, suspend the running thread and run ahead of it.

use

  1. The user writes a program that contains threads (each program is itself a process)

  2. The operating system "program switches" into the current process

  3. If the current process contains threads, the threads are started

  4. The threads execute in turn unless one is preempted

characteristic

  • A thread must be started inside an existing process

  • A thread uses the system resources obtained by its process; unlike a process, it does not need to request CPU and other resources for itself

  • Threads are not guaranteed a fair share of execution time: a thread can be preempted by other threads, whereas a process is allocated execution time according to the operating system's settings

  • Many threads can be started within each process
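These characteristics can be sketched with the standard `threading` module. The names `results`, `worker`, and `run_threads` are illustrative; the lock is an added precaution (not something the notes above prescribe) so that a preempted thread cannot leave the shared list half-updated.

```python
import os
import threading

results = []                    # shared by every thread in this process
results_lock = threading.Lock()


def worker(worker_id):
    """Record which process this thread is running in."""
    with results_lock:          # protect the shared list from interleaving
        results.append((worker_id, os.getpid()))


def run_threads(count=4):
    """Start `count` threads inside the current process and collect results."""
    results.clear()
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)


if __name__ == "__main__":
    for worker_id, pid in run_threads():
        print(f"thread {worker_id} ran in process {pid}")
```

Every thread reports the same process ID and writes into the same list, with no IPC needed, because all of them live inside one process and share its memory.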

illustrate

Multithreading is what is usually meant by "concurrent" execution.

Read more:

Thread acquisition method under Linux

 

difference between process and thread

All threads in a process share the resources of the main process. Because they live in the same process and share its memory, threads can share and exchange information more easily than processes, which are independent of each other.

Threads generally execute concurrently, and it is precisely this concurrency and data sharing that makes cooperation between multiple tasks possible.

Processes generally execute in parallel, which allows programs to run on multiple CPUs at the same time.

Multiple threads can only run within the time slice granted to their process (when a process on one CPU starts several threads, thread scheduling divides the process's executable time slice among them). In standard CPython builds this holds in practice because the global interpreter lock (GIL) allows only one thread to execute Python bytecode at a time. Processes, by contrast, can truly run "simultaneously", with several CPUs executing at the same moment.

 

Common application scenarios for processes and threads

As a rule of thumb from experience writing concurrent programs in Python:

  1. Compute-intensive tasks use multiple processes

  2. IO-intensive tasks (such as network communication) use multiple threads and rarely use multiple processes
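The rule of thumb might look like this with the standard `concurrent.futures` module. This is a minimal sketch under stated assumptions: `cpu_task` and `io_task` are made-up stand-ins (a sum of squares for computation, `time.sleep` for waiting on the network), and the worker counts are arbitrary.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_task(n):
    """Compute-intensive stand-in: sum of the squares below n."""
    return sum(i * i for i in range(n))


def io_task(delay):
    """IO-intensive stand-in: sleeping models waiting on the network."""
    time.sleep(delay)
    return delay


def run_cpu_tasks(sizes):
    """Spread compute-intensive work over separate processes."""
    with ProcessPoolExecutor(max_workers=2) as pool:
        return list(pool.map(cpu_task, sizes))


def run_io_tasks(delays):
    """Overlap IO waits with threads; return the results and elapsed time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        results = list(pool.map(io_task, delays))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    print(run_cpu_tasks([10_000, 20_000]))
    results, elapsed = run_io_tasks([0.2] * 5)
    print(f"5 waits of 0.2 s finished in {elapsed:.2f} s")
```

The five simulated IO waits overlap, so the threaded run takes roughly as long as one wait rather than five; the compute-intensive tasks go to separate processes so that they can use more than one CPU.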

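This is because IO operations require exclusive access to a resource, for example:

  1. Network communication: only one party can speak at a time (at the micro level only one person speaks at any moment, even though at the macro level it looks like everyone is chatting simultaneously)

  2. File reading and writing: only one program can operate on a file at a time (if two programs write 'a' and 'b' to the same file at the same moment, which one ends up in the file?)

Both cases require controlling the resource so that only one program uses it at a time. With multithreading, the main process requests the IO resource and the threads execute one by one; even when preemption occurs they still run one at a time, and it feels as though the "multiple threads" are executing concurrently.

With multiple processes, one process cannot use the resource at all until the other has finished, so multiprocessing clearly "wastes" resources here.

Of course, the explanation above may not be enough to make the issue clear right away; let us get a feel for the "tricks" involved through continued hands-on practice.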
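The "one user at a time" idea for a shared file can be sketched with threads and a lock. This is a hedged illustration only: the names `write_lines` and `run_writers` are hypothetical, and using `threading.Lock` is an assumption about how the exclusive access could be enforced, not a technique the notes themselves specify.

```python
import os
import tempfile
import threading


def write_lines(path, lock, tag, count):
    """Append `count` lines tagged with `tag`, one exclusive write at a time."""
    for i in range(count):
        with lock:  # only one thread touches the file at any moment
            with open(path, "a", encoding="utf-8") as f:
                f.write(f"{tag}-{i}\n")


def run_writers(tags=("a", "b"), count=50):
    """Let one thread per tag write to the same temporary file."""
    lock = threading.Lock()
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        threads = [
            threading.Thread(target=write_lines, args=(path, lock, tag, count))
            for tag in tags
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        with open(path, encoding="utf-8") as f:
            return f.read().splitlines()
    finally:
        os.remove(path)  # clean up the temporary file


if __name__ == "__main__":
    lines = run_writers()
    print(len(lines), "intact lines written by 2 threads")
```

The threads interleave in an unpredictable order, but because each write happens while holding the lock, every line arrives intact and nothing is lost.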

 

coroutine

Coroutines are also a form of "program switching".

Here we introduce a special kind of "thread": the coroutine.

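definition

Simply put, a coroutine is also a thread, except that it is not scheduled by the operating system; instead it is "cooperatively scheduled" by the program itself. In other words, "a coroutine is a thread that is not scheduled by the operating system". The reality is somewhat more complicated than this. This course does not study coroutine techniques; once we have fully mastered processes and threads, the origins of this challenging technique will become clear naturally.

Coroutines are also known as micro-threads.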

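illustrate

The main feature of coroutines is:

Coroutines are cooperatively scheduled among themselves, so when the number of concurrent tasks reaches tens of thousands or more, coroutines perform far better than threads.

Note that this is also "concurrency", not "parallelism".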
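A minimal sketch of cooperative scheduling using the standard `asyncio` module, which is one way (not the only one) to write coroutines in Python. The names `tiny_task` and `run_many` are illustrative. Each coroutine gives up control voluntarily at `await`, and all of them run on a single operating system thread.

```python
import asyncio
import threading


async def tiny_task(task_id, thread_ids):
    """Wait briefly, then record which OS thread resumed this coroutine."""
    await asyncio.sleep(0.01)  # cooperative switch point: yield to others
    thread_ids.add(threading.get_ident())
    return task_id


async def run_many(count):
    """Run `count` coroutines concurrently and collect their results."""
    thread_ids = set()
    results = await asyncio.gather(
        *(tiny_task(i, thread_ids) for i in range(count))
    )
    return results, thread_ids


if __name__ == "__main__":
    results, thread_ids = asyncio.run(run_many(10_000))
    print(len(results), "coroutines ran on", len(thread_ids), "thread(s)")
```

Ten thousand coroutines complete while using just one thread, which illustrates both points above: the switching is done by the program rather than the operating system, and the execution is concurrent rather than parallel.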

 
