Concurrent programming: GIL, thread pool, process pool, blocking, non-blocking, synchronous, asynchronous

One: GIL (Global Interpreter Lock)

GIL stands for Global Interpreter Lock. When we execute a Python file, a process is created. Recall that a process is not a real unit of execution but a unit of resources: inside it live the interpreter (CPython) and the .py file, i.e. the code that the interpreter will interpret and the CPU will run.

GIL:

The GIL is essentially a mutex, one added to the interpreter itself.

Within the same process, every thread must grab the GIL before it can execute interpreter code.

Pros:

It guarantees thread safety for CPython's memory management by ensuring that only one thread runs interpreter code at a time.

Disadvantages:

Within the same process, only one thread can execute at a time, so running efficiency suffers. In other words, CPython's multithreading cannot achieve parallelism, but it can achieve concurrency.
How that concurrency is achieved: when a thread encounters an I/O operation it is switched out and forced to release the GIL, so that other threads can use it.
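A minimal sketch of that switching behavior, with the I/O wait simulated by time.sleep (the function name io_task is illustrative): two sleeping threads overlap, so the total elapsed time is close to one sleep, not two.

```python
import threading
import time

def io_task():
    time.sleep(0.5)  # simulated I/O: the thread releases the GIL while sleeping

start = time.time()
threads = [threading.Thread(target=io_task) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
# Both sleeps overlap, so elapsed is roughly 0.5s rather than 1.0s.
print('elapsed: %.2fs' % elapsed)
```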

GIL and mutex:

1. 100 threads compete for the GIL lock, i.e. for execution permission.
2. Some thread inevitably wins the GIL (call it thread 1 for now), starts executing, and at some point calls lock.acquire() on the mutex.
3. It is quite possible that before thread 1 has finished running, thread 2 grabs the GIL and starts running, only to discover that the mutex has not been released by thread 1. Thread 2 therefore blocks and is forced to surrender its execution permission, i.e. to release the GIL.
4. Eventually thread 1 grabs the GIL again and resumes from where it last paused, until the mutex is released normally; the other threads then repeat steps 2, 3, and 4.
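The steps above can be sketched with a shared counter (the variable n and the worker function are illustrative, not from the original): each thread must hold the mutex for the whole read-modify-write, so even when a GIL switch happens mid-critical-section, no other thread can get past lock.acquire().

```python
import threading
import time

n = 100
lock = threading.Lock()

def decrement():
    global n
    with lock:             # acquire the mutex (after winning the GIL)
        temp = n
        time.sleep(0.001)  # encourage a GIL switch mid-critical-section
        n = temp - 1       # safe: no other thread got past lock.acquire()

threads = [threading.Thread(target=decrement) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(n)  # prints 0: the lock serialized the read-modify-write
```

Without the lock, the sleep between the read and the write would let other threads read the same stale value of n, and the final result would be greater than 0.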
Three points to pay attention to:
 # 1. A thread first grabs the GIL lock, which is equivalent to the execution permission; only after obtaining it can the thread try to acquire the mutex Lock. Other threads can also grab the GIL, but if they find the Lock still unreleased, they block, and even though they hold the execution permission GIL, they must hand it over immediately.

# 2. join waits for everything, i.e. full serialization, whereas a lock serializes only the part that modifies shared data, i.e. partial serialization. The fundamental way to keep data safe is to turn concurrency into seriality; both join and a mutex achieve this, and there is no doubt that the partial serialization of a mutex is more efficient.

# 3. Be sure to read the classic analysis of the GIL and mutexes at the end of this section.

GIL vs Lock

    Sharp-eyed students may ask: since, as stated above, Python already has the GIL to ensure that only one thread executes at a time, why is Lock still needed?

 First we need to agree on one thing: the purpose of a lock is to protect shared data, so that only one thread at a time can modify it.

    From that, we can conclude that different locks should be used to protect different data.

 Finally, the problem becomes clear. The GIL and Lock are two different locks, and the data they protect are different. The former is at the interpreter level (and of course protects interpreter-level data, such as garbage-collection bookkeeping), while the latter protects the data of the application the user is developing. Obviously the GIL is not responsible for that, so only the user can protect it with a custom lock, i.e. Lock.

Process analysis: all threads compete for the GIL lock, i.e. all threads compete for execution permission.

  Thread 1 grabs the GIL, gains execution permission, starts executing, and then acquires a Lock. Before it finishes, thread 2 may grab the GIL and start executing, only to discover mid-execution that the Lock has not yet been released by thread 1; thread 2 therefore blocks and has its execution permission taken away. Thread 1 may then get the GIL back and continue executing normally until it releases the Lock... This produces the effect of serial execution.

  Since it is serial, then we execute

  t1.start()

  t1.join()

  t2.start()

  t2.join()

  This is also serial execution, so why do we still need Lock? Because join waits for all of t1's code to finish, which is equivalent to locking all of t1's code, while Lock locks only the part of the code that operates on shared data.
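A minimal sketch of that difference, assuming each task does 0.2s of non-shared work plus one shared-data update (the names worker and counter are illustrative): with a Lock, only the update is serialized and the rest overlaps; with join after each start, everything is serialized.

```python
import threading
import time

counter = 0
lock = threading.Lock()

def worker():
    global counter
    time.sleep(0.2)   # non-shared work: may run concurrently
    with lock:        # only this part is serialized
        counter += 1

# Partial serialization with Lock: the 0.2s sleeps overlap.
start = time.time()
threads = [threading.Thread(target=worker) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
partial = time.time() - start

# Full serialization with join after each start: the sleeps cannot overlap.
start = time.time()
for _ in range(5):
    t = threading.Thread(target=worker)
    t.start()
    t.join()
full = time.time() - start

print('counter=%d, lock version %.2fs, join version %.2fs'
      % (counter, partial, full))
# The lock version finishes in roughly 0.2s; the join version takes roughly 1.0s.
```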


Because the Python interpreter automatically performs memory reclamation for you at intervals, you can think of it as a separate thread inside the interpreter that wakes up periodically and does a global poll to see which memory can be freed. This garbage-collection thread runs concurrently with the threads in your own program. Suppose your thread deletes a variable, and just as the interpreter's garbage-collection thread is in the middle of clearing it, another thread re-assigns that not-yet-freed memory; the newly assigned data could then end up deleted. To solve problems like this, the Python interpreter simply and crudely added a lock: while one thread is running, no other thread may touch anything. This solves the problem above, and can be considered a legacy of Python's early versions.
The relationship between the GIL, Lock, and join

 

Let's further understand this process through a diagram (the original diagram is not reproduced here):

Two: Thread pools and process pools

First, let's be clear: neither processes nor threads can be created without limit!!! Our memory is finite.

So what is a pool for? It limits the number of concurrent tasks, keeping the number of tasks our computer executes concurrently within a range we can control.

Processes in a pool: suited to compute-intensive tasks.

Threads in a pool: suited to I/O-intensive tasks.

from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import time

def task(name):
    print('%s is running' % name)
    time.sleep(2)

if __name__ == '__main__':
    # Process pool: if max_workers is omitted, it defaults to the number of CPU cores.
    p = ProcessPoolExecutor(max_workers=5)
    # Thread pool: if max_workers is omitted, Python 3.5-3.7 default to cpu*5;
    # since Python 3.8 the default is min(32, cpu_count + 4).
    # p = ThreadPoolExecutor(max_workers=10)
    for i in range(10):
        p.submit(task, i)
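To actually collect return values from pooled tasks, here is a minimal sketch using ThreadPoolExecutor (the worker function square is a made-up example, not from the original):

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    # Illustrative worker: any callable submitted to the pool works the same way.
    return x * x

if __name__ == '__main__':
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(square, i) for i in range(5)]
        results = [f.result() for f in futures]  # .result() blocks until done
    print(results)  # prints [0, 1, 4, 9, 16]
```

The with block calls pool.shutdown(wait=True) on exit, so all submitted tasks finish before the program continues.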

 

The running status of the program:

Blocking: when the running program encounters I/O, it enters the blocked state and releases its CPU resources.

Non-blocking:

The program is in the running or ready state: either it encounters no I/O, or, by some means, it does not stop in place even when it does encounter I/O but continues with other operations, occupying the CPU as much as possible.
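The same operation can behave either way. A minimal sketch using a socket pair (the variable names are illustrative): recv on an empty blocking socket would stop in place, while a non-blocking socket returns control immediately by raising BlockingIOError.

```python
import socket

# Two connected sockets; no data has been sent yet.
a, b = socket.socketpair()
b.setblocking(False)  # switch b to non-blocking mode
try:
    b.recv(1024)      # a blocking socket would wait here for data
except BlockingIOError:
    print('non-blocking: recv returned control immediately instead of waiting')
a.close()
b.close()
```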

 

Two ways to submit a task, i.e. two ways to invoke a task:

Synchronous (lower efficiency): after submitting a task, wait in place until the task completes and its return value is obtained, and only then continue to the next line of code. One point to emphasize: synchronous waiting is not the same as blocking. A synchronous wait may also involve I/O operations, but the program is not blocked; it is still running, busy executing the synchronous task.

Asynchronous (higher efficiency): after submitting a task, do not wait in place; execute the next line of code directly. When the task finishes running, the asynchronous caller is notified that the task has ended.
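A minimal sketch contrasting the two submission styles with concurrent.futures (the worker work and the callback on_done are illustrative names, not from the original):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def work(x):
    time.sleep(0.1)  # simulate some I/O
    return x + 1

done = []

def on_done(future):
    # Called automatically when the task finishes (asynchronous style).
    done.append(future.result())

if __name__ == '__main__':
    pool = ThreadPoolExecutor(max_workers=2)

    # Synchronous submission: .result() waits in place for the return value.
    print(pool.submit(work, 1).result())  # prints 2

    # Asynchronous submission: attach a callback and move on immediately.
    pool.submit(work, 2).add_done_callback(on_done)
    pool.shutdown(wait=True)  # wait for outstanding tasks before exiting
    print(done)  # prints [3]
```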

 
