GIL (Global Interpreter Lock)

One: Introduction

'''
Definition:
In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple 
native threads from executing Python bytecodes at once. This lock is necessary mainly 
because CPython’s memory management is not thread-safe. (However, since the GIL 
exists, other features have grown to depend on the guarantees that it enforces.)
'''
Conclusion: In the CPython interpreter, only one thread within a process can execute at any given moment, so multithreading cannot take advantage of multiple cores.

The first thing to be clear about is that the GIL is not a feature of the Python language; it is a concept introduced by the implementation of the Python interpreter, CPython. An analogy: C++ is a set of language (syntax) standards that can be compiled into executable code by different compilers, such as GCC, Intel C++, Visual C++, and so on. The same is true for Python: the same piece of code can be executed by different Python runtimes such as CPython, PyPy, and Psyco, and some of them, like Jython, have no GIL at all. However, because CPython is the default Python runtime in most environments, many people equate CPython with Python and take it for granted that the GIL is a defect of the Python language. So let's make this clear first: the GIL is not a feature of Python, and Python can exist entirely without the GIL.

Two: Introduction to the GIL

In essence, the GIL is a mutual exclusion lock, and all mutexes work the same way: they turn concurrent access into serial access, ensuring that shared data can be modified by only one task at a time, thereby guaranteeing data safety.

One thing is certain: to protect different pieces of data, you should use different locks.

To understand the GIL, first establish one fact: every time a Python program runs, a separate process is started. For example, python test.py, python aaa.py, and python bbb.py will start 3 different Python processes.

'''
#Verify that python test.py spawns only one process
#test.py content
import os,time
print(os.getpid())
time.sleep(1000)
'''
python3 test.py
#Under Windows
tasklist |findstr python
#Under Linux
ps aux | grep python


A Python process contains not only the main thread of test.py (and any other threads the main thread starts), but also interpreter-level threads started by the interpreter itself, such as garbage collection. The key point is that, without a doubt, all of these threads run inside this one process.

# 1 All data in the process is shared, and code, as a kind of data, is also shared by all threads (all the code of test.py as well as all the code of the CPython interpreter)
For example: test.py defines a function work, and all threads in the process can access its code, so we can start three threads whose target all points to that code, meaning all of them can execute it.

# 2 Every thread's task must be passed, as code, to the interpreter's code for execution; in other words, before any thread can run its own task, it first needs access to the interpreter's code.
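The two points above can be sketched in code: three threads whose target all points to the same shared work function (the body of work here is invented just for the illustration).

```python
import threading

results = []

def work():
    # all threads execute this same, shared function object
    results.append(threading.current_thread().name)

# three threads whose target all points to the same work code
threads = [threading.Thread(target=work) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results))  # 3: each thread ran the shared code once
```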

In summary:

If multiple threads share target=work, the execution flow is:

The threads first access the interpreter's code, i.e. obtain execution permission, and then hand the target's code to the interpreter's code for execution.

The interpreter's code is shared by all threads, so the garbage collection thread may also access and run it. This leads to a problem: for the same piece of data, say 100, thread 1 might execute x=100 at the very moment garbage collection is reclaiming 100. There is no clever way around this other than locking, which is exactly what the GIL does: it ensures that the Python interpreter executes the code of only one task at a time.

Three: GIL and Lock

The GIL protects interpreter-level data; to protect your own application data, you still need to lock it yourself.
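A minimal sketch of this division of labor: the GIL serializes bytecode execution, but a read-modify-write sequence on shared data spans several bytecodes, so a thread can be switched out in the middle of it. User code therefore protects its own data with its own threading.Lock (the variable n and the task function below are made up for the example).

```python
import threading

n = 100                    # user-level shared data
lock = threading.Lock()    # user-level lock; the GIL does not replace it

def task():
    global n
    # read-modify-write spans several bytecodes, so without the lock
    # a thread could be switched out between the read and the write
    with lock:
        temp = n
        temp -= 1
        n = temp

threads = [threading.Thread(target=task) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(n)  # 0: every decrement was applied exactly once
```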

Four: GIL and Multithreading

With the GIL in place, only one thread in the same process executes at any given moment.

Hearing this, some students immediately ask: processes can use multiple cores but carry high overhead, while Python's multithreading has low overhead but cannot take advantage of multiple cores. Does that mean Python is useless, and PHP is the most powerful language after all?

Don't worry, I'm not finished yet.

To answer this, we need to agree on a few points:

# 1. Is the CPU being used for computation, or for I/O?

# 2. Multiple CPUs mean multiple cores can carry out computation in parallel, so multi-core improves computational performance

# 3. Once a CPU hits I/O blocking it still has to wait, so multiple cores do not help with I/O-bound operations

A worker is equivalent to a CPU: computation is the worker working, and I/O blocking is the process of supplying the raw materials the worker needs. If the raw materials run out while the worker is working, the worker has to stop and wait until they arrive.

If most of the tasks in your factory consist of preparing raw materials (I/O-intensive), then no matter how many workers you have it doesn't help much: the workers spend most of their time idle, waiting for materials, with no other work to do.

Conversely, if your factory always has raw materials ready, then of course the more workers, the higher the efficiency.

 

In conclusion:

  For computation, more CPUs are better; for I/O, more CPUs are useless.

  Of course, for running a complete program, execution efficiency will still improve as CPUs are added (however small the improvement), because a program is almost never pure computation or pure I/O. So rather than asking in absolute terms, we should ask whether a given program is mostly computationally intensive or mostly I/O-intensive, and from there analyze whether Python's multithreading is useful.

#Analysis:
We have four tasks to process, and we want them handled concurrently. The options are:
Option 1: start four processes
Option 2: start four threads within one process

#On a single core, the analysis is:
  If the four tasks are computationally intensive, there are no extra cores for parallel computing anyway, and option 1 adds the overhead of creating processes: option 2 wins.
  If the four tasks are I/O-intensive, option 1 still pays the high cost of creating processes, and process switching is far slower than thread switching: option 2 wins.

#On multiple cores, the analysis is:
  If the four tasks are computationally intensive, multiple cores allow parallel computing, but in Python only one thread per process executes at a time, so threads cannot use the extra cores: option 1 wins.
  If the four tasks are I/O-intensive, no number of cores can eliminate the I/O waiting: option 2 wins.


#Conclusion: computers today are basically all multi-core. For computationally intensive tasks, Python multithreading brings little performance benefit and may even be worse than serial execution (serial code avoids all the switching), but for I/O-intensive tasks the efficiency improvement is significant.

 

Five: Multithreaded Performance Tests

from multiprocessing import Process
from threading import Thread
import os,time
def work():
    res=0
    for i in range(100000000):
        res*=i


if __name__ == '__main__':
    l=[]
    print(os.cpu_count()) #This machine is 4 cores
    start=time.time()
    for i in range(4):
        p=Process(target=work) #takes more than 5s
        # p=Thread(target=work) #takes more than 18s
        l.append(p)
        p.start()
    for p in l:
        p.join()
    stop=time.time()
    print('run time is %s' %(stop-start))

Computationally intensive: multi-process is efficient
from multiprocessing import Process
from threading import Thread
import threading
import os,time
def work():
    time.sleep(2)
    print('===>')

if __name__ == '__main__':
    l=[]
    print(os.cpu_count()) #This machine is 4 cores
    start=time.time()
    for i in range(400):
        # p=Process(target=work) #takes more than 12s, most of the time spent creating processes
        p=Thread(target=work) #takes more than 2s
        l.append(p)
        p.start()
    for p in l:
        p.join()
    stop=time.time()
    print('run time is %s' %(stop-start))

I/O-intensive: multithreading is efficient

Applications:

Multithreading for I/O-intensive work, such as sockets, crawlers, and the web
Multiprocessing for computationally intensive work, such as financial analysis
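This division of labor can also be sketched with the standard concurrent.futures module, which wraps both approaches behind one interface (the cpu_task and io_task functions below are made up for the example): a ProcessPoolExecutor for computationally intensive work, a ThreadPoolExecutor for I/O-intensive work.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def cpu_task(n):
    # computationally intensive: pure bytecode execution, held back by the GIL
    return sum(i * i for i in range(n))

def io_task(delay):
    # I/O-intensive (simulated with sleep): the GIL is released while waiting
    time.sleep(delay)
    return delay

if __name__ == '__main__':
    # processes for computation: each worker has its own interpreter and GIL
    with ProcessPoolExecutor() as pool:
        print(list(pool.map(cpu_task, [100000] * 4)))
    # threads for I/O: cheap to create, and sleeping threads release the GIL
    with ThreadPoolExecutor() as pool:
        print(list(pool.map(io_task, [0.1] * 4)))
```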

 
