Python Web study notes: why multithreading is of little use under Python's GIL

Why do some people say that multithreading in Python is of little use? Someone asked exactly this question on Zhihu. Conventional wisdom says that multiple processes and threads make full use of the hardware through concurrency and so make programs run faster. How could they be of little use in Python?

Some readers may already know the answer: it is Python's notorious GIL.

So what is the GIL? Why does it exist? Is multithreading really of little use? Can the GIL be removed? Let's work through these questions together; it takes a little patience.

Is multithreading really of little use? Let's start with an experiment. It is very simple: decrement the number 100,000,000 (100 million) and stop when it reaches 0. How long does this take with a single thread, and how long with multiple threads? Show me the code.

def decrement(n):
    # CPU-bound work: count n down to 0 without doing any I/O
    while n > 0:
        n -= 1

Single thread

import time

start = time.time()
decrement(100000000)
cost = time.time() - start
print(cost)  # 6.541690826416016 on the author's machine

On my 4-core computer, the single-threaded run takes about 6.5 seconds. You may ask where the thread is. Every running program has a main thread that executes its code by default.

Multithreading

import threading
import time

start = time.time()

t1 = threading.Thread(target=decrement, args=[50000000])
t2 = threading.Thread(target=decrement, args=[50000000])

t1.start()  # start the thread; it runs decrement(50000000)
t2.start()  # same as above
t1.join()   # block the main thread until t1 has finished
t2.join()   # same as above, for t2
cost = time.time() - start
print(cost)  # 6.85541033744812 on the author's machine

We create two threads, t1 and t2, each performing 50 million decrements. Once both have finished, the main thread ends the program. The result: with the two threads sharing the work, the run took 6.8 seconds, which is actually slower. In theory, two threads running in parallel on two CPU cores should halve the time; instead, it went up.

Why is the multithreaded version slower instead of faster?

The reason is the GIL. CPython, the mainstream Python interpreter, has a Global Interpreter Lock. A thread must hold this lock before the interpreter will execute its Python code, which means only one thread can be executing Python code at any given moment. A thread that wants to run must first acquire the lock; if another thread holds it, the waiting thread can continue only after the holder releases it.
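To make the mechanism concrete, here is a toy model in plain Python. It is only an illustration under simplifying assumptions, not how CPython implements the GIL: a hypothetical module-level lock, toy_gil, stands in for the GIL, and each worker must hold it while it executes a batch of "instructions". The counter max_running records the largest number of threads that were ever inside the lock at once.

```python
import threading

# Toy model only, not CPython's real implementation: one global lock that a
# thread must hold before it may run its "instructions".
toy_gil = threading.Lock()  # hypothetical stand-in for the GIL

running = 0       # threads currently holding toy_gil
max_running = 0   # highest value 'running' ever reached
executed = 0      # total "instructions" executed by all threads


def worker(instructions, batch=100):
    """Execute `instructions` steps, holding toy_gil for each batch."""
    global running, max_running, executed
    remaining = instructions
    while remaining > 0:
        with toy_gil:                  # wait here until the lock is free
            running += 1
            max_running = max(max_running, running)
            step = min(batch, remaining)
            for _ in range(step):      # "execute" a batch of instructions
                executed += 1
            remaining -= step
            running -= 1
        # the lock is released here, so another thread can take its turn


threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(executed, max_running)  # 40000 1
```

Because every worker must hold toy_gil to make progress, max_running never exceeds 1 no matter how many threads are started, which mirrors the "one thread at a time" rule described above.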

That is why the two threads together are slower: at any instant only one thread is running and the others can only wait. Even on a multi-core CPU, multiple threads cannot execute Python code in parallel; they can only take turns. On top of that, multithreading adds context switching and lock handling (acquiring and releasing the lock, and so on), so the multithreaded run ends up slower, not faster.

When is the GIL released?

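A thread releases the GIL when it performs I/O. A compute-intensive (CPU-bound) thread also releases it after executing 100 interpreter ticks (a tick can roughly be thought of as one Python virtual machine instruction). This step length can be set with sys.setcheckinterval() and read with sys.getcheckinterval(). That describes Python 2: since Python 3.2, CPython instead asks the running thread to release the GIL after a time interval (5 ms by default), which is set with sys.setswitchinterval() and read with sys.getswitchinterval(), and sys.setcheckinterval() was removed in Python 3.9. Compared with a single thread, this switching is extra overhead that multithreading brings.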
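On a current Python 3 interpreter, both behaviours can be observed directly. The sketch below reads and temporarily changes the switch interval with sys.getswitchinterval() and sys.setswitchinterval(), then uses time.sleep() as a stand-in for blocking I/O. That substitution is an assumption for illustration; blocking file and socket calls typically behave the same way because CPython releases the GIL around them. The helper name fake_io is hypothetical.

```python
import sys
import threading
import time

# CPython 3.2+: the ideal length, in seconds, of the timeslice a thread gets
# before it is asked to switch (sys.getcheckinterval/setcheckinterval were
# removed in Python 3.9).
old = sys.getswitchinterval()
print(old)                      # 0.005 unless something has changed it

sys.setswitchinterval(0.001)    # request more frequent thread switches
print(sys.getswitchinterval())
sys.setswitchinterval(old)      # restore the previous interval


def fake_io(seconds):
    """Hypothetical stand-in for a blocking I/O call."""
    time.sleep(seconds)         # CPython releases the GIL while sleeping


start = time.perf_counter()
threads = [threading.Thread(target=fake_io, args=(0.2,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(f"{elapsed:.2f}s")        # about 0.2s rather than 0.8s: the sleeps overlap
```

Four threads that each sleep 0.2 s finish in roughly 0.2 s in total, because a sleeping thread does not hold the GIL. Exact timings depend on the machine.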

Why is the CPython interpreter designed this way?

Multithreading is a way to make full use of multi-core processors, keeping pace with the rapid development of modern computer hardware; with it, CPU resources can be used efficiently. Python was born in 1991, when hardware was far more modest than it is today, when servers with 32 cores and 64 GB of RAM are commonplace.

But multithreading raises a problem: how to keep shared data synchronized and consistent. When multiple threads access shared data, two of them may modify the same data at the same time, and without a suitable mechanism to guarantee consistency, the program will eventually misbehave. Python's creator therefore added a single global lock. Whether or not your data actually has synchronization problems, the same lock applies to everything, one size fits all, and it keeps the interpreter's shared data safe. This is why multithreading is of little use in Python: instead of fine-grained control over data safety, it solves the problem in a simple, blunt way.

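In the 1990s, this solution was actually reasonable. Hardware was still primitive, single-core CPUs were mainstream, and there were few use cases for multithreading; most programs ran in a single thread. A single thread involves no context switching between threads, so it was actually more efficient than multithreading (this rule does not apply on multi-core machines). Using the GIL to guarantee data consistency and safety was therefore not necessarily a bad choice; at least at the time, it was a very low-cost way to implement it.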

So is it feasible to remove the GIL?

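People have actually tried, but the results were disappointing. In 1999, Greg Stein and Mark Hammond created a fork of Python without the GIL, replacing it with finer-grained locks on all mutable data structures. However, benchmarks showed that the GIL-free Python ran single-threaded code nearly twice as slowly.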
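Fine-grained locking means giving each mutable object its own lock instead of sharing one global lock. The following is only a Python-level illustration of that idea: the hypothetical LockedList class is not the 1999 patch, which changed the C interpreter itself. Threads working on different lists never contend with each other, but every single operation pays for an acquire and a release, even when only one thread is running. That per-operation cost is the kind of overhead that shows up in single-threaded benchmarks.

```python
import threading


class LockedList:
    """Hypothetical list wrapper that carries its own lock (fine-grained locking).

    Illustration only; it is not how the 1999 GIL-free fork was implemented.
    """

    def __init__(self):
        self._lock = threading.Lock()  # one lock per object, not one global lock
        self._items = []

    def append(self, item):
        with self._lock:               # paid on every call, even with one thread
            self._items.append(item)

    def __len__(self):
        with self._lock:
            return len(self._items)


def fill(target, count):
    """Append count integers to target."""
    for i in range(count):
        target.append(i)


a, b = LockedList(), LockedList()

# Each thread works on a different list, so they never wait on the same lock.
t1 = threading.Thread(target=fill, args=(a, 50_000))
t2 = threading.Thread(target=fill, args=(b, 50_000))
t1.start()
t2.start()
t1.join()
t2.join()

print(len(a), len(b))  # 50000 50000
```

Under a single global lock, the same two threads would have to take turns even though they never touch the same object; fine-grained locks remove that contention at the price of locking on every operation.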

Python's creator concluded that, given these considerations, removing the GIL offers little value and is not worth spending much effort on.

Summary

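The CPython interpreter uses the GIL to keep threads' data synchronized. With the GIL in place, threads cannot take advantage of multiple cores for high concurrency, but we still need thread synchronization for other purposes.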
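One example of synchronization the GIL does not do for you: it keeps the interpreter's internals consistent, but a compound statement such as counter += 1 is several bytecode steps (read, add, write), and nothing guarantees that another thread won't run between them. A minimal sketch, assuming a shared module-level counter, protects the update with its own threading.Lock:

```python
import threading

counter = 0                      # shared by all worker threads
counter_lock = threading.Lock()  # protects every update to counter


def add(times):
    """Increment the shared counter `times` times."""
    global counter
    for _ in range(times):
        # counter += 1 is a read, an add and a write; the lock makes the
        # three steps happen as one unit with respect to other threads.
        with counter_lock:
            counter += 1


threads = [threading.Thread(target=add, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000
```

With the lock, the final value is always 4 × 100,000; without it, updates could be lost depending on when the interpreter switches threads.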

 

 


Origin http://43.154.161.224:23101/article/api/json?id=325564618&siteId=291194637