Python basics review: the GIL (Global Interpreter Lock)

1. Introduction

In the previous blog post, I covered the basic use of multiprocessing, multithreading, and coroutines in Python. There we noted that multithreading in Python is not "real" multithreading. Why? The answer is inseparable from the GIL. Let's look at a few examples of how the GIL affects multithreading in Python.

1.1 Why is it slow?

import time
def Countnumber(n):
    while n > 0:
        n -= 1
start = time.time()
Countnumber(100000000)
end = time.time()
print('Elapsed time: {} seconds'.format(end - start))
# Output
Elapsed time: 6.358428239822388 seconds

On my early-2015 13-inch MacBook Pro, the single-threaded run takes about 6.3 seconds. Below we try to speed it up with multiple threads:

import time
import threading
N = 100000000
def Countnumber(n):
    while n > 0:
        n -= 1

start = time.time()
t1 = threading.Thread(target=Countnumber, args=[N // 2])
t2 = threading.Thread(target=Countnumber, args=[N // 2])
t3 = threading.Thread(target=Countnumber, args=[N // 2])
t4 = threading.Thread(target=Countnumber, args=[N // 2])
t1.start()
t2.start()
t3.start()
t4.start()
t1.join()
t2.join()
t3.join()
t4.join()
end = time.time()
print('Elapsed time: {} seconds'.format(end - start))
# Output
Elapsed time: 12.465165138244629 seconds

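We used 4 threads, yet the run took about 12 seconds, twice as long as before. Note that each of the 4 threads counts down N // 2, so the total work is 2N, double that of the single-threaded run. The elapsed time doubled along with the work, which means the threads gave us no parallel speedup at all.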
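To separate the effect of the GIL from the amount of work, here is a minimal sketch that keeps the total count fixed and splits it evenly across the threads. The names count_down and timed_threads are hypothetical helpers for illustration, and N is deliberately smaller than in the runs above so the sketch finishes quickly.

```python
import threading
import time


def count_down(n):
    # Pure-Python, CPU-bound loop (same work as Countnumber above).
    while n > 0:
        n -= 1


def timed_threads(total, num_threads):
    """Split `total` iterations evenly across threads; return elapsed seconds."""
    per_thread = total // num_threads
    threads = [
        threading.Thread(target=count_down, args=(per_thread,))
        for _ in range(num_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    N = 5_000_000  # assumption: a smaller N than the article's, for a quick run
    for k in (1, 2, 4):
        print("{} thread(s): {:.2f} s".format(k, timed_threads(N, k)))
```

On a standard CPython build with the GIL, I would expect the three timings to be roughly similar, because only one thread executes bytecode at any moment. The exact numbers depend on the machine and the Python version.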

2. GIL

In fact, the reason the program slowed down after we added threads is the GIL, which keeps Python threads from delivering the performance we expected.

The GIL (Global Interpreter Lock) is a mechanism in CPython, the most popular Python interpreter. Each Python thread must acquire this lock before it executes in the CPython interpreter, and while it holds the lock it prevents other threads from executing.

Moreover, CPython lets the Python threads take turns, which creates the impression that they run at the same time when in fact their execution is only interleaved.

So why does CPython use the GIL? The answer involves reference counting, which is part of Python's garbage collection mechanism.

Python's garbage collection mechanism is based on reference counting, with mark-and-sweep and generational collection as supplementary strategies.

import sys
a = []
b = a
print(sys.getrefcount(a))
# Output
3

The output is a reference count of 3, because a, b, and the argument passed to getrefcount all refer to the same empty list. Going back to the multithreading we just used: if two Python threads reference a at the same time, there is a race condition on the reference count. The two increments may collide so that the count only increases by 1. When the first thread finishes its access, the count is decremented by 1 and may reach 0, at which point the memory is released. When the second thread then tries to access a, it no longer finds valid memory, because the object has already been reclaimed.
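The bookkeeping described above can be observed directly. The sketch below measures how the count of a list changes when a second name is bound to it and then removed. It reports differences rather than absolute values, since the absolute number returned by sys.getrefcount can vary between interpreter versions. The function name refcount_changes is a hypothetical helper.

```python
import sys


def refcount_changes():
    """Return how the count of a list changes when an alias is added, then removed."""
    a = []
    base = sys.getrefcount(a)        # includes the temporary argument reference
    b = a                            # second name bound to the same list
    with_alias = sys.getrefcount(a)
    del b                            # drop the alias again
    after_del = sys.getrefcount(a)
    return with_alias - base, after_del - base


print(refcount_changes())
```

On CPython I would expect this to print (1, 0): binding b adds one reference, and del b removes it again. Every one of these increments and decrements is a read-modify-write on a shared counter, which is exactly what the GIL protects.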

Therefore, there are two main reasons why CPython introduced the GIL:

To avoid race conditions in memory management;
CPython, as the name suggests, is implemented in C, and most C libraries are not natively thread-safe.

3. How does the GIL work?

[Figure: Threads 1, 2, and 3 taking turns holding the GIL]

As shown in the figure, Threads 1, 2, and 3 execute in turn. Each thread acquires the GIL when it starts executing, which prevents the other threads from running; when it finishes its turn, it releases the GIL so that another thread can start executing.

Older versions of CPython use a check_interval mechanism: after a fixed number of ticks, the interpreter forces the current thread to release the GIL so that other threads have a chance to execute.

In Python 3 (3.2 and later), CPython instead forces the release after a time-based "switch interval", which defaults to 5 milliseconds and can be read with sys.getswitchinterval().
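The interval can be inspected and changed from Python through sys.getswitchinterval() and sys.setswitchinterval(), both part of the standard library. Here is a minimal sketch; the helper with_switch_interval is a hypothetical name, and the 1 ms value is an arbitrary choice for illustration.

```python
import sys


def with_switch_interval(seconds, func):
    """Run func() with a temporary switch interval, then restore the old value."""
    old = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        return func()
    finally:
        sys.setswitchinterval(old)


print("current switch interval:", sys.getswitchinterval(), "seconds")
# Temporarily request thread switches every 1 ms (arbitrary value for illustration).
print("inside helper:", with_switch_interval(0.001, sys.getswitchinterval))
```

A shorter interval makes threads switch more often, which can improve responsiveness but adds switching overhead; it does not let CPU-bound threads run in parallel.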


Looking at the underlying C code, every Python thread essentially runs inside a loop like this (simplified, from the tick-based implementation):

for (;;) {
    if (--ticker < 0) {
        ticker = check_interval;

        /* Give another thread a chance */
        PyThread_release_lock(interpreter_lock);

        /* Other threads may run now */

        PyThread_acquire_lock(interpreter_lock, 1);
    }

    bytecode = *next_instr++;
    switch (bytecode) {
        /* execute the next instruction ... */
    }
}

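Every Python thread decrements and checks the ticker on each pass through the loop. As long as the ticker has not dropped below 0, the thread keeps executing its own bytecode; once it does, the thread resets the ticker and releases the GIL so that other threads get a chance, then reacquires the lock and continues.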
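To make the turn-taking concrete, here is a toy model of that loop in Python. It is only a simulation under simplifying assumptions (strict round-robin order, one tick per instruction), not CPython's real scheduler, and the names simulate_gil and workloads are hypothetical.

```python
from collections import deque


def simulate_gil(workloads, check_interval):
    """Toy model: a 'thread' runs up to check_interval ticks, then yields the lock.

    workloads maps a thread name to the number of ticks it still has to run.
    Returns the sequence of (thread name, ticks executed) turns.
    """
    queue = deque((name, ticks) for name, ticks in workloads.items() if ticks > 0)
    trace = []
    while queue:
        name, remaining = queue.popleft()      # this thread now holds the lock
        ran = min(check_interval, remaining)   # run until the ticker expires
        trace.append((name, ran))
        remaining -= ran
        if remaining > 0:
            queue.append((name, remaining))    # release the lock, wait for next turn
    return trace


for turn in simulate_gil({"t1": 5, "t2": 3}, check_interval=2):
    print(turn)
```

The trace shows t1 and t2 alternating in slices of at most 2 ticks. The total number of ticks executed is the same as running the two workloads back to back, which is why the threaded countdown earlier gained nothing.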

4. Python thread safety

When we discussed multithreading earlier, we said that a shared variable should be protected with threading.Lock() while it is being modified, and made available to other threads only after the modification is complete.

This is because the GIL only guarantees that one Python thread executes at a time; it does not make Python code thread-safe.

Consider the following code:

import threading

n = 0
def foo():
    global n
    n += 1

threads = []
for i in range(1000):
    t = threading.Thread(target=foo)
    threads.append(t)
for t in threads:
    t.start()
for t in threads:
    t.join()
print(n)

import dis
dis.dis(foo)  # dis.dis prints the listing itself and returns None
# Output
  6           0 LOAD_GLOBAL              0 (n)
              2 LOAD_CONST               1 (1)
              4 INPLACE_ADD
              6 STORE_GLOBAL             0 (n)
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE

In most cases the output is 1000, but it may also be 999 or 998, because the line n += 1 is not thread-safe.

When we print the bytecode of foo() with dis.dis(), we see that n += 1 is not a single step: it is spread over several bytecode instructions (load, add, store), and the thread may be interrupted between any of them.
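A real lost update is hard to reproduce on demand, so the sketch below replays one by hand in a single thread. It walks through the LOAD_GLOBAL / INPLACE_ADD / STORE_GLOBAL steps of two imaginary threads A and B in the unlucky order. The function name lost_update_demo and the shared dictionary are hypothetical, chosen only for illustration.

```python
def lost_update_demo():
    """Replay n += 1 for two threads, with a switch between load and store."""
    shared = {"n": 0}

    # Thread A: LOAD_GLOBAL n, then INPLACE_ADD 1 (result kept on A's stack)
    a_value = shared["n"] + 1
    # -- thread switch happens here, before A reaches STORE_GLOBAL --
    # Thread B: runs its whole load / add / store sequence
    b_value = shared["n"] + 1
    shared["n"] = b_value
    # -- switch back: A stores the value it computed from the stale read --
    shared["n"] = a_value

    return shared["n"]


print(lost_update_demo())  # two increments ran, but this prints 1
```

Both threads read 0, both computed 1, and the second store overwrote the first, so one increment was lost.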

So we can use threading.Lock() to ensure thread safety:

import threading

n = 0
lock = threading.Lock()
def foo():
    global n
    with lock:  # only one thread at a time may run the increment
        n += 1
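For completeness, here is a self-contained sketch of the locked counter driven by many threads. The helper run_locked_counter is a hypothetical name, and the counter lives in a dictionary instead of a global only so that the function can be called repeatedly.

```python
import threading


def run_locked_counter(num_threads=1000):
    """Increment a shared counter from many threads, guarding it with a Lock."""
    counter = {"n": 0}
    lock = threading.Lock()

    def increment():
        with lock:                 # only one thread at a time runs this block
            counter["n"] += 1

    threads = [threading.Thread(target=increment) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["n"]


print(run_locked_counter())  # 1000
```

Because the whole read-modify-write sequence happens while the lock is held, no thread can see a stale value, and the result always equals the number of threads.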

5. How to bypass the GIL?

If you have read my previous blog post, you will remember the %time magic command. It is a feature of IPython, the interactive shell behind Jupyter Notebook, and it reports the running time of a statement. Note, however, that IPython is an interactive front end that normally runs on top of CPython, so code executed in it is still subject to the GIL.

If you work in deep learning, machine learning, data analysis, or another AI-related field, you will certainly be familiar with NumPy. The core of such matrix libraries is implemented in C, and much of their heavy computation runs outside the interpreter loop, where the GIL can be released.

After all that, you may feel I have been rambling. In short, there are two general ideas for bypassing the GIL:

Bypass CPython and use another interpreter, such as Jython (a Python interpreter implemented in Java);
Move the performance-critical code into another language, such as C.
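The second idea can be tried without writing any C yourself, by calling into a standard-library module that is already implemented in C. According to the hashlib documentation, CPython releases the GIL while hashing larger buffers, so threads doing this work are not serialized the way the countdown loop was. The sketch below is an illustration under that assumption; the helper names sha256_hex and hash_all and the buffer sizes are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha256_hex(data):
    # The hashing itself runs in C; per the hashlib docs, CPython can
    # release the GIL while it processes a large buffer.
    return hashlib.sha256(data).hexdigest()


def hash_all(buffers, max_workers=4):
    """Hash every buffer in a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(sha256_hex, buffers))


buffers = [bytes([i]) * 1_000_000 for i in range(4)]  # four 1 MB test buffers
digests = hash_all(buffers)
print(digests == [sha256_hex(b) for b in buffers])
```

The threaded result matches the sequential one. Whether it is also faster depends on the buffer sizes and the number of cores, so treat this as a starting point for your own measurements.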

6. Strange thoughts

import time
import multiprocessing
N = 100000000
def Countnumber(n):
    while n > 0:
        n -= 1

if __name__ == '__main__':  # guard needed on platforms that spawn child processes
    start = time.time()
    t1 = multiprocessing.Process(target=Countnumber, args=[N // 2])
    t2 = multiprocessing.Process(target=Countnumber, args=[N // 2])

    t1.start()
    t2.start()

    t1.join()
    t2.join()

    end = time.time()
    print('Elapsed time: {} seconds'.format(end - start))
# Output
Elapsed time: 3.4095828533172607 seconds

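Using two processes nearly doubled the speed? Each process runs its own interpreter with its own GIL, so the two halves of the count really do run in parallel on separate CPU cores.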
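The same experiment generalizes to any number of processes. The sketch below divides the count into equal chunks and times the run for 1, 2, and 4 processes. The helpers count_down, split_work, and timed_processes are hypothetical names, and N is smaller than in the run above so the sketch finishes quickly.

```python
import multiprocessing
import time


def count_down(n):
    # Pure-Python, CPU-bound loop (same work as Countnumber above).
    while n > 0:
        n -= 1


def split_work(total, workers):
    """Divide `total` iterations into `workers` chunks that sum to `total`."""
    base, extra = divmod(total, workers)
    return [base + (1 if i < extra else 0) for i in range(workers)]


def timed_processes(total, workers):
    """Run count_down in `workers` processes; return elapsed seconds."""
    procs = [
        multiprocessing.Process(target=count_down, args=(chunk,))
        for chunk in split_work(total, workers)
    ]
    start = time.perf_counter()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.perf_counter() - start


if __name__ == "__main__":  # required when child processes are spawned
    N = 5_000_000  # assumption: a smaller N than the article's, for a quick run
    for k in (1, 2, 4):
        print("{} process(es): {:.2f} s".format(k, timed_processes(N, k)))
```

I would expect the elapsed time to drop as processes are added, up to the number of physical cores, although starting a process has a fixed cost that becomes noticeable when N is small.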

For follow-up posts, please follow my personal blog: Stardust Blog