Python GIL (Global Interpreter Lock)

The GIL (Global Interpreter Lock) is one of the most important yet least understood topics in Python multithreading. Even many Python veterans treat the GIL as a mystery.

Take, for example, the following very simple CPU-bound code:


def CountDown(n):
    while n > 0:
        n -= 1

Now take a large number, n = 100000000, and run CountDown(n) in a single thread first. On the 8-core MacBook in front of me, this took 5.4 s.
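
If you want to reproduce this measurement, here is a minimal timing sketch using time.perf_counter (the 5.4 s above is from my machine; your number will differ):


import time

def CountDown(n):
    while n > 0:
        n -= 1

start = time.perf_counter()
CountDown(100000000)
print(f"single thread: {time.perf_counter() - start:.1f}s")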

Next, let's try to speed it up with multiple threads:


from threading import Thread

n = 100000000

t1 = Thread(target=CountDown, args=[n // 2])
t2 = Thread(target=CountDown, args=[n // 2])
t1.start()
t2.start()
t1.join()
t2.join()

I ran it on the same machine and found that, far from being faster, it was actually slower: 9.6 s in total.

Still not giving up, I tried again with four threads. The result: 9.8 s, almost the same as with two threads.
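
If you would like to reproduce the comparison yourself, here is a small sketch that splits the same work across a configurable number of threads (timed_run is just a helper name I made up; the exact timings depend entirely on your machine):


import time
from threading import Thread

def CountDown(n):                       # same function as above
    while n > 0:
        n -= 1

def timed_run(n, num_threads):
    # Split the work evenly across num_threads threads and time the whole run.
    threads = [Thread(target=CountDown, args=(n // num_threads,))
               for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

n = 100000000
for k in (1, 2, 4):
    print(f"{k} thread(s): {timed_run(n, k):.1f}s")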

What is going on here? Did I buy a fake MacBook? You can think about this yourself, or test it on your own computer. Of course, I also had to question my own assumptions, so I came up with the following two conjectures.

The first conjecture: is there something wrong with my machine?

It is a reasonable guess. So I found a desktop computer with a single-core CPU and ran the same experiment. This time, a single thread took 11 s and two threads also took 11 s. Unlike on the first machine, the multithreaded version was not slower than the single-threaded one, but the two results were essentially identical!

So it does not look like a hardware problem. Rather, Python's threads seem to fail at delivering any parallel speedup.

Naturally, that leads to my second conjecture: are Python threads fake threads?

They are not. Python threads really do wrap native operating system threads: on Linux they are pthreads (POSIX threads), and on Windows they are Windows threads. Moreover, Python threads are fully managed by the operating system, which decides when they run, manages their memory, handles interrupts, and so on.

So Python threads and, say, C++ threads are not fundamentally different abstractions: underneath, they are the same OS threads.
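
You can see this for yourself: since Python 3.8, every Python thread exposes the identifier of the OS thread backing it through native_id. A small sketch:


import threading

def report():
    t = threading.current_thread()
    # ident is the Python-level id; native_id is the id assigned by the OS (Python 3.8+)
    print(f"{t.name}: ident={t.ident}, native_id={t.native_id}")

threads = [threading.Thread(target=report) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()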

Why is there a GIL?

It seems that neither of my two conjectures can explain the mystery we started with. So who is the "culprit"? It is today's protagonist, the GIL, that keeps Python threads from performing the way we expect.

GIL is a term from CPython, the most popular Python interpreter. It stands for Global Interpreter Lock and is essentially a mutex, much like the ones provided by operating systems. Whenever a Python thread executes bytecode in the CPython interpreter, it must first acquire this lock, which prevents any other thread from executing at the same time.

Of course, CPython plays some tricks and executes Python threads in turns. What the user sees is therefore "pseudo-parallelism": Python threads are interleaved to simulate truly parallel execution.

So why does CPython need the GIL at all? This has to do with how CPython is implemented. We will cover Python's memory management mechanism in detail in the next section; today, a brief overview is enough.

CPython uses reference counting to manage memory. Every object created in a Python script carries a reference count that records how many references point to it. When the reference count drops to 0, the memory is automatically released.

What does that mean? Let's look at the following example:


>>> import sys
>>> a = []
>>> b = a
>>> sys.getrefcount(a)
3

In this example, the reference count of the empty list is 3, because it is referenced from three places: a, b, and the argument passed to getrefcount.
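
Conversely, dropping references lowers the count again. Continuing the session above:


>>> del b
>>> sys.getrefcount(a)
2
>>> a = None   # the last reference is gone, so the list can now be freed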

Now, if two Python threads update the reference count of a at the same time, they can race on that counter: the count may end up incremented only once when it should have been incremented twice, and memory gets corrupted. When the first thread later finishes and decrements the count, the count may reach the point where the memory is released, so when the second thread tries to access a again, there is no valid memory left.

Therefore, CPython introduced the GIL for two reasons:

  • One was to let the interpreter's designers avoid complex race conditions such as the memory-management one above;
  • The other is that CPython relies heavily on C libraries, and most C libraries are not natively thread-safe (thread safety costs performance and adds complexity).

How does the GIL work?

The figure below illustrates how the GIL works in a Python program. Threads 1, 2, and 3 execute in turns. When a thread starts executing, it acquires the GIL to prevent other threads from running; after it has run for a while, it releases the GIL so that other threads get a chance to use the CPU.
[Figure: threads 1, 2, and 3 taking turns acquiring and releasing the GIL]
A careful reader may notice a question here: why would a Python thread ever voluntarily release the GIL? After all, if a thread only had to acquire the GIL when it starts executing and never release it, the other threads would never get a chance to run.

Indeed, there is another mechanism in CPython, known as the check interval: the interpreter periodically checks how long the current thread has held the GIL and, every so often, forces it to release the GIL so that other threads have a chance to execute.

The check interval has been implemented differently across Python versions. Early CPython used a counter of 100 ticks, roughly corresponding to 100 bytecode instructions; since Python 3.2 the interval is time-based and defaults to 5 milliseconds. We don't need to dig into exactly how long a thread may hold the GIL before being forced to release it, and our program design should not depend on it. It is enough to know that the CPython interpreter releases the GIL within a "reasonable" time frame.
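
You can inspect this interval from Python itself, and even change it, although programs normally should not depend on it. A quick sketch:


import sys

print(sys.getswitchinterval())   # 0.005 by default, i.e. 5 milliseconds
# sys.setswitchinterval(0.001)   # possible, but rarely a good idea in real code
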
Overall, every Python thread runs inside a loop like the one below (a simplified version of CPython's old interpreter loop, in C):


for (;;) {
    if (--ticker < 0) {
        ticker = check_interval;

        /* Give another thread a chance */
        PyThread_release_lock(interpreter_lock);

        /* Other threads may run now */

        PyThread_acquire_lock(interpreter_lock, 1);
    }

    bytecode = *next_instr++;
    switch (bytecode) {
        /* execute the next instruction ... */
    }
}

From this code we can see that each iteration of the loop first decrements the ticker. As long as the ticker has not run out, the thread simply executes its next bytecode instruction; once the ticker drops below zero, the thread resets it, releases the GIL so other threads get a chance, and then reacquires the GIL before continuing.

Python's thread safety

However, having the GIL does not mean that we, as Python programmers, no longer need to think about thread safety. Even though the GIL allows only one Python thread to execute at a time, remember the preemption mechanism (the check interval) I mentioned above. Consider the following piece of code:


import threading

n = 0

def foo():
    global n
    n += 1

threads = []
for i in range(100):
    t = threading.Thread(target=foo)
    threads.append(t)

for t in threads:
    t.start()

for t in threads:
    t.join()

print(n)

If you run it, you will find that it prints 100 most of the time, but occasionally prints 99 or 98.

This is because the statement n += 1 is not thread-safe. If you disassemble the function foo, you will see that it actually consists of the following four bytecode instructions:


>>> import dis
>>> dis.dis(foo)
LOAD_GLOBAL              0 (n)
LOAD_CONST               1 (1)
INPLACE_ADD
STORE_GLOBAL             0 (n)

And execution can be interrupted between any of these four instructions!
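
If you want to make these lost updates easier to observe, you can shrink the thread switch interval and do many more increments per thread. A rough sketch (how often, and whether, you actually see a wrong total depends on your CPython version):


import sys
import threading

sys.setswitchinterval(1e-6)   # ask the interpreter to switch threads as often as possible

n = 0

def unsafe_increment(times):
    global n
    for _ in range(times):
        n += 1                # LOAD / ADD / STORE: not atomic

threads = [threading.Thread(target=unsafe_increment, args=(100000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(n)   # expected 400000, but often less because updates get lost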

So don't assume that with the GIL your program can sit back and relax; we still need to pay attention to thread safety. As I said at the beginning, the GIL was designed mainly for the convenience of the developers of the CPython interpreter itself, not for Python application programmers. As Python users, we still need tools such as locks to guarantee thread safety. For example:


import threading

n = 0
lock = threading.Lock()

def foo():
    global n
    with lock:
        n += 1

How to bypass the GIL?

Having read this far, some Python users may feel as if their kung fu has been crippled, as if only one palm remains of the Eighteen Dragon-Subduing Palms. There is no need to be so frustrated. The GIL is a constraint imposed by the CPython interpreter; if your code does not have to run through the CPython interpreter, it is no longer bound by the GIL.

In fact, many high-performance application scenarios already have mature Python libraries whose heavy lifting is implemented in C. For example, NumPy's matrix operations are implemented in C and are not limited by the GIL.
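
For example, the following sketch multiplies large matrices in two threads. Because the actual work happens in NumPy's compiled C/BLAS code, which releases the GIL while it runs, the two calls can genuinely overlap (this assumes NumPy is installed; the matrix sizes are arbitrary):


import numpy as np
from threading import Thread

a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)
results = {}

def multiply(name, x, y):
    # The multiplication itself runs in C and releases the GIL.
    results[name] = x @ y

t1 = Thread(target=multiply, args=("ab", a, b))
t2 = Thread(target=multiply, args=("ba", b, a))
t1.start(); t2.start()
t1.join(); t2.join()
print(results["ab"].shape, results["ba"].shape)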

So in most applications you really don't need to think about the GIL too much, because if multithreaded computation becomes a performance bottleneck, there is usually already a Python library that solves the problem.

In other words, if your application has truly strict performance requirements, say, an extra 100 µs already hurts you, then I have to say that Python may not be your best choice.

Of course, it is understandable that sometimes we inevitably want to break free of the GIL for a while. In deep learning, for example, most of the code is Python, but in real work, if we want to implement a custom differentiation operator or support a specific hardware accelerator, we have to write that performance-critical code in C++ (which is no longer limited by the GIL) and then expose a Python interface to it.

In general, you only need to keep in mind that there are two general ideas for bypassing the GIL:

  • Bypass CPython altogether and use another implementation, such as Jython (a Python interpreter written in Java);
  • Move the performance-critical code into another language, usually C++ (see the sketch below).
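
As a toy illustration of the second idea, you could move CountDown into C and call it through ctypes. Everything below, the file countdown.c, the library name, and the build command, is an assumption for this sketch; in a real project you would more likely reach for a proper C extension, Cython, or pybind11:


import ctypes

# Suppose the hot loop lives in countdown.c:
#
#     void count_down(long n) {
#         while (n > 0) { n -= 1; }
#     }
#
# compiled into a shared library with, for example:
#     gcc -shared -fPIC -O2 -o libcountdown.so countdown.c

lib = ctypes.CDLL("./libcountdown.so")   # hypothetical library built as above
lib.count_down.argtypes = [ctypes.c_long]
lib.count_down.restype = None

# ctypes releases the GIL for the duration of the foreign call, so two such
# calls running in two Python threads can actually execute in parallel.
lib.count_down(100000000)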
