What is the Python Global Interpreter Lock (GIL), and how can you work around its restrictions?

1. What is the Python global lock?

1. What is a global lock?

In simple terms, the Python Global Interpreter Lock (GIL for short) is a mutex (or lock) that allows only one thread at a time to hold control of the Python interpreter.

This means that only one thread can be executing Python code at any point in time. Developers who write single-threaded programs never see the effects of the GIL, but it can be a performance bottleneck in CPU-bound, multi-threaded code.

The GIL has gained the reputation of being a "notorious" feature of Python because it only allows one thread to execute at a time, even on multithreaded architectures with multiple CPU cores.

In this article, you'll learn how the GIL affects the performance of Python programs, and how to mitigate the effects it can have on your code.

2. What problem does the GIL solve for Python?

Python uses reference counting for memory management. This means that objects created in Python have a reference count variable that keeps track of the number of references pointing to that object. When this count reaches zero, the memory occupied by the object is freed.

Let's look at a short code example to demonstrate how reference counting works:

>>> import sys
>>> a = []
>>> b = a
>>> sys.getrefcount(a)
3

In the example above, the empty list object [] has a reference count of 3: it is referenced by a, by b, and by the argument passed to sys.getrefcount().
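The link between the count and deallocation can also be observed directly. The following is a minimal sketch, assuming CPython (where an object is reclaimed as soon as its count reaches zero). The `Payload` class and the variable names are hypothetical and exist only for illustration.

```python
import sys
import weakref


class Payload:
    """Hypothetical placeholder class; plain lists cannot be weakly referenced."""


obj = Payload()
probe = weakref.ref(obj)          # a weak reference does not raise the count

before = sys.getrefcount(obj)
alias = obj                       # one more strong reference
after_alias = sys.getrefcount(obj)

del alias                         # back to the starting count
after_del = sys.getrefcount(obj)

del obj                           # the last strong reference is gone
freed = probe() is None           # on CPython the object is reclaimed at once

print(after_alias - before, after_del - before, freed)
```

Comparing counts relative to each other, rather than against a fixed number, keeps the sketch independent of the extra temporary reference that sys.getrefcount() itself creates.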

Back to the GIL. The problem is that this reference count variable needs protection from race conditions in which two threads increase or decrease its value at the same time. If that happens, it can cause leaked memory that is never released or, even worse, memory that is released while a reference to the object still exists. This may cause crashes or other "weird" bugs in Python programs.

This reference count variable can be kept safe by adding locks to all data structures that are shared across threads, so that they are not modified inconsistently.
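As a rough illustration of what "adding a lock" to shared data means, the sketch below guards a shared counter with `threading.Lock`. It is an analogy written in Python, not the interpreter's real C-level reference counting, and the `Counter` class and `worker` function are hypothetical names.

```python
import threading


class Counter:
    """Hypothetical shared counter whose updates are guarded by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The read-modify-write below must not interleave with another thread.
        with self._lock:
            self.value += 1


def worker(counter, times):
    for _ in range(times):
        counter.increment()


counter = Counter()
threads = [threading.Thread(target=worker, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)  # 40000: every increment was applied exactly once
```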

However, adding a lock to each object or group of objects means that there will be multiple locks, which can lead to another problem - deadlocks (deadlocks can only happen when there are multiple locks). Another side effect is that repeatedly acquiring and releasing locks can cause performance degradation.
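The deadlock risk described above can be reproduced in a few lines. This is a sketch under stated assumptions: two hypothetical locks, `lock_a` and `lock_b`, are taken in opposite order by two threads, and a timeout is used so the program reports the deadlock instead of hanging forever.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
barrier = threading.Barrier(2)
acquired_second = {}


def worker(name, first, second):
    with first:
        barrier.wait()                      # both threads now hold one lock each
        got = second.acquire(timeout=0.2)   # would block forever without a timeout
        acquired_second[name] = got
        barrier.wait()                      # keep holding `first` until both have tried
        if got:
            second.release()


t1 = threading.Thread(target=worker, args=("t1", lock_a, lock_b))
t2 = threading.Thread(target=worker, args=("t2", lock_b, lock_a))
t1.start()
t2.start()
t1.join()
t2.join()

print(acquired_second)  # neither thread obtained its second lock
```

With a single lock there is no second lock to wait for, so this circular wait cannot arise.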

The GIL is a single lock on the interpreter itself, which adds the rule that executing any Python bytecode requires acquiring the interpreter lock. This prevents deadlocks (since there is only one lock) and doesn't introduce much performance overhead. But it effectively makes any CPU-bound Python program single-threaded.

The GIL, although used by interpreters for other languages (such as Ruby), is not the only solution to this problem. Some languages avoid the need for a GIL for thread-safe memory management by using approaches other than reference counting, such as garbage collection.

On the other hand, this means that these languages often have to compensate for the loss of the GIL's single-threaded performance benefits by adding other performance-enhancing features, such as JIT compilers.

3. Why choose GIL as a solution?

So why was an approach that seems so obstructive used in Python? Was it a bad decision by the developers of Python?

Well, to paraphrase Larry Hastings, the GIL design decision was one of the factors that made Python as popular as it is today.

Python has been around since the days when operating systems had no concept of threads. It was designed to be easy to use in order to make development faster, and more and more developers started using it.

Many extensions were written for existing C libraries whose features were needed in Python. To prevent inconsistent changes, these C extensions required the thread-safe memory management that the GIL provided.

The GIL is easy to implement and easy to add to Python. It provides a performance boost for single-threaded programs because only one lock needs to be managed.

C libraries that were not thread-safe became easier to integrate, and these C extensions became one of the reasons Python was readily adopted by different communities.

As you can see, the GIL is a pragmatic solution to a difficult problem that CPython developers faced in the early days of Python.

4. Impact on multi-threaded Python programs

When you look at a typical Python program, or any computer program, there is a difference in performance between a CPU-bound program and an I/O-bound program.

CPU-intensive programs are those that push the CPU to its limits. This includes programs that perform mathematical calculations such as matrix multiplication, searching, image processing, and more.

I/O-bound programs are programs that spend time waiting for input/output, which may come from users, files, databases, networks, and so on. I/O-bound programs sometimes have to wait a long time until they get what they need from the source, because the source may need to do its own processing before the input/output is ready, for example a user deciding what to type at an input prompt, or a database query running in its own process.

Let's look at a simple CPU-intensive program that performs a countdown:

# single_threaded.py
import time
from threading import Thread

COUNT = 50000000

def countdown(n):
    while n>0:
        n -= 1

start = time.time()
countdown(COUNT)
end = time.time()

print('Time taken in seconds -', end - start)

Running this code on a test machine with a 4-core CPU gave the following output:

$ python single_threaded.py
Time taken in seconds - 6.20024037361145

The code is now modified to execute the same countdown code in parallel using two threads:

# multi_threaded.py
import time
from threading import Thread

COUNT = 50000000

def countdown(n):
    while n>0:
        n -= 1

t1 = Thread(target=countdown, args=(COUNT//2,))
t2 = Thread(target=countdown, args=(COUNT//2,))

start = time.time()
t1.start()
t2.start()
t1.join()
t2.join()
end = time.time()

print('Time taken in seconds -', end - start)

Running it again gives the following result:

$ python multi_threaded.py
Time taken in seconds - 6.924342632293701

As you can see, both versions take almost the same amount of time to complete. In the multithreaded version, the GIL prevents CPU-bound threads from executing in parallel.

The GIL does not have much impact on the performance of I/O-bound multi-threaded programs, because threads release the lock while they wait for I/O, so other threads can run in the meantime.

However, a program whose threads are entirely CPU-bound, for example one that processes an image in sections using threads, will not only become effectively single-threaded because of the lock, but will also take longer to run than the same program written to be entirely single-threaded, as the example above shows. The extra time is the overhead of acquiring and releasing the lock.
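The I/O-bound case can be sketched in the same style as the countdown scripts. In the sketch below, `time.sleep()` stands in for a blocking I/O call (it releases the GIL while waiting), and `WAIT_SECONDS` and `fake_io` are hypothetical names chosen for illustration.

```python
import time
from threading import Thread

WAIT_SECONDS = 0.2  # hypothetical stand-in for a network or disk delay


def fake_io():
    # time.sleep() releases the GIL while blocked, as real blocking I/O calls do.
    time.sleep(WAIT_SECONDS)


start = time.perf_counter()
threads = [Thread(target=fake_io) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

print(f"4 waits of {WAIT_SECONDS}s finished in {elapsed:.2f}s")
```

Because the four waits overlap, the total should be close to a single wait rather than four times as long, in contrast to the CPU-bound countdown.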

5. Why not drop the GIL?

The developers of Python receive a lot of complaints about this, but a language as popular as Python cannot make a change as significant as removing the GIL without causing backward-incompatibility issues.

It is clearly possible to remove the GIL, and developers and researchers have done so many times in the past, but all these attempts have broken existing C extensions, which rely heavily on the solutions provided by the GIL.

Of course, there are other solutions to the problems the GIL solves, but some of them reduce the performance of single-threaded programs and of multi-threaded I/O-bound programs, and others are simply too difficult to implement. After all, you wouldn't want your existing Python programs to run slower after a new version was released, would you?

Python creator and BDFL Guido van Rossum gave the answer to the community in his September 2007 article "It isn't Easy to remove the GIL":

"I would welcome a set of patches into Py3k only if the performance of single-threaded programs (and multi-threaded but I/O-bound programs) does not degrade"

Any attempt since then has failed to meet this condition.

6. Why wasn't it removed in Python 3?

Python 3 did have the opportunity to rebuild many features from scratch, and in the process it broke some existing C extensions, which then had to be updated and ported to work with Python 3. This is why early versions of Python 3 were adopted slowly by the community.

Removing the GIL would have made Python 3 slower than Python 2 in single-threaded performance, and you can imagine what that would have led to. You can't argue with the single-threaded performance benefits of the GIL. So the result is that Python 3 still has the GIL.

But Python 3 did bring a significant improvement to the existing GIL.

We discussed the effect of the GIL on "only CPU bound" and "only I/O bound" multithreaded programs, but what about programs where some threads are I/O bound and some threads are CPU bound?

In such programs, Python's GIL was known to starve the I/O-bound threads by not giving them a chance to acquire the GIL from the CPU-bound threads.

This was because of a mechanism built into Python that forced threads to release the GIL after a fixed interval of continuous use; if no other thread acquired the GIL, the same thread could continue to use it.

>>> import sys
>>> # Legacy API, removed in Python 3.9. The interval is set to 100 instructions:
>>> sys.getcheckinterval()
100

The problem with this mechanism was that, most of the time, the CPU-bound thread would reacquire the GIL itself before other threads could acquire it. David Beazley researched this behavior and published visualizations of it.

This problem was fixed in Python 3.2 in 2009 by Antoine Pitrou, who added a mechanism that tracks how many GIL acquisition requests from other threads were dropped, and that prevents the current thread from reacquiring the GIL before other threads have had a chance to run.
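In Python 3.2 and later, the instruction-count interval was replaced by a time-based one, exposed through `sys.getswitchinterval()` and `sys.setswitchinterval()`. A short sketch of reading and adjusting it follows; the value 0.01 is an arbitrary choice for illustration, not a recommendation.

```python
import sys

original = sys.getswitchinterval()   # seconds; 0.005 unless it was changed
print(original)

sys.setswitchinterval(0.01)          # ask for less frequent switch attempts
changed = sys.getswitchinterval()

sys.setswitchinterval(original)      # restore the previous setting
restored = sys.getswitchinterval()
```

The interval is only a request for how often the running thread is asked to give up the GIL; it does not make CPU-bound threads run in parallel.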

2. How to work around the limitations of Python's global lock

If the GIL is causing you problems, here are a few approaches you can try:

1. Use multi-process programming

The most popular approach is multiprocessing, where you use multiple processes instead of threads. Each Python process gets its own Python interpreter and memory space, so the GIL won't be a problem. Python's multiprocessing module lets us create processes easily, like this:

from multiprocessing import Pool
import time

COUNT = 50000000
def countdown(n):
    while n>0:
        n -= 1

if __name__ == '__main__':
    pool = Pool(processes=2)
    start = time.time()
    r1 = pool.apply_async(countdown, [COUNT//2])
    r2 = pool.apply_async(countdown, [COUNT//2])
    pool.close()
    pool.join()
    end = time.time()
    print('Time taken in seconds -', end - start)

Output:

$ python multiprocess.py
Time taken in seconds - 4.060242414474487

Performance has improved compared to the multi-threaded version, right?

The time didn't drop to half of what we saw above, because process management has its own overhead. Multiple processes are heavier than multiple threads, so keep in mind that this can become a scaling bottleneck.

2. Use Cython to avoid the global lock

Cython is typically used to speed up the compute-intensive parts of a Python program.

If you want a C-style Cython function to run without the GIL, declare it with the nogil qualifier:

cdef void some_func() noexcept nogil:
    # function body
    ...

The pure Python syntax of Cython uses decorators to release the GIL:

import cython

@cython.nogil
@cython.cfunc
@cython.noexcept
def some_func() -> None:
    ...

Inside a Cython function, to run a block of code without the GIL, use the with nogil: statement:

def use_add(n):
    result = 1
    with nogil:
        for i in range(n):
            result = add(result, result)
    return result

Within a Cython function, one part of the code can release the GIL while a nested part reacquires it:

with nogil:
    ...  # some code that runs without the GIL
    with gil:
        ...  # some code that runs with the GIL
    ...  # some more code without the GIL

3. Use an alternative Python interpreter

Python has multiple interpreter implementations. CPython, Jython, IronPython, and PyPy, written in C, Java, C#, and Python respectively, are the most popular. The GIL is a feature of CPython, the original Python implementation; Jython and IronPython do not have one, although PyPy does. If your program and its libraries are available for one of the other implementations, you can try those as well.
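Before comparing behavior across implementations, it helps to confirm which one a script is actually running on. A minimal sketch using only the standard library:

```python
import platform
import sys

implementation = platform.python_implementation()  # e.g. "CPython" or "PyPy"
version = platform.python_version()

print(f"Running {implementation} {version}")
print(f"sys.implementation.name = {sys.implementation.name}")
```

Logging this at startup makes it easier to attribute a timing difference to the interpreter rather than to the code.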

Summary

The Python GIL is generally considered a difficult topic. But Python programmers are usually only affected by it when writing CPU-intensive multi-threaded code.

You can work around the limitations of the global lock in three ways: multiple processes, Cython, or an alternative Python interpreter.

Origin blog.csdn.net/captain5339/article/details/131375952