Python principle knowledge

Python is less efficient than C and C++ for several intertwined reasons.

I think the key issues come down to four aspects: dynamic typing, interpreted execution, the virtual machine, and the GIL:

1. To support dynamic typing, Python objects carry a great deal of extra abstraction. At execution time the interpreter must constantly check data types, and this runtime type checking adds significant overhead and slows execution.

2. Python code is interpreted: the interpreter executes it statement by statement (in interactive mode), or must translate it on every run before executing, which greatly reduces efficiency. A statically compiled program is fully translated into machine code before execution, whereas an interpreted one is translated and executed statement by statement as it runs.

3. The virtual machine adds a layer of indirection, and CPython's virtual machine is not as optimized as the JVM.

4. The GIL (global interpreter lock) makes multithreading effectively "pseudo-multithreading" for CPU-bound work, which is why Python has invested heavily in coroutines and async syntax.
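Points 1 and 2 can be seen directly with the standard-library dis module: Python emits the same generic "add" bytecode instruction whatever the operand types, and the type check and dispatch happen at run time, on every execution (a minimal sketch):

```python
import dis

def add(a, b):
    return a + b

# The same generic add instruction (BINARY_ADD, or BINARY_OP on Python 3.11+)
# is emitted regardless of operand types; dispatch happens at run time.
dis.dis(add)

print(add(1, 2))      # int addition
print(add('x', 'y'))  # str concatenation: same bytecode, different runtime path
```

A C compiler, by contrast, would emit a single typed machine instruction for an `int` addition, with no per-operation dispatch.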

C++ is compiled directly into executable binary code and runs on native operating-system processes and threads. With no dynamic-type overhead and no virtual-machine overhead, and as long as features such as virtual functions and virtual inheritance are avoided, C++ code can achieve the same efficiency as C.

Java also runs on a virtual machine: it is compiled to bytecode and then JIT-compiled at execution time, which makes it far more efficient than interpreted Python; in IO-heavy scenarios it is not much worse than C++.

PyPy, an alternative Python interpreter, introduced a JIT compiler, and its execution efficiency improved substantially.

CPython

So is the CPython interpreter just that bad? Why doesn't the official Python project adopt PyPy?

Because PyPy is still quite limited: support for libraries written in C is a significant problem, so it is not suitable for every scenario. PyPy sacrifices compatibility with the C extension interface.

JIT

JIT compiler stands for Just-In-Time Compiler: a compiler that translates code at run time, just before it is executed.

JIT is a way to improve program execution efficiency. Generally, programs run in one of two ways: static compilation or dynamic interpretation. A statically compiled program is fully translated into machine code before execution, while an interpreted one is translated and executed statement by statement as it runs. A JIT compiler sits in between: it compiles hot code paths to machine code while the program is running.
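The cost of statement-by-statement interpretation is easy to observe by comparing a pure-Python loop with the C-implemented built-in sum, which does the same work inside the interpreter's C code. A rough, machine-dependent sketch using the standard timeit module:

```python
import timeit

setup = 'r = range(100000)'

# Pure-Python loop: bytecode dispatch and dynamic type checks on every iteration
loop_time = timeit.timeit('t = 0\nfor x in r:\n    t += x', setup=setup, number=20)

# Built-in sum: the loop runs inside C code, no per-iteration interpretation
builtin_time = timeit.timeit('sum(r)', setup=setup, number=20)

print(loop_time > builtin_time)
```

The built-in is typically several times faster on CPython; the absolute numbers depend on the machine.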

What on earth is Python's GIL, and how does multithreading perform?

Foreword: When the blogger first came into contact with Python, he often heard the word GIL and found that it was frequently equated with Python's inability to implement multithreading efficiently. In the spirit of knowing not only what something is but also why it is so, the blogger collected all kinds of material, spent a few hours of spare time over a week understanding the GIL in depth, and summarized it in this article. Hopefully it helps readers understand the GIL better and more objectively.

This article may be reprinted, but please keep this paragraph when reprinting and place it at the top of the article. Author: Lu Junyi (cenalulu). Original address: http://cenalulu.github.io/python/gil-in-python/

What is the GIL

The first thing to be clear about is that the GIL is not a feature of the Python language; it is a concept introduced by one implementation of the Python interpreter, CPython. An analogy: C++ is a language (syntax) standard that can be compiled into executable code by different compilers, such as the well-known GCC, Intel C++, and Visual C++. The same is true for Python: the same piece of code can be executed by different Python runtimes such as CPython, PyPy, and Psyco, and some of them, like JPython (Jython), have no GIL at all. However, because CPython is the default Python runtime in most environments, many people equate CPython with Python and take it for granted that the GIL is a defect of the Python language. So let's make this clear first: the GIL is not a feature of Python, and Python does not depend on the GIL at all.
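Which runtime is executing a given script can be checked from the standard library (a quick sketch):

```python
import platform
import sys

# 'CPython' for the reference interpreter; 'PyPy', 'Jython', 'IronPython' for others
print(platform.python_implementation())
print(sys.version)
```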

So what is the GIL in the CPython implementation? GIL stands for Global Interpreter Lock. To avoid being misled, let's look at the official explanation:

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython’s memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)

Well, doesn't that look bad? A mutex that prevents multiple threads from executing Python bytecode concurrently sounds, at first glance, like a bug-like global lock! Don't worry, we will analyze it step by step below.


Why does the GIL exist

Due to physical constraints, the competition among CPU manufacturers over core frequency has been replaced by a race for more cores. To make effective use of multi-core processors, multithreaded programming emerged, and with it the difficulties of data consistency and state synchronization between threads. Even the caches inside the CPU are no exception: to keep multiple caches in sync, manufacturers have spent a great deal of effort, which inevitably brings some performance loss.

Python, of course, could not escape this. To take advantage of multiple cores, Python began to support multithreading. The simplest way to keep data consistent and synchronized between threads is to lock, and so the GIL, one giant lock, was born. As more and more code-base developers accepted this arrangement, they began to rely heavily on it (that is, on the assumption that Python's internal objects are thread-safe by default, with no need for additional memory locks or synchronization in their own implementations).

Gradually this implementation was found to be painful and inefficient. But when people tried to split up or remove the GIL, they discovered that a large number of library developers already relied heavily on it, making removal very difficult. How difficult? By analogy: a "small project" like MySQL took nearly five years, from 5.5 through 5.6 to 5.7, to split its big Buffer Pool Mutex into various smaller locks, and the work is still continuing. If MySQL, a product with a company and a dedicated development team behind it, has had such a hard time, what about a team of core developers and contributors as community-driven as Python's?

So, simply put, the GIL exists mostly for historical reasons. If the design were redone today, the problem of multithreading would still have to be faced, but at least it could be solved more elegantly than with the current GIL.


Effects of the GIL

From the introduction and the official definition above, the GIL is undoubtedly a global exclusive lock. There is no doubt that a global lock has a large impact on the efficiency of multithreading; it almost reduces Python to a single-threaded program. A reader might then say: as long as the global lock is released diligently, efficiency should not be bad. As long as the GIL is released during time-consuming IO operations, throughput can still improve, and in the worst case it should be no worse than single-threaded efficiency. In theory, yes; but in practice? Python is worse than you think.

Let's compare Python's efficiency under multithreading and single-threading. The test method is simple: a counter function that loops 100 million times, executed twice. One version runs the two invocations sequentially in a single thread; the other runs them in two concurrent threads. Finally we compare total execution time. The test environment is a dual-core Mac Pro. Note: to reduce the impact of the thread library's own overhead on the results, the single-threaded code here also uses a thread; it just runs the two invocations in sequence, simulating a single thread.

Single thread for sequential execution (single_thread.py)

#! /usr/bin/python

from threading import Thread
import time

def my_counter():
    i = 0
    for _ in range(100000000):
        i = i + 1
    return True

def main():
    start_time = time.time()
    for _ in range(2):
        t = Thread(target=my_counter)
        t.start()
        t.join()  # join immediately, so the two runs happen one after the other
    end_time = time.time()
    print("Total time: {}".format(end_time - start_time))

if __name__ == '__main__':
    main()

Two concurrent threads executing simultaneously (multi_thread.py)

#! /usr/bin/python

from threading import Thread
import time

def my_counter():
    i = 0
    for _ in range(100000000):
        i = i + 1
    return True

def main():
    thread_array = {}
    start_time = time.time()
    for tid in range(2):
        t = Thread(target=my_counter)
        t.start()               # start both threads first, so they run concurrently
        thread_array[tid] = t
    for i in range(2):
        thread_array[i].join()  # then wait for both to finish
    end_time = time.time()
    print("Total time: {}".format(end_time - start_time))

if __name__ == '__main__':
    main()

Below is the test result

As you can see, under multithreading Python is actually 45% slower than the single-threaded run. According to the earlier analysis, even with the GIL as a global lock, serialized multithreading should be about as efficient as a single thread. How could the result be this bad?

Let us analyze the reasons for this through the implementation principle of the GIL.


Flaws of the current GIL design

Scheduling based on bytecode instruction counts

Following the Python community's reasoning, the operating system's own thread scheduling is already very mature and stable, and there is no need to reimplement it. So a Python thread is simply a C-language pthread, scheduled by the operating system's scheduling algorithm (Linux, for example, uses CFS). To let each thread use CPU time evenly, Python counts the number of bytecode instructions the current thread has executed and forces it to release the GIL once a threshold is reached. At that point an operating-system thread-scheduling opportunity is also triggered (whether a context switch actually happens is, of course, decided independently by the operating system).

Pseudocode:

while True:
    acquire GIL
    for i in 1000:
        do something
    release GIL
    /* Give Operating System a chance to do thread scheduling */

This mode works fine with only one CPU core: any thread that is woken up can successfully acquire the GIL (because thread scheduling only happens after the GIL has been released). But problems arise when the CPU has multiple cores. As the pseudocode shows, there is almost no gap between release GIL and acquire GIL. So when a thread on another core is woken up, in most cases the main thread has already re-acquired the GIL. The awakened thread can only waste CPU time in vain, watching the other thread happily execute with the GIL, until it reaches the switching point, enters the waiting state, is woken up again, and waits again: a vicious circle.

PS: Of course this implementation is primitive and ugly, and the interaction between the GIL and thread scheduling has been gradually improving in each version of Python; for example, trying to hold the GIL across a thread context switch, releasing the GIL while waiting for IO, and so on. But what cannot be changed is that the existence of the GIL makes the already expensive operation of operating-system thread scheduling even more costly. The impact of the GIL deserves a closer look, which follows below.
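One concrete improvement: in Python 3 the tick-counting scheme described above was replaced by time-slice based switching, and the interval is exposed through the sys module (a small sketch):

```python
import sys

# Python 3 releases the GIL on a time basis rather than a bytecode count.
# The default switch interval is 5 milliseconds.
print(sys.getswitchinterval())

# It can be tuned: a smaller value means more responsive switching,
# at the cost of more scheduling overhead.
sys.setswitchinterval(0.001)
print(sys.getswitchinterval())
sys.setswitchinterval(0.005)  # restore the default
```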

To get an intuitive feel for the performance impact of the GIL on multithreading, here is a borrowed test-result chart (see the figure below). It shows the execution of two threads on a dual-core CPU; both are CPU-intensive compute threads. Green indicates that the thread is running and doing useful computation; red indicates time when the thread has been scheduled awake but cannot acquire the GIL, and so cannot do effective work. As the figure shows, the GIL prevents multithreading from properly exploiting the concurrent processing power of a multi-core CPU.

Can Python's IO-intensive threads benefit from multithreading, then? Look at the test result below; the colors mean the same as in the figure above, and white indicates that the IO thread is waiting. It can be seen that when the IO thread receives a data packet and triggers a context switch, it still cannot acquire the GIL because of the CPU-intensive thread, and therefore falls into endless waiting.

A simple summary: on a multi-core CPU, Python multithreading only helps IO-intensive workloads; as soon as there is at least one CPU-intensive thread, multithreaded efficiency drops sharply because of the GIL.
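The IO-bound half of this summary is easy to demonstrate: time.sleep (standing in for a blocking socket read) releases the GIL, so four 0.1-second waits spread across four threads finish in roughly 0.1 seconds rather than 0.4 (a minimal sketch):

```python
import time
from threading import Thread

def io_task():
    time.sleep(0.1)  # the GIL is released while blocked, so the waits overlap

start = time.perf_counter()
threads = [Thread(target=io_task) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(elapsed)  # roughly 0.1, not 0.4
```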


How to avoid being affected by the GIL

Having said all this, if no solution were offered, this would be nothing but a popular-science post. The GIL is this bad; is there a way around it? Let's look at the available options.

Replacing Thread with multiprocessing

The multiprocessing library appeared largely to make up for the thread library's GIL-induced inefficiency. It replicates the interface provided by threading almost completely, to ease migration; the essential difference is that it uses multiple processes instead of multiple threads. Each process has its own independent GIL, so processes do not contend for a GIL.

Of course, multiprocessing is not a panacea. Its introduction increases the difficulty of data communication and synchronization within the program. Take our counter as an example: if we want multiple workers to accumulate into the same variable, with threads it suffices to declare a global variable and wrap the three increment lines in a threading.Lock context. With multiprocessing, since processes cannot see each other's data, we can only declare a Queue in the main process and put/get through it, or use shared memory. This extra implementation cost makes the already painful task of writing concurrent programs even more painful. Where exactly do the difficulties lie? Interested readers can read further on the topic.
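A minimal sketch of the Queue approach just described (the names count_up and parallel_count are mine, not from the article): each worker counts in its own process and puts its partial result on a Queue, and the parent sums the partial results.

```python
import multiprocessing as mp

def count_up(n, q):
    # Each process has its own memory (and its own GIL): count privately,
    # then hand the partial result back through the Queue.
    total = 0
    for _ in range(n):
        total += 1
    q.put(total)

def parallel_count(n_per_proc, n_procs=2):
    q = mp.Queue()
    procs = [mp.Process(target=count_up, args=(n_per_proc, q))
             for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # Collect one partial result per worker and combine them
    return sum(q.get() for _ in range(n_procs))

if __name__ == '__main__':
    print(parallel_count(100000))  # 200000
```

Compare this with the three-line threading.Lock version: the process boundary forces all shared state through explicit channels, which is exactly the extra cost the paragraph above describes.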

Use a different interpreter

As mentioned earlier, since the GIL is only a product of CPython, do other interpreters fare better? Indeed, interpreters like JPython (Jython) and IronPython have no need for a GIL thanks to their implementation languages. However, by implementing the interpreter in Java/C#, they also lose the ability to use the community's many useful C-extension modules, so these interpreters have always remained relatively niche. After all, in the early days everyone chooses function and performance first: done is better than perfect.

So is there no hope?

Of course, the Python community is working very hard to continuously improve the GIL, and even to remove it, and considerable progress has been made in each minor version. Interested readers can read more about one such improvement, the reworked ("new") GIL:

  • Changed switching granularity from opcode-based to time-slice-based

  • Avoid the thread that recently released the GIL lock from being scheduled again immediately

  • Added a thread-priority mechanism (high-priority threads can force other threads to release the GIL they hold)

Summary

Python's GIL is in fact the product of a trade-off between functionality and performance. Its existence is quite reasonable, and there are objective factors that make it difficult to change. From this analysis we can draw the following simple conclusions:

  • Because of the GIL, multithreading yields better performance only in IO-bound scenarios

  • For programs demanding high parallel-computing performance, consider moving the core parts into a C module, or simply implementing them in another language

  • The GIL will continue to exist for quite some time, but it will also keep being improved

Reference

  • Python's hardest problem

  • Official documentation about the GIL

  • Revisiting thread priorities and the new GIL
