What is the relationship between GIL and multithreading in Python?

Question 1: The principle of infinite nesting of list self append

Let's answer the first question first. Two students asked it: why is x in the following code an infinitely nested list?


>>> x = [1]
>>> x.append(x)
>>> x
[1, [...]]

Walking through the operation step by step makes it easier to understand intuitively: x points to a list whose first element is 1. After the append operation, the list's second element points back to x, that is, back to the list itself. This creates an infinitely nested loop: [1, [1, [1, [1, …]]]].

However, although x is an infinitely nested list, the operation x.append(x) does not recursively traverse every element in it. It simply makes the second element of the original list point to x. So there is no stack overflow, and naturally no error is raised.

As for the second point, why does len(x) return 2? Although x is an infinitely nested list, its top level consists of only 2 elements: the first is 1, and the second is the list that points to itself. So len(x) returns 2.
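Both observations above can be checked directly in a few lines:

```python
x = [1]
x.append(x)

assert len(x) == 2      # the top level has only two elements
assert x[1] is x        # the second element is the list itself
assert x[1][1][1] is x  # following the nesting always lands back on x
```

The key point is that x[1] is not a copy but the very same list object, which is why the nesting looks infinite while the list itself stays two elements long.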

Question 2: The macro understanding of decorators


Let's look at the second question, Hu Yao's question about decorators. The role and significance of a decorator is that it can change some behavior of the original function, through a custom function or class, without modifying the original function itself.


A decorator modifies the behavior of a function through a wrapper, so we don't have to actually modify the function.

A decorator encapsulates the added functionality in its own decorator function or class; to use it, you only need to add @decorator above the original function's definition. Clearly, this gives your code a high degree of abstraction, separation, and simplicity.
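As a minimal sketch of this wrapping pattern (the decorator name log_calls is made up for illustration), a decorator is just a function that takes a function and returns a wrapped version of it:

```python
import functools

def log_calls(func):
    """A minimal decorator: adds logging around func without touching its body."""
    @functools.wraps(func)  # preserve func's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

@log_calls
def add(a, b):
    return a + b
```

Here `@log_calls` is equivalent to writing `add = log_calls(add)` after the definition; the original body of add is never edited.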

The concept may still feel a bit abstract, so let's imagine a scene and see the appeal of decorators in a real example. In the backend of a social networking site, countless operations must check whether the user is logged in before they can run, such as commenting on a post or posting a status update.

If you don't know about decorators and program in the conventional way, the code would look roughly like this:


# Post a comment
def post_comment(request, ...):
    if not authenticate(request):
        raise Exception('U must log in first')
    ...

# Post a status update
def post_moment(request, ...):
    if not authenticate(request):
        raise Exception('U must log in first')
    ...

Obviously, repeating the call to the authentication function authenticate() like this is redundant. A better solution is to factor authenticate() out into a decorator, as below. The code is then much cleaner:


# Post a comment
@authenticate
def post_comment(request, ...):
    ...

# Post a status update
@authenticate
def post_moment(request, ...):
    ...
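The article doesn't show how authenticate itself would be written, so here is one plausible sketch. The session check check_user_logged_in and the dict-based request are hypothetical stand-ins, not the site's real API:

```python
import functools

# Hypothetical stand-in for the site's real session check.
def check_user_logged_in(request):
    return request.get("user") is not None

def authenticate(func):
    """Run the login check once, before the wrapped view function executes."""
    @functools.wraps(func)
    def wrapper(request, *args, **kwargs):
        if not check_user_logged_in(request):
            raise Exception('U must log in first')
        return func(request, *args, **kwargs)
    return wrapper

@authenticate
def post_comment(request, text):
    return f"comment: {text}"
```

With this, calling post_comment({"user": "alice"}, "hi") succeeds, while calling it with a request that has no logged-in user raises the exception before the function body ever runs.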

Note, however, that in many cases decorators are not the only way. What I emphasize here are mainly the benefits of using decorators:

  • the code is more concise;
  • the logic is clearer;
  • the layering and separation of concerns are more obvious.

And this is also the development model that we should follow and prioritize.

Question 3: The relationship between GIL and multithreading

The GIL allows only one thread to run at a time, yet Python supports multithreading. What is the relationship between the two?

In fact, the existence of the GIL is not inconsistent with Python's support for multithreading. As we mentioned earlier, the GIL means that only one thread in a program can run at any given moment. Multithreading in Python means that multiple threads execute alternately, producing a "pseudo-parallel" result; at any specific moment there is still only one thread running, which is not true multi-threaded parallelism.
Here is an example. Suppose I use 10 threads to crawl the content of 50 websites. While thread 1 is crawling the first website, it blocks on I/O and enters a waiting state; at that point the GIL is released, and thread 2 starts executing to crawl the second website, and so on. When thread 1's I/O operation completes, the main program switches back to thread 1 and lets it finish its remaining work. From the user's perspective, this interleaving is what we call multithreading.
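A minimal sketch of this scenario, using time.sleep to stand in for the network wait (sleep, like real socket I/O, releases the GIL while blocked, so the waits overlap):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def crawl(site_id):
    time.sleep(0.1)  # stands in for the I/O wait of fetching one website
    return f"site-{site_id} done"

start = time.perf_counter()
# 10 threads working through 50 "websites", mirroring the example above.
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(crawl, range(50)))
elapsed = time.perf_counter() - start

# 50 tasks x 0.1 s of "I/O" complete in roughly 0.5 s rather than 5 s,
# because a thread waiting on I/O releases the GIL for the others.
```

Only one thread holds the GIL at any instant, yet the total time shrinks, because the time saved comes from overlapping waits, not from parallel computation.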

Question 4: Application scenarios of multi-process and multi-thread

The fourth question has been mentioned several times in the article, but I still want to emphasize it once more here.

If you want to speed up CPU-intensive tasks, multithreading will not help; use multiprocessing instead. CPU-intensive tasks here are tasks that consume a lot of CPU resources, such as computing the product of 1 to 100000000, or encoding a long text and then decoding it.

The reason multithreading is ineffective here is exactly what we just discussed: the essence of Python multithreading is that the threads take turns, with only one allowed to run at a time. So there is essentially no difference between using multiple threads and using a single main thread; on the contrary, in many cases the extra cost of thread switching actually reduces the program's efficiency.

If you use multiple processes instead, they can execute tasks truly in parallel, which effectively improves the program's efficiency.
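A sketch of the multiprocessing approach for the product example above, splitting the range across worker processes (the helper names partial_product and product are made up for illustration, and the chunking assumes n is at least as large as the worker count):

```python
from concurrent.futures import ProcessPoolExecutor
from functools import reduce
from operator import mul

def partial_product(bounds):
    """Multiply the integers in [lo, hi) — the CPU-bound piece of work."""
    lo, hi = bounds
    result = 1
    for i in range(lo, hi):
        result *= i
    return result

def product(n, workers=4):
    # Divide [1, n] into contiguous chunks, one per worker process;
    # each process has its own interpreter and GIL, so they truly run in parallel.
    step = n // workers
    chunks = [(i * step + 1, (i + 1) * step + 1) for i in range(workers)]
    chunks[-1] = (chunks[-1][0], n + 1)  # last chunk absorbs any remainder
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(partial_product, chunks)
    return reduce(mul, partials, 1)

if __name__ == "__main__":
    import math
    print(product(20) == math.factorial(20))
```

The `if __name__ == "__main__"` guard is required on platforms that spawn worker processes by re-importing the main module.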

As for I/O-intensive tasks, if you want to speed them up, prefer multithreading or Asyncio. Multiprocessing could also achieve the goal, but it is completely unnecessary: for I/O-intensive tasks, most of the time is spent waiting on I/O, so when one thread/task is waiting, we only need to switch to another thread/task to carry out other I/O operations.

However, if the I/O operations are very numerous and heavy, with many connections to establish, we generally choose Asyncio, because Asyncio's task switching is more lightweight, and the number of tasks it can start far exceeds the number of threads multithreading can start. Of course, if the I/O is not that heavy, multithreading is sufficient.
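To see why Asyncio scales to so many concurrent tasks, here is a minimal sketch using asyncio.sleep as a stand-in for a network round trip. A thousand concurrent tasks would be prohibitively heavy as OS threads, but each coroutine here is just a cheap Python object scheduled on one thread:

```python
import asyncio

async def fetch(i):
    await asyncio.sleep(0.01)  # stands in for one network round trip
    return i

async def main():
    # Launch 1000 concurrent tasks; switching between them is done by the
    # event loop in user space, far cheaper than OS thread switching.
    return await asyncio.gather(*(fetch(i) for i in range(1000)))

results = asyncio.run(main())
```

All 1000 "requests" overlap their waits and finish in roughly the time of one, while the whole program runs on a single thread.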

Origin blog.csdn.net/qq_41485273/article/details/114179300