With Numpy, who says Python can't hold its own?

A bit of interpretation first: this is essentially a problem of the official CPython interpreter. Back in the 1990s, in the era of single-core CPUs, Guido van Rossum (affectionately nicknamed "Uncle Turtle" in the Chinese community) added the GIL (Global Interpreter Lock) to CPython to prevent multiple threads from competing for shared resources and corrupting interpreter state. As technology advanced and multi-core CPUs became the norm, this design turned into an increasingly visible bottleneck. Guido's position in the community has been that the GIL has been embedded in CPython for so many years, and is so deeply intertwined with the interpreter, that forcibly removing it could break things throughout CPython. As a result, Python's multi-threaded concurrency problem has never been properly solved.

So, how efficient is multithreading in Python, really?

This should be viewed from two angles: CPU-intensive code and IO-intensive code:

  • 1. CPU-intensive code (loops, counting, and other number-crunching parts of scientific computing). Here the ticks counter quickly reaches its threshold, which triggers the release and re-acquisition of the GIL, and all that switching back and forth between threads costs resources. So under CPython, multithreading is unfriendly to CPU-intensive code (the sketch after this list illustrates the effect).

  • 2. IO-intensive code (file handling, web crawlers, etc.). Multithreading can genuinely improve efficiency here: a single thread doing IO just sits and waits, wasting time, whereas with multiple threads the interpreter can switch to thread B while thread A is blocked on IO, so CPU time is not wasted and the program as a whole runs faster. So CPython's multithreading is friendly to IO-intensive code.

  • 3. In Python 3.x, the GIL no longer uses ticks to decide when to switch but a timer instead (once a thread has run for the switch interval, it releases the GIL). This is somewhat friendlier to CPU-intensive programs, but it still does not change the fact that only one thread can execute Python bytecode at a time, so the efficiency is still unsatisfactory.
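
To see point 1 in action, here is a minimal sketch (standard library only; timings will vary by machine and CPython version): splitting a pure-Python counting loop across two threads on CPython typically takes about as long as, or longer than, doing the same work in one thread, because the GIL only lets one thread run bytecode at a time.

```python
import threading
import time

N = 10_000_000

def count_down(n):
    # Pure-Python loop: CPU-bound, so the GIL serializes it
    while n > 0:
        n -= 1

# All the work in a single thread
start = time.perf_counter()
count_down(N * 2)
print(f"single thread: {time.perf_counter() - start:.2f} s")

# The same amount of work split across two threads
start = time.perf_counter()
t1 = threading.Thread(target=count_down, args=(N,))
t2 = threading.Thread(target=count_down, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
print(f"two threads:   {time.perf_counter() - start:.2f} s")
```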

Enough background; now for today's focus. Is there really no way for Python to execute CPU-intensive code efficiently?

Python has long been criticized for its "pseudo multithreading" problem. With the emergence of the open source scientific computing library Numpy, that problem has largely been solved for numerical code. That is today's topic.

First, let's get to know Numpy:

Numpy (Numerical Python) is an open source Python scientific computing library for fast processing of arrays of arbitrary dimensions.
Numpy supports common array and matrix operations. For the same numerical task, Numpy code is far more concise than plain Python code.
Numpy uses the ndarray object to handle multidimensional arrays; ndarray is a fast and flexible container for large data sets.
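
As a quick taste (a hypothetical snippet, not from the original post), an ndarray lets you operate on whole arrays and matrices without writing explicit loops:

```python
import numpy as np

# Create a 2-D ndarray and operate on it as a whole
a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a * 10)         # element-wise multiplication
print(a.sum(axis=0))  # column sums: [5 7 9]
print(a @ a.T)        # matrix product (2x3 @ 3x2 -> 2x2)
```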

Comparison of ndarray and Python native list operation efficiency:
[Figure: running time of an operation on an ndarray vs. the same operation on a native Python list]
A simple test case was prepared to measure the performance of ndarray against a native Python list, and the actual running times make it clear that ndarray is far more efficient. The advantage grows as the array gets larger: the bigger the data, the more obvious Numpy's edge.
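
Since the original timings were only shown as a screenshot, here is a comparable benchmark sketch (assumed, not the author's exact test case) that sums ten million numbers with a native list and with an ndarray; on a typical machine the ndarray version is dramatically faster, and the gap widens as `n` grows.

```python
import time
import numpy as np

n = 10_000_000
py_list = list(range(n))
nd_arr = np.arange(n)

# Sum with a native Python list
start = time.perf_counter()
list_total = sum(py_list)
print(f"Python list sum: {time.perf_counter() - start:.4f} s")

# Sum with a Numpy ndarray
start = time.perf_counter()
arr_total = nd_arr.sum()
print(f"ndarray sum:     {time.perf_counter() - start:.4f} s")
```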

What is the reason why the efficiency of ndarray is higher than that of Python native list?

[Figure: memory layout of a native Python list versus a Numpy ndarray]
Let's take a brief look at the picture above:
1. Storage advantages:

  • As we all know, Python's native list is a kind of sequence table, and sequence tables come in an integrated storage structure or a separate storage structure, depending on whether the header information and the data are stored together. A native Python list holding a large amount of data clearly uses the separate form with elements stored externally: the list itself does not store the data, only the addresses (references) of the elements, so each element can only be reached through an extra level of addressing. This is one reason why computation over a native Python list is slow.
  • In the figure we can see that the ndarray clearly uses an integrated storage structure: the data are stored together at consecutive addresses, which makes batch operations on the array elements much faster.

2. In scientific computing, Numpy's ndarray saves a lot of explicit loop statements, and the code is much simpler than with a native Python list, as the snippet below shows.
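
For example (a hypothetical snippet): adding two sequences element by element needs an explicit Python-level loop with native lists, but is a single expression with ndarray, where the loop runs in compiled code.

```python
import numpy as np

x = list(range(1_000_000))
y = list(range(1_000_000))

# Native list: an explicit Python-level loop
z_list = [a + b for a, b in zip(x, y)]

# ndarray: one vectorized expression, the loop runs in C
xs, ys = np.array(x), np.array(y)
z_arr = xs + ys
```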

3. Vectorized operations can also be understood as parallel operations.
Numpy has built-in support for parallel execution: when the system has multiple cores, Numpy automatically parallelizes certain computations.

4. Numpy's core is written in C, and it releases the GIL (Global Interpreter Lock) internally, so its operations on arrays are not constrained by the Python interpreter. A rough demonstration follows.
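
The sketch below is one way to observe point 4 (results depend heavily on your NumPy build, since a multi-threaded BLAS may already use all cores for a single call): run a large matrix multiplication twice, first sequentially and then in two threads. Because the heavy lifting happens in C with the GIL released, the two threads can genuinely overlap, unlike the pure-Python counting example earlier.

```python
import threading
import time
import numpy as np

a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)

def multiply():
    # The matrix product runs in compiled code with the GIL released
    np.dot(a, b)

# Two matrix multiplications, one after the other
start = time.perf_counter()
multiply()
multiply()
print(f"sequential:  {time.perf_counter() - start:.2f} s")

# The same two multiplications in two threads
start = time.perf_counter()
threads = [threading.Thread(target=multiply) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"two threads: {time.perf_counter() - start:.2f} s")
```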

To sum up:

Using the Numpy scientific computing library is far more efficient than implementing the same scientific computation in native Python code. Numpy can truly be called the light of Python!!!

Origin blog.csdn.net/qq_41475067/article/details/112858999