Python Performance Optimization Guide: the secret to making your Python code 3x faster

Execution speed is the most criticized aspect of Python, and making Python programs run faster has long been a focus of the Python core team and the community. As Python developers, we can also follow certain principles and techniques to write better-performing Python code. This article takes an in-depth look at methods for optimizing Python program performance.



Optimization principles

Some optimization principles apply to all programming languages, and Python is no exception. These principles are the "mental models" of program optimization; every programmer should keep them in mind and apply them in daily development.

1. Avoid optimizing while developing

Don't think about possible optimizations while writing a program; focus first on making the code clean, correct, readable, and understandable. If, after it is written, you find it too big or too slow, then think about how to optimize it. As Donald Knuth famously said:

"Premature optimization is the root of all evil."

This echoes Zeng Guofan's philosophy: "Deal with matters as they arise; do not anticipate the future, do not be distracted in the present, and do not cling to the past."

2. Remember the 20/80 rule

In many fields, you can get 80% of the results with 20% of the effort (sometimes it may be the 10/90 rule). Whenever you want to optimize your code, first use profiling tools to find out where 80% of the execution time is spent, so you know where to focus your optimization efforts.

3. Be sure to compare performance before and after optimization

Without a before-and-after performance comparison, we cannot know whether an optimization actually worked. If the optimized code is only slightly faster than the original, undo the optimization and go back to the original version: a marginal performance gain is not worth sacrificing clear, tidy, readable, and understandable code.

Please keep the above three optimization principles in mind. No matter what language you use in the future, please abide by these three rules when doing performance optimization.

Optimization tools

As mentioned in the second optimization principle, we need to focus optimization efforts on the most time-consuming areas. So how do you find the most time-consuming parts of your program? This is where profiling tools come in: they collect data while the program runs and help us locate its bottlenecks. This process is called _profiling_. Python has several profiling tools, each with its own use cases and focus. We introduce them one by one below.

cProfile

Python ships with a profiling tool called cProfile. It is also the one I recommend, because it is the most capable: it hooks into every function call in the program and collects rich data, including:

  • ncalls: the number of times the function was called
  • tottime: the total time spent in the function itself (excluding time spent in sub-functions)
  • percall: the average time per call, i.e. tottime divided by ncalls
  • cumtime: the cumulative time spent in the function (including sub-functions); this figure is accurate even for recursive functions
  • percall: cumtime divided by the number of primitive calls (i.e. excluding recursive calls)
  • filename:lineno(function): the location of each function

cProfile can be used directly from the command line:

$ python -m cProfile main.py

Suppose our main.py computes the sum of all primes below 1,000,000. The code is as follows:

import math

def is_prime(n: int) -> bool:
    for i in range(2, math.floor(math.sqrt(n)) + 1):
        if n % i == 0:
            return False
    return True

def main():
    s = 0
    for i in range(2, 1000000):
        if is_prime(i):
            s += i
    print(s)

if __name__ == "__main__": 
    main()

After running it under cProfile, the console outputs (in part) the following:


The output shows that the entire program ran for 3.091 seconds, with 3,000,064 function calls in total, followed by per-function details. cProfile sorts by function name by default, but we usually care more about execution time, so we typically pass -s time to make cProfile sort the output by time:

$ python -m cProfile -s time .\main.py

The output information sorted by time is as follows:


As you can see from the output, the most time-consuming function is is_prime, so that is where we should focus our optimization.
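cProfile can also be driven from code via the standard pstats module, which is handy when you only want to profile one section of a program. The sketch below profiles the prime-sum example (with a smaller limit of 10,000 so it finishes quickly) and prints the five functions with the largest internal time:

```python
import cProfile
import io
import math
import pstats

def is_prime(n: int) -> bool:
    for i in range(2, math.floor(math.sqrt(n)) + 1):
        if n % i == 0:
            return False
    return True

def prime_sum(limit: int) -> int:
    return sum(i for i in range(2, limit) if is_prime(i))

profiler = cProfile.Profile()
profiler.enable()
total = prime_sum(10_000)
profiler.disable()

# Sort by internal time (tottime) and show the 5 most expensive functions.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("tottime")
stats.print_stats(5)
print(stream.getvalue())
print(total)
```

Only the code between enable() and disable() is measured, so the rest of the program runs at full speed.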

%%timeit and %timeit

cProfile, introduced above, is mainly used from the command line. But in data analysis and machine learning we often use Jupyter as an interactive programming environment, and cProfile cannot be used there directly. In interactive environments such as Jupyter or IPython we use the %%timeit and %timeit magics instead.

The difference between %%timeit and %timeit is that %%timeit applies to an entire cell and measures the execution time of the whole cell, while %timeit applies to a single statement and measures only that line. Using the prime-sum code again, let's see how to measure its running time in Jupyter:


Adding %%timeit at the top of the cell measures the running time of the whole cell. %%timeit runs the code several times and reports the mean. The output looks like this:

3.87 s ± 151 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

timeit() method

Sometimes we only want to know how a particular function or single statement performs. In that case cProfile is too heavyweight (it reports on every function in the program). Instead we can import timeit and use timeit() to wrap the function or statement we want to measure. For example:

import timeit

timeit.timeit('list(itertools.repeat("a", 100))', 'import itertools', number=10000000)

The code above runs list(itertools.repeat("a", 100)) 10,000,000 times; the value returned is the total running time in seconds, not the average per call:

10.997665435877963
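Because timeit() returns the total time for all iterations, the per-call cost is obtained by dividing by the number argument. A small sketch:

```python
import timeit

number = 20_000
# Total time for `number` executions of the statement.
total = timeit.timeit('"-".join(str(n) for n in range(100))', number=number)
# Per-call time is the total divided by the iteration count.
per_call = total / number
print(f"total: {total:.3f} s, per call: {per_call * 1e6:.2f} µs")
```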

timeit can also be used from the command line. For example:

$ python -m timeit "'-'.join(str(n) for n in range(100))"
20000 loops, best of 5: 10.5 usec per loop
Third-party tool line_profiler

The tools introduced above ship with Python or IPython, and they mainly report function running times. When we need a deeper understanding of how a program executes, they are not enough. For that we turn to line_profiler, a third-party Python library that performs line-by-line analysis of selected functions. With it, we can break down the time spent inside one or more target functions, which makes tuning much easier.

As a third-party tool, line_profiler must be installed before use:

$ pip install line_profiler

Once installed, we can use the @profile decorator and the kernprof command to collect line-level statistics. Let's rewrite the prime-sum example with @profile:

import math

# Note: no import is needed for @profile; kernprof injects the decorator at runtime.

@profile
def is_prime(n: int) -> bool:
    for i in range(2, math.floor(math.sqrt(n)) + 1):
        if n % i == 0:
            return False
    return True

@profile
def main():
    s = 0
    for i in range(2, 1000000):
        if is_prime(i):
            s += i
    print(s)

if __name__ == "__main__": 
    main()

Then run the kernprof command:

$ kernprof -lv main.py

The -l flag tells kernprof to use the line-by-line profiler instead of cProfile; it is required for the @profile decorator to take effect. The analysis results are saved to a .lprof file, and adding -v prints them to the console as well. You can view a saved result later with:

$ python -m line_profiler <file.lprof>

Solve bottlenecks

Choose appropriate algorithms and data structures

Using profiling tools we can easily find a program's bottlenecks; the next step is removing them. By some estimates, 90% of performance problems come down to algorithms and data structures, so choosing the right algorithm matters most. For example, to sort a list containing thousands of elements, don't use bubble sort with its O(n²) time complexity; quicksort, at O(n log n), will be much faster.

The example above shows the impact of the algorithm; the choice of data structure matters just as much. Consider searching a large data set: stored in a list, finding a given element takes O(n) time; stored in a binary search tree, it improves to O(log n); stored in a hash table, it drops to O(1) on average.
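The difference is easy to observe with membership tests: `x in some_list` scans the whole list (O(n)), while `x in some_set` hashes once (O(1) on average). An illustrative sketch using timeit:

```python
import timeit

data_list = list(range(100_000))
data_set = set(data_list)
target = 99_999  # worst case for the list: the last element

# Time 1,000 membership tests against each structure.
t_list = timeit.timeit(lambda: target in data_list, number=1000)
t_set = timeit.timeit(lambda: target in data_set, number=1000)
print(f"list: {t_list:.4f} s, set: {t_set:.4f} s")
```

On typical hardware the set lookup wins by several orders of magnitude, which is exactly the O(n) vs O(1) gap.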

When describing algorithmic complexity we usually use Big O notation, which defines an upper bound on the time an algorithm needs. Insertion sort, for example, takes linear time in the best case and quadratic time in the worst case, so its time complexity is O(n²): an upper bound that will not be exceeded in any case.

For reference, here are the average-case time complexities of some common Python operations:

  • list: append / pop from the end — O(1); insert / delete / `x in list` — O(n); index access — O(1)
  • dict: get / set / delete by key — O(1)
  • set: add / `x in set` — O(1)
  • sorted() and list.sort() — O(n log n)

Beyond choosing appropriate algorithms and data structures, there are also some everyday Python techniques that can improve execution speed.

Use list comprehensions

Use list comprehensions wherever possible. For example, to collect the multiples of 3 below 10,000, we could write:

l = []
for i in range(1, 10000):
    if i % 3 == 0:
        l.append(i)

It is better written as a list comprehension. Not only is the code more concise, it also performs better, because a comprehension avoids the repeated attribute lookup and call overhead of append:

l = [i for i in range(1, 10000) if i % 3 == 0]

Comparing the running time of the two versions (%%timeit):


Averaged over 100 runs, the list comprehension is faster than the append version.
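Outside Jupyter, the same comparison can be made with the timeit module; a minimal sketch:

```python
import timeit

def with_append(limit: int) -> list:
    result = []
    for i in range(1, limit):
        if i % 3 == 0:
            result.append(i)
    return result

def with_comprehension(limit: int) -> list:
    return [i for i in range(1, limit) if i % 3 == 0]

# Both produce the same list; the comprehension avoids looking up
# and calling result.append on every matching element.
assert with_append(10_000) == with_comprehension(10_000)

t_append = timeit.timeit(lambda: with_append(10_000), number=100)
t_comp = timeit.timeit(lambda: with_comprehension(10_000), number=100)
print(f"append: {t_append:.4f} s, comprehension: {t_comp:.4f} s")
```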

Use fewer attribute (.) lookups

During development, try to avoid the . operation where possible. For example:

import math
val = math.sqrt(60)

should be replaced by

from math import sqrt
val = sqrt(60)

This is because calling a method through . first invokes __getattribute__ or __getattr__, both of which involve dictionary operations and are therefore relatively slow.


The test shows that the version without the . lookup is slightly faster. Hence the common practice of importing functions directly with from module import function, avoiding the . call.
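The effect is most visible inside hot loops, where the attribute lookup happens on every iteration. A small sketch comparing the two styles (both call the same underlying function, so the results are identical):

```python
import math
import timeit
from math import sqrt

def with_dot(n: int) -> float:
    total = 0.0
    for i in range(1, n):
        total += math.sqrt(i)   # attribute lookup on math every iteration
    return total

def without_dot(n: int) -> float:
    total = 0.0
    for i in range(1, n):
        total += sqrt(i)        # plain name lookup, no attribute access
    return total

assert with_dot(1000) == without_dot(1000)
t_dot = timeit.timeit(lambda: with_dot(10_000), number=100)
t_plain = timeit.timeit(lambda: without_dot(10_000), number=100)
print(f"math.sqrt: {t_dot:.4f} s, sqrt: {t_plain:.4f} s")
```

Binding the function to a local name just before the loop (e.g. `sqrt = math.sqrt`) achieves the same effect without changing the import style.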

Make good use of multiple assignments

If you encounter consecutive variable assignments, such as:

a = 2
b = 3
c = 4
d = 5

It is recommended to write

a, b, c, d = 2, 3, 4, 5

Avoid using global variables

Python provides the global keyword to declare or bind global variables, but accessing a global takes longer than accessing a local. So avoid global variables unless you really need them.
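The gap exists because locals are resolved through a fast, array-based lookup while globals go through the module's namespace dictionary on every access. A minimal sketch of the difference:

```python
import timeit

counter = 0  # module-level global

def bump_global(n: int) -> int:
    global counter
    counter = 0
    for _ in range(n):
        counter += 1        # each access goes through the module namespace
    return counter

def bump_local(n: int) -> int:
    local_counter = 0
    for _ in range(n):
        local_counter += 1  # locals use fast index-based lookup
    return local_counter

assert bump_global(10_000) == bump_local(10_000) == 10_000
t_global = timeit.timeit(lambda: bump_global(10_000), number=100)
t_local = timeit.timeit(lambda: bump_local(10_000), number=100)
print(f"global: {t_global:.4f} s, local: {t_local:.4f} s")
```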

Use library methods whenever possible

If a function is already provided by the Python standard library or a third-party library, use it instead of writing your own. Library methods are highly optimized, and many are implemented in C under the hood. Code we write ourselves is unlikely to beat them, and reimplementing them violates the DRY principle.
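The built-in sum() is a simple illustration: it runs its loop in C, while an equivalent hand-written Python loop pays interpreter overhead on every iteration. A quick sketch:

```python
import timeit

def manual_sum(limit: int) -> int:
    total = 0
    for i in range(limit):
        total += i
    return total

# Same result, but sum() iterates in C rather than in Python bytecode.
assert manual_sum(10_000) == sum(range(10_000))

t_manual = timeit.timeit(lambda: manual_sum(10_000), number=100)
t_builtin = timeit.timeit(lambda: sum(range(10_000)), number=100)
print(f"manual loop: {t_manual:.4f} s, sum(): {t_builtin:.4f} s")
```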

Use join() to concatenate strings

Many languages concatenate strings with +, and Python supports that too, but I prefer join(), because it is faster: each + creates a brand-new string and copies both operands into it, whereas join() computes the final length once and copies each piece a single time.
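A sketch of the comparison, building the same dash-separated string both ways:

```python
import timeit

parts = [str(n) for n in range(1000)]

def concat_plus(items: list) -> str:
    result = ""
    for item in items:
        result = result + "-" + item  # allocates a new string every iteration
    return result.lstrip("-")

def concat_join(items: list) -> str:
    return "-".join(items)            # sizes the result once, copies once

assert concat_plus(parts) == concat_join(parts)
t_plus = timeit.timeit(lambda: concat_plus(parts), number=1000)
t_join = timeit.timeit(lambda: concat_join(parts), number=1000)
print(f"+: {t_plus:.4f} s, join(): {t_join:.4f} s")
```

The gap grows with the number of pieces, since repeated + makes total copying quadratic in the output length.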

Make good use of generators

When processing large amounts of data, generators are often more efficient, especially in memory. I wrote a dedicated article, "In-depth Understanding of Python Generators and Yield", explaining Python generators in detail and why they are a better fit for large files and large data sets.
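The memory difference is easy to demonstrate: a list materializes every element up front, while a generator produces them one at a time on demand. A minimal sketch:

```python
import sys

# A list comprehension stores all 100,000 squares in memory at once...
squares_list = [i * i for i in range(100_000)]
# ...while a generator expression holds only its iteration state.
squares_gen = (i * i for i in range(100_000))

print(f"list:      {sys.getsizeof(squares_list):,} bytes")
print(f"generator: {sys.getsizeof(squares_gen):,} bytes")

# Both yield the same values when consumed.
assert sum(squares_gen) == sum(squares_list)
```

Note that a generator is exhausted after one pass; if you need to iterate the data repeatedly, a list may still be the right choice.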

Take advantage of acceleration tools

There are many projects dedicated to making Python faster by providing a better running environment or runtime optimizations. Among them, mature ones include PyPy and Numba .

PyPy is on average 4.5 times faster than CPython; for information on how to use PyPy to speed up Python, please see this article "Accelerating Python Programs with PyPy" .

Numba is a JIT compiler that works well with NumPy: it compiles Python functions to machine code and can dramatically speed up scientific computation. For how to use Numba to speed up Python, see "Using Numba: One line of code increases the running speed of Python programs by 100 times".

So if conditions permit, you can use the above two tools to speed up Python code.

Use C/C++/Rust to implement core functions

C/C++/Rust are all much faster than Python, and one of Python's strengths is how easily it binds to other languages. So for performance-critical functionality, consider implementing the core in C/C++/Rust and exposing it to Python. Many Python libraries do exactly this, including NumPy, SciPy, Pandas, and Polars.

For how to develop a C extension module, see "Make your Python program as fast as C language".

Use the latest version of Python

Python's core team is also working tirelessly on performance; each new release is more optimized and faster than the last. Not long ago Python 3.11.0 was released with major performance gains: 10% to 60% faster than 3.10, and about 25% faster on average on the standard benchmark suite. So, where possible, use a newer version of Python to pick up these improvements for free.


Origin blog.csdn.net/qq_40647372/article/details/135400795