These 8 Python speed-up tips are awesome


Python is an interpreted scripting language, and it is generally slower than compiled languages such as C/C++. In many cases, however, the gap is smaller than people assume. This article summarizes some tips for speeding up Python code.

 0. Code optimization principles

Before going into specific techniques, it helps to understand a few basic principles of code optimization.

The first principle is not to optimize prematurely. Many people start writing code with performance as the goal, but "it is much easier to make a correct program fast than to make a fast program correct." The prerequisite for optimization is therefore that the code works correctly. Optimizing too early tends to lose sight of overall performance; don't invert the priorities before you have the whole picture.

The second principle is to weigh the cost of optimization. Optimization has a price, and it is almost impossible to solve every performance problem. The usual trade-off is time for space or space for time. Development cost also has to be considered.

The third principle is not to optimize the parts that don't matter. Optimizing every part of the code would make it hard to read and understand. If your code runs slowly, first find where it is slow, which is usually an inner loop, and focus the optimization there. Elsewhere, a small loss of time makes little difference.
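One way to "find where the code is slow" is the standard-library profiler. The sketch below is only an illustration: slow_part, fast_part, workload and profile_report are made-up names, not part of the original article. It runs a workload under cProfile and returns the report sorted by cumulative time.

```python
import cProfile
import io
import math
import pstats


def slow_part(n: int) -> float:
    # Deliberately heavy loop: this is what the profiler should point at.
    total = 0.0
    for i in range(n):
        total += math.sqrt(i)
    return total


def fast_part(n: int) -> int:
    # Cheap by comparison.
    return n * 2


def workload() -> float:
    return slow_part(50_000) + fast_part(10)


def profile_report(func) -> str:
    """Run func under cProfile and return the stats text, sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)  # show at most 10 entries
    return stream.getvalue()


if __name__ == "__main__":
    print(profile_report(workload))
```

In the printed table, the row for slow_part should account for almost all of the time, which marks it as the place worth optimizing.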

 1. Avoid global variables

# Not recommended. Runtime: 26.8 s
import math
 
size = 10000
for x in range(size):
    for y in range(size):
        z = math.sqrt(x) + math.sqrt(y)

Many programmers start out writing simple scripts in Python, and in a script it is common to put everything at module level, as global variables, like the code above. However, because global and local variables are implemented differently, code at global scope runs considerably slower than the same code inside a function. Moving script statements into a function typically gives a speedup of 15% to 30%.

# Recommended. Runtime: 20.6 s
import math
def main():  # move the code into a function to reduce the use of global variables
    size = 10000
    for x in range(size):
        for y in range(size):
            z = math.sqrt(x) + math.sqrt(y)
 
main()
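To see the difference between the two kinds of variables, you can look at the bytecode. The following is a rough sketch that assumes CPython; the function names are made up for illustration. It uses the dis module to list which names a function fetches with the LOAD_GLOBAL opcode, a dictionary lookup, while local names live in fixed slots of the frame.

```python
import dis

limit = 1000  # module-level (global) name


def sum_with_global() -> int:
    total = 0
    for i in range(limit):  # `limit` is looked up in the module's global dict
        total += i
    return total


def sum_with_local() -> int:
    local_limit = 1000  # local name, kept in a fixed slot of the frame
    total = 0
    for i in range(local_limit):
        total += i
    return total


def names_loaded_as_globals(func) -> set:
    """Names that the compiled function fetches with the LOAD_GLOBAL opcode."""
    return {
        ins.argval
        for ins in dis.get_instructions(func)
        if ins.opname == "LOAD_GLOBAL"
    }


if __name__ == "__main__":
    print(names_loaded_as_globals(sum_with_global))
    print(names_loaded_as_globals(sum_with_local))
```

The first set contains both limit and range; the second contains only range, because local_limit never goes through a global lookup.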

 2. Avoid . (attribute access)

2.1 Avoid module and function attribute access

# Not recommended. Runtime: 14.5 s
import math
 
def computeSqrt(size: int):
    result = []
    for i in range(size):
        result.append(math.sqrt(i))
    return result
 
def main():
    size = 10000
    for _ in range(size):
        result = computeSqrt(size)
 
main()

Every use of . (the attribute access operator) triggers special methods such as __getattribute__() and __getattr__(), which perform dictionary lookups and therefore add time overhead. The from ... import statement eliminates the attribute access.

# First optimization. Runtime: 10.9 s
from math import sqrt
 
def computeSqrt(size: int):
    result = []
    for i in range(size):
        result.append(sqrt(i))  # avoids the math.sqrt lookup
    return result
 
def main():
    size = 10000
    for _ in range(size):
        result = computeSqrt(size)
 
main()

Local variables are looked up faster than global variables, so a frequently accessed name such as sqrt can be sped up further by binding it to a local variable.

# Second optimization. Runtime: 9.9 s
import math
 
def computeSqrt(size: int):
    result = []
    sqrt = math.sqrt  # bind to a local variable
    for i in range(size):
        result.append(sqrt(i))  # avoids the math.sqrt lookup
    return result
 
def main():
    size = 10000
    for _ in range(size):
        result = computeSqrt(size)
 
main()

Besides math.sqrt, computeSqrt contains one more ., namely the call to the list's append method. Binding that method to a local variable as well removes every use of . from the for loop in computeSqrt.

# Recommended. Runtime: 7.9 s
import math
 
def computeSqrt(size: int):
    result = []
    append = result.append
    sqrt = math.sqrt    # bind to a local variable
    for i in range(size):
        append(sqrt(i))  # avoids the result.append and math.sqrt lookups
    return result
 
def main():
    size = 10000
    for _ in range(size):
        result = computeSqrt(size)
 
main()

2.2 Avoid intra-class attribute access

# Not recommended. Runtime: 10.4 s
import math
from typing import List
class DemoClass:
    def __init__(self, value: int):
        self._value = value
    
    def computeSqrt(self, size: int) -> List[float]:
        result = []
        append = result.append
        sqrt = math.sqrt
        for _ in range(size):
            append(sqrt(self._value))
        return result
def main():
    size = 10000
    for _ in range(size):
        demo_instance = DemoClass(size)
        result = demo_instance.computeSqrt(size)
main()

The principle of avoiding . also applies to instance attributes: accessing self._value is slower than accessing a local variable. Binding a frequently accessed attribute to a local variable speeds the code up.

# Recommended. Runtime: 8.0 s
import math
from typing import List
class DemoClass:
    def __init__(self, value: int):
        self._value = value
    
    def computeSqrt(self, size: int) -> List[float]:
        result = []
        append = result.append
        sqrt = math.sqrt
        value = self._value
        for _ in range(size):
            append(sqrt(value))  # avoids the self._value lookup
        return result
def main():
    size = 10000
    for _ in range(size):
        demo_instance = DemoClass(size)
        demo_instance.computeSqrt(size)
main()

 3. Avoid unnecessary abstractions

# Not recommended. Runtime: 0.55 s
class DemoClass:
    def __init__(self, value: int):
        self.value = value
 
    @property
    def value(self) -> int:
        return self._value
 
    @value.setter
    def value(self, x: int):
        self._value = x
 
def main():
    size = 1000000
    for i in range(size):
        demo_instance = DemoClass(size)
        value = demo_instance.value
        demo_instance.value = i
 
main()

Every extra layer of processing wrapped around code (decorators, property accessors, descriptors) makes it slower. In most cases it is worth re-examining whether a property accessor is needed at all. Accessing attributes through getter/setter functions is often a habit carried over from C/C++. If it is not really necessary, use a plain attribute.

# Recommended. Runtime: 0.33 s
class DemoClass:
    def __init__(self, value: int):
        self.value = value  # plain attribute, no unnecessary property accessor
 
def main():
    size = 1000000
    for i in range(size):
        demo_instance = DemoClass(size)
        value = demo_instance.value
        demo_instance.value = i
 
main()

 4. Avoid data duplication

4.1 Avoid meaningless data copying

# Not recommended. Runtime: 6.5 s
def main():
    size = 10000
    for _ in range(size):
        value = range(size)
        value_list = [x for x in value]
        square_list = [x * x for x in value_list]
 
main()

The value_list in the code above is entirely unnecessary; it only creates an extra data structure and a needless copy.

# Recommended. Runtime: 4.8 s
def main():
    size = 10000
    for _ in range(size):
        value = range(size)
        square_list = [x * x for x in value]  # avoids the pointless copy
 
main()

Another case is being overly cautious about Python's data sharing: not understanding or not trusting Python's memory model, and overusing functions such as copy.deepcopy(). In such code the copy can usually be eliminated.
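A small sketch of that situation, with made-up function names: a function that only reads its argument gains nothing from a defensive copy.deepcopy(), since nothing is mutated and the caller's data is never at risk.

```python
import copy
from typing import List


def total_defensive(rows: List[List[int]]) -> int:
    # Unnecessary: the function only reads `rows`, yet every nested list is duplicated.
    rows_copy = copy.deepcopy(rows)
    return sum(sum(row) for row in rows_copy)


def total_shared(rows: List[List[int]]) -> int:
    # Reads the caller's object directly; nothing is mutated, so no copy is needed.
    return sum(sum(row) for row in rows)


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5, 6]]
    print(total_defensive(data), total_shared(data))  # both print 21
    print(data)  # the caller's data is unchanged in either case
```

A copy is only worth its cost when the function really modifies the data and the caller still needs the original.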

4.2 Do not use intermediate variables when exchanging values

# Not recommended. Runtime: 0.07 s
def main():
    size = 1000000
    for _ in range(size):
        a = 3
        b = 5
        temp = a
        a = b
        b = temp
 
main()

The code above creates a temporary variable temp to swap the two values. Without the intermediate variable, the code is more concise and runs faster.

# Recommended: swap with tuple unpacking
def main():
    size = 1000000
    for _ in range(size):
        a = 3
        b = 5
        a, b = b, a  # no intermediate variable

main()

4.3 Use join instead of + for string concatenation

# Not recommended. Runtime: 2.6 s
import string
from typing import List
def concatString(string_list: List[str]) -> str:
    result = ''
    for str_i in string_list:
        result += str_i
    return result
def main():
    string_list = list(string.ascii_letters * 100)
    for _ in range(10000):
        result = concatString(string_list)
 
main() 

Because Python strings are immutable, concatenating with a + b allocates a new block of memory and copies both a and b into it. Concatenating n strings this way therefore produces n-1 intermediate results, each of which needs an allocation and a copy, which seriously hurts performance. join(), by contrast, first computes the total amount of memory required, allocates it once, and then copies each string into place.

# Recommended. Runtime: 0.3 s
import string
from typing import List
def concatString(string_list: List[str]) -> str:
    return ''.join(string_list)  # use join instead of +
def main():
    string_list = list(string.ascii_letters * 100)
    for _ in range(10000):
        result = concatString(string_list)
 
main()

 5. Make use of the short-circuit characteristics of if conditions

# Not recommended. Runtime: 0.05 s
from typing import List
 
def concatString(string_list: List[str]) -> str:
    abbreviations = {'cf.', 'e.g.', 'ex.', 'etc.', 'flg.', 'i.e.', 'Mr.', 'vs.'}
    abbr_count = 0
    result = ''
    for str_i in string_list:
        if str_i in abbreviations:
            result += str_i
    return result
 
def main():
    for _ in range(10000):
        string_list = ['Mr.', 'Hat', 'is', 'Chasing', 'the', 'black', 'cat', '.']
        result = concatString(string_list)
 
main()

Short-circuit evaluation means that in a statement like if a and b, b is not evaluated when a is False, and in a statement like if a or b, b is not evaluated when a is True. To save running time, put the operand most likely to be True first in an or expression, and the operand most likely to be False first in an and expression.

# Recommended. Runtime: 0.03 s
from typing import List
 
def concatString(string_list: List[str]) -> str:
    abbreviations = {'cf.', 'e.g.', 'ex.', 'etc.', 'flg.', 'i.e.', 'Mr.', 'vs.'}
    abbr_count = 0
    result = ''
    for str_i in string_list:
        if str_i[-1] == '.' and str_i in abbreviations:  # exploit short-circuit evaluation
            result += str_i
    return result
 
def main():
    for _ in range(10000):
        string_list = ['Mr.', 'Hat', 'is', 'Chasing', 'the', 'black', 'cat', '.']
        result = concatString(string_list)
 
main()
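The example above covers and. For or, the same idea can be sketched by counting how often the second operand actually runs. The helper names below (ends_with_dot, is_lowercase, count_with_or) are made up for this illustration; the word list is the one used above.

```python
from typing import Callable, List, Tuple

Check = Callable[[str], bool]


def ends_with_dot(word: str) -> bool:
    return word.endswith(".")


def is_lowercase(word: str) -> bool:
    return word == word.lower()


def count_with_or(first: Check, second: Check, words: List[str]) -> Tuple[int, int]:
    """Return (matching words, number of times `second` was evaluated)."""
    second_calls = 0

    def counted_second(word: str) -> bool:
        nonlocal second_calls
        second_calls += 1
        return second(word)

    # With `or`, counted_second runs only when first(word) is False.
    matches = sum(1 for word in words if first(word) or counted_second(word))
    return matches, second_calls


if __name__ == "__main__":
    words = ['Mr.', 'Hat', 'is', 'Chasing', 'the', 'black', 'cat', '.']
    print(count_with_or(is_lowercase, ends_with_dot, words))  # (6, 3)
    print(count_with_or(ends_with_dot, is_lowercase, words))  # (6, 6)
```

Both orders find the same 6 words, but putting the check that is True more often (is_lowercase) first halves the number of times the second check runs.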

 6. Loop optimization

6.1 Use for loop instead of while loop

# Not recommended. Runtime: 6.7 s
def computeSum(size: int) -> int:
    sum_ = 0
    i = 0
    while i < size:
        sum_ += i
        i += 1
    return sum_
def main():
    size = 10000
    for _ in range(size):
        sum_ = computeSum(size)
 
main()

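Python's for loop is considerably faster than the equivalent while loop, because the counter increment and the comparison no longer run as separate Python statements on every iteration.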

# Recommended. Runtime: 4.3 s
def computeSum(size: int) -> int:
    sum_ = 0
    for i in range(size):  # for loop instead of while loop
        sum_ += i
    return sum_
def main():
    size = 10000
    for _ in range(size):
        sum_ = computeSum(size)
 
main()

6.2 Use implicit for loops instead of explicit for loops

For the above example, you can go one step further and use an implicit for loop instead of an explicit for loop.

# Recommended. Runtime: 1.7 s
def computeSum(size: int) -> int:
    return sum(range(size))  # implicit for loop instead of an explicit one

def main():
    size = 10000
    for _ in range(size):
        sum_ = computeSum(size)  # sum_ avoids shadowing the built-in sum
 
main()

6.3 Reduce the calculation of the inner for loop

# Not recommended. Runtime: 12.8 s
import math
 
def main():
    size = 10000
    sqrt = math.sqrt
    for x in range(size):
        for y in range(size):
            z = sqrt(x) + sqrt(y)
 
main()

In the code above, sqrt(x) sits in the inner for loop and is recomputed on every inner iteration, even though x changes only in the outer loop. This adds time overhead.

# Recommended. Runtime: 7.0 s
import math
 
def main():
    size = 10000
    sqrt = math.sqrt
    for x in range(size):
        sqrt_x = sqrt(x)  # hoisted out of the inner for loop
        for y in range(size):
            z = sqrt_x + sqrt(y)
 
main()

 7. Use numba.jit

We continue with the computeSum example above and apply numba.jit to it. numba (a third-party package) JIT-compiles Python functions to machine code, which greatly speeds up execution.

# Recommended. Runtime: 0.62 s
import numba

@numba.jit
def computeSum(size: int) -> int:
    sum_ = 0
    for i in range(size):
        sum_ += i
    return sum_
def main():
    size = 10000
    for _ in range(size):
        sum_ = computeSum(size)

main()

 8. Choose the right data structure

Python's built-in data structures such as str, tuple, list, set, and dict are implemented in C and are very fast. It is almost impossible for a hand-written data structure to match their performance.

list is similar to std::vector in C++: it is a dynamic array that pre-allocates a certain amount of memory. When the pre-allocated space is used up and another element is added, it requests a larger block, copies all existing elements over, releases the old block, and only then inserts the new element. Deletion works similarly: when the used space drops below half of the pre-allocated space, a smaller block is requested, the elements are copied, and the original larger block is released. Therefore, when elements are added and removed frequently and in large numbers, a list is not efficient. Insertions and deletions at the front are the worst case, since every remaining element has to be shifted. In that situation, consider collections.deque. collections.deque is a double-ended queue that combines the properties of a stack and a queue, and supports O(1) insertion and deletion at both ends.
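A minimal sketch of the difference, with illustrative function names: both functions below drain a collection from the front, but list.pop(0) has to shift every remaining element on each call, while deque.popleft() does not.

```python
from collections import deque
from typing import List


def drain_front_list(n: int) -> List[int]:
    items = list(range(n))
    out = []
    while items:
        out.append(items.pop(0))  # O(n) per call: remaining elements are shifted
    return out


def drain_front_deque(n: int) -> List[int]:
    items = deque(range(n))
    out = []
    while items:
        out.append(items.popleft())  # O(1) per call
    return out


if __name__ == "__main__":
    queue = deque([1, 2, 3])
    queue.appendleft(0)  # O(1) insertion at the left end
    queue.append(4)      # O(1) insertion at the right end
    print(list(queue))   # [0, 1, 2, 3, 4]
    print(drain_front_list(5) == drain_front_deque(5))  # True
```

The results are identical; only the cost differs, and the gap grows with the number of elements.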

Searching a list is also expensive, because a membership test or index() scans the elements one by one. When you need to look up elements in a list frequently, or access them in sorted order, you can use the bisect module to keep the list sorted and run a binary search on it, which makes lookups much more efficient.
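As a rough sketch (build_sorted and contains_sorted are made-up helper names), bisect.insort keeps a list sorted as elements arrive, and bisect.bisect_left then finds a position by binary search:

```python
import bisect
from typing import List


def build_sorted(values: List[int]) -> List[int]:
    """Insert values one by one, keeping the list sorted at every step."""
    result: List[int] = []
    for value in values:
        bisect.insort(result, value)
    return result


def contains_sorted(sorted_items: List[int], target: int) -> bool:
    """Binary search: O(log n) comparisons instead of a linear scan."""
    index = bisect.bisect_left(sorted_items, target)
    return index < len(sorted_items) and sorted_items[index] == target


if __name__ == "__main__":
    scores = build_sorted([42, 7, 19, 88, 7])
    print(scores)                       # [7, 7, 19, 42, 88]
    print(contains_sorted(scores, 19))  # True
    print(contains_sorted(scores, 20))  # False
```

This only works while the list stays sorted, so every insertion has to go through insort rather than append.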

Another common need is finding the minimum or maximum value. Here the heapq module can turn a list into a heap, so that reading the minimum takes O(1) time. heapq implements a min-heap; to track the maximum instead, a common approach is to store negated values.
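A short sketch of the idea, where smallest_first is an illustrative helper name: after heapq.heapify the smallest element is always at index 0, and heapq.heappop removes it while keeping the heap property.

```python
import heapq
from typing import List, Tuple


def smallest_first(values: List[int]) -> Tuple[int, List[int]]:
    """Return the minimum (read in O(1)) and all values in ascending order."""
    heap = list(values)   # copy so the caller's list is left untouched
    heapq.heapify(heap)   # O(n) rearrangement into a min-heap
    smallest = heap[0]    # the minimum is always the first element
    ordered = [heapq.heappop(heap) for _ in range(len(heap))]
    return smallest, ordered


if __name__ == "__main__":
    print(smallest_first([5, 1, 9, 3]))       # (1, [1, 3, 5, 9])
    print(heapq.nlargest(2, [5, 1, 9, 3]))    # [9, 5]
```

For occasional "largest few" queries, heapq.nlargest is a convenient alternative to maintaining a heap of negated values.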

Origin blog.csdn.net/weixin_41692221/article/details/131474634