12 ways to speed up Python loops, with speedups of up to 970x

In this article, I will introduce some simple methods that can make Python for loops 1.3x to 970x faster.

Python's standard library includes the timeit module. We will use it in the following sections to measure the baseline and improved performance of each loop.

For each method, a baseline was established by running the function under test 100K times (loops) over 10 test runs, then averaging the time per loop (in nanoseconds, ns).

A few simple methods

1. List comprehension

 # Baseline version (Inefficient way)
 # Calculating the power of numbers
 # Without using List Comprehension
 def test_01_v0(numbers):
   output = []
   for n in numbers:
       output.append(n ** 2.5)
   return output
 
 # Improved version
 # (Using List Comprehension)
 def test_01_v1(numbers):
   output = [n ** 2.5 for n in numbers]
   return output

The result is as follows:

 # Summary Of Test Results
      Baseline: 32.158 ns per loop
      Improved: 16.040 ns per loop
 % Improvement: 50.1 %
       Speedup: 2.00x

As you can see, using a list comprehension roughly doubles the speed.

2. Calculate the length externally

If you need to rely on the length of the list for iteration, do the calculation outside the for loop.

 # Baseline version (Inefficient way)
 # (Length calculation inside for loop)
 def test_02_v0(numbers):
   output_list = []
   for i in range(len(numbers)):
     output_list.append(i * 2)
   return output_list
 
 # Improved version
 # (Length calculation outside for loop)
 def test_02_v1(numbers):
   my_list_length = len(numbers)
   output_list = []
   for i in range(my_list_length):
     output_list.append(i * 2)
   return output_list

Moving the list-length calculation out of the for loop yields a 1.64x speedup. This trick is easy to overlook.

 # Summary Of Test Results
      Baseline: 112.135 ns per loop
      Improved:  68.304 ns per loop
 % Improvement: 39.1 %
       Speedup: 1.64x

3. Use Set

Use sets when a for loop is used for membership comparisons.

 # Use for loops for nested lookups
 def test_03_v0(list_1, list_2):
   # Baseline version (Inefficient way)
   # (nested lookups using for loop)
   common_items = []
   for item in list_1:
       if item in list_2:
           common_items.append(item)
   return common_items
 
 def test_03_v1(list_1, list_2):
   # Improved version
   # (sets to replace nested lookups)
   s_1 = set(list_1)
   s_2 = set(list_2)
   common_items = s_1.intersection(s_2)
   return common_items

Using sets instead of nested for loops for the comparison yields a 498x speedup:

 # Summary Of Test Results
      Baseline: 9047.078 ns per loop
      Improved:   18.161 ns per loop
 % Improvement: 99.8 %
       Speedup: 498.17x
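As a side note, the same intersection can be written with the `&` operator on sets, which is equivalent to `intersection()` (the result's order is not preserved, since sets are unordered; the function name below is illustrative, not from the original):

```python
def common_items_with_sets(list_1, list_2):
    # Set intersection via the & operator;
    # equivalent to set(list_1).intersection(list_2)
    return set(list_1) & set(list_2)

print(sorted(common_items_with_sets([1, 2, 3, 4], [3, 4, 5])))  # [3, 4]
```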

4. Skip irrelevant iterations

Avoid redundant calculations by skipping iterations that cannot affect the result.

 # Example of inefficient code used to find 
 # the first even square in a list of numbers
 def function_do_something(numbers):
   for n in numbers:
     square = n * n
     if square % 2 == 0:
         return square
 
   return None  # No even square found
 
 # Example of improved code that 
 # finds result without redundant computations
 def function_do_something_v1(numbers):
   even_numbers = [n for n in numbers if n % 2 == 0]
   for n in even_numbers:
     square = n * n
     return square
 
   return None  # No even square found

This method requires some thought when designing the body of the for loop, and the improvement will vary with the actual workload:

 # Summary Of Test Results
      Baseline: 16.912 ns per loop
      Improved:  8.697 ns per loop
 % Improvement: 48.6 %
       Speedup: 1.94x
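A variant not covered by the original tests avoids even the intermediate list of even numbers: `next()` with a generator expression stops at the first match. Since the square of the first even number is the first even square, this computes the same result (the function name is illustrative):

```python
def first_even_square(numbers):
    # The generator expression is evaluated lazily and
    # stops as soon as the first even number is found.
    return next((n * n for n in numbers if n % 2 == 0), None)

print(first_even_square([1, 3, 4, 5]))  # 16
print(first_even_square([1, 3, 5]))     # None
```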

5. Code merge

In some cases, incorporating the code of a simple function directly into a loop can improve code compactness and execution speed.

 # Example of inefficient code
 # Loop that calls the is_prime function n times.
 def is_prime(n):
   if n <= 1:
     return False
   for i in range(2, int(n ** 0.5) + 1):
     if n % i == 0:
       return False
 
   return True
 
 def test_05_v0(n):
   # Baseline version (Inefficient way)
   # (calls the is_prime function n times)
   count = 0
   for i in range(2, n + 1):
     if is_prime(i):
       count += 1
   return count
 
 def test_05_v1(n):
   # Improved version
   # (inlines the logic of the is_prime function)
   count = 0
   for i in range(2, n + 1):
     if i <= 1:
       continue
     for j in range(2, int(i ** 0.5) + 1):
       if i % j == 0:
         break
     else:
       count += 1
   return count

This alone yields a 1.35x speedup:

 # Summary Of Test Results
      Baseline: 1271.188 ns per loop
      Improved:  939.603 ns per loop
 % Improvement: 26.1 %
       Speedup: 1.35x

Why is this?

Calling functions involves overhead such as pushing and popping variables on the stack, function lookups, and argument passing. When a simple function is called repeatedly in a loop, the overhead of the function call increases and affects performance. So inlining the function's code directly into the loop eliminates this overhead, potentially improving speed significantly.
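The overhead can be observed directly with `timeit`; the sketch below (timings vary by machine, and the function and data here are illustrative) compares calling a trivial function per element with inlining the same expression:

```python
import timeit

def square(x):
    return x * x

def with_call(numbers):
    # One Python function call per element
    return [square(n) for n in numbers]

def inlined(numbers):
    # Same computation, no per-element call overhead
    return [n * n for n in numbers]

data = list(range(1000))
t_call = timeit.timeit(lambda: with_call(data), number=1000)
t_inline = timeit.timeit(lambda: inlined(data), number=1000)
print(f"call: {t_call:.4f}s  inline: {t_inline:.4f}s")
```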

⚠️ Note, however, that balancing code readability against the frequency of function calls is a trade-off to consider.

Some tips

6. Avoid duplication

Avoid repeated calculations; some of them may be redundant and slow down your code. Precompute values where applicable instead.

 def test_07_v0(n):
   # Example of inefficient code
   # Repetitive calculation within nested loop
   result = 0
   for i in range(n):
     for j in range(n):
       result += i * j
   return result
 
 def test_07_v1(n):
   # Example of improved code
   # Utilize precomputed values to help speedup
   pv = [[i * j for j in range(n)] for i in range(n)]
   result = 0
   for i in range(n):
     result += sum(pv[i])
   return result

The results are as follows

 # Summary Of Test Results
      Baseline: 139.146 ns per loop
      Improved:  92.325 ns per loop
 % Improvement: 33.6 %
       Speedup: 1.51x

7. Use Generators

Generators support lazy evaluation: the expression inside is only evaluated when you request the next value. Processing data on demand reduces memory usage and can improve performance, especially on large data sets.

 def test_08_v0(n):
   # Baseline version (Inefficient way)
   # (Inefficiently calculates the nth Fibonacci
   # number using a list)
   if n <= 1:
     return n
   f_list = [0, 1]
   for i in range(2, n + 1):
     f_list.append(f_list[i - 1] + f_list[i - 2])
   return f_list[n]
 
 def test_08_v1(n):
   # Improved version
   # (Efficiently calculates the nth Fibonacci
   # number using a generator)
   a, b = 0, 1
   for _ in range(n):
     yield a
     a, b = b, a + b

You can see the improvement is obvious:

 # Summary Of Test Results
      Baseline: 0.083 ns per loop
      Improved: 0.004 ns per loop
 % Improvement: 95.5 %
       Speedup: 22.06x
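One caveat worth stating explicitly: creating a generator does no work until it is consumed, so part of the measured speedup comes from laziness. To actually get the nth Fibonacci number out of a generator like test_08_v1, it has to be drained; a self-contained sketch (`fibonacci_gen` is an illustrative name mirroring test_08_v1, adjusted to yield n+1 values so the last one is the nth number):

```python
def fibonacci_gen(n):
    # Lazily yields the first n+1 Fibonacci numbers.
    a, b = 0, 1
    for _ in range(n + 1):
        yield a
        a, b = b, a + b

# Nothing is computed until the generator is consumed:
gen = fibonacci_gen(10)
fibs = list(gen)   # the work happens here
print(fibs[-1])    # 55, the 10th Fibonacci number
```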

8. map() function

Use Python's built-in map() function. It allows processing and transforming all items in an iterable object without using an explicit for loop.

 def some_function_X(x):
   # This would normally be a function containing application logic
   # which required it to be made into a separate function
   # (for the purpose of this test, just calculate and return the square)
   return x ** 2
 
 def test_09_v0(numbers):
   # Baseline version (Inefficient way)
   output = []
   for i in numbers:
     output.append(some_function_X(i))
 
   return output
 
 def test_09_v1(numbers):
   # Improved version
   # (Using Python's built-in map() function)
   output = map(some_function_X, numbers)
   return output

Using Python's built-in map() function instead of an explicit for loop speeds up 970x.

 # Summary Of Test Results
      Baseline: 4.402 ns per loop
      Improved: 0.005 ns per loop
 % Improvement: 99.9 %
       Speedup: 970.69x

Why is this?

The map() function is implemented in C and highly optimized, so its implicit inner loop is much more efficient than an explicit Python for loop. Hence the speedup — or, you could say, plain Python loops are just that slow.
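Also note that map() returns a lazy iterator, so the improved test above measures only the creation of the map object, not the per-item work; to materialize the results you still need to consume it:

```python
def square(x):
    return x ** 2

numbers = [1, 2, 3, 4]
lazy = map(square, numbers)   # no squares computed yet
result = list(lazy)           # computation happens here
print(result)  # [1, 4, 9, 16]
```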

9. Use Memoization

The idea of memoization is to cache (or "memoize") the results of expensive function calls and return the cached result when the same input occurs again. This reduces redundant computation and speeds up programs.

First is the inefficient version.

 # Example of inefficient code
 def fibonacci(n):
   if n == 0:
     return 0
   elif n == 1:
     return 1
   return fibonacci(n - 1) + fibonacci(n - 2)
 
 def test_10_v0(numbers):
   output = []
   for i in numbers:
     output.append(fibonacci(i))
 
   return output

Then we use the lru_cache decorator from Python's built-in functools module.

 # Example of efficient code
 # Using Python's functools' lru_cache function
 import functools
 
 @functools.lru_cache()
 def fibonacci_v2(n):
   if n == 0:
     return 0
   elif n == 1:
     return 1
   return fibonacci_v2(n - 1) + fibonacci_v2(n - 2)
 
 def test_10_v1(numbers):
   output = []
   for i in numbers:
     output.append(fibonacci_v2(i))
 
   return output

The result is as follows:

 # Summary Of Test Results
      Baseline: 63.664 ns per loop
      Improved:  1.104 ns per loop
 % Improvement: 98.3 %
       Speedup: 57.69x

Memoization via functools' lru_cache yields a 57x speedup.

How is the lru_cache function implemented?

"LRU" stands for "Least Recently Used". lru_cache is a decorator that enables memoization for the function it is applied to: it stores the results of recent function calls in a cache so that when the same input appears again, the cached result is returned, saving computation time. The decorator accepts an optional maxsize parameter, which determines the maximum size of the cache (i.e., how many distinct input values it stores results for). If maxsize is set to None, the LRU eviction feature is disabled and the cache can grow without bound, which can consume a lot of memory. This is the simplest form of optimization: trading space for time.
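The cache behavior can be inspected with cache_info(); a short sketch with an explicit maxsize (`fib` is an illustrative name, not the article's function):

```python
import functools

@functools.lru_cache(maxsize=128)
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040
info = fib.cache_info()
print(info.hits, info.misses)  # cache statistics
fib.cache_clear()  # reset the cache when needed
```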

10. Vectorization

 import numpy as np
 
 def test_11_v0(n):
   # Baseline version
   # (Inefficient way of summing numbers in a range)
   output = 0
   for i in range(0, n):
     output = output + i
 
   return output
 
 def test_11_v1(n):
   # Improved version
   # (Efficient way of summing numbers in a range)
   output = np.sum(np.arange(n))
   return output

Vectorization is widely used in numpy and pandas, the data-processing libraries of machine learning.

 # Summary Of Test Results
      Baseline: 32.936 ns per loop
      Improved:  1.171 ns per loop
 % Improvement: 96.4 %
       Speedup: 28.13x
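As a sanity check, the vectorized sum agrees with the closed-form formula n(n-1)/2 for the sum of 0..n-1 (n = 1000 here is an arbitrary choice):

```python
import numpy as np

n = 1000
vectorized = np.sum(np.arange(n))   # single C-level operation
closed_form = n * (n - 1) // 2      # sum of 0..n-1
print(vectorized, closed_form)  # both 499500
```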

11. Avoid creating intermediate lists

Use itertools' filterfalse to avoid creating intermediate lists, which helps reduce memory usage.

 def test_12_v0(numbers):
   # Baseline version (Inefficient way)
   filtered_data = []
   for i in numbers:
     filtered_data.extend(list(
         filter(lambda x: x % 5 == 0,
                 range(1, i ** 2))))
 
   return filtered_data

An improved version of the same functionality is implemented using Python's built-in itertools' filterfalse function.

 from itertools import filterfalse
 
 def test_12_v1(numbers):
   # Improved version
   # (using filterfalse)
   filtered_data = []
   for i in numbers:
     filtered_data.extend(list(
         filterfalse(lambda x: x % 5 != 0,
                     range(1, i ** 2))))
 
   return filtered_data

Depending on the use case, this approach may not always increase execution speed significantly, but it reduces memory usage by avoiding the creation of intermediate lists. Here we got a 131x improvement:

 # Summary Of Test Results
      Baseline: 333167.790 ns per loop
      Improved: 2541.850 ns per loop
 % Improvement: 99.2 %
       Speedup: 131.07x

12. Efficient string concatenation

String concatenation using the + (or +=) operator is slow and consumes more memory. Use join instead.

 def test_13_v0(l_strings):
   # Baseline version (Inefficient way)
   # (concatenation using the += operator)
   output = ""
   for a_str in l_strings:
     output += a_str
 
   return output
 
 def test_13_v1(l_strings):
   # Improved version
   # (using join)
   output_list = []
   for a_str in l_strings:
     output_list.append(a_str)
 
   return "".join(output_list)

The test needed a simple way to generate a larger list of strings, so a simple helper function was written to generate the list of strings needed to run the test.

 from faker import Faker
 
 def generate_fake_names(count: int = 10000):
   # Helper function used to generate a 
   # large-ish list of names
   fake = Faker()
   output_list = []
   for _ in range(count):
     output_list.append(fake.name())
 
   return output_list
 
 l_strings = generate_fake_names(count=50000)

The result is as follows:

 # Summary Of Test Results
      Baseline: 32.423 ns per loop
      Improved: 21.051 ns per loop
 % Improvement: 35.1 %
       Speedup: 1.54x

Using join instead of the + operator speeds things up by about 1.5x. Why is join faster?

String concatenation with the + operator has O(n²) time complexity, while concatenation with the join function runs in O(n).
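The quadratic behavior comes from strings being immutable: each += copies the entire string accumulated so far, while join allocates the final string once. A small sketch (function names and input sizes are illustrative) confirming the two approaches produce identical output:

```python
def concat_plus(strings):
    # O(n^2): every += copies the accumulated string
    output = ""
    for s in strings:
        output += s
    return output

def concat_join(strings):
    # O(n): join computes total length and allocates once
    return "".join(strings)

parts = ["ab"] * 5
print(concat_plus(parts) == concat_join(parts))  # True
```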

Summarize

This article introduces some simple methods to improve the performance of Python for loops by 1.3 to 970x.

  • Using Python's built-in map() function instead of an explicit for loop speeds up 970x [Tip #8]
  • Using sets instead of nested for loops speeds up 498x [Tip #3]
  • Using itertools' filterfalse function speeds up 131x [Tip #11]
  • Memoization with the lru_cache function speeds up 57x [Tip #9]

https://avoid.overfit.cn/post/b01a152cfb824acc86f5118431201fe3

Author: Nirmalya Ghosh


Origin blog.csdn.net/m0_46510245/article/details/135334893