In this article, I will introduce some simple methods that can make Python for loops 1.3 to 970 times faster.
Python's standard library includes the timeit module, which we will use in the following sections to measure the baseline and improved performance of each loop.
For each method, we established a baseline by running a test that consisted of running the function under test 100K times (loops) over 10 test runs and then calculating the average time per loop (in nanoseconds, ns).
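The article does not show the timing harness it used; a rough sketch of one possible setup with timeit (the `measure` helper and its parameters are my own invention, not the article's actual code) could look like this:

```python
import timeit

def measure(func, *args, loops=100_000, runs=10):
    # Time `loops` calls of func(*args) over `runs` runs and
    # return the average time per loop in nanoseconds.
    timer = timeit.Timer(lambda: func(*args))
    results = timer.repeat(repeat=runs, number=loops)
    return (sum(results) / runs) / loops * 1e9

# Example: time a trivial no-op function with a smaller budget
avg_ns = measure(lambda: None, loops=10_000, runs=3)
```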
A few simple methods
1. List comprehension
# Baseline version (Inefficient way)
# Calculating the power of numbers
# Without using List Comprehension
def test_01_v0(numbers):
    output = []
    for n in numbers:
        output.append(n ** 2.5)
    return output

# Improved version
# (Using List Comprehension)
def test_01_v1(numbers):
    output = [n ** 2.5 for n in numbers]
    return output
The result is as follows:
# Summary Of Test Results
Baseline: 32.158 ns per loop
Improved: 16.040 ns per loop
% Improvement: 50.1 %
Speedup: 2.00x
You can see that using a list comprehension doubles the speed.
2. Calculate the length externally
If you need to rely on the length of the list for iteration, do the calculation outside the for loop.
# Baseline version (Inefficient way)
# (Length calculation inside for loop)
def test_02_v0(numbers):
    output_list = []
    for i in range(len(numbers)):
        output_list.append(i * 2)
    return output_list

# Improved version
# (Length calculation outside for loop)
def test_02_v1(numbers):
    my_list_length = len(numbers)
    output_list = []
    for i in range(my_list_length):
        output_list.append(i * 2)
    return output_list
Moving the list-length calculation outside the for loop gives a 1.64x speedup. Few people know this trick. (Note that range(len(numbers)) itself only evaluates len once; the gain matters most when the length is re-checked on every iteration, for example in a while-loop condition.)
# Summary Of Test Results
Baseline: 112.135 ns per loop
Improved: 68.304 ns per loop
% Improvement: 39.1 %
Speedup: 1.64x
3. Use Set
When comparing the items of two lists inside a for loop, use sets instead.
# Use for loops for nested lookups
def test_03_v0(list_1, list_2):
    # Baseline version (Inefficient way)
    # (nested lookups using for loop)
    common_items = []
    for item in list_1:
        if item in list_2:
            common_items.append(item)
    return common_items

def test_03_v1(list_1, list_2):
    # Improved version
    # (sets to replace nested lookups)
    s_1 = set(list_1)
    s_2 = set(list_2)
    common_items = s_1.intersection(s_2)
    return common_items
Using sets instead of nested for loop lookups gives a 498x speedup.
# Summary Of Test Results
Baseline: 9047.078 ns per loop
Improved: 18.161 ns per loop
% Improvement: 99.8 %
Speedup: 498.17x
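As a side note, the same intersection can be written more tersely with the & operator; be aware that converting to sets discards ordering and duplicates, which may matter for your use case:

```python
# Equivalent to s_1.intersection(s_2); note the result is an
# unordered set with duplicates removed.
list_1 = [1, 2, 3, 4, 5, 5]
list_2 = [4, 5, 6]
common = set(list_1) & set(list_2)
# common == {4, 5}
```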
4. Skip irrelevant iterations
Avoid redundant calculations, i.e. skip irrelevant iterations.
# Example of inefficient code used to find
# the first even square in a list of numbers
def function_do_something(numbers):
    for n in numbers:
        square = n * n
        if square % 2 == 0:
            return square
    return None  # No even square found

# Example of improved code that
# finds result without redundant computations
def function_do_something_v1(numbers):
    # A square is even exactly when n is even,
    # so filter first and square only the matches
    even_numbers = [n for n in numbers if n % 2 == 0]
    for n in even_numbers:
        square = n * n
        return square
    return None  # No even square found
This method requires some thought when designing the loop body; the actual improvement will vary from case to case:
# Summary Of Test Results
Baseline: 16.912 ns per loop
Improved: 8.697 ns per loop
% Improvement: 48.6 %
Speedup: 1.94x
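A more idiomatic variant of the same idea (my own sketch, not part of the article's benchmark) combines a generator expression with next(), which stops at the first match without building a list:

```python
def first_even_square(numbers):
    # Lazily compute squares of even numbers and stop at the
    # first one; the default None covers the no-match case.
    return next((n * n for n in numbers if n % 2 == 0), None)

# first_even_square([3, 5, 4, 7]) -> 16
```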
5. Code merge
In some cases, incorporating the code of a simple function directly into a loop can improve code compactness and execution speed.
# Example of inefficient code
# Loop that calls the is_prime function n times.
def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n ** 0.5) + 1):
        if n % i == 0:
            return False
    return True

def test_05_v0(n):
    # Baseline version (Inefficient way)
    # (calls the is_prime function n times)
    count = 0
    for i in range(2, n + 1):
        if is_prime(i):
            count += 1
    return count

def test_05_v1(n):
    # Improved version
    # (inlines the logic of the is_prime function)
    count = 0
    for i in range(2, n + 1):
        if i <= 1:
            continue
        for j in range(2, int(i ** 0.5) + 1):
            if i % j == 0:
                break
        else:
            count += 1
    return count
This also gives a 1.35x speedup:
# Summary Of Test Results
Baseline: 1271.188 ns per loop
Improved: 939.603 ns per loop
% Improvement: 26.1 %
Speedup: 1.35x
Why is this?
Calling functions involves overhead such as pushing and popping variables on the stack, function lookups, and argument passing. When a simple function is called repeatedly in a loop, the overhead of the function call increases and affects performance. So inlining the function's code directly into the loop eliminates this overhead, potentially improving speed significantly.
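To see this overhead in isolation, a toy micro-benchmark (my own illustration, with a hypothetical `add` helper, not part of the article's test suite) might look like:

```python
import timeit

def add(a, b):
    return a + b

def with_call(n):
    total = 0
    for i in range(n):
        total = add(total, i)   # one function call per iteration
    return total

def inlined(n):
    total = 0
    for i in range(n):
        total = total + i       # same work, no call overhead
    return total

t_call = timeit.timeit(lambda: with_call(10_000), number=100)
t_inline = timeit.timeit(lambda: inlined(10_000), number=100)
# t_inline is typically (though not always) smaller than t_call
```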
⚠️ Note, however, that you should weigh code readability against the frequency of the function call before inlining.
A few more tips
6. Avoid duplication
Avoid repeated calculations: some of them are redundant and slow down your code. Precompute values where applicable.
def test_07_v0(n):
    # Example of inefficient code
    # Repetitive calculation within nested loop
    result = 0
    for i in range(n):
        for j in range(n):
            result += i * j
    return result

def test_07_v1(n):
    # Example of improved code
    # Utilize precomputed values to help speedup
    pv = [[i * j for j in range(n)] for i in range(n)]
    result = 0
    for i in range(n):
        result += sum(pv[i])
    return result
The results are as follows
# Summary Of Test Results
Baseline: 139.146 ns per loop
Improved: 92.325 ns per loop
% Improvement: 33.6 %
Speedup: 1.51x
7. Use Generators
Generators support lazy evaluation: the expression inside is only evaluated when you request the next value. Processing data on demand like this reduces memory usage and can improve performance, especially on large data sets.
def test_08_v0(n):
    # Baseline version (Inefficient way)
    # (Calculates the nth Fibonacci number
    # by building a full list)
    if n <= 1:
        return n
    f_list = [0, 1]
    for i in range(2, n + 1):
        f_list.append(f_list[i - 1] + f_list[i - 2])
    return f_list[n]

def test_08_v1(n):
    # Improved version
    # (Lazily yields the first n Fibonacci
    # numbers using a generator)
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b
You can see the improvement is obvious:
# Summary Of Test Results
Baseline: 0.083 ns per loop
Improved: 0.004 ns per loop
% Improvement: 95.5 %
Speedup: 22.06x
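Note that test_08_v1 is a generator function, so calling it does no work by itself; the Fibonacci numbers are only produced as the generator is consumed. A small usage sketch (the generator is redefined here as `fib_gen` so the snippet is self-contained):

```python
def fib_gen(n):
    # Generator form of the improved version: lazily yields
    # the first n Fibonacci numbers (0, 1, 1, 2, ...).
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b

first_ten = list(fib_gen(10))
# first_ten == [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```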
8. map() function
Use Python's built-in map() function. It allows processing and transforming all items in an iterable object without using an explicit for loop.
def some_function_X(x):
    # This would normally be a function containing application logic
    # which required it to be made into a separate function
    # (for the purpose of this test, just calculate and return the square)
    return x ** 2

def test_09_v0(numbers):
    # Baseline version (Inefficient way)
    output = []
    for i in numbers:
        output.append(some_function_X(i))
    return output

def test_09_v1(numbers):
    # Improved version
    # (Using Python's built-in map() function)
    output = map(some_function_X, numbers)
    return output
Using Python's built-in map() function instead of an explicit for loop speeds up 970x.
# Summary Of Test Results
Baseline: 4.402 ns per loop
Improved: 0.005 ns per loop
% Improvement: 99.9 %
Speedup: 970.69x
Why is this?
The map() function is written in C and highly optimized: its implicit inner loop is much more efficient than an explicit Python for loop. Hence the speedup; or you could say that plain Python loops are just that slow, ha.
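One caveat worth adding: map() returns a lazy iterator, so the improved version above defers the actual computation, and part of the measured 970x comes from that laziness. To force the work, consume the iterator, for example with list():

```python
def square(x):
    return x ** 2

lazy = map(square, [1, 2, 3])
# No squares have been computed yet; map() only builds an iterator.
materialized = list(lazy)
# materialized == [1, 4, 9]
```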
9. Use Memoization
The idea of memoization is to cache (or "memoize") the results of expensive function calls and return the cached result when the same input occurs again. This reduces redundant calculations and speeds up programs.
First is the inefficient version.
# Example of inefficient code
def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)

def test_10_v0(numbers):
    output = []
    for i in numbers:
        output.append(fibonacci(i))
    return output
Then we use the lru_cache function of Python's built-in functools.
# Example of efficient code
# Using Python's functools' lru_cache function
import functools

@functools.lru_cache()
def fibonacci_v2(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fibonacci_v2(n - 1) + fibonacci_v2(n - 2)

def test_10_v1(numbers):
    output = []
    for i in numbers:
        output.append(fibonacci_v2(i))
    return output
The result is as follows:
# Summary Of Test Results
Baseline: 63.664 ns per loop
Improved: 1.104 ns per loop
% Improvement: 98.3 %
Speedup: 57.69x
Using the lru_cache function from Python's built-in functools module applies memoization for a 57x speedup.
How is the lru_cache function implemented?
"LRU" stands for "Least Recently Used". lru_cache is a decorator that can be applied to functions to enable memoization: it stores the results of recent function calls in a cache so that when the same input appears again, the cached result can be returned instead of recomputed, saving computation time. When applied as a decorator, lru_cache accepts an optional maxsize parameter that determines the maximum size of the cache (i.e., how many distinct input values it stores results for). If maxsize is set to None, LRU eviction is disabled and the cache can grow without bound, which can consume a lot of memory. This is the simplest form of trading space for time.
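A minimal sketch of lru_cache in action; the maxsize argument and the cache_info() introspection method used here are standard parts of functools:

```python
import functools

@functools.lru_cache(maxsize=128)
def fib(n):
    # Cached recursive Fibonacci; each distinct n is computed once.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

result = fib(30)
info = fib.cache_info()  # named tuple: hits, misses, maxsize, currsize
# result == 832040; the cache now holds entries for n = 0..30
```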
10. Vectorization
import numpy as np

def test_11_v0(n):
    # Baseline version
    # (Inefficient way of summing numbers in a range)
    output = 0
    for i in range(0, n):
        output = output + i
    return output

def test_11_v1(n):
    # Improved version
    # (Efficient way of summing numbers in a range)
    output = np.sum(np.arange(n))
    return output
Vectorization is used heavily in NumPy and pandas, the data-processing libraries of the machine learning ecosystem.
# Summary Of Test Results
Baseline: 32.936 ns per loop
Improved: 1.171 ns per loop
% Improvement: 96.4 %
Speedup: 28.13x
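Vectorization is not limited to sums; element-wise arithmetic on whole arrays also replaces explicit loops. A small illustrative sketch (my own example, not from the article's benchmark):

```python
import numpy as np

numbers = np.arange(1, 6)   # array([1, 2, 3, 4, 5])
squares = numbers ** 2      # vectorized: one C-level loop, no Python for
total = int(np.sum(squares))
# squares -> array([ 1,  4,  9, 16, 25]); total == 55
```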
11. Avoid creating intermediate lists
Use itertools' filterfalse to avoid creating intermediate lists, which helps reduce memory usage.
def test_12_v0(numbers):
    # Baseline version (Inefficient way)
    filtered_data = []
    for i in numbers:
        filtered_data.extend(list(
            filter(lambda x: x % 5 == 0,
                   range(1, i ** 2))))
    return filtered_data
An improved version of the same functionality is implemented using Python's built-in itertools' filterfalse function.
from itertools import filterfalse

def test_12_v1(numbers):
    # Improved version
    # (using filterfalse)
    filtered_data = []
    for i in numbers:
        filtered_data.extend(list(
            filterfalse(lambda x: x % 5 != 0,
                        range(1, i ** 2))))
    return filtered_data
Depending on the use case, this approach may not always increase execution speed significantly, but it can reduce memory usage by avoiding intermediate lists. Here we got a 131x improvement.
# Summary Of Test Results
Baseline: 333167.790 ns per loop
Improved: 2541.850 ns per loop
% Improvement: 99.2 %
Speedup: 131.07x
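If memory is the real concern, a generator can avoid building any intermediate list at all. This variant (my own, not benchmarked in the article) also steps directly through the multiples of 5 instead of testing every value:

```python
def multiples_of_5(numbers):
    # Lazily yield every multiple of 5 in range(1, i**2) for each i,
    # without materializing any intermediate lists.
    for i in numbers:
        for x in range(5, i ** 2, 5):
            yield x

result = list(multiples_of_5([4]))
# multiples of 5 below 16: [5, 10, 15]
```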
12. Efficient string concatenation
Repeated string concatenation with the + (or +=) operator is slow and consumes extra memory. Use join instead.
def test_13_v0(l_strings):
    # Baseline version (Inefficient way)
    # (concatenation using the += operator)
    output = ""
    for a_str in l_strings:
        output += a_str
    return output

def test_13_v1(l_strings):
    # Improved version
    # (using join)
    output_list = []
    for a_str in l_strings:
        output_list.append(a_str)
    return "".join(output_list)
The test needed a simple way to generate a larger list of strings, so a simple helper function was written to generate the list of strings needed to run the test.
from faker import Faker

def generate_fake_names(count: int = 10000):
    # Helper function used to generate a
    # large-ish list of names
    fake = Faker()
    output_list = []
    for _ in range(count):
        output_list.append(fake.name())
    return output_list

l_strings = generate_fake_names(count=50000)
The result is as follows:
# Summary Of Test Results
Baseline: 32.423 ns per loop
Improved: 21.051 ns per loop
% Improvement: 35.1 %
Speedup: 1.54x
Using join instead of the + operator speeds things up by 1.5x. Why is join faster?
The time complexity of the string concatenation operation using the + operator is O(n²), while the time complexity of the string concatenation operation using the join function is O(n).
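A quick sanity check of the equivalence of the two approaches (timing aside):

```python
parts = ["ab", "cd", "ef"]

# Worst-case O(n^2): each += may copy the whole string built so far
concat = ""
for p in parts:
    concat += p

# O(n): join computes the total length once and copies each part once
joined = "".join(parts)
# concat == joined == "abcdef"
```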
Summary
This article introduces some simple methods to improve the performance of Python for loops by 1.3 to 970x.
- Using Python’s built-in map() function instead of an explicit for loop speeds up 970x
- Use set instead of nested for loops to speed up 498x [Tip #3]
- Using itertools' filterfalse function speeds up 131x
- Speed up 57x with Memoization using lru_cache function
https://avoid.overfit.cn/post/b01a152cfb824acc86f5118431201fe3
Author: Nirmalya Ghosh