Python itself is a slow language, so for numerical work the usual way to speed things up is to change how you write the code. Often that means relying on existing scientific computing libraries such as NumPy and SciPy. But what if you want to implement a new algorithm without writing an extension in a lower-level language like Cython or Rust?
For certain kinds of computation, especially loops over arrays, Numba can significantly speed up your code. Sometimes you need to adjust the original code, and sometimes no changes are needed at all. When it does work, the speedup is very noticeable.
This article covers the following:
Why NumPy alone is sometimes not enough
Basic usage of Numba
How Numba works at a high level, and how that affects your code
When NumPy can't help
Suppose you want to make a very large array monotonically increasing, that is, replace each element with the largest value seen so far, so that values never decrease from left to right. For example:
[1, 2, 1, 3, 3, 5, 4, 6] → [1, 2, 2, 3, 3, 5, 5, 6]
Here's a simple in-place conversion:
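def monotonically_increasing(a):
    # Running maximum; starting at 0 assumes the values are non-negative.
    max_value = 0
    for i in range(len(a)):
        if a[i] > max_value:
            max_value = a[i]
        # Overwrite the element in place with the largest value seen so far.
        a[i] = max_value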
NumPy is fast because it can do all of its calculations without calling back into the Python interpreter. A Python for loop like the one above goes through the interpreter for every element, so we lose NumPy's performance advantage.
Running the function above on a NumPy array with 10 million elements takes 2.5 seconds on my computer. Can we make it faster?
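To reproduce a measurement like this yourself, a minimal timing sketch with `time.perf_counter` could look as follows. The array size (1 million rather than 10 million), the random input, and the seed are assumptions made for illustration, so the absolute numbers will differ from the ones quoted here.

```python
import time

import numpy as np


def monotonically_increasing(a):
    # Same pure-Python loop as in the article.
    max_value = 0
    for i in range(len(a)):
        if a[i] > max_value:
            max_value = a[i]
        a[i] = max_value


# Assumption: 1 million random non-negative integers (the article used 10 million).
rng = np.random.default_rng(0)
a = rng.integers(0, 1_000_000, size=1_000_000)

start = time.perf_counter()
monotonically_increasing(a)
elapsed = time.perf_counter() - start

print(f"Pure-Python loop over {len(a):,} elements: {elapsed:.2f} s")
```

Timings depend heavily on the machine and the Python version, so treat the output as a rough indication only.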
Speed up with Numba
Numba is a just-in-time compiler for Python, designed specifically for code that loops over NumPy arrays, which is exactly what we need. Let's add two lines of code to the original function:
from numba import njit

@njit
def monotonically_increasing(a):
    max_value = 0
    for i in range(len(a)):
        if a[i] > max_value:
            max_value = a[i]
        a[i] = max_value
Run it again and it takes only 0.19 seconds. For a change that reuses the old code's logic completely, that is not bad.
In fact, NumPy already has a function for exactly this case, although using it means rewriting the original function: `numpy.maximum.accumulate` [1]. With it, the runtime drops to 0.03 seconds.
Implementation | Runtime
---|---
Python for loop | 2560 ms
Numba for loop | 190 ms
np.maximum.accumulate | 30 ms
Introduction to Numba
For common computational problems, you can often find a ready-made function in NumPy or SciPy that solves them quickly, as `numpy.maximum.accumulate` did just now. But what if no such function exists? To speed up the code in that case, you could implement an extension in a lower-level language [2], but that means switching programming languages, which complicates building the module and the overall system.
With Numba you can:
Run the same code both as ordinary Python and as a faster compiled version
Iterate on algorithms quickly and easily
Numba first parses the code, then compiles it just in time based on the types of the inputs. For example, the machine code compiled for a u64 array input differs from the code compiled for a float array input.
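The per-type compilation can be observed with a sketch like the one below. The function `running_max` is a hypothetical helper written for this illustration, and it uses `int64` and `float64` arrays as the two input types. The `try`/`except` fallback is also an addition of mine, so that the snippet still runs as plain Python where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python so the sketch still runs without Numba.
    def njit(func):
        return func


@njit
def running_max(a):
    # Return a new array holding the largest value seen so far at each index.
    out = a.copy()
    for i in range(1, len(out)):
        if out[i] < out[i - 1]:
            out[i] = out[i - 1]
    return out


ints = running_max(np.array([1, 2, 1, 3], dtype=np.int64))
floats = running_max(np.array([1.5, 0.5, 2.0], dtype=np.float64))

# A compiled Numba function records one signature per input type it has seen;
# the plain-Python fallback has no such attribute, hence the getattr default.
signatures = getattr(running_max, "signatures", [])
print(ints, floats)
print(f"{len(signatures)} compiled specialization(s)")
for sig in signatures:
    print(sig)
```

With Numba installed, this should list two specializations, one for each array type. With the fallback it lists none, since nothing was compiled.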
Numba also works beyond the CPU: for example, you can run code on the GPU [3]. The monotonic-array example earlier is only a minimal use of Numba; the official documentation [4] describes many more features.
Some shortcomings of Numba
It takes time to compile the code once
When a Numba-decorated function is called for the first time, it takes a certain amount of time to generate the corresponding machine code. For example, we can use the IPython %time command to measure how long a Numba-decorated function takes to run:
In [1]: from numba import njit
In [2]: @njit
   ...: def add(a, b): return a + b
In [3]: %time add(1, 2)
CPU times: user 320 ms, sys: 117 ms, total: 437 ms
Wall time: 207 ms
In [4]: %time add(1, 2)
CPU times: user 17 µs, sys: 0 ns, total: 17 µs
Wall time: 24.3 µs
In [5]: %time add(1, 2)
CPU times: user 8 µs, sys: 2 µs, total: 10 µs
Wall time: 13.6 µs
As you can see, the first call is very slow (note that its time is in milliseconds rather than microseconds) because Numba has to compile the function and generate machine code. Subsequent calls are much faster. The compilation cost is paid again whenever the types of the inputs change, for example when we pass floating point numbers instead:
In [8]: %time add(1.5, 2.5)
CPU times: user 40.3 ms, sys: 1.14 ms, total: 41.5 ms
Wall time: 41 ms
In [9]: %time add(1.5, 2.5)
CPU times: user 16 µs, sys: 3 µs, total: 19 µs
Wall time: 26 µs
Of course, you would not use Numba just to add two numbers. The example is used here because it makes the compilation cost easy to see.
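Outside IPython, the same measurement can be sketched with `time.perf_counter`. The helper `time_call` is a hypothetical name introduced for this illustration. The `try`/`except` fallback is an assumption added so the snippet also runs without Numba, in which case all four timings will look alike because nothing is compiled.

```python
import time

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python so the sketch still runs without Numba.
    def njit(func):
        return func


@njit
def add(a, b):
    return a + b


def time_call(*args):
    # Return (result, elapsed seconds) for a single call to add().
    start = time.perf_counter()
    result = add(*args)
    return result, time.perf_counter() - start


for label, args in [
    ("int, first call", (1, 2)),
    ("int, second call", (1, 2)),
    ("float, first call", (1.5, 2.5)),
    ("float, second call", (1.5, 2.5)),
]:
    result, elapsed = time_call(*args)
    print(f"{label:>18}: result={result}, {elapsed * 1e6:.1f} µs")
```

With Numba installed, the first call for each input type should stand out as much slower than the second, mirroring the IPython session above.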
Differences from Python and NumPy
Numba implements a subset of Python and also a subset of the NumPy API, which can lead to some problems:
Some Python and NumPy features are not supported.
Because Numba reimplements the NumPy API, its behavior can diverge from NumPy's:
Performance may differ, because the two use different algorithms.
Results may be inconsistent, because of bugs.
When Numba fails to compile code, its error messages can be hard to understand.
Numba compared to other options
Using only NumPy and SciPy: your Python code runs at the speed of a compiled language, but this does not work for every computation that needs a loop
Writing code directly in a lower-level language: you can optimize everything, but you have to leave Python for another language
Using Numba: Python loops can be made fast, but you are limited to the subset of Python and of the NumPy API that Numba supports
Epilogue
The great thing about Numba is how easy it is to try. So whenever you have a slow for loop that does some math, try using Numba: with any luck, it can significantly speed up your code in just two lines of code.