An amazing tool! Two lines of code, 13 times faster! Make Python fly!

Python itself is a slow language, so for heavy computation the usual way to speed things up is to rely on existing scientific computing libraries such as Numpy and Scipy. But what if you want to implement a new algorithm and don't want to write an extension in a low-level language like Cython or Rust?

For some kinds of computation, especially on arrays, Numba can speed up your code significantly. Sometimes you need to adjust the original code a little, and sometimes no changes are needed at all. When it works, the speedup is dramatic.


In this article, we talk about the following:

  • Why Numpy alone is sometimes not enough

  • Basic usage of Numba

  • How Numba works at a high level, and how that affects your code

Numpy's "Can't Help" Moment

Suppose you want to turn a very large array into a monotonically increasing one: each element is replaced by the largest value seen so far, so the values never decrease. For example:

[1, 2, 1, 3, 3, 5, 4, 6] → [1, 2, 2, 3, 3, 5, 5, 6]

Here's a simple in-place conversion:

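def monotonically_increasing(a):
    # Replace each element with the largest value seen so far, in place.
    # max_value starts at 0, so this assumes the values are non-negative.
    max_value = 0
    for i in range(len(a)):
        if a[i] > max_value:
            max_value = a[i]
        a[i] = max_value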

Numpy is fast because it can do all of its computation without going through the Python interpreter. But in the loop above, every iteration runs in the Python interpreter, so we lose Numpy's performance advantage.

On my computer, running this function on a Numpy array with 10 million elements takes 2.5 seconds. Can we make it faster?

Speed up with Numba

Numba is a just-in-time compiler for Python, designed specifically for loops over Numpy arrays. That is exactly what we need here. Let's add two lines of code to the original function:

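from numba import njit

@njit  # compile this function to machine code on its first call
def monotonically_increasing(a):
    # Same loop as before; max_value starts at 0, so values are assumed non-negative
    max_value = 0
    for i in range(len(a)):
        if a[i] > max_value:
            max_value = a[i]
        a[i] = max_value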

Running it again now takes only 0.19 seconds. Not bad, considering we reused the logic of the old code unchanged.

In fact, Numpy also has a function built for exactly this case, although using it means rewriting the function instead of reusing its loop: `numpy.maximum.accumulate` [1]. With it, the runtime drops to 0.03 seconds.
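For comparison, here is a minimal sketch of the Numpy route, assuming an integer or float array. The wrapper name monotonically_increasing_np is hypothetical. By default np.maximum.accumulate returns a new array; passing out=a writes the running maximum back into the input, so it behaves like the in-place loop. Since it starts from the first element rather than 0, it also handles negative values.

```python
import numpy as np


def monotonically_increasing_np(a):
    """Hypothetical wrapper: replace each element of `a` with the running maximum, in place."""
    # np.maximum.accumulate computes out[i] = max(a[0], ..., a[i]);
    # out=a writes the result back into the input instead of a new array.
    np.maximum.accumulate(a, out=a)
    return a


a = np.array([1, 2, 1, 3, 3, 5, 4, 6])
monotonically_increasing_np(a)
print(a)  # [1 2 2 3 3 5 5 6]
```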


Implementation            Runtime
Python for loop           2560 ms
Numba for loop            190 ms
np.maximum.accumulate     30 ms
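These numbers come from the author's machine. A rough harness for comparing the three approaches on your own hardware might look like the sketch below. The names benchmark, monotonically_increasing_numba and monotonically_increasing_numpy are hypothetical; the fallback that leaves the loop uncompiled when Numba is missing is an assumption added for portability; and the loop starts from the first element rather than 0 so negative values would also work. A warm-up call keeps Numba's one-time compilation out of the measurement.

```python
import time

import numpy as np

try:
    from numba import njit
except ImportError:
    # Hypothetical fallback: without Numba, the "Numba" entry runs as plain Python
    def njit(func):
        return func


def monotonically_increasing(a):
    # Plain Python loop: running maximum written back in place
    if len(a) == 0:
        return
    max_value = a[0]
    for i in range(len(a)):
        if a[i] > max_value:
            max_value = a[i]
        a[i] = max_value


# The same source, compiled by Numba
monotonically_increasing_numba = njit(monotonically_increasing)


def monotonically_increasing_numpy(a):
    # Vectorized running maximum, written back into a
    np.maximum.accumulate(a, out=a)


def benchmark(n, seed=0):
    """Time each implementation on a copy of the same random integer array.

    Returns a dict mapping each name to (elapsed seconds, resulting array).
    """
    rng = np.random.default_rng(seed)
    data = rng.integers(0, 1_000_000, size=n)
    # Warm-up call with the same dtype, so Numba's one-off compilation
    # is not included in the timings below
    monotonically_increasing_numba(data[:10].copy())
    results = {}
    for name, func in [
        ("Python for loop", monotonically_increasing),
        ("Numba for loop", monotonically_increasing_numba),
        ("np.maximum.accumulate", monotonically_increasing_numpy),
    ]:
        a = data.copy()
        start = time.perf_counter()
        func(a)
        results[name] = (time.perf_counter() - start, a)
    return results
```

For example, `for name, (seconds, _) in benchmark(10_000_000).items(): print(f"{name:<24}{seconds * 1000:.0f} ms")` prints one line per implementation; the absolute times depend on the machine.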

Introduction to Numba

When Numpy or Scipy has a function that does what you need, common computational problems are solved quickly. But what if no such function existed (imagine numpy.maximum.accumulate had never been written)? Then, to speed up the code, you could write an extension in a lower-level programming language [2], but that means switching languages, which complicates building the module and the system as a whole.

With Numba you can:

  • Run the same code either as normal Python or as a faster compiled version, all from within the normal Python interpreter

  • Iterate on an algorithm easily and quickly
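The first point can be seen directly: a Numba-compiled function keeps the original Python function on its py_func attribute, so the same source can run either compiled or as ordinary Python, for example when stepping through it in a debugger. A minimal sketch, assuming Numba and Numpy are installed; the hypothetical fallback decorator leaves the function uncompiled otherwise, and the loop starts from a[0] instead of 0.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Hypothetical fallback: without Numba, leave the function uncompiled
    def njit(func):
        return func


@njit
def monotonically_increasing(a):
    # Running maximum, written back in place (starts from a[0], not 0)
    if len(a) == 0:
        return
    max_value = a[0]
    for i in range(len(a)):
        if a[i] > max_value:
            max_value = a[i]
        a[i] = max_value


# With Numba, .py_func is the original, uncompiled Python function;
# the fallback decorator returns the plain function, so use it directly.
plain_python = getattr(monotonically_increasing, "py_func", monotonically_increasing)

compiled_input = np.array([1, 2, 1, 3, 3, 5, 4, 6])
python_input = compiled_input.copy()
monotonically_increasing(compiled_input)  # machine code (if Numba is installed)
plain_python(python_input)                # ordinary interpreted Python
print(compiled_input, python_input)
```

Setting the environment variable NUMBA_DISABLE_JIT=1 before importing Numba has a similar effect for every decorated function at once.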

Numba first analyzes the code, then compiles it just in time, specialized to the types of the input data. For example, an array of u64 and an array of floats each get their own compiled version.
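A minimal sketch of this per-type specialization, assuming Numba is installed: after calling the same decorated function with a uint64 array and a float64 array, the dispatcher's signatures attribute holds one entry per compiled version. The loop starts from a[0] rather than 0, a small deviation from the article's version, so that the running maximum keeps the array's own element type; the fallback decorator is a hypothetical addition.

```python
import numpy as np

try:
    from numba import njit
    HAVE_NUMBA = True
except ImportError:
    HAVE_NUMBA = False

    def njit(func):  # hypothetical fallback: no compilation
        return func


@njit
def monotonically_increasing(a):
    # max_value starts at a[0], so it has the same type as the array's elements
    if len(a) == 0:
        return
    max_value = a[0]
    for i in range(len(a)):
        if a[i] > max_value:
            max_value = a[i]
        a[i] = max_value


unsigned = np.array([1, 2, 1, 3], dtype=np.uint64)
floats = np.array([1.5, 0.5, 2.5], dtype=np.float64)
monotonically_increasing(unsigned)  # compiles a uint64 version
monotonically_increasing(floats)    # compiles a separate float64 version

if HAVE_NUMBA:
    # One tuple of argument types per compiled specialization
    for signature in monotonically_increasing.signatures:
        print(signature)
```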

Numba is not limited to the CPU either: for example, it can run code on a GPU [3]. The example above only scratches the surface; the official documentation [4] covers many more features.

Some shortcomings of Numba

Compilation takes time on the first call

When a Numba-decorated function is called for the first time, Numba has to spend some time generating machine code for it. We can see this with IPython's %time command, which measures how long a call to a Numba-decorated function takes:

In [1]: from numba import njit

In [2]: @njit
   ...: def add(a, b): a + b

In [3]: %time add(1, 2)
CPU times: user 320 ms, sys: 117 ms, total: 437 ms
Wall time: 207 ms

In [4]: %time add(1, 2)
CPU times: user 17 µs, sys: 0 ns, total: 17 µs
Wall time: 24.3 µs

In [5]: %time add(1, 2)
CPU times: user 8 µs, sys: 2 µs, total: 10 µs
Wall time: 13.6 µs

As you can see, the first call is very slow (note that the unit there is milliseconds rather than microseconds), because Numba has to compile the function and generate machine code. Later calls are much faster. The compilation cost is paid again whenever the input types change; for example, here we pass floating-point numbers instead:

In [8]: %time add(1.5, 2.5)
CPU times: user 40.3 ms, sys: 1.14 ms, total: 41.5 ms
Wall time: 41 ms

In [9]: %time add(1.5, 2.5)
CPU times: user 16 µs, sys: 3 µs, total: 19 µs
Wall time: 26 µs

Of course, you don't need Numba to add two numbers. This trivial function is used only because it makes the compilation cost easy to see.
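Outside IPython, the same effect can be measured with time.perf_counter. In this sketch (the helper timed_call and the fallback decorator are hypothetical), the first call with each new argument type includes compilation and the second call reuses the machine code; here add returns its sum so the results can be checked.

```python
import time

try:
    from numba import njit
    HAVE_NUMBA = True
except ImportError:
    HAVE_NUMBA = False

    def njit(func):  # hypothetical fallback: no compilation
        return func


@njit
def add(a, b):
    return a + b


def timed_call(func, *args):
    """Call func(*args) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# The first call with int arguments includes compilation; the second reuses it
int_sum, first_int = timed_call(add, 1, 2)
_, second_int = timed_call(add, 1, 2)
# Float arguments are a new type, so Numba compiles a second version
float_sum, first_float = timed_call(add, 1.5, 2.5)
_, second_float = timed_call(add, 1.5, 2.5)

for label, seconds in [
    ("first call (int)", first_int),
    ("second call (int)", second_int),
    ("first call (float)", first_float),
    ("second call (float)", second_float),
]:
    print(f"{label:<22}{seconds * 1e3:10.3f} ms")
```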

How Numba differs from Python and Numpy

Numba implements a subset of Python and a subset of the Numpy API, which can lead to some potential problems:

  • Some Python and Numpy features are not supported

  • Because Numba reimplements the Numpy API:

    • Performance may differ from Numpy's, because different algorithms are used

    • Results may differ from Numpy's, because of bugs

  • When Numba fails to compile your code, its error messages can be hard to understand
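To see the first and last points in practice, here is a sketch that passes an ordinary Python object to a compiled function. Numba's nopython mode cannot type arbitrary class instances, so compilation fails with a TypingError whose message is long and fairly opaque. The Point class, norm2 and try_compiled_norm2 are hypothetical, and the exact wording of the error varies between Numba versions.

```python
try:
    from numba import njit
    from numba.core.errors import TypingError
    HAVE_NUMBA = True
except ImportError:
    HAVE_NUMBA = False


class Point:
    """Hypothetical plain Python class (not a Numba jitclass)."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def norm2(p):
    # Squared distance from the origin to p
    return p.x * p.x + p.y * p.y


def try_compiled_norm2(p):
    """Return a short status string describing what Numba did with norm2(p)."""
    if not HAVE_NUMBA:
        return "numba not installed"
    fast_norm2 = njit(norm2)
    try:
        fast_norm2(p)
    except TypingError as exc:
        # Print only the first line; the full message is long
        print(str(exc).splitlines()[0])
        return "rejected"
    return "compiled"


print(norm2(Point(3.0, 4.0)))               # plain Python works: 25.0
print(try_compiled_norm2(Point(3.0, 4.0)))  # Numba cannot type a plain Point
```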

Numba compared to other options

  • Only Numpy and Scipy: your code runs at compiled-language speed, but this does not help when you need a custom loop

  • Writing code directly in a low-level language: you can optimize every statement, but you have to leave Python for another language

  • Numba: Python loops can be sped up, but you are limited in which Python language features and Numpy APIs you can use

Epilogue

The great thing about Numba is how easy it is to try. So whenever you have a slow for loop doing some math, give Numba a try: with any luck, two lines of code will make it significantly faster.
