[Python GPU Acceleration] GPU Computing with Numba (Part 1): Basics

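Numba is an acceleration toolkit that dynamically compiles Python functions for the CPU or, through CUDA, the GPU, which can greatly improve execution speed.

Functions are compiled by applying decorators such as @jit, @cuda.jit, and @vectorize.

JIT: just-in-time compilation, which improves execution speed

Specialized for specific data types

Focused on numerical computing (efficiently compiles math, cmath, and a subset of NumPy)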

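Numba is an open-source JIT compiler that compiles numerical Python and NumPy code into fast machine code, which can greatly improve a program's performance. It is built on the general-purpose LLVM compiler library. Besides compiling code for the CPU at speeds comparable to C, it can also target GPU libraries (such as NVIDIA's CUDA and AMD's ROCm) for GPU acceleration. All of this is done simply by applying Python decorators.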

Note: this series focuses mainly on GPU and CUDA acceleration.

1. Installation

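If you have already installed Anaconda, TensorFlow, or similar software, Numba may already be in your environment. Check first to avoid installing it twice.
Follow the official installation steps carefully to avoid GPU driver problems (see the official Numba site, http://numba.pydata.org/).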

Numba can be installed directly with either conda or pip:

$ conda install numba

$ pip install numba

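For GPU use, pay attention to the driver!

For NVIDIA GPUs you need to install the driver and CUDA (CUDA 8.0 or later is recommended).

# Per the official docs: with conda, installing cudatoolkit is enough; no separate CUDA installation is needed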
$ conda install cudatoolkit

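With a pip installation, however, you may need to install CUDA yourself and set the following environment variables (newer Numba releases deprecate these NUMBAPRO_* names in favor of CUDA_HOME and NUMBA_CUDA_DRIVER, so check the documentation for your version):
NUMBAPRO_CUDA_DRIVER: Path to the CUDA driver shared library file
NUMBAPRO_NVVM: Path to the CUDA libNVVM shared library file
NUMBAPRO_LIBDEVICE: Path to the CUDA libNVVM libdevice directory which contains .bc files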

Finally, run numba -s to check the installation.

If installing Numba locally is inconvenient, you can learn with a cloud service or an online notebook, including one that provides a GPU.

2. Basic usage

Numba speeds up Python functions mainly by compiling them through decorators. The commonly used ones include @jit, @vectorize, and @cuda.jit.

import numpy as np

def my_add(a,b):
    return a+b

Accelerating with Numba:

from numba import jit
# Compile with jit to speed up execution on the CPU
@jit
def my_numba_add(x, y):
    return x + y

Test the functions' performance:

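# In Jupyter you can use %timeit for this instead
import time

def test(n):
    # Build two length-n arrays (np.array((n)) would create a 0-d array holding the value n)
    a = np.arange(n, dtype=np.float64)
    b = np.arange(n, dtype=np.float64)
    my_numba_add(a, b)  # warm-up call so JIT compilation time is not measured

    tic1 = time.perf_counter()
    my_add(a, b)
    t1 = time.perf_counter() - tic1
    print('python time:', t1)

    tic2 = time.perf_counter()
    my_numba_add(a, b)
    t2 = time.perf_counter() - tic2
    print('Numba time:', t2)
    print('Numba accelerated %f times' % (t1 / t2))

# The computation is very simple, so the speedup is small. Try adding more complex operations if you are interested.
# Example output (timings vary between machines and runs):
>>> test(1000)
python time: 2.956390380859375e-05
Numba time: 1.7881393432617188e-05
Numba accelerated 1.653333 times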


ref:
http://numba.pydata.org/
https://www.jianshu.com/p/f342ecf11c26
https://blog.csdn.net/u013975830/article/details/78822919?utm_source=blogxgwz8