Speeding up your Python programs with Codon

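As a high-performance Python compiler, Codon compiles Python code to native machine code without any runtime overhead. Typical single-threaded speedups over Python are on the order of 10-100x or more, and Codon's performance is usually on par with C/C++. Unlike Python, Codon supports native multithreading, which can make programs many times faster still. Codon is extensible through a plugin infrastructure that lets users incorporate new libraries, compiler optimizations, and even keywords.
The Codon framework is fully modular and extensible, allowing seamless integration of new modules, compiler optimizations, domain-specific languages, and more. New Codon extensions are being actively developed for several domains, including bioinformatics and quantitative finance.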

GitHub: https://github.com/exaloop/codon
Official documentation: https://docs.exaloop.io/codon/

1. Installation

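At present, the prebuilt Codon packages only support Linux and macOS on x86_64; installation on Windows is still under development.
Open a terminal and run the following command to install (you can also download the install.sh file and run bash install.sh yourself):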

/bin/bash -c "$(curl -fsSL https://exaloop.io/install.sh)"

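[Screenshot: installer output after a successful install, including the install path]
After the installation completes, configure the environment as follows. Replace the path inside the quotes on the first line with your own; the installer output shows the correct path: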

echo 'export PATH=/home/lzj/.codon/bin:$PATH' >> ~/.bashrc
source ~/.bashrc

After this, typing codon in the terminal prints its usage information.
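However, if you want to accelerate a program through the decorator, you also need to install the Python API below (according to the official docs, things like Flask are not supported yet, and they recommend using the @codon.jit decorator to speed up specific parts of a program):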

# go to the Codon install path
cd ~/.codon/python
# activate the target virtual environment
conda activate xxx
# install the Python API
python setup.py install

[Screenshot: message printed when the installation completes]
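With the screenshots unavailable, a quick way to confirm both installation steps is to check from Python itself. The sketch below is only a convenience check, not part of Codon: the helper name check_codon_setup is made up for illustration, and it assumes the command-line tool is named codon and the Python API is importable as codon, as used elsewhere in this post.

```python
import importlib.util
import shutil


def check_codon_setup():
    """Report whether the codon CLI and the Python API are discoverable."""
    return {
        # True if an executable named "codon" is found on PATH
        "cli_on_path": shutil.which("codon") is not None,
        # True if a module named "codon" can be found in this environment
        "python_api_installed": importlib.util.find_spec("codon") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_codon_setup().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

Run it inside the same virtual environment that you activated before installing the Python API; a "missing" entry points to the step that needs to be repeated.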

2. Using Codon

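Codon is a Python-compatible language, and many Python programs work with only minor modifications. The @codon.jit decorator lets you quickly accelerate existing code (so far I have only tried the jit decorator; I could not get the par decorator to work):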

import codon
from time import time

def is_prime_python(n):
    if n <= 1:
        return False
    for i in range(2, n):
        if n % i == 0:
            return False
    return True

@codon.jit
def is_prime_codon(n):
    if n <= 1:
        return False
    for i in range(2, n):
        if n % i == 0:
            return False
    return True

t0 = time()
ans = sum(1 for i in range(100000, 200000) if is_prime_python(i))
t1 = time()
print(f'[python] {ans} | took {t1 - t0} seconds')

t0 = time()
ans = sum(1 for i in range(100000, 200000) if is_prime_codon(i))
t1 = time()
print(f'[codon]  {ans} | took {t1 - t0} seconds')

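The same program ran about 5 times faster. That falls short of the officially claimed 10-100x, but it is still a clear performance improvement.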
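One plausible reason the speedup stays below the advertised range is the structure of the benchmark rather than Codon itself: the generator expression calls the jitted function 100,000 times from Python, so every call pays for crossing between Python and compiled code, and the first call likely also includes compilation time. The sketch below is an assumption-laden variant, not a measured result: it moves the whole counting loop into a single jitted function (count_primes and timed_count are names invented here) and does one warm-up call before timing. The try/except fallback is only there so the snippet still runs, unaccelerated, where the codon module is not installed.

```python
from time import perf_counter

try:
    import codon
    jit = codon.jit
except ImportError:
    # Fallback so the sketch still runs (without acceleration) if Codon is absent
    def jit(func):
        return func


@jit
def count_primes(lo, hi):
    # The whole loop lives in the compiled function, so Python calls it only once
    total = 0
    for n in range(lo, hi):
        if n <= 1:
            continue
        prime = True
        for i in range(2, n):
            if n % i == 0:
                prime = False
                break
        if prime:
            total += 1
    return total


def timed_count(lo, hi):
    # Returns the prime count and the elapsed wall-clock time in seconds
    t0 = perf_counter()
    ans = count_primes(lo, hi)
    return ans, perf_counter() - t0


count_primes(2, 10)  # warm-up call, so any compilation happens before timing

# A smaller range keeps the pure-Python fallback quick;
# use (100000, 200000) to match the benchmark above.
ans, seconds = timed_count(10000, 20000)
print(f'[codon, whole loop] {ans} | took {seconds} seconds')
```

Whether this closes the gap depends on your machine and Codon version, so treat it as an experiment to run rather than a guaranteed improvement.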
Below is the example from the GitHub repository.
Create a new file named fib.py:

def fib(n):
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a+b
    print()
fib(1000)

Compile it with codon; the compiler offers a number of options and modes:

# compile and run the program
codon run fib.py
# 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987

# compile and run the program with optimizations enabled
codon run -release fib.py
# 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987

# compile to executable with optimizations enabled
codon build -release -exe fib.py
./fib
# 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987

# compile to LLVM IR file with optimizations enabled
codon build -release -llvm fib.py
# outputs file fib.ll

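This prime-counting example showcases Codon's OpenMP support, which is enabled by adding a single line. The @par annotation tells the compiler to parallelize the following for loop, in this case using dynamic scheduling, a chunk size of 100, and 16 threads.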

def is_prime(n):
    factors = 0
    for i in range(2, n):
        if n % i == 0:
            factors += 1
    return factors == 0

limit = 1000
total = 0

@par(schedule='dynamic', chunk_size=100, num_threads=16)
for i in range(2, limit):
    if is_prime(i):
        total += 1

print(total)
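Note that @par on a for loop is Codon syntax, not valid Python, so the snippet above has to go through the codon compiler rather than the regular interpreter. To make the three parameters more concrete, here is a rough plain-Python analogy, not a description of how Codon or OpenMP implement it: the range is cut into chunks of chunk_size iterations, a pool of num_threads workers picks up chunks as they become free (the "dynamic" part), and the partial counts are summed at the end. The names count_chunk and count_primes_chunked are invented for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def is_prime(n):
    factors = 0
    for i in range(2, n):
        if n % i == 0:
            factors += 1
    return factors == 0


def count_chunk(bounds):
    # Each task handles one chunk and returns a partial count
    start, stop = bounds
    return sum(1 for i in range(start, stop) if is_prime(i))


def count_primes_chunked(limit, chunk_size=100, num_threads=16):
    # Split [2, limit) into chunks of at most chunk_size iterations
    chunks = [(s, min(s + chunk_size, limit)) for s in range(2, limit, chunk_size)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Idle workers pick up the next pending chunk, as in dynamic scheduling;
        # summing the partial counts plays the role of the reduction on `total`
        return sum(pool.map(count_chunk, chunks))


print(count_primes_chunked(1000))  # prints 168
```

In CPython the threads here will not run the arithmetic in parallel because of the global interpreter lock, so this only illustrates the scheduling idea; the actual speedup comes from compiling the @par version with Codon.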

Codon supports writing and executing GPU kernels. Here is an example that computes the Mandelbrot set:

import gpu

MAX    = 1000  # maximum Mandelbrot iterations
N      = 4096  # width and height of image
pixels = [0 for _ in range(N * N)]

def scale(x, a, b):
    return a + (x/N)*(b - a)

@gpu.kernel
def mandelbrot(pixels):
    idx = (gpu.block.x * gpu.block.dim.x) + gpu.thread.x
    i, j = divmod(idx, N)
    c = complex(scale(j, -2.00, 0.47), scale(i, -1.12, 1.12))
    z = 0j
    iteration = 0

    while abs(z) <= 2 and iteration < MAX:
        z = z**2 + c
        iteration += 1

    pixels[idx] = int(255 * iteration/MAX)

mandelbrot(pixels, grid=(N*N)//1024, block=1024)
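In the launch call, each GPU thread handles one pixel: the index is computed from the block number, the block size, and the thread number within the block, and grid times block covers all N*N pixels. As a way to sanity-check the kernel's output without a GPU, the sketch below repeats the per-pixel arithmetic in plain Python on a tiny image. The function name mandelbrot_pixel and the reduced values of N and MAX are choices made here for illustration, not part of the original example.

```python
MAX = 50   # smaller than the kernel's 1000 so the CPU loop stays quick
N = 8      # tiny image; the kernel above uses 4096


def scale(x, a, b):
    return a + (x / N) * (b - a)


def mandelbrot_pixel(idx):
    # Same arithmetic as the kernel body, with idx passed in explicitly
    i, j = divmod(idx, N)
    c = complex(scale(j, -2.00, 0.47), scale(i, -1.12, 1.12))
    z = 0j
    iteration = 0
    while abs(z) <= 2 and iteration < MAX:
        z = z**2 + c
        iteration += 1
    return int(255 * iteration / MAX)


# One list entry per pixel, in the same row-major order the kernel uses
pixels = [mandelbrot_pixel(idx) for idx in range(N * N)]
for row in range(N):
    print(' '.join(f'{p:3d}' for p in pixels[row * N:(row + 1) * N]))

# Launch arithmetic from the original call: one thread per pixel
blocks = (4096 * 4096) // 1024
print(blocks, blocks * 1024 == 4096 * 4096)  # prints: 16384 True
```

Pixels that never escape within MAX iterations come out as 255, and points far outside the set come out near 0, which is the pattern to look for when comparing against the GPU result at the same N and MAX.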