PyTorch model deployment process (ONNX Runtime)

Model deployment is the process of making a trained deep learning model run in a specific environment. It faces two main difficulties:

  • The environment required to run the model is difficult to configure. Deep learning models are usually written in frameworks such as PyTorch and TensorFlow. Because of their size and dependencies, these frameworks are not suitable for installation in production environments such as mobile phones and development boards.
  • Deep learning models are usually large and need a lot of computing power to run in real time, so their execution efficiency has to be optimized.

Because of these difficulties, model deployment cannot be accomplished by simple environment configuration and installation. Currently there is a popular pipeline for model deployment:

To deploy a model, one first defines the network structure in any deep learning framework and determines the network's parameters through training. The structure and parameters of the model are then converted into an intermediate representation that describes only the network structure, and optimizations of the network structure are applied to that representation. Finally, an inference engine, written in a hardware-oriented high-performance programming framework (such as CUDA or OpenCL) so that it can execute the network's operators efficiently, converts the intermediate representation into its own file format and runs the model on the target hardware platform.

Create a PyTorch model

a. Configure the environment

# Create a virtual environment named deploy with Python 3.7
conda create -n deploy python=3.7 -y
# Enter the virtual environment
conda activate deploy
# Install the GPU version of PyTorch
# Select the appropriate configuration on the official website and copy the install command --- https://pytorch.org/get-started/locally/
conda install pytorch torchvision cudatoolkit=11.3 -c pytorch
# Install ONNX Runtime, ONNX, OpenCV
pip install onnxruntime onnx opencv-python
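
Before moving on, it can help to confirm that the packages are visible in the active environment. The following is a minimal sketch that uses only the standard library. The helper name find_missing and the module list are assumptions made for illustration; note that the list holds import names, which differ from the install names in some cases (OpenCV is installed as opencv-python but imported as cv2).

```python
import importlib.util

# Import names, not pip/conda package names.
REQUIRED_MODULES = ["torch", "torchvision", "onnx", "onnxruntime", "cv2", "numpy"]


def find_missing(modules):
    """Return the module names that cannot be located in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

The check only locates the modules and does not import them, so it says nothing about whether the GPU build of PyTorch can actually see a GPU.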

b. Create a PyTorch model

import os

import cv2
import numpy as np
import requests
import torch
import torch.onnx
from torch import nn

class SuperResolutionNet(nn.Module):
    def __init__(self, upscale_factor):
        super().__init__()
        self.upscale_factor = upscale_factor
        self.img_upsampler = nn.Upsample(
            scale_factor=self.upscale_factor,
            mode='bicubic',
            align_corners=False)

        self.conv1 = nn.Conv2d(3,64,kernel_size=9,padding=4)
        self.conv2 = nn.Conv2d(64,32,kernel_size=1,padding=0)
        self.conv3 = nn.Conv2d(32,3,kernel_size=5,padding=2)

        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.img_upsampler(x)
        out = self.relu(self.conv1(x))
        out = self.relu(self.conv2(out))
        out = self.conv3(out)
        return out

# Download checkpoint and test image
urls = ['https://download.openmmlab.com/mmediting/restorers/srcnn/srcnn_x4k915_1x16_1000k_div2k_20200608-4186f232.pth',
        'https://raw.githubusercontent.com/open-mmlab/mmediting/master/tests/data/face/000001.png']
names = ['srcnn.pth', 'face.png']
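for url, name in zip(urls, names):
    if not os.path.exists(name):
        response = requests.get(url)
        response.raise_for_status()  # fail loudly instead of saving an error page
        with open(name, 'wb') as f:
            f.write(response.content)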

def init_torch_model():
    torch_model = SuperResolutionNet(upscale_factor=3)

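    # map_location='cpu' keeps loading independent of whether a GPU is present
    state_dict = torch.load('srcnn.pth', map_location='cpu')['state_dict']

    # Adapt the checkpoint: drop the first dot-separated component of each
    # key so the names match the attributes of SuperResolutionNet
    for old_key in list(state_dict.keys()):
        new_key = '.'.join(old_key.split('.')[1:])
        state_dict[new_key] = state_dict.pop(old_key)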

    torch_model.load_state_dict(state_dict)
    torch_model.eval()
    return torch_model

model = init_torch_model()
input_img = cv2.imread('face.png').astype(np.float32)

# HWC to NCHW
input_img = np.transpose(input_img, [2, 0, 1])
input_img = np.expand_dims(input_img, 0)

# Inference
torch_output = model(torch.from_numpy(input_img)).detach().numpy()

# NCHW to HWC
torch_output = np.squeeze(torch_output, 0)
torch_output = np.clip(torch_output, 0, 255)
torch_output = np.transpose(torch_output, [1, 2, 0]).astype(np.uint8)

# Save the result
cv2.imwrite("face_torch.png", torch_output)

The code creates SRCNN, a classic super-resolution network. SRCNN first upsamples the image to the target resolution and then processes it with 3 convolutional layers. If the script runs normally, a super-resolution photo of the face is saved to face_torch.png.
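
The layout conversions around the inference call are easy to get wrong, so here is a minimal sketch of them in isolation, using only NumPy. The helper names hwc_to_nchw and nchw_to_hwc are hypothetical, and a random array stands in for the image that cv2.imread would return.

```python
import numpy as np


def hwc_to_nchw(image):
    """Convert an H x W x C image to a 1 x C x H x W float32 batch."""
    chw = np.transpose(image.astype(np.float32), (2, 0, 1))
    return np.expand_dims(chw, 0)


def nchw_to_hwc(batch):
    """Convert a 1 x C x H x W network output back to an H x W x C uint8 image."""
    chw = np.squeeze(batch, 0)
    chw = np.clip(chw, 0, 255)  # keep values in the valid pixel range
    return np.transpose(chw, (1, 2, 0)).astype(np.uint8)


# Stand-in for an image on disk: random 256 x 256 pixels with 3 channels.
fake_image = np.random.randint(0, 256, size=(256, 256, 3), dtype=np.uint8)

batch = hwc_to_nchw(fake_image)
print(batch.shape)     # (1, 3, 256, 256)

restored = nchw_to_hwc(batch)
print(restored.shape)  # (256, 256, 3)
```

The clipping step matters on real network output, because the convolutions can produce values slightly outside 0 to 255, and casting those straight to uint8 would wrap around.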

Now that the PyTorch model has been tested, the actual deployment can begin. The next task is to convert the PyTorch model into a model described by the intermediate representation ONNX.

c. Convert the PyTorch model to a model described by ONNX

ONNX (Open Neural Network Exchange) is a format for the standard description of computational graphs, jointly released by Facebook and Microsoft in 2017. It is now maintained jointly by several organizations and is supported by a range of deep learning frameworks and inference engines. ONNX is therefore regarded as a bridge from the deep learning framework to the inference engine, much like the intermediate language of a compiler. Because frameworks differ in how well they support it, ONNX is usually used only to represent static graphs, which are easier to deploy.

# Convert the PyTorch model to a model in ONNX format
x = torch.randn(1, 3, 256, 256)

with torch.no_grad():
    torch.onnx.export(
        model,
        x,
        "srcnn.onnx",
        opset_version=11,
        input_names=['input'],
        output_names=['output'])

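Here torch.onnx.export is the function built into PyTorch for converting a model to ONNX format. Its first three arguments are the model to convert, an arbitrary set of inputs for the model, and the file name of the exported ONNX file.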

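To go from a PyTorch model to an ONNX model, PyTorch provides a conversion method called tracing: given a set of inputs, it actually runs the model once, records the computational graph that corresponds to those inputs, and saves it in ONNX format. The export function uses this tracing method, so it needs some set of inputs to run the model with. The test image has three channels and is 256x256 in size, so a random tensor of the same shape is constructed here.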
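
The idea behind tracing can be illustrated with a toy example in plain Python. This is only a sketch of the concept, not how PyTorch implements it: the TracedValue class, the trace function, and toy_model are all hypothetical names invented for the illustration. A wrapped value records each arithmetic operation applied to it while the function runs on one example input.

```python
class TracedValue:
    """A number that records every arithmetic operation applied to it."""

    def __init__(self, value, name, graph):
        self.value = value
        self.name = name
        self.graph = graph  # shared list of (op, input names, output name)

    def _record(self, op, other, result):
        other_name = other.name if isinstance(other, TracedValue) else repr(other)
        out = TracedValue(result, f"v{len(self.graph)}", self.graph)
        self.graph.append((op, (self.name, other_name), out.name))
        return out

    def __add__(self, other):
        other_value = other.value if isinstance(other, TracedValue) else other
        return self._record("add", other, self.value + other_value)

    def __mul__(self, other):
        other_value = other.value if isinstance(other, TracedValue) else other
        return self._record("mul", other, self.value * other_value)


def toy_model(x):
    # Ordinary Python control flow is evaluated once, while tracing.
    if x.value > 0:
        return x * 2 + 1
    return x * -1


def trace(fn, example_input):
    """Run fn once on example_input and return the recorded operations."""
    graph = []
    fn(TracedValue(example_input, "input", graph))
    return graph


for node in trace(toy_model, 3.0):
    print(node)
```

With a positive example input, only the multiply and add of the first branch are recorded; the other branch never appears in the graph. This is why a traced export yields a static graph tied to the path taken by the example input.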

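opset_version is the version of the ONNX operator set. input_names and output_names are the names of the input and output tensors. If the code runs successfully, a new ONNX model file named srcnn.onnx appears in the directory.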

# Verify that the model file is valid; append this to the code above
import onnx

onnx_model = onnx.load("srcnn.onnx")
try:
    onnx.checker.check_model(onnx_model)
except Exception:
    print("Model incorrect")
else:
    print("Model correct")

d. Run the model and perform inference on ONNX Runtime

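The inference engine ONNX Runtime is a cross-platform machine learning inference accelerator maintained by Microsoft. It works with ONNX directly: it can read and run .onnx files without first converting them to another format. In other words, for the PyTorch - ONNX - ONNX Runtime deployment pipeline, once the .onnx file is available on the target device and the model runs on ONNX Runtime, deployment is complete.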

ONNX Runtime provides a Python interface.

# Run inference with ONNX Runtime; again, append this to the previous script
import onnxruntime

ort_session = onnxruntime.InferenceSession("srcnn.onnx")
ort_inputs = {'input': input_img}
ort_output = ort_session.run(['output'], ort_inputs)[0]

ort_output = np.squeeze(ort_output, 0)
ort_output = np.clip(ort_output, 0, 255)
ort_output = np.transpose(ort_output, [1, 2, 0]).astype(np.uint8)
cv2.imwrite("face_ort.png", ort_output)

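onnxruntime.InferenceSession creates an ONNX Runtime inference session; its argument is the ONNX model file to run. The session's run method performs inference: its first argument is a list of output tensor names and its second is a dictionary of input values, in which each key is a tensor name and each value is a numpy array. The input and output tensor names must match the names set in torch.onnx.export.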
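
When the tensor names are not known in advance, they can be read from the session itself rather than hard-coded. The sketch below relies on the get_inputs and get_outputs methods of an inference session, whose entries carry name, shape, and type attributes. The helper name describe_io is hypothetical.

```python
def describe_io(session):
    """Collect (name, shape, type) for each input and output of a session.

    `session` is expected to behave like onnxruntime.InferenceSession, that is,
    to offer get_inputs() and get_outputs() returning objects with name,
    shape and type attributes.
    """
    inputs = [(arg.name, arg.shape, arg.type) for arg in session.get_inputs()]
    outputs = [(arg.name, arg.shape, arg.type) for arg in session.get_outputs()]
    return inputs, outputs


# Usage with the session created above (requires srcnn.onnx on disk):
# inputs, outputs = describe_io(ort_session)
# print(inputs)
# print(outputs)
```

For the model exported earlier, the names reported this way should be the 'input' and 'output' passed to torch.onnx.export.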

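If the code runs normally, another super-resolution photo is saved to face_ort.png. This image is the same as the face_torch.png obtained earlier, which means that ONNX Runtime ran the SRCNN model successfully and the model deployment is complete.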
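
Rather than comparing the two images by eye, the difference can be measured. Below is a minimal sketch using NumPy; the helper name max_abs_diff is hypothetical, and small synthetic arrays stand in for the two saved images.

```python
import numpy as np


def max_abs_diff(a, b):
    """Largest absolute per-element difference between two same-shaped arrays."""
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    # Cast to a signed type first so uint8 subtraction cannot wrap around.
    return int(np.max(np.abs(a.astype(np.int64) - b.astype(np.int64))))


# Synthetic stand-ins for the two saved images.
img_a = np.full((4, 4, 3), 120, dtype=np.uint8)
img_b = img_a.copy()
img_b[0, 0, 0] = 121

print(max_abs_diff(img_a, img_a))  # 0
print(max_abs_diff(img_a, img_b))  # 1
```

With the real files, the arrays would come from cv2.imread("face_torch.png") and cv2.imread("face_ort.png"). A difference of a pixel level or so in a few places would not be surprising, since the two runtimes may round floating-point results differently.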

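From now on, performing super-resolution only requires providing the srcnn.onnx file and helping the user set up a Python environment with ONNX Runtime; the model can then be run in a few lines of code. Alternatively, ONNX Runtime can be used to build an application that executes the model directly. The user then only needs the ONNX model file and selects its file name in the application to run the model.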

Origin blog.csdn.net/xs1997/article/details/131747242