PyTorch exports an ONNX model, C++ converts it to TensorRT and runs inference


This article is a study note; wherever it deviates from the referenced articles, the difference is called out explicitly.

Main references :
1. Pytorch exports the onnx model, C++ converts it to TensorRT and implements the inference process
2. Onnxruntime installation and use (with some problems found in practice)
3. TensorRT_Test

1. Export the ONNX model from PyTorch

  1. Create a new export_onnx.py file; its full contents are shown below. The custom ResNet50_wSoftmax model is based on the official ResNet50, with a softmax operation added.

  2. Add the necessary post-processing to the model and export it together. This has two advantages:
    1) You directly get an end-to-end onnx/tensorrt model, with no post-processing needed outside the model.
    2) Later we will convert the onnx model to a tensorrt model. During that conversion, tensorrt applies inference optimizations to the model for the specific Nvidia GPU, so merging the post-processing into the onnx model means those extra operators may also be optimized when converted to tensorrt.

# export_onnx.py
import torch
import torchvision.models as models
import cv2
import numpy as np

class ResNet50_wSoftmax(torch.nn.Module):
    # Merge the softmax post-processing into the model so it is exported to onnx together
    def __init__(self):
        super().__init__()
        self.base_model = models.resnet50(pretrained=True)
        self.softmax = torch.nn.Softmax(dim=1)

    def forward(self, x):
        y = self.base_model(x)
        prob = self.softmax(y)
        return prob

def preprocessing(img):
    # Preprocessing: BGR->RGB, scale to [0,1], then subtract the mean and divide by the std
    IMAGENET_MEAN = [0.485, 0.456, 0.406]
    IMAGENET_STD = [0.229, 0.224, 0.225]
    img = img[:, :, ::-1]
    img = cv2.resize(img, (224, 224))
    img = img / 255.0
    img = (img - IMAGENET_MEAN) / IMAGENET_STD
    img = img.transpose(2, 0, 1).astype(np.float32)
    tensor_img = torch.from_numpy(img)[None]  # add a batch dimension here: from [3, 224, 224] to [1, 3, 224, 224]
    return tensor_img

if __name__ == '__main__':
    # model = models.resnet50(pretrained=True)
    image_path = 'test.jpg'
    img = cv2.imread(image_path)
    tensor_img = preprocessing(img)
    model = ResNet50_wSoftmax()   # the post-processing is added into the model
    model.eval()
    pred = model(tensor_img)[0]
    max_idx = torch.argmax(pred)
    print(f"test_image: {image_path}, max_idx: {max_idx}, max_logit: {pred[max_idx].item()}")

    dummy_input = torch.zeros(1, 3, 224, 224)  # onnx export needs an example input; the tensor_img above would also work
    torch.onnx.export(
            model, dummy_input, 'resnet50_wSoftmax.onnx',
            input_names=['image'],
            output_names=['predict'],
            opset_version=11,
            dynamic_axes={'image': {0: 'batch'}, 'predict': {0: 'batch'}}  # note: the batch size is declared dynamic here
    )
  3. Execute the script:
python export_onnx.py

Note: before running, prepare an image named test.jpg (any picture downloaded from the internet will do). You should see output similar to the following:

Downloading: "https://download.pytorch.org/models/resnet50-0676ba61.pth" to /home/cui/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 97.8M/97.8M [07:36<00:00, 225kB/s]
test_image: test.jpg, max_idx: 285, max_logit: 0.5382498502731323

2. onnxruntime inference test

Take the resnet50_wSoftmax.onnx we just exported and run an inference test with onnxruntime to check whether the results match.

Create a new file called onnxruntime_test.py; its full contents are shown below. The preprocessing function from export_onnx.py is reused here, but it returns a torch tensor, which must be converted to a numpy array.

# onnxruntime_test.py
import onnxruntime as ort  # if missing, install with: pip install onnxruntime-gpu -i https://pypi.tuna.tsinghua.edu.cn/simple
import numpy as np
import cv2
from export_onnx import preprocessing

image_path = 'test.jpg'
ort_session = ort.InferenceSession("resnet50_wSoftmax.onnx", providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'])  # create an inference session

img = cv2.imread(image_path)
input_img = preprocessing(img)  # preprocessing() already adds the batch dimension
input_img = input_img.numpy()   # newly added: convert the torch tensor to numpy

pred = ort_session.run(None, { 'image' : input_img } )[0][0]
max_idx = np.argmax(pred)
print(f"test_image: {image_path}, max_idx: {max_idx}, probability: {pred[max_idx]}")

Run it:

python onnxruntime_test.py

The result is as follows, essentially consistent with the PyTorch model's inference, which confirms the conversion is correct.

test_image: test.jpg, max_idx: 285, probability: 0.5382504463195801

NOTE:
1) The parameters passed to ort.InferenceSession in this file differ slightly from the reference: since ORT 1.9 the providers parameter must be set explicitly, otherwise running the original article's script directly produces the following error.

ValueError: This ORT build has ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'] enabled. Since ORT 1.9, you are required to explicitly set the providers parameter when instantiating InferenceSession. For example, onnxruntime.InferenceSession(..., providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'], ...)

2) Running the inference exactly as written in the original article raises RuntimeError: Input must be a list of dictionaries or a single numpy array for input 'image'; this is why the input is converted to a numpy array above.

3) The onnx model exported in the original article contains INT64 weights. TensorRT does not natively support INT64, only 32-bit, so the following type warning is reported during the later conversion, but it does not affect subsequent use.

Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32

Reference: onnx-typecast, a tool for converting the weight data type

3. Convert the onnx model to a tensorrt model

1. Environment installation

Environment: CUDA 10.2 + cuDNN 8.4.1 + TensorRT 7.1.3.4 + onnx-tensorrt 7.1. For detailed installation steps, refer to the environment-configuration articles; they are not expanded on here.

Note: the versions must match one another, otherwise inexplicable errors will occur. After everything is installed, check the versions with the following commands.

nvcc -V    # check the CUDA version, 10.2
cat /usr/local/cuda-10.2/include/cudnn_version.h | grep CUDNN_MAJOR -A 2    # check the cuDNN 8.4 version; slightly different from earlier versions, see the first link for installing cuDNN
find / -name NvInferVersion.h   # check the TensorRT version number

2. File creation

Create a new C++ file, build_model.cc, which mainly contains the header includes, a logger class, and the build_model function.

  • Logger class: prints errors or warnings encountered while building the tensorrt model, filtered by the specified severity.
  • build_model function: builds from the onnx model and exports an engine model (equivalent to a trt model; both are serialized binary files).

2.1 The complete file content is as follows:

// TensorRT headers
#include <NvInfer.h>
#include <NvInferRuntime.h>
#include <NvInferRuntimeCommon.h>  // newly added; otherwise nvinfer1::AsciiChar is not found

// ONNX parser
#include <NvOnnxParser.h>  // differs from the original article; after building onnx-tensorrt, run sudo make install

// cuda_runtime
#include <cuda_runtime.h>

// common headers
#include <math.h>
#include <stdio.h>
#include <unistd.h>
#include <chrono>
#include <fstream>
#include <functional>
#include <iostream>
#include <memory>
#include <string>
#include <vector>
#include <dirent.h>

// opencv
#include <opencv2/opencv.hpp>

inline const char* severity_string(nvinfer1::ILogger::Severity t) {
  switch (t) {
    case nvinfer1::ILogger::Severity::kINTERNAL_ERROR:
      return "internal_error";
    case nvinfer1::ILogger::Severity::kERROR:
      return "error";
    case nvinfer1::ILogger::Severity::kWARNING:
      return "warning";
    case nvinfer1::ILogger::Severity::kINFO:
      return "info";
    case nvinfer1::ILogger::Severity::kVERBOSE:
      return "verbose";
    default:
      return "unknown";
  }
}

class TRTLogger : public nvinfer1::ILogger {
 public:
  virtual void log(Severity severity, const char* msg) noexcept override {
    if (severity <= Severity::kWARNING) {
      if (severity == Severity::kWARNING)
        printf("\033[33m%s: %s\033[0m\n", severity_string(severity), msg);
      else if (severity == Severity::kERROR)
        printf("\031[33m%s: %s\033[0m\n", severity_string(severity), msg);
      else
        printf("%s: %s\n", severity_string(severity), msg);
    }
  }
};

bool build_model() {
  if (access("resnet50.engine", 0) == 0) {
    printf("resnet50.engine already exists.\n");
    return true;
  }

  TRTLogger logger;

  // The builder, config and network below are the basic required components.
  // Intuitively: you need a builder to build the network; the network has its own structure, and that structure can be built under different configurations
  nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(logger);
  // Create a builder config that specifies how TensorRT should optimize the model; the generated engine can only run under this specific configuration
  nvinfer1::IBuilderConfig* config = builder->createBuilderConfig();
  // Create the network definition; createNetworkV2(1) means an explicit batch size.
  // With newer TensorRT (>= 7.0), the implicit batch size (0) is not recommended
  nvinfer1::INetworkDefinition* network = builder->createNetworkV2(1);

  // Parse the onnx model with the onnx parser
  auto parser = nvonnxparser::createParser(*network, logger);
  if (!parser->parseFromFile("../resnet50_wSoftmax.onnx", 1)) {
    printf("Failed to parse resnet50_wSoftmax.onnx.\n");
    return false;
  }

  // Set the workspace size
  printf("Workspace Size = %.2f MB\n", (1 << 28) / 1024.0f / 1024.0f);
  config->setMaxWorkspaceSize(1 << 28);

  // A profile is needed to make the batch size dynamic, matching the dynamic batch size we specified when exporting the onnx model
  int maxBatchSize = 10;
  auto profile = builder->createOptimizationProfile();
  auto input_tensor = network->getInput(0);
  auto input_dims = input_tensor->getDimensions();

  // Set the max / min / opt values of the batch size
  input_dims.d[0] = 1;
  profile->setDimensions(input_tensor->getName(),
                         nvinfer1::OptProfileSelector::kMIN, input_dims);
  profile->setDimensions(input_tensor->getName(),
                         nvinfer1::OptProfileSelector::kOPT, input_dims);

  input_dims.d[0] = maxBatchSize;
  profile->setDimensions(input_tensor->getName(),
                         nvinfer1::OptProfileSelector::kMAX, input_dims);
  config->addOptimizationProfile(profile);

  // Start building the tensorrt engine
  nvinfer1::ICudaEngine* engine =
      builder->buildEngineWithConfig(*network, *config);

  if (engine == nullptr) {
    printf("Build engine failed.\n");
    return false;
  }

  // Serialize the built tensorrt engine and save it to a file
  nvinfer1::IHostMemory* model_data = engine->serialize();
  FILE* f = fopen("resnet50.engine", "wb");
  fwrite(model_data->data(), 1, model_data->size(), f);
  fclose(f);

  // Destroy the pointers in reverse order
  model_data->destroy();
  engine->destroy();
  network->destroy();
  config->destroy();
  builder->destroy();

  printf("Build Done.\n");
  return true;
}

int main() {
  if (!build_model()) {
    printf("Couldn't build engine!\n");
  }
  return 0;
}

2.2 CMakeLists.txt file
For the complete project, refer to TensorRT_Test.

2.3 Compile and run

mkdir build && cd build
cmake ..
make -j8
./build_model

2.4 Results
The following output appears and a resnet50.engine file is generated, so the conversion succeeded.

warning: [TRT]/home/cui/workspace/tools/onnx-tensorrt/onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
Workspace Size = 256.00 MB
Build Done.

2.5 Possible problems
Q1: error: 'nvinfer1::AsciiChar' has not been declared
Solution: a TensorRT version issue; with older TensorRT (e.g. 7.x) the type nvinfer1::AsciiChar does not exist, so change the logger callback parameter to const char*, as in the code above.
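
For comparison, here is a minimal sketch of how the same logger callback would be declared against TensorRT 8.x headers, where nvinfer1::AsciiChar (an alias for char) is used; the class name is hypothetical and this is only an illustration, while on TensorRT 7.x the const char* form above is the one that compiles:

// Hypothetical TRT8Logger: compiles against TensorRT 8.x, where the header declares
// ILogger::log with nvinfer1::AsciiChar const*; on 7.x use const char* as in build_model.cc
#include <NvInferRuntimeCommon.h>
#include <cstdio>

class TRT8Logger : public nvinfer1::ILogger {
 public:
  void log(Severity severity, nvinfer1::AsciiChar const* msg) noexcept override {
    if (severity <= Severity::kWARNING) printf("%s\n", msg);
  }
};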

Q2: error: Could not open file resnet50_wSoftmax.onnx
Solution: fix the path to the resnet50_wSoftmax.onnx model file passed to parseFromFile (line 78 of build_model.cc).


4. TensorRT model inference test

Purpose: to verify that the inference results are consistent with those of the PyTorch model and the onnx model above.

  1. For the complete file contents, refer to model_infer.cc in the project; it differs slightly from the original article by adding a checkRuntime implementation (a minimal sketch of the overall inference flow is also given after the project address below):
#ifndef checkRuntime
#define checkRuntime(callstr)\
    {\
        cudaError_t error_code = callstr;\
        if (error_code != cudaSuccess) {\
            std::cerr << "CUDA error " << error_code << " at " << __FILE__ << ":" << __LINE__ << std::endl;\
            assert(0);\
        }\
    }
#endif  // checkRuntime

Project address: TensorRT_Test
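
To make the flow concrete, here is a minimal sketch of what a model_infer.cc-style program does, written against the TensorRT 7 API used in this article. It assumes the resnet50.engine built above, with input binding 0 ("image") and output binding 1 ("predict", 1000 classes); the file and variable names are illustrative, and the project's actual model_infer.cc may differ in detail.

// infer_sketch.cc - illustrative only; see model_infer.cc in the project for the real file
#include <NvInfer.h>
#include <NvInferRuntime.h>
#include <cuda_runtime.h>
#include <opencv2/opencv.hpp>

#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

// Same idea as the checkRuntime macro above: abort on any CUDA runtime error
#define checkRuntime(call)                                                                    \
  do {                                                                                        \
    cudaError_t e = (call);                                                                   \
    if (e != cudaSuccess) {                                                                   \
      std::cerr << "CUDA error " << e << " at " << __FILE__ << ":" << __LINE__ << std::endl; \
      std::abort();                                                                           \
    }                                                                                         \
  } while (0)

// Minimal logger; the TRTLogger from build_model.cc can be reused instead
class SimpleLogger : public nvinfer1::ILogger {
 public:
  void log(Severity severity, const char* msg) noexcept override {
    if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
  }
};

int main() {
  // 1. Read the serialized engine from disk
  std::ifstream fin("resnet50.engine", std::ios::binary);
  std::vector<char> engine_data((std::istreambuf_iterator<char>(fin)), std::istreambuf_iterator<char>());

  // 2. Deserialize it into an ICudaEngine and create an execution context
  SimpleLogger logger;
  nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
  nvinfer1::ICudaEngine* engine = runtime->deserializeCudaEngine(engine_data.data(), engine_data.size(), nullptr);
  nvinfer1::IExecutionContext* context = engine->createExecutionContext();

  // 3. Preprocess the image the same way as preprocessing() in export_onnx.py:
  //    resize to 224x224, BGR->RGB, scale to [0,1], normalize, HWC->CHW
  cv::Mat img = cv::imread("../test.jpg");
  if (img.empty()) { std::cerr << "failed to read ../test.jpg" << std::endl; return -1; }
  cv::resize(img, img, cv::Size(224, 224));
  const float mean[3] = {0.485f, 0.456f, 0.406f};  // RGB order
  const float stdv[3] = {0.229f, 0.224f, 0.225f};
  std::vector<float> input_host(3 * 224 * 224);
  for (int c = 0; c < 3; ++c)
    for (int h = 0; h < 224; ++h)
      for (int w = 0; w < 224; ++w) {
        float pix = img.at<cv::Vec3b>(h, w)[2 - c] / 255.0f;  // cv::Mat stores BGR
        input_host[c * 224 * 224 + h * 224 + w] = (pix - mean[c]) / stdv[c];
      }

  // 4. Allocate device memory; binding 0 is the input "image", binding 1 the output "predict"
  const int num_classes = 1000;
  float* input_device = nullptr;
  float* output_device = nullptr;
  checkRuntime(cudaMalloc(&input_device, input_host.size() * sizeof(float)));
  checkRuntime(cudaMalloc(&output_device, num_classes * sizeof(float)));

  // 5. The engine was built with a dynamic batch, so the actual input shape must be set
  context->setBindingDimensions(0, nvinfer1::Dims4(1, 3, 224, 224));

  // 6. Copy the input to the GPU, run inference, copy the probabilities back
  cudaStream_t stream;
  checkRuntime(cudaStreamCreate(&stream));
  checkRuntime(cudaMemcpyAsync(input_device, input_host.data(), input_host.size() * sizeof(float),
                               cudaMemcpyHostToDevice, stream));
  void* bindings[] = {input_device, output_device};
  context->enqueueV2(bindings, stream, nullptr);
  std::vector<float> prob(num_classes);
  checkRuntime(cudaMemcpyAsync(prob.data(), output_device, num_classes * sizeof(float),
                               cudaMemcpyDeviceToHost, stream));
  checkRuntime(cudaStreamSynchronize(stream));

  // 7. The softmax is already inside the model, so the post-processing is just an argmax
  int max_idx = static_cast<int>(std::max_element(prob.begin(), prob.end()) - prob.begin());
  printf("test_image: ../test.jpg, max_idx: %d, probability: %f\n", max_idx, prob[max_idx]);

  // 8. Release resources in reverse order
  checkRuntime(cudaStreamDestroy(stream));
  checkRuntime(cudaFree(output_device));
  checkRuntime(cudaFree(input_device));
  context->destroy();
  engine->destroy();
  runtime->destroy();
  return 0;
}

As in build_model.cc, the destroy() calls above follow the TensorRT 7 API; on TensorRT 8 these objects would be released with delete instead.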

  2. Compile and execute
mkdir build && cd build
cmake ..
make -j8
./model_infer
  3. Result
test_image: ../test.jpg, max_idx: 285, probability: 0.538250
  4. Possible problems
    Q1: error while loading shared libraries: libcudnn.so.8: cannot open shared object file: No such file or directory
    Solution: the cuDNN library cannot be found at runtime. Either activate a conda environment in which CUDA/cuDNN are already configured, or install cuDNN following the CUDA environment setup and add its library directory to the environment variables (e.g. LD_LIBRARY_PATH).


Origin blog.csdn.net/weixin_36354875/article/details/125596653