Onnxruntime installation and use (with some problems found in practice)

Some basic reference links about onnxruntime:

  1. onnxruntime official documentation
  2. Converting a PyTorch model to ONNX and running inference with onnxruntime (PyTorch official documentation)

1. onnxruntime installation

(1) Using the CPU

If you only need CPU inference, install it with the following command. [If you want to use GPU inference, do not run this command.]

pip install onnxruntime
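
With the CPU-only package, onnxruntime should report only the CPU provider. A quick check (a minimal sketch):

import onnxruntime

# The CPU-only build reports 'CPU' and lists only the CPU execution provider.
print(onnxruntime.get_device())               # expected: 'CPU'
print(onnxruntime.get_available_providers())  # expected: ['CPUExecutionProvider']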

(2) Using the GPU

The installation command is:

pip install onnxruntime-gpu

Notes on installing onnxruntime-gpu:

  • onnxruntime-gpu includes most of the functionality of onnxruntime. If onnxruntime is already installed, uninstall it first (see the commands after this list).
  • Pay close attention to compatibility with your CUDA and cuDNN versions when installing. For the compatibility matrix, see: CUDA Execution Provider
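
For the first note above, a possible command sequence is (a sketch only; adjust to your environment):

pip uninstall onnxruntime
pip install onnxruntime-gpu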

After installation, verify whether onnxruntime can use the GPU:

>>> import onnxruntime
>>> onnxruntime.get_device()
'GPU'  # indicates the GPU is available
>>> onnxruntime.get_available_providers()
['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']

If the GPU is not available, you can add the following two lines to ~/.bashrc:

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
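
To confirm that these paths are visible to the Python process running onnxruntime, a small check like the following can be used (a sketch; /usr/local/cuda is the common default install location and may differ on your machine):

import os

# The CUDA bin and lib64 directories added in ~/.bashrc should show up here
# after opening a new shell (or running `source ~/.bashrc`).
print(os.environ.get('PATH', ''))
print(os.environ.get('LD_LIBRARY_PATH', ''))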

2. Inference with onnxruntime

An example of the inference process is as follows:

import onnxruntime
import numpy as np

device_name = 'cuda:0' # or 'cpu'
print(onnxruntime.get_available_providers())

if device_name == 'cpu':
    providers = ['CPUExecutionProvider']
elif device_name == 'cuda:0':
    providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']
# Create inference session
onnx_model = onnxruntime.InferenceSession('slowfast.onnx', providers=providers)
# Create the input (the shape here corresponds to slowfast's input)
data = np.random.rand(1, 1, 3, 32, 256, 256).astype(np.float32)
# Inference
onnx_input = {onnx_model.get_inputs()[0].name: data}
outputs = onnx_model.run(None, onnx_input)
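
To adapt the example to other models, the expected input and output names and shapes can be read from the session. The snippet below is a sketch of that, plus the usual numerical check against the PyTorch result (torch_output is a hypothetical NumPy array holding the PyTorch model's output for the same input):

# Inspect the model's inputs and outputs to build the feed dict for any model
for inp in onnx_model.get_inputs():
    print(inp.name, inp.shape, inp.type)
for out in onnx_model.get_outputs():
    print(out.name, out.shape, out.type)

# Optional: compare against the PyTorch output for the same input
# (torch_output is a hypothetical array computed with the original PyTorch model).
# np.testing.assert_allclose(torch_output, outputs[0], rtol=1e-3, atol=1e-5)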

3. Comparison of onnxruntime and PyTorch inference time

Here is an article from another blogger: Comparison of ONNXRuntime and PyTorch runtime

Some problems encountered in the course of personal practice:

Recently I have been developing a behavior recognition feature based on the SlowFast model. After finishing the basic development, I wanted to use onnxruntime to improve the model's inference performance. After exporting the ONNX model, I ran inference tests with both torch and onnxruntime (the graphics card is an RTX 3090). The results: (1) on the CPU, the inference times of onnxruntime and torch are almost equal; (2) on the GPU, torch's inference speed improves by roughly 10x, while onnxruntime's inference speed does not improve but drops, ending up nearly 50% slower.
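
For reference, a minimal sketch of the kind of timing loop used on the onnxruntime side (the model file and input shape follow the SlowFast example above; the number of runs is arbitrary):

import time

import numpy as np
import onnxruntime

session = onnxruntime.InferenceSession(
    'slowfast.onnx', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
data = np.random.rand(1, 1, 3, 32, 256, 256).astype(np.float32)
feed = {session.get_inputs()[0].name: data}

# Time several consecutive runs and report the average
n_runs = 10
start = time.perf_counter()
for _ in range(n_runs):
    session.run(None, feed)
print('average inference time: %.4f s' % ((time.perf_counter() - start) / n_runs))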

Cause Analysis:

  1. With the GPU, onnxruntime's inference speed decreases instead of increasing. I found a related explanation online:

    [From: https://github.com/PaddlePaddle/PaddleOCR/issues/5445]

    "This is related to ONNX's execution strategy. Since there are a large number of shape and constant operations in the model, these op calculations must be executed on the CPU in onnx. In order to avoid data copying, onnx puts the entire operation before and after the network structure in the The cpu is used, which leads to a very slow prediction speed of the recognition model."
    "It needs to be modified by onnx, and this is the only way to do it for the time being."

    The above explanation is unverified and is provided for reference only; corrections are welcome. A profiling sketch that can help check which operators run on the CPU is given after this list.

  2. Compared with torch, onnxruntime does not improve inference performance.
    Based on the explanation above, my guess is that onnxruntime currently cannot accelerate every model, or that for most models it only achieves acceleration on the CPU; the actual speedup also depends on the model. So far I have only tested the SlowFast model, and I will test other models later to verify this guess. (Of course, experienced readers are welcome to tell me the answer directly.)
    (To be continued...)
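
To check how many operators actually fall back to the CPU, onnxruntime's built-in profiler can be enabled; the sketch below writes a JSON trace that records, for each node, the execution provider it ran on (file name and input shape follow the SlowFast example above):

import numpy as np
import onnxruntime

# Enable profiling so per-node execution information is written to a JSON file
options = onnxruntime.SessionOptions()
options.enable_profiling = True
session = onnxruntime.InferenceSession(
    'slowfast.onnx', options,
    providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])

data = np.random.rand(1, 1, 3, 32, 256, 256).astype(np.float32)
session.run(None, {session.get_inputs()[0].name: data})

# end_profiling() stops profiling and returns the path of the generated trace file
print(session.end_profiling())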


2022.3.18 update

Later, I continued to learn TensorRT and tried converting the ONNX model to TensorRT for inference acceleration. During that process I found that TensorRT performs one inference before the inference speed is measured, a step called warm-up; the reason is probably that the model needs to be loaded from cache during the first inference. Looking back, I added a warm-up step to the PyTorch and ONNX tests as well and then compared their inference speed again. This time the GPU inference speed of onnxruntime was slightly higher than that of PyTorch, and TensorRT was much faster than both. A minimal warm-up sketch is given below.
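
A minimal sketch of adding the warm-up to the timing loop from section 3 (assuming the session and feed objects from that sketch already exist):

import time

# Warm-up: one untimed run pays the one-off initialization cost
session.run(None, feed)

# Only the runs after the warm-up are timed
n_runs = 10
start = time.perf_counter()
for _ in range(n_runs):
    session.run(None, feed)
print('average inference time after warm-up: %.4f s' % ((time.perf_counter() - start) / n_runs))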
If you are interested, you can refer to my blog: Using TensorRT to accelerate PyTorch model inference
