Switching between CPU and GPU during onnxruntime inference, and making onnx inputs and outputs dynamic

Foreword

As an intermediate format, an onnx model generally runs faster than direct PyTorch inference, and the inference code is simpler because it does not need to load the various network definitions. Recently, some projects ran out of GPU memory, so I needed to switch between CPU and GPU during onnxruntime inference, letting some models run on the CPU and others on the GPU. Many of the articles I checked claim that onnxruntime cannot pin inference to a specific GPU or CPU the way PyTorch can, and that you can only offload everything to the GPU or everything to the CPU. That did not sound right to me: a quick look at the source code shows that there are parameters for choosing between CPU and GPU, and using them is very simple. Here I record some of the pitfalls I ran into and share them with anyone who needs them.

onnxruntime CPU/GPU switching

Looking into the source code, I found options such as CUDAExecutionProvider and CPUExecutionProvider. With an older version of onnxruntime-gpu, however, selecting "CPUExecutionProvider" still used GPU resources. I then noticed a warning saying that from version 1.10 onward you must pass the providers parameter when building an inference session in order to choose between CPU and GPU. After installing onnxruntime-gpu==1.10.0, the CPU can be used even with the GPU build of onnxruntime:

import onnxruntime as rt
# Run this model on the CPU, even under the GPU build of onnxruntime.
sess = rt.InferenceSession(MODEL_PATH, providers=['CPUExecutionProvider'])
# To run on the GPU instead: providers=['CUDAExecutionProvider', 'CPUExecutionProvider']
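
For the scenario from the foreword, with some models on the GPU and others on the CPU in the same process, a minimal sketch is given below. The file names detector.onnx and classifier.onnx and the input shape are placeholders for illustration, not part of the original post:

import numpy as np
import onnxruntime as rt

# Hypothetical model files; substitute your own paths.
gpu_sess = rt.InferenceSession('detector.onnx',
                               providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
cpu_sess = rt.InferenceSession('classifier.onnx',
                               providers=['CPUExecutionProvider'])

# Dummy input only to show the call pattern; use each model's real input shape.
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
gpu_out = gpu_sess.run(None, {gpu_sess.get_inputs()[0].name: x})
cpu_out = cpu_sess.run(None, {cpu_sess.get_inputs()[0].name: x})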

Note: with newer onnxruntime versions and dynamic-size inference, a warning is issued whenever the shape produced at inference time does not match the output shape declared in the model file, and inference becomes slower. In that case, simply change the output shape declared in the model itself.
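
To check whether this applies to your model, compare the shape declared in the file with the shape the session actually produces. A small sketch, assuming a made-up input shape and the my.onnx file name used in the next section:

import numpy as np
import onnxruntime as rt

sess = rt.InferenceSession('./my.onnx', providers=['CPUExecutionProvider'])
inp = sess.get_inputs()[0]
out = sess.get_outputs()[0]
print('declared output shape:', out.shape)

# Dummy data; replace the shape with the one your model actually expects.
x = np.random.rand(1, 3, 512, 512).astype(np.float32)
result = sess.run([out.name], {inp.name: x})[0]
print('actual output shape:', result.shape)  # a mismatch here is what triggers the warning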

Modifying the onnx input/output size to be dynamic

Strictly speaking, the inputs/outputs should be set to static or dynamic when the onnx model is exported. But if the original model is no longer available and only the onnx file remains, the method below quickly converts it to dynamic inputs and outputs: just change each dimension of the relevant input or output that needs to be dynamic to "?".

import onnx

file_path = './my.onnx'
model = onnx.load(file_path)

# Uncomment to also make the first input's dim 0 (typically the batch dimension) dynamic:
# model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = '?'

# Make dims 2 and 3 of the first output dynamic (height and width for an NCHW output).
model.graph.output[0].type.tensor_type.shape.dim[2].dim_param = '?'
model.graph.output[0].type.tensor_type.shape.dim[3].dim_param = '?'

onnx.save(model, './my_dynamic.onnx')
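
To confirm the edit took effect, you can reload the saved file, run the onnx checker, and print the output dimensions. A short sketch reusing the file name from the snippet above:

import onnx
import onnxruntime as rt

model = onnx.load('./my_dynamic.onnx')
onnx.checker.check_model(model)  # sanity-check the edited graph

# The edited dims should now show '?' instead of fixed sizes.
for out in model.graph.output:
    dims = [d.dim_param or d.dim_value for d in out.type.tensor_type.shape.dim]
    print(out.name, dims)

# The modified model should still build an inference session as usual.
sess = rt.InferenceSession('./my_dynamic.onnx', providers=['CPUExecutionProvider'])
print([o.shape for o in sess.get_outputs()])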

Follow up:

Every time I have to fight this posting assistant it looks a bit ridiculous. The only issue is that I don't have enough words, so why is CSDN's check so tedious, and why is the word-count weight set so high? It's exhausting. Let me see how many words it takes to pass the check, and how many people in the comments are complaining about this broken detection system while the product manager pretends not to see it.

 

Origin blog.csdn.net/qq_36276587/article/details/126666203