A Summary of My Experience Deploying onnxruntime-gpu on a Tencent Cloud Server

Foreword

Some projects need onnxruntime-gpu for inference. I assumed that, as on Windows, onnxruntime-gpu could be installed directly once CUDA is present, but it turned out to be surprisingly troublesome, so I am sharing this article to help those who come after me.

Environment

A GPU compute instance with an NVIDIA V100.
When installing the system I picked the highest versions available:
Ubuntu 20.04
CUDA 11.0.3
cuDNN 8.5.0
Check from the command line with nvidia-smi:
(screenshot: nvidia-smi output)
You can see that CUDA is indeed installed.

Error log

1. [W:onnxruntime:Default, onnxruntime_pybind_state.cc:541 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.

This is a version-incompatibility error caused by installing directly with:

pip install onnxruntime-gpu

The version correspondence is shown in the following table:
(screenshot: onnxruntime-gpu / CUDA version table)
Because the CUDA version here is 11.0.3, only onnxruntime-gpu==1.8 or 1.7 can be installed.
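The constraint can be expressed as a small lookup. This is a hypothetical sketch that encodes only the one row of the table that matters here (CUDA 11.0.x → onnxruntime-gpu 1.7/1.8, as stated above); consult the linked ONNX Runtime requirements page for the full table.

```python
# Hypothetical sketch: only the table row used in this article is encoded.
CUDA_TO_ORT = {
    "11.0": {"1.7", "1.8"},  # CUDA 11.0.x pairs with onnxruntime-gpu 1.7 or 1.8
}

def ort_matches_cuda(ort_version: str, cuda_version: str) -> bool:
    """Check whether an onnxruntime-gpu version fits the installed CUDA."""
    cuda_mm = ".".join(cuda_version.split(".")[:2])  # "11.0.3" -> "11.0"
    ort_mm = ".".join(ort_version.split(".")[:2])    # "1.8.0"  -> "1.8"
    return ort_mm in CUDA_TO_ORT.get(cuda_mm, set())

print(ort_matches_cuda("1.8.0", "11.0.3"))   # True: this pairing works
print(ort_matches_cuda("1.13.1", "11.0.3"))  # False: too new for CUDA 11.0
```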

2. onnxruntime OSError: libcurand.so.10: cannot open shared object file: No such file or directory

This happens because Tencent Cloud installs CUDA in an unusual location: the dynamic loader cannot find the libraries through the default environment variables, so the paths have to be added manually. The steps below show how.
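Before editing any configuration, a quick way to confirm which shared libraries the loader can and cannot see is to try loading them with ctypes. A diagnostic sketch; the library names are taken from the error messages above:

```python
import ctypes

def can_load(libname: str) -> bool:
    """Return True if the dynamic loader can find and open the library."""
    try:
        ctypes.CDLL(libname)
        return True
    except OSError:
        return False

# Libraries named in the errors above; extend the list as needed.
for lib in ["libcurand.so.10", "libcudart.so.11.0"]:
    print(lib, "->", "found" if can_load(lib) else "NOT found by the loader")
```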

Solution

1. If you keep the default CUDA 11.0.3, run one of the following install commands:

pip install onnxruntime-gpu==1.8.0
or
pip install onnxruntime-gpu==1.7.0

2. Find the CUDA library directory:

find / -name "libcudart.so.11.0"

The -name option takes the file name to search for; replace it as needed depending on which library is missing. On my machine the libraries are in the following directory:
(screenshot: find output showing the library paths)
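If find over the whole filesystem is slow, the same search can be done from Python with os.walk. A stdlib-only sketch; the root directory is an assumption, so narrow it to speed things up:

```python
import os

def find_file(root: str, name: str) -> list:
    """Walk `root` and collect every path whose basename equals `name`."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        if name in files:
            hits.append(os.path.join(dirpath, name))
    return hits

# Equivalent of: find /usr -name "libcudart.so.11.0"
for path in find_file("/usr", "libcudart.so.11.0"):
    print(path)
```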
3. Configure environment variables:

First:

vi /etc/profile

Then press i to enter insert mode and append the following environment-variable lines at the end of the file:

export LD_LIBRARY_PATH=/usr/local/lib/python3.8/dist-packages/nvidia/cudnn/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/lib/python3.8/dist-packages/nvidia/cublas/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/lib/python3.8/dist-packages/nvidia/cuda_runtime/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/lib/python3.8/dist-packages/nvidia/cuda_nvrtc/lib:$LD_LIBRARY_PATH

Adjust the directory locations to match your own system; the paths above are the defaults on my machine. After editing, the file looks like this:

(screenshot: edited /etc/profile)
Press Esc to leave insert mode, then type :wq and press Enter to save and exit.
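Each export line prepends one directory, so the directory exported last ends up searched first. A small sketch of how the final LD_LIBRARY_PATH value is assembled (directory names shortened for illustration):

```python
def prepend_dirs(existing: str, dirs: list) -> str:
    """Mimic a sequence of `export LD_LIBRARY_PATH=<dir>:$LD_LIBRARY_PATH` lines."""
    value = existing
    for d in dirs:  # each export prepends, pushing earlier entries back
        value = d + ":" + value if value else d
    return value

base = "/usr/lib"
exports = ["/nvidia/cudnn/lib", "/nvidia/cublas/lib"]
print(prepend_dirs(base, exports))
# The last-exported directory appears first in the search order.
```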

Then apply the changes:

source /etc/profile

Enter the following command to confirm that the variable took effect:

echo $LD_LIBRARY_PATH

If the output is correct, running the framework and creating a CUDA operator should no longer produce an error.
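As a final check, this sketch asks onnxruntime which execution providers it can actually use (get_available_providers is part of the real onnxruntime API; the function is wrapped so it degrades gracefully if the package is not installed):

```python
def check_cuda_provider() -> str:
    """Report whether onnxruntime can see the CUDAExecutionProvider."""
    try:
        import onnxruntime as ort
    except ImportError:
        return "onnxruntime is not installed"
    providers = ort.get_available_providers()
    if "CUDAExecutionProvider" in providers:
        return "CUDAExecutionProvider is available"
    return "CPU only; available: " + ", ".join(providers)

print(check_cuda_provider())
```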

Origin blog.csdn.net/weixin_43945848/article/details/129194357