Linux (CentOS): install torch 2.0.0 + tensorflow 2.12.0 + NVIDIA driver 530.30.02 + CUDA 12.1.1 + cuDNN 8.9.0 + TensorRT 8.6.0

NVIDIA

wget https://us.download.nvidia.cn/XFree86/Linux-x86_64/530.30.02/NVIDIA-Linux-x86_64-530.30.02.run

If wget cannot connect, open the URL in a browser, download the file, and transfer it to the server.

chmod +x NVIDIA-Linux-x86_64-530.30.02.run

Once the file is executable, run the installer as root; -s selects silent mode:

sh ./NVIDIA-Linux-x86_64-530.30.02.run -s

After the driver is installed, verify it with nvidia-smi:

# nvidia-smi
Wed Apr 26 16:12:43 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1080 Ti      Off| 00000000:01:00.0 Off |                  N/A |
| 20%   39C    P0               56W / 250W|      0MiB / 11264MiB |      2%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
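
The check above can also be scripted. The following is a minimal sketch, not part of the original procedure: it reads the nvidia-smi banner from standard input and compares the reported driver version with the 530.30.02 release installed in this guide. The function names driver_version_from_banner and check_driver are made up for illustration.

```shell
# Extract the driver version from nvidia-smi banner text read on stdin.
driver_version_from_banner() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Compare the parsed version against the one this guide installs.
check_driver() {
    expected="530.30.02"
    actual="$(driver_version_from_banner)"
    if [ "$actual" = "$expected" ]; then
        echo "driver OK: $actual"
    else
        echo "driver mismatch: expected $expected, got '$actual'"
        return 1
    fi
}

# Usage on a machine where the driver is installed:
#   nvidia-smi | check_driver
```

Reading from standard input keeps the parsing separate from the call to nvidia-smi, so the same function can be tried against saved output.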

CUDA

wget https://developer.download.nvidia.com/compute/cuda/12.1.1/local_installers/cuda_12.1.1_530.30.02_linux.run

If wget cannot connect, open the URL in a browser, download the file, and transfer it to the server.

sudo sh cuda_12.1.1_530.30.02_linux.run

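The installer opens an interactive screen. Make the following 2 adjustments there:

1. Type accept to accept the license agreement.
2. Deselect the Driver entry (clear the [X] next to Driver), because the driver was already installed in the previous section.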
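
For unattended installs, the two interactive choices above can probably be replaced by command-line flags. The sketch below assumes the runfile's --silent and --toolkit options, which install only the toolkit and skip the bundled driver. The wrapper function install_cuda_toolkit_only and the DRY_RUN variable are hypothetical names added for illustration.

```shell
# Install only the CUDA toolkit from a runfile, skipping the bundled driver.
# Set DRY_RUN=1 to print the command instead of running it.
install_cuda_toolkit_only() {
    installer="$1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "sudo sh $installer --silent --toolkit"
    else
        sudo sh "$installer" --silent --toolkit
    fi
}

# Usage:
#   install_cuda_toolkit_only cuda_12.1.1_530.30.02_linux.run
```

The dry-run switch lets you inspect the exact command before anything is installed.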

Verify the installation:

# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
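
If nvcc is not found, the toolkit's bin directory is most likely missing from PATH, since the runfile installer typically only prints a reminder to add it. Here is a small sketch of how the variables could be set, for example in ~/.bashrc. It assumes the default /usr/local/cuda prefix. The helper prepend_path is a made-up name that avoids adding the same entry twice.

```shell
# Prepend a directory to a colon-separated path list unless already present.
# $1 = directory, $2 = current list value; prints the new value.
prepend_path() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *) printf '%s\n' "$1${2:+:$2}" ;;
    esac
}

# Assumption: CUDA was installed to the default /usr/local/cuda prefix.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"
PATH="$(prepend_path "$CUDA_HOME/bin" "$PATH")"
LD_LIBRARY_PATH="$(prepend_path "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")"
export CUDA_HOME PATH LD_LIBRARY_PATH
```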

cuDNN

Download cuDNN from https://developer.nvidia.com/rdp/cudnn-download, then extract it on the server. The archive is xz-compressed, so the gzip flag -z must not be used:

tar -xvf cudnn-linux-x86_64-8.9.0.131_cuda12-archive.tar.xz

Run the following commands one by one to install cuDNN. cuDNN 8 splits its API across several cudnn*.h headers, so copy all of them, and use -P so the library symlinks are preserved:

sudo cp cudnn-linux-x86_64-8.9.0.131_cuda12-archive/include/cudnn*.h /usr/local/cuda/include/
sudo cp -P cudnn-linux-x86_64-8.9.0.131_cuda12-archive/lib/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn*.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
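
To confirm which cuDNN version ended up in the CUDA directory, you can read the version macros from the header. This is a sketch under one assumption: that cudnn_version.h, which ships in the archive's include directory next to cudnn.h, has also been copied to /usr/local/cuda/include. The function name cudnn_version_from_header is invented for illustration.

```shell
# Print MAJOR.MINOR.PATCH from cudnn_version.h content read on stdin.
cudnn_version_from_header() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major != "") print major "." minor "." patch }
    '
}

# Usage:
#   cudnn_version_from_header < /usr/local/cuda/include/cudnn_version.h
```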

Because the CUDA files are large, symbolic links (soft links) can be used instead of copies where appropriate.

torch

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
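
Note that pytorch-cuda=11.8 asks for a CUDA 11.8 runtime even though the system toolkit is 12.1.1. This works because the "CUDA Version" shown by nvidia-smi is the newest runtime the driver supports, and older runtimes are accepted. The sketch below illustrates that comparison. The function version_le is a hypothetical helper, and it relies on sort -V from GNU coreutils.

```shell
# Succeeds when version $1 is less than or equal to version $2.
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

driver_cuda="12.1"    # "CUDA Version" reported by nvidia-smi in this guide
runtime_cuda="11.8"   # value passed to pytorch-cuda=

if version_le "$runtime_cuda" "$driver_cuda"; then
    echo "CUDA runtime $runtime_cuda is supported by the driver (max $driver_cuda)"
else
    echo "CUDA runtime $runtime_cuda is newer than the driver supports ($driver_cuda)"
fi
```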

tensorflow

The official TensorFlow website gives the following installation procedure. Note that it installs its own cudatoolkit 11.8.0 and cuDNN 8.6.0 into the conda environment:

conda install -c conda-forge cudatoolkit=11.8.0
python3 -m pip install nvidia-cudnn-cu11==8.6.0.163 tensorflow==2.12.*
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
# Verify install:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

If the last command succeeds, it prints:

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

TensorRT

Prerequisites

pip install cuda-python
python3 -m pip install --upgrade tensorrt

Download the tar package from https://developer.nvidia.com/nvidia-tensorrt-8x-download and extract it:

tar -xzvf TensorRT-8.6.0.12.Linux.x86_64-gnu.cuda-12.0.tar.gz

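Add the absolute path of TensorRT's lib directory to the environment variable LD_LIBRARY_PATH (run this from the directory where the archive was extracted, so that $(pwd) expands to the absolute prefix):

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(pwd)/TensorRT-8.6.0.12/lib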

Install the Python TensorRT wheel file. The wheels are in the python directory of the extracted archive, and cp310 in the file name means the wheel targets Python 3.10. All commands below are run from the directory where the archive was extracted:

python3 -m pip install TensorRT-8.6.0.12/python/tensorrt-8.6.0-cp310-none-linux_x86_64.whl

(Optional) Install the TensorRT lean and dispatch runtime wheel files:

python3 -m pip install TensorRT-8.6.0.12/python/tensorrt_lean-8.6.0-cp310-none-linux_x86_64.whl
python3 -m pip install TensorRT-8.6.0.12/python/tensorrt_dispatch-8.6.0-cp310-none-linux_x86_64.whl

(Optional) Install the graphsurgeon wheel file:

python3 -m pip install TensorRT-8.6.0.12/graphsurgeon/graphsurgeon-0.4.6-py2.py3-none-any.whl

(Optional) Install the onnx-graphsurgeon wheel file:

python3 -m pip install TensorRT-8.6.0.12/onnx_graphsurgeon/onnx_graphsurgeon-0.3.12-py2.py3-none-any.whl
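
A common stumbling block with the TensorRT wheels is a mismatch between the cpXY tag in the file name and the Python interpreter in use. As a rough sketch, the tag can be pulled out of the file name and compared with the running interpreter before installing. Both function names below are made up for illustration.

```shell
# Print the CPython tag (for example cp310) embedded in a wheel file name.
# Prints nothing for pure-Python wheels such as py2.py3 ones.
wheel_python_tag() {
    basename "$1" | sed -n 's/.*-\(cp[0-9][0-9]*\)-.*/\1/p'
}

# Print the tag of the interpreter that would run the install.
# Only call this where python3 is available.
current_python_tag() {
    python3 -c 'import sys; print("cp%d%d" % sys.version_info[:2])'
}

# Usage:
#   wheel=tensorrt-8.6.0-cp310-none-linux_x86_64.whl
#   [ "$(wheel_python_tag "$wheel")" = "$(current_python_tag)" ] \
#       || echo "wheel $wheel does not match $(current_python_tag)"
```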

Final test

Run the test script from my earlier post. If everything is installed correctly, it prints:

torch.__version__	 2.0.0+cu118
torch.version.cuda	 11.8
torch.cuda.is_available	 True
torch.cuda.get_device_name	NVIDIA GeForce GTX 1080 Ti
torch.cuda.device_count	1
-------------------------------------------------------------
tf.__version__	 2.12.0
tf.config.list_physical_devices	 True
tf.test.is_built_with_cuda	 True
  • If the following informational message appears
    I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
    you can clear it with the following command (sysfs values are reset at reboot, so the command has to be run again after a restart)
for a in /sys/bus/pci/devices/*; do echo 0 | sudo tee -a "$a/numa_node"; done
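
Before or after applying the fix, it can help to see which devices actually report a negative NUMA node. The following is a minimal sketch; the function name list_negative_numa is hypothetical, and the directory argument exists only so the function can be tried on a copy of the sysfs layout.

```shell
# List PCI devices whose numa_node is negative under a sysfs-style root.
# $1 = devices directory (defaults to /sys/bus/pci/devices).
list_negative_numa() {
    root="${1:-/sys/bus/pci/devices}"
    for dev in "$root"/*; do
        [ -r "$dev/numa_node" ] || continue
        value="$(cat "$dev/numa_node")"
        if [ "$value" -lt 0 ]; then
            echo "$(basename "$dev") $value"
        fi
    done
}

# Usage:
#   list_negative_numa
```

An empty result after running the loop above suggests the message should no longer appear.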

Origin blog.csdn.net/weixin_46398647/article/details/130387759