Reading this one article is enough - CUDA, cuDNN, CUDA Toolkit and PyTorch on the Ubuntu system

1. Basic concepts

1.1 nvidia discrete graphics card

A discrete graphics card is a graphics card that exists as an independent board and can be freely plugged into or removed from a motherboard with a graphics card slot. A discrete card has its own video memory, does not occupy system memory, and is technically ahead of integrated graphics, so it provides better display quality and performance. As an important part of the computer, the graphics card matters a great deal to people who play games or do professional graphic design. Historically, the main graphics chip suppliers for consumer cards have been ATI and NVIDIA.

Ubuntu needs the NVIDIA driver installed in order to use an NVIDIA card. Installing the driver lets the system correctly recognize the graphics card, perform 2D/3D rendering, and get the full performance out of the card.

1.2 CUDA

CUDA (Compute Unified Device Architecture) is a computing platform launched by the graphics card manufacturer NVIDIA. CUDA™ is a general-purpose parallel computing architecture introduced by NVIDIA that enables GPUs to solve complex computing problems.

A computer can have two CUDA versions at the same time: the driver CUDA (used by the display driver) and the runtime CUDA (used to accelerate deep learning).

After installing the NVIDIA driver, type nvidia-smi in a terminal and a status table will appear; its header shows something like CUDA Version: 11.4. The CUDA shown here is the driver CUDA. In other words, the computer already has CUDA, but it is only used by the display driver and cannot by itself accelerate deep learning. If you want to accelerate deep learning, you also need to install a runtime CUDA. There are two ways to do that: install it inside a conda environment, or install the CUDA Toolkit; both are explained in detail later.
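
For example (a hedged sketch; the exact numbers depend on your driver), checking the driver CUDA looks like this:

nvidia-smi
# The header of the table looks roughly like:
# | NVIDIA-SMI 470.xx.xx    Driver Version: 470.xx.xx    CUDA Version: 11.4 |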

1.3 CUDA Toolkit (nvidia)

CUDA Toolkit (nvidia) is the complete installation package for CUDA. It offers optional components such as the NVIDIA driver and the development tools needed to write CUDA programs, including the CUDA compiler, IDE integration, debugger, and the various CUDA libraries together with their header files. Simply put, CUDA Toolkit includes both the driver CUDA and the runtime CUDA.

Therefore, "installing CUDA" on Ubuntu in the usual sense actually means installing the CUDA Toolkit. However, since we usually install the NVIDIA graphics driver first (which already brings the driver CUDA), when installing the CUDA Toolkit we deselect the Driver entry (remove the X in front of it) so that the driver is not installed again, because a driver CUDA is already present on the machine.

A complete CUDA Toolkit includes the following components (a small compile example follows the list):

  1. Driver: the graphics card driver

  2. Toolkit: tools such as the profiler and debugger, plus scientific and utility libraries

  • cudart: CUDA Runtime

  • cudadevrt: CUDA device runtime

  • cupti: CUDA Profiling Tools Interface

  • nvml: NVIDIA Management Library

  • nvrtc: CUDA runtime compilation

  • cublas: BLAS (Basic Linear Algebra Subprograms)

  • cublas_device: cuBLAS device-side (kernel) interface

  3. CUDA Samples: code samples demonstrating how to use the various CUDA and library APIs

  4. CUDA Documentation
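
To see the Toolkit pieces in action, here is a minimal sketch (assuming the Toolkit is installed and nvcc is on your PATH; the file name hello.cu is just an example): write a trivial kernel, compile it with nvcc, and run it.

# Write a tiny CUDA program to a file
cat > hello.cu <<'EOF'
#include <cstdio>
__global__ void hello() { printf("hello from the GPU\n"); }
int main() {
    hello<<<1, 1>>>();          // launch one thread on the GPU
    cudaDeviceSynchronize();    // wait for the kernel (and its printf) to finish
    return 0;
}
EOF
# nvcc comes from the Toolkit, not from the driver
nvcc hello.cu -o hello
./hello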

1.4 CUDA Toolkit (Pytorch)

We often use the following commands to install cudatoolkit inside a PyTorch environment; here we call this CUDA Toolkit (pytorch):

conda install pytorch torchvision cudatoolkit=11.3 -c pytorch
pip3 install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html -i https://pypi.tuna.tsinghua.edu.cn/simple

Torchvision is PyTorch's computer vision package. It provides useful tools and functions to load and preprocess image and video datasets, perform data augmentation, build and train deep neural network models, and carry out model evaluation, visualization, and more.

Torchaudio is PyTorch's audio signal processing package. It provides useful tools and functions to load and preprocess audio datasets, extract sound features, build and train deep neural network models, and carry out model evaluation, visualization, and more.

This is an incomplete CUDA installation package: it mainly contains the dynamic link libraries that CUDA-related functions depend on, and it does not install any driver. My understanding is that only the runtime CUDA is installed. With this installation method, different cudatoolkit versions can be installed in different conda environments to suit different deep learning code.
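
For example (a sketch; the environment names and versions are only illustrative), two conda environments can hold two different runtime CUDA versions side by side:

conda create -n torch_cu113 python=3.8
conda activate torch_cu113
conda install pytorch torchvision cudatoolkit=11.3 -c pytorch   # runtime CUDA 11.3, visible only in this env

conda create -n torch_cu102 python=3.8
conda activate torch_cu102
conda install pytorch torchvision cudatoolkit=10.2 -c pytorch   # runtime CUDA 10.2, visible only in this env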

1.5 cuDNN

cuDNN is a CUDA-based GPU acceleration library for deep learning; with it, deep learning computations can run on the GPU. When cudatoolkit (pytorch) is installed, cuDNN is installed automatically. When installing cudatoolkit (nvidia), you need to install cuDNN yourself.
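
A quick way to confirm this (a sketch, run inside the conda environment where PyTorch was installed):

python -c "import torch; print(torch.backends.cudnn.is_available()); print(torch.backends.cudnn.version())"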

1.6 pytorch

PyTorch is a CUDA-based deep learning framework, so the PyTorch version must match the CUDA Toolkit version. Since we often run other people's code, in practice we usually choose the CUDA version according to the PyTorch version the code requires.

2. How to choose among all of these

2.1 nvcc -V and nvidia-smi

Many people wonder why the CUDA version numbers printed by nvcc -V and nvidia-smi are different. After reading the basic concepts above this should be clear: one shows the runtime CUDA (for computation) and the other shows the driver CUDA (for the display), and we usually install the two separately, in two separate installations.
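
A quick way to put the two versions side by side (assuming both the driver and a toolkit are installed):

nvidia-smi | grep "CUDA Version"    # driver CUDA: the highest version the installed driver supports
nvcc -V | grep release              # runtime CUDA: the toolkit version used to build and run programs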

Can the two be made to show the same version? Yes: do a one-time installation of the full CUDA Toolkit (nvidia), i.e. keep the X in front of Driver selected when installing the CUDA Toolkit. If an NVIDIA driver was installed before this, it has to be uninstalled first, otherwise the installation will fail, because the two driver CUDAs conflict.

What is the relationship between the two versions? Usually, if the graphics driver is installed first, the version shown by nvidia-smi is higher than the one shown by nvcc -V. When choosing a runtime CUDA, you need to check the graphics card model and the PyTorch version you need; there is no single strictly required version. In other words, each conda environment on a computer can have a different runtime CUDA version.

2.2 Accelerating only deep learning

If your goal is only to use CUDA to accelerate deep learning: different codebases often require different CUDA versions, and having only one version installed system-wide via cudatoolkit (nvidia) is inconvenient. So for deep learning the installation strategy is:

nvidia graphics card driver + cudatoolkit (pytorch)

You do not need to install cudatoolkit (nvidia) on your computer to accelerate deep learning. If you need a new CUDA version for acceleration, just create a new conda environment and install it there.

conda install pytorch torchvision cudatoolkit=11.3 -c pytorch
# or
pip3 install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html -i https://pypi.tuna.tsinghua.edu.cn/simple
# Installing PyTorch with conda is simpler and more convenient, while pip is more flexible and freer.

2.3 Using everything

If you want to use CUDA not only in PyTorch but also in C++, you need to install cudatoolkit (nvidia). The general steps are: 1. install the NVIDIA driver; 2. install cudatoolkit (deselect the driver); 3. conda install cudatoolkit (pytorch).

This setup can be understood as having one driver CUDA and multiple runtime CUDAs on your Ubuntu system. In that situation, running code in the PyTorch environment may report an error, which is easy to understand: multiple runtime CUDAs conflict with each other. For example:

OSError: /home/cxl/anaconda3/envs/yolo/lib/python3.8/site-packages/nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtGetStatusString, version libcublasLt.so.11

The simplest fix is to comment out the CUDA environment configuration in ~/.bashrc and open a new terminal to run the code again.

1. Open bashrc
sudo gedit ~/.bashrc
2. Comment out the CUDA environment configuration
#export PATH=$PATH:/usr/local/cuda-11.3/bin
#export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.3/lib64
#export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/cuda-11.3/lib64
3. source ~/.bashrc

When running other code, restore the CUDA environment configuration in bashrc. You can think of the CUDA in bashrc as the global CUDA, and the CUDA inside a conda environment as a local CUDA.
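
To check what a given terminal and environment will actually pick up (a sketch; adjust the paths and environment to your own), you can look at the global library path and at the runtime CUDA the environment's own PyTorch expects:

echo $LD_LIBRARY_PATH                                  # is the global /usr/local/cuda-11.3 path still on it?
python -c "import torch; print(torch.version.cuda)"    # the runtime CUDA this environment's torch was built for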

3. Installation

3.1 Graphics driver

How to install the graphics card driver has already been explained in detail in my previous blog post:

https://blog.csdn.net/HUASHUDEYANJING/article/details/128838393?spm=1001.2014.3001.5502

3.2 cudatoolkit(nvidia)

1. Select a cudatoolkit version from the NVIDIA official website

https://developer.nvidia.com/cuda-toolkit-archive

In general, choose a version no higher than the one shown by nvidia-smi, and select the runfile installer.

2. Install cuda

First install some dependencies

sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev

Install CUDA:

wget https://developer.download.nvidia.com/compute/cuda/11.4.4/local_installers/cuda_11.4.4_470.82.01_linux.run
# If the download is interrupted, resume it from where it stopped with -c, e.g.:
wget -c https://developer.download.nvidia.com/compute/cuda/11.3.0/local_installers/cuda_11.3.0_465.19.01_linux.run
# If the installer reports a CUDA segmentation fault (core dumped), it is usually caused by a stack overflow
ulimit -a                  # you will find the stack size is 8192
ulimit -s unlimited        # change the stack limit to unlimited
sudo sh cuda_11.4.4_470.82.01_linux.run
# If you see "Ensure there is enough space in /tmp and that the installation package is not corrupt",
# create a tmp directory of your own and point the installer at it:
sudo sh cuda_10.0.130_410.48_linux.run --tmpdir=[YOUR TMP DIR]
In the installer menu at this step, remove the X in front of Driver so the driver is not installed again.
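
If you prefer a non-interactive install that skips the driver entirely, the runfile also accepts command-line flags (a sketch; run the runfile with --help to confirm the exact options for your version):

sudo sh cuda_11.4.4_470.82.01_linux.run --silent --toolkit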

3. configure bashrc

1. Open bashrc
sudo gedit ~/.bashrc
2. Add the CUDA environment configuration
export PATH=$PATH:/usr/local/cuda-11.3/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.3/lib64
export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/cuda-11.3/lib64
3. source ~/.bashrc
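
After sourcing ~/.bashrc, a quick check that the toolkit is visible (the version number is whatever you installed):

which nvcc          # should point into /usr/local/cuda-11.3/bin
nvcc -V             # should report the runtime CUDA version you just installed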

3.3 cudatoolkit(pytorch)

Install directly with the following commands:

conda create -n torch
conda activate torch
conda install pytorch torchvision cudatoolkit=11.3 -c pytorch
# or install in one step with pip
pip3 install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html -i https://pypi.tuna.tsinghua.edu.cn/simple
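
Once the environment is ready, a minimal sanity check (a sketch) that PyTorch sees the GPU:

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
python -c "import torch; print(torch.cuda.get_device_name(0))"   # only meaningful if is_available() printed True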

Original post: https://blog.csdn.net/HUASHUDEYANJING/article/details/128868737