Ubuntu 18 and 22: installing the NVIDIA driver, CUDA, CUDNN and PyTorch

Foreword

There are many tutorials online for installing PyTorch. Here I record and share my process and experience installing the NVIDIA driver, CUDA, CUDNN and PyTorch on my two laptops.

To install the GPU version of PyTorch, you need to install the NVIDIA driver, CUDA, CUDNN and the torch library. The CUDA and CUDNN versions must match each other, each torch version requires particular CUDA versions, and the NVIDIA driver determines the highest CUDA version you can install. So these versions cannot be chosen at random.

My installed versions are:

Lenovo GTX1050 notebook:
Ubuntu18 + driver 470 + CUDA 10.0 + CUDNN for 10.0 + torch 1.0.0 + python 3.6
Ubuntu18 + driver 470 + CUDA 10.2 + CUDNN for 10.2 + torch 1.8.0 + python 3.7

Mechrevo RTX 4050 notebook:
Ubuntu22 + driver 525 + CUDA 11.8 + CUDNN for 11.8 + torch 2.0.0 + python 3.8
Ubuntu22 + driver 525 + CUDA 11.8 + CUDNN for 11.8 + torch 1.12.0 + python 3.8

1. How to choose a version

Constraint 1: Start from torch: first decide which torch version you want to install. Each PyTorch version supports specific CUDA versions. This constraint is not strictly mandatory, since a higher CUDA version can also work. For example, with CUDA 11.8 installed, I could still install torch 1.12.1 normally using the CUDA 11.6 install command below:

[Figure: PyTorch install commands for previous versions, including the CUDA 11.6 command for torch 1.12.1]

Constraint 2: Check the highest CUDA version your NVIDIA driver supports; nvidia-smi shows it as "CUDA Version" in the top-right of its output. (The driver has to be installed first; driver installation is covered in section 2.)

[Figure: highest CUDA version supported by the driver]
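The driver's limit is the "CUDA Version" field in the banner that nvidia-smi prints. A minimal Python sketch for reading it programmatically, assuming the banner layout shown in the comment (the function names are my own, and the exact layout may vary between driver releases):

```python
import re
import subprocess
from typing import Optional, Tuple

# Assumed banner layout, e.g.:
# "| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0 |"
_HEADER_RE = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*(\d+)\.(\d+)")


def parse_nvidia_smi_header(text: str) -> Optional[Tuple[str, Tuple[int, int]]]:
    """Return (driver_version, (cuda_major, cuda_minor)), or None if no banner is found."""
    match = _HEADER_RE.search(text)
    if match is None:
        return None
    driver, major, minor = match.groups()
    return driver, (int(major), int(minor))


def read_driver_limits() -> Optional[Tuple[str, Tuple[int, int]]]:
    """Run nvidia-smi (must be on PATH, driver installed) and parse its banner."""
    out = subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=True).stdout
    return parse_nvidia_smi_header(out)
```

The returned CUDA tuple is the upper bound for the toolkit you install in section 3.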

Constraint 3: The CUDA version must also support your GPU's compute capability; otherwise torch reports the following error:

# Here sm_89 means the RTX 4050 has compute capability 8.9, which this CUDA version does not support
NVIDIA GeForce RTX 4050 Laptop GPU with CUDA capability sm_89 is not compatible with the current PyTorch installation

First, check your GPU's compute capability:

[Figure: looking up the GPU's compute capability]

Then look up which CUDA versions support that compute capability (in practice, you can simply search Baidu for your GPU model + CUDA and see which CUDA version other people installed).
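Once torch is installed, you can also check constraint 3 directly by comparing the GPU's compute capability with the architectures the installed build was compiled for. A minimal sketch, assuming CUDA's binary-compatibility rule (a cubin built for sm_XY runs on a device of compute capability X.Z when Z >= Y) and ignoring PTX (compute_XX) entries; build_supports_gpu is a hypothetical helper, not a PyTorch API:

```python
from typing import Iterable, Tuple


def build_supports_gpu(capability: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """Return True if a build compiled for `arch_list` (e.g. ["sm_75", "sm_86"])
    has a cubin that can run on a GPU with compute capability `capability`,
    e.g. (8, 9) for an RTX 4050."""
    major, minor = capability
    for arch in arch_list:
        if not arch.startswith("sm_"):
            continue  # compute_XX (PTX) entries are ignored in this sketch
        sm = int(arch[3:])  # "sm_86" -> 86
        # Same major architecture, and the cubin's minor is not newer than the device's.
        if sm // 10 == major and sm % 10 <= minor:
            return True
    return False
```

With torch installed, call it as build_supports_gpu(torch.cuda.get_device_capability(0), torch.cuda.get_arch_list()); both are existing torch.cuda functions. A False result corresponds to the sm_89 error above.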

Finally, pick a CUDA version that satisfies all three constraints, then install CUDNN and torch to match that CUDA version. If the constraints conflict, satisfy constraints 2 and 3 first. For example, for an RTX 4050, constraints 2 and 3 show that the minimum CUDA version is 11.8. If I want torch 1.8, whose pip install commands do not include CUDA 11.8, I still install CUDA 11.8 to satisfy constraints 2 and 3, and later install torch 1.8 with the install command whose CUDA version is closest to 11.8.
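The selection procedure above can be sketched in Python. This is an illustration under stated assumptions: MIN_CUDA_FOR_CAPABILITY is a small hand-written table (verify it against NVIDIA's documentation), the driver's maximum CUDA version comes from nvidia-smi, and torch_builds lists the CUDA versions that have install commands for the torch release you want:

```python
from typing import Dict, List, Tuple

Version = Tuple[int, int]

# Partial, illustrative table: lowest CUDA toolkit that supports a compute capability.
MIN_CUDA_FOR_CAPABILITY: Dict[Version, Version] = {
    (6, 1): (8, 0),   # Pascal, e.g. GTX 1050
    (7, 5): (10, 0),  # Turing
    (8, 6): (11, 1),  # Ampere (GeForce 30 series)
    (8, 9): (11, 8),  # Ada, e.g. RTX 4050
}


def _as_int(v: Version) -> int:
    return v[0] * 100 + v[1]


def plan_install(capability: Version, driver_max_cuda: Version,
                 torch_builds: List[Version]) -> Tuple[Version, Version]:
    """Return (cuda_toolkit_to_install, cuda_version_of_torch_build_to_use)."""
    low = MIN_CUDA_FOR_CAPABILITY[capability]  # constraint 3: GPU needs at least this
    high = driver_max_cuda                     # constraint 2: driver allows at most this
    if low > high:
        raise ValueError(f"driver supports CUDA up to {high}, but the GPU needs at least {low}")
    in_range = [b for b in torch_builds if low <= b <= high]
    # If a torch build fits, all three constraints hold; otherwise constraints
    # 2 and 3 win and we fall back to the lowest valid toolkit.
    toolkit = max(in_range) if in_range else low
    # Constraint 1 (soft): the build closest to the toolkit, preferring the newer one on a tie.
    build = min(torch_builds, key=lambda b: (abs(_as_int(b) - _as_int(toolkit)), -_as_int(b)))
    return toolkit, build
```

For example, torch 1.12 offers pip builds for CUDA 10.2, 11.3 and 11.6; on an RTX 4050 with driver 525 (CUDA Version 12.0), plan_install((8, 9), (12, 0), [(10, 2), (11, 3), (11, 6)]) returns ((11, 8), (11, 6)): install CUDA 11.8 and use the CUDA 11.6 command, which matches the setup described in section 6.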

2. Install the NVIDIA driver

My GTX 1050 uses nvidia-driver-470, and the RTX 4050 uses nvidia-driver-525.

If your Ubuntu version is older than 18, you must upgrade gcc first; see the reference.

Method 1: simple, but the download is very slow

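#1. First completely uninstall any previous NVIDIA driver (quoted so the shell does not expand the *):
sudo apt-get remove --purge 'nvidia*'
#2. Add the graphics-drivers PPA and update the package lists
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
#3. List the driver versions suitable for your GPU
sudo ubuntu-drivers devices
#4. Install the NVIDIA driver version you want:
sudo apt install nvidia-driver-470
#5. Reboot (this step is important; the driver does not take effect without it):
reboot
#6. After rebooting, check whether the driver is installed
nvidia-smi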
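Step 3 prints one "driver" line per candidate package and marks one of them as recommended. A small Python sketch that picks it out, assuming the output format shown in the code comment; parse_ubuntu_drivers and recommended_driver are hypothetical helpers:

```python
import subprocess
from typing import List, Optional, Tuple

# Assumed line format of `ubuntu-drivers devices`:
# "driver   : nvidia-driver-470 - distro non-free recommended"


def parse_ubuntu_drivers(text: str) -> Tuple[Optional[str], List[str]]:
    """Return (recommended_package, all_nvidia_driver_packages)."""
    recommended = None
    candidates: List[str] = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or key.strip() != "driver":
            continue  # skip vendor/model/modalias lines and the "== ... ==" header
        fields = value.split()
        if not fields or not fields[0].startswith("nvidia-driver-"):
            continue  # e.g. xserver-xorg-video-nouveau
        candidates.append(fields[0])
        if "recommended" in fields:
            recommended = fields[0]
    return recommended, candidates


def recommended_driver() -> Optional[str]:
    """Run the real command; requires the ubuntu-drivers-common package."""
    out = subprocess.run(["ubuntu-drivers", "devices"],
                         capture_output=True, text=True, check=True).stdout
    return parse_ubuntu_drivers(out)[0]
```

The recommended package is not always the one you want (see constraint 2), so treat it as a starting point for step 4.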

Step 4 above can also be done from the Additional Drivers tab of Software & Updates, but then you cannot see the download speed. Method 1 is relatively simple, but the download is really slow: even a package of a few tens of MB takes more than half an hour.

Method 2: you may run into more problems, but if everything goes well the installation is very fast

This method downloads the driver directly from NVIDIA's official website and installs it. Various problems can occur along the way, but I recommend trying it first: if your computer happens to be the lucky one, it works in one step.

Installation reference
Note: when running the driver's .run file, do not add the -no-opengl-files option; otherwise, after installation the graphics settings will not show the NVIDIA card, and the installation is not fully successful.

That is, when installing the run driver file:

# Do not add the -no-opengl-files option
sudo ./NVIDIA-Linux.run -no-x-check


Problem: Unable to load the "nvidia-drm" kernel module

Solution: see the reference. I have installed the driver successfully with both methods.
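When the installer complains about nvidia-drm, it helps to check which NVIDIA kernel modules are actually loaded. A minimal sketch that reads /proc/modules, where the first column is the module name and dashes appear as underscores (so nvidia-drm shows up as nvidia_drm). The module list is an assumption about a typical driver install, and the helper names are hypothetical:

```python
import os
from typing import Dict, Iterable, Set

# Module names as they appear in /proc/modules. nvidia_uvm may only be
# loaded once a CUDA program has run, so it can legitimately be missing.
NVIDIA_MODULES = ("nvidia", "nvidia_modeset", "nvidia_uvm", "nvidia_drm")


def loaded_modules(proc_modules_text: str) -> Set[str]:
    """First whitespace-separated column of each non-empty line is the module name."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def nvidia_module_status(proc_modules_text: str,
                         wanted: Iterable[str] = NVIDIA_MODULES) -> Dict[str, bool]:
    loaded = loaded_modules(proc_modules_text)
    return {name: name in loaded for name in wanted}


if __name__ == "__main__" and os.path.exists("/proc/modules"):
    with open("/proc/modules") as f:
        for name, ok in nvidia_module_status(f.read()).items():
            print(f"{name:15s} {'loaded' if ok else 'NOT loaded'}")
```

This is roughly the information lsmod | grep nvidia shows, which reads the same file.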

3. Install CUDA

CUDA version download

For installing CUDA 10.0, see the reference.

For installing CUDA 10.2, see the reference; CUDA 10.2 and later versions are installed in a similar way.

To uninstall CUDA, see the reference.

Note that at this point you only need to run the installer script with sh, following the tutorial; the environment configuration is covered in section 5.
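The runfile installer puts each toolkit in /usr/local/cuda-X.Y and can also point the /usr/local/cuda symlink at it, so several versions can coexist. A sketch for listing which versions are installed and which one the symlink currently selects (parse_cuda_dirs and scan_cuda_installs are hypothetical helpers, and the cuda-X.Y naming is assumed):

```python
import os
import re
from typing import List, Optional, Tuple

Version = Tuple[int, int]
_CUDA_DIR = re.compile(r"^cuda-(\d+)\.(\d+)$")  # matches e.g. "cuda-11.8"


def parse_cuda_dirs(entries: List[str],
                    link_target: Optional[str]) -> Tuple[List[Version], Optional[Version]]:
    """Return (installed versions in ascending order, version the `cuda` symlink points to)."""
    versions = sorted(
        (int(m.group(1)), int(m.group(2)))
        for m in (_CUDA_DIR.match(e) for e in entries)
        if m
    )
    current = None
    if link_target:
        m = _CUDA_DIR.match(os.path.basename(link_target.rstrip("/")))
        if m:
            current = (int(m.group(1)), int(m.group(2)))
    return versions, current


def scan_cuda_installs(root: str = "/usr/local") -> Tuple[List[Version], Optional[Version]]:
    entries = os.listdir(root) if os.path.isdir(root) else []
    link = os.path.join(root, "cuda")
    target = os.readlink(link) if os.path.islink(link) else None
    return parse_cuda_dirs(entries, target)
```

On the GTX 1050 machine described above, this would be expected to list both 10.0 and 10.2 if both toolkits were kept side by side.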

4. Install CUDNN

Download various versions of CUDNN

Note that the CUDNN version must match the CUDA version:

[Figure: CUDNN releases and the CUDA versions they support]

For installing CUDNN versions below 8.0, see the reference.

For installing CUDNN 8.0 and above, see the reference.
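A practical difference between the two cases: CUDNN 8.0 and later keep the version macros in cudnn_version.h, while older releases keep them in cudnn.h. A sketch that reads whichever header exists, assuming the headers were copied into /usr/local/cuda/include as in the usual tarball install (the helper names are hypothetical):

```python
import os
import re
from typing import Optional, Tuple

_MACRO = re.compile(r"#define\s+CUDNN_(MAJOR|MINOR|PATCHLEVEL)\s+(\d+)")


def cudnn_version_from_header(text: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from the #define lines of a CUDNN header."""
    found = {name: int(value) for name, value in _MACRO.findall(text)}
    if not {"MAJOR", "MINOR", "PATCHLEVEL"} <= found.keys():
        return None
    return found["MAJOR"], found["MINOR"], found["PATCHLEVEL"]


def find_cudnn_version(include_dir: str = "/usr/local/cuda/include") -> Optional[Tuple[int, int, int]]:
    # CUDNN >= 8 uses cudnn_version.h; older releases define the macros in cudnn.h.
    for name in ("cudnn_version.h", "cudnn.h"):
        path = os.path.join(include_dir, name)
        if os.path.exists(path):
            with open(path) as f:
                version = cudnn_version_from_header(f.read())
            if version is not None:
                return version
    return None
```

Comparing the result with the CUDNN/CUDA correspondence table above is a quick way to confirm you downloaded the right package.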

5. Environment configuration

After installing CUDA and CUDNN, you can configure the environment:

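gedit ~/.bashrc
# Add the following lines (no sudo needed: ~/.bashrc belongs to your user)
# Supports switching between multiple CUDA versions: to change version, just repoint the /usr/local/cuda symlink
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export PATH=/usr/local/cuda/bin:$PATH
export CUDA_HOME=/usr/local/cuda
# After saving, reload the file in the current terminal
source ~/.bashrc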
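Repointing the symlink is what sudo ln -sfn /usr/local/cuda-11.8 /usr/local/cuda does. The Python sketch below does the same swap through a temporary link and os.replace, so /usr/local/cuda is never missing in between. switch_cuda is a hypothetical helper; modifying the real /usr/local requires root:

```python
import os
import tempfile


def switch_cuda(version: str, root: str = "/usr/local") -> str:
    """Point <root>/cuda at <root>/cuda-<version> and return the new target."""
    target = os.path.join(root, f"cuda-{version}")
    if not os.path.isdir(target):
        raise FileNotFoundError(target)
    link = os.path.join(root, "cuda")
    if os.path.exists(link) and not os.path.islink(link):
        raise RuntimeError(f"{link} is not a symlink; refusing to replace it")
    tmp = link + ".switching"
    if os.path.lexists(tmp):
        os.remove(tmp)  # leftover from an interrupted run
    os.symlink(target, tmp)
    os.replace(tmp, link)  # rename() swaps the link in one step
    return target


if __name__ == "__main__":
    # Dry run in a scratch directory instead of the real /usr/local.
    with tempfile.TemporaryDirectory() as scratch:
        for v in ("10.2", "11.8"):
            os.mkdir(os.path.join(scratch, f"cuda-{v}"))
        switch_cuda("10.2", scratch)
        switch_cuda("11.8", scratch)
        print(os.readlink(os.path.join(scratch, "cuda")))
```

Because PATH, LD_LIBRARY_PATH and CUDA_HOME all go through /usr/local/cuda, new terminals pick up the switched version without editing ~/.bashrc again.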

Related commands

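CUDA toolkit version: nvcc -V
Location of nvcc: which nvcc
Live GPU usage: watch -n 1 nvidia-smi
CUDA version (up to CUDA 11.0): cat /usr/local/cuda/version.txt
CUDA version (CUDA 11.1 and later): cat /usr/local/cuda/version.json
CUDNN version (below 8.0): cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
CUDNN version (8.0 and above): cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
NVIDIA driver version: cat /proc/driver/nvidia/version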
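To re-check constraint 2 after installing the toolkit, you can compare the release reported by nvcc -V with the driver's "CUDA Version" from nvidia-smi. A sketch, assuming the "Cuda compilation tools, release X.Y" line format of nvcc -V and a toolkit at /usr/local/cuda/bin/nvcc; the helpers are hypothetical and apply constraint 2 strictly:

```python
import re
import subprocess
from typing import Optional, Tuple

Version = Tuple[int, int]
_RELEASE = re.compile(r"release (\d+)\.(\d+)")


def parse_nvcc_version(text: str) -> Optional[Version]:
    """Extract (major, minor) from `nvcc -V` output, or None if not found."""
    m = _RELEASE.search(text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def toolkit_fits_driver(toolkit: Version, driver_max_cuda: Version) -> bool:
    """Constraint 2: the toolkit must not be newer than the driver's CUDA Version."""
    return toolkit <= driver_max_cuda


def nvcc_version(nvcc: str = "/usr/local/cuda/bin/nvcc") -> Optional[Version]:
    out = subprocess.run([nvcc, "-V"], capture_output=True, text=True, check=True).stdout
    return parse_nvcc_version(out)
```

For example, CUDA 11.8 fits driver 525 (CUDA Version 12.0) but would not fit a driver reporting 11.4, such as 470.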

6. PyTorch installation

pytorch offline version download

pytorch online version download

# I found the douban mirror faster than the Tsinghua mirror for downloading torch
pip install --index-url https://pypi.douban.com/simple <package-name>
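For the CUDA-specific PyTorch wheels, the install commands on PyTorch's previous-versions page for 1.x releases follow the pattern torch==<version>+cu<XY> with --extra-index-url https://download.pytorch.org/whl/cu<XY>. A sketch that assembles such a command while keeping a PyPI mirror as the main index for dependencies; torch_pip_command is hypothetical, so compare its output with the official command for your version before relying on it:

```python
import sys
from typing import List, Optional, Tuple

PYTORCH_WHEEL_ROOT = "https://download.pytorch.org/whl"


def torch_pip_command(torch_version: str, cuda: Tuple[int, int],
                      mirror: Optional[str] = None) -> List[str]:
    """Build an argument list like: python -m pip install torch==1.12.0+cu116 ..."""
    tag = f"cu{cuda[0]}{cuda[1]}"  # (11, 6) -> "cu116"
    cmd = [sys.executable, "-m", "pip", "install",
           f"torch=={torch_version}+{tag}",
           "--extra-index-url", f"{PYTORCH_WHEEL_ROOT}/{tag}"]
    if mirror:
        # Dependencies come from the mirror; the CUDA wheel comes from PyTorch's index.
        cmd += ["--index-url", mirror]
    return cmd
```

Run it with subprocess.run(torch_pip_command("1.12.0", (11, 6)), check=True). Using sys.executable -m pip installs into the Python that runs the script, which avoids installing into the wrong environment when conda is involved.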

If pip reports that torch cannot be found, your Python version may be too old (create a conda environment with a newer Python and install there), or there may be a problem with the pip mirror (switch to another mirror).

I installed it on the RTX 4050 notebook. 40-series cards require CUDA 11.8 or later, but at the time the only PyTorch release with a CUDA 11.8 build was 2.0.0, which felt too new. It seems that a PyTorch version built for a lower CUDA version can also be installed on top of a higher CUDA toolkit; however, if the CUDA versions differ too much, torch reports an error because the build does not support the GPU's compute capability. I tried installing PyTorch 1.12.0 built for CUDA 11.6, and it appears to work in testing. (This likely works because the PyTorch pip wheels bundle their own CUDA runtime: what matters is that the driver is new enough and that the build includes kernels for the GPU's compute capability.)

Mechrevo RTX 4050 notebook: Ubuntu22 + driver 525 + CUDA 11.8 + CUDNN for 11.8 + torch 1.12.0 + python 3.8

7. Check whether torch can use the GPU

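import torch
# is_available() alone is not enough to confirm that the installation works
print(torch.cuda.is_available())
print(torch.cuda.get_device_name(0))
a = torch.Tensor([1, 2])
a = a.cuda()  # moving data to the GPU triggers CUDA initialization
print(a * 2)  # actually runs a kernel on the GPU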
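is_available() only reports that a driver and a GPU are visible; the sm_89 incompatibility from section 1 appears as a warning when CUDA is first initialized, and a missing kernel only fails when one actually runs. A sketch that performs a real computation and records that warning. check_torch_gpu is a hypothetical helper, and it assumes CUDA has not already been initialized earlier in the same process (otherwise the warning is not emitted again):

```python
import warnings
from typing import Any, Dict


def check_torch_gpu() -> Dict[str, Any]:
    """Go beyond is_available(): run a small computation on the GPU."""
    try:
        import torch
    except ImportError:
        return {"ok": False, "reason": "torch is not installed"}
    if not torch.cuda.is_available():
        return {"ok": False, "reason": "torch.cuda.is_available() returned False"}
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            result = (torch.tensor([1.0, 2.0], device="cuda") * 2).cpu().tolist()
        except RuntimeError as exc:  # e.g. "no kernel image is available for execution"
            return {"ok": False, "reason": str(exc)}
    problems = [str(w.message) for w in caught if "not compatible" in str(w.message)]
    return {
        "ok": not problems and result == [2.0, 4.0],
        "device": torch.cuda.get_device_name(0),
        "capability": torch.cuda.get_device_capability(0),
        "torch_cuda": torch.version.cuda,
        "warnings": problems,
    }


if __name__ == "__main__":
    print(check_torch_gpu())
```

A result with "ok": False and a "not compatible" warning points back to constraint 3; a False from is_available() usually points to the driver (constraint 2).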

8. Incomplete torch code completion in PyCharm

reference


Origin blog.csdn.net/caiqidong321/article/details/129600719