Illustrated guide to installing and uninstalling CUDA (Ubuntu, Debian)

CUDA requires an NVIDIA graphics card or compute card. AMD and Intel graphics cards will not work (those vendors have their own standards).
Even a basic display card such as the GT 710 can be used.
Ubuntu is recommended, because CUDA is developed on this platform. Other Linux systems can follow the same steps.
The steps below have been carried out on Ubuntu Server 22.04, Debian 12, and Debian 11. If you have not installed a Linux system yet, refer to the
Ubuntu server installation diagram
and Debian installation diagram.


Note: install the CUDA version you actually need. The overall procedure is similar for every version:
install the NVIDIA card (hardware), install the CUDA dependencies (mainly the C compiler), install the NVIDIA driver, install nvcc, install CUDA. You may also need PyTorch
and TensorFlow. Choose the versions according to your own needs before installing. Some components also place requirements on the operating system. To avoid redoing work, first map out all required component versions and then install them one by one.
Official documentation is always the best: cuda official installation documentation

1. Check the hardware and software environment and disable Nouveau

Do not skip this step. Check the environment to confirm that it meets the basic requirements.

1. Make sure the system recognizes the NVIDIA card

lspci | grep -i nvidia

If the card is recognized, one line is printed per NVIDIA device (the original screenshots showed an RTX 3090 24G and an RTX 4090 24G).

2. Check the gcc compiler

gcc --version

If gcc is installed, its version is displayed.

If not, install the build-essential package, which pulls in the C compiler and the related build tools in one go

apt-get install build-essential

3. Install the kernel headers for the running kernel

apt-get install linux-headers-$(uname -r)
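
The three checks above can be bundled into one read-only report. This is only a sketch: the function names are made up for illustration, and the header check assumes the Debian/Ubuntu layout in which the headers package installs into /usr/src/linux-headers-<release>. It installs nothing and changes nothing.

```shell
#!/bin/sh
# Pre-flight sketch for steps 1-3. All function names are hypothetical.

# Succeeds if the lspci-style text on stdin mentions an NVIDIA device.
has_nvidia_device() {
    grep -qi 'nvidia'
}

# Prints "ok" if the command named in $1 is on PATH, otherwise "missing".
cmd_status() {
    if command -v "$1" >/dev/null 2>&1; then echo "ok"; else echo "missing"; fi
}

# Prints "ok" if headers for kernel release $1 exist.
# Assumption: headers live in /usr/src/linux-headers-<release>.
headers_status() {
    if [ -d "/usr/src/linux-headers-$1" ]; then echo "ok"; else echo "missing"; fi
}

if command -v lspci >/dev/null 2>&1 && lspci | has_nvidia_device; then
    echo "NVIDIA device: ok"
else
    echo "NVIDIA device: not detected (or lspci is not installed)"
fi
echo "gcc: $(cmd_status gcc)"
echo "kernel headers: $(headers_status "$(uname -r)")"
```

Anything reported as missing can then be fixed with the apt-get commands from steps 2 and 3.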

4. Disable Nouveau

(This step is optional. Do it only if the installer asks you to remove Nouveau.)
Linux installs Nouveau, the open-source driver for NVIDIA cards, by default.

Check whether Nouveau is loaded

lsmod | grep nouveau

If the command prints output, the driver is still loaded. To disable it,
create a new file. The name does not have to be this one; any other name ending in .conf will work.

vi /etc/modprobe.d/nouveau.conf

The content is as follows

blacklist rivafb
blacklist vga16fb
blacklist nouveau
blacklist nvidiafb
blacklist rivatv
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off

Rebuild the initramfs so that the change applies at boot

update-initramfs -u

When it finishes, restart the computer and check again.
No output means Nouveau is disabled. Without a restart, the module is still listed.

lsmod | grep nouveau
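
The same check can be scripted so that it gives a clear yes or no. The sketch below reads /proc/modules, the file lsmod takes its data from, and matches the module name exactly. The function name nouveau_loaded and its optional file argument are illustrative assumptions, not part of any tool.

```shell
#!/bin/sh
# Succeeds if a module named exactly "nouveau" is listed in the given file.
# Default file: /proc/modules (the data source of lsmod).
nouveau_loaded() {
    modules_file="${1:-/proc/modules}"
    [ -r "$modules_file" ] && grep -q '^nouveau ' "$modules_file"
}

if nouveau_loaded; then
    echo "nouveau is still loaded: check the blacklist file, then reboot"
else
    echo "nouveau is not loaded"
fi
```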

2. Install with the CUDA Toolkit

This is the recommended method, because it installs the whole bundle: NVIDIA driver + CUDA + nvcc.
Note: with this method you do not need to install the driver first, and you do not have to sort out driver compatibility yourself.
The driver version that a CUDA release lists is the minimum required, so a newer driver can be used with an older CUDA version.
Official address: cuda toolkit
Once more: select the version according to your needs, for example the one required by the PyTorch or TensorFlow release you plan to use. The installation method is the same for every version.
Use the main download link. Don't click the Versioned... link after it; that leads to the detailed English documentation, which is harder to follow.

After you select your platform on that page, the installation commands appear below the selector. Copy them and run them one by one.
Version 12.1 is installed here. You can choose the version you need at the official address above; the steps are similar.
The same page also shows the installation commands for other systems.
The following walks through the commands step by step:

(1) For Ubuntu systems

1. Create a download directory and switch to it

mkdir /usr/local/my_cuda && cd /usr/local/my_cuda

2. Installation (start by downloading the repository pin file)

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin

Move the pin file into place

mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600

Download the installation package

wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda-repo-ubuntu2204-12-1-local_12.1.0-530.30.02-1_amd64.deb

Install the repository package

dpkg -i cuda-repo-ubuntu2204-12-1-local_12.1.0-530.30.02-1_amd64.deb

Install the keyring

cp /var/cuda-repo-ubuntu2204-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/

Update the package index

apt-get update

Install CUDA. This step takes a long time, so be patient

apt-get -y install cuda

Restart the computer after the installation is complete, otherwise various problems may occur

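(2) For Debian systems

This uses the network repository method (the cuda-keyring package). The URL below is for Debian 11; for another release, take the matching URL from the official page.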

1. Enter the operating directory

cd /usr/local

2. Download the keyring package, install it, and enable the contrib component

wget https://developer.download.nvidia.com/compute/cuda/repos/debian11/x86_64/cuda-keyring_1.0-1_all.deb
dpkg -i cuda-keyring_1.0-1_all.deb
add-apt-repository contrib

If add-apt-repository is not found, install it with the following command and then run add-apt-repository contrib again

apt-get install software-properties-common

3. Installation

This takes a while, so be patient

apt-get update
apt-get -y install cuda

Restart the computer after the installation is complete, otherwise various problems may occur

3. Test

The CUDA version reported by nvcc is the one that counts. nvidia-smi shows the highest CUDA version the installed driver supports, so with a newer driver it can show a newer CUDA version than the toolkit that is installed. Programs are actually built with nvcc.

1. Test nvcc (cuda compiler)

nvcc -V

On success, nvcc prints its release information. If there is an error, see Section 4 for a fix.

2. Test nvidia-smi

nvidia-smi

If either step fails, see Section 4.
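
Both tests can be combined into a short report that prints the toolkit release and the driver version next to each other. This is a sketch rather than an official tool: nvcc_release is a hypothetical helper, and it assumes that the nvcc -V output contains a line of the form "release 12.1, V12.1.66".

```shell
#!/bin/sh
# Extracts the toolkit release (for example "12.1") from `nvcc -V` text on stdin.
# Assumption: the output contains "release <major.minor>, V<full version>".
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
    echo "Toolkit (nvcc): $(nvcc -V | nvcc_release)"
else
    echo "nvcc is not on PATH"
fi

if command -v nvidia-smi >/dev/null 2>&1; then
    # Print the driver version of the first GPU only.
    echo "Driver: $(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
else
    echo "nvidia-smi is not on PATH"
fi
```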

4. Problem handling

1. nvcc is not found

Find nvcc

find / -name "nvcc"

For example, it may turn up in a directory such as /usr/local/cuda-12.1/bin. Add that directory to your environment by editing ~/.bashrc

vi ~/.bashrc

Append the following at the end of the file (if you did not install version 12.1, change the version in the paths)

export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export PATH=$PATH:/usr/local/cuda-12.1/bin

After saving, refresh the environment variables

source ~/.bashrc
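
Because ~/.bashrc may be sourced several times in one session, plain export lines keep appending the same directory. A possible alternative is sketched below: path_append is a hypothetical helper that adds a directory only if it is not already present, and CUDA_HOME is an assumed variable name holding the 12.1 prefix used in this article.

```shell
#!/bin/sh
# Prints the colon-separated list $2 with directory $1 appended,
# unless $1 is already one of its entries.
path_append() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *) printf '%s\n' "${2:+$2:}$1" ;;
    esac
}

CUDA_HOME=/usr/local/cuda-12.1   # assumed install prefix; adjust to your version
PATH=$(path_append "$CUDA_HOME/bin" "$PATH")
LD_LIBRARY_PATH=$(path_append "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```

Sourcing this twice leaves PATH and LD_LIBRARY_PATH unchanged the second time.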

Run the command again (note the capital V)

nvcc -V

nvcc should now print its version information.

2. nvidia-smi reports an error

A restart is said to fix 80% of problems.
For example, if nvidia-smi reports an error right after installation, just restart. Everything is already installed, and a restart resolves many problems.
If the hardware cannot be found, restart as well (it is also possible that the graphics card is not seated properly).

Run nvidia-smi again. It should print a status table with the NVIDIA driver version at the upper left and the CUDA version at the upper right.
For the CUDA version, the one reported by nvcc is the one that counts.

5. Uninstalling CUDA

If you need to switch to a different version, letting multiple versions coexist is recommended, but that is not covered here. To uninstall completely, follow the steps below.
If you lack permissions, prefix the commands with sudo. I installed as root here.

1. Remove the cuda package

apt-get remove cuda

2. Remove dependencies that are no longer needed

apt autoremove 

3. Remove the remaining cuda packages

apt autoremove 'cuda*'

4. Delete the downloaded installation package (optional)

rm /usr/local/my_cuda/cuda-repo-ubuntu2204-12-1-local_12.1.0-530.30.02-1_amd64.deb

5. Find the related packages that are left

dpkg -l | grep cuda

This lists the packages whose names contain cuda. Remove the remaining ones manually, otherwise installing another version will fail.
Pass the package names from the listing above to the command below.

dpkg -P cuda-repo-ubuntu2204-12-1-local cuda-toolkit-12-1-config-common cuda-toolkit-12-config-common cuda-toolkit-config-common cuda-visual-tools-12-1
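
To avoid copying names by hand, the listing can be reduced to bare package names first. The sketch below only prints names and purges nothing. leftover_cuda_packages is a hypothetical helper, and it assumes the usual dpkg -l layout in which the first column is a two- or three-letter status such as ii or rc and the second column is the package name.

```shell
#!/bin/sh
# Reads `dpkg -l` output on stdin and prints the names of all packages
# whose name contains "cuda", whatever their state (ii, rc, ...).
# Header lines are skipped because their first field is not a status code.
leftover_cuda_packages() {
    awk '$1 ~ /^[a-zA-Z][a-zA-Z][a-zA-Z]?$/ && $2 ~ /cuda/ { print $2 }'
}

if command -v dpkg >/dev/null 2>&1; then
    dpkg -l | leftover_cuda_packages
else
    echo "dpkg not found: this sketch only applies to Debian-based systems"
fi
```

After reviewing the printed names, they can be passed to dpkg -P as in the command above.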

6. Supplementary notes

1. Replacing the graphics card

If you swap the graphics card, you usually don't need to reinstall the driver or CUDA. If the new card doesn't work, reinstall them.

2. Limit power consumption (use with caution)

On some graphics cards, limiting power consumption lowers the temperature noticeably with little performance loss.
The following is for reference only. Normally, leave these settings alone.

Enable persistence mode

nvidia-smi -pm 1

Limit the power consumption of card 0 to 200 W

nvidia-smi -pl 200 -i 0
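
Before applying a limit, it can help to check that the value lies inside the range the card allows. The sketch below queries the minimum and maximum limits with nvidia-smi and only prints a recommendation; it does not change any setting. within_limits, WANT and GPU are illustrative names, and the 200 W value is simply the one used above.

```shell
#!/bin/sh
# Succeeds if wattage $1 lies within [$2, $3].
# Decimal parts are dropped before comparing, so this is a coarse check.
within_limits() {
    want=${1%.*}; min=${2%.*}; max=${3%.*}
    [ "$want" -ge "$min" ] 2>/dev/null && [ "$want" -le "$max" ] 2>/dev/null
}

WANT=200   # requested limit in watts (value taken from the text)
GPU=0      # card index

if command -v nvidia-smi >/dev/null 2>&1; then
    limits=$(nvidia-smi -i "$GPU" --query-gpu=power.min_limit,power.max_limit \
        --format=csv,noheader,nounits)
    min=$(printf '%s' "$limits" | cut -d, -f1 | tr -d ' ')
    max=$(printf '%s' "$limits" | cut -d, -f2 | tr -d ' ')
    if within_limits "$WANT" "$min" "$max"; then
        echo "${WANT} W is within ${min}-${max} W; you could run: nvidia-smi -pl $WANT -i $GPU"
    else
        echo "${WANT} W is outside the reported range (${min}-${max} W); not recommended"
    fi
else
    echo "nvidia-smi not found"
fi
```

If the card does not report its limits, the check fails and the script recommends against setting the value.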

3. Installing an older CUDA version

Each CUDA version has a driver requirement, but that requirement is a minimum driver version.
For example, the initial driver version for the RTX 4090 is 522.25, while the driver bundled with CUDA 11.8 is 522.06, so CUDA 11.8 cannot be installed directly by default. If you need this CUDA version,
install the NVIDIA driver first and then run the CUDA Toolkit 11.8 installer. It will then skip the driver by default.
nvcc -V and nvidia-smi can show different CUDA versions because they report different things. CUDA code is built through nvcc, so the nvcc version is the one that counts. On Windows in particular, accidentally upgrading the NVIDIA driver does no harm: the actual CUDA version does not change.
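
The "minimum driver" rule can be checked mechanically by comparing version strings. The sketch below relies on the -V (version sort) option of GNU sort, which Ubuntu and Debian provide. version_ge is a hypothetical helper, and MIN_DRIVER simply reuses the 522.06 figure quoted above; look up the real minimum for your CUDA release in the official release notes.

```shell
#!/bin/sh
# Succeeds if version $1 is greater than or equal to version $2.
# Uses GNU sort -V: the smaller of the two versions is sorted first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

MIN_DRIVER=522.06   # figure quoted in the text for CUDA 11.8; verify for your release

if command -v nvidia-smi >/dev/null 2>&1; then
    installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    if version_ge "$installed" "$MIN_DRIVER"; then
        echo "Driver $installed meets the minimum $MIN_DRIVER"
    else
        echo "Driver '$installed' is older than $MIN_DRIVER (or could not be read)"
    fi
else
    echo "nvidia-smi not found"
fi
```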


Origin blog.csdn.net/ziqibit/article/details/129935737