Install the latest NVIDIA graphics card driver on CentOS 7 (graphic display)

System version: CentOS 7.9.2009

Kernel version: Linux localhost.localdomain 3.10.0-1160.el7.x86_64 #1 SMP Mon Oct 19 16:18:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

1. Install dependencies

yum -y install epel-release

yum -y install gcc binutils wget

yum -y install kernel-devel

2. Disable Nouveau

2.1. Check whether Nouveau is loaded

lsmod | grep nouveau

Note: If nothing is printed, Nouveau is already disabled and the remaining steps in this section can be skipped.
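The check in 2.1 can also be wrapped in a small function for use in scripts. This is a minimal sketch, not part of the original procedure: the function name nouveau_loaded is hypothetical, and it assumes the usual lsmod layout in which the module name is the first column.

```shell
# Hypothetical helper: succeeds (exit 0) if lsmod-style output on stdin
# lists a module named exactly "nouveau" in the first column.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Example use on a real system:
#   if lsmod | nouveau_loaded; then echo "Nouveau is still loaded"; fi
```

Matching on the first column avoids false positives from other modules that merely mention nouveau in their "Used by" column.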

2.2. Modify configuration

echo -e "blacklist nouveau\noptions nouveau modeset=0" > /etc/modprobe.d/blacklist.conf
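Note that the redirection above overwrites any existing /etc/modprobe.d/blacklist.conf. A possible alternative, sketched here as an assumption rather than as the original method, is to write the two lines to a dedicated file; modprobe reads every .conf file in /etc/modprobe.d. The function name and the file name blacklist-nouveau.conf are hypothetical.

```shell
# Hypothetical helper: write the two Nouveau blacklist lines to the file
# given as $1, replacing that file's previous content.
write_nouveau_blacklist() {
    printf 'blacklist nouveau\noptions nouveau modeset=0\n' > "$1"
}

# Example use on a real system (as root); the file name is an assumption:
#   write_nouveau_blacklist /etc/modprobe.d/blacklist-nouveau.conf
```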

2.3. Back up the initramfs image

mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak

2.4. Rebuild the initramfs

dracut /boot/initramfs-$(uname -r).img $(uname -r)
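Steps 2.3 and 2.4 move the current image away before the new one exists, so a failed dracut run would leave /boot without an initramfs for this kernel. One possible variant, sketched here as an assumption and not as the original method, copies the backup and lets dracut overwrite the image with its --force option. The function name rebuild_initramfs is hypothetical.

```shell
# Hypothetical helper: keep a backup copy of the image, then rebuild it in place.
# $1: path of the initramfs image, $2: kernel release (as printed by `uname -r`)
rebuild_initramfs() {
    img=$1
    kver=$2
    # Copy instead of move, so the old image stays in place if dracut fails.
    cp -p "$img" "$img.bak" || return 1
    # --force tells dracut to overwrite the existing image file.
    dracut --force "$img" "$kver"
}

# Example use on a real system (as root):
#   rebuild_initramfs "/boot/initramfs-$(uname -r).img" "$(uname -r)"
```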

2.5. Restart the system

reboot

2.6. Check whether Nouveau is disabled

lsmod | grep nouveau

Note: If nothing is printed, Nouveau has been disabled successfully.

3. Check the driver

3.1. Install the ELRepo repository

rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-4.el7.elrepo.noarch.rpm

or

yum -y install https://www.elrepo.org/elrepo-release-7.0-4.el7.elrepo.noarch.rpm

3.2. Install nvidia-detect

yum -y install nvidia-detect

3.3. Detect the required graphics card driver

nvidia-detect -v

Probing for supported NVIDIA devices…
[10de:1b06] NVIDIA Corporation GP102 [GeForce GTX 1080 Ti]
This device requires the current 510.60.02 NVIDIA driver kmod-nvidia
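To reuse the detected version in later commands, the number can be extracted from the output. The following is a minimal sketch that assumes the message has exactly the wording shown above ("requires the current <version> NVIDIA driver"); other cards may produce differently worded output that it does not handle. The function name detected_driver_version is hypothetical.

```shell
# Hypothetical helper: print the driver version named in nvidia-detect
# output read from stdin, assuming the wording shown in the sample above.
detected_driver_version() {
    sed -n 's/.*requires the current \([0-9][0-9.]*\) NVIDIA driver.*/\1/p'
}

# Example use on a real system:
#   nvidia-detect -v | detected_driver_version
```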

4. Driver installation

4.1. Download driver

wget https://us.download.nvidia.cn/XFree86/Linux-x86_64/510.68.02/NVIDIA-Linux-x86_64-510.68.02.run

Note: If the version detected on your machine differs from mine, replace the version number in the URL and file name accordingly.

Suggestion: Download the file from the official NVIDIA website to a USB flash drive and copy it to the server.

Note: The NVIDIA download page offered only the latest release, which is backward compatible, so I installed 510.68.02 rather than the 510.60.02 that nvidia-detect reported.
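Since the version number appears twice in the download URL, replacing it by hand is easy to get wrong. A minimal sketch that builds the URL from a version string follows; it assumes the URL pattern of the wget command in 4.1 also holds for other versions, which is not guaranteed. The function name driver_url is hypothetical.

```shell
# Hypothetical helper: build the download URL for driver version $1,
# following the pattern of the URL used in step 4.1.
driver_url() {
    echo "https://us.download.nvidia.cn/XFree86/Linux-x86_64/$1/NVIDIA-Linux-x86_64-$1.run"
}

# Example use on a real system:
#   wget "$(driver_url 510.68.02)"
```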

4.2. Grant execute permission

chmod +x NVIDIA-Linux-x86_64-510.68.02.run

If the X server is running, the installer reports an error, so the X service must be stopped before installing.

Check whether the display manager is gdm (there are two types; this server uses gdm), then stop it:

systemctl --all|grep gdm
whereis gdm
systemctl stop gdm.service

Install the driver (step 4.3), then start gdm again:

systemctl start gdm.service

4.3. Installation

sh ./NVIDIA-Linux-x86_64-510.68.02.run -s
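The sequence in 4.2 and 4.3 (stop gdm, run the installer, start gdm again) can be sketched as one function so that gdm is restarted even when the installer fails. This is an illustration under the assumption that gdm is the display manager in use; the function name install_with_gdm_stopped is hypothetical.

```shell
# Hypothetical helper: stop gdm, run the installer given as $1 in silent
# mode (-s, as in step 4.3), then start gdm again whatever the result.
# Returns the installer's exit status.
install_with_gdm_stopped() {
    installer=$1
    systemctl stop gdm.service || return 1
    status=0
    sh "$installer" -s || status=$?
    systemctl start gdm.service
    return "$status"
}

# Example use on a real system (as root):
#   install_with_gdm_stopped ./NVIDIA-Linux-x86_64-510.68.02.run
```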

4.4. View graphics card information

nvidia-smi

Note: If the information below is printed, the graphics card driver has been installed.
In addition, I also installed the following, all recent versions at the time of writing, and ran them successfully:
python 3.9.11
pytorch 1.11.0
tensorflow-gpu 2.7.0
transformers 4.18.0
cuda 11.3
cudnn 8.2.0

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.68.02    Driver Version: 510.68.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 49%   82C    P2   246W / 250W |   8944MiB / 11264MiB |     99%      Default |
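|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |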
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     10400      G   /usr/bin/X                         84MiB |
|    0   N/A  N/A     23147      G   /usr/bin/gnome-shell               84MiB |
|    0   N/A  N/A     29312      C   python                           8771MiB |
+-----------------------------------------------------------------------------+
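To confirm in a script that the running driver is the one just installed, the version can be read from the header line of this output. The following is a minimal sketch that assumes the header contains "Driver Version: <version>" as shown above; the function name smi_driver_version is hypothetical.

```shell
# Hypothetical helper: print the driver version found in nvidia-smi
# output read from stdin, based on the "Driver Version:" header field.
smi_driver_version() {
    sed -n 's/.*Driver Version: \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Example use on a real system:
#   [ "$(nvidia-smi | smi_driver_version)" = "510.68.02" ] && echo "driver OK"
```

nvidia-smi can also print this value directly with --query-gpu=driver_version --format=csv,noheader, which avoids parsing the table.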

5. Uninstall the driver

5.1. Uninstall the driver

nvidia-uninstall

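5.2. Clean up DKMS modules

dkms status
dkms remove <module>/<version> --all

Note: dkms must be installed first ("yum -y install dkms"). dkms remove needs a module name and version; take them from the output of dkms status.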

6. Common errors

1. The installer reports: "ERROR: Unable to find the kernel source tree for the currently running kernel. Please make sure you have installed the kernel source files for your kernel and that they are properly configured; on Red Hat Linux systems, for example, be sure you have the 'kernel-source' or 'kernel-devel' RPM installed. If you know the correct kernel source files are installed, you may specify the kernel source path with the '--kernel-source-path' command line option."

Solution:

Install the kernel development packages

yum -y install epel-release
yum -y install kernel-devel

Compare the installed kernel packages with the running kernel

rpm -qa |grep kernel
uname -r
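The comparison can be automated. In this article the path passed to the installer (3.10.0-1160.42.2.el7.x86_64) differs from the running kernel (3.10.0-1160.el7.x86_64), which suggests the installed kernel-devel package was built for a newer kernel than the one running. The sketch below checks for an exact match; it assumes package names of the form kernel-devel-<release>, as printed by rpm -q kernel-devel, and the function name is hypothetical.

```shell
# Hypothetical helper: succeeds (exit 0) if the package list on stdin
# contains a kernel-devel package for exactly the kernel release in $1.
has_matching_kernel_devel() {
    grep -qxF "kernel-devel-$1"
}

# Example use on a real system:
#   rpm -q kernel-devel | has_matching_kernel_devel "$(uname -r)" \
#       || echo "kernel-devel does not match the running kernel"
```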

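Install the driver, passing the kernel source directory that is actually installed (the path below is from my machine; use the one listed under /usr/src/kernels on yours)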

./NVIDIA-Linux-x86_64-510.68.02.run --kernel-source-path=/usr/src/kernels/3.10.0-1160.42.2.el7.x86_64 -k $(uname -r)

Origin blog.csdn.net/weixin_46398647/article/details/124469828