System version: CentOS 7.9.2009
Kernel version: Linux localhost.localdomain 3.10.0-1160.el7.x86_64 #1 SMP Mon Oct 19 16:18:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
1. Install dependencies
yum -y install epel-release
yum -y install gcc binutils wget
yum -y install kernel-devel
2. Disable Nouveau
2.1. Check whether Nouveau is turned on
lsmod | grep nouveau
Note: No output means Nouveau is already disabled, and the remaining steps in this section can be skipped.
2.2. Modify configuration
echo -e "blacklist nouveau\noptions nouveau modeset=0" > /etc/modprobe.d/blacklist.conf
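The command above overwrites /etc/modprobe.d/blacklist.conf, so anything already in that file is lost. A more cautious variant could look like the sketch below. The function name and the dedicated file name blacklist-nouveau.conf are my own assumptions, not part of the original steps.

```shell
# Hypothetical helper: append the Nouveau blacklist to the given file.
# The file is left untouched when it already contains the blacklist line.
write_nouveau_blacklist() {
    conf="$1"
    if [ -f "$conf" ] && grep -qx 'blacklist nouveau' "$conf"; then
        return 0
    fi
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' >> "$conf"
}
```

Usage (as root): write_nouveau_blacklist /etc/modprobe.d/blacklist-nouveau.conf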
2.3. Back up the initramfs image
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
2.4. Rebuild the initramfs image
dracut /boot/initramfs-$(uname -r).img $(uname -r)
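If the backup and rebuild steps are run a second time, the mv would replace the original backup with an already rebuilt image. A minimal sketch of a backup that only happens once is shown below. The helper name backup_once is hypothetical.

```shell
# Hypothetical helper: copy FILE to FILE.bak unless a backup already exists,
# so rerunning the steps never overwrites the first backup.
backup_once() {
    src="$1"
    if [ -e "$src.bak" ]; then
        echo "backup already exists: $src.bak"
        return 0
    fi
    cp -p "$src" "$src.bak"
}
```

Usage: backup_once "/boot/initramfs-$(uname -r).img" && dracut --force "/boot/initramfs-$(uname -r).img" "$(uname -r)"
Because this variant copies instead of moving, the original image stays in place and dracut needs --force to overwrite it.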
2.5. Restart the system
reboot
2.6. Check whether Nouveau is disabled
lsmod | grep nouveau
Note: No output means Nouveau was disabled successfully.
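For use in a script, the same check can be expressed as an exit status instead of reading the output by eye. This is only a sketch; the function name nouveau_loaded is an assumption of mine.

```shell
# Hypothetical helper: read `lsmod` output on stdin and succeed only when a
# module named exactly "nouveau" is listed (the first column is the module name).
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}
```

Usage: if lsmod | nouveau_loaded; then echo "nouveau is still loaded"; else echo "nouveau is disabled"; fi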
3. Check the driver
3.1. Install the ELRepo repository
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh https://www.elrepo.org/elrepo-release-7.0-4.el7.elrepo.noarch.rpm
or
yum -y install https://www.elrepo.org/elrepo-release-7.0-4.el7.elrepo.noarch.rpm
3.2. Install nvidia-detect
yum -y install nvidia-detect
3.3. Detect the driver required by the graphics card
nvidia-detect -v
Probing for supported NVIDIA devices…
[10de:1b06] NVIDIA Corporation GP102 [GeForce GTX 1080 Ti]
This device requires the current 510.60.02 NVIDIA driver kmod-nvidia
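The version number can also be pulled out of this output in a script. The sketch below assumes the line has exactly the wording shown above ("requires the current <version> NVIDIA driver"); other wordings are not handled, and the function name is hypothetical.

```shell
# Hypothetical helper: read `nvidia-detect -v` output on stdin and print the
# version from the "requires the current <version> NVIDIA driver" line.
recommended_driver_version() {
    sed -n 's/.*requires the current \([0-9][0-9.]*\) NVIDIA driver.*/\1/p'
}
```

Usage: nvidia-detect -v | recommended_driver_version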
4. Driver installation
4.1. Download driver
wget https://us.download.nvidia.cn/XFree86/Linux-x86_64/510.68.02/NVIDIA-Linux-x86_64-510.68.02.run
Note: If nvidia-detect reports a different version on your machine, replace the version number in the URL accordingly.
Suggestion: Download the installer from the official NVIDIA website to a USB flash drive and copy it to the server.
Note: NVIDIA only offers the latest version for download, which is backward compatible, so I installed 510.68.02 here instead of 510.60.02.
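Since the version appears twice in the download URL, it is easy to change one occurrence and miss the other. A small sketch that builds the URL from a single version argument, following the pattern of the URL in step 4.1 (the function name is my own, and I have not checked that every version exists under this path):

```shell
# Hypothetical helper: build the installer URL for a driver version,
# following the pattern of the URL used in step 4.1.
nvidia_run_url() {
    version="$1"
    echo "https://us.download.nvidia.cn/XFree86/Linux-x86_64/${version}/NVIDIA-Linux-x86_64-${version}.run"
}
```

Usage: wget "$(nvidia_run_url 510.68.02)"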
4.2. Make the installer executable and stop the X server
chmod +x NVIDIA-Linux-x86_64-510.68.02.run
The installer reports an error if the X server is running, so the display manager must be stopped first.
Check whether the display manager is gdm (there are two types; this server uses gdm):
systemctl --all | grep gdm
whereis gdm
Stop gdm, install the driver (step 4.3), then start gdm again:
systemctl stop gdm.service
(install the driver)
systemctl start gdm.service
4.3. Installation
sh ./NVIDIA-Linux-x86_64-510.68.02.run -s
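Stopping gdm, running the installer and starting gdm again can be wrapped in one function, so that gdm is restarted even when the installer fails. This is a sketch under my own assumptions: the function name and the SYSTEMCTL override variable are hypothetical and exist only to allow a dry run.

```shell
# Hypothetical wrapper: stop a service, run a command, then start the service
# again whether or not the command succeeded.
# SYSTEMCTL can be set to another program (for example echo) for a dry run.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

run_with_service_stopped() {
    service="$1"
    shift
    "$SYSTEMCTL" stop "$service" || return 1
    status=0
    "$@" || status=$?
    "$SYSTEMCTL" start "$service"
    return "$status"
}
```

Usage (as root): run_with_service_stopped gdm.service sh ./NVIDIA-Linux-x86_64-510.68.02.run -s
The return value is the exit status of the wrapped command, so a failed installation can still be detected by the caller.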
4.4. View graphics card information
nvidia-smi
Note: If nvidia-smi prints the GPU information, the graphics card driver has been installed successfully.
In addition, I installed the following, all of which were the latest versions at the time, and ran them successfully:
python 3.9.11
pytorch 1.11.0
tensorflow-gpu 2.7.0
transformers 4.18.0
cuda 11.3
cudnn 8.2.0
nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.68.02    Driver Version: 510.68.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 49%   82C    P2   246W / 250W |   8944MiB / 11264MiB |     99%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     10400      G   /usr/bin/X                         84MiB |
|    0   N/A  N/A     23147      G   /usr/bin/gnome-shell               84MiB |
|    0   N/A  N/A     29312      C   python                           8771MiB |
+-----------------------------------------------------------------------------+
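To confirm in a script that the running driver is the version that was just installed, the header line of this output can be parsed. A minimal sketch, assuming the "Driver Version: <version>" header format shown above (the function name is hypothetical):

```shell
# Hypothetical helper: read `nvidia-smi` output on stdin and print the value
# that follows "Driver Version:" in the header line.
smi_driver_version() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p'
}
```

Usage: [ "$(nvidia-smi | smi_driver_version)" = "510.68.02" ] && echo "driver version matches"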
5. Uninstall the driver
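5.1. Run the uninstaller
nvidia-uninstall
5.2. Clean up DKMS modules
dkms status
dkms remove nvidia/510.68.02 --all
Note: dkms must be installed first (yum -y install dkms). Use the module name and version that dkms status reports. This step only applies if the driver was registered with DKMS.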
6. Common errors
1. The installer reports: "ERROR: Unable to find the kernel source tree for the currently running kernel. Please make sure you have installed the kernel source files for your kernel and that they are properly configured; on Red Hat Linux systems, for example, be sure you have the 'kernel-source' or 'kernel-devel' RPM installed. If you know the correct kernel source files are installed, you may specify the kernel source path with the '--kernel-source-path' command line option."
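Solution:
Install the kernel development package:
yum -y install epel-release
yum -y install kernel-devel
Compare the installed kernel packages with the running kernel:
rpm -qa | grep kernel
uname -r
Install the driver, pointing it at the kernel source tree (replace the path with the directory under /usr/src/kernels that kernel-devel installed on your system):
./NVIDIA-Linux-x86_64-510.68.02.run --kernel-source-path=/usr/src/kernels/3.10.0-1160.42.2.el7.x86_64 -k $(uname -r)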
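The version comparison can be automated by checking whether a source tree for the running kernel actually exists before starting the installer. This is a sketch only; the function name and its optional second argument (the base directory, defaulting to /usr/src/kernels) are my own assumptions.

```shell
# Hypothetical helper: given a kernel release (as printed by `uname -r`) and a
# directory of kernel source trees, print the matching tree or fail.
kernel_source_path() {
    release="$1"
    base="${2:-/usr/src/kernels}"
    if [ -d "$base/$release" ]; then
        echo "$base/$release"
    else
        echo "no kernel source tree for $release under $base" >&2
        return 1
    fi
}
```

Usage: path=$(kernel_source_path "$(uname -r)") && ./NVIDIA-Linux-x86_64-510.68.02.run --kernel-source-path="$path" -k "$(uname -r)"
If the function fails, the installed kernel-devel package does not match the running kernel. In that case, either install the matching package with yum -y install "kernel-devel-$(uname -r)" (if the repository still carries that version), or update the kernel and reboot so that both versions agree.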