Installing the NVIDIA graphics card driver on Linux from scratch (very detailed)

Source: AINLPer public account (daily sharing of useful information!!)
Editor: ShuYini
Proofreading: ShuYini
Time: 2023-9-15

 A graphics card driver is indispensable for running the model. This part covers matching the driver to the graphics card model, installing the driver's dependencies, and installing the driver itself. If you have already installed the graphics card driver, you can skip this step.

1. Check the server's operating system release:

cat /etc/redhat-release

2. Check the server graphics card model:

sudo lshw -numeric -C display
# or
lspci | grep -i vga
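
Data-center cards such as the T4 are usually reported by lspci as a "3D controller" rather than a VGA device, so a filter on "vga" alone may miss them. A minimal sketch of a broader filter follows; the helper name filter_nvidia_gpus is made up for illustration.

```shell
# Hypothetical helper: keep only NVIDIA display-class devices from lspci output.
# Reads lspci-style text on stdin and prints the matching lines.
filter_nvidia_gpus() {
    grep -i -E 'vga compatible controller|3d controller|display controller' \
        | grep -i 'nvidia'
}

# On a real server, feed it the live lspci output (skipped if lspci is absent).
if command -v lspci >/dev/null 2>&1; then
    lspci | filter_nvidia_gpus || echo "No NVIDIA display device found"
fi
```

If the function prints nothing on a machine that does have an NVIDIA card, the plain output of lspci is the place to look next.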

3. Visit NVIDIA's official website: https://www.nvidia.cn/Download/index.aspx?lang=cn, select the driver that matches your graphics card series and model, and download it.

 My graphics card is a T4 and the operating system is CentOS 7.9, so I chose the driver version for that combination. Also note that the CUDA Toolkit version must be compatible with the version required by the frameworks you use, such as PyTorch and TensorFlow. At the time of writing, the newest CUDA version I see PyTorch supporting is 11.8.

4. Install GCC, kernel components, dkms and other related dependencies:

yum install gcc
yum install gcc-c++
yum -y install kernel-devel
yum -y install kernel-headers
yum -y install epel-release
yum -y install dkms

4. Disable nouveau. Nouveau is an open-source 3D driver for NVIDIA graphics cards developed by a third party; it is not officially recognized or supported by NVIDIA. Although Nouveau Gallium3D is far slower than NVIDIA's official proprietary driver in games, it lets Linux cope with a wide range of NVIDIA hardware, so users can reach the desktop with a usable display right after installing the system. For this reason many Linux distributions ship the Nouveau driver and load it by default when they detect an NVIDIA graphics card. This is especially true of enterprise Linux: almost every enterprise distribution that supports a graphical interface includes Nouveau.

 For individual desktop users, Nouveau is still maturing and far from perfect. Unlike enterprise users, individuals often want 3D effects on top of a working graphical interface, which Nouveau usually cannot provide, and when they install NVIDIA's official proprietary driver, Nouveau gets in the way again. If Nouveau is not disabled, the installer reports an error such as: ERROR: The Nouveau kernel driver is currently in use by your system. This driver is incompatible with the NVIDIA driver...

  • 1) Check if nouveau is running:
lsmod | grep nouveau
  • 2) Edit the module blacklist configuration: go to the /etc/modprobe.d folder and find the file whose name contains blacklist.conf (create one if none exists). Open it with vim, add the following lines, then save and exit with :wq.
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
  • 3) Rebuild the initramfs so the change takes effect at boot (if the first command does not work, try the second one; update-initramfs is the Debian/Ubuntu tool, while CentOS/RHEL use dracut)
update-initramfs -u
# or
dracut --force
  • 4) Restart the server
reboot
  • 5) Check again whether nouveau is loaded. If the command prints nothing, nouveau has been disabled.
lsmod | grep nouveau
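
The blacklist edit and the two checks can also be scripted. A minimal sketch follows; the file name blacklist-nouveau.conf and both helper names are illustrative choices, not something the steps above require (modprobe reads any file ending in .conf under /etc/modprobe.d).

```shell
# Hypothetical helper: write the nouveau blacklist into the directory given as $1.
# The directory is a parameter so the function can be tried on a scratch
# directory first; on a real server it would be /etc/modprobe.d (as root).
write_nouveau_blacklist() {
    conf_dir=$1
    cat > "$conf_dir/blacklist-nouveau.conf" <<'EOF'
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
EOF
}

# Hypothetical helper: succeed if lsmod-style text on stdin lists the nouveau module.
nouveau_loaded() {
    grep -q '^nouveau[[:space:]]'
}

# Dry run against a temporary directory; nothing under /etc is touched here.
scratch=$(mktemp -d)
write_nouveau_blacklist "$scratch"
cat "$scratch/blacklist-nouveau.conf"
```

On the server, this would be called as write_nouveau_blacklist /etc/modprobe.d, followed by the initramfs rebuild and reboot described above. After the reboot, lsmod | nouveau_loaded should fail, meaning the module is no longer loaded.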

5. Copy the driver to the server and execute the following command to install the graphics card driver (if an installation error occurs, please see item 6 below):

chmod +x NVIDIA-Linux-x86_64-515.105.01.run
sh NVIDIA-Linux-x86_64-515.105.01.run
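
After the installer finishes, it is worth confirming that the loaded driver matches the package that was installed. A minimal sketch follows, assuming the file keeps the NVIDIA-Linux-x86_64-<version>.run naming used above; the helper name driver_version_from_runfile is made up for illustration, and nvidia-smi is the utility installed together with the driver.

```shell
# Hypothetical helper: extract the version from a name like
# NVIDIA-Linux-x86_64-515.105.01.run (a leading directory is allowed).
driver_version_from_runfile() {
    name=${1##*/}                       # strip any leading directory
    name=${name#NVIDIA-Linux-x86_64-}   # strip the fixed prefix
    printf '%s\n' "${name%.run}"        # strip the .run suffix
}

expected=$(driver_version_from_runfile NVIDIA-Linux-x86_64-515.105.01.run)
echo "Expected driver version: $expected"

# Compare with what the running driver reports (skipped if nvidia-smi is absent).
if command -v nvidia-smi >/dev/null 2>&1; then
    actual=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    if [ "$actual" = "$expected" ]; then
        echo "Driver $actual is loaded"
    else
        echo "Version mismatch: nvidia-smi reports '$actual'"
    fi
fi
```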

6. Driver installation errors.
 During installation, the main error I encountered was: ERROR: Unable to find the kernel source tree for the currently running kernel...
 When I hit this problem I read through many examples online, and I summarize them here. The error means that the kernel version the operating system is running does not match the installed kernel-devel version. The fix is to align the two versions, as follows:
1) Check the version of the running kernel

cat /proc/version

2) List all kernel-related packages installed on the system

rpm -qa | grep kernel
  • Or list the installed versions of kernel-devel and kernel-headers directly
yum info kernel-devel kernel-headers

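
The comparison can also be done without reading the package list by eye. A minimal sketch follows; the helper name has_matching_kernel_devel is made up for illustration. It relies on rpm naming the package kernel-devel-<release>, where <release> is the string printed by uname -r.

```shell
# Hypothetical helper: succeed if the package list on stdin (one name per line,
# as printed by "rpm -qa") contains kernel-devel for the kernel release in $1.
has_matching_kernel_devel() {
    grep -q -x -F "kernel-devel-$1"
}

# On an RPM-based server, check the running kernel (skipped if rpm is absent).
if command -v rpm >/dev/null 2>&1; then
    running=$(uname -r)
    if rpm -qa | has_matching_kernel_devel "$running"; then
        echo "kernel-devel matches running kernel $running"
    else
        echo "kernel-devel for $running is not installed"
    fi
fi
```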

 You may find that the kernel version running on the server differs from the versions of kernel-devel and kernel-headers. There are two ways to fix this: either bring the server's running kernel to the version of kernel-devel and kernel-headers, or bring kernel-devel and kernel-headers to the version of the kernel the server is running.
1) Align the system kernel with the kernel-devel and kernel-headers version.

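# Install the kernel that matches the kernel-devel version
yum install kernel-3.10.0-1160.95.1.el7.x86_64

# Set the default boot kernel. grub2-set-default expects a menu entry index or
# title rather than a package name, so grubby is used here with the kernel image path
grubby --set-default /boot/vmlinuz-3.10.0-1160.95.1.el7.x86_64

# Reboot the server
reboot

# Log in again and check the running kernel version
cat /proc/version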

2) Align kernel-devel and kernel-headers with the system kernel (assuming the running kernel is kernel-3.10.0-1160.95.1.el7.x86_64)

# Install the kernel-devel and kernel-headers that match the running kernel
yum install kernel-headers-3.10.0-1160.95.1.el7.x86_64
yum install kernel-devel-3.10.0-1160.95.1.el7.x86_64
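
Rather than typing the version by hand, the package names can be derived from the running kernel. A minimal sketch follows; the helper name kernel_build_packages is made up for illustration, and the yum call is left commented out so the block has no side effects.

```shell
# Hypothetical helper: print the kernel-devel and kernel-headers package names
# for the kernel release given in $1 (defaults to the running kernel).
kernel_build_packages() {
    release=${1:-$(uname -r)}
    printf 'kernel-devel-%s\nkernel-headers-%s\n' "$release" "$release"
}

# Example with the release assumed in this step.
kernel_build_packages 3.10.0-1160.95.1.el7.x86_64

# On the server, as root:
# yum -y install $(kernel_build_packages)
```

If yum cannot find packages for that exact release in the enabled repositories, the first method (moving the kernel to the kernel-devel version) may be the easier route.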

Whichever method you use, the running kernel version should now match the kernel-devel and kernel-headers versions. At this point, rerun the commands in step 5 above to install the graphics card driver.
In addition, if other services depend on the previous kernel, you need to switch the server back to that kernel. The steps are as follows:

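# Go to /boot/grub2 or /etc; /etc/grub2.cfg is a symbolic link to /boot/grub2/grub.cfg
# Check whether grub.cfg exists; if it does not, create it
grub2-mkconfig -o /boot/grub2/grub.cfg

# Show the current default kernel
grub2-editenv list

# List the installed kernels with their menu entry indexes
awk -F\' '$1=="menuentry " {print i++ " : " $2}' /boot/grub2/grub.cfg

# Set the default boot entry
grub2-set-default xx  # xx is the index shown in the list above

# Rebuild the GRUB configuration file
grub2-mkconfig -o /boot/grub2/grub.cfg

# Reboot for the change to take effect
reboot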
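
To find the index to pass to grub2-set-default without counting entries by hand, the same awk approach can search for a kernel version. A minimal sketch follows, assuming top-level menuentry lines with single-quoted titles as in a stock CentOS 7 grub.cfg; the helper name grub_entry_index and the sample file contents are made up for illustration.

```shell
# Hypothetical helper: print the zero-based index of the first menuentry in the
# grub.cfg file $1 whose title contains the kernel version $2.
# Returns a non-zero status if no entry matches.
grub_entry_index() {
    awk -F\' -v ver="$2" '
        BEGIN { i = 0 }
        $1 == "menuentry " {
            if (index($2, ver) > 0) { print i; found = 1; exit }
            i++
        }
        END { if (!found) exit 1 }
    ' "$1"
}

# Demonstration on a small sample file instead of the real /boot/grub2/grub.cfg.
sample_cfg=$(mktemp)
cat > "$sample_cfg" <<'EOF'
menuentry 'CentOS Linux (3.10.0-1160.95.1.el7.x86_64) 7 (Core)' --class centos {
}
menuentry 'CentOS Linux (3.10.0-1160.el7.x86_64) 7 (Core)' --class centos {
}
EOF
grub_entry_index "$sample_cfg" 3.10.0-1160.el7.x86_64    # prints 1
rm -f "$sample_cfg"
```

On the server, the result could be passed straight on, for example grub2-set-default "$(grub_entry_index /boot/grub2/grub.cfg 3.10.0-1160.el7.x86_64)".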

The installation is complete!
If you have any questions, follow the AINLPer official account and join the group to communicate!

Origin blog.csdn.net/yinizhilianlove/article/details/132908863