Preface
Installing graphics card drivers on our laboratory servers almost always runs into one problem or another, so I wrote this article to record the issues we have encountered.
Normal installation method
Install CUDA from NVIDIA's CUDA download page. Select the latest version and choose the options that match your system configuration; the page then generates the corresponding download link. Choose the runfile installer type, which bundles the required software into a single package. There are two reasons to install CUDA directly: it is needed to run AI algorithms in the laboratory, and the installer asks whether to install the graphics card driver as well. Then run the wget and sh commands shown on the page. Once the installer starts, type accept, and then select Install.
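The download-and-run step can be sketched as a small shell helper. This is only a sketch: the function names runfile_name and install_cuda_runfile are my own, and the URL in the usage comment is a placeholder, not a real link. Copy the actual link from the download page.

```shell
# Print the file name portion of a URL (everything after the last slash).
# Assumes the link ends in the runfile name, with no query string.
runfile_name() {
    basename "$1"
}

# Download the runfile into the current directory and start the installer.
# The installer needs root, hence sudo.
install_cuda_runfile() {
    url="$1"
    file="$(runfile_name "$url")"
    wget "$url" || return 1
    sudo sh "$file"
}

# Usage (placeholder URL, replace it with the link generated by the page):
# install_cuda_runfile "https://example.com/cuda_linux.run"
```

Defining functions rather than running the commands directly means nothing is downloaded until you call install_cuda_runfile yourself.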
Alternatively, download only the driver from NVIDIA's driver download page.
Summary of various issues
In practice, the installation can fail for many reasons. When it does, the console tells you which log file to check, and the log shows what type of error occurred.
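To pull the relevant lines out of a long log, something like the following may help. It is a minimal sketch: show_install_errors is a name I made up, and the default path /var/log/nvidia-installer.log is where the driver installer usually writes its log (the CUDA runfile typically uses /var/log/cuda-installer.log). Use whatever path the console actually prints.

```shell
# Print the lines of an installer log that mention an error or a failure,
# each prefixed with its line number.
# The default path is an assumption; pass the path shown on your console.
show_install_errors() {
    log="${1:-/var/log/nvidia-installer.log}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    # -n: show line numbers, -i: ignore case, -E: extended regex
    grep -n -i -E 'error|fail' "$log"
}

# Usage:
# show_install_errors /var/log/cuda-installer.log
```

Note that grep returns a non-zero status when nothing matches, so an empty result does not necessarily mean the command itself went wrong.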
Nouveau kernel driver problem
ERROR: The Nouveau kernel driver is currently in use by your system. This driver is incompatible with the NVIDIA driver, and must be disabled before proceeding. Please consult the NVIDIA driver README and your Linux distribution’s documentation for details on how to correctly disable the Nouveau kernel driver.
sudo vi /etc/modprobe.d/blacklist-nouveau.conf
Write the following into it:
blacklist nouveau
options nouveau modeset=0
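If you prefer not to open an editor, for example when setting up several servers, the same two lines can be written non-interactively. This is a sketch; nouveau_blacklist_conf is a hypothetical helper name, and the target path is the one used above.

```shell
# Print the two configuration lines that disable nouveau.
nouveau_blacklist_conf() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0'
}

# Usage: write the lines to the config file as root.
# tee is used because a plain shell redirect would not run under sudo.
# nouveau_blacklist_conf | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
```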
Then regenerate the initramfs so that the blacklist takes effect at boot:
sudo update-initramfs -u
Finally, reboot:
sudo reboot
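After the reboot it is worth confirming that nouveau is really gone before running the installer again. A minimal sketch, assuming the kernel exposes the loaded modules in /proc/modules (the same data lsmod prints); module_loaded is a helper name of my own.

```shell
# Return 0 if the named module appears in the module list, 1 otherwise.
# The list defaults to /proc/modules, where the module name is the first field.
module_loaded() {
    name="$1"
    list="${2:-/proc/modules}"
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$list"
}

# Usage after the reboot:
# if module_loaded nouveau; then
#     echo "nouveau is still loaded"
# else
#     echo "nouveau is disabled"
# fi
```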
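nvidia-drm cannot be loaded (to be confirmed)
The module is in use by another application. Switch to the text-only target, which usually stops the graphical session holding it, then unload the module: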
sudo systemctl isolate multi-user.target
sudo modprobe -r nvidia-drm
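If modprobe -r still reports that the module is in use, checking its use count can show whether something is still holding it. This is a sketch under the assumption that /proc/modules is available; module_use_count is a hypothetical helper, and note that the kernel lists the module as nvidia_drm, with an underscore.

```shell
# Print the use count of a module (third field of /proc/modules),
# or nothing if the module is not loaded.
module_use_count() {
    name="$1"
    list="${2:-/proc/modules}"
    awk -v m="$name" '$1 == m { print $3 }' "$list"
}

# Usage:
# module_use_count nvidia_drm
```

A count above zero means another module or process still depends on it, so it cannot be removed yet.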