A collection of solutions to problems encountered when installing NVIDIA graphics card drivers on Ubuntu servers

Preface

When installing graphics card drivers on our laboratory servers, we keep running into problems, so I wrote this article to record them together with their solutions.

Normal installation method

Install CUDA from the NVIDIA CUDA download page. Select the latest version and choose the options that match your system configuration; the corresponding link is generated automatically. Select runfile as the installer type, since it packages all the required software in one file. There are two reasons to install CUDA directly: it is needed to run AI algorithms in the laboratory, and the installer asks during installation whether to install the graphics card driver as well. Then follow the instructions on the web page: download the runfile with wget and run it with sh. Once it is running, type accept, and then select the Install option.
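
The download-and-run step can be sketched as a small shell function. This is only an illustration: the function name fetch_and_run is made up here, and the URL argument stands for whatever link the download page generates for your configuration, which is not reproduced in this article.

```shell
# Hypothetical helper (not part of CUDA): download a runfile and start it.
# Usage: fetch_and_run "<link copied from the CUDA download page>"
fetch_and_run() {
    url="$1"
    if [ -z "$url" ]; then
        echo "usage: fetch_and_run <runfile-url>" >&2
        return 1
    fi
    file=$(basename "$url")   # the runfile is saved under its own name
    wget "$url" || return 1   # download into the current directory
    sudo sh "$file"           # start the interactive installer
}
```

Calling the function without a URL only prints the usage message, so nothing is downloaded or installed by accident.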

Alternatively, download only the driver from the NVIDIA driver download page.

Summary of various issues

In practice, the installation can fail for various reasons. When it does, the console tells you to check the log file, and the messages in the log identify the type of error.
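
A quick way to pull the relevant lines out of the log is sketched below. The function name show_install_errors is made up for this example, and the default path /var/log/nvidia-installer.log is an assumption about where the driver installer usually writes its log; always prefer the path that the installer prints on the console (the CUDA runfile typically uses /var/log/cuda-installer.log instead).

```shell
# Print error lines (with line numbers) from an installer log.
# The default path is an assumption; pass the path printed by the installer.
show_install_errors() {
    log="${1:-/var/log/nvidia-installer.log}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    # -n: show line numbers, -i: ignore case, -E: extended regex
    grep -n -i -E 'error|failed' "$log"
}
```

For example, show_install_errors /var/log/cuda-installer.log lists only the lines that mention an error or a failure, which is usually enough to tell which of the problems below applies.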

Nouveau kernel driver problem

ERROR: The Nouveau kernel driver is currently in use by your system. This driver is incompatible with the NVIDIA driver, and must be disabled before proceeding. Please consult the NVIDIA driver README and your Linux distribution’s documentation for details on how to correctly disable the Nouveau kernel driver.

Solution: create a blacklist file for Nouveau.

sudo vi /etc/modprobe.d/blacklist-nouveau.conf

Write the following into it:

blacklist nouveau
options nouveau modeset=0

Then regenerate the initramfs so that the blacklist takes effect at boot:

sudo update-initramfs -u

Finally, reboot:

sudo reboot
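
After the reboot it is worth confirming that Nouveau is really gone before running the installer again. The sketch below checks /proc/modules, which is what lsmod reads; the function name nouveau_loaded is made up for this example, and the optional file argument exists only so the check can be tried against a saved module list.

```shell
# Report whether the nouveau module is loaded.
# Reads /proc/modules by default; another file can be passed for testing.
nouveau_loaded() {
    modules="${1:-/proc/modules}"
    grep -q '^nouveau ' "$modules" 2>/dev/null
}

if nouveau_loaded; then
    echo "nouveau is still loaded; recheck the blacklist file and the initramfs"
else
    echo "nouveau is not loaded; the NVIDIA installer can be run again"
fi
```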

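nvidia-drm module loading problem (to be confirmed)

The module is in use by another application. Switch to the text-only multi-user target, which stops the graphical session, and then unload the module: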

sudo systemctl isolate multi-user.target
sudo modprobe -r nvidia-drm
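
Since this fix is still to be confirmed, it can help to look at what is holding the NVIDIA modules before and after unloading. The sketch below prints the reference count and dependent modules from /proc/modules; the function name nvidia_module_usage is made up for this example, and it only reads information, it does not unload anything.

```shell
# List loaded nvidia* modules with their reference count and dependents.
# Reads /proc/modules by default; another file can be passed for testing.
nvidia_module_usage() {
    modules="${1:-/proc/modules}"
    # /proc/modules columns: name size refcount dependents state address
    awk '$1 ~ /^nvidia/ { printf "%s refcount=%s used_by=%s\n", $1, $3, $4 }' "$modules"
}
```

Run nvidia_module_usage with no arguments on the server. If nvidia_drm still shows a non-zero refcount after switching to multi-user.target, something other than the graphical session is probably still using it.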


Origin blog.csdn.net/weixin_43483799/article/details/132815103