【20230414】Installing the NVIDIA graphics driver, CUDA, and cuDNN on Ubuntu

1 Basic concepts

CUDA (Compute Unified Device Architecture) and cuDNN (CUDA Deep Neural Network library) are both NVIDIA technologies for GPU computing and deep learning, but they play different roles.

  • CUDA: a parallel computing platform and programming model from NVIDIA for general-purpose computing on the GPU (Graphics Processing Unit). It provides C/C++-based programming interfaces, together with a thread model and memory management facilities, so developers can write code that runs directly on the GPU and accelerate tasks such as deep learning, scientific computing, and image processing.
  • cuDNN: a library from NVIDIA that accelerates deep neural network computation. Deep learning frameworks such as TensorFlow and PyTorch call it for highly optimized implementations of operations such as convolution, pooling, and normalization, which speeds up both training and inference.

In short, CUDA is the general-purpose platform, and cuDNN is a deep-learning library built on top of it. Deep learning workloads usually use both: CUDA to run parallel code on the GPU, and cuDNN to accelerate the neural network operations.

2 Install NVIDIA graphics card driver

2.1 Dependency installation

sudo apt-get install -y g++ gcc make pkg-config libglvnd-dev

2.2 Check the GPU model

lspci | grep -i nvidia

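The lspci output also contains the bus address and device class, while the driver download page only needs the model name. The following is a minimal sketch that strips the prefix; the helper name gpu_models and the sample line are illustrative assumptions, not output from a real machine.

```bash
# Extract the device description from lspci-style lines read on stdin.
# Each line has the form "<bus address> <device class>: <description>".
gpu_models() {
    grep -i 'nvidia' | sed -E 's/^[^ ]+ [^:]+: //'
}

# Illustrative sample line (an assumption, not taken from a real system).
sample='01:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)'
printf '%s\n' "$sample" | gpu_models

# On a real system: lspci | gpu_models
```

The text in square brackets is the product name to look up on the download page.
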
2.3 Download the driver according to the model

Download the driver that matches your GPU model from the official NVIDIA driver page: https://www.nvidia.cn/Download/index.aspx?lang=cn

2.4 Uninstall the original driver

sudo apt-get remove --purge 'nvidia*'

Supplement (about the uninstall commands)

  • Avoid sudo apt autoremove where possible: it removes every package that apt considers no longer needed, which can include software you still use.
  • sudo apt-get remove uninstalls the specified package but keeps its configuration files.
  • sudo apt-get purge uninstalls the specified package and also deletes its system-wide configuration files. It does not remove the package's dependencies.
  • sudo apt-get remove --purge has the same effect as sudo apt-get purge.
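
One more detail about the uninstall command in 2.4: a package pattern such as nvidia* should reach apt-get unexpanded. If the current directory contains a file whose name matches, an unquoted pattern is replaced by that file name before apt-get ever sees it. The sketch below demonstrates the effect with echo in a scratch directory; the file name nvidia-installer.log is a hypothetical example.

```bash
# Scratch directory holding one file whose name happens to start with "nvidia".
demo_dir=$(mktemp -d)
touch "$demo_dir/nvidia-installer.log"   # hypothetical stray file

# Unquoted: the shell replaces the pattern with the matching file name.
unquoted=$(cd "$demo_dir" && echo nvidia*)
# Quoted: the pattern is passed through unchanged, as apt-get needs it.
quoted=$(cd "$demo_dir" && echo 'nvidia*')

echo "unquoted argument: $unquoted"
echo "quoted argument:   $quoted"
rm -rf "$demo_dir"
```

Quoting the pattern costs nothing and makes the command behave the same in every directory.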

2.5 Disable the open-source nouveau driver

2.5.1 Why disable the open-source driver

There are three main reasons to disable nouveau before installing the official NVIDIA driver:

  1. Conflicts: nouveau and the NVIDIA driver both try to control the same graphics card. If nouveau is still loaded, the installation may fail or the new driver may not work properly.
  2. Performance and features: nouveau provides basic graphics functions, but its performance and feature set are usually more limited. NVIDIA's official closed-source driver generally offers better performance and fuller hardware acceleration support.
  3. Compatibility: some NVIDIA GPU models are not fully supported by nouveau, which can lead to display glitches, reduced performance, or system instability.

sudo bash -c "echo blacklist nouveau > /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
sudo bash -c "echo options nouveau modeset=0 >> /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
# After running the commands above, check the file contents with the command below.
# Alternatively, after rebooting, run lsmod | grep nouveau; no output means nouveau is disabled.
cat /etc/modprobe.d/blacklist-nvidia-nouveau.conf

If the file was written correctly, the output is:

blacklist nouveau
options nouveau modeset=0

Then restart the computer.
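
The two checks for this step (file contents and lsmod) can be combined into one small script. This is a sketch, and the function names blacklist_ok and nouveau_loaded are made up for illustration. One assumption goes beyond the steps given here: on Ubuntu the blacklist commonly takes effect only after the initramfs is regenerated with sudo update-initramfs -u and the machine is rebooted.

```bash
# Return success only if the file contains both expected lines exactly.
blacklist_ok() {
    grep -qx 'blacklist nouveau' "$1" && grep -qx 'options nouveau modeset=0' "$1"
}

# Return success if lsmod-style text on stdin lists the nouveau module.
nouveau_loaded() {
    grep -q '^nouveau '
}

conf=/etc/modprobe.d/blacklist-nvidia-nouveau.conf
if [ -r "$conf" ] && blacklist_ok "$conf"; then
    echo "blacklist file looks correct"
else
    echo "blacklist file is missing or incomplete"
fi

# Assumption: if nouveau is still loaded after a reboot, try
# sudo update-initramfs -u and reboot again.
if command -v lsmod >/dev/null 2>&1 && lsmod | nouveau_loaded; then
    echo "nouveau is still loaded"
else
    echo "nouveau is not loaded"
fi
```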

2.6 Install a display manager (optional; the default is gdm3)

LightDM is a display manager; its main job is to manage the login screen. It is not installed by default on Ubuntu 20.04 and 22.04, so you have to install it yourself. During installation, a dialog asks for the default display manager; select lightdm with the up and down arrow keys.

sudo apt-get install lightdm

You can also skip this step and keep the gdm3 display manager that ships with Ubuntu 20.04 and 22.04. The visible difference is that the gdm3 login window sits in the middle of the screen, while the lightdm login window is on the left. In everyday use there is no difference; other differences are not explored here.

Note from personal testing (for reference only): if you need to drive multiple monitors, gdm3 may suit you better. With lightdm, a multi-monitor setup sometimes froze the screen and stopped responding to input. For that reason I chose gdm3.

2.7 Stop the current display server

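sudo telinit 3
# Once the display server stops, the system drops to the command-line interface.
# If you use the default gdm3 display manager, run sudo /etc/init.d/gdm3 stop instead of the lightdm command below.
sudo /etc/init.d/lightdm stop  # or: sudo service lightdm stop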
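
Which stop command applies depends on the display manager that is actually configured. As a sketch, the name can be read from /etc/X11/default-display-manager, the file Debian and Ubuntu use to record it. The helper name current_display_manager is made up for illustration, and the suggested systemctl command assumes a systemd-based release.

```bash
# Print the name of the configured display manager (e.g. gdm3 or lightdm).
# The default path is the Debian/Ubuntu location; pass another path to override.
current_display_manager() {
    dm_file=${1:-/etc/X11/default-display-manager}
    [ -r "$dm_file" ] || return 1
    basename "$(cat "$dm_file")"
}

if dm=$(current_display_manager); then
    echo "Display manager: $dm"
    echo "To stop it: sudo systemctl stop $dm"
else
    echo "Could not read /etc/X11/default-display-manager"
fi
```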

2.8 Install driver

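# Replace the placeholders in angle brackets with your own values.
cd <driver download directory>
sudo chmod +x <driver file>.run
sudo ./<driver file>.run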
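
If several installers have accumulated in the download directory, the newest one can be picked automatically. This is a sketch that assumes the usual NVIDIA-Linux-x86_64-<version>.run naming of the downloaded file and GNU sort; the helper name latest_runfile is made up for illustration.

```bash
# Print the NVIDIA installer with the highest version number in a directory.
# Assumes file names of the form NVIDIA-Linux-x86_64-<version>.run.
latest_runfile() {
    dir=${1:-.}
    [ -d "$dir" ] || return 0
    find "$dir" -maxdepth 1 -name 'NVIDIA-Linux-*.run' | sort -V | tail -n 1
}

# Look in the current directory, i.e. after cd-ing into the download directory.
runfile=$(latest_runfile .)
if [ -n "$runfile" ]; then
    echo "Installer found: $runfile"
    echo "Next: sudo chmod +x \"$runfile\" && sudo \"$runfile\""
else
    echo "No NVIDIA-Linux-*.run file found in the current directory"
fi
```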

The installer asks several questions. Answer them as follows:

  1. "The distribution-provided pre-install script failed! Are you sure you want to continue?" Choose Continue installation.
  2. "Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later." Choose No.
  3. (The exact wording of this prompt was not recorded.) Choose Install without signing.
  4. A prompt asking roughly whether to install NVIDIA's 32-bit compatibility libraries. Choose No.
  5. "Would you like to run the nvidia-xconfig utility to automatically update your X configuration so that the NVIDIA X driver will be used when you restart X? Any pre-existing X configuration file will be backed up." Choose Yes.

2.9 Restart the service

sudo service gdm3 start
# Run nvidia-smi to check that the driver was installed correctly
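
For a more compact check than the full nvidia-smi table, the tool's query mode can print just the GPU name and driver version. The wrapper below is a sketch; the function name driver_status and the hint messages are illustrative, and the hints are guesses at likely causes, not a diagnosis.

```bash
# Print one line per GPU (name, driver version), or a hint if the check fails.
driver_status() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: the driver is probably not installed"
    elif ! nvidia-smi --query-gpu=name,driver_version --format=csv,noheader; then
        echo "nvidia-smi failed: the kernel module may not be loaded"
    fi
}

driver_status
```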



Reprinted from: blog.csdn.net/Creationyang/article/details/130149049