Installing the NVIDIA driver, CUDA and cuDNN on CentOS 7

Foreword

The system was originally CentOS 7.6. When I installed the dependencies, the kernel version of the packages yum offered did not match the output of uname -r, so the dependencies could not be installed as-is: the driver installer then fails because it cannot find the kernel header files. (At first I installed the dependencies anyway, assuming the newer version would be compatible with the older one. The driver installer then failed, reporting that it could not find the kernel headers for version 957.) The fix is to run yum -y upgrade first and reboot afterwards (after the reboot the system is CentOS 7.9). From then on the version yum installs matches uname -r, and the dependencies can be installed.

Before the upgrade the running kernel was version 957, while yum installed version 1160, so the two did not match.

Screenshot: kernel version before the upgrade

Screenshot: kernel version after the upgrade

Screenshot: yum installed the 1160 dependencies, and the driver installer then failed to find the 957 kernel headers
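The mismatch described above can be detected before the driver installer is run. The sketch below is my own, not part of the original procedure: the function name `check_kernel_devel` is hypothetical, and it takes the running kernel version and the installed kernel-devel versions as arguments so it can be tried without root.

```shell
#!/bin/sh
# Hypothetical helper: succeed if one of the installed kernel-devel versions
# equals the running kernel version, otherwise report the mismatch.
# Usage: check_kernel_devel RUNNING_KERNEL [INSTALLED_VERSION...]
check_kernel_devel() {
    running="$1"
    shift
    for v in "$@"; do
        if [ "$v" = "$running" ]; then
            printf 'OK: kernel-devel matches running kernel %s\n' "$running"
            return 0
        fi
    done
    printf 'MISMATCH: running %s, installed: %s\n' "$running" "${*:-none}" >&2
    return 1
}
```

On the machine itself it could be called as check_kernel_devel "$(uname -r)" $(rpm -q --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' kernel-devel). A MISMATCH result corresponds to the situation in the foreword, where upgrading and rebooting comes first.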

Install the NVIDIA driver

Check which graphics cards the machine has

lspci | grep -i vga

lspci | grep -i nvidia
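Data-center cards such as the Tesla P4 are usually listed by lspci as a "3D controller" rather than a "VGA compatible controller", so the first command may only show the onboard display chip while the second one finds the NVIDIA card. A small sketch that keeps NVIDIA display devices of either class from lspci output (the function name `list_nvidia_gpus` is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: read `lspci` output on stdin and print the NVIDIA
# display devices, whether listed as "VGA compatible controller" or
# "3D controller". Exits non-zero if none are found.
list_nvidia_gpus() {
    grep -Ei 'vga compatible controller|3d controller' | grep -i 'nvidia'
}
```

Usage: lspci | list_nvidia_gpus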

Disable nouveau

nouveau is an open-source driver for NVIDIA cards that Linux loads automatically. It is not the official NVIDIA driver, and it must be disabled before the official driver is installed.

Run lsmod | grep nouveau: any output means nouveau is still loaded (not disabled); no output means it is disabled.

Screenshot: nouveau not disabled

Edit /usr/lib/modprobe.d/dist-blacklist.conf (e.g. with vim), comment out the line blacklist nvidiafb, and add the following two lines:

blacklist nouveau
options nouveau modeset=0
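The same edit can be scripted so that running it a second time does not add the lines again. This is my own sketch rather than the author's procedure: the function name `blacklist_nouveau` is hypothetical, it relies on GNU sed's in-place `-i`, and it takes the file to edit as an argument.

```shell
#!/bin/sh
# Hypothetical helper: apply the nouveau blacklist edits to a modprobe config file.
# Usage: blacklist_nouveau /usr/lib/modprobe.d/dist-blacklist.conf
blacklist_nouveau() {
    conf="$1"
    # Comment out an active "blacklist nvidiafb" line (GNU sed in-place edit);
    # an already commented line starts with '#' and is left alone.
    sed -i 's/^[[:space:]]*blacklist[[:space:]][[:space:]]*nvidiafb/#&/' "$conf"
    # Append each required line only if an identical line is not already present
    for line in 'blacklist nouveau' 'options nouveau modeset=0'; do
        grep -qxF "$line" "$conf" || printf '%s\n' "$line" >> "$conf"
    done
}
```

For example, as root: blacklist_nouveau /usr/lib/modprobe.d/dist-blacklist.conf, followed by the initramfs rebuild.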

Then rebuild the initramfs image with the following commands:

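# Back up the current initramfs image
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r)-nouveau.img
# Regenerate the image for the running kernel so the blacklist takes effect at boot
dracut /boot/initramfs-$(uname -r).img $(uname -r)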

reboot

After the reboot, run lsmod | grep nouveau again; no output means nouveau is disabled.

Screenshot: nouveau disabled
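The lsmod check can be wrapped in a function that reads lsmod output on stdin and looks only at the module-name column. A sketch with a hypothetical function name `nouveau_loaded`:

```shell
#!/bin/sh
# Hypothetical helper: read `lsmod` output on stdin and succeed (exit 0)
# if a module named exactly "nouveau" appears in the first column.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}
```

Usage: if lsmod | nouveau_loaded; then echo "nouveau still loaded"; else echo "nouveau disabled"; fi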

Install dependencies

yum install kernel-devel kernel-headers gcc dkms gcc-c++

Install the driver

On the official NVIDIA driver download page, search for your graphics card model. My card is a Tesla P4; an experienced member of a chat group I asked advised using CUDA 11.2, so I downloaded the driver for my model that goes with it.

After downloading, run:

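chmod +x NVIDIA-Linux-x86_64-460.106.00.run
./NVIDIA-Linux-x86_64-460.106.00.run --kernel-source-path=/usr/src/kernels/3.10.0-1160.83.1.el7.x86_64 --no-x-check --no-opengl-files
# The --kernel-source-path directory only exists after the dependencies (kernel-devel) are installed
# For a remote install the installer checks for an X server; --no-x-check skips that check
# --no-opengl-files skips OpenGL, because with OpenGL installed the CentOS desktop UI does not start properly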
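The --kernel-source-path value is the kernel-devel tree for the running kernel, /usr/src/kernels/<version>. Rather than typing the version by hand, it can be derived from uname -r and checked before the installer starts. A sketch; the function name `kernel_src_dir` and its optional root-prefix argument (useful only for testing) are my own assumptions:

```shell
#!/bin/sh
# Hypothetical helper: print the kernel source directory for a kernel version,
# failing if kernel-devel for that version is not installed.
# Usage: kernel_src_dir "$(uname -r)" [ROOT_PREFIX]
kernel_src_dir() {
    kver="$1"
    root="${2:-}"
    dir="$root/usr/src/kernels/$kver"
    if [ -d "$dir" ]; then
        printf '%s\n' "$dir"
    else
        printf 'missing %s: install kernel-devel-%s first\n' "$dir" "$kver" >&2
        return 1
    fi
}
```

Usage: src=$(kernel_src_dir "$(uname -r)") && ./NVIDIA-Linux-x86_64-460.106.00.run --kernel-source-path="$src" --no-x-check --no-opengl-files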

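During installation

"Would you like to register the kernel module sources with DKMS?": choose yes (yes on a server, no on a local machine).

"Install NVIDIA's 32-bit compatibility libraries?": choose yes.

After the installation finishes, run nvidia-smi; if it prints output, the driver is installed.

Somewhere between the reboot after the initial kernel upgrade and this step, the driver installer may also have asked for a reboot; I don't remember exactly.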
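For a scriptable check instead of reading the nvidia-smi table, nvidia-smi can print selected fields as CSV with --query-gpu=name,driver_version --format=csv,noheader. The sketch below (hypothetical function name `first_driver_version`) reads that output and prints the driver version of the first GPU:

```shell
#!/bin/sh
# Hypothetical helper: read the output of
#   nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
# on stdin and print the driver version of the first GPU.
# Exits non-zero if there was no input.
first_driver_version() {
    awk -F', *' 'NR == 1 { print $2; found = 1 } END { exit !found }'
}
```

Usage: nvidia-smi --query-gpu=name,driver_version --format=csv,noheader | first_driver_version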

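Install CUDA

Installation

On the official download page I downloaded the 11.2.2 runfile since, as mentioned above, this is the version my card should use.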

chmod +x cuda_11.2.2_460.32.03_linux.run
./cuda_11.2.2_460.32.03_linux.run --no-opengl-libs

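In the installer, [X] means selected (will be installed) and a blank means not selected. The driver is already installed, so deselect it. Set the options as shown below, then choose Install.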

CUDA Installer
- [ ] Driver
     [ ] 460.32.03
+ [X] CUDA Toolkit 11.2
  [X] CUDA Samples 11.2
  [X] CUDA Demo Suite 11.2
  [X] CUDA Documentation 11.2
  Options
  Install
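Deselecting the bundled driver assumes that the driver already installed (460.106.00) is at least as new as the one the runfile ships (460.32.03, shown next to Driver above). Treating the bundled version as the minimum is my conservative assumption, not an official compatibility rule. A plain string comparison gets this wrong ("1" sorts before "3"), so the sketch below compares versions with GNU `sort -V`; the function name `version_ge` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 >= version $2.
# GNU `sort -V` compares dotted components numerically,
# so 460.106.00 sorts after 460.32.03.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

Usage: installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1); version_ge "$installed" 460.32.03 && echo "installed driver is new enough; deselect Driver"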

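After installation

vim /etc/profile
# Add the following two lines; the paths must match the install location reported by the CUDA installer
export PATH=/usr/local/cuda-11.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64:$LD_LIBRARY_PATH

# Save, then reload
source /etc/profile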
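Each time /etc/profile is sourced again in the same shell, the two export lines prepend the CUDA directories once more, and when LD_LIBRARY_PATH starts out empty they leave a stray empty entry after the colon. A variation (my own, not what the author used) is to prepend a directory only if it is not already present; the function name `path_prepend` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print a colon-separated path list with DIR prepended,
# unless DIR is already in the list. No trailing ':' is added when the list is empty.
# Usage: PATH=$(path_prepend "$PATH" /usr/local/cuda-11.2/bin)
path_prepend() {
    list="$1"
    dir="$2"
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;            # already present, unchanged
        *) printf '%s\n' "${dir}${list:+:$list}" ;;      # prepend DIR
    esac
}
```

With the function defined in /etc/profile before use, the two exports would become export PATH=$(path_prepend "$PATH" /usr/local/cuda-11.2/bin) and export LD_LIBRARY_PATH=$(path_prepend "$LD_LIBRARY_PATH" /usr/local/cuda-11.2/lib64).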

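Verify the installation

Method 1

In a terminal, type cuda and press Tab twice. If candidate commands appear, run nvcc --version; if it prints version information, the installation succeeded.

Screenshot: candidate commands appear automatically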
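Method 1 can be scripted by extracting the release number from nvcc --version, which includes a line of the form "Cuda compilation tools, release 11.2, V11.2.<build>". A sketch with a hypothetical function name `nvcc_release`:

```shell
#!/bin/sh
# Hypothetical helper: read `nvcc --version` output on stdin and print the
# release number from the "Cuda compilation tools, release X.Y, ..." line.
# Prints nothing if no such line is found.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}
```

Usage: [ "$(nvcc --version | nvcc_release)" = "11.2" ] && echo "CUDA 11.2 is on PATH"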

Method 2

Run one of the CUDA sample programs as a test:

cd /root/NVIDIA_CUDA-11.2_Samples/1_Utilities/deviceQuery
make
./deviceQuery
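On success deviceQuery reports the number of detected devices and ends with "Result = PASS", so its output can be checked automatically. A sketch with a hypothetical function name `device_query_summary`:

```shell
#!/bin/sh
# Hypothetical helper: read deviceQuery output on stdin, print a one-line
# summary, and succeed only if the output contains "Result = PASS".
device_query_summary() {
    awk '
        /Detected [0-9]+ CUDA Capable device/ { n = $2 }   # "Detected N CUDA Capable device(s)"
        /Result = PASS/ { pass = 1 }
        END {
            printf "devices=%s result=%s\n", (n == "" ? 0 : n), (pass ? "PASS" : "FAIL")
            exit !pass
        }'
}
```

Usage: ./deviceQuery | device_query_summary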

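Install cuDNN

Installation

On the official download page, I first searched the page for 11.2 and found two results, both from 2021; then I noticed a release for CUDA 11.x at the very top and chose that one. Downloading directly prompts you to log in with an NVIDIA account. I found a way around logging in: right-click the download link, copy the link address, and download it with Xunlei (Thunder). However, my Xunlei download was interrupted, so I ended up registering an account anyway.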

tar -xvf cudnn-linux-x86_64-8.7.0.84_cuda11-archive.tar.xz
# The three commands below are from https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html
# The files the reference links copy at this step differ somewhat from the official documentation
cp cudnn-*-archive/include/cudnn*.h /usr/local/cuda/include
cp -P cudnn-*-archive/lib/libcudnn* /usr/local/cuda/lib64
chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
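The archive name encodes the cuDNN version and the CUDA major version it was built for (here 8.7.0.84 and cuda11), which gives a quick way to confirm the 11.x build was picked. The parser below assumes the naming pattern cudnn-linux-<arch>-<version>_cuda<major>-archive.tar.xz seen in this file name; other releases may name their archives differently, and the function name `cudnn_archive_info` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split a cuDNN archive name of the form
#   cudnn-linux-<arch>-<version>_cuda<major>-archive.tar.xz
# into "<version> <cuda-major>". Fails for names that don't fit the pattern.
cudnn_archive_info() {
    basename "$1" |
        sed -n 's/^cudnn-linux-[^-]*-\([0-9][0-9.]*\)_cuda\([0-9][0-9]*\)-archive\.tar\.xz$/\1 \2/p' |
        grep .   # non-zero exit status when sed printed nothing
}
```

Usage: cudnn_archive_info cudnn-linux-x86_64-8.7.0.84_cuda11-archive.tar.xz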

Verify the installation

The commonly cited method is to run cat /usr/local/cuda-11.2/include/cudnn.h | grep CUDNN_MAJOR -A 2.

# Two examples from the reference links

[root@ctnr ~]# cat /usr/include/cudnn_v7.h |grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 5
#define CUDNN_PATCHLEVEL 0
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"



cat /usr/local/cuda-8.0/include/cudnn.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR      6
#define CUDNN_MINOR      0
#define CUDNN_PATCHLEVEL 21
--
#define CUDNN_VERSION    (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"

On my system, however, this printed nothing. cudnn.h contains the line

#include "cudnn_version.h"

cudnn_version.h, in turn, holds the cuDNN version information: the three consecutive lines give the major version, minor version and patch level.

#ifndef CUDNN_VERSION_H_
#define CUDNN_VERSION_H_

#define CUDNN_MAJOR 8
#define CUDNN_MINOR 7
#define CUDNN_PATCHLEVEL 0

#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

/* cannot use constexpr here since this is a C-only file */
/* Below is the max SM version this cuDNN library is aware of and supports natively */

#define CUDNN_MAX_SM_MAJOR_NUMBER 9
#define CUDNN_MAX_SM_MINOR_NUMBER 0
#define CUDNN_MAX_DEVICE_VERSION (CUDNN_MAX_SM_MAJOR_NUMBER * 100) + (CUDNN_MAX_SM_MINOR_NUMBER * 10)

#endif /* CUDNN_VERSION_H */
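Since cuDNN 8 the version macros live in cudnn_version.h rather than cudnn.h, which is why grepping cudnn.h printed nothing. The sketch below handles both layouts: it reads cudnn_version.h if present, otherwise cudnn.h, from an include directory, and prints MAJOR.MINOR.PATCHLEVEL together with CUDNN_VERSION computed by the formula in the header. The function name `cudnn_version` and its default directory are my own assumptions.

```shell
#!/bin/sh
# Hypothetical helper: print the cuDNN version found in an include directory.
# cuDNN 8+ keeps the macros in cudnn_version.h; older releases keep them in cudnn.h.
# Usage: cudnn_version [/usr/local/cuda/include]
cudnn_version() {
    inc="${1:-/usr/local/cuda/include}"
    if [ -f "$inc/cudnn_version.h" ]; then
        hdr="$inc/cudnn_version.h"
    elif [ -f "$inc/cudnn.h" ]; then
        hdr="$inc/cudnn.h"
    else
        printf 'no cuDNN headers in %s\n' "$inc" >&2
        return 1
    fi
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END {
            if (major == "") exit 1
            # Same formula as the CUDNN_VERSION macro in the header
            printf "%s.%s.%s (CUDNN_VERSION %d)\n", major, minor, patch, major * 1000 + minor * 100 + patch
        }' "$hdr"
}
```

With the headers above, cudnn_version /usr/local/cuda/include should report 8.7.0 and CUDNN_VERSION 8700.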

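Final notes

The foreword said that the kernel version of the packages yum installs must match the running system. You might think of the following command, which installs dependencies matching your running kernel version: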

yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)

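Before the upgrade, however, this command reported that the 957 packages could not be found, and I couldn't find them on pkgs.org either. So I asked an expert in the chat group, who told me to upgrade the kernel to 1160. He also teased us for still running 3.10, when as of the end of February 2023 the oldest kernel still maintained upstream was 4.14.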

References

[Linux] Check whether a machine has a GPU, jn10010537's blog (CSDN)

Summary of the relationship between GPU, CUDA and cuDNN, 挽手等风起's blog (CSDN)

Installing the GPU driver, CUDA and cuDNN on openEuler, irrationality's blog (CSDN)

Managing GPU applications with Kubernetes, breezey (cnblogs.com)

Installing the GPU version of TensorFlow, breezey (cnblogs.com)

Deploying the graphics driver on CentOS: CUDA, cuDNN, 天然玩家's blog (CSDN)

Installing the NVIDIA graphics driver and CUDA Toolkit on CentOS 7, XueShengke's blog (CSDN)

Installing CUDA on CentOS, 大专栏 (dazhuanlan.com)

Installing the graphics driver, CUDA and cuDNN on CentOS 7, yingchenwy's blog (CSDN)

CUDA and cuDNN installation tutorial (very detailed), kylinmin's blog (CSDN)

Installation Guide :: NVIDIA Deep Learning cuDNN Documentation

Origin blog.csdn.net/fj_changing/article/details/129282112