Linux (CentOS 7): offline installation of the A100 GPU driver and CUDA/cuDNN, and fixing the Docker error "could not select device driver...gpu"

1. Confirm the GPU model and operating system version. In this example, the GPU is an A100 and the operating system is CentOS 7.9.
Prepare the GPU driver and CUDA 11.2 packages by downloading both from NVIDIA's official website.
On the driver download page, select Linux 64-bit as the operating system.
The CUDA download page offers the latest CUDA Toolkit for all Linux systems. If you need an older version of CUDA, download it from the page for older CUDA releases. This example uses CUDA 11.2.

To download CUDA, go to NVIDIA's official website at https://developer.nvidia.com/cuda-downloads and select the runfile installer.

2. Check that the server recognizes the GPUs

3. Before installing the GPU driver, check that the operating system detects every GPU card. If a card is not detected, inspect the hardware, for example by reseating the card or swapping it with another one.

List all GPUs:

   lspci | grep -i nvidia
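
On a server with several cards it can help to compare the number of detected devices with the number that is physically installed. The following is a minimal sketch, not part of the original procedure: `count_nvidia_devices` and `EXPECTED_GPUS` are hypothetical names, and the default of 8 is only a placeholder. Note that lspci may also list NVIDIA devices that are not GPUs (bridges, for example), so treat the count as a rough check.

```shell
#!/bin/sh
# Count lines mentioning NVIDIA in lspci-style text read from standard input.
# grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
count_nvidia_devices() {
    grep -ci 'nvidia' || true
}

# EXPECTED_GPUS is an assumed variable; set it to the number of installed cards.
EXPECTED_GPUS="${EXPECTED_GPUS:-8}"

# Only query the hardware when lspci is available.
if command -v lspci >/dev/null 2>&1; then
    found=$(lspci | count_nvidia_devices)
    if [ "$found" -eq "$EXPECTED_GPUS" ]; then
        echo "OK: $found NVIDIA device(s) detected"
    else
        echo "WARNING: expected $EXPECTED_GPUS NVIDIA device(s), found $found"
    fi
fi
```

If the count is lower than expected, continue with the hardware inspection described above before installing the driver.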

4. Uninstall the old version software package (optional)

Uninstall the GPU driver:

/usr/bin/nvidia-uninstall

CUDA uninstall method:

/usr/local/cuda/bin/cuda-uninstaller

7. Install gcc, g++ compiler

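Building the CUDA samples test programs with make requires g++; installing the CUDA package itself does not. The GPU driver installer also compiles a kernel module, which is why gcc and kernel-devel are installed here.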

yum -y install gcc gcc-c++ kernel-devel make
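
The driver installer builds its kernel module against the headers of the running kernel, so the installed kernel-devel package should match `uname -r`. A small check could look like the sketch below; `has_kernel_devel` is a hypothetical helper name, and the default directory assumes the usual CentOS location for kernel-devel, /usr/src/kernels.

```shell
#!/bin/sh
# Succeed if kernel headers for the given kernel release exist under the
# given root directory (assumed default: /usr/src/kernels on CentOS).
has_kernel_devel() {
    release="$1"
    root="${2:-/usr/src/kernels}"
    [ -d "$root/$release" ]
}

if has_kernel_devel "$(uname -r)"; then
    echo "kernel-devel matches the running kernel $(uname -r)"
else
    echo "kernel-devel for $(uname -r) not found; install the matching package"
fi
```

On an offline machine this matters because the kernel-devel RPM has to be brought over by hand, and a version that differs from the running kernel will not be used by the installer.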

8. Disable the nouveau module that ships with the system

Check whether the nouveau module is loaded. If the command prints any output, the module is loaded and must be disabled first.

 lsmod | grep nouveau

9. If the blacklist-nouveau.conf file does not exist, create it

 vim /usr/lib/modprobe.d/blacklist-nouveau.conf

Add the following lines to the file:

 blacklist nouveau
 options nouveau modeset=0

Run the following command to rebuild the initramfs so that the blacklist is applied at boot (nouveau is only actually disabled after the server is restarted)

    dracut --force
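
For unattended installs, the blacklist file can be written without an editor. The following is a minimal sketch under stated assumptions: `write_nouveau_blacklist` is a hypothetical helper, and `APPLY` is an assumed switch that keeps the script from touching the system unless it is explicitly set to 1 and the script runs as root.

```shell
#!/bin/sh
# Write the two nouveau blacklist lines to the given file, replacing its content.
write_nouveau_blacklist() {
    conf="$1"
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$conf"
}

# Only modify the system when run as root with APPLY=1 (assumed safety switch).
if [ "$(id -u)" -eq 0 ] && [ "${APPLY:-0}" = "1" ]; then
    write_nouveau_blacklist /usr/lib/modprobe.d/blacklist-nouveau.conf
    # Rebuild the initramfs so the blacklist is honored at the next boot.
    dracut --force
fi
```

Run it as `APPLY=1 sh script.sh` (the script name is up to you), then reboot as described in the next step.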

10. Restart the operating system

reboot

11. After the system has restarted, check that the nouveau module is disabled. The following command should print nothing.

lsmod | grep nouveau

12. Switch the default system run level to text mode

The GPU driver must be installed in text mode.

 systemctl set-default multi-user.target    

GPU driver installation

Install the GPU driver as the root user:

chmod +x NVIDIA-Linux-x86_64-450.80.02.run
./NVIDIA-Linux-x86_64-450.80.02.run --no-opengl-files --ui=none --no-questions --accept-license

Enable GPU driver persistence mode by starting the persistence daemon:

nvidia-persistenced

Start it automatically at boot

vim /etc/rc.d/rc.local

Add the following line to the file:

nvidia-persistenced

Make the /etc/rc.d/rc.local file executable:

chmod +x /etc/rc.d/rc.local

If /etc/rc.d/rc.local does not exist, edit /etc/rc.local instead:

vim /etc/rc.local
chmod +x /etc/rc.local
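
If the setup is scripted, the autostart line should be added only once, even when the script is run repeatedly. A minimal sketch of such an idempotent append follows; `ensure_line` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present.
# -x matches whole lines, -F treats the line as a fixed string.
ensure_line() {
    file="$1"
    line="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (run as root):
#   ensure_line /etc/rc.d/rc.local nvidia-persistenced
#   chmod +x /etc/rc.d/rc.local
```

Calling `ensure_line` twice with the same arguments leaves a single copy of the line in the file.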

After installing the GPU driver, check the GPU status and related configurations.

nvidia-smi
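
The same check can be scripted with the query mode of nvidia-smi. The sketch below reads the driver version and persistence mode of each GPU and compares the version with a minimum. `version_ge` is a hypothetical helper, and `MIN_DRIVER` is an assumed threshold that defaults to the driver version installed in this example; adjust it to whatever your CUDA version requires. It relies on `sort -V` from GNU coreutils.

```shell
#!/bin/sh
# Succeed when version $1 is greater than or equal to version $2.
# sort -V orders version strings; if $2 sorts first (or equal), $1 >= $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# MIN_DRIVER is an assumption for illustration, not an official requirement.
MIN_DRIVER="${MIN_DRIVER:-450.80.02}"

if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,name,driver_version,persistence_mode \
               --format=csv,noheader |
    while IFS=, read -r idx name drv pm; do
        drv=$(printf '%s' "$drv" | tr -d ' ')   # strip the space after each comma
        if version_ge "$drv" "$MIN_DRIVER"; then
            echo "GPU $idx ($name): driver $drv OK, persistence:$pm"
        else
            echo "GPU $idx ($name): driver $drv is older than $MIN_DRIVER"
        fi
    done
fi
```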

CUDA installation

Install CUDA. If you have already installed the GPU driver, do not select the GPU driver component when installing CUDA.

 chmod +x cuda_11.1.1_455.32.00_linux.run
sh cuda_11.1.1_455.32.00_linux.run --no-opengl-libs

In the installation interface of newer CUDA versions, the Driver option controls whether the bundled GPU driver is installed. If the GPU driver has already been installed, leave it unchecked.
Configure environment variables

Add the following lines to the /etc/profile file so that they take effect for all users:

vim /etc/profile

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

Then reload the file in the current shell:

source /etc/profile

Check that CUDA is installed correctly and that the environment variables are picked up:

nvcc -V
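
In a script, the release number can be extracted from the nvcc output and compared with the version you meant to install. The sketch below assumes the usual output line of the form "Cuda compilation tools, release 11.1, V11.1.105"; `parse_nvcc_release` is a hypothetical helper name.

```shell
#!/bin/sh
# Extract the "major.minor" release number from nvcc -V output on stdin.
parse_nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
    echo "CUDA release: $(nvcc -V | parse_nvcc_release)"
else
    echo "nvcc not found on PATH; check the exports in /etc/profile"
fi
```

If nvcc is not found even though CUDA is installed, the PATH export has most likely not been loaded in the current shell.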

Docker: fixing the "could not select device driver...gpu" error (install nvidia-container-runtime)

Link: https://www.hangge.com/blog/cache/detail_3184.html

Origin blog.csdn.net/dream_home8407/article/details/130327772