Question raised
https://ask.csdn.net/questions/7771526
Reference tutorial
https://blog.csdn.net/zhaoxueqi666/article/details/120333153
###############################################
Readers, please don't simply follow along and run these steps one by one: this is a record of the pitfalls I ran into, not a tutorial. I recorded the whole process so that others can avoid these pitfalls later. I hope you will read through the entire series first and only then decide what to run.
###############################################
Following advice from CSDN and other experts online, I first suspected that the machine's CUDA and cuDNN versions were too low, so I started by trying to upgrade both.
The screenshot taken at this point actually revealed another major problem, but I did not notice it at the time. I noticed it later and fixed it.
nvidia-smi
The output shows that the current CUDA version is 10.2.
To upgrade to 11.1, go to the official download page:
https://developer.nvidia.com/cuda-downloads
Select the options matching your system and choose the runfile installer type; NVIDIA then shows the commands to use on the command line.
If you want a different CUDA version, you can choose it at the bottom of the page, find the required version, and click through to it.
Then run the download command given by NVIDIA:
wget https://developer.download.nvidia.com/compute/cuda/11.1.0/local_installers/cuda_11.1.0_455.23.05_linux.run
Then check whether the nouveau driver is loaded:
lsmod | grep nouveau
No output means that nouveau is not loaded. If there is output, disable nouveau.
Nouveau is an open-source 3D driver for NVIDIA graphics cards developed by a third party. To use NVIDIA's own driver, you must disable it first.
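If lsmod does show nouveau, one common way to disable it is a modprobe blacklist file. The following is a minimal sketch rather than the exact steps used in this record: it writes the two blacklist lines to a scratch file so they can be inspected first. The file name blacklist-nouveau.conf and the Debian/Ubuntu-style update-initramfs step in the comments are assumptions about the target system.

```shell
#!/bin/sh
# Sketch: generate the modprobe entries that keep nouveau from loading.
# write_nouveau_blacklist is a made-up helper name for illustration.
write_nouveau_blacklist() {
    printf 'blacklist nouveau\noptions nouveau modeset=0\n' > "$1"
}

# Write to a scratch file first so nothing under /etc is touched.
conf=$(mktemp)
write_nouveau_blacklist "$conf"
cat "$conf"

# To apply (assumed Debian/Ubuntu-style system), something like:
#   sudo cp "$conf" /etc/modprobe.d/blacklist-nouveau.conf
#   sudo update-initramfs -u
#   sudo reboot
# Afterwards, lsmod | grep nouveau should print nothing.
```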
Go to the directory where the CUDA installer was downloaded, then run the installation script:
sudo sh cuda_11.1.0_455.23.05_linux.run
Reference consulted along the way:
Reinstalling CUDA reports the error "Error installing Cuda toolkit: Existing package manager installation of the driver found"
After a short wait the installer shows a prompt. You can try selecting Continue first and then install CUDA without installing the driver. If all goes well, the installation succeeds directly. Remember to update the .bashrc file afterwards.
Then there is the user license agreement.
Enter accept in the terminal to agree to the agreement.
Since I had already installed the NVIDIA graphics driver, the driver is not installed here: move to Driver, press Enter, and clear the X in the "[X]" box to deselect it.
Then move to Install and press Enter.
Select Upgrade all,
then select Yes.
After a while, the installation is complete.
Then add the environment variable
gedit ~/.bashrc
Comment out or delete the original CUDA entries.
Then, following the prompt printed in the terminal, add the following lines:
export PATH=$PATH:/usr/local/cuda-11.1/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.1/lib64
export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/cuda-11.1/lib64
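The three export lines above append unconditionally, so the same directories are added again each time .bashrc is re-sourced in a session, and an empty LD_LIBRARY_PATH ends up with a leading colon. A possible variant is sketched below; the helper name append_once and the variable cuda_prefix are made up for illustration and are not part of CUDA or the shell.

```shell
#!/bin/sh
# append_once LIST DIR: print LIST with DIR appended, unless DIR is already in it.
append_once() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *) if [ -n "$1" ]; then printf '%s:%s\n' "$1" "$2"; else printf '%s\n' "$2"; fi ;;
    esac
}

cuda_prefix=/usr/local/cuda-11.1   # install prefix used in this record
PATH=$(append_once "$PATH" "$cuda_prefix/bin")
LD_LIBRARY_PATH=$(append_once "${LD_LIBRARY_PATH:-}" "$cuda_prefix/lib64")
LIBRARY_PATH=$(append_once "${LIBRARY_PATH:-}" "$cuda_prefix/lib64")
export PATH LD_LIBRARY_PATH LIBRARY_PATH
```

Sourcing this twice leaves each variable unchanged the second time.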
Then reload the environment variables so that they take effect:
source ~/.bashrc
Then check the CUDA version; you can see that the environment variables have taken effect:
nvcc -V
Reference consulted along the way: nvidia-smi and nvcc -V display different CUDA versions
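The two tools answer different questions: nvidia-smi reports the highest CUDA version the installed driver supports, while nvcc -V reports the toolkit found first on PATH. To compare them in a script, the toolkit release can be pulled out of the nvcc banner. This is only a sketch; the sample line is illustrative text in the usual nvcc format, not output captured from this machine, and parse_nvcc_release is a made-up helper name.

```shell
#!/bin/sh
# parse_nvcc_release: read nvcc -V output on stdin, print the "release X.Y" number.
parse_nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Illustrative banner line (assumed format, not captured output).
sample='Cuda compilation tools, release 11.1, V11.1.74'
printf '%s\n' "$sample" | parse_nvcc_release

# On a machine where nvcc is installed, parse the real banner as well.
if command -v nvcc >/dev/null 2>&1; then
    nvcc -V | parse_nvcc_release
fi
```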
Then check the usage of GPU
nvidia-smi
The CUDA installation is now complete; next, install cuDNN.
According to the official website, cuDNN 8.0.4 is the match for CUDA 11.1:
https://developer.nvidia.com/rdp/cudnn-archive#a-collapse804-111
On the download page, NVIDIA requires you to log in to your account before downloading cuDNN.
Then find and click cuDNN 8.0.4; several packages are listed for different platforms.
If you don't know which one matches your machine, you can check with:
uname -a
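For picking the package, only the machine architecture matters, which uname -m prints on its own. A small sketch of the mapping follows; the only mapping taken from this record is x86_64 to the linux-x64 package downloaded below, and the helper name cudnn_platform is made up for illustration.

```shell
#!/bin/sh
# cudnn_platform ARCH: map a uname -m value to the cuDNN package label.
# Only x86_64 is mapped, because that is the package used in this record.
cudnn_platform() {
    case "$1" in
        x86_64) echo "linux-x64" ;;
        *) echo "unknown ($1): check the archive page manually" ;;
    esac
}

cudnn_platform "$(uname -m)"
```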
After you pick the package, the download starts.
Because my network connection was slow, I downloaded it on Win10 and then transferred it with FileZilla.
Reference consulted along the way: extracting a .solitairetheme8 file
The downloaded file has the extension .solitairetheme8; it needs to be renamed to .tgz and then extracted:
cd NVIDIA_CUDA-11.1_Samples/
cp cudnn-11.1-linux-x64-v8.0.4.30.solitairetheme8 cudnn-11.1-linux-x64-v8.0.4.30.tgz
tar -xvf cudnn-11.1-linux-x64-v8.0.4.30.tgz
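The rename is mostly cosmetic: tar looks at the file contents rather than the extension, so, assuming the download really is a gzip-compressed tar archive as the .tgz rename implies, it can be listed or extracted under its odd name as well. A self-contained sketch using a dummy archive (the demo file names are made up):

```shell
#!/bin/sh
# Build a tiny gzip-compressed tar archive with a misleading extension.
work=$(mktemp -d)
mkdir -p "$work/cuda/include"
echo '/* placeholder header */' > "$work/cuda/include/cudnn.h"
tar -czf "$work/demo.solitairetheme8" -C "$work" cuda

# List the contents first, then extract into a separate directory.
tar -tzf "$work/demo.solitairetheme8"
mkdir "$work/out"
tar -xzf "$work/demo.solitairetheme8" -C "$work/out"
```

Listing with tar -tzf before extracting is also a quick check that a transferred file is not truncated.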
Extraction produces a cuda folder.
Copy the extracted files into the CUDA installation directory. The first attempt produced an error in the terminal:
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
The terminal reported an error. Looking at the target path, I found that the directory name was different (cuda-11.1 rather than cuda).
After adjusting the path accordingly, copy the extracted files into the CUDA installation directory:
sudo cp cuda/include/cudnn.h /usr/local/cuda-11.1/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda-11.1/lib64/
and grant read permission to all users:
sudo chmod a+r /usr/local/cuda-11.1/include/cudnn.h
sudo chmod a+r /usr/local/cuda-11.1/lib64/libcudnn*
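One thing worth checking at this step: cuDNN 8 ships several headers (cudnn.h includes others such as cudnn_version.h), so copying cudnn.h alone can leave the install incomplete. Below is a hedged sketch that copies every cudnn*.h header and keeps library symlinks intact with cp -P. The function name install_cudnn and the scratch directories with placeholder files are illustrative; on the real system the destination would be the CUDA prefix and the copy would need sudo.

```shell
#!/bin/sh
# install_cudnn SRC DEST: copy all cuDNN headers and libraries from the
# extracted "cuda" folder (SRC) into a CUDA prefix (DEST).
install_cudnn() {
    mkdir -p "$2/include" "$2/lib64"
    cp "$1"/include/cudnn*.h "$2/include/"
    cp -P "$1"/lib64/libcudnn* "$2/lib64/"      # -P keeps symlinks as symlinks
    chmod a+r "$2"/include/cudnn*.h "$2"/lib64/libcudnn*
}

# Dry run against scratch directories with placeholder files.
src=$(mktemp -d)
dest=$(mktemp -d)
mkdir -p "$src/include" "$src/lib64"
touch "$src/include/cudnn.h" "$src/include/cudnn_version.h"
touch "$src/lib64/libcudnn.so.8.0.4"
ln -s libcudnn.so.8.0.4 "$src/lib64/libcudnn.so.8"
install_cudnn "$src" "$dest"
ls "$dest/include" "$dest/lib64"
```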
To check the installed version number, the cudnn_version.h file must also be in the CUDA include directory.
Move /home/heying/NVIDIA_CUDA-11.1_Samples/cuda/include/cudnn_version.h to /usr/local/cuda-11.1/include/:
sudo mv cudnn_version.h /usr/local/cuda-11.1/include
Check the version number, you can see that the cudnn version is 8.0.4
cat /usr/local/cuda-11.1/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
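The grep prints the three #define lines; to get a single version string for scripts, the macros can be joined with awk. This is a sketch, exercised here against a small stand-in header rather than the real file, and the function name cudnn_version is illustrative.

```shell
#!/bin/sh
# cudnn_version HEADER: print MAJOR.MINOR.PATCHLEVEL from a cuDNN version header.
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { if (major != "") print major "." minor "." patch }' "$1"
}

# Stand-in header with the same macro names as cudnn_version.h.
hdr=$(mktemp)
cat > "$hdr" <<'EOF'
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 4
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
EOF
cudnn_version "$hdr"    # prints 8.0.4
```

On the real system the call would be cudnn_version /usr/local/cuda-11.1/include/cudnn_version.h; a header without the macros prints nothing.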
Compile darknet after the modifications are complete.
My initial suspicion was that the versions did not match.
First delete the cuDNN files that were downloaded and modified,
then download a fresh copy
and copy the files over again:
sudo cp include/cudnn.h /usr/local/cuda/include/
sudo cp lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
Then check the version information:
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
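There is no output. In cuDNN 8 the version macros are defined in cudnn_version.h rather than cudnn.h, so copy that file as well and check it instead: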
sudo cp include/cudnn_version.h /usr/local/cuda/include/
cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
This attempt failed.