Training YOLOv3 (2): CUDA and cuDNN

Question raised
https://ask.csdn.net/questions/7771526

Reference tutorial
https://blog.csdn.net/zhaoxueqi666/article/details/120333153


###############################################
Readers, please don't simply follow along and repeat the steps in this series. This is a record of the pitfalls I hit, not a tutorial. I wrote down the whole process so that others can avoid the same pitfalls later. Please read the entire series first, then decide what to run.
###############################################


Following advice from CSDN and other experts online, I first suspected that the machine's CUDA and cuDNN versions were too low, so I started by trying to upgrade both.
The nvidia-smi output at this point actually revealed another major problem, but I did not notice it at the time. I noticed it later and fixed it.

nvidia-smi


The current version of CUDA is 10.2
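
If you would rather read this value in a script than off the screen, a minimal sketch follows. It assumes the nvidia-smi header contains the text `CUDA Version: X.Y`; the helper name `cuda_version_from_smi` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: extract the "CUDA Version" field from nvidia-smi
# output read on stdin. Assumes the header contains "CUDA Version: X.Y".
cuda_version_from_smi() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Only query the driver if nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi | cuda_version_from_smi
fi
```

The function prints nothing when the field is missing, so an empty result is a hint that the driver is not working.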

To upgrade to 11.1, go to the official download page:
https://developer.nvidia.com/cuda-downloads

Select the options that match your system, then choose the runfile installer type; NVIDIA then shows the commands to download and install it.
If you want a different CUDA version, you can pick one at the bottom of the page.

Find the required version, open it, and run the command NVIDIA gives:

wget https://developer.download.nvidia.com/compute/cuda/11.1.0/local_installers/cuda_11.1.0_455.23.05_linux.run

Then check whether the nouveau driver is loaded:

lsmod | grep nouveau

No output means that nouveau is not loaded. If there is output, disable nouveau.
Nouveau is an open-source driver for NVIDIA graphics cards developed by a third party. To use NVIDIA's own driver, you must disable it first.
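
A common way to disable nouveau is a modprobe blacklist file. The sketch below only prints the two lines such a file usually contains and offers a small check on lsmod output. The file name `blacklist-nouveau.conf` and the `update-initramfs` step are assumptions that fit Ubuntu-style systems; they are not from the original notes, and both helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the contents of a modprobe blacklist file for nouveau.
nouveau_blacklist_conf() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0'
}

# Hypothetical helper: succeed if lsmod-style output on stdin lists nouveau.
nouveau_loaded() {
    grep -q '^nouveau[[:space:]]'
}

nouveau_blacklist_conf

# To apply it (assumed Ubuntu-style paths; run manually, then reboot):
#   nouveau_blacklist_conf | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
#   sudo update-initramfs -u
# To check afterwards:
#   lsmod | nouveau_loaded && echo "nouveau is still loaded"
```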

Go to the directory where the CUDA installer was downloaded and run it:

sudo sh cuda_11.1.0_455.23.05_linux.run

Reference consulted along the way:
Reinstalling CUDA reports the error "Error installing Cuda toolkit: Existing package manager installation of the driver found"


After a short wait a prompt appears. You can try selecting Continue first, and then install CUDA without installing the driver. If that goes well, the installation succeeds directly. Remember to update the .bashrc file afterwards.

Next comes the end user license agreement.
Type accept in the terminal to agree to it.
Since I had already installed the NVIDIA graphics driver, I did not install the driver here: move to Driver and press Enter to clear the X in "[ ]", which deselects it.
Then move to Install and press Enter.

Select Upgrade all, then select Yes.

After a while, the installation is complete.

Then add the environment variables:

gedit ~/.bashrc

Comment out or delete the entries for the old CUDA version, then add the following lines, as prompted by the installer's output in the terminal:

export PATH=$PATH:/usr/local/cuda-11.1/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.1/lib64
export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/cuda-11.1/lib64
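
Because ~/.bashrc is read again by every new shell and every `source`, plain `export PATH=$PATH:...` lines can pile up duplicate entries, and when a variable such as LD_LIBRARY_PATH starts out empty they leave an empty leading entry. One possible guard is sketched below; the helper name `append_once` is hypothetical and not part of the original setup.

```shell
#!/bin/sh
# Hypothetical helper: append directory $2 to the colon-separated variable
# named $1, unless that directory is already listed.
append_once() {
    eval "current=\${$1:-}"
    case ":$current:" in
        *":$2:"*)
            ;;  # already present, nothing to do
        *)
            # Append; skip the separating colon when the variable was empty.
            eval "$1=\"\${current:+\$current:}\$2\""
            ;;
    esac
    export "$1"
}

append_once PATH /usr/local/cuda-11.1/bin
append_once LD_LIBRARY_PATH /usr/local/cuda-11.1/lib64
append_once LIBRARY_PATH /usr/local/cuda-11.1/lib64
```

Sourcing the file twice with this form leaves each variable unchanged the second time.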

Then reload the file so that the environment variables take effect:

source ~/.bashrc

Then check the CUDA version; the output shows that the environment variables have taken effect:

nvcc -V


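Reference consulted along the way: "nvidia-smi and nvcc -V show different CUDA versions". nvidia-smi reports the highest CUDA version the installed driver supports, while nvcc -V reports the version of the toolkit found on PATH, so the two can legitimately differ.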
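
To read the toolkit version in a script, the release field of the nvcc -V output can be extracted. This is a sketch that assumes the output contains a line of the form `... release X.Y, ...`; the helper name `nvcc_release` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract the toolkit release number from `nvcc -V`
# output read on stdin. Assumes a line containing "release X.Y,".
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

# Only run nvcc if it is on PATH.
if command -v nvcc >/dev/null 2>&1; then
    echo "toolkit (nvcc): $(nvcc -V | nvcc_release)"
fi
```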

Then check GPU usage:

nvidia-smi


That completes the CUDA installation. Next, install cuDNN.
According to the official website, cuDNN 8.0.4 is the version that matches CUDA 11.1:
https://developer.nvidia.com/rdp/cudnn-archive#a-collapse804-111


On the download page, NVIDIA requires you to log in to your account before you can download cuDNN.

Then find cuDNN 8.0.4 and click it; there are many builds to choose from.
If you don't know your machine's system and architecture, you can check:

uname -a

Once you have picked the matching build, download it.
Because my network was slow, I downloaded it on Windows 10 and then transferred it over with FileZilla.

Reference consulted along the way: "Unzipping a .solitairetheme8 file"

The download arrives as a file with the extension .solitairetheme8, which needs to be renamed to .tgz and then decompressed:

cd NVIDIA_CUDA-11.1_Samples/
cp cudnn-11.1-linux-x64-v8.0.4.30.solitairetheme8 cudnn-11.1-linux-x64-v8.0.4.30.tgz
tar -xvf cudnn-11.1-linux-x64-v8.0.4.30.tgz
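
Before unpacking, it can be worth confirming that the oddly named download really is a gzip archive. The sketch below checks the first two bytes for the gzip signature 1f 8b; the helper name `is_gzip` is hypothetical, and the archive name is the one used above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file starts with the gzip magic bytes 1f 8b.
is_gzip() {
    [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]
}

archive=cudnn-11.1-linux-x64-v8.0.4.30.solitairetheme8
if [ -f "$archive" ] && is_gzip "$archive"; then
    # tar does not depend on the file extension, so it can read the file as is.
    tar -xzvf "$archive"
fi
```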

Decompression produces a cuda folder.

Next, copy the extracted files into the CUDA installation. The first attempt made the terminal report an error:

sudo cp cuda/include/cudnn.h /usr/local/cuda/include/

Looking at the target path showed that the directory name was different.

After adjusting the path accordingly, copy the extracted files into the CUDA installation:

sudo cp cuda/include/cudnn.h /usr/local/cuda-11.1/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda-11.1/lib64/

and grant read permission to all users:

sudo chmod a+r /usr/local/cuda-11.1/include/cudnn.h
sudo chmod a+r /usr/local/cuda-11.1/lib64/libcudnn*
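
cuDNN 8 splits its API across several headers (cudnn_version.h among them) and ships its libraries with symbolic links, so copying only cudnn.h can leave the installation incomplete. A sketch of copying everything in one pass follows. `install_cudnn` is a hypothetical helper name, and both directory arguments are supplied by the caller.

```shell
#!/bin/sh
# Hypothetical helper: copy all cuDNN headers and libraries from an unpacked
# archive directory ($1) into a CUDA install prefix ($2).
install_cudnn() {
    src=$1
    dest=$2
    # Copy every cuDNN header, not just cudnn.h.
    cp "$src"/include/cudnn*.h "$dest/include/" || return 1
    # -P copies symbolic links as links instead of duplicating the library.
    cp -P "$src"/lib64/libcudnn* "$dest/lib64/" || return 1
    # Make headers and libraries readable by all users.
    chmod a+r "$dest"/include/cudnn*.h "$dest"/lib64/libcudnn*
}

# Example, run as root from the directory that contains the extracted cuda folder:
#   install_cudnn cuda /usr/local/cuda-11.1
```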

To be able to check the installed version number, also put the cudnn_version.h file in the CUDA include directory.

Move /home/heying/NVIDIA_CUDA-11.1_Samples/cuda/include/cudnn_version.h to /usr/local/cuda-11.1/include/:

sudo mv cudnn_version.h /usr/local/cuda-11.1/include

Check the version number; the output shows that the cuDNN version is 8.0.4:

cat /usr/local/cuda-11.1/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
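
The same three macros can be joined into a single version string. This is a sketch that assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL on lines of the form `#define NAME value`; the helper name `cudnn_version` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print MAJOR.MINOR.PATCHLEVEL from a cuDNN version header.
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { if (major != "") print major "." minor "." patch }' "$1"
}

header=/usr/local/cuda-11.1/include/cudnn_version.h
if [ -f "$header" ]; then
    cudnn_version "$header"
fi
```

An empty result means the macros were not found in that file.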

Compile darknet after these changes.

Judging by the result, my initial suspicion was that the versions did not match.

First delete the downloaded cuDNN files that had been modified, then download a fresh copy.

Copy the files again:

sudo cp include/cudnn.h /usr/local/cuda/include/
sudo cp lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*

Then check the version information:

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2

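No output: since cuDNN 8, the version macros are defined in cudnn_version.h rather than cudnn.h, so copy that header as well and check again.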

sudo cp include/cudnn_version.h /usr/local/cuda/include/
cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2

The process failed.

Origin blog.csdn.net/Xiong2840/article/details/127934672