2021-12-01 How to solve the problem that tensorflow cannot use the GPU?

1. Current environment and problems

1.1 Problem description

The main problem is that the GPU cannot be used for computation when doing deep learning, so the program always runs on the CPU. For deep learning on large amounts of data this costs a lot of time. The symptoms are as follows (all code is run in a Python environment):
① tensorflow version check

import tensorflow as tf
tf.__version__  #(2.6.0)

The following problems occur when importing tensorflow:
[Screenshot: problems reported when importing tensorflow]
② Available GPU device check

import tensorflow as tf
from tensorflow.python.client import device_lib
# Print the number of available GPUs
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
# Query the devices that tensorflow can see (CPU and GPU)
print(device_lib.list_local_devices())

The results returned are as follows:
The number of available GPUs is reported as 0 (the actual number of usable GPUs is 2).
The GPU devices were not detected; only the CPU was detected.
Based on the above, my current belief is that the GPU is not configured correctly and that this is a version mismatch problem.

1.2 Description of operating environment, etc.

① Operating system: Linux, Ubuntu 18.04
② System architecture: amd64 (x86_64)
③ python=3.6.9
④ tensorflow-gpu=2.6.0
⑤ cuda=9.1.85
Pay attention to the difference between the two query commands: https://blog.csdn.net/hb_learning/article/details/115534219
⑥ cudnn: not installed
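Most of the environment facts listed above can also be collected from Python. The following is a minimal sketch using only the standard library. `describe_environment` is a hypothetical helper name, and the nvcc query assumes that `nvcc` is on the PATH when the CUDA compiler is installed; otherwise that field stays `None`.

```python
import platform
import shutil
import subprocess


def describe_environment():
    """Collect OS, architecture, Python version and the output of `nvcc -V`."""
    info = {
        "system": platform.system(),          # e.g. "Linux"
        "release": platform.release(),        # kernel release string
        "machine": platform.machine(),        # e.g. "x86_64"
        "python": platform.python_version(),  # e.g. "3.6.9"
        "nvcc": None,                         # filled in only if nvcc is found
    }
    nvcc = shutil.which("nvcc")
    if nvcc is not None:
        try:
            result = subprocess.run(
                [nvcc, "-V"],
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
                timeout=10,
            )
            info["nvcc"] = result.stdout.strip()
        except (OSError, subprocess.SubprocessError):
            pass  # leave "nvcc" as None if the command cannot be run
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print("{}: {}".format(key, value))
```

The sketch does not look for cudnn, because cudnn has no command line tool of its own.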

2. Solutions to this problem


① Look up the cuda and cudnn versions that correspond to the tensorflow version. Query link: https://tensorflow.google.cn/install/source?hl=en
From the table on that page, the versions corresponding to tensorflow 2.6.0 are cudnn=8.1 and cuda=11.2.
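The comparison made in step ① can be written down as a small check. This is a minimal sketch and not part of the original setup: `REQUIRED`, `major_minor` and `check_versions` are hypothetical names, the required versions are the ones stated above for tensorflow 2.6.0, and comparing only major.minor is an assumption about how strict the match needs to be.

```python
# Versions required by tensorflow 2.6.0, as stated in this post.
REQUIRED = {"cuda": "11.2", "cudnn": "8.1"}


def major_minor(version):
    """Return the (major, minor) pair of a dotted version string."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def check_versions(installed):
    """Map each component to True if its major.minor matches the requirement."""
    report = {}
    for name, required in REQUIRED.items():
        found = installed.get(name)
        report[name] = (
            found is not None and major_minor(found) == major_minor(required)
        )
    return report


if __name__ == "__main__":
    # Environment from section 1.2: cuda 9.1.85 and no cudnn.
    print(check_versions({"cuda": "9.1.85", "cudnn": None}))
    # Versions reported later in this post: cuda 11.2.152 and cudnn 8.1.0.
    print(check_versions({"cuda": "11.2.152", "cudnn": "8.1.0"}))
```

If tensorflow can be imported at all, `tf.sysconfig.get_build_info()` should also report the cuda and cudnn versions the installed package was built against, which could replace the hard-coded table.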
② Upgrade the cuda version.
First download the installer for the corresponding cuda version. Download address: https://developer.nvidia.com/cuda-11.2.0-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1804&target_type=runfilelocal
You can follow the installation instructions on the official page directly: enter the commands it lists in the terminal.
Because I am running in a docker container and do not have root permissions, I leave out sudo when typing on the command line and enter the rest of each command directly.
③ Configure cuda
a. Open the .bashrc file

vim  ~/.bashrc

b. Add the following at the end of the file

export PATH=/usr/local/cuda-11.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64:$LD_LIBRARY_PATH

c. Finally, run the following command so that the changes to the .bashrc file take effect.

source ~/.bashrc

d. Use the nvcc -V command to check the cuda version. You can see that it is now 11.2.152.
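Whether the two export lines are active in the current session can also be checked from Python, since a Python process started from that shell inherits its environment. The following is a minimal sketch, assuming the install prefix /usr/local/cuda-11.2 used above; `missing_cuda_entries` is a hypothetical helper name.

```python
import os


def missing_cuda_entries(env, cuda_home="/usr/local/cuda-11.2"):
    """Return the names of the variables that lack the expected CUDA directory."""
    expected = {
        "PATH": cuda_home + "/bin",
        "LD_LIBRARY_PATH": cuda_home + "/lib64",
    }
    missing = []
    for variable, directory in expected.items():
        # Linux search paths are colon-separated; ignore empty entries
        # and trailing slashes when comparing.
        entries = [e.rstrip("/") for e in env.get(variable, "").split(":") if e]
        if directory not in entries:
            missing.append(variable)
    return missing


if __name__ == "__main__":
    print("Variables missing the cuda-11.2 directory:",
          missing_cuda_entries(os.environ))
```

An empty list means both variables contain the cuda-11.2 directories. If a variable is listed, the .bashrc changes have not been loaded in the shell that started Python.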
④ Install cudnn
a. First download the corresponding cudnn installation package. Installation package link: https://developer.nvidia.com/rdp/cudnn-archive
b. Go to the directory that contains the downloaded archive and extract it there

cd /path/to/archive/directory  # placeholder: the directory containing the archive
tar zxvf cudnn-11.2-linux-x64-v8.1.0.77.tgz

c. Run the cp and chmod commands to copy the cudnn files into the cuda directory

# cudnn 8 ships several header files (cudnn_version.h among them), so copy all of them
sudo cp cuda/include/cudnn*.h /usr/local/cuda-11.2/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda-11.2/lib64
sudo chmod a+r /usr/local/cuda-11.2/include/cudnn*.h
sudo chmod a+r /usr/local/cuda-11.2/lib64/libcudnn*

d. Check the cudnn version.
The command most often given online for checking the cudnn version is:

cat /usr/local/cuda-11.2/include/cudnn.h | grep CUDNN_MAJOR -A 2

However, this command printed nothing, so I searched online for a solution. The solution that worked: https://blog.csdn.net/eaxy_z/article/details/108615548
The final query shows that the cudnn version is 8.1.0.
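One likely reason the grep on cudnn.h prints nothing is that cudnn 8 keeps the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL macros in a separate header, cudnn_version.h. This is my understanding of the cudnn 8 layout, not something stated in the post. The following is a minimal sketch that reads the version from whichever header defines it; `parse_cudnn_version` and `find_cudnn_version` are hypothetical helper names, and the include directory is the one used above.

```python
import re
from pathlib import Path


def parse_cudnn_version(header_text):
    """Extract 'major.minor.patch' from the #define lines of a cudnn header."""
    values = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+" + name + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None  # this header does not define the version macros
        values.append(match.group(1))
    return ".".join(values)


def find_cudnn_version(include_dir="/usr/local/cuda-11.2/include"):
    """Try cudnn_version.h first (cudnn 8), then cudnn.h (older releases)."""
    for filename in ("cudnn_version.h", "cudnn.h"):
        path = Path(include_dir) / filename
        if path.is_file():
            version = parse_cudnn_version(path.read_text(errors="ignore"))
            if version is not None:
                return version
    return None


if __name__ == "__main__":
    print("cudnn version:", find_cudnn_version())
```

The function returns `None` when neither header is present or neither defines the macros, which usually means the header files were not copied into the cuda include directory.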
At this point, all of the configuration is complete. You can now check the number of available GPUs again using the code in ② of section 1.1 (Problem description). The results are as follows (partial screenshot):
[Screenshot: partial output of the GPU check]

Hahahaha, the problem that had been bothering me for a long time is finally solved. The references I used are as follows:
① Content added to the .bashrc file when configuring cuda: https://blog.csdn.net/qq_16792139/article/details/113256279
② Overall installation process: https://zhuanlan.zhihu.com/p/72298520


Origin blog.csdn.net/LJ1120142576/article/details/121650605