How to fix TensorFlow failing to use the GPU
1. Current environment and problems
1.1 Problem description
The main problem is that the GPU is never used for computation during deep learning; the program always runs on the CPU. With large amounts of data this makes the time cost very high. The symptoms are as follows (all code is run in a Python environment):
① TensorFlow version check
import tensorflow as tf
print(tf.__version__)  # 2.6.0
The following problems are reported when importing tensorflow:
② Available GPU device check
import tensorflow as tf
from tensorflow.python.client import device_lib
# Print the number of available GPUs
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
# Query the local devices (CPU and GPU)
print(device_lib.list_local_devices())
The results are as follows:
The number of usable GPUs is reported as 0 (the actual number of usable GPUs is 2). No GPU device is detected, only the CPU. Based on this, the current assumption is that the GPU environment is not configured correctly and that the CUDA/cuDNN versions do not match the TensorFlow version.
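One way to narrow this down without importing TensorFlow is to check whether the CUDA shared libraries can be opened by the dynamic loader at all. The following is only a minimal sketch: the helper name find_missing_libs is hypothetical, and the library names are an assumption about what a CUDA 11.x / cuDNN 8 build looks for, so they may need adjusting for other builds.

```python
import ctypes

# Assumed sonames for a CUDA 11.x / cuDNN 8 build; adjust for other versions.
EXPECTED_LIBS = ["libcudart.so.11.0", "libcublas.so.11", "libcudnn.so.8"]


def find_missing_libs(names=EXPECTED_LIBS, loader=ctypes.CDLL):
    """Return the library names that the dynamic loader cannot open."""
    missing = []
    for name in names:
        try:
            loader(name)  # raises OSError if the library is not found
        except OSError:
            missing.append(name)
    return missing
```

Calling find_missing_libs() with no arguments tries the real libraries on the current machine; any name it returns is not reachable through the loader search path (for example LD_LIBRARY_PATH).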
1.2 Description of operating environment, etc.
① Operating system: Linux, Ubuntu 18.04 (the query method is shown below)
② System architecture: amd64 (x86_64) (the query method is shown below)
③ python=3.6.9
④ tensorflow-gpu=2.6.0
⑤ cuda=9.1.85 (the query method is shown in the figure below)
Pay attention to the difference between the two query commands: https://blog.csdn.net/hb_learning/article/details/115534219
⑥ No cuDNN installed (the query method is shown below)
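The first three items can also be collected from Python itself. This is a small sketch using only the standard library; the function name describe_environment is hypothetical, and it does not cover the Ubuntu release, CUDA, or cuDNN, which still have to be queried separately.

```python
import platform


def describe_environment():
    """Collect OS, architecture and Python version with the standard library."""
    return {
        "os": platform.system(),             # e.g. "Linux"
        "architecture": platform.machine(),  # e.g. "x86_64"
        "python": platform.python_version(),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```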
2. Solutions to this problem
① Look up the CUDA and cuDNN versions that correspond to the TensorFlow version: https://tensorflow.google.cn/install/source?hl=en
According to that table, tensorflow 2.6.0 corresponds to cuDNN 8.1 and CUDA 11.2.
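The comparison against the table can be written down as a small check. This is a hedged sketch, not an official tool: the names TESTED_CONFIGS and check_versions are hypothetical, only the 2.6.0 row is quoted from the text above, and the other two rows are included for illustration and should be re-checked against the linked table. It compares major.minor only and treats anything else as a mismatch, which is stricter than what may work in practice.

```python
# Subset of the tested build configurations (Linux, GPU).
# Only the 2.6.0 row comes from this article; verify the others yourself.
TESTED_CONFIGS = {
    "2.6.0": {"cuda": "11.2", "cudnn": "8.1"},
    "2.5.0": {"cuda": "11.2", "cudnn": "8.1"},
    "2.4.0": {"cuda": "11.0", "cudnn": "8.0"},
}


def major_minor(version):
    """Reduce '11.2.152' to '11.2' so patch levels are ignored."""
    return ".".join(version.split(".")[:2])


def check_versions(tf_version, cuda_version, cudnn_version):
    """Return a list of mismatch descriptions (empty if everything matches)."""
    wanted = TESTED_CONFIGS[tf_version]
    problems = []
    if cuda_version is None or major_minor(cuda_version) != wanted["cuda"]:
        problems.append(f"CUDA {cuda_version} found, {wanted['cuda']} expected")
    if cudnn_version is None or major_minor(cudnn_version) != wanted["cudnn"]:
        problems.append(f"cuDNN {cudnn_version} found, {wanted['cudnn']} expected")
    return problems


# The environment from section 1.2: CUDA 9.1.85 and no cuDNN.
print(check_versions("2.6.0", "9.1.85", None))
```

For the environment in section 1.2 both entries are flagged, which matches the diagnosis that CUDA has to be upgraded and cuDNN installed.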
② Upgrade the CUDA version.
First download the installer for the matching CUDA version. Download address: https://developer.nvidia.com/cuda-11.2.0-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1804&target_type=runfilelocal
You can follow the installation tutorial on the official website directly: enter the listed commands in the terminal to install.
Because I am running in a Docker container and do not have root permissions, I leave out sudo when typing the commands and enter the rest directly.
③ Configure CUDA
a. Open the .bashrc file
vim ~/.bashrc
b. Add the following lines at the end of the file
export PATH=/usr/local/cuda-11.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64:$LD_LIBRARY_PATH
c. Finally, run the following command so that the changes to .bashrc take effect in the current shell.
source ~/.bashrc
d. Use the nvcc -V command to check the CUDA version. It now reports 11.2.152, as shown in the figure.
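Steps b to d can be double-checked from Python. The sketch below is an illustration under assumptions: the helper names missing_path_entries and parse_nvcc_release are hypothetical, CUDA_HOME is the install prefix used in the exports above, and the regular expression assumes the usual "release X.Y, VX.Y.Z" line in the nvcc -V output.

```python
import os
import re

CUDA_HOME = "/usr/local/cuda-11.2"  # install prefix used in the exports above


def missing_path_entries(env=os.environ, cuda_home=CUDA_HOME):
    """Return the variables that lack the CUDA directory added in .bashrc."""
    expected = {
        "PATH": os.path.join(cuda_home, "bin"),
        "LD_LIBRARY_PATH": os.path.join(cuda_home, "lib64"),
    }
    missing = []
    for var, directory in expected.items():
        entries = env.get(var, "").split(os.pathsep)
        if directory not in entries:
            missing.append(var)
    return missing


def parse_nvcc_release(output):
    """Extract the full version (e.g. '11.2.152') from `nvcc -V` output."""
    match = re.search(r"release\s+[\d.]+,\s+V([\d.]+)", output)
    return match.group(1) if match else None
```

On the Python 3.6 used here, the nvcc output can be captured with subprocess.check_output(["nvcc", "-V"], universal_newlines=True) and passed to parse_nvcc_release. An empty list from missing_path_entries() means both exports are visible to the current process.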
④ Install cuDNN
a. First download the matching cuDNN installation package from: https://developer.nvidia.com/rdp/cudnn-archive
b. Change to the directory containing the archive and extract it
cd <path to the directory containing the archive>
tar zxvf cudnn-11.2-linux-x64-v8.1.0.77.tgz
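c. Run the cp and chmod commands to install the cuDNN files. cuDNN 8 splits its headers across several cudnn*.h files (including cudnn_version.h), so copy all of them rather than only cudnn.h; cp -P keeps the library symlinks intact.
sudo cp cuda/include/cudnn*.h /usr/local/cuda-11.2/include
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda-11.2/lib64
sudo chmod a+r /usr/local/cuda-11.2/include/cudnn*.h
sudo chmod a+r /usr/local/cuda-11.2/lib64/libcudnn*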
d. Check the cuDNN version.
The command commonly given online for checking the cuDNN version is:
cat /usr/local/cuda-11.2/include/cudnn.h | grep CUDNN_MAJOR -A 2
However, this command printed nothing. From cuDNN 8 onward the version macros are defined in cudnn_version.h rather than cudnn.h, so the query has to target that file. The solution that worked for me is described here: https://blog.csdn.net/eaxy_z/article/details/108615548
With it, the cuDNN version is reported as 8.1.0.
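The same query can be done in Python by reading the version macros from the header text. This is a minimal sketch: the function name parse_cudnn_version is hypothetical, and the sample text only imitates the three #define lines; on a real system the input would be the contents of the cuDNN version header (assumed here to be /usr/local/cuda-11.2/include/cudnn_version.h, which exists only if that header was copied).

```python
import re


def parse_cudnn_version(header_text):
    """Build 'MAJOR.MINOR.PATCHLEVEL' from the #define lines of a cuDNN header."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"^\s*#define\s+{macro}\s+(\d+)", header_text, re.MULTILINE)
        if match is None:
            return None  # macro absent, e.g. when reading cudnn.h from cuDNN 8
        parts.append(match.group(1))
    return ".".join(parts)


# Illustrative input that imitates the relevant lines of the header.
sample = """
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 0
"""
print(parse_cudnn_version(sample))  # 8.1.0
```

A return value of None reproduces the "no output" case above: the file that was read does not define the version macros.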
At this point the configuration is complete. Re-run the check from ② in section 1.1 to see the number of available GPUs. The result is shown below (partial screenshot):
Hahahaha, the problem that had been bothering me for a long time is finally solved. The references I used are as follows:
① Lines added to the .bashrc file when configuring CUDA: https://blog.csdn.net/qq_16792139/article/details/113256279
② Overall installation process: https://zhuanlan.zhihu.com/p/72298520