Common tensorflow-gpu installation problems and solutions


I often run into problems when installing tensorflow-gpu. I have installed it several times and keep hitting the same or similar errors, so I am writing them down here in the hope that it helps others.

Basic Information

  • tensorflow-gpu
  • pip installation (virtualenv and similar tools are still pip installations; they just build an isolated environment that does not affect the system environment. That makes problems easier to track down, and at worst you can recreate a clean environment and start over)

After installing, I run import tensorflow to check whether the installation succeeded, and it fails with an error. I have mainly run into the two kinds of error message described below.
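Before going through them, here is a minimal sketch of that import check as a script, so the full error text is captured in one place. The helper name check_import is my own and not part of TensorFlow.

```python
import importlib


def check_import(module_name):
    """Try to import a module; return (True, version) or (False, error text)."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # "DLL load failed" on Windows is raised as an ImportError as well.
        return False, "{}: {}".format(type(exc).__name__, exc)
    return True, getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    ok, detail = check_import("tensorflow")
    print("tensorflow import", "OK" if ok else "FAILED", "-", detail)
```

Run it inside the same virtual environment that tensorflow-gpu was installed into; otherwise it checks a different Python installation.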

1. ImportError: DLL load failed: The specified module could not be found (pywrap_tensorflow.py)

The traceback mentions many pywrap_xxx scripts:

Traceback (most recent call last):
  File "E:\study\machinelearning\ENV\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "E:\study\machinelearning\ENV\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "E:\study\machinelearning\ENV\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "E:\study\machinelearning\ENV\lib\imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "E:\study\machinelearning\ENV\lib\imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: DLL load failed: The specified module could not be found.

This is the error I see most often. It has several main causes:

(1) Microsoft Visual C++ 2015 Redistributable Update 3 is not installed

I ran into this the first time I installed TensorFlow. Downloading and installing vc_redist.x64.exe fixed it.

  • A further twist
    When I reinstalled today, I downloaded the package again and found that it would not install. According to the log, the VS version on my machine is newer, so the installer refuses to run. In that case, check whether the file MSVCP140.DLL exists under System32 on your machine.

  • Other solutions
    Some users report that with a newer TensorFlow it is enough to install the 2017 Redistributable package, so that is also worth trying.

Even after I installed the 2017 package and confirmed that MSVCP140.DLL was already on my system, I still got the same error.
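The MSVCP140.DLL check can be done by hand in Explorer, but a small script saves some clicking. This is only a sketch: find_dll is a name I made up, and it assumes the usual Windows layout with System32 under %SystemRoot%. The PATH folders are included as extra places to look, not as an exhaustive search.

```python
import os


def find_dll(dll_name, search_dirs):
    """Return the full paths in search_dirs where dll_name exists."""
    hits = []
    for directory in search_dirs:
        candidate = os.path.join(directory, dll_name)
        if os.path.isfile(candidate):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    # Assumption: a standard Windows layout, System32 under %SystemRoot%.
    system_root = os.environ.get("SystemRoot", r"C:\Windows")
    dirs = [os.path.join(system_root, "System32")]
    # Extra places to look: every folder listed in PATH.
    dirs += [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    found = find_dll("MSVCP140.DLL", dirs)
    print("\n".join(found) if found else "MSVCP140.DLL not found")
```

As my own experience above shows, finding the file does not guarantee that the import works; it only rules out this one cause.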

(2) The versions of cuda and cudnn are inconsistent

This problem is also very common. I have installed CUDA many times and the installation itself has basically never failed, but I did run into a mismatch with the cuDNN version. The CUDA download defaults to the latest version, CUDA 10.0, while the cuDNN I had downloaded was an older one built for CUDA 9.0. I later changed them to match, which solved the problem.

  • CUDA download
    After I selected my system configuration (Windows 10 x64), the download page offered the latest CUDA 10 build for Windows 10 by default. Click Legacy Releases on the far right to see earlier versions.

  • CUDA installation and verification
    Clicking Next all the way through the installer has not caused me any problems.
    Verification: run nvcc -V on the command line and check that it prints the version.
    In addition, you can run the two sample programs, deviceQuery.exe and bandwidthTest.exe.

  • cuDNN download
    You need to log in with an NVIDIA developer account.
    Click Archived cuDNN Releases at the bottom of the page to see more versions. Because I had installed CUDA 9.0, to be safe I downloaded: Download cuDNN v7.0.5 (Dec 5, 2017), for CUDA 9.0.
    By the same reasoning, Download cuDNN v7.5.0 (Feb 21, 2019), for CUDA 9.0 should also work; I will confirm that next time.

  • cuDNN installation
    On the download page you can open the Installation Guide to read the cuDNN installation instructions for Windows. The main steps are:
    (1) Copy the files in the bin, lib and include folders of the unzipped cuDNN package into the folders with the same names under the CUDA installation directory.
    CUDA path: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0
    (2) Add the CUDA path to the CUDA_PATH environment variable.
    The CUDA installer already adds its own paths to the Path environment variable (note: they are placed at the front of Path and are easy to overlook), so you do not have to add the CUDA path to Path yourself.
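The copy in step (1) can be sketched as a short script. This is my own illustration, not something from the installation guide: copy_cudnn is a made-up helper, and both paths in the usage part are examples that you must adjust. It defaults to a dry run that only lists what would be copied.

```python
import os
import shutil


def copy_cudnn(cudnn_root, cuda_root, dry_run=True):
    """Copy files from cudnn_root/{bin,lib,include} into the same-named
    folders under cuda_root, keeping sub-folders such as lib/x64."""
    copied = []
    for folder in ("bin", "lib", "include"):
        src_folder = os.path.join(cudnn_root, folder)
        for current, _dirs, files in os.walk(src_folder):
            rel = os.path.relpath(current, cudnn_root)
            dst_folder = os.path.join(cuda_root, rel)
            for name in files:
                if not dry_run:
                    os.makedirs(dst_folder, exist_ok=True)
                    shutil.copy2(os.path.join(current, name), dst_folder)
                copied.append(os.path.join(rel, name))
    return sorted(copied)


if __name__ == "__main__":
    # Example paths only; adjust both to your machine.
    cudnn_dir = r"D:\cuda"
    cuda_dir = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0"
    for path in copy_cudnn(cudnn_dir, cuda_dir, dry_run=True):
        print("would copy", path)
```

Writing under C:\Program Files normally needs an administrator prompt, so run the real copy (dry_run=False) from an elevated command line.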

I put the unzipped cuDNN on the D drive, for example at D:\cuda, and then added D:\cuda\bin to Path, because some people online suggest this. The cuDNN installation guide does not mention it, though, so I do not think it is needed.

Unfortunately, even after making sure these versions matched, today I still got the same error.
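To double-check which CUDA and cuDNN versions are actually installed, the two version checks can be scripted roughly as follows. This is a sketch under assumptions: it relies on nvcc -V printing a "release X.Y" string, and on cuDNN 7 keeping its version in the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL defines of include\cudnn.h under CUDA_PATH. The function names are mine.

```python
import os
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract the CUDA release (e.g. '9.0') from `nvcc -V` output."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def parse_cudnn_version(header_text):
    """Extract 'major.minor.patch' from the #define lines of cudnn.h."""
    parts = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+" + key + r"\s+(\d+)", header_text)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


if __name__ == "__main__":
    if shutil.which("nvcc") is None:
        print("nvcc is not on PATH")
    else:
        out = subprocess.run(["nvcc", "-V"], stdout=subprocess.PIPE,
                             universal_newlines=True).stdout
        print("CUDA release:", parse_nvcc_release(out))

    # Assumption: cudnn.h was copied into the include folder under CUDA_PATH.
    header = os.path.join(os.environ.get("CUDA_PATH", ""), "include", "cudnn.h")
    if os.path.isfile(header):
        with open(header, errors="ignore") as fh:
            print("cuDNN version:", parse_cudnn_version(fh.read()))
    else:
        print("cudnn.h not found under CUDA_PATH")
```

If the two printed versions do not correspond to a pair offered on the cuDNN download page (for example cuDNN v7.0.5 for CUDA 9.0), that mismatch is the first thing to fix.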

(3) The version of tensorflow-gpu is inconsistent

When installing tensorflow-gpu, the default command is usually:

pip install --upgrade tensorflow-gpu

This installs the latest version of tensorflow-gpu. My versions were:
(1) python: 3.6.0
(2) cuda-9.0
(3) cudnn-7.0
(4) tensorflow-gpu-1.13.0

The latest CUDA is 10.0, but I had installed 9.0, so I downgraded tensorflow-gpu to 1.12.0, which solved the problem completely.

pip uninstall tensorflow-gpu==1.13.0
pip install tensorflow-gpu==1.12.0

This suggests that tensorflow-gpu 1.13.0 relies on the latest CUDA version, so this too can be regarded as a version mismatch.

If, like me, you have ruled out the problems above, check whether your tensorflow-gpu version is too new or too old. One aside: at first I accidentally typed 1.2.0 instead of 1.12.0, and of course it still did not work. Not reading the output carefully cost me time.
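A small lookup table makes this kind of mismatch easier to spot before installing. The sketch below covers only the three releases mentioned in this post. The CUDA versions are what I understand the prebuilt packages to target (1.13 with CUDA 10.0, 1.12 with CUDA 9.0, 1.2 with CUDA 8.0), so treat them as assumptions and confirm against the official tested-configurations list. The names TF_GPU_CUDA, required_cuda and is_compatible are my own.

```python
# Prebuilt tensorflow-gpu releases and the CUDA version each is built for.
# Assumption: only the releases discussed in this post are listed; extend
# the table from the official tested-configurations list before relying on it.
TF_GPU_CUDA = {
    "1.13": "10.0",
    "1.12": "9.0",
    "1.2": "8.0",
}


def required_cuda(tf_version):
    """Return the CUDA version for a tensorflow-gpu version, or None."""
    major_minor = ".".join(tf_version.split(".")[:2])
    return TF_GPU_CUDA.get(major_minor)


def is_compatible(tf_version, cuda_version):
    """True/False if the pairing is known, None if the release is unlisted."""
    needed = required_cuda(tf_version)
    return None if needed is None else needed == cuda_version


if __name__ == "__main__":
    for tf_version in ("1.13.0", "1.12.0", "1.2.0"):
        print(tf_version, "with CUDA 9.0:", is_compatible(tf_version, "9.0"))
```

With CUDA 9.0 installed, only 1.12.0 comes out as compatible, which matches what I saw: 1.13.0 failed, and the mistyped 1.2.0 failed as well.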

(4) Other python library version issues, etc.

Some people online have also run into version problems with numpy and other Python libraries. I did not, because installing tensorflow-gpu downloads all of its dependencies.

2. TensorFlow pip installation issue: cannot import name 'descriptor' (graph_pb2.py)

The error output is as follows. This time the failing scripts are the graph_xxx ones:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "F:\study\machinelearning\ENV\lib\site-packages\tensorflow\__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
  File "F:\study\machinelearning\ENV\lib\site-packages\tensorflow\python\__init__.py", line 59, in <module>
    from tensorflow.core.framework.graph_pb2 import *
  File "F:\study\machinelearning\ENV\lib\site-packages\tensorflow\core\framework\graph_pb2.py", line 6, in <module>
    from google.protobuf import descriptor as _descriptor
  File "F:\study\machinelearning\ENV\lib\site-packages\google\protobuf\descriptor.py", line 47, in <module>
    from google.protobuf.pyext import _message
ImportError: DLL load failed: The specified procedure could not be found.

I have run into this twice. The cause, which I also found online, is that the protobuf version is too new. I downgraded protobuf from 3.6.1 to 3.6.0:

pip list
pip uninstall protobuf
pip install protobuf==3.6.0
pip list
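The installed protobuf version can also be checked from Python instead of reading through pip list. This is a sketch, not the only way to do it: it uses importlib.metadata, which needs Python 3.8 or newer (on the Python 3.6 I used, pip show protobuf gives the same information). The 3.6.0 limit is simply the version that worked for me, and the helper names are my own.

```python
from importlib import metadata  # standard library from Python 3.8 on


def installed_version(dist_name):
    """Return the installed version string of a package, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """'3.6.1' -> (3, 6, 1); non-numeric parts are ignored."""
    return tuple(int(p) for p in version.split(".") if p.isdigit())


def is_newer_than(version, limit="3.6.0"):
    """True if version is strictly newer than limit."""
    return version_tuple(version) > version_tuple(limit)


if __name__ == "__main__":
    current = installed_version("protobuf")
    if current is None:
        print("protobuf is not installed")
    elif is_newer_than(current):
        print("protobuf", current, "is newer than 3.6.0; consider downgrading")
    else:
        print("protobuf", current)
```

Comparing tuples rather than raw strings matters here, because as plain text "3.10.0" would sort before "3.6.0".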

Origin blog.csdn.net/pkxpp/article/details/88925868