Deep learning on Win10: setting up Python, TensorFlow GPU, CUDA and cuDNN


To save yourself detours, check version compatibility first. Compatibility! Compatibility! Compatibility!
Official table:
https://tensorflow.google.cn/install/source_windows#gpu

Version                  Python    Compiler              Build tools      cuDNN   CUDA
tensorflow_gpu-1.12.0    3.5-3.6   MSVC 2015 update 3    Bazel 0.15.0     7       9
tensorflow_gpu-1.11.0    3.5-3.6   MSVC 2015 update 3    Bazel 0.15.0     7       9
tensorflow_gpu-1.10.0    3.5-3.6   MSVC 2015 update 3    CMake v3.6.3     7       9
tensorflow_gpu-1.9.0     3.5-3.6   MSVC 2015 update 3    CMake v3.6.3     7       9
tensorflow_gpu-1.8.0     3.5-3.6   MSVC 2015 update 3    CMake v3.6.3     7       9
tensorflow_gpu-1.7.0     3.5-3.6   MSVC 2015 update 3    CMake v3.6.3     7       9
tensorflow_gpu-1.6.0     3.5-3.6   MSVC 2015 update 3    CMake v3.6.3     7       9
tensorflow_gpu-1.5.0     3.5-3.6   MSVC 2015 update 3    CMake v3.6.3     7       9
tensorflow_gpu-1.4.0     3.5-3.6   MSVC 2015 update 3    CMake v3.6.3     6       8
tensorflow_gpu-1.3.0     3.5-3.6   MSVC 2015 update 3    CMake v3.6.3     6       8
tensorflow_gpu-1.2.0     3.5-3.6   MSVC 2015 update 3    CMake v3.6.3     5.1     8
tensorflow_gpu-1.1.0     3.5       MSVC 2015 update 3    CMake v3.6.3     5.1     8
tensorflow_gpu-1.0.0     3.5       MSVC 2015 update 3    CMake v3.6.3     5.1     8

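This table matters. If the versions don't match when you set up the environment, you will get errors in every color of the rainbow. Note that the copy above stops at 1.12.0; newer releases such as 1.13, which this article installs with CUDA 10.0, are listed on the official page.

Even errors like "ImportError: DLL load failed: The specified module could not be found." can show up.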

Find the table for your operating system (the one above is for Windows), then pick the row based on what your graphics driver supports.
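If you script your setup, the table can be kept as a small lookup so a version mismatch is caught before anything is installed. A minimal Python sketch, assuming you only need the rows copied above; the names `TF_GPU_WINDOWS`, `requirements_for` and `python_supported` are made up for illustration:

```python
# Rows copied from the Windows table above: (tensorflow_gpu version, Python, cuDNN, CUDA).
_ROWS = [
    ("1.12.0", "3.5-3.6", "7", "9"),
    ("1.11.0", "3.5-3.6", "7", "9"),
    ("1.10.0", "3.5-3.6", "7", "9"),
    ("1.9.0", "3.5-3.6", "7", "9"),
    ("1.8.0", "3.5-3.6", "7", "9"),
    ("1.7.0", "3.5-3.6", "7", "9"),
    ("1.6.0", "3.5-3.6", "7", "9"),
    ("1.5.0", "3.5-3.6", "7", "9"),
    ("1.4.0", "3.5-3.6", "6", "8"),
    ("1.3.0", "3.5-3.6", "6", "8"),
    ("1.2.0", "3.5-3.6", "5.1", "8"),
    ("1.1.0", "3.5", "5.1", "8"),
    ("1.0.0", "3.5", "5.1", "8"),
]
TF_GPU_WINDOWS = {ver: (py, cudnn, cuda) for ver, py, cudnn, cuda in _ROWS}


def requirements_for(tf_version):
    """Return (python_range, cudnn, cuda) for a listed version, or None if the row is not copied here."""
    return TF_GPU_WINDOWS.get(tf_version)


def _as_tuple(version):
    # "3.6" -> (3, 6)
    return tuple(int(part) for part in version.split("."))


def python_supported(tf_version, major_minor):
    """True if a (major, minor) pair such as (3, 6) lies inside the listed Python range."""
    req = requirements_for(tf_version)
    if req is None:
        raise KeyError("tensorflow_gpu {} is not in the table".format(tf_version))
    lo, _, hi = req[0].partition("-")  # "3.5-3.6" -> "3.5", "3.6"; "3.5" -> "3.5", ""
    return _as_tuple(lo) <= tuple(major_minor) <= _as_tuple(hi or lo)
```

For example, `requirements_for("1.12.0")` gives `("3.5-3.6", "7", "9")`, while `requirements_for("1.13.1")` returns `None` because that row is not part of the copy above; for newer versions, consult the official page rather than extending this list from memory.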

Check whether your GPU supports CUDA here:
https://developer.nvidia.com/cuda-gpus


Check the driver version for your GPU here:
https://www.geforce.cn/drivers

For example:
GeForce Game Ready Driver - WHQL
Version: 419.35 - Release date: 2019-3-5

The driver version (419.35 here) tells you which CUDA version you can use.
Official documentation:
https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html#installcuda
Quote:
4. Installing cuDNN on Windows
4.1. Prerequisites
Ensure you meet the following requirements before you install cuDNN.
A GPU of compute capability 3.0 or higher. To understand the compute capability of the GPU on your system, see: CUDA GPUs. Also see the cuDNN Support Matrix.
One of the following supported platforms:
Windows 7
Windows 10
Windows Server 2012
One of the following supported CUDA versions and NVIDIA graphics driver:
NVIDIA graphics driver R410 or newer for CUDA 10.0
NVIDIA graphics driver R396 or newer for CUDA 9.2
NVIDIA graphics driver R384 or newer for CUDA 9
NVIDIA graphics driver R377 or newer for CUDA 8

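Driver 419.35 is newer than R410, so it can run CUDA 10.0, the newest version in the list above; that is the version to choose.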
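The quoted rule boils down to comparing the driver's major branch number with a per-CUDA minimum. A sketch that encodes only the four rows quoted above (newer CUDA releases have their own minimums, which are not listed here); `newest_cuda_for_driver` is a hypothetical helper:

```python
# Minimum driver branch per CUDA version, as quoted from the cuDNN install guide above.
# Ordered newest CUDA first so the first match is the newest usable version.
MIN_DRIVER_FOR_CUDA = [
    ("10.0", 410),
    ("9.2", 396),
    ("9", 384),
    ("8", 377),
]


def newest_cuda_for_driver(driver_version):
    """Return the newest CUDA version from the quoted list that a driver such as '419.35' satisfies."""
    branch = int(driver_version.split(".")[0])  # "419.35" -> 419
    for cuda, min_branch in MIN_DRIVER_FOR_CUDA:
        if branch >= min_branch:
            return cuda
    return None  # older than every branch in the quoted list
```

With the driver from this article, `newest_cuda_for_driver("419.35")` returns `"10.0"`.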


This post has charts showing the version correspondence:
https://blog.csdn.net/qq_27158179/article/details/82952021


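This example is a classic, and I like that its title spells out the whole combination, which happens to use CUDA 10.0:
https://www.cnblogs.com/sorex/p/7615185.html
Win10 x64 + CUDA 10.0 + cuDNN v7.5 + TensorFlow GPU 1.13 Installation Guide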

Python 3.6.x x64

Install a specific version of tensorflow-gpu:

pip install tensorflow-gpu==1.13.1
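Before running pip, it can help to confirm that the interpreter is 64-bit and in the expected version range, since the Windows tensorflow-gpu wheels are built for 64-bit Python only. A sketch assuming the 3.5 to 3.6 range from the table above; `interpreter_ok` is a hypothetical helper, and you should widen `hi` if the official page lists a broader range for your TensorFlow version:

```python
import struct
import sys


def interpreter_ok(version_info=None, pointer_bits=None, lo=(3, 5), hi=(3, 6)):
    """True if the interpreter is 64-bit and its (major, minor) lies in [lo, hi]."""
    if version_info is None:
        version_info = sys.version_info
    if pointer_bits is None:
        pointer_bits = struct.calcsize("P") * 8  # pointer size in bits: 64 on a 64-bit build
    return pointer_bits == 64 and lo <= tuple(version_info[:2]) <= hi


if __name__ == "__main__":
    status = "OK" if interpreter_ok() else "not 64-bit or outside the expected range"
    print("Python", sys.version.split()[0], status)
```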


NVIDIA driver 419.35

CUDA 10.0
Download: https://developer.nvidia.com/cuda-10.0-download-archive

Other versions are here:
https://developer.nvidia.com/cuda-toolkit-archive
Latest release (at the time of writing):
CUDA Toolkit 10.1 (Feb 2019)

Archived releases:

CUDA Toolkit 10.0 (Sept 2018)

CUDA Toolkit 9.2 (May 2018)
CUDA Toolkit 9.1 (Dec 2017)
CUDA Toolkit 9.0 (Sept 2017)
CUDA Toolkit 8.0 GA2 (Feb 2017)
CUDA Toolkit 8.0 GA1 (Sept 2016)
CUDA Toolkit 7.5 (Sept 2015)
CUDA Toolkit 7.0 (March 2015)
CUDA Toolkit 6.5 (August 2014)
CUDA Toolkit 6.0 (April 2014)
CUDA Toolkit 5.5 (July 2013)
CUDA Toolkit 5.0 (Oct 2012)
CUDA Toolkit 4.2 (April 2012)
CUDA Toolkit 4.1 (Jan 2012)
CUDA Toolkit 4.0 (May 2011)
CUDA Toolkit 3.2 (Nov 2010)
CUDA Toolkit 3.1 (June 2010)
CUDA Toolkit 3.0 (March 2010)
OpenCL 1.0 Release (Sept 2009)
CUDA Toolkit 2.3 (June 2009)
CUDA Toolkit 2.2 (May 2009)
CUDA Toolkit 2.1 (Jan 2009)
CUDA Toolkit 2.0 (Aug 2008)
CUDA Toolkit 1.1 (Dec 2007)
CUDA Toolkit 1.0 (June 2007)
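Several toolkit versions can be installed side by side (the test output later in this article comes from a v10.1 directory), so it is worth knowing which ones are present. As far as I know, the Windows installer records each toolkit in an environment variable named like `CUDA_PATH_V10_0`, next to `CUDA_PATH`; treat that naming as an assumption and check it in your own environment. A small sketch that lists them; `installed_cuda_versions` is a hypothetical helper:

```python
import os
import re


def installed_cuda_versions(environ=None):
    """List versions such as '10.0' from CUDA_PATH_Vx_y-style variables (assumed installer naming)."""
    if environ is None:
        environ = os.environ
    versions = []
    for key in environ:
        match = re.fullmatch(r"CUDA_PATH_V(\d+)_(\d+)", key.upper())
        if match:
            versions.append((int(match.group(1)), int(match.group(2))))
    # Sort numerically so 9.0 comes before 10.0.
    return ["{}.{}".format(major, minor) for major, minor in sorted(versions)]


if __name__ == "__main__":
    print("CUDA_PATH =", os.environ.get("CUDA_PATH"))
    print("installed:", installed_cuda_versions())
```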


cuDNN v7.5 for CUDA 10.0
Download: https://developer.nvidia.com/rdp/cudnn-download


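Extract the archive, then copy the contents of its cuda\bin, cuda\include and cuda\lib\x64 folders into the matching bin, include and lib\x64 folders under C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0, overwriting if asked.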
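After copying, a quick check that the three cuDNN pieces landed in the right folders. A sketch assuming the cuDNN 7.x file names (cudnn64_7.dll, cudnn.h, cudnn.lib) and the default v10.0 install path; `missing_cudnn_files` is a hypothetical helper:

```python
import os

# Assumed default CUDA 10.0 install location; change it if you installed elsewhere.
CUDA_ROOT = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0"

# Files a cuDNN 7.x archive contributes, relative to the CUDA toolkit directory.
CUDNN_FILES = [
    os.path.join("bin", "cudnn64_7.dll"),
    os.path.join("include", "cudnn.h"),
    os.path.join("lib", "x64", "cudnn.lib"),
]


def missing_cudnn_files(cuda_root=CUDA_ROOT):
    """Return the relative paths from CUDNN_FILES that do not exist under cuda_root."""
    return [rel for rel in CUDNN_FILES if not os.path.isfile(os.path.join(cuda_root, rel))]


if __name__ == "__main__":
    missing = missing_cudnn_files()
    print("cuDNN files in place" if not missing else "missing: {}".format(missing))
```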

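On Win10, nvidia-smi is located in C:\Program Files\NVIDIA Corporation\NVSMI.
Add this directory to the Path environment variable.
About environment variables,
this post is more detailed: https://blog.csdn.net/qilixuening/article/details/77503631
Quoting the original:
CUDA_SDK_PATH = C:\ProgramData\NVIDIA Corporation\CUDA Samples\v8.0 (this is the default install path; I customized mine to D:\NVIDIA\CUDA Samples)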
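Once the NVSMI directory is on Path, nvidia-smi can be queried from a script, which is handy for checking the driver version against the CUDA requirements above. The sketch uses nvidia-smi's `--query-gpu=driver_version --format=csv,noheader` options; `query_driver_version` and `parse_driver_version` are hypothetical helpers, and `universal_newlines` is used instead of `text=` so it also runs on Python 3.6:

```python
import shutil
import subprocess


def parse_driver_version(text):
    """Turn nvidia-smi CSV output such as '419.35\\n' into (419, 35); uses the first GPU's line."""
    first_line = text.strip().splitlines()[0]
    major, minor = first_line.strip().split(".")[:2]
    return int(major), int(minor)


def query_driver_version():
    """Run nvidia-smi if it is on Path; return None when it cannot be found."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None  # the NVSMI directory is probably not on Path yet
    result = subprocess.run(
        [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_driver_version(result.stdout)


if __name__ == "__main__":
    print("driver:", query_driver_version())
```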

CUDA_LIB_PATH = %CUDA_PATH%\lib\x64 

CUDA_BIN_PATH = %CUDA_PATH%\bin 

CUDA_SDK_BIN_PATH = %CUDA_SDK_PATH%\bin\win64 

CUDA_SDK_LIB_PATH = %CUDA_SDK_PATH%\common\lib\x64


Then:

Append the following to the end of the system variable PATH:

%CUDA_LIB_PATH%;%CUDA_BIN_PATH%;%CUDA_SDK_LIB_PATH%;%CUDA_SDK_BIN_PATH%; 

Then add these 4 entries (default install paths):
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64;
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin; 
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v8.0\common\lib\x64; 
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v8.0\bin\win64; 

Don't copy and paste these verbatim: replace every path with your own install path (for example v10.0 instead of v8.0).
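To double-check the result, a script can compare the Path you actually get against the entries you meant to add. A sketch assuming a default CUDA 10.0 layout (the `REQUIRED` list is an example, adjust it to your own paths); `missing_path_entries` is a hypothetical helper that compares Windows paths case-insensitively and ignores trailing backslashes. Open a new cmd window before running it, since already-open windows keep the old Path:

```python
import ntpath
import os

# Example entries for a default CUDA 10.0 install; replace with your own paths.
REQUIRED = [
    r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin",
    r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\lib\x64",
    r"C:\Program Files\NVIDIA Corporation\NVSMI",
]


def _norm(path):
    # Windows compares paths case-insensitively; also drop quotes and trailing backslashes.
    return ntpath.normcase(ntpath.normpath(path.strip().strip('"')))


def missing_path_entries(required, path_value=None):
    """Return the entries of `required` that do not appear in a ';'-separated Path string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    present = {_norm(p) for p in path_value.split(";") if p.strip()}
    return [entry for entry in required if _norm(entry) not in present]


if __name__ == "__main__":
    print("missing from Path:", missing_path_entries(REQUIRED))
```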

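Verify the installation
Quoting the original:
Once everything is configured, you can verify that it worked, mainly with CUDA's built-in deviceQuery.exe and bandwidthTest.exe:
First press Win+R to start cmd, cd into ...\extras\demo_suite under the installation directory, then run bandwidthTest.exe and deviceQuery.exe. You should get results like the following: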


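If both tools report Result = PASS, the setup succeeded.

My own output follows. Note that these runs used the demo_suite of a CUDA 10.1 install (see the v10.1 path), and that the "CUDA Version: 10.1" shown by nvidia-smi is the highest CUDA version the driver supports, not the toolkit that TensorFlow loads.

C:\Users\hasee>nvidia-smi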
Sun Mar 10 22:01:06 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 419.35       Driver Version: 419.35       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105... WDDM  | 00000000:01:00.0  On |                  N/A |
| N/A   43C    P8    N/A /  N/A |    244MiB /  4096MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1148    C+G   Insufficient Permissions                   N/A      |
|    0     12644    C+G   ...hell.Experiences.TextInput.InputApp.exe N/A      |
+-----------------------------------------------------------------------------+

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\extras\demo_suite>bandwidthTest.exe
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce GTX 1050 Ti
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     6351.0

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     6453.6

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     94710.5

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\extras\demo_suite>deviceQuery.exe
deviceQuery.exe Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1050 Ti"
  CUDA Driver Version / Runtime Version          10.1 / 10.1
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 4096 MBytes (4294967296 bytes)
  ( 6) Multiprocessors, (128) CUDA Cores/MP:     768 CUDA Cores
  GPU Max Clock rate:                            1620 MHz (1.62 GHz)
  Memory Clock rate:                             3504 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 1048576 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               zu bytes
  Total amount of shared memory per block:       zu bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          zu bytes
  Texture alignment:                             zu bytes
  Concurrent copy and kernel execution:          Yes with 5 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.1, NumDevs = 1, Device0 = GeForce GTX 1050 Ti
Result = PASS
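The PASS check can also be scripted, for example to rerun it after a driver update. A sketch that runs a demo_suite tool and looks for a line reading exactly `Result = PASS`; the default v10.0 directory is an assumption (the output above came from v10.1), and `run_demo` and `result_passed` are hypothetical helpers:

```python
import os
import subprocess

# Assumed default location; point this at the demo_suite of the toolkit you want to test.
DEMO_SUITE = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\extras\demo_suite"


def result_passed(output):
    """True if the tool printed a line reading exactly 'Result = PASS'."""
    return any(line.strip() == "Result = PASS" for line in output.splitlines())


def run_demo(tool, demo_dir=DEMO_SUITE):
    """Run bandwidthTest.exe or deviceQuery.exe from demo_dir and report whether it passed."""
    proc = subprocess.run(
        [os.path.join(demo_dir, tool)],
        cwd=demo_dir,
        stdin=subprocess.DEVNULL,  # never block waiting for a keypress
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result_passed(proc.stdout)
```

Usage: `for tool in ("bandwidthTest.exe", "deviceQuery.exe"): print(tool, run_demo(tool))`.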

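After installation, open a Python prompt and enter the following (tf.Session is the TensorFlow 1.x API, which 1.13 uses):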


>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello))

The output looks something like this:
2019-03-11 23:14:31.677862: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-03-11 23:14:32.091092: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
totalMemory: 4.00GiB freeMemory: 3.30GiB
2019-03-11 23:14:32.129912: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-11 23:14:34.344642: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-11 23:14:34.358376: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2019-03-11 23:14:34.367380: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2019-03-11 23:14:34.409061: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3004 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
b'Hello, TensorFlow!'
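The "Created TensorFlow device ... GPU:0" line in the log is what confirms the GPU is used. To check this without reading logs, TensorFlow 1.x ships a device listing in `tensorflow.python.client.device_lib` (an internal module, but widely used for exactly this). A sketch; `local_gpu_devices` and `gpu_device_names` are hypothetical helpers, and TensorFlow is imported lazily so the filtering logic can be tried without it:

```python
def gpu_device_names(devices):
    """Names of the GPU entries in a list of device descriptions (objects with .name and .device_type)."""
    return [d.name for d in devices if d.device_type == "GPU"]


def local_gpu_devices():
    """Ask TensorFlow 1.x which devices it can see and keep only the GPUs."""
    # Imported here so this file loads even where TensorFlow is not installed.
    from tensorflow.python.client import device_lib

    return gpu_device_names(device_lib.list_local_devices())
```

On a working setup, `print(local_gpu_devices())` should list one GPU entry; an empty list means TensorFlow fell back to the CPU.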


At this point you can enjoy the speed of tensorflow-gpu.

Other references
https://blog.csdn.net/weixin_39290638/article/details/80045236
Installing TensorFlow (GPU) on Win10: a record of the pitfalls
https://blog.csdn.net/qq_36124802/article/details/79675485
The countless pitfalls of tensorflow-gpu! (a big summary of TF pitfalls)
https://keras-cn.readthedocs.io/en/latest/for_beginners/keras_windows/
Keras Chinese documentation: For beginners » Keras windows
https://blog.csdn.net/qq_33186949/article/details/79104659
TensorFlow: GPU and CPU

Reposted from blog.csdn.net/qq_38288618/article/details/88541852