Installing a GPU Environment on Windows: CUDA and the Deep Learning Frameworks TensorFlow and PyTorch

1. TensorFlow errors when CUDA is not installed

import tensorflow as tf

2022-03-06 15:14:38.869955: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-03-06 15:14:38.870236: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
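
The warning above means the dynamic loader could not find the CUDA runtime DLL. As a minimal sketch, the same lookup can be tried without TensorFlow through Python's standard ctypes module. The helper name can_load is hypothetical, and the DLL name is the one from the log. Note that on Windows with Python 3.8 or later, ctypes may not search PATH by default, so a False result there is not conclusive.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can find and open `library_name`."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Raised when the library is missing or cannot be opened.
        return False


if __name__ == "__main__":
    # DLL name copied from the TensorFlow warning above.
    print("cudart64_110.dll loadable:", can_load("cudart64_110.dll"))
```

If this prints False in the same environment where TensorFlow runs, the CUDA runtime is most likely not installed or not on the search path.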

2. Introduction to CUDA

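First install the GPU environment, which consists of CUDA and cuDNN.

Deep learning essentially comes down to training deep neural networks, deep convolutional networks being a typical example.

CUDA: lets the graphics card carry out parallel computing tasks. Its operations are fairly low-level and complex.

cuDNN: an SDK library that sits on top of CUDA and is dedicated to deep neural networks. It accelerates specific deep learning operations and serves as the GPU acceleration library for deep neural networks, with an emphasis on performance, ease of use, and low memory overhead. NVIDIA cuDNN can be integrated into higher-level machine learning frameworks such as Caffe, TensorFlow, PyTorch, and MXNet. Its simple drop-in design lets developers focus on designing and implementing neural network models rather than on performance tuning, while still getting high-performance parallel computing on the GPU.
In short, CUDA is a set of low-level GPU libraries that define parallel computation on the graphics card, while cuDNN is a higher-level GPU library built on CUDA and tailored specifically to deep learning.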

The version matched in this article is CUDA 11.0.

3. Installing CUDA

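According to NVIDIA, CUDA cores now support concurrent execution of floating-point and integer operations, which improves performance on the compute-intensive workloads of modern games.
Look up which CUDA version matches your TensorFlow version: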
tensorflow_gpu-2.4.0
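
The version lookup can be kept as a small table in code. The sketch below is illustrative only: the 2.4.0 entry reflects the pairing used in this article (CUDA 11.0, with cuDNN 8 as suggested by the cudnn64_8.dll name), while the 2.3.0 and 2.5.0 entries are not from this article and should be verified against TensorFlow's published tested build configurations. The names TESTED_CONFIGS and required_cuda are hypothetical.

```python
# TensorFlow GPU version -> CUDA / cuDNN versions it was tested against.
# Only the 2.4.0 row comes from this article; check the others against
# the official TensorFlow compatibility table before relying on them.
TESTED_CONFIGS = {
    "2.3.0": {"cuda": "10.1", "cudnn": "7.6"},
    "2.4.0": {"cuda": "11.0", "cudnn": "8.0"},
    "2.5.0": {"cuda": "11.2", "cudnn": "8.1"},
}


def required_cuda(tf_version):
    """Return (cuda_version, cudnn_version) for a known TensorFlow version."""
    config = TESTED_CONFIGS.get(tf_version)
    if config is None:
        raise KeyError(f"No tested configuration recorded for {tf_version}")
    return config["cuda"], config["cudnn"]


if __name__ == "__main__":
    print(required_cuda("2.4.0"))
```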

3.1 Downloading CUDA

https://developer.nvidia.com/cuda-downloads
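Select Windows, then open cmd to check your Windows version.

After installation, run nvcc -V in the Anaconda prompt to test it.

For a tutorial on using Anaconda, see the earlier article:
Getting Started with Using and Configuring Anaconda for Python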

nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Thu_Feb_10_19:03:51_Pacific_Standard_Time_2022
Cuda compilation tools, release 11.6, V11.6.112
Build cuda_11.6.r11.6/compiler.30978841_0
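
The installed toolkit version can also be read from this output programmatically. The following is a minimal sketch that extracts the release number with a regular expression. The function name parse_nvcc_release is hypothetical, and SAMPLE_OUTPUT is an abridged copy of the output above.

```python
import re

# Abridged from the nvcc -V output shown above.
SAMPLE_OUTPUT = """nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 11.6, V11.6.112
Build cuda_11.6.r11.6/compiler.30978841_0
"""


def parse_nvcc_release(text):
    """Return (major, minor) from `nvcc -V` output, or None if not found."""
    match = re.search(r"release\s+(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    print(parse_nvcc_release(SAMPLE_OUTPUT))
```

In practice the text could come from subprocess.run(["nvcc", "-V"], capture_output=True, text=True).stdout, assuming nvcc is on the PATH.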

CUDA download link: https://developer.nvidia.com/cuda-toolkit-archive
cuDNN download link: https://developer.nvidia.com/rdp/cudnn-download

3.2 Installing TensorFlow with CUDA

Activate the Anaconda virtual environment

conda activate tfenv_py37

conda install tensorflow-gpu

Python 3.7.4 (default, Aug  9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.

import tensorflow as tf

2022-03-06 16:21:03.223773: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll

Cannot dlopen some GPU libraries.Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...

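The latest CUDA release was installed (nvcc reports 11.6 above), which does not match the version this TensorFlow build expects. Choose the CUDA version that both your graphics card supports and your TensorFlow version requires, which is 11.0 in this article.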
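
The mismatch can be expressed as a simple version comparison. This sketch assumes a strict major.minor match, which is a conservative rule chosen for illustration and not something TensorFlow itself enforces. The function name cuda_matches is hypothetical.

```python
def cuda_matches(installed, required):
    """Return True when two 'major.minor' version strings are equal."""

    def major_minor(version):
        parts = version.split(".")
        return int(parts[0]), int(parts[1])

    return major_minor(installed) == major_minor(required)


if __name__ == "__main__":
    # 11.6 is the toolkit reported by nvcc; 11.0 is the version this article targets.
    print(cuda_matches("11.6", "11.0"))
```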

3.3 Testing TensorFlow

import tensorflow as tf

a = tf.constant(2)

Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found

Download link: https://developer.nvidia.com/rdp/cudnn-download

3.4 Installing cuDNN

Download link: https://developer.nvidia.com/rdp/cudnn-archive

Extract the archive.

Copy the extracted files into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0
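
The copy step can be scripted instead of done by hand. The sketch below merges one directory tree into another while preserving the relative folder layout, under the assumption that the extracted cuDNN archive contains subfolders meant to be merged into the CUDA directory. The function name merge_tree is hypothetical, and the source path in the usage note is a placeholder, not a path from this article.

```python
import os
import shutil


def merge_tree(src_root, dst_root):
    """Copy every file under src_root to the same relative path under dst_root.

    Existing folders are reused and files with the same name are overwritten.
    Returns the list of destination file paths that were written.
    """
    copied = []
    for folder, _subdirs, files in os.walk(src_root):
        relative = os.path.relpath(folder, src_root)
        target = dst_root if relative == "." else os.path.join(dst_root, relative)
        os.makedirs(target, exist_ok=True)
        for name in files:
            destination = os.path.join(target, name)
            shutil.copy2(os.path.join(folder, name), destination)
            copied.append(destination)
    return copied
```

A call might look like merge_tree(r"C:\path\to\extracted\cudnn", r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0"), where the first argument is a placeholder for wherever the archive was extracted. Writing under Program Files usually requires an elevated prompt.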

3.5 Testing CUDA through TensorFlow again

import tensorflow as tf
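tf.test.gpu_device_name()   # returns the GPU device name, e.g. '/device:GPU:0' (empty string if no GPU is found)

print(tf.test.is_gpu_available())  # prints True when a GPU is usable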
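
In TensorFlow 2.x, tf.test.is_gpu_available is marked deprecated in favor of tf.config.list_physical_devices. As a minimal sketch, the same check can be written as below. The wrapper name list_gpus is hypothetical, and the import is guarded so the function returns None when TensorFlow is not installed.

```python
def list_gpus():
    """Return the names of GPUs visible to TensorFlow, or None if it is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Each entry is a PhysicalDevice with a name such as '/physical_device:GPU:0'.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    print(list_gpus())
```

An empty list means TensorFlow imported correctly but found no usable GPU, which usually points back to a missing CUDA or cuDNN library.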

Not creating XLA devices, tf_xla_enable_xla_devices not set

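Solution

import os  # set both variables before importing tensorflow
os.environ['TF_XLA_FLAGS'] = '--tf_xla_enable_xla_devices'
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

This message is actually caused by a behavior change in TensorFlow 2.4 and can simply be ignored; the 2.4 release notes make this clear. It is not the version mismatch that many blog posts claim, and rolling back to an older version only treats the symptom.
If you do need XLA, setting TF_XLA_FLAGS=--tf_xla_enable_xla_devices removes the warning.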

4. Installing PyTorch

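PyTorch is often reported to be competitive with, or faster than, frameworks such as TensorFlow and Keras, although results vary with the model and hardware. It is also widely regarded as having one of the most elegant object-oriented designs among deep learning frameworks.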

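PyTorch is mainly used for building, training, and running inference with deep learning models. To speed up training, it is usually installed on a machine with a GPU, and for PyTorch to use the GPU you need a GPU environment that includes CUDA and cuDNN.
Install PyTorch: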

conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch

# CUDA 11.0
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch

pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
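
After installation, a quick check confirms whether PyTorch can actually see the GPU. The following is a minimal sketch using standard torch.cuda calls. The wrapper name torch_cuda_report is hypothetical, and the import is guarded so the function returns None when PyTorch is not installed.

```python
def torch_cuda_report():
    """Return a small dict describing PyTorch's CUDA status, or None if torch is missing."""
    try:
        import torch
    except ImportError:
        return None
    available = torch.cuda.is_available()
    return {
        "torch_version": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": available,
        "device_name": torch.cuda.get_device_name(0) if available else None,
    }


if __name__ == "__main__":
    print(torch_cuda_report())
```

If built_with_cuda shows a version but cuda_available is False, the likely causes are a CPU-only machine or a driver that does not support that CUDA version.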