Detailed TensorFlow™ GPU installation

TensorFlow™ is an open source software library for high-performance numerical computation. Its flexible architecture lets users deploy computation to multiple platforms (CPU, GPU, TPU) and devices (desktops, server clusters, mobile devices, edge devices, and so on). TensorFlow™ was originally developed by researchers and engineers on the Google Brain team (part of Google's AI organization). It provides strong support for machine learning and deep learning, and its flexible numerical computing core is widely used in many other scientific fields. TensorFlow™ currently offers two installation options, one for CPU (TensorFlow CPU) and one for GPU (TensorFlow GPU). Unlike the CPU version, which only needs a pip install, TensorFlow GPU requires additional underlying dependencies.

$ pip install tensorflow==1.12


$ pip install tensorflow-gpu==1.12

TensorFlow GPU accesses the GPU mainly through CUDA and cuDNN, both provided by NVIDIA, and can accelerate deep learning training to speeds dozens of times faster than a CPU. This article covers the installation and use of the TensorFlow GPU version.

Glossary

Before getting into the topic, here are a few terms used throughout the article.

  • Graphics Processing Unit (GPU): dedicated graphics hardware for personal computers, workstations and servers, also known as a display card or graphics card. The main manufacturers today are NVIDIA (the "N card") and AMD (the "A card"). Compared with a CPU, which consists of a few cores optimized for sequential serial processing, a GPU has a massively parallel architecture made up of thousands of smaller, more efficient cores, so it is better at parallel computing.

  • Compute Unified Device Architecture (CUDA): a parallel computing architecture launched by NVIDIA that lets the GPU solve complex computing problems. NVIDIA's GPU computing architectures have evolved through Tesla, Fermi, Kepler, Maxwell, Pascal and Volta (the architectures are named after scientists), and floating-point performance has grown with each generation. NVIDIA Tesla products are often named with the first letter of their architecture, such as Tesla M40, Tesla P100 and Tesla V100.


  • CUDA Toolkit: a development environment for creating high-performance GPU-accelerated applications. It currently supports programming in C, C++, Fortran, Python and MATLAB; its compiler is called nvcc.

  • NVIDIA GPU drivers: the software that drives NVIDIA graphics cards.

  • CUDA Deep Neural Network library (cuDNN): a GPU acceleration library for deep neural networks created by NVIDIA. GPU computing in general does not strictly require cuDNN, but deep learning training almost always uses it, and TensorFlow GPU lists it among its software requirements.

GPU support

TensorFlow GPU needs matching hardware and software (drivers and libraries). For TensorFlow™ 1.12, the latest version at the time of writing, the hardware requirement is:

  • NVIDIA® GPU card with CUDA® Compute Capability 3.5 or higher.

Requirements for software:

  • NVIDIA® GPU drivers: CUDA 9.0 requires 384.x or higher.

  • CUDA® Toolkit: TensorFlow supports CUDA 9.0.

  • CUPTI, which ships with the CUDA Toolkit.

  • cuDNN SDK (>= 7.2), NVIDIA's acceleration library for deep neural networks.

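Hardware support

As the descriptions of CUDA and the other required software show, TensorFlow GPU only supports NVIDIA cards. If the machine has an AMD card, you can either install the TensorFlow CPU version or use AMD's ROCm GPU platform to install a GPU version of TensorFlow. This article focuses on the most common case, NVIDIA cards with CUDA. The lspci command shows the machine's NVIDIA graphics card configuration: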

$ lspci |grep -i nvidia


02:00.0 3D controller: NVIDIA Corporation GM200GL [Tesla M40] (rev a1)

82:00.0 3D controller: NVIDIA Corporation GM200GL [Tesla M40] (rev a1)

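The output shows that the GPU cards in this machine are NVIDIA Tesla M40. Not every NVIDIA card supports the TensorFlow GPU version, though: the GPU must also have sufficient compute capability. TensorFlow 1.12, for example, requires compute capability 3.5 or higher. The compute capability of each NVIDIA GPU can be looked up in the list on NVIDIA's official website. NVIDIA's main product lines currently are:

  • GeForce: aimed at general consumers, including the GTX and TITAN series. Prices are relatively affordable, and precision for scientific computing is somewhat lower. Desktop GTX 750 and above, notebook 930M and above, and all TITAN X cards have compute capability 5.0 or higher.

  • Tesla: designed for enterprise deployment and large-scale parallel computing, with the F (Fermi), K (Kepler), M (Maxwell), P (Pascal) and V (Volta) series. Apart from the Tesla K10, all K, M, P and V series products have compute capability 3.5 or higher. Compared with the GeForce TITAN X, Tesla cards are more stable when a machine runs in production for long periods.

  • Quadro: aimed at professional graphics and design, in desktop and mobile series.

  • Tegra/Jetson: aimed at mobile and embedded devices.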
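The hardware check described above can be sketched in a few lines of Python. This is only an illustration: the function names and the two-entry capability table are assumptions made for this example (the Tesla M40 value comes from the TensorFlow log later in this article), and a real check should use NVIDIA's official compute capability list.

```python
import re

# Illustrative subset only; fill in other models from NVIDIA's official list.
COMPUTE_CAPABILITY = {
    "Tesla M40": (5, 2),
    "Tesla K10": (3, 0),
}

# Minimum compute capability stated for TensorFlow 1.12.
MIN_CAPABILITY = (3, 5)


def gpu_models(lspci_output):
    """Return the bracketed model names found on NVIDIA lines of lspci output."""
    models = []
    for line in lspci_output.splitlines():
        if "nvidia" in line.lower():
            match = re.search(r"\[([^\]]+)\]", line)
            if match:
                models.append(match.group(1))
    return models


def is_supported(model):
    """Return True or False for known models, None if the model is not in the table."""
    capability = COMPUTE_CAPABILITY.get(model)
    if capability is None:
        return None
    return capability >= MIN_CAPABILITY


sample = (
    "02:00.0 3D controller: NVIDIA Corporation GM200GL [Tesla M40] (rev a1)\n"
    "82:00.0 3D controller: NVIDIA Corporation GM200GL [Tesla M40] (rev a1)"
)

for model in gpu_models(sample):
    print(model, is_supported(model))
```

Returning None for unknown models keeps "not in the table" separate from "known to be too old", so a missing entry is never mistaken for an unsupported card.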

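Software support

The official TensorFlow documentation specifies the versions of Python, cuDNN, the GPU driver and CUDA that each TensorFlow GPU release depends on. Pay close attention to these version requirements during installation: if the Python, cuDNN, GPU driver or CUDA version is lower than what TensorFlow GPU requires, errors will occur during installation or use. The table below lists the corresponding versions for TensorFlow GPU on Linux.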

TensorFlow GPU   Python         cuDNN    CUDA
1.12             2.7, 3.3-3.6   >= 7.2   9
1.11             2.7, 3.3-3.6   >= 7.2   9
1.10             2.7, 3.3-3.6   7        9
1.9.0            2.7, 3.3-3.6   7        9
1.8.0            2.7, 3.3-3.6   7        9
1.7.0            2.7, 3.3-3.6   7        9
1.6.0            2.7, 3.3-3.6   7        9
1.5.0            2.7, 3.3-3.6   7        9
1.4.0            2.7, 3.3-3.6   6        8
1.3.0            2.7, 3.3-3.6   6        8
1.2.0            2.7, 3.3-3.6   5.1      8
1.1.0            2.7, 3.3-3.6   5.1      8
1.0.0            2.7, 3.3-3.6   5.1      8
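If you script your installs, the table above can be encoded as a small lookup. The following is a minimal sketch, assuming the table values exactly as transcribed here; the function name requirements_for is made up for this example and is not part of TensorFlow.

```python
# (TensorFlow 1.x minor versions, cuDNN, CUDA), transcribed from the table above.
_TABLE = [
    (range(11, 13), ">= 7.2", "9"),  # 1.11 and 1.12
    (range(5, 11), "7", "9"),        # 1.5 through 1.10
    (range(3, 5), "6", "8"),         # 1.3 and 1.4
    (range(0, 3), "5.1", "8"),       # 1.0 through 1.2
]

PYTHON_VERSIONS = "2.7, 3.3-3.6"


def requirements_for(tf_version):
    """Look up the cuDNN/CUDA/Python versions for a TensorFlow 1.x GPU release."""
    parts = tf_version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    if major == 1:
        for minors, cudnn, cuda in _TABLE:
            if minor in minors:
                return {"python": PYTHON_VERSIONS, "cudnn": cudnn, "cuda": cuda}
    raise KeyError("no table entry for TensorFlow %s" % tf_version)


print(requirements_for("1.12"))
print(requirements_for("1.4.0"))
```

Versions outside the table raise KeyError instead of guessing, because the table only covers the 1.0 to 1.12 releases.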

 

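TensorFlow GPU installation process

The following example installs TensorFlow GPU 1.12, the latest version at the time of writing, on a server with NVIDIA Tesla M40 cards running CentOS 7.

GPU driver installation

Based on the NVIDIA driver version that TensorFlow 1.12 GPU support requires, select the matching product model, operating system and CUDA Toolkit version on the NVIDIA website and download the driver file, for example NVIDIA-Linux-x86_64-384.145.run.

Run the driver file and follow the prompts to complete the installation.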

$ sh NVIDIA-Linux-x86_64-384.145.run

After installation, use NVIDIA's command-line tool nvidia-smi to check the GPU status:

$ nvidia-smi

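The nvidia-smi output reports the NVIDIA GPU driver version, 384.145 here. While a TensorFlow GPU job is running, the same command also shows GPU memory usage, which makes it a quick way to check whether TensorFlow is running on the CPU or the GPU.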
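The driver version can also be checked from a script rather than read off the screen. The sketch below is an assumption-laden example: the helper names are invented here, and it relies on nvidia-smi's --query-gpu=driver_version option being available in your driver release.

```python
import subprocess

# CUDA 9.0 requires driver 384.x or higher (see the requirements above).
MIN_DRIVER = (384,)


def parse_driver_version(text):
    """Turn a version string such as '384.145' into a tuple such as (384, 145)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_ok(text, minimum=MIN_DRIVER):
    """Return True if the driver version string meets the minimum."""
    return parse_driver_version(text) >= minimum


def query_driver_version():
    """Ask nvidia-smi for the driver version; it prints one line per GPU."""
    output = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"]
    )
    return output.decode().splitlines()[0]


# The version reported in this article passes the check.
print(driver_ok("384.145"))
```

On a machine with the driver installed, driver_ok(query_driver_version()) performs the same check against the live system.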

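CUDA installation

On the NVIDIA website, select the CUDA Toolkit version required by TensorFlow (9.0 here), choose Linux, x86_64, CentOS 7, and download the rpm (local) installer.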

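Install it by following the installation guide. The package file name must match the CUDA version you downloaded; for CUDA 9.0 it looks like this:

$ sudo rpm -i cuda-repo-rhel7-9-0-local-9.0.176-1.x86_64.rpm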


$ sudo yum clean all

$ sudo yum install cuda

Configure the system environment (/etc/profile) or the current user's environment (~/.bashrc):

export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:/usr/local/cuda-9.0/extras/CUPTI/lib64:$LD_LIBRARY_PATH


export CUDA_HOME=/usr/local/cuda-9.0/

export PATH=$PATH:$CUDA_HOME/bin
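A missing entry in LD_LIBRARY_PATH is a common cause of import errors later on, so it can help to verify the settings above programmatically. This is a minimal sketch; the function name is hypothetical, and the expected directories are simply the two from the export line above.

```python
import os

# The two library directories exported in the configuration above.
EXPECTED_DIRS = [
    "/usr/local/cuda-9.0/lib64",
    "/usr/local/cuda-9.0/extras/CUPTI/lib64",
]


def missing_library_dirs(env):
    """Return the expected CUDA directories that are absent from LD_LIBRARY_PATH."""
    raw = env.get("LD_LIBRARY_PATH", "")
    # Ignore empty entries and trailing slashes when comparing paths.
    entries = [path.rstrip("/") for path in raw.split(":") if path]
    return [directory for directory in EXPECTED_DIRS if directory not in entries]


# An empty list means both directories are on the library path.
print(missing_library_dirs(os.environ))
```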

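After sourcing the configuration file, check the nvcc version. Output like the following indicates that CUDA was installed successfully. At this point the CUDA C/C++ build environment is also ready, and nvcc can be used to compile .cu files.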

$ nvcc -V


nvcc: NVIDIA (R) Cuda compiler driver

Copyright (c) 2005-2017 NVIDIA Corporation

Built on Fri_Sep__1_21:08:03_CDT_2017

Cuda compilation tools, release 9.0, V9.0.176
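To compare the installed CUDA release against what TensorFlow expects, the release number can be pulled out of the nvcc output shown above. The following is a rough sketch; cuda_release is a name chosen for this example, and the regular expression assumes the "release X.Y" wording seen in this output.

```python
import re


def cuda_release(nvcc_output):
    """Extract (major, minor) from the 'release X.Y' part of `nvcc -V` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = (
    "nvcc: NVIDIA (R) Cuda compiler driver\n"
    "Copyright (c) 2005-2017 NVIDIA Corporation\n"
    "Built on Fri_Sep__1_21:08:03_CDT_2017\n"
    "Cuda compilation tools, release 9.0, V9.0.176\n"
)

print(cuda_release(sample))  # (9, 0)
```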

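cuDNN installation

Download the cuDNN package from the NVIDIA website, choosing the cuDNN version that matches your GPU and CUDA version; here that is cuDNN v7.4.1 for CUDA 9.0.

Extract the archive and copy its contents into the CUDA installation directory. Note that /usr/local contains two directories, cuda and cuda-9.0; be sure to copy into the cuda directory.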

$ cp include/* /usr/local/cuda/include


$ cp lib64/* /usr/local/cuda/lib64
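One way to confirm the copy worked is to read the version macros from the installed header. The sketch below assumes the cuDNN 7 layout, where cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL; the function name and the sample header text are made up for illustration.

```python
import re


def cudnn_version(header_text):
    """Read (major, minor, patch) from the #define lines of a cuDNN 7 cudnn.h."""
    values = []
    for name in ("MAJOR", "MINOR", "PATCHLEVEL"):
        pattern = r"^#define\s+CUDNN_%s\s+(\d+)" % name
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        values.append(int(match.group(1)))
    return tuple(values)


# Hypothetical excerpt in the style of the cuDNN 7.4.1 header.
sample_header = (
    "#define CUDNN_MAJOR 7\n"
    "#define CUDNN_MINOR 4\n"
    "#define CUDNN_PATCHLEVEL 1\n"
)

print(cudnn_version(sample_header))  # (7, 4, 1)

# On the installed system, read the real header instead:
# with open("/usr/local/cuda/include/cudnn.h") as handle:
#     print(cudnn_version(handle.read()))
```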

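Install TensorFlow

Installing TensorFlow GPU itself is straightforward. Inside whichever Python virtual environment you have activated, install it directly with pip. Users in China can use the Douban pip mirror.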

$ pip install tensorflow-gpu==1.12 -i https://pypi.douban.com/simple/

Verify TensorFlow GPU

Start an interactive Python session, import the TensorFlow module, and run a simple check.

>>> import tensorflow as tf


>>> sess = tf.Session()

2018-11-21 16:02:44.949511: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:

name: Tesla M40 major: 5 minor: 2 memoryClockRate(GHz): 1.112

2018-11-21 16:02:45.089993: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 1 with properties:

name: Tesla M40 major: 5 minor: 2 memoryClockRate(GHz): 1.112

The log output shows that two GPU cards were found, which means TensorFlow GPU was installed successfully.
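Instead of reading the session log, the visible GPU devices can also be listed programmatically. This is a minimal sketch that assumes TensorFlow 1.x's device_lib module; the wrapper function is hypothetical and returns None when TensorFlow cannot be imported, which is also what typically happens when the CUDA libraries are missing from the library path.

```python
def list_gpu_devices():
    """Return the names of GPU devices TensorFlow can see, or None if it cannot be imported."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        # TensorFlow is not installed, or a shared library it needs was not found.
        return None
    return [
        device.name
        for device in device_lib.list_local_devices()
        if device.device_type == "GPU"
    ]


devices = list_gpu_devices()
if devices is None:
    print("TensorFlow could not be imported")
elif not devices:
    print("TensorFlow is installed but sees no GPU")
else:
    print("GPU devices:", devices)
```

On the two-card machine used in this article, the list should contain two entries if the installation succeeded.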

Running TensorFlow examples

Official TensorFlow GitHub repositories

  • https://github.com/tensorflow/tensorflow.git

  • https://github.com/tensorflow/models.git

Download the corresponding code and you can run the examples in it directly.



Origin blog.51cto.com/15060464/2678399