Building TensorFlow 1.8 (GPU) from source

Configuration:

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
TensorFlow installed from (source or binary): source
TensorFlow version (use command below): v1.8.0
Python version: 2.7
Bazel version (if compiling from source): 0.10.0- (@non-git)
GCC/Compiler version (if compiling from source): 4.9
CUDA/cuDNN version: 8.0 / 6.0
GPU model and memory: 1080Ti
Exact command to reproduce: bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

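1. Download the TensorFlow source

TensorFlow is an open-source library whose source code is hosted on GitHub, so download it directly from there.
Repository: https://github.com/tensorflow/tensorflow
The repository defaults to the master development branch. You can also check out the release you want to build:

git clone https://github.com/tensorflow/tensorflow
cd tensorflow
git tag

The tag for the 1.8 release is v1.8.0, so switch to it with: git checkout v1.8.0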
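If you script the checkout, the tag selection above can be done with a small Python helper. This is a sketch that assumes release tags look like vMAJOR.MINOR.PATCH (as v1.8.0 does) and deliberately skips release candidates; the names pick_release_tag and checkout_release are my own, not part of git or TensorFlow.

```python
# Sketch: pick the release tag for a given MAJOR.MINOR from `git tag` output.
# Assumption: release tags follow "vMAJOR.MINOR.PATCH" (e.g. v1.8.0);
# release candidates such as v1.8.0-rc1 are skipped on purpose.
import re
import subprocess

RELEASE_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def pick_release_tag(tags, major, minor):
    """Return the highest vMAJOR.MINOR.x tag in `tags`, or None if absent."""
    best = None
    for tag in tags:
        m = RELEASE_TAG.match(tag.strip())
        if not m:
            continue  # skip rc/alpha tags and anything that is not a release
        version = tuple(int(part) for part in m.groups())
        if version[:2] == (major, minor) and (best is None or version > best[0]):
            best = (version, tag.strip())
    return best[1] if best else None


def checkout_release(repo_dir, major, minor):
    """List tags in `repo_dir`, then `git checkout` the chosen release tag."""
    out = subprocess.check_output(["git", "tag"], cwd=repo_dir)
    tag = pick_release_tag(out.decode("utf-8").splitlines(), major, minor)
    if tag is None:
        raise RuntimeError("no v%d.%d.x release tag found" % (major, minor))
    subprocess.check_call(["git", "checkout", tag], cwd=repo_dir)
    return tag
```

Calling checkout_release("tensorflow", 1, 8) from the directory containing the clone checks out the highest 1.8.x release tag it finds.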

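2. Install dependencies

Python dependencies

TensorFlow supports C, C++, and Python, but its Python support is by far the most complete, so Python is used here.
Installing Python itself is not covered here. I installed Python 2.7, so Python 2.7 is used as the example throughout.

sudo apt-get install python-numpy python-dev python-pip python-wheel

TensorFlow dependencies

Install the TensorFlow pip package dependencies (omit the --user flag if you are using a virtual environment):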


pip install -U --user pip six numpy wheel mock
pip install -U --user keras_applications==1.0.5 --no-deps
pip install -U --user keras_preprocessing==1.0.3 --no-deps

If you are using a virtual environment:

pip install -U pip six numpy wheel mock
pip install -U keras_applications==1.0.5 --no-deps
pip install -U keras_preprocessing==1.0.3 --no-deps
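To double-check the pins after either set of commands, a short Python sketch can compare the installed versions against them. It assumes setuptools' pkg_resources is importable in the interpreter you build with; find_mismatches and installed_versions are hypothetical helper names.

```python
# Sketch: confirm the pinned Keras helper packages are the versions installed above.
# Assumption: setuptools' pkg_resources is available in the build interpreter;
# the pins mirror the pip commands above.
PINS = {"keras_applications": "1.0.5", "keras_preprocessing": "1.0.3"}


def find_mismatches(installed, pins):
    """Compare {name: version} from the environment against {name: pin}.

    Returns (name, installed_version_or_None, pinned_version) for every
    package that is missing or at a different version.
    """
    problems = []
    for name, wanted in sorted(pins.items()):
        have = installed.get(name)
        if have != wanted:
            problems.append((name, have, wanted))
    return problems


def installed_versions(names):
    """Look up installed versions; missing packages map to None."""
    import pkg_resources  # imported lazily so find_mismatches works without it
    versions = {}
    for name in names:
        try:
            versions[name] = pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            versions[name] = None
    return versions
```

In the build interpreter, print(find_mismatches(installed_versions(PINS), PINS)) should print an empty list once both packages are at the pinned versions.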

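3. Install Bazel

Bazel is an automated build tool open-sourced by Google; most of Google's internal software is built with it.

Installation steps: follow the official Bazel installation instructions.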

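Install the required packages

sudo apt-get install pkg-config zip g++ zlib1g-dev unzip python

Download Bazel

Bazel is published in many versions; download the 0.10.0 release of the installer, named bazel-0.10.0-installer-linux-x86_64.sh.

Run the installer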

chmod +x bazel-<version>-installer-linux-x86_64.sh
./bazel-<version>-installer-linux-x86_64.sh --user

The first line makes the installer executable.

The second line runs the installer; the --user flag installs Bazel into the $HOME/bin directory.

Set up the environment

export PATH="$PATH:$HOME/bin"

As noted above, Bazel was installed into $HOME/bin; this line appends that directory to PATH so the shell looks there for bazel when you invoke it.
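A quick way to confirm that the PATH change took effect is the Python sketch below. It assumes Bazel was installed with --user into $HOME/bin as described above; dir_on_path and find_bazel are illustrative names. Keep in mind that export only affects the current shell, so new terminals will not see it unless you also add the line to a startup file such as ~/.bashrc.

```python
# Sketch: check that $HOME/bin is on PATH and that a `bazel` executable is found.
# Assumption: Bazel was installed with `--user`, i.e. into $HOME/bin.
import os


def dir_on_path(directory, path_env):
    """True if `directory` is one of the entries in a PATH-style string."""
    target = os.path.normpath(os.path.expanduser(directory))
    entries = [os.path.normpath(os.path.expanduser(p))
               for p in path_env.split(os.pathsep) if p]
    return target in entries


def find_bazel(path_env):
    """Return the first executable file named `bazel` on `path_env`, or None."""
    for entry in path_env.split(os.pathsep):
        if not entry:
            continue
        candidate = os.path.join(os.path.expanduser(entry), "bazel")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    path_env = os.environ.get("PATH", "")
    print("$HOME/bin on PATH: %s" % dir_on_path("~/bin", path_env))
    print("bazel found at: %s" % find_bazel(path_env))
```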

Install JDK 8
Bazel requires JDK 8, so install it first:

sudo apt-get update
sudo apt-get install openjdk-8-jdk

Add Bazel distribution URI as a package source

echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -

Install and update Bazel

sudo apt-get update && sudo apt-get install bazel
sudo apt-get install --only-upgrade bazel

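4. Configure the TensorFlow build

cd tensorflow # enter the TensorFlow source directory downloaded earlier
chmod +x ./configure # make the configure script executable
./configure # run configure

Answer n to most prompts and choose the rest according to your needs; make sure the CUDA and cuDNN versions you enter match what is installed on your system.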

You have bazel 0.10.0 installed.
Please specify the location of python. [Default is /data/anaconda2/envs/py27/bin/python]:
Found possible Python library paths:
  /data/anaconda2/envs/py27/lib/python2.7/site-packages
Please input the desired Python library path to use.  Default is [/data/anaconda2/envs/py27/lib/python2.7/site-packages]
Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: n
No jemalloc as malloc support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
No Google Cloud Platform support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
No Amazon S3 File System support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: n
No Apache Kafka Platform support will be enabled for TensorFlow.
Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
No XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with GDR support? [y/N]: n
No GDR support will be enabled for TensorFlow.
Do you wish to build TensorFlow with VERBS support? [y/N]:
No VERBS support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]:
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 8.0
Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 6.0
Please specify the location where cuDNN 6 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Do you wish to build TensorFlow with TensorRT support? [y/N]: n
No TensorRT support will be enabled for TensorFlow.
Please specify the NCCL version you want to use. [Leave empty to default to NCCL 1.3]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.1,6.1,6.1,6.1]
Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
	--config=mkl         	# Build with MKL support.
	--config=monolithic  	# Config for mostly static monolithic build.
Configuration finished
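The configure prompts above can also be answered through environment variables, which is handy when rebuilding. This is a sketch under a loudly stated assumption: the variable names (TF_NEED_CUDA, TF_CUDA_VERSION, TF_CUDA_COMPUTE_CAPABILITIES and so on) are the ones I believe the TensorFlow 1.x configure.py reads, so grep configure.py in your own checkout before relying on them; anything it does not find is still asked interactively. It also collapses the default 6.1,6.1,6.1,6.1 (one entry per detected GPU) to a single 6.1, since the GTX 1080 Ti is compute capability 6.1 and listing it once is enough. dedupe_capabilities, configure_env and run_configure are hypothetical names.

```python
# Sketch: run ./configure non-interactively with the answers from the transcript above.
# ASSUMPTION: these variable names are read by TensorFlow 1.x configure.py;
# verify them in your checkout, since missing ones are still prompted for.
import os
import subprocess


def dedupe_capabilities(spec):
    """'6.1,6.1,6.1,6.1' -> '6.1': keep each compute capability once, in order."""
    seen = []
    for cap in spec.split(","):
        cap = cap.strip()
        if cap and cap not in seen:
            seen.append(cap)
    return ",".join(seen)


def configure_env(python_bin, python_lib, capabilities):
    """Copy of os.environ with every prompt answered as in the transcript."""
    answers = {
        "PYTHON_BIN_PATH": python_bin,
        "PYTHON_LIB_PATH": python_lib,
        "TF_NEED_JEMALLOC": "0",
        "TF_NEED_GCP": "0",
        "TF_NEED_HDFS": "0",
        "TF_NEED_S3": "0",
        "TF_NEED_KAFKA": "0",
        "TF_ENABLE_XLA": "0",
        "TF_NEED_GDR": "0",
        "TF_NEED_VERBS": "0",
        "TF_NEED_OPENCL_SYCL": "0",
        "TF_NEED_CUDA": "1",
        "TF_CUDA_VERSION": "8.0",
        "CUDA_TOOLKIT_PATH": "/usr/local/cuda",
        "TF_CUDNN_VERSION": "6.0",
        "CUDNN_INSTALL_PATH": "/usr/local/cuda",
        "TF_NEED_TENSORRT": "0",
        "TF_NCCL_VERSION": "1.3",
        "TF_CUDA_COMPUTE_CAPABILITIES": dedupe_capabilities(capabilities),
        "TF_CUDA_CLANG": "0",
        "GCC_HOST_COMPILER_PATH": "/usr/bin/gcc",
        "TF_NEED_MPI": "0",
        "CC_OPT_FLAGS": "-march=native",
        "TF_SET_ANDROID_WORKSPACE": "0",
    }
    env = dict(os.environ)
    env.update(answers)
    return env


def run_configure(tf_dir, env):
    """Run ./configure inside the TensorFlow checkout with the prepared env."""
    subprocess.check_call(["./configure"], cwd=tf_dir, env=env)
```

For example, run_configure("tensorflow", configure_env("/data/anaconda2/envs/py27/bin/python", "/data/anaconda2/envs/py27/lib/python2.7/site-packages", "6.1,6.1,6.1,6.1")) mirrors the answers in the transcript.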

5. Build the pip package

bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

The build takes a long time; when it finally reports that the build completed successfully, you're done!

./bazel-bin/tensorflow/tools/pip_package/build_pip_package ./tensorflow_pkg

This generates tensorflow-1.8.0-cp27-cp27mu-linux_x86_64.whl in the tensorflow_pkg directory.
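The cp27-cp27mu part of the file name matters: pip only accepts the wheel on an interpreter with matching tags. The sketch below splits a wheel name into its fields following the PEP 427 naming convention and, assuming CPython 2.7, derives the expected ABI tag from sys.maxunicode (cp27mu for UCS-4 builds, cp27m for UCS-2 builds). parse_wheel_name and cpython27_abi are hypothetical helpers and are not meant for Python 3 tags.

```python
# Sketch: split a wheel file name into its PEP 427 fields and compare its ABI
# tag with the running interpreter.
# Assumption: CPython 2.7, where the ABI tag is cp27mu for UCS-4 ("wide unicode")
# builds and cp27m for UCS-2 builds.
import sys


def parse_wheel_name(filename):
    """'tensorflow-1.8.0-cp27-cp27mu-linux_x86_64.whl' -> dict of tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %s" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 6:  # an optional build tag sits between version and python tag
        parts = parts[:2] + parts[3:]
    if len(parts) != 5:
        raise ValueError("unexpected wheel name: %s" % filename)
    name, version, python_tag, abi_tag, platform_tag = parts
    return {"name": name, "version": version, "python": python_tag,
            "abi": abi_tag, "platform": platform_tag}


def cpython27_abi(maxunicode):
    """ABI tag of a CPython 2.7 interpreter given its sys.maxunicode."""
    return "cp27mu" if maxunicode > 0xFFFF else "cp27m"


if __name__ == "__main__":
    if sys.version_info[:2] == (2, 7):
        tags = parse_wheel_name("tensorflow-1.8.0-cp27-cp27mu-linux_x86_64.whl")
        ok = tags["abi"] == cpython27_abi(sys.maxunicode)
        print("wheel ABI matches this interpreter: %s" % ok)
```

If the check prints False, pip will typically refuse the file as "not a supported wheel on this platform"; rebuilding with the interpreter you intend to install into avoids the mismatch.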

7. Install the generated package with pip

pip install ./tensorflow_pkg/tensorflow-1.8.0-cp27-cp27mu-linux_x86_64.whl

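8. Verify that the GPU build of TensorFlow is installed

To check that the GPU build is installed correctly, run a short Python test. In a Python shell, enter:

>>> import tensorflow as tf
>>> matrix1 = tf.constant([[3., 3.]])
>>> matrix2 = tf.constant([[2.],[2.]])
>>> product = tf.matmul(matrix1, matrix2)
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
>>> print(sess.run(product))

If the output includes information about your graphics card, the GPU build of TensorFlow is installed successfully!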
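Besides reading the log, you can ask TensorFlow directly which devices it sees. This sketch assumes the TensorFlow 1.x module tensorflow.python.client.device_lib, whose list_local_devices() returns device records with name, device_type and physical_device_desc fields; gpu_devices and main are my own names.

```python
# Sketch: list the GPUs TensorFlow can see, as an alternative to reading session logs.
# Assumption: tensorflow.python.client.device_lib.list_local_devices() returns
# records with `name`, `device_type` and `physical_device_desc` attributes.


def gpu_devices(devices):
    """Keep (name, description) for every device whose type is 'GPU'."""
    return [(d.name, d.physical_device_desc)
            for d in devices if d.device_type == "GPU"]


def main():
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        print("TensorFlow is not importable from this interpreter")
        return []
    gpus = gpu_devices(device_lib.list_local_devices())
    for name, desc in gpus:
        print("%s: %s" % (name, desc))
    if not gpus:
        print("No GPU visible: TensorFlow is running CPU-only")
    return gpus


if __name__ == "__main__":
    main()
```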

9. Pitfalls I ran into

issue1

ERROR: Config value cuda is not defined in any .rc file

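Cause: the Bazel version was too new.
Fix: building TensorFlow 1.8 requires Bazel 0.10; switching to Bazel 0.10 resolved the build problem.
Note: after changing the Bazel version, run bazel clean --expunge to remove the previous build output, then rerun the build command.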
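Since issue1 comes down to the Bazel version, a pre-build check can save a long failed build. The sketch assumes bazel version prints a line starting with Build label: followed by the version (for an installer build without git information it looks like 0.10.0- (@non-git)), and treats 0.10.x as required because that is what worked for this build. parse_bazel_version and check_bazel are hypothetical names.

```python
# Sketch: fail fast when the Bazel on PATH is not the 0.10.x line this build used.
# Assumption: `bazel version` prints a "Build label: X.Y.Z..." line.
import re
import subprocess

LABEL = re.compile(r"^Build label:\s*(\d+)\.(\d+)\.(\d+)", re.MULTILINE)


def parse_bazel_version(output):
    """Return (major, minor, patch) from `bazel version` output, or None."""
    m = LABEL.search(output)
    return tuple(int(x) for x in m.groups()) if m else None


def check_bazel(expected=(0, 10)):
    """Raise SystemExit unless the installed Bazel matches `expected` major.minor."""
    out = subprocess.check_output(["bazel", "version"]).decode("utf-8")
    version = parse_bazel_version(out)
    if version is None or version[:2] != expected:
        raise SystemExit("need Bazel %d.%d.x, found %r; install it, then run "
                         "`bazel clean --expunge` before rebuilding"
                         % (expected + (version,)))
    return version
```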

issue2

Cannot find libdevice.10.bc under /usr/local/cuda-8.0

Fix: rename /usr/local/cuda-8.0/nvvm/libdevice/libdevice.compute_50.10.bc to libdevice.10.bc, and also copy it to /usr/local/cuda-8.0/.
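The workaround can be scripted. The sketch below assumes CUDA 8.0 is installed at /usr/local/cuda-8.0 and that you run it with write permission there (for example via sudo). Unlike the manual fix, it copies rather than renames, so the original libdevice.compute_50.10.bc stays in place; libdevice_paths and add_libdevice_alias are hypothetical helpers.

```python
# Sketch of the libdevice workaround: make libdevice.compute_50.10.bc available
# as libdevice.10.bc in both locations described above.
# Assumptions: CUDA 8.0 lives at /usr/local/cuda-8.0 (pass another root if not)
# and the script has write access there. Copies instead of renaming.
import os
import shutil


def libdevice_paths(cuda_root="/usr/local/cuda-8.0",
                    source_name="libdevice.compute_50.10.bc"):
    """Return (source_file, [target_files]) for the libdevice alias."""
    libdevice_dir = os.path.join(cuda_root, "nvvm", "libdevice")
    source = os.path.join(libdevice_dir, source_name)
    targets = [os.path.join(libdevice_dir, "libdevice.10.bc"),
               os.path.join(cuda_root, "libdevice.10.bc")]
    return source, targets


def add_libdevice_alias(cuda_root="/usr/local/cuda-8.0"):
    """Copy the compute_50 libdevice file to each target that does not exist yet."""
    source, targets = libdevice_paths(cuda_root)
    if not os.path.isfile(source):
        raise IOError("missing %s" % source)
    for target in targets:
        if not os.path.exists(target):
            shutil.copy2(source, target)  # keeps the original file untouched
    return targets
```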

issue3

/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11260): error: argument of type "void *" is incompatible with parameter of type "int *"
/usr/lib/gcc/x86_64-linux-gnu/5/include/avx512vlintrin.h(11269): error: argument of type "void *" is incompatible with parameter of type "long long *"
...

Cause: gcc version 5.5.
Fix:

sudo apt-get install gcc-4.9 g++-4.9
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.9 50 --slave /usr/bin/g++ g++ /usr/bin/g++-4.9
sudo update-alternatives --config gcc

Select version 4.9 and rebuild.
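After update-alternatives, it is worth confirming what /usr/bin/gcc actually is, since configure defaults nvcc's host compiler to /usr/bin/gcc. A sketch, assuming gcc -dumpversion prints a dotted version number and that gcc 5 or newer should be rejected for this CUDA 8.0 build, as the errors above suggest; gcc_version and check_host_gcc are my own names.

```python
# Sketch: confirm which gcc /usr/bin/gcc resolves to after update-alternatives.
# Assumption: gcc 5 and newer triggered the avx512 header errors above, so the
# host compiler for this CUDA 8.0 build must be older than gcc 5.
import subprocess


def gcc_version(text):
    """Parse `gcc -dumpversion` output such as '4.9.3' or '5' into an int tuple."""
    return tuple(int(p) for p in text.strip().split(".") if p.isdigit())


def check_host_gcc(gcc="/usr/bin/gcc", below=(5,)):
    """Raise SystemExit if `gcc` is at or above the `below` version."""
    out = subprocess.check_output([gcc, "-dumpversion"]).decode("utf-8")
    version = gcc_version(out)
    if version >= below:
        raise SystemExit("%s is gcc %s; switch to gcc-4.9 with update-alternatives"
                         % (gcc, ".".join(str(v) for v in version)))
    return version
```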

issue 4

./tensorflow/core/kernels/gather_functor_gpu.cu.h(57): error: calling a __host__ function("__builtin_expect") from a __global__ function("tensorflow::GatherOpKernel<double, int, (bool)1> ") is not allowed

./tensorflow/core/kernels/gather_functor_gpu.cu.h(57): error: calling a __host__ function("__builtin_expect") from a __global__ function("tensorflow::GatherOpKernel< ::std::complex<float> , int, (bool)1> ") is not allowed

Fix: edit ./tensorflow/core/platform/macros.h and change the line

#if TF_HAS_BUILTIN(__builtin_expect) || (defined(__GNUC__) && __GNUC__ >= 3)

to

#if (!defined(__NVCC__)) && (TF_HAS_BUILTIN(__builtin_expect) || (defined(__GNUC__) && __GNUC__ >= 3))

With --config=cuda, nvcc apparently does not recognize __builtin_expect as a compiler builtin and simply assumes it is a function defined for the host, which leads to the compilation errors above. The added !defined(__NVCC__) condition makes nvcc skip the branch that uses __builtin_expect.
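If you rebuild from a fresh checkout, the macros.h edit can be applied with a small script instead of by hand. This sketch assumes the #if line appears in tensorflow/core/platform/macros.h exactly as quoted above and that you run it from the TensorFlow source root; if the line is missing (for example in another TensorFlow version) it leaves the file untouched. patch_text and patch_file are hypothetical names.

```python
# Sketch: apply the macros.h change above with a plain text substitution.
# Assumption: the #if line appears exactly as quoted; otherwise nothing changes.
OLD = ("#if TF_HAS_BUILTIN(__builtin_expect) || "
       "(defined(__GNUC__) && __GNUC__ >= 3)")
NEW = ("#if (!defined(__NVCC__)) && (TF_HAS_BUILTIN(__builtin_expect) || "
       "(defined(__GNUC__) && __GNUC__ >= 3))")


def patch_text(text):
    """Return (patched_text, changed) with the guard rewritten for nvcc."""
    if NEW in text:
        return text, False  # already patched
    if OLD not in text:
        return text, False  # line not found; leave the content untouched
    return text.replace(OLD, NEW, 1), True


def patch_file(path="tensorflow/core/platform/macros.h"):
    """Patch the file in place; returns True only if it was modified."""
    with open(path) as f:
        text = f.read()
    patched, changed = patch_text(text)
    if changed:
        with open(path, "w") as f:
            f.write(patched)
    return changed
```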

issue 5

AttributeError: 'int' object attribute '__doc__' is read-only
Target //tensorflow/tools/pip_package:build_pip_package failed to build
...
ERROR: /data/lirong/py2/tensorflow/tensorflow/tools/api/generator/BUILD:27:1: Executing genrule //tensorflow/tools/api/generator:python_api_gen failed (Exit 1)

Fix:

pip uninstall enum
sudo apt-get install python-enum34

After rebuilding, the build still fails, but now with issue 6.

issue 6

ImportError: No module named enum
Target //tensorflow/tools/pip_package:build_pip_package failed to build

Fix:

pip uninstall enum
pip install enum34
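The last two issues both come from the wrong enum module being importable. A sketch to check which one you have, assuming that enum34 and the Python 3 standard library both provide Enum, IntEnum and unique while the unrelated old enum package from PyPI does not provide IntEnum; looks_like_enum34 and check_enum are illustrative names.

```python
# Sketch: detect whether `import enum` resolves to a real Enum implementation
# (enum34 on Python 2, or the standard library on Python 3).
# Assumption: the old PyPI "enum" package lacks IntEnum and unique.
import importlib


def looks_like_enum34(module):
    """True if the module exposes the enum34/stdlib API used by the build."""
    return all(hasattr(module, attr) for attr in ("Enum", "IntEnum", "unique"))


def check_enum():
    """Return 'ok' or a hint describing which fix above to apply."""
    try:
        module = importlib.import_module("enum")
    except ImportError:
        return "missing: pip install enum34"
    if not looks_like_enum34(module):
        return "wrong package at %s: pip uninstall enum" % getattr(module, "__file__", "?")
    return "ok"
```

Run it with the same interpreter you passed to configure; that is the one the build invokes.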

10. Miscellaneous

If bazel build runs into problems, its output suggests: Use --verbose_failures to see the command lines of failed build steps.
I ignored this for a long time, until I added the --verbose_failures flag:
bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package --verbose_failures
Only then did I see the full error messages, which I could search on Google to find fixes.
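The same command can be assembled from Python, for example in a rebuild script. This sketch places the options before the target, which is the usual order, and adds --verbose_failures by default; build_command and build are hypothetical names, and the tensorflow directory is assumed to be the checkout from step 1.

```python
# Sketch: assemble and run the pip-package build with --verbose_failures so that
# failing steps print their full command lines.
import subprocess

TARGET = "//tensorflow/tools/pip_package:build_pip_package"


def build_command(configs=("opt", "cuda"), verbose=True, target=TARGET):
    """Return the bazel argv list, options first and the target last."""
    cmd = ["bazel", "build"]
    cmd += ["--config=%s" % c for c in configs]
    if verbose:
        cmd.append("--verbose_failures")
    cmd.append(target)
    return cmd


def build(tf_dir="tensorflow"):
    """Run the build in the TensorFlow checkout; raises CalledProcessError on failure."""
    return subprocess.check_call(build_command(), cwd=tf_dir)
```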