TensorFlow [Analyzing bazel's behavior when installing from source]

0. Introduction

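Installing from source and reading through what happens along the way helps in understanding the TensorFlow source code. This article is based mainly on the TensorFlow v1.8 source and draws on the article "How to Read the TensorFlow Source Code".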

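First, you need to visit the bazel website to pick up the prerequisites, such as (1) what bazel is; (2) how bazel builds C++ projects; (3) the reference of functions available at build time. After that come the more detailed parts of the bazel website. Hyperlinks are given in the text below.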

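PS: After a long search, I am fairly sure that there are no books or similar material on bazel apart from the official website, so the website and other people's blogs are the only two ways to learn it.
Because many pages on the bazel website are not linked from the left-hand navigation, I recommend mirroring the whole site: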

wget -m -c -x -np -k -E -p https://docs.bazel.build/versions/master/bazel-overview.html

1. Building TensorFlow from source

As shown in the figure below:

Figure 1.1 The TensorFlow v1.8 source tree on GitHub

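1.1 Configure first

The root of the source tree contains a bash script named configure. This script asks you for the paths of all relevant TensorFlow dependencies and for other build configuration options, such as compiler flags. You must run this script before you can create the pip package and install TensorFlow.
Then run configure: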

./configure

$ cd tensorflow  # cd to the top-level directory created
$ ./configure
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python2.7    # path to the Python interpreter
Found possible Python library paths:
 /usr/local/lib/python2.7/dist-packages
 /usr/lib/python2.7/dist-packages
Please input the desired Python library path to use.  Default is [/usr/lib/python2.7/dist-packages]       # Python library path

Using python library path: /usr/local/lib/python2.7/dist-packages
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:    # optimization flags applied when --config=opt is given
Do you wish to use jemalloc as the malloc implementation? [Y/n]        # whether to use jemalloc as the malloc implementation
jemalloc enabled
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]    # whether to enable Google Cloud Platform support
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N]    # whether to enable HDFS support
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N]    # whether to enable the still-experimental XLA JIT compiler
No XLA support will be enabled for TensorFlow
Do you wish to build TensorFlow with VERBS support? [y/N]    # whether to enable VERBS support
No VERBS support will be enabled for TensorFlow
Do you wish to build TensorFlow with OpenCL support? [y/N]    # whether to enable OpenCL support
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] Y    # whether to enable CUDA support
CUDA support will be enabled for TensorFlow
Do you want to use clang as CUDA compiler? [y/N]    # whether to use clang as the CUDA compiler
nvcc will be used as CUDA compiler
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.0    # choose the CUDA version
Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:    # CUDA installation path
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:    # host-side compiler
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7    # cuDNN version
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:    # cuDNN installation path
Please specify a list of comma-separated CUDA compute capabilities you want to build with.     # compute capability of the GPUs on this machine
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.

Do you wish to build TensorFlow with MPI support? [y/N]    # whether to enable MPI support
MPI support will not be enabled for TensorFlow
Configuration finished

Let us first look at what configure actually does:

#!/usr/bin/env bash

set -e
set -o pipefail

if [ -z "$PYTHON_BIN_PATH" ]; then
  PYTHON_BIN_PATH=$(which python || which python3 || true)
fi

# Set all env variables
CONFIGURE_DIR=$(dirname "$0")
"$PYTHON_BIN_PATH" "${CONFIGURE_DIR}/configure.py" "$@"    # this line shows that configure does its work by calling the accompanying configure.py

echo "Configuration finished"
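
For readers less familiar with bash, the interpreter lookup in the wrapper above can be sketched in Python. This is only an illustration under stated assumptions: the function name resolve_python_bin is hypothetical and is not part of TensorFlow.

```python
import shutil


def resolve_python_bin(environ):
    """Mimic the wrapper: prefer PYTHON_BIN_PATH, else search PATH.

    Hypothetical helper for illustration; not part of TensorFlow.
    """
    configured = environ.get("PYTHON_BIN_PATH", "")
    if configured:
        return configured
    # Same order as `which python || which python3 || true`;
    # yields "" when neither interpreter is found on PATH.
    return shutil.which("python") or shutil.which("python3") or ""
```

An explicitly set PYTHON_BIN_PATH always wins, which is why exporting it before running ./configure changes which interpreter runs configure.py.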

Starting at line 1491 of configure.py, we find the configuration steps that were displayed in the run above:

  set_build_var(environ_cp, 'TF_NEED_JEMALLOC', 'jemalloc as malloc',
                'with_jemalloc', True)
  set_build_var(environ_cp, 'TF_NEED_GCP', 'Google Cloud Platform',
                'with_gcp_support', True, 'gcp')
  set_build_var(environ_cp, 'TF_NEED_HDFS', 'Hadoop File System',
                'with_hdfs_support', True, 'hdfs')
  set_build_var(environ_cp, 'TF_NEED_AWS', 'Amazon AWS Platform',
                'with_aws_support', True, 'aws')
  set_build_var(environ_cp, 'TF_NEED_KAFKA', 'Apache Kafka Platform',
                'with_kafka_support', True, 'kafka')
  set_build_var(environ_cp, 'TF_ENABLE_XLA', 'XLA JIT', 'with_xla_support',
                False, 'xla')
  set_build_var(environ_cp, 'TF_NEED_GDR', 'GDR', 'with_gdr_support',
                False, 'gdr')
  set_build_var(environ_cp, 'TF_NEED_VERBS', 'VERBS', 'with_verbs_support',
                False, 'verbs')
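
Judging from the generated .tf_configure.bazelrc listed later in this article, each set_build_var call seems to produce one line: an unconditional build --define line when the feature is enabled, or a build:<config> --define line when it is disabled and a config name was supplied. The following is a minimal sketch of that mapping under this assumption. The name bazelrc_line_for is hypothetical, and the real function in configure.py also prompts the user and handles environment variables, which is omitted here.

```python
def bazelrc_line_for(option_name, enabled, bazel_config_name=None):
    """Return the bazelrc line for one feature, or None if nothing is written.

    Hypothetical simplification of set_build_var: no prompting and no
    environment handling.
    """
    if enabled:
        # Always on: applies to every `bazel build`.
        return "build --define %s=true" % option_name
    if bazel_config_name is not None:
        # Opt-in: only applies when `--config=<bazel_config_name>` is passed.
        return "build:%s --define %s=true" % (bazel_config_name, option_name)
    return None


print(bazelrc_line_for("with_jemalloc", True))
print(bazelrc_line_for("with_hdfs_support", False, "hdfs"))
print(bazelrc_line_for("with_hdfs_support", True, "hdfs"))
```

Under this reading, answering N does not discard a feature entirely: it stays available behind a --config name.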

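So the configuration process can be understood simply as collecting various parameters. At the end, three files have updated timestamps (that is, they were created or modified).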

Among them, .bazelrc contains the following:

import /mnt/d/tensorflow/tensorflow-master/.tf_configure.bazelrc

That is, it imports the newly generated file .tf_configure.bazelrc in the current directory, and that file records the configuration:

build --action_env PYTHON_BIN_PATH="/home/shouhuxianjian/anaconda3/bin/python"
build --action_env PYTHON_LIB_PATH="/home/shouhuxianjian/anaconda3/lib/python3.6/site-packages"
build --python_path="/home/shouhuxianjian/anaconda3/bin/python"
build --define with_jemalloc=true
build:gcp --define with_gcp_support=true
build:hdfs --define with_hdfs_support=true
build:aws --define with_aws_support=true
build:kafka --define with_kafka_support=true
build:xla --define with_xla_support=true
build:gdr --define with_gdr_support=true
build:verbs --define with_verbs_support=true
build --action_env TF_NEED_OPENCL_SYCL="0"
build --action_env TF_NEED_CUDA="0"
build --action_env TF_DOWNLOAD_CLANG="0"
build --define grpc_no_ares=true
build:opt --copt=-march=native
build:opt --host_copt=-march=native
build:opt --define with_default_optimizations=true
build --strip=always

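Lines of the form build:hdfs take effect only when bazel is invoked as build --config=hdfs; see the bazel documentation for --config.
In the run above, N was chosen for hdfs, gcp, aws and kafka. If Y is chosen instead, those lines become: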

build --define with_gcp_support=true
build --define with_hdfs_support=true
build --define with_aws_support=true
build --define with_kafka_support=true

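These now have the same form as

build --define with_jemalloc=true

that is, they apply unconditionally. For bazel, options written in the build:name form are ignored unless --config=name is passed on the command line. Here name labels a group of options; it is not the BUILD package of the same name. For reference, the BUILD file of the hdfs package contains: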

# File: tensorflow-master/third_party/hadoop/BUILD
package(default_visibility = ["//visibility:public"])

licenses(["notice"])  # Apache 2.0

exports_files(["LICENSE.txt"])

cc_library(
    name = "hdfs",
    hdrs = ["hdfs.h"],
)

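So when bazel is actually invoked to build below, --config=opt must be passed explicitly so that bazel applies the options in the opt group instead of ignoring them (this relies on the command:name option-group feature of bazelrc files).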
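
To make the command:name grouping concrete, here is a toy model in Python of how bazelrc lines could be filtered by the active --config names. It is a sketch under simplifying assumptions, not bazel's actual implementation: it ignores import lines, command inheritance, quoting, and option precedence, and the function name effective_options is hypothetical.

```python
def effective_options(bazelrc_lines, command, configs):
    """Collect options that apply to `command` given active --config names.

    Toy model only: real bazel also handles `import`, command
    inheritance, quoting, and option precedence.
    """
    active = set(configs)
    options = []
    for line in bazelrc_lines:
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue  # skip blank lines and comments
        head, args = parts[0], parts[1:]
        name, _, config = head.partition(":")
        if name != command:
            continue
        # Lines without a config name always apply; named groups
        # apply only when selected with --config=<name>.
        if config == "" or config in active:
            options.extend(args)
    return options


RC = [
    "build --define with_jemalloc=true",
    "build:hdfs --define with_hdfs_support=true",
    "build:opt --copt=-march=native",
]

print(effective_options(RC, "build", []))
print(effective_options(RC, "build", ["opt"]))
```

With no --config, only the jemalloc define survives; adding opt pulls in --copt=-march=native, while the hdfs line remains inactive.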

1.2 Then build with bazel

To build with CPU support only, run:

$ bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package

To build with GPU support, run:

$ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

When reading tensorflow-master/tensorflow/tools/pip_package/BUILD, it helps to review the bazel build function reference and the officially recommended BUILD file layout (File structure), which is as follows:

Package description (a comment)

All load() statements

The package() function.

Calls to rules and macros

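In the tensorflow/tools/pip_package/BUILD file below you can see package, load, transitive_hdrs, py_binary (which produces Python binaries), the internal variable COMMON_PIP_DEPS, filegroup, sh_binary (which produces a shell binary), genrule, and so on.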

# Description:
#  Tools for building the TensorFlow pip package.
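#  Prototype: package(default_deprecation, default_testonly, default_visibility, features)
#  This function declares metadata that applies to every subsequent rule in the package. It may be used at most once within a package (BUILD file).
#  The call to package() should appear near the top of the file, after all load() statements and before any rule.
#  [package](https://docs.bazel.build/versions/master/be/functions.html#package)
# private means that, by default, subsequent rules are visible only within the current package https://docs.bazel.build/versions/master/be/common-definitions.html#common-attributes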
package(default_visibility = ["//visibility:private"])    

# Bazel extensions are files ending in .bzl. A load statement imports symbols from an extension file so that they can be used in the current BUILD file.
# [load](https://docs.bazel.build/versions/master/skylark/concepts.html)
load(
    "//tensorflow:tensorflow.bzl",
    "if_not_windows",
    "if_windows",
    "transitive_hdrs",
)
load("//third_party/mkl:build_defs.bzl", "if_mkl")
load("//tensorflow:tensorflow.bzl", "if_cuda")
load("@local_config_tensorrt//:build_defs.bzl", "if_tensorrt")
load("//tensorflow/core:platform/default/build_config_root.bzl", "tf_additional_license_deps")

# This returns a list of headers of all public header libraries (e.g.,
# framework, lib), and all of the transitive dependencies of those
# public headers.  Not all of the headers returned by the filegroup
# are public (e.g., internal headers that are included by public
# headers), but the internal headers need to be packaged in the
# pip_package for the public headers to be properly included.
#
# Public headers are therefore defined by those that are both:
#
# 1) "publicly visible" as defined by bazel
# 2) Have documentation.
#
# This matches the policy of "public" for our python API.
transitive_hdrs(
    name = "included_headers",
    deps = [
        "//tensorflow/core:core_cpu",
        "//tensorflow/core:framework",
        "//tensorflow/core:lib",
        "//tensorflow/core:protos_all_cc",
        "//tensorflow/core:stream_executor",
        "//third_party/eigen3",
    ] + if_cuda([
        "@local_config_cuda//cuda:cuda_headers",
    ]),
)

py_binary(
    name = "simple_console",
    srcs = ["simple_console.py"],
    srcs_version = "PY2AND3",
    deps = ["//tensorflow:tensorflow_py"],
)

COMMON_PIP_DEPS = [
    ":licenses",
    "MANIFEST.in",
    "README",
    "setup.py",
    ":included_headers",
    "//tensorflow:tensorflow_py",
    "//tensorflow/contrib/autograph:autograph",
    "//tensorflow/contrib/autograph/converters:converters",
    "//tensorflow/contrib/autograph/converters:test_lib",
    "//tensorflow/contrib/autograph/impl:impl",
    "//tensorflow/contrib/autograph/pyct:pyct",
    "//tensorflow/contrib/autograph/pyct/static_analysis:static_analysis",
    "//tensorflow/contrib/boosted_trees:boosted_trees_pip",
    "//tensorflow/contrib/cluster_resolver:cluster_resolver_pip",
    "//tensorflow/contrib/data/python/kernel_tests:dataset_serialization_test",
    "//tensorflow/contrib/data/python/ops:contrib_op_loader",
    "//tensorflow/contrib/eager/python/examples:examples_pip",
    "//tensorflow/contrib/eager/python:checkpointable_utils",
    "//tensorflow/contrib/eager/python:evaluator",
    "//tensorflow/contrib/gan:gan",
    "//tensorflow/contrib/graph_editor:graph_editor_pip",
    "//tensorflow/contrib/keras:keras",
    "//tensorflow/contrib/labeled_tensor:labeled_tensor_pip",
    "//tensorflow/contrib/nn:nn_py",
    "//tensorflow/contrib/predictor:predictor_pip",
    "//tensorflow/contrib/proto:proto_pip",
    "//tensorflow/contrib/receptive_field:receptive_field_pip",
    "//tensorflow/contrib/rpc:rpc_pip",
    "//tensorflow/contrib/session_bundle:session_bundle_pip",
    "//tensorflow/contrib/signal:signal_py",
    "//tensorflow/contrib/signal:test_util",
    "//tensorflow/contrib/slim:slim",
    "//tensorflow/contrib/slim/python/slim/data:data_pip",
    "//tensorflow/contrib/slim/python/slim/nets:nets_pip",
    "//tensorflow/contrib/specs:specs",
    "//tensorflow/contrib/summary:summary_test_util",
    "//tensorflow/contrib/tensor_forest:init_py",
    "//tensorflow/contrib/tensor_forest/hybrid:hybrid_pip",
    "//tensorflow/contrib/timeseries:timeseries_pip",
    "//tensorflow/contrib/tpu",
    "//tensorflow/examples/tutorials/mnist:package",
    "//tensorflow/python:distributed_framework_test_lib",
    "//tensorflow/python:meta_graph_testdata",
    "//tensorflow/python:spectral_ops_test_util",
    "//tensorflow/python:util_example_parser_configuration",
    "//tensorflow/python/debug:debug_pip",
    "//tensorflow/python/eager:eager_pip",
    "//tensorflow/python/kernel_tests/testdata:self_adjoint_eig_op_test_files",
    "//tensorflow/python/saved_model:saved_model",
    "//tensorflow/python/tools:tools_pip",
    "//tensorflow/python:test_ops",
    "//tensorflow/tools/dist_test/server:grpc_tensorflow_server",
]

# On Windows, python binary is a zip file of runfiles tree.
# Add everything to its data dependency for generating a runfiles tree
# for building the pip package on Windows.
py_binary(
    name = "simple_console_for_windows",
    srcs = ["simple_console_for_windows.py"],
    data = COMMON_PIP_DEPS,
    srcs_version = "PY2AND3",
    deps = ["//tensorflow:tensorflow_py"],
)

filegroup(
    name = "licenses",
    data = [
        "//third_party/eigen3:LICENSE",
        "//third_party/fft2d:LICENSE",
        "//third_party/hadoop:LICENSE.txt",
        "@absl_py//absl/flags:LICENSE",
        "@arm_neon_2_x86_sse//:LICENSE",
        "@astor_archive//:LICENSE",
        "@aws//:LICENSE",
        "@boringssl//:LICENSE",
        "@com_google_absl//:LICENSE",
        "@com_googlesource_code_re2//:LICENSE",
        "@cub_archive//:LICENSE.TXT",
        "@curl//:COPYING",
        "@eigen_archive//:COPYING.MPL2",
        "@farmhash_archive//:COPYING",
        "@fft2d//:fft/readme.txt",
        "@flatbuffers//:LICENSE.txt",
        "@gast_archive//:PKG-INFO",
        "@gemmlowp//:LICENSE",
        "@gif_archive//:COPYING",
        "@grpc//:LICENSE",
        "@highwayhash//:LICENSE",
        "@jemalloc//:COPYING",
        "@jpeg//:LICENSE.md",
        "@kafka//:LICENSE",
        "@libxsmm_archive//:LICENSE",
        "@lmdb//:LICENSE",
        "@local_config_nccl//:LICENSE",
        "@local_config_sycl//sycl:LICENSE.text",
        "@grpc//third_party/nanopb:LICENSE.txt",
        "@grpc//third_party/address_sorting:LICENSE",
        "@nasm//:LICENSE",
        "@nsync//:LICENSE",
        "@pcre//:LICENCE",
        "@png_archive//:LICENSE",
        "@protobuf_archive//:LICENSE",
        "@six_archive//:LICENSE",
        "@snappy//:COPYING",
        "@swig//:LICENSE",
        "@termcolor_archive//:COPYING.txt",
        "@zlib_archive//:zlib.h",
        "@org_python_pypi_backports_weakref//:LICENSE",
    ] + if_mkl([
        "//third_party/mkl:LICENSE",
    ]) + tf_additional_license_deps(),
)

# The shell binary rule; it makes use of select
# [select](https://docs.bazel.build/versions/master/skylark/lib/globals.html#select)
# [select](https://docs.bazel.build/versions/master/be/functions.html#select)
sh_binary(
    name = "build_pip_package",
    srcs = ["build_pip_package.sh"],
    data = select({
        "//tensorflow:windows": [":simple_console_for_windows"],
        "//tensorflow:windows_msvc": [":simple_console_for_windows"],
        "//conditions:default": COMMON_PIP_DEPS + [
            ":simple_console",
            "//tensorflow/contrib/lite/python:interpreter_test_data",
            "//tensorflow/contrib/lite/python:tf_lite_py_pip",
            "//tensorflow/contrib/lite/toco:toco",
            "//tensorflow/contrib/lite/toco/python:toco_wrapper",
            "//tensorflow/contrib/lite/toco/python:toco_from_protos",
        ],
    }) + if_mkl(["//third_party/mkl:intel_binary_blob"]) + if_tensorrt([
        "//tensorflow/contrib/tensorrt:init_py",
    ]),
)

# A genrule for generating a marker file for the pip package on Windows
#
# This only works on Windows, because :simple_console_for_windows is a
# python zip file containing everything we need for building the pip package.
# However, on other platforms, due to https://github.com/bazelbuild/bazel/issues/4223,
# when C++ extensions change, this generule doesn't rebuild.
genrule(
    name = "win_pip_package_marker",
    srcs = if_windows([
        ":build_pip_package",
        ":simple_console_for_windows",
    ]),
    outs = ["win_pip_package_marker_file"],
    cmd = select({
        "//conditions:default": "touch $@",
        "//tensorflow:windows": "md5sum $(locations :build_pip_package) $(locations :simple_console_for_windows) > $@",
    }),
    visibility = ["//visibility:public"],
)

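Because the build command explicitly targets build_pip_package, it corresponds to the sh_binary rule in the file above. The main role of sh_binary here is to cause the targets listed in data to be built.
The part to focus on is therefore the data dependencies.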
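
Since the data attribute of build_pip_package is computed with select, a toy Python model may help show how one branch ends up being used. This is a sketch, not bazel's implementation: it assumes at most one non-default condition matches (bazel has further rules for multiple matches), the function name resolve_select is hypothetical, and the dependency list is shortened for illustration.

```python
def resolve_select(branches, matching_conditions):
    """Pick the value of a select()-style dict.

    Toy model: assumes at most one non-default condition matches;
    real bazel has additional rules for multiple matches.
    """
    default_key = "//conditions:default"
    for condition, value in branches.items():
        if condition != default_key and condition in matching_conditions:
            return value
    if default_key in branches:
        return branches[default_key]
    raise ValueError("no matching condition and no default")


COMMON_PIP_DEPS = [":licenses", "setup.py"]  # shortened for illustration

data = resolve_select(
    {
        "//tensorflow:windows": [":simple_console_for_windows"],
        "//conditions:default": COMMON_PIP_DEPS + [":simple_console"],
    },
    matching_conditions=set(),  # e.g. a non-Windows build
)
print(data)
```

On a non-Windows build the default branch is chosen, so COMMON_PIP_DEPS plus :simple_console become the data dependencies that bazel has to build.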

References: