Environment setup
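- CUDA 8.0 + TensorFlow 0.11 (r0.11) + cuDNN 5.1.5 + Bazel 0.3.2 + GCC 4.9
- If you can, use exactly the same versions as mine. Even small differences make the build likely to fail!
- For installing CUDA and cuDNN, see my other post http://blog.csdn.net/cq361106306/article/details/52450907, or find a guide yourself
- Bazel: http://bazel.io/docs/install.html
- TensorFlow r0.11 source: https://github.com/tensorflow/tensorflow/archive/r0.11.zip
- Unzip it, then enter the extracted directory: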
cd tensorflow-r0.11
./configure
Please specify the location of python. [Default is /usr/bin/python]:
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] N
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify which gcc nvcc should use as the host compiler. [Default is /usr/bin/gcc]:
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 8.0  # must match the version you installed
Please specify the location where CUDA 7.5 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the cuDNN version you want to use. [Leave empty to use system default]: 5.1.5  # must match the version you installed
Please specify the location where cuDNN 5 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 3.0  # enter the compute capability of your own GPU (see the link above)
Setting up Cuda include
Setting up Cuda lib
Setting up Cuda bin
Setting up Cuda nvvm
Setting up CUPTI include
Setting up CUPTI lib64
Configuration finished
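The versions you type into ./configure must match what is actually installed. The following is a minimal sketch for checking this. It makes two assumptions:

- `nvcc --version` prints a line like `release 8.0, V8.0.44`.
- cudnn.h defines the `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` macros.

Both function names are hypothetical helpers, not part of TensorFlow or CUDA.

```python
import re


def cuda_version_from_nvcc(nvcc_output):
    """Return 'major.minor' from `nvcc --version` output, or None if not found."""
    # Assumed format: "Cuda compilation tools, release 8.0, V8.0.44"
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def cudnn_version_from_header(header_text):
    """Return 'major.minor.patch' built from the version macros in cudnn.h, or None."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Matches lines such as "#define CUDNN_MAJOR      5"
        match = re.search(r"#define\s+%s\s+(\d+)" % macro, header_text)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)
```

Feed the first function the output of `nvcc --version`. Feed the second the contents of /usr/local/cuda/include/cudnn.h, which is the default location from the prompts above (adjust the path if yours differs). Then compare both results with what you entered at the prompts.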
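Note: it is better not to git clone the latest version directly. The framework is young and changes every day, so the latest code breaks easily. I myself hit an annoying case where one library kept failing to install.
This step takes a very long time; it is best to use a VPN to get around the Great Firewall.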
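4. If red ERROR messages appear, open the configure script in this directory (it is a plain text file) and find the line
bazel clean --expunge
Delete it, so that dependencies already downloaded are not wiped out on the next run.
Then run on the command line
bazel fetch //tensorflow/...
Rerun it until it finishes without errors, then run ./configure again.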
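The fetch-and-retry loop above can be automated. This is a hedged sketch: `run_until_success` is a hypothetical helper, and the default retry count and delay are arbitrary assumptions. The `runner` parameter only exists so the loop can be tried without Bazel installed.

```python
import subprocess
import time


def run_until_success(cmd, max_attempts=10, delay_seconds=5, runner=subprocess.run):
    """Re-run `cmd` until it exits with status 0 and return the number of attempts used.

    Raises RuntimeError if the command still fails after max_attempts tries.
    """
    for attempt in range(1, max_attempts + 1):
        result = runner(cmd)
        if result.returncode == 0:
            return attempt
        print("attempt %d failed with exit code %d" % (attempt, result.returncode))
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # give flaky downloads a moment before retrying
    raise RuntimeError("%s still failing after %d attempts" % (" ".join(cmd), max_attempts))
```

From the source root you would call `run_until_success(["bazel", "fetch", "//tensorflow/..."])` and then run ./configure again.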
5. Build with Bazel:
# To build with GPU support:
$ bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
# The name of the .whl file will depend on your platform.
$ sudo pip install /tmp/tensorflow_pkg/tensorflow-0.11.0rc1-py2-none-any.whl
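The wheel name depends on the platform, so you may prefer to look it up rather than type it. This is a small sketch that assumes build_pip_package left exactly one `tensorflow-*.whl` file in /tmp/tensorflow_pkg. `find_built_wheel` is a hypothetical name.

```python
import glob
import os


def find_built_wheel(pkg_dir="/tmp/tensorflow_pkg"):
    """Return the path of the single TensorFlow wheel that build_pip_package wrote into pkg_dir."""
    wheels = sorted(glob.glob(os.path.join(pkg_dir, "tensorflow-*.whl")))
    if len(wheels) != 1:
        raise RuntimeError("expected exactly one wheel in %s, found %d" % (pkg_dir, len(wheels)))
    return wheels[0]
```

The returned path is what goes after `sudo pip install`.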
Test:
cd tensorflow/models/image/mnist
python convolutional.py
Pitfalls I hit while compiling
ERROR: /home/y/tensorflow-r0.11/tensorflow/core/kernels/BUILD:1096:1: C++ compilation of rule '//tensorflow/core/kernels:svd_op' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command
..
gcc: internal compiler error: Killed (program cc1plus) (this means the machine ran out of memory)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-4.8/README.Bugs> for instructions.
Target //tensorflow/cc:tutorials_example_trainer failed to build
For the out-of-memory problem, see:
https://github.com/tensorflow/tensorflow/issues/349
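To tell this out-of-memory failure apart from other build errors, you can scan the Bazel output for the two messages shown in the log above. This is a sketch. Treating a killed cc1plus as out of memory is a heuristic (it is usually the kernel's OOM killer), and `diagnose_build_log` is a hypothetical helper.

```python
import re

# The two messages from the failing build log quoted above
OOM_MARKER = "internal compiler error: Killed (program cc1plus)"
RULE_PATTERN = re.compile(r"C\+\+ compilation of rule '([^']+)' failed")


def diagnose_build_log(log_text):
    """Return (failed C++ rules, whether cc1plus was killed, which usually means out of memory)."""
    failed_rules = RULE_PATTERN.findall(log_text)
    likely_oom = OOM_MARKER in log_text
    return failed_rules, likely_oom
```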
If the build fails with an unexplained ERROR, rebuild with the following command instead:
bazel build -c opt --config=cuda --spawn_strategy=standalone --verbose_failures --local_resources 2048,.5,1.0 //tensorflow/tools/pip_package:build_pip_package
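The spawn_… and local_resources flags here avoid some of these errors. --local_resources 2048,.5,1.0 tells Bazel to assume only 2048 MB of RAM, 0.5 CPU cores and 1.0 I/O capacity, so it runs fewer compile jobs at once. If this build succeeds, continue with the remaining steps above.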
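Instead of hard-coding 2048 MB, you could derive the RAM field of --local_resources (the first value, in MB) from /proc/meminfo. This is a hedged sketch. Using half of MemTotal is my own assumption, not a rule from Bazel, and `local_resources_flag` is a hypothetical helper.

```python
import re


def local_resources_flag(meminfo_text, ram_fraction=0.5, cpu_cores=0.5, io=1.0):
    """Build a '--local_resources RAM_MB,CPU,IO' string from the text of /proc/meminfo."""
    match = re.search(r"^MemTotal:\s+(\d+) kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found in meminfo text")
    total_mb = int(match.group(1)) // 1024  # /proc/meminfo reports kB
    ram_mb = int(total_mb * ram_fraction)   # assumption: leave the rest for the OS
    return "--local_resources %d,%s,%s" % (ram_mb, cpu_cores, io)
```

On Linux you would pass it `open("/proc/meminfo").read()`.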
If all goes well, the output should look like this. It may look stuck for a while; it is actually downloading the MNIST data:
y@y:~/tensorflow-r0.11$ cd tensorflow/models/image/mnist
y@y:~/tensorflow-r0.11/tensorflow/models/image/mnist$ python convolutional.py
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so.5.1.5 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so.8.0 locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.7715
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.41GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0)
Initialized!
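The "major: 6 minor: 1" line in this log is the GPU's compute capability: 6.1 for this GTX 1070. That is the kind of value the compute-capability prompt in ./configure asks for. This sketch pulls it out of the log. `compute_capability_from_log` is a hypothetical helper, and it only reads the first device listed.

```python
import re


def compute_capability_from_log(log_text):
    """Return e.g. '6.1' from the 'major: 6 minor: 1' device line TensorFlow logs, or None."""
    match = re.search(r"major:\s*(\d+)\s+minor:\s*(\d+)", log_text)
    return "%s.%s" % match.groups() if match else None
```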
Distributed demo
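As an aside, TensorFlow's distributed support only wraps the low-level communication. This lets us write distributed training almost the same way as single-machine training.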
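First, run ifconfig to find this machine's IP address starting with 192.168. We use it to simulate a distributed cluster with several processes on a single machine.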
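If you would rather not scan the ifconfig output by eye, this sketch extracts the 192.168.* addresses. It assumes the two common ifconfig formats:

- `inet addr:192.168.1.100` from older net-tools.
- `inet 192.168.1.100` from newer versions.

`lan_ips` is a hypothetical name.

```python
import re


def lan_ips(ifconfig_output, prefix="192.168."):
    """Return the IPv4 addresses in ifconfig output that start with `prefix`."""
    # "inet " with a trailing space skips the "inet6" lines
    addrs = re.findall(r"inet (?:addr:)?(\d+\.\d+\.\d+\.\d+)", ifconfig_output)
    return [a for a in addrs if a.startswith(prefix)]
```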
The whole demo is a single file, distribute.py:
#coding=utf-8
# distribute.py: fit y = 2x + 10 with between-graph replicated training (TensorFlow 0.11 API)
import numpy as np
import tensorflow as tf

# Define parameters
FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_float('learning_rate', 0.00003, 'Initial learning rate.')
tf.app.flags.DEFINE_integer('steps_to_validate', 1000,
                            'Steps to validate and print loss')

# For distributed
tf.app.flags.DEFINE_string("ps_hosts", "",
                           "Comma-separated list of hostname:port pairs")
tf.app.flags.DEFINE_string("worker_hosts", "",
                           "Comma-separated list of hostname:port pairs")
tf.app.flags.DEFINE_string("job_name", "", "One of 'ps', 'worker'")
tf.app.flags.DEFINE_integer("task_index", 0, "Index of task within the job")


def loss(label, pred):
    # Squared error between the label and the prediction
    return tf.square(label - pred)


def main(_):
    # Hyperparameters (read after tf.app.run() has parsed the command line)
    learning_rate = FLAGS.learning_rate
    steps_to_validate = FLAGS.steps_to_validate

    # Every process builds the same description of the whole cluster
    ps_hosts = FLAGS.ps_hosts.split(",")
    worker_hosts = FLAGS.worker_hosts.split(",")
    cluster = tf.train.ClusterSpec({"ps": ps_hosts, "worker": worker_hosts})
    # Start the server for this process's own job and task
    server = tf.train.Server(cluster, job_name=FLAGS.job_name, task_index=FLAGS.task_index)

    if FLAGS.job_name == "ps":
        # The parameter server only stores variables and serves requests; this blocks forever
        server.join()
    elif FLAGS.job_name == "worker":
        # Variables are placed on the ps job, all other ops on this worker
        with tf.device(tf.train.replica_device_setter(
                worker_device="/job:worker/task:%d" % FLAGS.task_index,
                cluster=cluster)):
            global_step = tf.Variable(0, name='global_step', trainable=False)

            input = tf.placeholder("float")
            label = tf.placeholder("float")

            # Model: pred = input * weight + biase
            weight = tf.get_variable("weight", [1], tf.float32, initializer=tf.random_normal_initializer())
            biase = tf.get_variable("biase", [1], tf.float32, initializer=tf.random_normal_initializer())
            pred = tf.mul(input, weight) + biase

            loss_value = loss(label, pred)

            train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss_value, global_step=global_step)
            init_op = tf.initialize_all_variables()

            saver = tf.train.Saver()
            tf.scalar_summary('cost', loss_value)
            summary_op = tf.merge_all_summaries()

        # Worker 0 is the chief: it initializes the variables and saves checkpoints
        sv = tf.train.Supervisor(is_chief=(FLAGS.task_index == 0),
                                 logdir="./checkpoint/",
                                 init_op=init_op,
                                 summary_op=None,
                                 saver=saver,
                                 global_step=global_step,
                                 save_model_secs=60)

        with sv.managed_session(server.target) as sess:
            step = 0
            while step < 1000000:
                # One noisy training sample from y = 2x + 10
                train_x = np.random.randn(1)
                train_y = 2 * train_x + np.random.randn(1) * 0.33 + 10
                _, loss_v, step = sess.run([train_op, loss_value, global_step],
                                           feed_dict={input: train_x, label: train_y})
                if step % steps_to_validate == 0:
                    w, b = sess.run([weight, biase])
                    print("step: %d, weight: %f, biase: %f, loss: %f" % (step, w, b, loss_v))
        sv.stop()


if __name__ == "__main__":
    tf.app.run()
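Without the distribution machinery, each worker runs plain stochastic gradient descent on y = 2x + 10 plus noise, using a squared-error loss. Below is a single-process, pure-Python sketch of that loop with no TensorFlow involved. The learning rate of 0.01 and the 5000 steps are my own assumptions. They are much larger than the script's 0.00003 so the sketch converges quickly.

```python
import random


def train_linear(steps=5000, learning_rate=0.01, seed=0):
    """Single-process SGD on y = 2x + 10 + noise, the same model distribute.py trains."""
    rng = random.Random(seed)
    weight, bias = rng.gauss(0, 1), rng.gauss(0, 1)  # random normal init, like the script
    for _ in range(steps):
        x = rng.gauss(0, 1)
        y = 2 * x + rng.gauss(0, 1) * 0.33 + 10
        error = y - (weight * x + bias)
        # d(error**2)/d(weight) = -2*error*x and d(error**2)/d(bias) = -2*error
        weight += learning_rate * 2 * error * x
        bias += learning_rate * 2 * error
    return weight, bias
```

`w, b = train_linear()` should land close to 2 and 10, the values the distributed workers are also moving towards.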
# On the ps node:
CUDA_VISIBLE_DEVICES='' python distribute.py --ps_hosts=192.168.1.100:2222 --worker_hosts=192.168.1.100:2224,192.168.1.100:2225 --job_name=ps --task_index=0
# On the worker nodes:
CUDA_VISIBLE_DEVICES=0 python distribute.py --ps_hosts=192.168.1.100:2222 --worker_hosts=192.168.1.100:2224,192.168.1.100:2225 --job_name=worker --task_index=0
CUDA_VISIBLE_DEVICES='' python distribute.py --ps_hosts=192.168.1.100:2222 --worker_hosts=192.168.1.100:2224,192.168.1.100:2225 --job_name=worker --task_index=1
Start them in this order. ps is the parameter server, and there can be more than one; the workers form the training cluster.
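The --ps_hosts and --worker_hosts flags become the job-to-addresses dictionary passed to tf.train.ClusterSpec. Each process then serves the entry selected by --job_name and --task_index. This is a pure-Python sketch of that mapping with hypothetical helper names; it does not call TensorFlow.

```python
def parse_cluster(ps_hosts, worker_hosts):
    """Turn the comma-separated flag values into the dict given to tf.train.ClusterSpec."""
    return {"ps": ps_hosts.split(","), "worker": worker_hosts.split(",")}


def task_address(cluster, job_name, task_index):
    """The host:port this process serves: entry task_index of its job's list."""
    return cluster[job_name][task_index]


def is_chief(job_name, task_index):
    """distribute.py makes worker 0 the chief via is_chief=(FLAGS.task_index == 0)."""
    return job_name == "worker" and task_index == 0
```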
CUDA_VISIBLE_DEVICES=0 exposes only GPU 0 to the process, so it runs on the GPU (CUDA).
CUDA_VISIBLE_DEVICES='' hides all GPUs, so the process runs on the CPU.
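To launch the processes from one script instead of several terminals, you could build each command's environment and argument list like this. This is a hedged sketch that assumes two things:

- distribute.py is in the current directory.
- `python` is the interpreter with TensorFlow installed.

`launch_command` is a hypothetical helper.

```python
import os


def launch_command(job_name, task_index, ps_hosts, worker_hosts, gpu=None):
    """Return (env, argv) for one distribute.py process.

    gpu=None sets CUDA_VISIBLE_DEVICES='' (CPU only); gpu=0 exposes only GPU 0.
    """
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = "" if gpu is None else str(gpu)
    argv = ["python", "distribute.py",
            "--ps_hosts=" + ",".join(ps_hosts),
            "--worker_hosts=" + ",".join(worker_hosts),
            "--job_name=" + job_name,
            "--task_index=%d" % task_index]
    return env, argv
```

Each (env, argv) pair can be passed to `subprocess.Popen(argv, env=env)`, starting the ps process first as noted above.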