A NaN problem in TensorFlow

While working through the CIFAR-10 example code, I ran into the following error:
Traceback (most recent call last):
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: softmax_linear/weights/gradients
	 [[Node: softmax_linear/weights/gradients = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](softmax_linear/weights/gradients/tag, gradients/AddN/_211)]]
	 [[Node: GradientDescent/update/_232 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_511_GradientDescent/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/yangguang/machineLearning/learn_machineLearning/Tensorflow_learning/CIFAR10/cifar10_train.py", line 119, in <module>
    tf.app.run()
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "/home/yangguang/machineLearning/learn_machineLearning/Tensorflow_learning/CIFAR10/cifar10_train.py", line 115, in main
    train()
  File "/home/yangguang/machineLearning/learn_machineLearning/Tensorflow_learning/CIFAR10/cifar10_train.py", line 107, in train
    mon_sess.run(train_op)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 567, in run
    run_metadata=run_metadata)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1043, in run
    run_metadata=run_metadata)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1134, in run
    raise six.reraise(*original_exc_info)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1119, in run
    return self._sess.run(*args, **kwargs)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1191, in run
    run_metadata=run_metadata)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 971, in run
    return self._sess.run(*args, **kwargs)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: softmax_linear/weights/gradients
	 [[Node: softmax_linear/weights/gradients = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](softmax_linear/weights/gradients/tag, gradients/AddN/_211)]]
	 [[Node: GradientDescent/update/_232 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_511_GradientDescent/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

Caused by op 'softmax_linear/weights/gradients', defined at:
  File "/home/yangguang/machineLearning/learn_machineLearning/Tensorflow_learning/CIFAR10/cifar10_train.py", line 119, in <module>
    tf.app.run()
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "/home/yangguang/machineLearning/learn_machineLearning/Tensorflow_learning/CIFAR10/cifar10_train.py", line 115, in main
    train()
  File "/home/yangguang/machineLearning/learn_machineLearning/Tensorflow_learning/CIFAR10/cifar10_train.py", line 71, in train
    train_op = cifar10.train(loss, global_step)
  File "/home/yangguang/machineLearning/learn_machineLearning/Tensorflow_learning/CIFAR10/cifar10.py", line 343, in train
    tf.summary.histogram(var.op.name + '/gradients', grad)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/summary/summary.py", line 203, in histogram
    tag=tag, values=values, name=scope)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/ops/gen_logging_ops.py", line 283, in histogram_summary
    "HistogramSummary", tag=tag, values=values, name=name)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
    op_def=op_def)
  File "/home/yangguang/machineLearning/venv4ML/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Nan in summary histogram for: softmax_linear/weights/gradients
	 [[Node: softmax_linear/weights/gradients = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](softmax_linear/weights/gradients/tag, gradients/AddN/_211)]]
	 [[Node: GradientDescent/update/_232 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_511_GradientDescent/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

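It took me a long time to find out that the cause was something really silly. I had downloaded the CIFAR-10 dataset from the web myself, and after extraction the data files were named data_batch_1 and so on (binary files, but without an extension). Since the code would not run otherwise, I removed the .bin suffix from the file names in the code, and that is when the error above appeared. Later I deleted the dataset I had downloaded and let the code download and extract it by itself. This time the files were named data_batch_1.bin (with the extension), and when I ran the train script again it worked. I spent more than three hours hunting down this problem and it nearly drove me mad...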
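A plausible explanation, which the post itself does not confirm: the extension-less data_batch_1 files match the naming of the CIFAR-10 Python distribution (pickled arrays, extracted to cifar-10-batches-py). The tutorial's cifar10_input.py instead expects the binary distribution (data_batch_1.bin, extracted to cifar-10-batches-bin) and reads each file as fixed-length records of 1 label byte plus 32x32x3 = 3072 image bytes. Read that way, a file with a different layout yields "labels" anywhere in 0..255. According to its documentation, tf.nn.sparse_softmax_cross_entropy_with_logits (which the stock tutorial's loss uses) raises an error for out-of-range labels on CPU but returns NaN for those rows on GPU. That fits the GPU device in the traceback and would explain why the NaN first surfaces in the gradient histogram summary. The sketch below is standard-library Python only. It checks a batch file against that record layout before training. The helper names and the cifar-10-batches-bin path are assumptions for illustration, not part of the tutorial.

```python
import os

# Record layout assumed from the CIFAR-10 binary distribution, which the
# tutorial reads as fixed-length records: 1 label byte, then a 32x32x3 uint8 image.
LABEL_BYTES = 1
IMAGE_BYTES = 32 * 32 * 3
RECORD_BYTES = LABEL_BYTES + IMAGE_BYTES  # 3073 bytes per example
NUM_CLASSES = 10


def find_layout_problems(data):
    """Return reasons why `data` does not look like a CIFAR-10 binary batch.

    An empty list means the size fits whole records and every label byte
    is in [0, NUM_CLASSES). Hypothetical helper, not part of TensorFlow.
    """
    problems = []
    if len(data) % RECORD_BYTES != 0:
        problems.append("size %d is not a multiple of %d bytes" % (len(data), RECORD_BYTES))
    for index in range(len(data) // RECORD_BYTES):
        label = data[index * RECORD_BYTES]  # indexing bytes yields an int in 0..255
        if label >= NUM_CLASSES:
            problems.append("record %d has label %d, outside [0, %d)" % (index, label, NUM_CLASSES))
            break  # one bad label is enough to show the layout is wrong
    return problems


def check_batch_file(path):
    """Read a batch file from disk and run the layout check on its raw bytes."""
    with open(path, "rb") as f:
        return find_layout_problems(f.read())


if __name__ == "__main__":
    # Assumed default extraction folder of the binary archive; adjust as needed.
    for i in range(1, 6):
        path = os.path.join("cifar-10-batches-bin", "data_batch_%d.bin" % i)
        if os.path.exists(path):
            print(path, check_batch_file(path) or "OK")
```

If the assumption above holds, running check_batch_file on one of the extension-less files should report a bad label almost immediately. The files the tutorial downloads for itself should return an empty list. Checking the label byte range is a cheap way to catch a wrong file layout before it turns into a NaN several ops later.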

Reposted from blog.csdn.net/ygfrancois/article/details/80404541