Using queues in TensorFlow

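TensorFlow provides a queue mechanism that uses multiple threads to separate data reading from computation. When training on a very large dataset, the whole dataset cannot be loaded into memory at once, so the data has to be read from disk while the training computation is running.

The code that builds the queue and reads the files is already implemented in cifar10_input.py. This post explains the internal mechanism and how to use it.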

I. How the queue starts and why it blocks

import cifar10_input
import tensorflow as tf
import pylab

# Fetch the data
batch_size = 12
data_dir = '/tmp/cifar10_data/cifar-10-batches-bin'
images_test, labels_test = cifar10_input.inputs(eval_data=True, data_dir=data_dir, batch_size=batch_size)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

# With the line below commented out (as it is here), the program stops
# making progress and hangs.
# tf.train.start_queue_runners starts the input-pipeline threads, which fill
# the queue with examples so that the dequeue operation can retrieve them.
#tf.train.start_queue_runners()

# The hang is caused by the next statement, which fetches one batch of data.
# Because the queue is empty, the call blocks and waits.
image_batch, label_batch = sess.run([images_test, labels_test])
print("__\n", image_batch[0])
print("__\n", label_batch[0])
pylab.imshow(image_batch[0])
pylab.show()
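
The blocking behavior described above can be reproduced without TensorFlow. The following is only an analogy built on Python's standard queue and threading modules, not TensorFlow's own queue implementation. The names fill, q and producer are hypothetical and chosen for illustration. A consumer that reads from an empty queue waits until some producer thread is started, and starting the producer plays the role of tf.train.start_queue_runners.

```python
import queue
import threading


def fill(q, n_items):
    """Producer: stands in for a queue-runner thread that enqueues examples."""
    for i in range(n_items):
        q.put(i)


q = queue.Queue(maxsize=4)

# No producer is running yet. A plain q.get() would wait forever, so a
# timeout is used here to observe the "hang" without freezing the script.
try:
    q.get(timeout=0.2)
    got_item_before_start = True
except queue.Empty:
    got_item_before_start = False

# Start the producer thread (the analogue of tf.train.start_queue_runners()).
producer = threading.Thread(target=fill, args=(q, 3), daemon=True)
producer.start()

# Now the same read succeeds, because the queue is being filled.
first = q.get(timeout=5)
producer.join()

print(got_item_before_start, first)  # prints: False 0
```

In the TensorFlow listing, sess.run has no timeout, which is why the program appears frozen rather than raising an error.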

II. Shutdown behavior inside a session

Modify the code as follows:

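import cifar10_input
import tensorflow as tf
import pylab

# Same input setup as in the first listing
batch_size = 12
data_dir = '/tmp/cifar10_data/cifar-10-batches-bin'
images_test, labels_test = cifar10_input.inputs(eval_data=True, data_dir=data_dir, batch_size=batch_size)

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    tf.train.start_queue_runners()
    image_batch, label_batch = sess.run([images_test, labels_test])
    print("__\n", image_batch[0])
    print("__\n", label_batch[0])
    pylab.imshow(image_batch[0])
    pylab.show()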

Run the program again. It now runs normally, but errors are reported when it finishes, with output like the following:

ERROR:tensorflow:Exception in QueueRunner: Enqueue operation was cancelled

[[Node: batch/fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/fifo_queue, div, Cast)]]

ERROR:tensorflow:Exception in QueueRunner: Enqueue operation was cancelled

[[Node: batch_1/fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch_1/fifo_queue, div_1, Cast_4)]]

ERROR:tensorflow:Exception in QueueRunner: Enqueue operation was cancelled

[[Node: batch/fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/fifo_queue, div, Cast)]]

......

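The reason is that a session created with the with statement is closed automatically. When the block finishes, the session closes and cancels every operation still running in it, while the queue-runner threads are still trying to enqueue data, so the enqueue operations fail with the errors above.

There are two ways to deal with this:

1 Create the session with tf.InteractiveSession(), as in the first listing: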

sess = tf.InteractiveSession()

2 Remove the with statement from the original code and change it to the following:

sess = tf.Session()
tf.global_variables_initializer().run(session=sess)
tf.train.start_queue_runners(sess=sess)
image_batch, label_batch = sess.run([images_test, labels_test])
print("__\n",image_batch[0])
print("__\n",label_batch[0])
pylab.imshow(image_batch[0])
pylab.show()

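The code above works fine in a single standalone program, because all resources are destroyed together when the process exits. In more complex code, however, you may need the threads to shut down on their own instead of relying on the process ending. In that case, use tf.train.Coordinator to create a coordinator, which sends a stop signal to the threads and synchronizes them on shutdown.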
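
As a rough illustration of the coordinator approach, here is a minimal sketch. It assumes a TensorFlow installation that exposes the tf.compat.v1 API. The toy FIFOQueue of random numbers is a hypothetical stand-in for the CIFAR-10 input pipeline and is not part of the original code. The session asks the coordinator to stop the queue-runner threads and waits for them before the with block closes it.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# Toy input pipeline (an assumption for illustration): a FIFO queue that a
# QueueRunner thread keeps filling with random scalars.
toy_queue = tf.FIFOQueue(capacity=10, dtypes=[tf.float32], shapes=[[]])
enqueue_op = toy_queue.enqueue(tf.random_uniform([]))
tf.train.add_queue_runner(tf.train.QueueRunner(toy_queue, [enqueue_op]))
dequeue_op = toy_queue.dequeue()

with tf.Session() as sess:
    # The coordinator carries the stop signal shared by all runner threads.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        values = [sess.run(dequeue_op) for _ in range(5)]
    finally:
        # Ask the threads to stop, then wait for them, while the session
        # is still open.
        coord.request_stop()
        coord.join(threads)

print(len(values))
```

Because the threads are joined inside the with block, no enqueue operation should still be pending when the session is closed.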

