Deep Learning from Entry to Giving Up, Part 2 --- Data (Queue Generation)

Disclaimer: This is an original article by the blogger, licensed under the CC 4.0 BY-SA agreement. Please include the original source link and this statement when reproducing it.
Original link: https://blog.csdn.net/LEEANG121/article/details/102535662



As mentioned in the previous article, data processing is one of the core tasks in deep learning. For a distributed framework like TensorFlow in particular, preparing the data is a very important part of the work, and it is the first step of any deep learning project.
This note introduces three different ways of handling data in TensorFlow: queue-based data input and reading, processing data in CSV format, and creating and reading TFRecords files. We will cover the three in turn to make them easier to understand.

The code in this article comes from Wang Xiaohua's book series on deep learning practice and computer vision.

TensorFlow queues
A queue (Queue) is one of the most common structures for data input and output. It provides a linear, first-in-first-out (FIFO) data structure, just like a line of people: one end is only responsible for adding elements to the queue, while the other end is responsible for outputting and removing them. The end where elements are added is usually called the tail, and the end where elements are output and removed is called the head.
As in Python, queues in TensorFlow serve as a basic mechanism for data input and output: new data is inserted at the tail, and data at the head is output and automatically removed. In TensorFlow, a queue is a stateful node in the computation graph: as other nodes change its state (by enqueuing or dequeuing), the contents of the queue node change accordingly.
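Since the text compares TensorFlow queues to Python ones, here is a minimal sketch of the same FIFO behaviour using Python's standard-library queue module (an analogy only, not TensorFlow code):

```python
import queue

q = queue.Queue(maxsize=5)   # a FIFO queue holding at most 5 elements
for x in [1.0, 2.0, 3.0]:
    q.put(x)                 # enqueue at the tail

print(q.get())    # dequeue from the head -> 1.0 (first in, first out)
print(q.get())    # -> 2.0
print(q.qsize())  # one element (3.0) remains -> 1
```

The element enqueued first is the one dequeued first, which is exactly the FIFO discipline the TensorFlow queue nodes below implement inside the computation graph.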
Commonly used TensorFlow queue functions:

Operation: Description
class tf.QueueBase: The base class for queue implementations. A queue is a data structure that stores tensors across steps and exposes enqueue and dequeue operations.
tf.enqueue(vals, name=None): Adds one element to the queue. If the queue is full when the operation executes, it blocks until the element has been enqueued.
tf.enqueue_many(vals, name=None): Adds zero or more elements to the queue.
tf.dequeue(name=None): Removes one element from the queue and returns it as a tuple of tensors. If the queue is empty when the operation executes, it blocks until an element is available.
tf.dequeue_many(n, name=None): Removes and concatenates n elements from the queue.
tf.dequeue_up_to(n, name=None): Removes and concatenates up to n elements from the queue.
tf.size(name=None): Computes the number of elements in the queue.
tf.close: Closes the queue.
tf.dtypes: Lists the data types of the queue elements.
tf.from_list(index, queues): Creates a queue that references queues[index].
tf.name: Returns the name of the underlying queue.
tf.names: Returns the names of each component of the queue.
class tf.FIFOQueue: A queue that dequeues elements in first-in-first-out order.
class tf.PaddingFIFOQueue: A FIFOQueue that additionally supports batching variable-sized tensors via padding.
class tf.RandomShuffleQueue: A queue that dequeues its elements in random order.

Example 1

import tensorflow as tf

with tf.Session() as sess:
   q = tf.FIFOQueue(5, 'float')  # create a FIFO queue holding 5 elements of type float
   init = q.enqueue_many(([1.0, 2.0, 3.0, 4.0, 5.0],))  # fill the queue; note the trailing comma cannot be omitted
   init2 = q.dequeue()  # remove the first element
   init3 = q.enqueue(3.5)  # append the number 3.5 at the tail

   # In TensorFlow every operation runs inside a session, so none of the
   # operations above have actually executed yet; they must be run as follows:
   sess.run(init)
   sess.run(init2)
   sess.run(init3)

   quelen = sess.run(q.size())  # size() returns the number of elements in the queue
   for i in range(quelen):
      print(sess.run(q.dequeue()))  # print the queue elements one by one
2.0
3.0
4.0
5.0
3.5

It is easy to see from the results that the dequeue operation removed the original first element of the queue, and that the element 3.5 was appended at the tail.
Example 1 shows how a queue works in TensorFlow, but reading input data this way is slow: imagine feeding a large amount of data one sess.run() call at a time; this approach does not scale.
TensorFlow therefore provides the QueueRunner function for asynchronous operation, which solves this problem: while the main thread trains the model, QueueRunner creates a set of helper threads under the same session to perform the enqueue operations, so data can be read from disk at the same time as the model is being trained.

Example 2

import tensorflow as tf
with tf.Session() as sess:
   # create a FIFO queue that can hold 1000 float32 elements
   q = tf.FIFOQueue(1000, 'float32')
   counter = tf.Variable(0.0)  # a variable initialized to 0.0 that changes as data is fed in
   add_op = tf.assign_add(counter, tf.constant(1.0))  # a self-increment op: counter += 1 each run
   enqueueData_op = q.enqueue(counter)  # enqueue the new value

   # Define the queue runner op: specify how many helper threads to create and
   # what each of them should do (the queue operations).
   # Here 4 threads are created: two increment the counter, two enqueue.
   qr = tf.train.QueueRunner(q, enqueue_ops = [add_op, enqueueData_op]*2)
   sess.run(tf.global_variables_initializer())  # initialize the variables
   enqueue_threads = qr.create_threads(sess, start = True)  # start the enqueue threads; start=True means start immediately

   for i in range(5):
       print(sess.run(q.dequeue()))  # the session closes after the loop (this is why the program raises an error)

The program above first creates the data-processing ops: add_op increments the counter variable. To execute these ops, qr creates a queue runner that calls multiple threads to carry out the task, and the create_threads function starts those threads.

6.0
24.0
44.0
54.0
60.0
E1013 17:36:15.699034 16972 queue_runner_impl.py:275] Exception in QueueRunner: Session has been closed.
E1013 17:36:15.705018 19704 queue_runner_impl.py:275] Exception in QueueRunner: Session has been closed.

We can see that the program runs normally for the first five loop iterations; the error afterwards occurs because the queue runner threads try to keep using the session after the loop ends, but the session has already been closed.
We can rewrite the code as follows:

Example 3

import tensorflow as tf
q = tf.FIFOQueue(1000, 'float32')
counter = tf.Variable(0.0)  # a variable initialized to 0.0 that changes as data is fed in
add_op = tf.assign_add(counter, tf.constant(1.0))  # a self-increment op: counter += 1 each run
enqueueData_op = q.enqueue(counter)  # enqueue the new value

sess = tf.Session()  # the session is created here instead of in a with-block, so the preceding ops are no longer tied to its scope
qr = tf.train.QueueRunner(q, enqueue_ops = [add_op, enqueueData_op]*2)
sess.run(tf.global_variables_initializer())  # initialize the variables
enqueue_threads = qr.create_threads(sess, start = True)

for i in range(5):
   print(sess.run(q.dequeue()))

If we run this version of the code, the system no longer raises an error. That is because the program does not end after the loop completes; it simply hangs.
Note: In TensorFlow, a hanging program usually means that data input is out of sync with preprocessing, i.e. the required data has not been fed into the data queue, so the whole thread waits. TensorFlow does not raise an error in this situation; it just stays in a waiting state.
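The hang described above is ordinary blocking behaviour: a dequeue on an empty queue simply waits for data that never arrives. A tiny standard-library sketch (an analogy, not TensorFlow code) shows the same effect, using a timeout so the wait becomes observable instead of infinite:

```python
import queue

q = queue.Queue()
try:
    # Like q.dequeue() on an empty TensorFlow queue, get() would block forever;
    # with a timeout it raises queue.Empty instead, so we can see what happened.
    q.get(timeout=0.1)
except queue.Empty:
    print('no data was enqueued, so the consumer just waits')
```

This is why the program above neither finishes nor errors: the consumer is parked inside a blocking wait.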

Example 4
As the examples above show, a TensorFlow session supports multithreading: several threads can conveniently work together under one session, executing in parallel. But the demos also reveal a problem with this arrangement: when one thread wants to close the session, the session is forced shut, and the unfinished work of the other threads is forcibly terminated as well.
To solve this multithreading problem, TensorFlow provides the Coordinator and QueueRunner classes for thread control and coordination. The two must be used together: the Coordinator can stop all of the session's threads together and report to the program that is waiting for all the worker threads to terminate.
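The Coordinator/QueueRunner pattern described above is essentially a shared stop flag that producer threads keep checking, plus a join on all of them at the end. A minimal sketch of that pattern with Python's standard library (an analogy only; the names stop_event and producer are made up here, not TensorFlow API):

```python
import itertools
import queue
import threading

q = queue.Queue()
stop_event = threading.Event()     # plays the role of coord.should_stop()
counter = itertools.count(1)

def producer():
    while not stop_event.is_set():  # keep enqueuing until asked to stop
        q.put(next(counter))

threads = [threading.Thread(target=producer) for _ in range(2)]
for t in threads:
    t.start()                       # like qr.create_threads(sess, start=True)

for _ in range(5):
    print(q.get())                  # the "main thread" consumes five elements

stop_event.set()                    # like coord.request_stop()
for t in threads:
    t.join()                        # like coord.join(enqueue_threads)
```

Because the producers check the stop flag themselves and the main thread joins them, shutdown is cooperative: no thread is killed mid-operation and no "session has been closed" style error occurs.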

import tensorflow as tf

#with tf.Session() as sess:  # annotation 1
q = tf.FIFOQueue(1000, 'float32')
counter = tf.Variable(0.0)
add_op = tf.assign_add(counter, tf.constant(1.0))
enqueueData_op = q.enqueue(counter)
sess = tf.Session()  # annotation 2: the session is created here

qr = tf.train.QueueRunner(q, enqueue_ops = [add_op, enqueueData_op] * 2)
sess.run(tf.global_variables_initializer())
enqueue_threads = qr.create_threads(sess, start = True)  # start the enqueue threads
coord = tf.train.Coordinator()  # annotation 3
enqueue_threads = qr.create_threads(sess, coord = coord, start = True)  # annotation 4

for i in range(5):
    print(sess.run(q.dequeue()))
coord.request_stop()  # annotation 5
coord.join(enqueue_threads)  # annotation 6

In the code above, QueueRunner is the queue manager and Coordinator is the thread coordinator. Two questions to think about:
1. If we uncomment annotation 1 (using the with-block) and comment out the session creation at annotation 2, keeping everything else the same, the result still shows a thread-closed error, but only after returning five elements. Why?
2. If we keep annotations 3, 5 and 6 unchanged but remove the coord = coord argument at annotation 4, what is the result now, and why?

Next time we will explain queue reading in detail and answer the two questions above. Discussion is welcome!
