The data set used here is still CIFAR-10. I covered this data set in detail in an earlier article on classifying CIFAR with AlexNet. There, we downloaded the image data files directly and then used pickle to deserialize them and obtain the data. For details, see: Section 16, AlexNet Network Implementation of Convolutional Neural Networks (6)
As with MNIST, TensorFlow also provides code for downloading and importing the CIFAR data set. The difference is that since TensorFlow 1.0 the Models module has been split out of the main repository, and the code for downloading and importing CIFAR lives in models, so you first need to download it from TensorFlow's GitHub site.
1. Using queue in TensorFlow
TensorFlow provides a queue mechanism that uses multithreading to separate reading data from computing on it. When training on massive data sets, the whole data set cannot be loaded into memory at once, so it has to be read from the hard disk while training proceeds. To speed up training, we can use several threads to read the data while one thread consumes it.
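The idea of several reader threads feeding one consumer can be sketched with Python's standard library alone. This is only an analogy, not TensorFlow's API: `reader`, `run_pipeline`, and the sizes used are hypothetical names and values chosen for illustration, and the "records" are just integers standing in for data read from disk.

```python
import queue
import threading

def reader(worker_id, items, buffer):
    """Stand-in for a thread that reads records from disk."""
    for item in items:
        # put() blocks while the buffer is full, so readers cannot run far ahead
        buffer.put((worker_id, item))

def run_pipeline(num_readers=3, items_per_reader=5, capacity=4):
    # A bounded buffer, playing the role of a queue with a fixed capacity
    buffer = queue.Queue(maxsize=capacity)
    readers = [
        threading.Thread(target=reader, args=(i, range(items_per_reader), buffer))
        for i in range(num_readers)
    ]
    for t in readers:
        t.start()

    consumed = []
    # The single "training" thread consumes exactly as many items as were produced
    for _ in range(num_readers * items_per_reader):
        consumed.append(buffer.get())

    for t in readers:
        t.join()
    return consumed

consumed = run_pipeline()
print(len(consumed), "items consumed")
```

The order in which items from different readers arrive is not fixed, which is the price of letting the readers run concurrently.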
Let's briefly introduce the concepts and usage related to Queue in TensorFlow.
In fact, there are only three concepts:
- Queue : TF's implementation of a queue and caching mechanism
- QueueRunner : TF's encapsulation of the threads that operate on a Queue
- Coordinator : a tool in TF for coordinating the running of threads
Although they often appear together, these three things can be used separately in TensorFlow, so let's look at them separately.
1.Queue
Depending on the implementation, there are several concrete types, such as:
- tf.FIFOQueue : a queue that dequeues elements in the order they were enqueued
- tf.RandomShuffleQueue : a queue that dequeues elements in random order
- tf.PaddingFIFOQueue : a FIFO queue that supports batching variable-sized tensors by padding them to a common shape
- tf.PriorityQueue : a queue that dequeues elements in priority order
- ... ...
Apart from these differing properties, the queue types are created and used in basically the same way.
Constructor parameters:
tf.FIFOQueue(capacity, dtypes, shapes=None, names=None, shared_name=None, name="fifo_queue")
    import tensorflow as tf

    # Build the graph: a first-in-first-out queue plus init, dequeue, +1, and enqueue ops
    q = tf.FIFOQueue(3, "float")
    init = q.enqueue_many(([0.1, 0.2, 0.3],))
    x = q.dequeue()
    y = x + 1
    q_inc = q.enqueue([y])

    # Open a session. A session implies state retention: it holds the state of the tensors.
    with tf.Session() as sess:
        sess.run(init)
        for i in range(2):
            sess.run(q_inc)
        quelen = sess.run(q.size())
        for i in range(quelen):
            # Reuse the dequeue op x instead of adding a new op to the graph each time
            print(sess.run(x))
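To see what the queue should hold at the end, the same sequence of steps can be traced by hand with an ordinary Python deque. This is a plain-Python trace of the logic only, not TensorFlow code; each loop iteration stands for one run of q_inc (dequeue, add 1, enqueue).

```python
from collections import deque

# State of the queue after the initial enqueue_many
q = deque([0.1, 0.2, 0.3])

# Two runs of q_inc: take the oldest element, add 1, put it at the back
for _ in range(2):
    x = q.popleft()
    q.append(x + 1)

remaining = list(q)
print(remaining)
```

After the two steps the queue holds 0.3, 1.1, and 1.2, in that order, so those are the three values the final loop should print (up to float32 formatting in TensorFlow).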
2. QueueRunner
In the previous example, the enqueue operation was performed in the main thread, but within a Session multiple threads can run together. In a data-input scenario, enqueuing means reading the input from the hard disk and putting it into memory, which is slow. QueueRunner can create a series of new threads to perform the enqueue operations and let the main thread carry on consuming the data. When training a neural network, this makes training and data reading asynchronous: the main thread trains the network while other threads read data from the hard disk into memory.
    ''' Use of QueueRunner() '''
    import tensorflow as tf

    q = tf.FIFOQueue(10, "float")
    counter = tf.Variable(0.0)  # counter

    # Add one to the counter
    increment_op = tf.assign_add(counter, 1.0)
    # Add the counter value to the queue
    enqueue_op = q.enqueue(counter)

    # Create a QueueRunner that adds data to the queue from multiple threads.
    # Four threads are actually created here: two increment the counter and two enqueue.
    qr = tf.train.QueueRunner(q, enqueue_ops=[increment_op, enqueue_op] * 2)

    # Main thread
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # Start the enqueue threads
        enqueue_threads = qr.create_threads(sess, start=True)
        # Main thread: consume 10 items
        for i in range(10):
            print(sess.run(q.dequeue()))
The results are printed correctly, but an error is reported at the end: ERROR:tensorflow:Exception in QueueRunner: Session has been closed. When the loop finishes, the with block exits and the Session is closed automatically, much like returning from the main function, while the enqueue threads are still running and try to use the closed Session.
    ''' Use of QueueRunner() '''
    import tensorflow as tf

    q = tf.FIFOQueue(10, "float")
    counter = tf.Variable(0.0)  # counter

    # Add one to the counter
    increment_op = tf.assign_add(counter, 1.0)
    # Add the counter value to the queue
    enqueue_op = q.enqueue(counter)

    # Create a QueueRunner that adds data to the queue from multiple threads.
    # Four threads are actually created here: two increment the counter and two enqueue.
    qr = tf.train.QueueRunner(q, enqueue_ops=[increment_op, enqueue_op] * 2)

    '''
    # Main thread
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # Start the enqueue threads
        enqueue_threads = qr.create_threads(sess, start=True)
        # Main thread
        for i in range(10):
            print(sess.run(q.dequeue()))
    '''

    # Main thread, without the with block
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    # Start the enqueue threads
    enqueue_threads = qr.create_threads(sess, start=True)
    # Main thread: consume 10 items
    for i in range(0, 10):
        print(sess.run(q.dequeue()))
If you do not use with tf.Session, the Session is not closed automatically.
The output, however, is not the 1, 2, 3, 4 we might have expected. The underlying reason is that the threads incrementing the counter keep running in the background, while the enqueue threads run until the queue is full (its capacity is only 10) and then block. The main thread then starts consuming data, and once some of the data has been consumed the enqueue threads resume. Finally, the main thread stops after consuming 10 items, but the other threads keep running, so the program never exits.
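The effect can be imitated with plain Python threads. The sketch below is an analogy built on the standard library, not TensorFlow code, and it simplifies the setup to one incrementing thread and one enqueuing thread; `state`, `increment_loop`, and `enqueue_loop` are hypothetical names. Because the two threads are not synchronized with each other, the values that reach the queue can repeat or skip numbers.

```python
import queue
import threading
import time

state = {"counter": 0.0}          # shared counter, like the tf.Variable
stop = threading.Event()          # lets the main thread shut the workers down
q = queue.Queue(maxsize=10)       # bounded queue with capacity 10

def increment_loop():
    # Keeps adding one to the counter, independently of the enqueue thread
    while not stop.is_set():
        state["counter"] += 1.0
        time.sleep(0.0001)

def enqueue_loop():
    # Keeps enqueuing whatever value the counter has at that moment
    while not stop.is_set():
        try:
            q.put(state["counter"], timeout=0.05)
        except queue.Full:
            pass  # queue is full; check the stop flag and try again

threads = [threading.Thread(target=increment_loop),
           threading.Thread(target=enqueue_loop)]
for t in threads:
    t.start()

time.sleep(0.05)                  # give the enqueue thread time to fill the queue
values = [q.get() for _ in range(10)]

stop.set()                        # unlike the TF example, stop the workers explicitly
for t in threads:
    t.join()
print(values)
```

The exact values differ from run to run. The explicit stop flag is also what the TensorFlow version above is missing, which is why that program never exits.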
Experience: TensorFlow performs its computation on a graph, and a graph can only be driven if data is fed into it. If no data arrives, sess.run() cannot proceed, and TF will not report an error telling you that no data was sent. In fact, TF cannot report such an error, because its training process and its data-reading process are asynchronous: TF simply waits forever for the data to become ready. The symptom is a TF program that reports no error but makes no progress, as if it had hung.
    ''' Use of QueueRunner() '''
    import tensorflow as tf

    q = tf.FIFOQueue(10, "float")
    counter = tf.Variable(0.0)  # counter

    # Add one to the counter
    increment_op = tf.assign_add(counter, 1.0)
    # Add the counter value to the queue
    enqueue_op = q.enqueue(counter)

    # Create a QueueRunner that adds data to the queue from multiple threads.
    # Four threads would be created here: two increment the counter and two enqueue.
    qr = tf.train.QueueRunner(q, enqueue_ops=[increment_op, enqueue_op] * 2)

    # Main thread
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # The enqueue threads are deliberately not started:
        # enqueue_threads = qr.create_threads(sess, start=True)
        # Main thread: blocks on the first dequeue because the queue stays empty
        for i in range(10):
            print(sess.run(q.dequeue()))
In the code above, the line that starts the data-producing threads is commented out, so the program gets stuck at sess.run(q.dequeue()), waiting for data to arrive. QueueRunner is what starts the enqueue threads.
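The same kind of silent wait can be reproduced, and bounded, with Python's standard `queue` module. This is an analogy rather than TensorFlow code, and `try_dequeue` is a hypothetical helper: a `get()` on an empty queue with no producer would block forever, whereas passing a timeout turns the hang into something the program can detect.

```python
import queue

q = queue.Queue()   # no thread ever puts data into this queue

def try_dequeue(q, timeout_s):
    """Return (True, item), or (False, None) if no data arrives in time."""
    try:
        return True, q.get(timeout=timeout_s)
    except queue.Empty:
        return False, None

ok, item = try_dequeue(q, timeout_s=0.2)
print("got data" if ok else "timed out waiting for data")
```

TensorFlow 1.x sessions can, to my knowledge, be given a comparable limit through the operation_timeout_in_ms field of tf.ConfigProto, which is worth checking against the documentation of the version in use when debugging an input pipeline that appears to hang.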
3.Coordinator
A Coordinator is an object that keeps track of the running state of a group of threads. It is not tied to TensorFlow's Queue and can be used with plain Python threads. For example:
    ''' Coordinator '''
    import threading, time
    import tensorflow as tf

    # Worker thread function
    def loop(coord, thread_id):
        t = 0
        while not coord.should_stop():
            print(thread_id)
            time.sleep(1)
            t += 1
            # Only the first thread (thread_id 0) calls request_stop
            if t >= 2 and thread_id == 0:
                coord.request_stop()

    # Main thread
    coord = tf.train.Coordinator()
    # Create 10 threads using the Python API
    threads = [threading.Thread(target=loop, args=(coord, i)) for i in range(10)]
    # Start all threads and wait for them to finish
    for t in threads:
        t.start()
    coord.join(threads)
Running this program, you will find that all worker threads stop after two iterations, and the main thread waits for all of them to finish, after which the whole program ends. As long as any thread calls the Coordinator's request_stop method, every thread can detect this through the should_stop method and stop itself.
Using QueueRunner together with a Coordinator effectively encapsulates this check, so that when any thread raises an exception the whole program can still shut down cleanly. The main thread can also call the request_stop method directly to stop all worker threads.
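To make the mechanism concrete, here is a minimal sketch of a coordinator-like object built on threading.Event. `MiniCoordinator` and `worker` are hypothetical names, and this is a simplified stand-in rather than how tf.train.Coordinator is actually implemented; it only shows how one failing thread can cause every other thread to stop.

```python
import threading
import time

class MiniCoordinator:
    """Hypothetical, simplified stand-in for a thread coordinator."""

    def __init__(self):
        self._stop_event = threading.Event()
        self._lock = threading.Lock()
        self.error = None

    def should_stop(self):
        return self._stop_event.is_set()

    def request_stop(self, error=None):
        # Remember only the first reported error, then signal every thread
        with self._lock:
            if error is not None and self.error is None:
                self.error = error
        self._stop_event.set()

    def join(self, threads):
        for t in threads:
            t.join()

def worker(coord, worker_id):
    try:
        step = 0
        while not coord.should_stop():
            step += 1
            if worker_id == 0 and step == 3:
                raise RuntimeError("worker 0 failed")  # simulated failure
            time.sleep(0.01)
    except Exception as exc:
        # Report the failure so that all other workers stop as well
        coord.request_stop(exc)

coord = MiniCoordinator()
threads = [threading.Thread(target=worker, args=(coord, i)) for i in range(4)]
for t in threads:
    t.start()
coord.join(threads)
print("all workers stopped, first error:", coord.error)
```

Wrapping the body of each worker in try/except and reporting to the coordinator is the part that the QueueRunner and Coordinator combination takes care of for the enqueue threads.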
4.QueueRunner and Coordinator
There are two classic patterns for using a Queue in TensorFlow, and both combine QueueRunner with Coordinator.
The first is to create a QueueRunner explicitly and then call its create_threads method to start the threads, as in the following code:
    ''' Using QueueRunner together with Coordinator '''
    import numpy as np
    import tensorflow as tf

    # 1000 four-dimensional input vectors, drawn from a normal distribution
    # with mean 1 and standard deviation 10
    data = 10 * np.random.randn(1000, 4) + 1
    # 1000 random target values, each 0 or 1
    target = np.random.randint(0, 2, size=1000)

    # Create a queue; each item holds one input vector and its target value
    queue = tf.FIFOQueue(capacity=50, dtypes=[tf.float32, tf.int32], shapes=[[4], []])
    # Enqueue the data in bulk (this is an Operation)
    enqueue_op = queue.enqueue_many([data, target])
    # Dequeue one item (this defines Tensors)
    data_sample, label_sample = queue.dequeue()

    # Create a QueueRunner with 4 threads
    qr = tf.train.QueueRunner(queue, [enqueue_op] * 4)

    with tf.Session() as sess:
        # Create the Coordinator
        coord = tf.train.Coordinator()
        # Start the threads managed by the QueueRunner
        enqueue_threads = qr.create_threads(sess, coord=coord, start=True)
        # Main thread: consume 100 items
        for step in range(100):
            if coord.should_stop():
                break
            data_batch, label_batch = sess.run([data_sample, label_sample])
        # The main thread is done; stop all data-producing threads
        coord.request_stop()
        coord.join(enqueue_threads)
The second is to use the global start_queue_runners method to start the threads.
    ''' Using start_queue_runners together with Coordinator '''
    import tensorflow as tf

    # Read from several files at once. The Queue is created explicitly here,
    # and a QueueRunner is created implicitly.
    filename_queue = tf.train.string_input_producer(["data1.csv", "data2.csv"])
    reader = tf.TextLineReader(skip_header_lines=1)
    # TensorFlow's Reader objects can take a Queue directly as input
    key, value = reader.read(filename_queue)

    with tf.Session() as sess:
        coord = tf.train.Coordinator()
        # Start all queue threads in the computation graph
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        # Main thread: consume 100 lines
        for _ in range(100):
            line_key, line_value = sess.run([key, value])
        # The main thread is done; stop all data-reading threads
        coord.request_stop()
        coord.join(threads)
In this example, tf.train.string_input_producer implicitly adds a QueueRunner to the global graph (operations such as tf.train.shuffle_batch do the same).
Since no QueueRunner is returned explicitly for create_threads to be called on, the tf.train.start_queue_runners method is used to start all queue threads in the tf.GraphKeys.QUEUE_RUNNERS collection directly.
The two methods are equivalent in effect.
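The difference between the two patterns is mostly about who keeps track of the runners. The second pattern can be pictured as a global registry, sketched here in plain Python as an analogy only: `RUNNERS`, `input_producer`, and `start_runners` are hypothetical names, not TensorFlow functions. Each producer registers the function that fills its queue as a side effect, and a single call later starts a thread for every registered function.

```python
import queue
import threading

RUNNERS = []   # hypothetical stand-in for a global collection of queue runners

def input_producer(items, capacity=8):
    """Create a queue and implicitly register the function that fills it."""
    q = queue.Queue(maxsize=capacity)

    def fill():
        for item in items:
            q.put(item)

    RUNNERS.append(fill)   # the caller never sees this runner directly
    return q

def start_runners():
    """Start one thread per registered runner and return the threads."""
    threads = [threading.Thread(target=fn) for fn in RUNNERS]
    for t in threads:
        t.start()
    return threads

names = input_producer(["data1.csv", "data2.csv"])
numbers = input_producer(range(3))

threads = start_runners()
got_names = [names.get() for _ in range(2)]
got_numbers = [numbers.get() for _ in range(3)]
for t in threads:
    t.join()
print(got_names, got_numbers)
```

In the first pattern the caller would hold on to each fill function and start its thread by hand, which is more verbose but makes it explicit which threads exist.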