Combining training data into batches in TensorFlow

TensorFlow provides the tf.train.batch and tf.train.shuffle_batch functions to organize single samples into batches:
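The snippets below assume that features was produced earlier by a file-queue pipeline reading TFRecord files whose records hold two integer features, 'i' and 'j'. A minimal sketch of such a pipeline (the filename pattern here is an assumption):

import tensorflow as tf

# Hypothetical upstream pipeline: collect the TFRecord files, queue their
# names, and parse one record at a time into the features used below.
files = tf.train.match_filenames_once("data.tfrecords-*")
filename_queue = tf.train.string_input_producer(files, shuffle=False)
reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)
features = tf.parse_single_example(
    serialized_example,
    features={
        'i': tf.FixedLenFeature([], tf.int64),
        'j': tf.FixedLenFeature([], tf.int64),
    })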

example, label = features['i'], features['j']

# Number of samples in a batch.
batch_size = 3

# Maximum number of samples the batching queue can hold. If the queue is too
# large it takes up a lot of memory; if it is too small, dequeue operations may
# block because no data is available, reducing training efficiency. As a rule
# of thumb, the capacity is tied to the batch size.
capacity = 1000 + 3 * batch_size

# tf.train.batch combines single samples into batches. [example, label] lists
# the tensors to combine; example and label are typically a training sample and
# the correct label for that sample. batch_size gives the number of samples per
# batch and capacity gives the maximum queue size. When the queue reaches
# capacity, TensorFlow pauses enqueuing and waits for dequeues; once the queue
# drops below capacity, enqueuing resumes automatically.
example_batch, label_batch = tf.train.batch(
    [example, label], batch_size=batch_size, capacity=capacity)

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    tf.local_variables_initializer().run()
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    for i in range(3):
        cur_example_batch, cur_label_batch = sess.run([example_batch, label_batch])
        # Print the combined samples. In a real problem, this batch would generally serve as the input to a neural network.
        print(cur_example_batch, cur_label_batch)
    coord.request_stop()
    coord.join(threads)

The result is as follows:

[0 0 1] [1 1 1]
[0 1 0] [0 0 0]
[0 0 1] [1 0 1]
Each line of output is one batch of 3 samples: the first array holds the examples and the second holds the corresponding labels.

The sample code for the tf.train.shuffle_batch function is as follows. Compared with tf.train.batch, it takes an extra min_after_dequeue parameter, which sets the minimum number of elements the queue must retain after each dequeue so that there are always enough elements left to shuffle:

example, label = features['i'], features['j']
example_batch, label_batch = tf.train.shuffle_batch(
    [example, label], batch_size=batch_size, capacity=capacity,
    min_after_dequeue=30)
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    tf.local_variables_initializer().run()
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    for i in range(3):
        cur_example_batch, cur_label_batch = sess.run([example_batch, label_batch])
        # Print the combined samples. In a real problem, this batch would generally serve as the input to a neural network.
        print(cur_example_batch, cur_label_batch)
    coord.request_stop()
    coord.join(threads)

The result is as follows:
[1 1 0] [0 1 0]
[0 1 1] [1 0 1]
[0 0 1] [1 0 0]
Notice that the order of the output samples has been shuffled.

The tf.train.batch and tf.train.shuffle_batch functions not only organize single training samples into input batches, they also provide a way to parallelize the input pipeline. Setting the num_threads parameter of tf.train.shuffle_batch lets multiple threads run the enqueue operation (that is, the data reading and preprocessing) at the same time. When the threads should process samples from different files, the tf.train.shuffle_batch_join function can be used instead, as the sketch below shows.
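A minimal sketch of both options, under the same queue setup as above (read_my_file is a hypothetical helper that builds a reader and parses one record from the shared filename queue):

# Several threads fill the same batching queue in parallel.
example_batch, label_batch = tf.train.shuffle_batch(
    [example, label], batch_size=batch_size, capacity=capacity,
    min_after_dequeue=30, num_threads=4)

# When different threads should read from different files, pass
# tf.train.shuffle_batch_join one [example, label] pair per reading pipeline.
example_list = [read_my_file(filename_queue) for _ in range(4)]  # read_my_file is hypothetical
example_batch, label_batch = tf.train.shuffle_batch_join(
    example_list, batch_size=batch_size, capacity=capacity,
    min_after_dequeue=30)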
