Sesame HTTP: TensorFlow LSTM MNIST Classification

This section shows how to classify MNIST digits with an LSTM, a type of RNN. Compared with a CNN, an RNN may run more slowly, but it can use less memory.

initialization

First, we can initialize some variables, such as the learning rate, the number of node units, the number of RNN layers, etc.:

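import tensorflow as tf

learning_rate = 1e-3
num_units = 256             # hidden units per LSTM layer
num_layer = 3               # number of stacked LSTM layers
input_size = 28             # pixels per image row (features per time step)
time_step = 28              # rows per image (sequence length)
total_steps = 2000
category_num = 10           # digit classes 0-9
steps_per_validate = 100
steps_per_test = 500
# Fed at run time so that training and testing can use different values
batch_size = tf.placeholder(tf.int32, [])
keep_prob = tf.placeholder(tf.float32, [])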

Then load the MNIST dataset, which also provides the batch generator used during training:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)

Next, declare placeholders for the input data as usual: x holds the images and y_label holds the labels:

x = tf.placeholder(tf.float32, [None, 784])
y_label = tf.placeholder(tf.float32, [None, 10])

The shape of x is [None, 784]: None leaves the batch size unspecified, and 784 is the length of one flattened 28 × 28 image. Likewise, y_label has shape [None, 10], with one one-hot vector over the 10 classes per image.

Next, we need to reshape x, because an RNN consumes a sequence and each image must be split into multiple time steps. Here we set time_step to 28, so each time step is one row of the image and input_size is also 28 (28 × 28 = 784). The batch dimension is unchanged, so the result of the reshape is a three-dimensional tensor of shape [batch_size, time_step, input_size]:

x_shape = tf.reshape(x, [-1, time_step, input_size])
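
To make the effect of this reshape concrete, here is a small NumPy sketch (NumPy is used only for illustration and the pixel values are made up). It shows that each time step of the reshaped tensor is one 28-pixel row of the flattened image:

```python
import numpy as np

time_step, input_size = 28, 28

# Stand-in for a batch of 2 flattened MNIST images (values are arbitrary).
batch = np.arange(2 * 784, dtype=np.float32).reshape(2, 784)

# Same target shape as tf.reshape(x, [-1, time_step, input_size]).
seq = batch.reshape(-1, time_step, input_size)

print(seq.shape)  # (2, 28, 28)
# Time step t of image i is pixel row t of that image.
print(np.array_equal(seq[0, 3], batch[0, 3 * 28:4 * 28]))  # True
```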

RNN layer

Next, we need to build the RNN model. The cell used here is an LSTM cell (BasicLSTMCell). Because we want a three-layer RNN, we combine the cells with MultiRNNCell, whose input parameter is a list of cells.

So we can first declare a method for creating LSTMCell as follows:

def cell(num_units):
    # One LSTM layer, with dropout applied to its outputs
    lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=num_units)
    return tf.nn.rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=keep_prob)

Dropout is also added here to reduce overfitting during training.

Next we use it to build a multi-layer RNN:

cells = tf.nn.rnn_cell.MultiRNNCell([cell(num_units) for _ in range(num_layer)])

Note that a for loop is used here, so a new LSTM cell is created on every iteration. Expanding the list with multiplication instead would put the same cell object in every position. The first layer receives 28-dimensional inputs while the later layers receive 256-dimensional inputs, so one shared set of weights cannot fit all layers, and a dimension mismatch error occurs once the MultiRNNCell is built.
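
The difference between the two ways of building the list is plain Python behavior and can be checked without TensorFlow. In the sketch below, DummyCell is a hypothetical stand-in for a cell object, not a TensorFlow class:

```python
class DummyCell:
    """Hypothetical stand-in for an RNN cell object; not a TensorFlow class."""

    def __init__(self, num_units):
        self.num_units = num_units


# List multiplication repeats a reference to one and the same object.
shared = [DummyCell(256)] * 3
# A comprehension calls the constructor once per layer.
separate = [DummyCell(256) for _ in range(3)]

print(len({id(c) for c in shared}))    # 1
print(len({id(c) for c in separate}))  # 3
```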

Next we need to declare an initial state:

h0 = cells.zero_state(batch_size, dtype=tf.float32)

Then call the dynamic_rnn() method to complete the construction of the model:

output, hs = tf.nn.dynamic_rnn(cells, inputs=x_shape, initial_state=h0)

Here, inputs receives the reshaped x, and initial_state receives the initial state. dynamic_rnn() returns two results. The first, assigned to output, holds the outputs of all time steps. It is three-dimensional, with shape [batch_size, time_step, num_units]. The second, hs, is the final hidden state. It is a tuple whose length is the number of RNN layers (3 here), and each element contains c and h, the two states of an LSTM.

In this case, the final result can be taken from the last time_step of output:

output = output[:, -1, :]

Alternatively, take h from the last layer of the hidden state, which gives the same result:

h = hs[-1].h

In this model, the two are equivalent. Note, however, that in text processing the two may differ, because texts have different lengths and are padded.
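
The difference under padding can be illustrated with a toy example. The sketch below is not the LSTM from this article: it is a minimal tanh RNN written in NumPy, with made-up sizes and random weights, that imitates what dynamic_rnn does when a sequence_length argument is given (outputs past a sequence's length are zeroed, while the state is carried through unchanged):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, max_len, in_dim, hid = 2, 5, 3, 4      # illustrative sizes only
x = rng.normal(size=(batch, max_len, in_dim))
lengths = np.array([5, 3])                    # second sequence has 2 padded steps
w_x = rng.normal(size=(in_dim, hid))
w_h = rng.normal(size=(hid, hid))

h = np.zeros((batch, hid))
outputs = np.zeros((batch, max_len, hid))
for t in range(max_len):
    h_new = np.tanh(x[:, t] @ w_x + h @ w_h)
    active = (t < lengths)[:, None]           # True while inside the real sequence
    h = np.where(active, h_new, h)            # state is carried through padding
    outputs[:, t] = np.where(active, h_new, 0.0)  # padded outputs are zeroed

# Full-length sequence: last output equals the final state.
print(np.allclose(outputs[0, -1], h[0]))              # True
# Padded sequence: last output is zero padding, not the final state.
print(np.allclose(outputs[1, -1], h[1]))              # False
# The final state matches the output at the last real step instead.
print(np.allclose(outputs[1, lengths[1] - 1], h[1]))  # True
```

For MNIST every sequence has exactly 28 steps, so this case does not arise and output[:, -1, :] is safe.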

output layer

Next, we add a linear transformation and compute the Softmax output and the loss:

# Output Layer
w = tf.Variable(tf.truncated_normal([num_units, category_num], stddev=0.1), dtype=tf.float32)
b = tf.Variable(tf.constant(0.1, shape=[category_num]), dtype=tf.float32)
y = tf.matmul(output, w) + b
# Loss
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=y_label, logits=y)

The loss calls softmax_cross_entropy_with_logits directly, which applies Softmax to the logits y and then computes the cross entropy against y_label in a single step, so y is not passed through Softmax separately.
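
As a rough illustration of what this combined operation computes, here is a NumPy sketch with one loss value per example. The function name and the sample values are made up for this example, and this is not TensorFlow's implementation:

```python
import numpy as np


def softmax_cross_entropy(labels, logits):
    """Per-example cross entropy between one-hot labels and softmax(logits)."""
    # Subtracting the row maximum keeps exp() from overflowing.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)


logits = np.array([[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
loss = softmax_cross_entropy(labels, logits)
print(loss.shape)  # (2,)
```

For the second example all logits are equal, so the Softmax output is uniform and the loss is log(3).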

training and evaluation

Finally, define the training and evaluation process, printing the Train Accuracy every 100 steps and the Test Accuracy every 500 steps during training:

# Training op
train = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cross_entropy)

# Prediction
correction_prediction = tf.equal(tf.argmax(y, axis=1), tf.argmax(y_label, axis=1))
accuracy = tf.reduce_mean(tf.cast(correction_prediction, tf.float32))

# Training loop
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(total_steps + 1):
        batch_x, batch_y = mnist.train.next_batch(100)
        sess.run(train, feed_dict={x: batch_x, y_label: batch_y, keep_prob: 0.5, batch_size: batch_x.shape[0]})
        # Train Accuracy
        if step % steps_per_validate == 0:
            print('Train', step, sess.run(accuracy, feed_dict={x: batch_x, y_label: batch_y, keep_prob: 0.5,
                                                               batch_size: batch_x.shape[0]}))
        # Test Accuracy
        if step % steps_per_test == 0:
            test_x, test_y = mnist.test.images, mnist.test.labels
            print('Test', step,
                  sess.run(accuracy, feed_dict={x: test_x, y_label: test_y, keep_prob: 1, batch_size: test_x.shape[0]}))
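
The accuracy in the listing above compares the argmax of the logits with the argmax of the one-hot labels and averages the matches. The same computation can be sketched in NumPy with made-up values:

```python
import numpy as np

# Made-up logits for 4 examples and 3 classes, with one-hot labels.
logits = np.array([[0.1, 2.0, 0.3],
                   [1.5, 0.2, 0.1],
                   [0.0, 0.1, 3.0],
                   [0.9, 0.8, 0.1]])
labels = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0]])

# Predicted class vs. true class per example, then the fraction that match.
correct = np.argmax(logits, axis=1) == np.argmax(labels, axis=1)
accuracy = correct.astype(np.float32).mean()
print(accuracy)  # 0.75
```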

run

Running the script directly, the training accuracy reaches about 98% within a few hundred steps, and the test accuracy is about 96% at step 500:

Train 0 0.27
Test 0 0.2223
Train 100 0.87
Train 200 0.91
Train 300 0.94
Train 400 0.94
Train 500 0.99
Test 500 0.9595
Train 600 0.95
Train 700 0.97
Train 800 0.98

It can be seen that the LSTM is effective for the task of MNIST digit classification.
