TensorFlow-Handwritten Digit Recognition (1)

This article uses TensorFlow to build the most basic fully connected network, and uses the MNIST data set to implement basic model training and testing.

MNIST data set

The MNIST data set contains 70,000 pictures of handwritten digits (white digits on a black background), of which 55,000 form the training set, 5,000 the validation set, and 10,000 the test set.


Each picture is 28×28 pixels; a pure black pixel has the value 0 and a pure white pixel has the value 1.
Each label is a one-dimensional array of length 10, where the value at each index gives the probability of the corresponding digit.

When the MNIST data set is fed to the neural network as input, each picture in the data set must first be reshaped into a one-dimensional array of length 784, which is fed to the network as its input features.

For example:

A handwritten digit picture becomes a one-dimensional array of length 784, such as [0. 0. 0. 0. 0.231 0.235 0.459 ... 0.219 0. 0. 0. 0.], which is fed to the neural network as input.
The label corresponding to the picture is [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]; the element at index 6 is 1, meaning the probability of the digit 6 is 100%, so the recognition result for this picture is 6.
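As a rough sketch (using NumPy; the array names img, flat, and label are illustrative, not part of the data set API), flattening a 28×28 image and building a one-hot label could look like this:

import numpy as np

# a hypothetical 28x28 grayscale image with pixel values in [0, 1]
img = np.random.rand(28, 28).astype(np.float32)
flat = img.reshape(784)               # length-784 input feature vector
label = np.zeros(10, dtype=np.float32)
label[6] = 1.0                        # one-hot label for the digit 6
print(flat.shape, label)              # (784,) [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]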

Use the read_data_sets() function in the input_data module to load the MNIST data set:

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("./data/", one_hot=True)

Output:


Extracting ./data/train-images-idx3-ubyte.gz
Extracting ./data/train-labels-idx1-ubyte.gz
Extracting ./data/t10k-images-idx3-ubyte.gz
Extracting ./data/t10k-labels-idx1-ubyte.gz

The read_data_sets() function takes two arguments: the first is the storage path of the data set, and the second specifies how the labels are encoded.
When the second argument is True, the labels are returned in one-hot form.
When read_data_sets() runs, it checks whether the data set already exists at the specified path. If not, it is downloaded automatically,
and the MNIST data is split into the training set train, the validation set validation, and the test set test.

MNIST data set structure

In TensorFlow, the number of samples in each subset can be obtained as follows:

① Return the number of samples in the training set

print("train data size:",mnist.train.num_examples)

② Return the number of validation samples in the validation set


print("validation data size:",mnist.validation.num_examples)

Output:

validation data size: 5000

③ Return the number of test samples in the test set

print("test data size:",mnist.test.num_examples)

Output:


test data size: 10000

Data set label

For example:
in the MNIST data set, to view the label of the 0th picture in the training set, use the following expression:


mnist.train.labels[0]

Output:


array([0., 0., 0., 0., 0., 0., 0., 1., 0., 0.])

MNIST data set picture pixel value

For example:
in the MNIST data set, to view the pixel values of the 0th picture in the training set, use the following expression:


mnist.train.images[0]

Output:

array([0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       ... (middle values omitted; too many to show)
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.34901962, 0.9843138 , 0.9450981 ,
       0.3372549 , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.01960784,
       0.8078432 , 0.96470594, 0.6156863 , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.01568628, 0.45882356, 0.27058825,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        ], dtype=float32)

Feed data into a neural network

For example:


BATCH_SIZE = 200
xs,ys = mnist.train.next_batch(BATCH_SIZE)
print("xs shape:",xs.shape)
print("ys shape:",ys.shape)

Output:


xs shape: (200, 784)
ys shape: (200, 10)

The mnist.train.next_batch() function takes the parameter BATCH_SIZE, which means that BATCH_SIZE samples are randomly drawn from the training set to be fed into the neural network, with the pixel values and labels of the samples assigned to xs and ys respectively.
In this example BATCH_SIZE is set to 200, so the pixel values and labels of 200 samples are assigned to xs and ys at a time; the shape of xs is therefore (200, 784), and the shape of the corresponding ys is (200, 10).

TensorFlow model building foundation

Common functions used to implement "MNIST handwritten digit recognition"

① The tf.get_collection("") function retrieves all the elements stored in the named collection and returns them as a list.
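A minimal sketch (the collection name 'losses' and the variable names are illustrative) showing how a tensor added with tf.add_to_collection() is later retrieved as a list:

import tensorflow as tf

v = tf.constant(3.0)
tf.add_to_collection('losses', v)          # put a tensor into the 'losses' collection
loss_list = tf.get_collection('losses')    # returns a list of everything stored in the collection
with tf.Session() as sess:
    print(sess.run(tf.add_n(loss_list)))   # 3.0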

② The tf.add() function adds the corresponding elements of its two arguments.
For example:


import tensorflow as tf
x=tf.constant([[1,2],[1,2]])
y=tf.constant([[1,1],[1,2]])
z=tf.add(x,y)
with tf.Session() as sess:
    print(sess.run(z))

Output:


[[2 3]
 [2 4]]

③ The tf.cast(x, dtype) function converts the parameter x to the specified data type. For example:


import numpy as np
A = tf.convert_to_tensor(np.array([[1,1,2,4], [3,4,8,5]]))
print(A.dtype)
b = tf.cast(A, tf.float32)
print(b.dtype)

Output:

<dtype: 'int32'>
<dtype: 'float32'>

It can be seen from the output result that the matrix A is changed from an integer type to a 32-bit floating point type.

④ The tf.equal() function compares two matrices or vectors element by element: it returns True where the corresponding elements are equal and False where they are not.

E.g:


A = [[1,3,4,5,6]]
B = [[1,3,4,3,2]]
with tf.Session() as sess:
    print(sess.run(tf.equal(A, B)))

Output:


[[ True  True  True False False]]

In matrices A and B, the first, second, and third elements are equal, and the fourth and fifth elements are not equal. Therefore, in the output, the first, second, and third elements are True, and the fourth and fifth elements are False.

⑤ The tf.reduce_mean(x, axis) function means to obtain the average value of the specified dimension of the matrix or tensor.

  • If the second parameter is not specified, the average value is taken among all elements

  • If the second parameter is specified as 0, the average value is taken on the elements of the first dimension, that is, each column is averaged

  • If the second parameter is specified as 1, then the average value is taken on the elements of the second dimension, that is, the average value for each row

For example:


x = [[1., 1.],
     [2., 2.]]

with tf.Session() as sess:
    print(sess.run(tf.reduce_mean(x)))
    print(sess.run(tf.reduce_mean(x,0)))
    print(sess.run(tf.reduce_mean(x,1)))

Output:


1.5
[1.5 1.5]
[1. 2.]

⑥ The tf.argmax(x, axis) function means to return the index number of the maximum value in the parameter x under the specified dimension axis.

For example:

For example, tf.argmax([[1, 0, 0]], 1) takes axis 1 (the second dimension) of the parameter x; the largest element of the row [1, 0, 0] is at index 0, so [0] is returned.
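A small runnable sketch of this behaviour (the expected index values are shown in the comments):

import tensorflow as tf

with tf.Session() as sess:
    print(sess.run(tf.argmax([[1, 0, 0]], 1)))              # [0]
    print(sess.run(tf.argmax([[1, 0, 0], [0, 5, 3]], 1)))   # [0 1]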

⑦ The os.path.join() function means to join the parameter strings according to the path naming rules.

For example:

import os
os.path.join('/hello/','good/boy/','doiido')

Output:


'/hello/good/boy/doiido'

⑧ The string.split() function means to split the string according to the specified "split character" and return the split list.

For example:


'./model/mnist_model-1001'.split('/')[-1].split('-')[-1]

In this example the string is split twice.

  • First the string is split on /, and the element at index -1 of the resulting list (the last element) is taken, giving 'mnist_model-1001';

  • That element is then split on -, and again the element at index -1 (the last element) is taken, so the function returns 1001.
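Step by step, the two splits behave as follows (a plain Python sketch):

path = './model/mnist_model-1001'
print(path.split('/'))                       # ['.', 'model', 'mnist_model-1001']
print(path.split('/')[-1])                   # mnist_model-1001
print(path.split('/')[-1].split('-')[-1])    # 1001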

⑨ The tf.Graph().as_default() function sets this graph as the default graph and returns a context manager.

This function is generally used together with the with keyword, to reproduce a previously defined network inside that computation graph.

For example:
with tf.Graph().as_default() as g means that the nodes defined inside the with block are added to the computation graph g.
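A minimal sketch of this usage (the tensor name c is illustrative):

import tensorflow as tf

with tf.Graph().as_default() as g:
    c = tf.constant(1.0)   # this node is added to graph g, not to the global default graph
    print(c.graph is g)    # True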

Save the neural network model

During back propagation, the neural network model is generally saved once every certain number of training steps, producing three files:

  • a .meta file that saves the current graph structure

  • an .index file that saves the current parameter names

  • a .data file that saves the current parameter values

In TensorFlow this is expressed as follows:


saver = tf.train.Saver()
with tf.Session() as sess:
    for i in range(STEPS):
        if i % rounds == 0:  # "rounds" is the save interval (number of training steps between saves)
            saver.save(sess, os.path.join(MODEL_SAVE_PATH,MODEL_NAME), global_step=global_step)

Here, tf.train.Saver() instantiates the saver object.
The code above saves all the parameters and other information of the neural network model to the specified path once every given number of training steps, and appends the training step count at which the model was saved to the file names of the stored model.
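For example, if MODEL_NAME is "mnist_model" and the model is saved at global step 1001, the files written under MODEL_SAVE_PATH are typically mnist_model-1001.meta, mnist_model-1001.index and mnist_model-1001.data-00000-of-00001, together with a checkpoint file recording the latest save.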

Loading of neural network model

When testing the network effect, you need to load the trained neural network model, which is expressed in TensorFlow as follows:

with tf.Session() as sess:
    ckpt = tf.train.get_checkpoint_state(storage_path)
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess, ckpt.model_checkpoint_path)

The saved neural network model is loaded inside the with structure: if the checkpoint and the saved model exist at the specified path, the saved model is restored into the current session.

Load the moving average of the parameters in the model

When the model is saved, if a moving average is used in the model, the moving averages of the parameters are saved in the corresponding files. Loading the moving averages of the parameters is achieved by instantiating the saver object accordingly,
which is expressed in TensorFlow as follows:


ema = tf.train.ExponentialMovingAverage(moving_average_decay)
ema_restore = ema.variables_to_restore()
saver = tf.train.Saver(ema_restore)

Evaluation method of neural network model accuracy

When evaluating the network, the performance of the neural network is generally measured by computing the recognition accuracy on a set of data. This is expressed in TensorFlow as:


correct_prediction = tf.equal(tf.argmax(y, 1),tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

  • y: the prediction results of the neural network model on a batch of data (i.e. batch_size samples); the shape of y is [batch_size, 10], and each row is the recognition result for one picture.

  • tf.argmax(): takes, for each picture, the index of the largest element in its output vector, forming a one-dimensional array of length batch_size.

  • tf.equal(): compares the predicted-result tensor and the actual-label tensor element by element, returning True where they are equal and False where they are not.

  • tf.cast(): converts the resulting Boolean values into real numbers, which tf.reduce_mean() then averages to obtain the accuracy of the model on this batch of data.
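A small worked sketch on a hypothetical batch of three predictions (the values, and the use of 3 classes instead of 10, are purely illustrative):

import tensorflow as tf

y  = tf.constant([[0.1, 0.8, 0.1],    # predicted class 1
                  [0.7, 0.2, 0.1],    # predicted class 0
                  [0.3, 0.3, 0.4]])   # predicted class 2
y_ = tf.constant([[0., 1., 0.],       # true class 1
                  [0., 1., 0.],       # true class 1
                  [0., 0., 1.]])      # true class 2

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))      # [True, False, True]
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
with tf.Session() as sess:
    print(sess.run(accuracy))   # 0.6666667 (2 of 3 predictions are correct)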

Network model analysis

A neural network program includes a forward propagation process and a back propagation process.

The back propagation process also involves the settings for regularization, the exponentially decaying learning rate, and the moving average method, as well as a test module.

Forward propagation process (forward.py)

The forward propagation process completes the construction of the neural network, and the structure is as follows:


def forward(x, regularizer):
    w=
    b=
    y=
    return y
def get_weight(shape,  regularizer):
def get_bias(shape):

In the forward propagation process, it is necessary to define the parameters w and bias b in the neural network, and define the network structure from input to output.

The parameter w is set by defining the function get_weight(), including the shape of the parameter w and the flag of whether it is regularized.
Similarly, the setting of bias b is achieved by defining the function get_bias().

Backpropagation process (backward.py)

The back propagation process completes the training of the network parameters, and the structure is as follows:


def backward(mnist):
    x = tf.placeholder(dtype, shape)
    y_ = tf.placeholder(dtype, shape)
    # define the forward propagation function
    y = forward()
    global_step =
    loss =
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
    # instantiate the saver object
    saver = tf.train.Saver()
    with tf.Session() as sess:
        # initialize all model parameters
        tf.initialize_all_variables().run()
        # train the model
        for i in range(STEPS):
            sess.run(train_step, feed_dict={x: , y_: })
            if i % rounds == 0:  # save every "rounds" steps
                print
                saver.save( )

During back propagation:

  • tf.placeholder(dtype, shape): creates placeholders for the training samples x and the sample labels y_

The parameter dtype represents the type of data

The parameter shape represents the shape of the data

  • y: the output of the defined forward propagation function forward()

  • loss: the defined loss function, generally the sum of the cross entropy (or mean square error) of the predicted value and the sample label and the regularization loss

  • train_step: Use optimization algorithms to optimize model parameters

Common optimization algorithms include GradientDescentOptimizer, AdamOptimizer, and MomentumOptimizer; the code above uses GradientDescentOptimizer.
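For instance, swapping in a different optimizer only changes the train_step line; this is a sketch only, and the alternative optimizers would normally need their own hyperparameter tuning (the 0.001 learning rate for Adam is an illustrative value):

# gradient descent, as used in this article
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
# possible alternatives (illustrative):
# train_step = tf.train.MomentumOptimizer(learning_rate, momentum=0.9).minimize(loss, global_step=global_step)
# train_step = tf.train.AdamOptimizer(0.001).minimize(loss, global_step=global_step)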

Then the saver object is instantiated, and inside the with structure:

  • tf.initialize_all_variables().run(): initializes all model parameters

  • sess.run(): runs the training optimization of the model, and saves the model every certain number of steps

Setting of regularization, exponential decay learning rate, and moving average method

① Regularization term regularizer

When the regularization parameter regularizer is set (i.e. passed a non-None value) in the forward propagation file forward.py, a regularization term is added to the loss function when the model parameters are optimized during back propagation.
The structure is as follows:

First, the following needs to be added to the forward propagation file forward.py:


if regularizer != None: 
    tf.add_to_collection('losses',tf.contrib.layers.l2_regularizer(regularizer)(w))

Second, the following needs to be added to the back propagation file backward.py:


ce = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y,labels=tf.argmax(y_, 1))
cem = tf.reduce_mean(ce)
loss = cem + tf.add_n(tf.get_collection('losses'))

  • tf.nn.sparse_softmax_cross_entropy_with_logits(): applies the softmax() function to the logits and then computes the cross entropy against the labels.

② Exponential decay learning rate

When training the model, using the exponential decay learning rate can make the model quickly converge to a better solution in the early stage of training, and can ensure that the model will not fluctuate too much in the later stage of training.

To use the exponentially decaying learning rate, the following needs to be added to the backward.py file in the back propagation process:


learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE,global_step,LEARNING_RATE_STEP, LEARNING_RATE_DECAY,staircase=True)
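With staircase=True, the learning rate actually used at a given global_step is roughly LEARNING_RATE_BASE * LEARNING_RATE_DECAY ** (global_step // LEARNING_RATE_STEP), i.e. it is multiplied by the decay factor once every LEARNING_RATE_STEP training steps; with staircase=False the exponent is the continuous ratio global_step / LEARNING_RATE_STEP instead.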

③ Moving average

Introducing the moving average during model training can make the model perform more robustly on the test data.

The following needs to be added to the backward.py file during back propagation:


ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
ema_op = ema.apply(tf.trainable_variables())
with tf.control_dependencies([train_step, ema_op]):
    train_op = tf.no_op(name='train')
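For each trainable variable, ExponentialMovingAverage keeps a shadow copy that is updated every time ema_op runs, roughly as shadow = decay * shadow + (1 - decay) * variable, where the effective decay is min(MOVING_AVERAGE_DECAY, (1 + global_step) / (10 + global_step)) when a global step is supplied; the tf.control_dependencies structure ensures these shadow values are refreshed every time train_op is executed.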

Test process (test.py)

After the neural network model is trained, it can be used to test the data set to verify the performance of the neural network. The structure is as follows:

First, define the model test function test():

def test(mnist):
    with tf.Graph().as_default() as g:

        # placeholders for x and y_
        x = tf.placeholder(dtype, shape)
        y_ = tf.placeholder(dtype, shape)

        # forward propagation to obtain the prediction y
        y = mnist_forward.forward(x, None)

        # instantiate a saver that can restore the moving averages
        ema = tf.train.ExponentialMovingAverage(moving_average_decay)
        ema_restore = ema.variables_to_restore()
        saver = tf.train.Saver(ema_restore)

        # compute the accuracy
        correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

        while True:
            with tf.Session() as sess:
                # load the trained model
                ckpt = tf.train.get_checkpoint_state(storage_path)
                # if a ckpt model exists, restore it
                if ckpt and ckpt.model_checkpoint_path:
                    # restore the session
                    saver.restore(sess, ckpt.model_checkpoint_path)
                    # recover the number of training steps
                    global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
                    # compute the accuracy on the test data
                    accuracy_score = sess.run(accuracy, feed_dict={x: test_data, y_: test_data_labels})
                    # print the result
                    print("After %s training step(s), test accuracy = %g" % (global_step, accuracy_score))
                # if no model is found
                else:
                    print('No checkpoint file found')  # prompt: the model does not exist
                    return

Second, define the main() function:


def main():
    # load the test data set
    mnist = input_data.read_data_sets("./data/", one_hot=True)
    # call the defined test function test()
    test(mnist)
if __name__ == '__main__':
    main()

The accuracy rate is obtained by predicting the test data, so as to judge the performance of the trained neural network model.

When the accuracy is low, possible reasons are that the model still needs to be improved, or that the amount of training data is too small, which causes overfitting.

Network model building and testing

The realization of the recognition task of the handwritten MNIST data set is divided into three module files, namely:

  • The forward propagation process file describing the network structure (mnist_forward.py)

  • The back propagation process file describing how the network parameters are optimized (mnist_backward.py)

  • The test process file (mnist_test.py) to verify the accuracy of the model.

Forward propagation process file (mnist_forward.py)

In the forward propagation process, it is necessary to define the number of nodes in the input layer, hidden layer, and output layer of the network model, define the network parameters w and biases b, and define the network architecture from input to output.

The forward propagation process of the recognition task of the handwritten MNIST data set is as follows:

import tensorflow as tf

INPUT_NODE = 784
OUTPUT_NODE = 10
LAYER1_NODE = 500

def get_weight(shape, regularizer):
    w = tf.Variable(tf.truncated_normal(shape,stddev=0.1))
    if regularizer != None: tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(regularizer)(w))
    return w

def get_bias(shape):  
    b = tf.Variable(tf.zeros(shape))  
    return b

def forward(x, regularizer):
    w1 = get_weight([INPUT_NODE, LAYER1_NODE], regularizer)
    b1 = get_bias([LAYER1_NODE])
    y1 = tf.nn.relu(tf.matmul(x, w1) + b1)

    w2 = get_weight([LAYER1_NODE, OUTPUT_NODE], regularizer)
    b2 = get_bias([OUTPUT_NODE])
    y = tf.matmul(y1, w2) + b2
    return y

As can be seen from the code above, the forward propagation process specifies:

  • Input nodes: 784 (representing the number of pixels in each input picture)

  • Hidden layer nodes: 500

  • Output nodes: 10 (one for each of the ten digits 0-9)

  • w1: parameters from the input layer to the hidden layer, the shape is [784,500]

  • w2: the parameter from the hidden layer to the output layer, the shape is [500,10]

(The parameters are initialized from a truncated normal distribution, and when regularization is enabled, the regularization loss of each parameter is added to the total loss.)

  • b1: the offset from the input layer to the hidden layer, the shape is a one-dimensional array with a length of 500

  • b2: The offset from the hidden layer to the output layer, the shape is a one-dimensional array of length 10, and the initial value is all 0.

  • y1: the hidden-layer output; in the first layer of the forward propagation structure, the input x is matrix-multiplied by w1, the bias b1 is added, and the result is passed through the relu activation function

  • y: the output; in the second layer, the hidden-layer output y1 is matrix-multiplied by w2 and the bias b2 is added

(The output y is not passed through relu, because it still has to go through the softmax function so that it conforms to a probability distribution.)

Backpropagation process file (mnist_backward.py)

The backpropagation process realizes the use of training data sets to train the neural network model, and optimizes the network model parameters by reducing the loss function value, so as to obtain a neural network model with high accuracy and strong generalization ability.

The backpropagation process of the recognition task of the handwritten MNIST data set is as follows:


import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import mnist_forward
import os

BATCH_SIZE = 200
LEARNING_RATE_BASE = 0.1
LEARNING_RATE_DECAY = 0.99
REGULARIZER = 0.0001
STEPS = 50000
MOVING_AVERAGE_DECAY = 0.99
MODEL_SAVE_PATH="./model/"
MODEL_NAME="mnist_model"

def backward(mnist):

    x = tf.placeholder(tf.float32, [None, mnist_forward.INPUT_NODE])
    y_ = tf.placeholder(tf.float32, [None, mnist_forward.OUTPUT_NODE])
    y = mnist_forward.forward(x, REGULARIZER)
    global_step = tf.Variable(0, trainable=False)

    ce = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
    cem = tf.reduce_mean(ce)
    loss = cem + tf.add_n(tf.get_collection('losses'))

    learning_rate = tf.train.exponential_decay(
        LEARNING_RATE_BASE,
        global_step,
        mnist.train.num_examples / BATCH_SIZE, 
        LEARNING_RATE_DECAY,
        staircase=True)

    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

    ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
    ema_op = ema.apply(tf.trainable_variables())
    with tf.control_dependencies([train_step, ema_op]):
        train_op = tf.no_op(name='train')

    saver = tf.train.Saver()

    with tf.Session() as sess:
        init_op = tf.global_variables_initializer()
        sess.run(init_op)

        for i in range(STEPS):
            xs, ys = mnist.train.next_batch(BATCH_SIZE)
            _, loss_value, step = sess.run([train_op, loss, global_step], feed_dict={x: xs, y_: ys})
            if i % 1000 == 0:
                print("After %d training step(s), loss on training batch is %g." % (step, loss_value))
                saver.save(sess, os.path.join(MODEL_SAVE_PATH, MODEL_NAME), global_step=global_step)

def main():
    mnist = input_data.read_data_sets("./data/", one_hot=True)
    backward(mnist)

if __name__ == '__main__':
    main()

Output:


Extracting ./data/train-images-idx3-ubyte.gz
Extracting ./data/train-labels-idx1-ubyte.gz
Extracting ./data/t10k-images-idx3-ubyte.gz
Extracting ./data/t10k-labels-idx1-ubyte.gz
After 1 training step(s), loss on training batch is 3.47547.
After 1001 training step(s), loss on training batch is 0.283958.
After 2001 training step(s), loss on training batch is 0.304716.
After 3001 training step(s), loss on training batch is 0.266811.
... (omitted)
After 47001 training step(s), loss on training batch is 0.128592.
After 48001 training step(s), loss on training batch is 0.125534.
After 49001 training step(s), loss on training batch is 0.123577.

As can be seen from the above code, in the back propagation process:

  • Introduce tensorflow, input_data, forward propagation mnist_forward and os modules

  • Define the number of pictures fed to the neural network in each round, the initial learning rate, the learning rate decay rate, the regularization coefficient, the number of training rounds, the model save path, and the model save name and other related information

  • In the back propagation function backward():

  • Read in mnist, use placeholder to place the training data x and label y_

  • Call the forward() function of the forward propagation process in the mnist_forward file, with regularization enabled, to compute the prediction y on the training data

  • Assign the current training-step counter global_step and mark it as non-trainable

  • Call the loss function loss that includes regularization loss of all parameters, and set the exponential decay learning rate learning_rate

  • Use the gradient descent algorithm to optimize the model and reduce the loss, and define the moving averages of the parameters

  • In the with structure:

  • Realize all parameter initialization

  • Feed batch_size (i.e. 200) training samples and their corresponding labels at a time, looping for STEPS iterations

  • Print the loss value every 1000 steps and save the current session to the specified path

  • Through the main function main(), load the training data set under the specified path, and call the specified backward() function to train the model

Test process file (mnist_test.py)

After training the model, input the test set to the neural network model to verify the accuracy and generalization of the network.
Note that the test set and training set used are independent of each other.

The test process of the handwritten MNIST digit recognition task is as follows:

#coding:utf-8
import time
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import mnist_forward
import mnist_backward
TEST_INTERVAL_SECS = 5

def test(mnist):
    with tf.Graph().as_default() as g:
        x = tf.placeholder(tf.float32, [None, mnist_forward.INPUT_NODE])
        y_ = tf.placeholder(tf.float32, [None, mnist_forward.OUTPUT_NODE])
        y = mnist_forward.forward(x, None)

        ema = tf.train.ExponentialMovingAverage(mnist_backward.MOVING_AVERAGE_DECAY)
        ema_restore = ema.variables_to_restore()
        saver = tf.train.Saver(ema_restore)

        correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

        while True:
            with tf.Session() as sess:
                ckpt = tf.train.get_checkpoint_state(mnist_backward.MODEL_SAVE_PATH)
                if ckpt and ckpt.model_checkpoint_path:
                    saver.restore(sess, ckpt.model_checkpoint_path)
                    global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
                    accuracy_score = sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})
                    print("After %s training step(s), test accuracy = %g" % (global_step, accuracy_score))
                else:
                    print('No checkpoint file found')
                    return
            time.sleep(TEST_INTERVAL_SECS)

def main():
    mnist = input_data.read_data_sets("./data/", one_hot=True)
    test(mnist)

if __name__ == '__main__':
    main()

Output:


Extracting ./data/train-images-idx3-ubyte.gz
Extracting ./data/train-labels-idx1-ubyte.gz
Extracting ./data/t10k-images-idx3-ubyte.gz
Extracting ./data/t10k-labels-idx1-ubyte.gz
After 49001 training step(s), test accuracy = 0.98

In the above code,

  • Import the time module, tensorflow, input_data, the forward propagation module mnist_forward, and the back propagation module mnist_backward

  • Specify a 5-second interval between test cycles

  • Define the test function test() and read in the mnist data set:

  • Use tf.Graph() to reproduce the previously defined calculation graph

  • Use placeholder to place the training data x and label y_

  • Call the forward() function of the forward propagation process in the mnist_forward file to compute the prediction y on the test data

  • Instantiate a saver object with a moving average, so that when the session is loaded, all parameters in the model are assigned to their respective moving averages, which enhances the stability of the model

  • Calculate the accuracy of the model on the test set

  • In the with structure, load the ckpt under the specified path:

  • If the model exists, restore it into the current session, evaluate the accuracy on the test data set, and print the accuracy at the current number of training steps

  • If the model does not exist, print out the prompt that the model does not exist, and the test() function is completed

  • Through the main function main(), load the test data set under the specified path, and call the specified test function to verify the accuracy of the model on the test set

From the above results it can be seen that the final accuracy on the test set is 98%. The model training script mnist_backward.py and the model test script mnist_test.py can be run at the same time, which shows more intuitively that, as the number of training steps increases, the loss value of the network model keeps decreasing and the accuracy on the test set keeps improving, indicating good generalization ability.

Reference: Artificial Intelligence Practice: Tensorflow Notes

Origin blog.51cto.com/15060517/2641115