MNIST in action

I recently read "Tensorflow Practical Google Deep Learning Framework". Chapter 5 shows how to combine a moving average of the parameters, regularization, and a decaying learning rate to optimize a three-layer neural network that recognizes the MNIST handwritten digits, reaching an accuracy of 0.9842 on the test set. I re-implemented it here and wrote up this summary.

Functional module separation

Since forward propagation is needed both when training the network and when testing it, the forward-propagation computation is encapsulated in a function that takes the input samples and returns the network output. To make it easy to compute the regularization term, the regularizer object is passed into the function as well.

import tensorflow as tf


# Network structure parameters
INPUT_NODES = 784
HIDDEN_NODES = 500
OUT_NODES = 10

# Create a weight variable; if a regularizer is given, add its
# regularization loss to the 'losses' collection
def get_weight(shape, regularizer):
    weight = tf.get_variable(name='weights', shape=shape, dtype=tf.float32,
                             initializer=tf.truncated_normal_initializer(stddev=0.1))
    if regularizer is not None:
        tf.add_to_collection('losses', regularizer(weight))
    return weight

# Compute the result of forward propagation
def inference(input_tensor, regularizer):
    # with tf.variable_scope('hidde_layer', reuse=True):
    with tf.variable_scope('hidde_layer'):
        weight = get_weight([INPUT_NODES, HIDDEN_NODES], regularizer)
        biase = tf.get_variable('biases', shape=[HIDDEN_NODES],
                                initializer=tf.constant_initializer(value=0.1))
        hidden_layer = tf.nn.relu(tf.matmul(input_tensor, weight) + biase)

    # with tf.variable_scope('output_layer', reuse=True):
    with tf.variable_scope('output_layer'):
        weight = get_weight([HIDDEN_NODES, OUT_NODES], regularizer)
        biase = tf.get_variable('biases', shape=[OUT_NODES],
                                initializer=tf.constant_initializer(value=0.1))
        output = tf.matmul(hidden_layer, weight) + biase
    return output

A few points:

  • tf.variable_scope('hidde_layer') defines a variable scope so that the variables are easier to manage. reuse defaults to False, which lets new variables be created inside the function; if reuse is set to True, tf.get_variable instead looks up already-defined variables in the scope by name. This matters when debugging the training code: the first time forward propagation is built, the variables (hidde_layer/weights, ...) are created in the calculation graph. If the program has a bug and training has to be run again in the same graph, get_variable is called a second time, and with reuse=False it raises an error telling you that hidde_layer/weights already exists, so you would have to set reuse=True in the scope; every round trip like this wastes a lot of time, since re-importing all the modules is slow. (I write the code in a Jupyter notebook and then move it into .py modules.) Later I found a way to pick up changes to a dependent module without re-importing everything: reload(module) re-imports only the specified module and leaves the others untouched, which greatly reduces the running time.
  • The regularization losses of the weights are added to a collection named 'losses'; during training, the total loss is formed by adding the mean cross-entropy to the sum of the entries in that collection.
  • Sometimes we want to delete certain variables or nodes from the calculation graph, but TensorFlow does not provide such a function. A workaround is to reset the default graph with tf.reset_default_graph() (see the sketch after this list).
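
The following is a minimal sketch of that workflow (assuming the mnist_inference module above is importable); building the forward pass twice in the same default graph raises the "already exists" error, while resetting the graph, or reloading an edited module, lets the code be re-run cleanly:

import tensorflow as tf
from importlib import reload

import mnist_inference

x = tf.placeholder(tf.float32, [None, mnist_inference.INPUT_NODES])
y = mnist_inference.inference(x, None)    # creates hidde_layer/weights, output_layer/weights, ...

try:
    mnist_inference.inference(x, None)    # same graph, reuse=False -> ValueError
except ValueError as e:
    print(e)                              # "Variable hidde_layer/weights already exists, ..."

# Workaround when the graph has to be rebuilt after fixing a bug: wipe the default graph
tf.reset_default_graph()

# If mnist_inference.py itself was edited, re-import only that module
reload(mnist_inference)

x = tf.placeholder(tf.float32, [None, mnist_inference.INPUT_NODES])
y = mnist_inference.inference(x, None)    # works again in the fresh graph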

Define the neural network structure

# coding: utf-8

import mnist_inference
from tensorflow.examples.tutorials.mnist import input_data
import os
import tensorflow as tf
from imp import reload

# Training hyperparameters
batch_size = 100  # number of examples per training step
learning_rate_base = 0.8  # initial learning rate
learning_rate_decay = 0.99  # learning rate decay rate
moving_average_decay = 0.99  # moving average decay rate
regularization_rate = 0.0001  # regularization strength
train_steps = 30000  # number of training steps
saved_model_path = 'saves_model_path'  # directory where the trained model is saved
saved_model_name = 'model.ckpt'  # model file name




# reload(mnist_inference)

# tf.reset_default_graph()


def train(mnist):
    # Placeholders for the training inputs and labels
    x = tf.placeholder(dtype=tf.float32, shape=[None, mnist_inference.INPUT_NODES], name='x_input')
    y_ = tf.placeholder(dtype=tf.float32, shape=[None, mnist_inference.OUT_NODES], name='y_input')

    # Number of training steps taken so far
    global_step = tf.Variable(initial_value=0, trainable=False)

    # Define the L2 regularizer
    regularizer = tf.contrib.layers.l2_regularizer(regularization_rate)

    # Compute the network output
    y = mnist_inference.inference(x, regularizer)

    # Define the moving average class and apply it to all trainable variables
    variable_average = tf.train.ExponentialMovingAverage(moving_average_decay, global_step)
    variable_average_op = variable_average.apply(tf.trainable_variables())

    # Define the exponentially decaying learning rate
    learn_rate = tf.train.exponential_decay(learning_rate=learning_rate_base, global_step=global_step,
                                            decay_steps=mnist.train.num_examples / batch_size
                                            , decay_rate=learning_rate_decay, staircase=True)
    # Compute the loss: mean cross-entropy plus the regularization losses
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tf.argmax(y_, axis=1), logits=y)
    cross_entropy_mean = tf.reduce_mean(cross_entropy)
    loss = cross_entropy_mean + tf.add_n(tf.get_collection('losses'))

    # Training step
    train_step = tf.train.GradientDescentOptimizer(learning_rate=learn_rate).minimize(loss=loss,
                                                                                      global_step=global_step)
    # Group the training step and the moving-average update, so that each time the model is trained once, the shadow variables are updated once as well
    with tf.control_dependencies([train_step, variable_average_op]):
        train_op = tf.no_op(name='train')

    # Define the saver for model persistence
    saver = tf.train.Saver()

    # Start the session
    with tf.Session() as sess:
        # Initialize all variables
        sess.run(tf.global_variables_initializer())

        for i in range(train_steps):
            xs, ys = mnist.train.next_batch(batch_size)

            _, loss_value, step = sess.run([train_op, loss, global_step], feed_dict={x: xs, y_: ys})
            if i % 1000 == 0:
                saver.save(sess=sess, save_path=os.path.join(saved_model_path, saved_model_name),
                           global_step=global_step)
                print('after %d training steps,loss is %f' % (step, loss_value))


# train(mnist)


# tf.all_variables()
if __name__ == '__main__':
    mnist = input_data.read_data_sets("mnist_data", one_hot=True)
    train(mnist)

The first thing to decide when using a NN is its network structure: the number of nodes in the input layer, the hidden layer, and the output layer, together with the activation function, the loss function, and the optimization method (a learning rate that decays over training, regularization, a moving average of the model parameters, dropout, and so on).

  • Both the learning rate and the moving average need to change with the number of training steps, so a global_step variable is defined to tell them how many steps the model has been trained for. global_step is incremented automatically by passing it to tf.train.GradientDescentOptimizer(learning_rate=learn_rate).minimize(loss=loss, global_step=global_step).
  • The moving average maintains a shadow variable for each trainable model parameter and updates it after every training step (see the sketch below).
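
As a small illustration of the shadow mechanism, the following sketch (with a hypothetical variable v standing in for a model parameter and step for global_step) follows the update rule used by ExponentialMovingAverage: shadow = decay * shadow + (1 - decay) * variable, where decay is capped at (1 + num_updates) / (10 + num_updates) when a step counter is supplied:

import tensorflow as tf

v = tf.Variable(0, dtype=tf.float32, name='v')        # hypothetical model parameter
step = tf.Variable(0, trainable=False)                # plays the role of global_step

ema = tf.train.ExponentialMovingAverage(0.99, step)   # decay capped by (1 + step) / (10 + step)
maintain_average_op = ema.apply([v])                  # creates the shadow variable for v

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    sess.run(tf.assign(v, 5))
    sess.run(maintain_average_op)
    # decay = min(0.99, 1/10) = 0.1, shadow = 0.1*0 + 0.9*5 = 4.5
    print(sess.run([v, ema.average(v)]))              # [5.0, 4.5]

    sess.run(tf.assign(step, 10000))
    sess.run(tf.assign(v, 10))
    sess.run(maintain_average_op)
    # decay = min(0.99, 10001/10010) = 0.99, shadow = 0.99*4.5 + 0.01*10 = 4.555
    print(sess.run([v, ema.average(v)]))              # [10.0, 4.555]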

Model persistence

To guard against accidents during the training process, we persist the model every so often (every 1000 steps in the code above), and the current global_step is appended to the checkpoint name, so each saved model is marked with how many steps it has been trained for.
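
A minimal sketch of resuming from the most recent checkpoint under these assumptions (it would replace the unconditional sess.run(tf.global_variables_initializer()) inside train(); the checkpoint file name in the comment is only an example). tf.train.get_checkpoint_state looks up the latest checkpoint in the directory, whose file name ends with the global_step at which it was saved:

    saver = tf.train.Saver()

    with tf.Session() as sess:
        ckpt = tf.train.get_checkpoint_state('saves_model_path')
        if ckpt and ckpt.model_checkpoint_path:
            # e.g. restores from saves_model_path/model.ckpt-29001
            saver.restore(sess, ckpt.model_checkpoint_path)
        else:
            # no checkpoint yet: start training from scratch
            sess.run(tf.global_variables_initializer())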

Model evaluation

Because the moving average is used during training, we also need to load the moving-average values of the parameters when evaluating on the test set. TensorFlow provides a convenient helper, variable_average.variables_to_restore(), which returns the mapping between each moving-average (shadow) variable name and the corresponding variable:

{
'hidde_layer/biases/ExponentialMovingAverage': <tf.Variable 'hidde_layer/biases:0' shape=(500,) dtype=float32_ref>, 
'output_layer/weights/ExponentialMovingAverage': <tf.Variable 'output_layer/weights:0' shape=(500, 10) dtype=float32_ref>, 
'hidde_layer/weights/ExponentialMovingAverage': <tf.Variable 'hidde_layer/weights:0' shape=(784, 500) dtype=float32_ref>, 
'output_layer/biases/ExponentialMovingAverage': <tf.Variable 'output_layer/biases:0' shape=(10,) dtype=float32_ref>
}
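
Building on this mapping, a minimal evaluation sketch might look as follows (the structure mirrors the training script above, but the accuracy computation and variable names here are my own assumptions rather than code from the book); passing the dictionary to tf.train.Saver makes restore load the shadow values into the ordinary model variables:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

import mnist_inference

moving_average_decay = 0.99

def evaluate(mnist):
    x = tf.placeholder(tf.float32, [None, mnist_inference.INPUT_NODES], name='x_input')
    y_ = tf.placeholder(tf.float32, [None, mnist_inference.OUT_NODES], name='y_input')

    # Forward pass; no regularizer is needed at test time
    y = mnist_inference.inference(x, None)

    correct = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

    # Restore the shadow (moving-average) values into the model variables
    variable_average = tf.train.ExponentialMovingAverage(moving_average_decay)
    saver = tf.train.Saver(variable_average.variables_to_restore())

    with tf.Session() as sess:
        ckpt = tf.train.get_checkpoint_state('saves_model_path')
        saver.restore(sess, ckpt.model_checkpoint_path)
        acc = sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})
        print('test accuracy = %g' % acc)

if __name__ == '__main__':
    mnist = input_data.read_data_sets("mnist_data", one_hot=True)
    evaluate(mnist)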

The test results show that the accuracy is about 0.9843.
