This article shows how to write a TensorFlow-based neural network model for handwritten digit recognition.
# Note: this listing uses the TensorFlow 1.x API (tf.placeholder, tf.Session, tf.contrib).
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Load the MNIST dataset; it is automatically divided into training, validation, and test sets
mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)

# Print data set information
print("Training set samples: ", mnist.train.num_examples)
print("Validation set samples: ", mnist.validation.num_examples)
print("Test set samples: ", mnist.test.num_examples)
# print("Training sample: ", mnist.train.images[0])
# print("Training sample label: ", mnist.train.labels[0])
print("Training sample dimension: ", mnist.train.images[0].shape)
print("Test sample dimension: ", mnist.test.images[0].shape)

# Configure the parameters of the neural network
INPUT_NODE = 784             # Number of input layer nodes
OUTPUT_NODE = 10             # Number of output layer nodes
LAYER1_NODE = 500            # Number of hidden layer nodes
BATCH_SIZE = 100             # Number of samples per training step
LEARNING_RATE_BASE = 0.8     # Initial learning rate
LEARNING_RATE_DECAY = 0.99   # Decay rate of the learning rate
LAMBDA = 0.0001              # Regularization coefficient
TRAINING_STEPS = 10000       # Number of training steps
MOVING_AVERAGE_DECAY = 0.99  # Moving average decay rate


# Forward propagation
def forward(x, avg_class, w1, b1, w2, b2):
    # If no moving average class is provided, use the current parameter values directly
    if avg_class is None:
        # Hidden layer output
        layer1 = tf.nn.relu(tf.matmul(x, w1) + b1)
        # Return the raw output layer values (logits). Prediction only compares the
        # relative sizes of the output nodes (argmax), so no softmax layer is needed
        # here; softmax is applied inside the loss function.
        return tf.matmul(layer1, w2) + b2
    else:
        # First take the moving average of each variable, then compute the layer outputs
        layer1 = tf.nn.relu(tf.matmul(x, avg_class.average(w1)) + avg_class.average(b1))
        return tf.matmul(layer1, avg_class.average(w2)) + avg_class.average(b2)


# 1. Define the placeholders for one batch of training data
x = tf.placeholder(tf.float32, [None, INPUT_NODE], name='x_input')
y_ = tf.placeholder(tf.float32, [None, OUTPUT_NODE], name='y_input')

# 2. Define the weights (truncated normal initialization) and biases (constant initialization)
w1 = tf.Variable(tf.truncated_normal([INPUT_NODE, LAYER1_NODE], stddev=0.1))
w2 = tf.Variable(tf.truncated_normal([LAYER1_NODE, OUTPUT_NODE], stddev=0.1))
b1 = tf.Variable(tf.constant(0.1, shape=[LAYER1_NODE]))
b2 = tf.Variable(tf.constant(0.1, shape=[OUTPUT_NODE]))

# 3. Forward pass without the moving average class
y = forward(x, None, w1, b1, w2, b2)

# 4. Variable holding the current training step, marked as non-trainable
global_step = tf.Variable(0, trainable=False)
# Initialize the moving average class with the decay rate and the current training step
variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
# Op that updates the moving averages of all trainable variables (the network parameters);
# global_step is excluded because it was created with trainable=False
variable_averages_op = variable_averages.apply(tf.trainable_variables())

# 5. Forward pass using the moving averages of the parameters
average_y = forward(x, variable_averages, w1, b1, w2, b2)

# 6. Cross-entropy loss. tf.argmax(y_, 1) takes the index of the maximum of each row of the
# two-dimensional one-hot array y_, giving a one-dimensional array of class indices
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
    logits=y, labels=tf.argmax(y_, 1))
# Mean cross-entropy loss over the batch
cross_entropy_mean = tf.reduce_mean(cross_entropy)

# 7. L2 regularization. Generally only the weights of each layer are regularized, not the biases
regularizer = tf.contrib.layers.l2_regularizer(LAMBDA)
regularization = regularizer(w1) + regularizer(w2)

# 8. Total loss
loss = cross_entropy_mean + regularization

# 9. Exponentially decaying learning rate (base learning rate, current training step,
# number of steps needed for one pass over the training data, decay rate)
learning_rate = tf.train.exponential_decay(
    LEARNING_RATE_BASE,
    global_step,
    mnist.train.num_examples / BATCH_SIZE,
    LEARNING_RATE_DECAY)

# 10. Optimize the loss with gradient descent; minimize() also increments global_step
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(
    loss, global_step=global_step)

# 11. Group several operations. Each training step must both update the network parameters
# through back-propagation and update the moving average of each parameter.
with tf.control_dependencies([train_step, variable_averages_op]):
    train_op = tf.no_op(name='train')

# 12. Check whether the predictions of the moving average model are correct.
# tf.argmax(average_y, 1) returns a one-dimensional array of length batch size,
# each element being the index of the largest value in one prediction row
correct_prediction = tf.equal(tf.argmax(average_y, 1), tf.argmax(y_, 1))
# Accuracy: cast the booleans to floats and take the mean
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# 13. Create a session with a context manager
with tf.Session() as sess:
    # Initialize variables (tf.initialize_all_variables() is deprecated)
    tf.global_variables_initializer().run()
    # Validation dataset
    validate_feed = {x: mnist.validation.images, y_: mnist.validation.labels}
    # Test dataset
    test_feed = {x: mnist.test.images, y_: mnist.test.labels}

    # Iteratively train the neural network
    for i in range(TRAINING_STEPS):
        # 14. Report accuracy on the validation set and test set every 1000 steps
        if i % 1000 == 0:
            # Accuracy on the validation set
            validate_acc = sess.run(accuracy, feed_dict=validate_feed)
            print("Accuracy of the current model on the validation set: ", validate_acc)
            # Accuracy on the test set
            test_acc = sess.run(accuracy, feed_dict=test_feed)
            print("Accuracy of the current model on the test set: ", test_acc)
        # 15. Train on one batch of the training set per step
        xs, ys = mnist.train.next_batch(BATCH_SIZE)
        sess.run(train_op, feed_dict={x: xs, y_: ys})

    # 16. Accuracy on the test set after training
    test_acc = sess.run(accuracy, feed_dict=test_feed)
    print("Accuracy of the final model on the test set: ", test_acc)
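Two pieces of the listing above are easier to follow with the arithmetic written out: the exponentially decaying learning rate (step 9) and the moving average of the parameters (step 4). The following is a minimal pure-Python sketch of the update rules, not TensorFlow's implementation. The helper names decayed_learning_rate and ema_update are made up for illustration, and the value 55000 for the training set size is an assumption (the default split of read_data_sets).

```python
# Constants copied from the listing above.
LEARNING_RATE_BASE = 0.8
LEARNING_RATE_DECAY = 0.99
MOVING_AVERAGE_DECAY = 0.99
# Assumption: 55000 training samples and BATCH_SIZE = 100,
# so one pass over the training data takes 550 steps.
DECAY_STEPS = 55000 / 100


def decayed_learning_rate(global_step):
    """Hypothetical helper: base * decay_rate ** (global_step / decay_steps)."""
    return LEARNING_RATE_BASE * LEARNING_RATE_DECAY ** (global_step / DECAY_STEPS)


def ema_update(shadow, value, num_updates):
    """Hypothetical helper: one moving-average update of a single parameter.

    When a step counter is supplied, the effective decay is
    min(decay, (1 + num_updates) / (10 + num_updates)), so the average
    follows the parameter more quickly during early training.
    """
    decay = min(MOVING_AVERAGE_DECAY, (1 + num_updates) / (10 + num_updates))
    return decay * shadow + (1 - decay) * value


# The learning rate shrinks by a factor of 0.99 after each pass over the data.
print(decayed_learning_rate(0))
print(decayed_learning_rate(550))

# Track a parameter that stays at 1.0, starting from a shadow value of 0.0.
shadow = 0.0
for step in range(3):
    shadow = ema_update(shadow, 1.0, step)
    print(step, shadow)
```

Because the shadow values change more slowly than the parameters themselves, the model evaluated with average_y is less sensitive to the noise of individual training steps.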
Knowledge used:
1. Loss functions for regression problems and classification problems?
Classification problems usually use the cross-entropy loss function, and the output layer has one node per category. Regression problems usually use the mean squared error (MSE) loss function, and the output layer generally has a single node whose output value is the predicted value.
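As a rough illustration of the two loss functions, here is a minimal pure-Python sketch. The function names and the sample numbers are made up for this example and do not come from the model above.

```python
import math


def mean_squared_error(predictions, targets):
    """Regression loss: the mean of the squared differences."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def cross_entropy(expected, predicted):
    """Classification loss: -sum(expected_i * log(predicted_i)).

    Both arguments are probability distributions over the categories;
    terms with an expected probability of zero contribute nothing.
    """
    return -sum(e * math.log(q) for e, q in zip(expected, predicted) if e > 0)


# Regression: one output value per sample.
print(mean_squared_error([2.5, 0.0], [3.0, -0.5]))

# Classification: one output node per category, with a one-hot expected output.
one_hot = [0, 1, 0]
print(cross_entropy(one_hot, [0.1, 0.8, 0.1]))  # confident and correct: small loss
print(cross_entropy(one_hot, [0.7, 0.2, 0.1]))  # mostly wrong: larger loss
```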
2. Cross-entropy loss function?
It measures the distance between two probability distributions, the network's actual output and the expected output: the smaller the cross-entropy, the closer the two distributions are. In neural networks, cross-entropy is usually combined with the softmax function, which turns the raw outputs into a probability distribution. When the cross-entropy is computed element by element in TensorFlow, the result is an n x m matrix, where n is the number of samples in a batch and m is the number of categories. Taking the mean of the entire matrix gives the average cross-entropy of the batch up to a constant factor: summing each row over the m categories and then averaging over the n rows gives the per-sample average, and the mean over all entries is that value divided by m.
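The n x m matrix can be made concrete with a small pure-Python sketch. The helper names and the sample logits below are illustrative only, and the code mirrors the element-wise formulation rather than any particular TensorFlow function.

```python
import math


def softmax(logits):
    """Turn raw output values into a probability distribution."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_matrix(labels, logits):
    """Return the n x m matrix with entries -label * log(softmax(logit))."""
    matrix = []
    for label_row, logit_row in zip(labels, logits):
        probs = softmax(logit_row)
        matrix.append([-y * math.log(p) for y, p in zip(label_row, probs)])
    return matrix


# A batch of n = 2 samples with m = 3 categories (made-up numbers).
labels = [[1, 0, 0], [0, 1, 0]]
logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]

matrix = cross_entropy_matrix(labels, logits)
n, m = len(matrix), len(matrix[0])

per_sample = [sum(row) for row in matrix]            # cross-entropy of each sample
batch_mean = sum(per_sample) / n                     # average over the batch
matrix_mean = sum(sum(row) for row in matrix) / (n * m)

print(batch_mean, matrix_mean)  # matrix_mean equals batch_mean / m
```

Since the factor 1/m is constant, minimizing either quantity leads to the same parameters; only the scale of the loss changes.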
3. Gradient descent algorithm?
Batch gradient descent: It is not guaranteed to reach the global optimum of the function being optimized (the guarantee holds only for convex functions), and it is slow because the loss function J is the sum of the losses over all training data.
Stochastic gradient descent: To speed up training, each iteration optimizes the loss on a single, randomly chosen training sample. Because of this randomness, however, it may not even reach a local optimum.
Mini-batch gradient descent: A compromise between batch gradient descent and stochastic gradient descent. Each iteration computes the loss on a small batch of training data and uses it to update the parameters of the neural network. This is what the code above does, with BATCH_SIZE = 100.
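The three variants differ only in how many training samples feed each parameter update. The sketch below shows this on a toy problem (fitting y = w * x with a squared loss) in pure Python. The function names, data, learning rate, and step count are all assumptions chosen for illustration.

```python
import random


def gradient(w, batch):
    """Mean gradient of the squared loss (w * x - y) ** 2 over a batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def train(data, batch_size, steps=200, learning_rate=0.02, seed=0):
    """Gradient descent where each step uses `batch_size` random samples.

    batch_size == len(data) -> batch gradient descent
    batch_size == 1         -> stochastic gradient descent
    anything in between     -> mini-batch gradient descent
    """
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        batch = rng.sample(data, batch_size)
        w -= learning_rate * gradient(w, batch)
    return w


# Toy data generated from y = 2 * x, so the optimal parameter is w = 2.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

print("batch:     ", train(data, batch_size=len(data)))
print("stochastic:", train(data, batch_size=1))
print("mini-batch:", train(data, batch_size=2))
```

On this noise-free toy problem all three variants approach w = 2. With real data, smaller batches make each step cheaper but the updates noisier.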
4. What is the role of validation data?
The ultimate goal of a neural network model is to make predictions on unknown data. Therefore, part of the training data is usually held out as a validation set and used to tune the model's hyperparameters (when the amount of data is small, cross-validation can be used instead). The tuned model is then used to predict the test data.
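A minimal sketch of both ideas, a held-out validation set and k-fold cross-validation, could look like the following. The helper names are hypothetical, and integers stand in for real training samples.

```python
import random


def train_validation_split(samples, validation_size, seed=0):
    """Shuffle a copy of the samples and hold out `validation_size` of them."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[validation_size:], shuffled[:validation_size]


def k_fold_splits(samples, k):
    """Yield (train, validation) pairs; each sample is validated exactly once."""
    for fold in range(k):
        validation = samples[fold::k]
        train = [s for i, s in enumerate(samples) if i % k != fold]
        yield train, validation


samples = list(range(10))  # stand-in for real training samples

# Hold-out validation: tune on `validation`, train on `train`.
train, validation = train_validation_split(samples, validation_size=2)
print(len(train), len(validation))

# Cross-validation for small data sets: every sample is used for validation once.
for fold_train, fold_validation in k_fold_splits(samples, k=5):
    print(fold_validation)
```

In both cases the test data stays untouched until the hyperparameters are fixed, so the test accuracy remains an honest estimate of performance on unknown data.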
References:
1. "TensorFlow combat Google deep learning framework"