TensorFlow-Neural Network Handwritten Digit Recognition

This article shows how to write a TensorFlow-based neural network model for handwritten digit recognition.

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

#Load the MNIST dataset and automatically divide the training and validation datasets
mnist = input_data.read_data_sets('MNIST_data/',one_hot=True)

#Get data set information
print("Training set samples: ",mnist.train.num_examples)
print("Validation set samples: ",mnist.validation.num_examples)
print("Test set samples: ",mnist.test.num_examples)

# print("Training samples: ",mnist.train.images[0])
# print("Training sample labels: ",mnist.train.labels[0])

print("Training sample dimension: ",mnist.train.images[0].shape)
print("Test sample dimension: ",mnist.test.images[0].shape)

#Configure the parameters of the neural network
INPUT_NODE = 784  # Number of input layer nodes
OUTPUT_NODE = 10  # Number of output layer nodes
LAYER1_NODE = 500  # Number of hidden layer nodes
BATCH_SIZE = 100  # Number of samples per training batch
LEARNING_RATE_BASE = 0.8  # Initial learning rate
LEARNING_RATE_DECAY = 0.99  # Decay rate of the learning rate
LAMBDA = 0.0001  # L2 regularization coefficient
TRAINING_STEPS = 10000  # Number of training steps
MOVING_AVERAGE_DECAY = 0.99  # Moving average decay rate

#forward propagation process
def forward(x, avg_class, w1, b1, w2, b2):
	#If no moving average class is provided, use the current values of the parameters directly
	if avg_class is None:
		#Hidden layer output
		layer1 = tf.nn.relu( tf.matmul(x, w1) + b1 )
		#Return the raw output-layer values (logits) without an activation. Softmax is applied inside the loss function, and prediction only needs the index of the largest output, so no softmax layer is required here
		return tf.matmul( layer1, w2 ) + b2
	else:
		#First look up the moving average of each variable, then calculate the hidden layer output
		layer1 = tf.nn.relu( tf.matmul( x, avg_class.average(w1) ) + avg_class.average(b1) )
		return tf.matmul( layer1, avg_class.average(w2) ) + avg_class.average(b2)

#1. Define the placeholder variables of the batch training set
x = tf.placeholder( tf.float32, [None, INPUT_NODE], name = 'x_input' )
y_ = tf.placeholder( tf.float32, [None, OUTPUT_NODE], name = 'y_input' )

#2. Define the weight and bias parameters: weights use truncated normal initialization, biases use a constant
w1 = tf.Variable( tf.truncated_normal( [INPUT_NODE, LAYER1_NODE], stddev = 0.1) )
w2 = tf.Variable( tf.truncated_normal( [LAYER1_NODE, OUTPUT_NODE], stddev = 0.1) )
b1 = tf.Variable( tf.constant(0.1, shape = [LAYER1_NODE]) )
b2 = tf.Variable( tf.constant(0.1, shape = [OUTPUT_NODE]) )

#3. Obtaining forward pass results without moving average classes
y = forward( x, None, w1, b1, w2, b2 )

#4. Define the variable that stores the current training step and mark it as non-trainable
global_step = tf.Variable( 0, trainable = False )
#Initialize the moving average class with the moving average decay rate and the current training step
variable_averages = tf.train.ExponentialMovingAverage( MOVING_AVERAGE_DECAY, global_step )
#Operation that updates the moving averages. It is applied to all trainable variables, i.e. the parameters of the neural network; global_step is excluded because it was created with trainable=False
variable_averages_op = variable_averages.apply( tf.trainable_variables() )

#5. Obtaining forward pass results using moving average classes
average_y = forward( x, variable_averages, w1, b1, w2, b2 )

#6. Cross-entropy loss. tf.argmax(y_, 1) takes the index of the largest value in each row of the one-hot label matrix y_, giving a one-dimensional tensor of class indices, which is the label format that sparse_softmax_cross_entropy_with_logits expects
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits( logits=y, labels=tf.argmax(y_, 1) )
#Mean cross-entropy loss over the batch
cross_entropy_mean = tf.reduce_mean( cross_entropy )

#7. L2 regularization. Generally only the weights of each layer are regularized; the biases are not
regularizer = tf.contrib.layers.l2_regularizer( LAMBDA )
regularization = regularizer(w1) + regularizer(w2)

#8. Calculate the total loss
loss = cross_entropy_mean + regularization

#9. Set the exponentially decaying learning rate (initial learning rate, current training step, number of steps needed for one pass over all training data, learning rate decay rate)
learning_rate = tf.train.exponential_decay( LEARNING_RATE_BASE, global_step, mnist.train.num_examples/BATCH_SIZE, LEARNING_RATE_DECAY )

#10. Gradient descent algorithm to optimize the loss function
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize( loss, global_step=global_step)

#11. Group several operations into one. Each training step must both update the network parameters through back-propagation and update the moving average of each parameter
with tf.control_dependencies( [train_step, variable_averages_op] ):
	train_op = tf.no_op( name='train' )

#12. Check whether the prediction of the moving average model is correct. tf.argmax(average_y, 1) returns a one-dimensional tensor with one entry per sample in the batch; each entry is the index of the largest output, i.e. the predicted digit
correct_prediction = tf.equal( tf.argmax(average_y, 1), tf.argmax(y_, 1) )
	#Calculate the accuracy: cast the boolean values to floats and take their mean
accuracy = tf.reduce_mean( tf.cast( correct_prediction, tf.float32 ) )

#13. Create a session with a context manager
with tf.Session() as sess:
	#Initialize variables
	tf.global_variables_initializer().run()
	#validation dataset
	validate_feed = { x:mnist.validation.images, y_:mnist.validation.labels }
	#test dataset
	test_feed = { x:mnist.test.images, y_:mnist.test.labels }
	#Iteratively train the neural network
	for i in range(TRAINING_STEPS):
		#14. Output the test results on the validation set and test set every 1000 rounds
		if i%1000==0:
			#Accuracy on validation set
			validate_acc = sess.run( accuracy, feed_dict = validate_feed )
			print("Accuracy of the current model on the validation set: ", validate_acc)
			#Accuracy on the test set
			test_acc = sess.run( accuracy, feed_dict=test_feed )
			print("Accuracy of the current training model on the test set: ", test_acc)

		#15. Use a batch training set per round
		xs,ys = mnist.train.next_batch(BATCH_SIZE)
		sess.run( train_op, feed_dict={ x:xs, y_:ys } )

	#16. Accuracy on the final test set
	test_acc = sess.run( accuracy, feed_dict=test_feed )
	print("Accuracy of the final training model on the test set: ", test_acc)
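
The two schedules configured in steps 4 and 9 can be checked by hand. The sketch below is plain Python, not TensorFlow: the function names exponential_decay and ema_update are illustrative, and it assumes the default behaviour of tf.train.exponential_decay (no staircase) and of tf.train.ExponentialMovingAverage when a step counter is passed, where the effective decay is capped at (1 + step) / (10 + step). The value 550 for decay_steps assumes the default split of 55,000 training samples.

```python
def exponential_decay(base_rate, global_step, decay_steps, decay_rate):
    """Learning rate after global_step steps (continuous, non-staircase decay)."""
    return base_rate * decay_rate ** (global_step / decay_steps)


def ema_update(shadow, value, decay, num_updates):
    """One moving-average update of a shadow value.

    Early in training the effective decay is capped so that the shadow
    value follows the real parameter quickly.
    """
    effective_decay = min(decay, (1 + num_updates) / (10 + num_updates))
    return effective_decay * shadow + (1 - effective_decay) * value


# Assumed: 55,000 training samples / batch size 100 = 550 steps per pass.
DECAY_STEPS = 550

for step in (0, 550, 5500):
    print(step, exponential_decay(0.8, step, DECAY_STEPS, 0.99))

# At step 0 the cap gives an effective decay of 0.1 instead of 0.99.
print(ema_update(shadow=0.0, value=1.0, decay=0.99, num_updates=0))
```

With these settings the learning rate is multiplied by 0.99 after each full pass over the training data, and the moving average only reaches its nominal decay of 0.99 once the step counter is large enough.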

Knowledge used:

1. Loss functions for regression problems and classification problems?
Classification problems usually use the cross-entropy loss function, and the output layer has one node per category. Regression problems usually use the mean squared error (MSE) loss function; the output layer generally has a single node, and the value of that node is the predicted value.
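
As a minimal sketch of the two loss functions, the plain Python below computes each one for a single small example. The function names mean_squared_error and cross_entropy are illustrative and are not TensorFlow APIs; the cross-entropy version assumes the prediction is already a probability distribution and the target is one-hot.

```python
import math


def mean_squared_error(predictions, targets):
    """Average of the squared differences, used for regression."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def cross_entropy(predicted_probs, one_hot_target):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    eps = 1e-12  # avoids log(0) when a predicted probability is exactly zero
    return -sum(t * math.log(max(p, eps))
                for p, t in zip(predicted_probs, one_hot_target))


# Regression: one output node, the value itself is the prediction.
print(mean_squared_error([2.5], [3.0]))

# Classification: one output node per category (three categories here).
print(cross_entropy([0.1, 0.7, 0.2], [0, 1, 0]))
```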

2. Cross-entropy loss function?

Cross-entropy measures the distance between two probability distributions, the expected output p and the actual (predicted) output q: H(p, q) = -sum over x of p(x) log q(x). The smaller the cross-entropy, the closer the two distributions are. In neural networks, cross-entropy is usually combined with the softmax function, which turns the raw outputs of the network into a probability distribution. For a batch of n samples and m categories, the terms p(x) log q(x) form an n x m matrix; summing each row gives one loss value per sample, and averaging those n values gives the mean cross-entropy of the whole batch. In the code above, tf.nn.sparse_softmax_cross_entropy_with_logits returns the n per-sample losses and tf.reduce_mean averages them.
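
The following plain Python sketch mirrors that computation for a tiny batch: softmax on the raw outputs, one cross-entropy value per sample, then the batch mean. It is an illustration and not the TensorFlow implementation; the function names and the example logits are made up for the demonstration.

```python
import math


def softmax(logits):
    """Turn raw outputs into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum keeps exp() from overflowing
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_softmax_cross_entropy(batch_logits, class_ids):
    """One loss per sample; labels are class indices, not one-hot vectors."""
    losses = []
    for logits, class_id in zip(batch_logits, class_ids):
        shift = max(logits)
        log_sum = math.log(sum(math.exp(z - shift) for z in logits))
        # -log(softmax(logits)[class_id]), computed without forming the ratio
        losses.append(log_sum - (logits[class_id] - shift))
    return losses


batch_logits = [[2.0, 1.0, 0.1], [0.5, 0.5, 0.5]]  # n = 2 samples, m = 3 classes
class_ids = [0, 2]                                  # same role as tf.argmax(y_, 1)

per_sample = sparse_softmax_cross_entropy(batch_logits, class_ids)
batch_loss = sum(per_sample) / len(per_sample)
print(per_sample, batch_loss)
```

The second sample has equal logits, so its predicted distribution is uniform and its loss equals log(3).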

3. Gradient descent algorithm?

Batch gradient descent: It is guaranteed to reach the global optimum only when the loss function is convex; otherwise it may stop at a local optimum. Each step is also slow, because the loss function J is the sum of the losses over all training data.

Stochastic gradient descent: To speed up training, each iteration optimizes the loss on a single, randomly chosen training sample. Because each update follows only one sample, the result may not even reach a local optimum.

Mini-batch gradient descent: A compromise between batch gradient descent and stochastic gradient descent. Each iteration computes the loss function on a small batch of training data and uses it to update the parameters of the neural network.
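
The three variants differ only in how many samples feed each update. The sketch below, in plain Python, fits a hypothetical line y = 2x + 1 with a single routine in which batch_size selects the variant; the function name, data, learning rate and epoch count are all illustrative choices, not values from the article.

```python
import random


def minibatch_gradient_descent(xs, ys, batch_size, learning_rate=0.2,
                               epochs=500, seed=0):
    """Fit y = w * x + b by minimising the mean squared error.

    batch_size == len(xs) gives batch gradient descent,
    batch_size == 1 gives stochastic gradient descent,
    anything in between gives mini-batch gradient descent.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new random order
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            errors = [w * xs[i] + b - ys[i] for i in batch]
            grad_w = sum(2 * e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            grad_b = sum(2 * e for e in errors) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


xs = [i / 8 for i in range(8)]   # eight inputs in [0, 1)
ys = [2 * x + 1 for x in xs]     # noise-free targets

for size in (len(xs), 1, 4):     # batch, stochastic, mini-batch
    print(size, minibatch_gradient_descent(xs, ys, batch_size=size))
```

On this small noise-free problem all three settings recover the same line; on real data the batch size trades the cost of each update against the noise in the gradient.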

4. What is the role of validation data?

The ultimate goal of a neural network model is to make predictions on unseen data. Because the test data must not influence the choice of model, a validation set is generally held out from the training data and used to tune the model's hyperparameters (when the amount of data is small, cross-validation can be used instead). The tuned model is then evaluated on the test data.
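
Both ways of obtaining validation data can be sketched in a few lines of plain Python. The helper names train_validation_split and k_fold_indices are hypothetical and are not part of TensorFlow or the MNIST loader; the sizes used are only for illustration.

```python
import random


def train_validation_split(samples, validation_size, seed=0):
    """Shuffle the samples and hold out validation_size of them."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[validation_size:], shuffled[:validation_size]


def k_fold_indices(num_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(num_samples))
    fold_size = num_samples // k
    for fold in range(k):
        start = fold * fold_size
        # the last fold absorbs the remainder when k does not divide evenly
        end = start + fold_size if fold < k - 1 else num_samples
        yield indices[:start] + indices[end:], indices[start:end]


# Hold-out split: 60 samples, 5 of them reserved for validation.
train, validation = train_validation_split(range(60), validation_size=5)
print(len(train), len(validation))

# Cross-validation: every sample is used for validation exactly once.
for train_idx, validation_idx in k_fold_indices(10, k=3):
    print(validation_idx)
```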

References:

1. "TensorFlow combat Google deep learning framework"

