This article continues our study of techniques for optimizing neural network models in TensorFlow.
A neural network is trained in two phases:
1) Forward propagation computes the predicted values and the loss, i.e. the gap between the predictions and the true values;
2) Backpropagation computes the gradient of the loss function with respect to each parameter, and gradient descent then updates each parameter according to its gradient and the learning rate.
Three ways to further optimize a neural network model:
an exponentially decaying learning rate, adding a regularization term to the loss function, and a moving-average model.
1. How to use an exponentially decaying learning rate in TensorFlow?
tf.train.exponential_decay() generates a decayed learning rate; with decay_steps set to the number of batches per epoch (and staircase=True), the rate is reduced once per complete pass through the training data.
The idea is to start with a relatively large learning rate to approach a good solution quickly, then gradually shrink the rate as iteration continues so training becomes more stable.
The learning rate used in each round of optimization is:
decayed_learning_rate = initial_learning_rate * decay_rate ^ (global_step / decay_steps)
where decay_steps is the decay speed: the number of iterations needed for one pass over the training data (TRAIN_SIZE / BATCH_SIZE).
learning_rate = tf.train.exponential_decay( LEARNING_RATE_BASE, global_step, TRAIN_SIZE/BATCH_SIZE, LEARNING_RATE_DECAY )
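The schedule can be reproduced in plain Python to see how the rate evolves (the constants below are illustrative, not from the original):

```python
# Plain-Python mirror of tf.train.exponential_decay:
# decayed = base_rate * decay_rate ** (global_step / decay_steps)
def exponential_decay(base_rate, global_step, decay_steps, decay_rate, staircase=False):
    exponent = global_step / decay_steps
    if staircase:
        # Integer division: the rate drops in steps, once per full pass over the data
        exponent = global_step // decay_steps
    return base_rate * decay_rate ** exponent

# Illustrative values: base rate 0.1, decay rate 0.96, 100 batches per epoch
print(exponential_decay(0.1, 0, 100, 0.96))          # 0.1 at the start
print(exponential_decay(0.1, 100, 100, 0.96))        # 0.096 after one epoch
print(exponential_decay(0.1, 150, 100, 0.96, True))  # still 0.096 with staircase=True
```

With staircase=False the rate decays smoothly every step; with staircase=True it stays constant within each epoch and drops at epoch boundaries.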
2. How to apply L2 regularization in TensorFlow?
tf.contrib.layers.l2_regularizer(LAMBDA)(weight)
LAMBDA is the regularization weight: it controls the proportion of the model-complexity loss within the total loss.
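To make the term concrete, here is a plain-Python sketch of the quantity this regularizer computes, assuming (as in TensorFlow's tf.nn.l2_loss) that the sum of squares is halved:

```python
# Plain-Python sketch of tf.contrib.layers.l2_regularizer(LAMBDA)(weight):
# LAMBDA * sum(w^2) / 2  (tf.nn.l2_loss halves the sum of squares)
def l2_regularization(lam, weights):
    return lam * sum(w * w for w in weights) / 2

weights = [1.0, -2.0, 3.0]              # illustrative weight values
print(l2_regularization(0.5, weights))  # 0.5 * (1 + 4 + 9) / 2 = 3.5
```

In a real model this term is added to the data loss (e.g. cross-entropy) to form the total loss, so larger weights are penalized and the model is discouraged from overfitting.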
3. How to use a moving average in TensorFlow?
Principle: during training, maintain and update a moving average of each parameter; during validation and testing, use each parameter's moving average instead of its current value. This often improves the model's accuracy on unseen data.
tf.train.ExponentialMovingAverage() implements the moving-average model. It uses exponential decay to compute the moving average of each variable; the decay rate controls how fast the averages track the model. When a num_updates argument (step below) is supplied, the effective decay rate is min{ decay, (1 + num_updates) / (10 + num_updates) }, so the averages move faster early in training.
shadow_variable = decay * shadow_variable + (1 - decay) * variable
Here shadow_variable is the moving average after the previous update, and variable is the current value of the variable in this round.
The apply() method creates a shadow copy for each training variable and returns an operation that updates the moving averages held in those shadow copies; run this operation after each training step to keep the averages current.
The average() and average_name() methods return a variable's shadow variable and its name, respectively.
```python
import tensorflow as tf

# Define a variable whose moving average we will track
v1 = tf.Variable(0, dtype=tf.float32)
# A step variable simulating the number of training iterations; it controls the decay rate
step = tf.Variable(0, trainable=False)

# Create the moving-average class with base decay rate 0.99 and step as num_updates
variable_averages = tf.train.ExponentialMovingAverage(0.99, step)
# Op that updates the moving averages of the listed variables each time it is run
variable_averages_op = variable_averages.apply([v1])

with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)

    # 1. Before any update, the shadow variable equals the initial value of v1
    print(sess.run([v1, variable_averages.average(v1)]))  # [0.0, 0.0]

    # 2. Update v1 to 5
    sess.run(tf.assign(v1, 5))
    # Decay rate = min{ 0.99, (1 + step)/(10 + step) = 0.1 } = 0.1
    # Shadow variable = 0.1 * 0 + 0.9 * 5 = 4.5
    sess.run(variable_averages_op)
    print(sess.run([v1, variable_averages.average(v1)]))  # [5.0, 4.5]

    # 3. Update step to 10000 and v1 to 10
    sess.run(tf.assign(step, 10000))
    sess.run(tf.assign(v1, 10))
    # Decay rate = min{ 0.99, 10001/10010 ≈ 0.999 } = 0.99
    # Shadow variable = 0.99 * 4.5 + 0.01 * 10 = 4.555
    sess.run(variable_averages_op)
    print(sess.run([v1, variable_averages.average(v1)]))  # [10.0, 4.555]

    # 4. Run the update once more
    # Shadow variable = 0.99 * 4.555 + 0.01 * 10 = 4.60945
    sess.run(variable_averages_op)
    print(sess.run([v1, variable_averages.average(v1)]))  # [10.0, 4.60945]
```
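The shadow-variable update rule is easy to verify by hand. Below is a plain-Python sketch of one update step, assuming the min{ decay, (1 + num_updates)/(10 + num_updates) } cap that ExponentialMovingAverage applies when a num_updates argument is given:

```python
# Plain-Python sketch of one shadow-variable update, with the
# num_updates-based cap on the decay rate.
def ema_update(shadow, value, base_decay, num_updates):
    decay = min(base_decay, (1 + num_updates) / (10 + num_updates))
    return decay * shadow + (1 - decay) * value

shadow = 0.0
shadow = ema_update(shadow, 5, 0.99, 0)       # decay = 0.1  -> 0.1*0 + 0.9*5 = 4.5
print(shadow)
shadow = ema_update(shadow, 10, 0.99, 10000)  # decay = 0.99 -> 0.99*4.5 + 0.1 = 4.555
print(shadow)
shadow = ema_update(shadow, 10, 0.99, 10000)  # decay = 0.99 -> 4.60945
print(shadow)
```

Note how the cap makes the decay rate small (0.1) when num_updates is small, so the average catches up quickly at the start of training, and approaches the base rate (0.99) later, so the average becomes very stable.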
References:
1. "TensorFlow: Google Deep Learning Framework in Practice" (《TensorFlow实战Google深度学习框架》)