This article continues our study of techniques for optimizing neural network models in TensorFlow.
A neural network is trained in two phases:
1) Forward propagation computes the predicted values and the loss, i.e. the gap between the predictions and the true values;
2) Backpropagation computes the gradient of the loss function with respect to each parameter, and gradient descent then updates each parameter according to its gradient and the learning rate.
Three ways to further optimize a neural network model:
an exponentially decaying learning rate, adding a regularization term to the loss function, and a moving-average model.
1. How to use an exponentially decaying learning rate in TensorFlow?
tf.train.exponential_decay() generates a decayed learning rate; with decay_steps set to the number of batches per epoch (and staircase=True), the rate is reduced once per complete pass through the training data.
The idea is to start with a relatively large learning rate to approach a good solution quickly, then gradually shrink the rate as iteration continues so training becomes more stable.
The learning rate used in each round of optimization is:
decayed_learning_rate = initial_learning_rate * decay_rate ^ (global_step / decay_steps)
where decay_steps is the decay speed: the number of iterations needed for one pass over the training data (TRAIN_SIZE / BATCH_SIZE).
learning_rate = tf.train.exponential_decay( LEARNING_RATE_BASE, global_step, TRAIN_SIZE/BATCH_SIZE, LEARNING_RATE_DECAY )
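The schedule can be reproduced in plain Python to see how the rate evolves (the constants below are illustrative, not from the original):

```python
# Plain-Python mirror of tf.train.exponential_decay:
# decayed = base_rate * decay_rate ** (global_step / decay_steps)
def exponential_decay(base_rate, global_step, decay_steps, decay_rate, staircase=False):
    exponent = global_step / decay_steps
    if staircase:
        # Integer division: the rate drops in steps, once per full pass over the data
        exponent = global_step // decay_steps
    return base_rate * decay_rate ** exponent

# Illustrative values: base rate 0.1, decay rate 0.96, 100 batches per epoch
print(exponential_decay(0.1, 0, 100, 0.96))          # 0.1 at the start
print(exponential_decay(0.1, 100, 100, 0.96))        # 0.096 after one epoch
print(exponential_decay(0.1, 150, 100, 0.96, True))  # still 0.096 with staircase=True
```

With staircase=False the rate decays smoothly every step; with staircase=True it stays constant within each epoch and drops at epoch boundaries.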
2. How to apply L2 regularization in TensorFlow?
tf.contrib.layers.l2_regularizer(LAMBDA)(weight)
LAMBDA is the regularization weight: it controls the proportion of the model-complexity loss within the total loss.
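To make the term concrete, here is a plain-Python sketch of the quantity this regularizer computes, assuming (as in TensorFlow's tf.nn.l2_loss) that the sum of squares is halved:

```python
# Plain-Python sketch of tf.contrib.layers.l2_regularizer(LAMBDA)(weight):
# LAMBDA * sum(w^2) / 2  (tf.nn.l2_loss halves the sum of squares)
def l2_regularization(lam, weights):
    return lam * sum(w * w for w in weights) / 2

weights = [1.0, -2.0, 3.0]              # illustrative weight values
print(l2_regularization(0.5, weights))  # 0.5 * (1 + 4 + 9) / 2 = 3.5
```

In a real model this term is added to the data loss (e.g. cross-entropy) to form the total loss, so larger weights are penalized and the model is discouraged from overfitting.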
3. How to use a moving average in TensorFlow?
Principle: during training, maintain and update a moving average of each parameter; during validation and testing, use each parameter's moving average instead of its current value. This often improves the model's accuracy on unseen data.
tf.train.ExponentialMovingAverage() implements the moving-average model. It uses exponential decay to compute the moving average of each variable; the decay rate controls how fast the averages track the model. When a num_updates argument (step below) is supplied, the effective decay rate is min{ decay, (1 + num_updates) / (10 + num_updates) }, so the averages move faster early in training.
shadow_variable = decay * shadow_variable + (1 - decay) * variable
Here shadow_variable is the moving average after the previous update, and variable is the current value of the variable in this round.
The apply() method creates a shadow copy for each training variable and returns an operation that updates the moving averages held in those shadow copies; run this operation after each training step to keep the averages current.
The average() and average_name() methods return a variable's shadow variable and its name, respectively.
```python
import tensorflow as tf

# Define a variable whose moving average we will track
v1 = tf.Variable(0, dtype=tf.float32)
# A step variable simulating the number of training iterations; it controls the decay rate
step = tf.Variable(0, trainable=False)

# Create the moving-average class with base decay rate 0.99 and step as num_updates
variable_averages = tf.train.ExponentialMovingAverage(0.99, step)
# Op that updates the moving averages of the listed variables each time it is run
variable_averages_op = variable_averages.apply([v1])

with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)

    # 1. Before any update, the shadow variable equals the initial value of v1
    print(sess.run([v1, variable_averages.average(v1)]))  # [0.0, 0.0]

    # 2. Update v1 to 5
    sess.run(tf.assign(v1, 5))
    # Decay rate = min{ 0.99, (1 + step)/(10 + step) = 0.1 } = 0.1
    # Shadow variable = 0.1 * 0 + 0.9 * 5 = 4.5
    sess.run(variable_averages_op)
    print(sess.run([v1, variable_averages.average(v1)]))  # [5.0, 4.5]

    # 3. Update step to 10000 and v1 to 10
    sess.run(tf.assign(step, 10000))
    sess.run(tf.assign(v1, 10))
    # Decay rate = min{ 0.99, 10001/10010 ≈ 0.999 } = 0.99
    # Shadow variable = 0.99 * 4.5 + 0.01 * 10 = 4.555
    sess.run(variable_averages_op)
    print(sess.run([v1, variable_averages.average(v1)]))  # [10.0, 4.555]

    # 4. Run the update once more
    # Shadow variable = 0.99 * 4.555 + 0.01 * 10 = 4.60945
    sess.run(variable_averages_op)
    print(sess.run([v1, variable_averages.average(v1)]))  # [10.0, 4.60945]
```
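The shadow-variable update rule is easy to verify by hand. Below is a plain-Python sketch of one update step, assuming the min{ decay, (1 + num_updates)/(10 + num_updates) } cap that ExponentialMovingAverage applies when a num_updates argument is given:

```python
# Plain-Python sketch of one shadow-variable update, with the
# num_updates-based cap on the decay rate.
def ema_update(shadow, value, base_decay, num_updates):
    decay = min(base_decay, (1 + num_updates) / (10 + num_updates))
    return decay * shadow + (1 - decay) * value

shadow = 0.0
shadow = ema_update(shadow, 5, 0.99, 0)       # decay = 0.1  -> 0.1*0 + 0.9*5 = 4.5
print(shadow)
shadow = ema_update(shadow, 10, 0.99, 10000)  # decay = 0.99 -> 0.99*4.5 + 0.1 = 4.555
print(shadow)
shadow = ema_update(shadow, 10, 0.99, 10000)  # decay = 0.99 -> 4.60945
print(shadow)
```

Note how the cap makes the decay rate small (0.1) when num_updates is small, so the average catches up quickly at the start of training, and approaches the base rate (0.99) later, so the average becomes very stable.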
References:
1. "TensorFlow: Google Deep Learning Framework in Practice" (《TensorFlow实战Google深度学习框架》)