TensorFlow: tf.variable_scope, tf.summary, tf.gradients

1 Scopes

1.1 Namespace: tf.name_scope()

TensorFlow does not know which nodes belong together unless it is told explicitly. Wrapping the graph-building code in tf.name_scope(name) groups the ops created inside it under that name:

with tf.name_scope('data'):
	...

with tf.name_scope('loss'):
	...

with tf.name_scope('optimizer'):
	...
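
For example, a minimal sketch (the placeholder shapes and names here are illustrative, not from the original post) showing how the scope name becomes a prefix on every op created inside it, which is how TensorBoard groups them:

import tensorflow as tf

with tf.name_scope('data'):
	x = tf.placeholder(tf.float32, shape=[None, 100], name='x')  # -> 'data/x:0'

with tf.name_scope('loss'):
	w = tf.Variable(tf.zeros([100, 10]), name='w')                # -> 'loss/w:0'
	logits = tf.matmul(x, w)                                      # -> 'loss/MatMul:0'

print(x.name, w.name, logits.name)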

1.2 Variable scope: tf.variable_scope()

First, let's look at an example of a function definition:

def two_hidden_layers(x):
	w1 = tf.Variable(tf.random_normal([100,50]), name='h1_weight')
	b1 = tf.Variable(tf.zeros([50]), name='h1_biases')
	h1 = tf.matmul(x, w1) + b1
	
	w2 = tf.Variable(tf.random_normal([50,10]), name='h2_weight')
	b2 = tf.Variable(tf.zeros([10]), name='h2_biases')
	logits = tf.matmul(h1, w2) + b2
	return logits

According to the above definition, what graph will be constructed in the following two lines?

logits1 = two_hidden_layers(x1)
logits2 = two_hidden_layers(x2)

We will get two separate sets of parameter variables, but what we actually want is for both inputs to use the same weights w and biases b.
How can the function be defined to do that? Through tf.get_variable(): it looks a variable up by name, returning the existing one when the enclosing scope allows reuse, and creating it with the given initializer otherwise. The modified code is as follows:

def two_hidden_layers(x):
	assert x.shape.as_list() == [200, 100]
	w1 = tf.get_variable("h1_weight", [100,50], initializer=tf.random_normal_initializer())
	b1 = tf.get_variable("h1_biases", [50], initializer=tf.constant_initializer(0.0))
	h1 = tf.matmul(x, w1) + b1
	assert h1.shape.as_list() == [200,50]
	w2 = tf.get_variable("h2_weights", [50,10], initializer=tf.random_normal_initializer())
	b2 = tf.get_variable("h2_biases", [10], initializer=tf.constant_initializer(0.0))
	logits = tf.matmul(h1, w2) + b2
	return logits
with tf.variable_scope('two_layer') as scope:
	logits1 = two_hidden_layers(x1)
	scope.reuse_variables()
	logits2 = two_hidden_layers(x2)

There are three points to note in the code above. First, tf.get_variable() fetches (or creates) a variable by name. Second, tf.variable_scope() places the variables inside a named scope. Third, scope.reuse_variables() switches the scope into reuse mode, so the second call to two_hidden_layers() returns the variables created by the first call. In the resulting graph the same parameter variables are now shared. The code is still somewhat redundant, though, so let's simplify it once more:

def fully_connected(x, output_dim, scope):
	with tf.variable_scope(scope, reuse=tf.AUTO_REUSE) as scope:
		w = tf.get_variable("weights", [x.shape[1], output_dim], initializer=tf.random_normal_initializer())
		b = tf.get_variable("biases", [output_dim], initializer=tf.constant_initializer(0.0))
		return tf.matmul(x, w) + b

def two_hidden_layers(x):
	h1 = fully_connected(x, 50, 'h1')
	h2 = fully_connected(h1, 10, 'h2')
	return h2

with tf.variable_scope('two_layers') as scope:
	logits1 = two_hidden_layers(x1)
	logits2 = two_hidden_layers(x2)

Defining a fully_connected helper removes the duplicated layer code, and setting reuse=tf.AUTO_REUSE in tf.variable_scope() creates the variables on the first call and reuses them on every later call, without having to call scope.reuse_variables() by hand.
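
As a quick sanity check (assuming x1 and x2 above are placeholders of shape [200, 100], matching the earlier assert), listing the global variables after building the graph confirms that only one copy of each weight and bias exists even though the network was constructed twice:

# prints two_layers/h1/weights:0, two_layers/h1/biases:0,
#        two_layers/h2/weights:0, two_layers/h2/biases:0
for v in tf.global_variables():
	print(v.name)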

2 tf.train.Saver

tf.train.Saver saves the variables of the current session to a checkpoint file. Saving and restoring go through the following calls:

# save
tf.train.Saver.save(sess, save_path, global_step=None, ...)
# restore
tf.train.Saver.restore(sess, save_path)
# restore the latest checkpoint in a directory
tf.train.Saver.restore(sess, tf.train.latest_checkpoint(checkpoint_dir))

Note that tf.train.Saver only saves variables, not graphs. You can specify specific variables to save, as shown below:

v1 = tf.Variable(..., name='v1')
v2 = tf.Variable(..., name='v2')
# pass a dict mapping names to variables
saver = tf.train.Saver({'v1': v1, 'v2': v2})
# or pass a list of variables
saver = tf.train.Saver([v1, v2])
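
A minimal save-and-restore round trip might look like the sketch below; the checkpoint path 'checkpoints/model' and the variable shapes are illustrative choices, not from the original post:

import os
import tensorflow as tf

v1 = tf.Variable(tf.zeros([3]), name='v1')
v2 = tf.Variable(tf.ones([3]), name='v2')
saver = tf.train.Saver()  # saves all variables by default

os.makedirs('checkpoints', exist_ok=True)
with tf.Session() as sess:
	sess.run(tf.global_variables_initializer())
	# the global_step value is appended to the file name, e.g. checkpoints/model-100
	saver.save(sess, 'checkpoints/model', global_step=100)

with tf.Session() as sess:
	# the graph must already be built; Saver only restores variable values
	saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
	print(sess.run(v1), sess.run(v2))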

2.1 global step

Let's first look at how to define and track the global step during training. Define a global_step variable with trainable=False so the optimizer will not update it through gradients, then pass it to the optimizer's minimize() call; each time a training step runs, the counter is incremented by one automatically, as shown below:

global_step = tf.Variable(0, dtype=tf.int32, trainable=False, name='global_step')
optimizer = tf.train.AdamOptimizer(lr).minimize(loss, global_step=global_step)
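
The counter can then be read back with sess.run(global_step), and passing it to saver.save() appends the step count to the checkpoint file name. A small sketch, where the model and data are placeholders purely for illustration:

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 1])
y = tf.placeholder(tf.float32, shape=[None, 1])
w = tf.get_variable('w', [1, 1], initializer=tf.zeros_initializer())
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))

global_step = tf.Variable(0, dtype=tf.int32, trainable=False, name='global_step')
optimizer = tf.train.AdamOptimizer(0.01).minimize(loss, global_step=global_step)

with tf.Session() as sess:
	sess.run(tf.global_variables_initializer())
	for _ in range(3):
		sess.run(optimizer, feed_dict={x: [[1.0]], y: [[2.0]]})
	print(sess.run(global_step))  # 3 -- incremented once per training step
	# saver.save(sess, 'checkpoints/model', global_step=global_step) would write model-3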

3 tf.summary

When training a model, changes in the parameters and result values are visualized mainly through the following three functions:

  • tf.summary.scalar
  • tf.summary.histogram
  • tf.summary.image

The main steps for using them are as follows:
Step 1: Create summaries
with tf.name_scope("summaries"):
	tf.summary.scalar("loss", self.loss)
	tf.summary.scalar("accuracy", self.accuracy)
	tf.summary.histogram("histogram loss", self.loss)
	summary_op = tf.summary.merge_all()

Step 2: Run the summary op
Summaries are ops themselves, so once they are created they have to be executed in a session:

loss_value, _, summary = sess.run([loss, optimizer, summary_op])

Step 3: Write summaries into the file

writer = tf.summary.FileWriter('./graphs', sess.graph)
writer.add_summary(summary, global_step=step)
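
The list above also mentions tf.summary.image, which logs image tensors of shape [batch, height, width, channels] so they can be inspected visually in TensorBoard. A minimal sketch, assuming a batch of 28x28 grayscale images as input:

images = tf.placeholder(tf.float32, shape=[None, 28, 28, 1], name='images')
# log up to 3 images per batch for display in TensorBoard
tf.summary.image("input_images", images, max_outputs=3)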

Putting the pieces together, the overall flow for training, checkpointing, and writing summaries looks like this:

import os
import tensorflow as tf

# create summary ops for the values to track
tf.summary.scalar("loss", loss)
tf.summary.histogram("histogram_loss", loss)
summary_op = tf.summary.merge_all()
saver = tf.train.Saver()
with tf.Session() as sess:
	sess.run(tf.global_variables_initializer())
	ckpt = tf.train.get_checkpoint_state(os.path.dirname('checkpoints/checkpoint'))
	# restore from the latest checkpoint if one exists
	if ckpt and ckpt.model_checkpoint_path:
		saver.restore(sess, ckpt.model_checkpoint_path)
	writer = tf.summary.FileWriter('./graphs', sess.graph)
	for step in range(1000):
		...
		# use a new name so the loss tensor is not overwritten inside the loop
		loss_value, _, summary = sess.run([loss, optimizer, summary_op], feed_dict=feed_dict)
		writer.add_summary(summary, global_step=step)
		if (step+1) % 1000 == 0:
			saver.save(sess, 'checkpoints/model', global_step=step)
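
Once summaries have been written, the graph and the logged values can be viewed by launching TensorBoard on the log directory, for example tensorboard --logdir=./graphs, and opening the URL it prints in a browser.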

4 gradients

TensorFlow computes gradients automatically by applying the chain rule over the computation graph. The original post illustrates this with a small graph in which an output f is computed from intermediate values q and z, which in turn depend on the inputs x and y, and asks for the derivative of f at each of f, q, z, y, and x.
Working backward from the output: the derivative of f with respect to itself is 1; the derivative of f with respect to q is that 1 multiplied by ∂f/∂q, which equals z (the value -4 in the example); and the chain continues the same way back to y and x.
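
To make the numbers concrete, here is a hedged sketch of that style of graph with tf.gradients. The values x = -2 and y = 5 are assumed for illustration (only z = -4 appears in the text), with q = x + y and f = q * z:

import tensorflow as tf

x = tf.constant(-2.0)  # assumed value, for illustration
y = tf.constant(5.0)   # assumed value, for illustration
z = tf.constant(-4.0)  # the value -4 mentioned in the text

q = x + y   # q = 3
f = q * z   # f = -12

# chain rule, backward from f:
# df/dq = z = -4, df/dz = q = 3, df/dx = df/dq * dq/dx = -4, df/dy = -4
grads = tf.gradients(f, [q, z, x, y])
with tf.Session() as sess:
	print(sess.run(grads))  # [-4.0, 3.0, -4.0, -4.0]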

4.1 tf.gradients

Let's implement a simple differentiation example:

# -*-coding:utf8 -*-
import tensorflow as tf

def main():
	x = tf.Variable(2.0)
	y = 2.0 * (x ** 3)
	z = 3.0 + y**2
	grad_z = tf.gradients(z, [x,y])
	with tf.Session() as sess:
		sess.run(x.initializer)
		print(sess.run(grad_z))

if __name__=='__main__':
	main()

Running it prints [768.0, 32.0]: with x = 2 we have y = 2x^3 = 16, so dz/dy = 2y = 32 and dz/dx = dz/dy * dy/dx = 32 * 6x^2 = 768.

TensorFlow's gradient-related functions include the following (a small gradient-clipping sketch follows the list):

  • tf.gradients(ys, xs, grad_ys=None, …): computes the gradients of ys with respect to xs
  • tf.stop_gradient(input, name=None): treats input as a constant during backpropagation, so the parameters that produced it receive no gradient updates
  • tf.clip_by_value(t, clip_value_min, clip_value_max, name=None): clips the values of a tensor (e.g. a gradient) to a given range
  • tf.clip_by_norm(t, clip_norm, axes=None, name=None): rescales a tensor so that its norm does not exceed clip_norm
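
A common use of the clipping ops is gradient clipping between compute_gradients and apply_gradients. The sketch below is illustrative; the model, learning rate, and the [-1, 1] clipping range are assumptions, not values from the original post:

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 10])
y = tf.placeholder(tf.float32, shape=[None, 1])
w = tf.get_variable('w', [10, 1], initializer=tf.random_normal_initializer())
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))

optimizer = tf.train.AdamOptimizer(0.01)
# compute the raw gradients instead of calling minimize() directly
grads_and_vars = optimizer.compute_gradients(loss)
# clip each gradient to [-1, 1] before applying it
clipped = [(tf.clip_by_value(g, -1.0, 1.0), v) for g, v in grads_and_vars if g is not None]
train_op = optimizer.apply_gradients(clipped)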

Origin blog.csdn.net/BGoodHabit/article/details/109442376