1 Scopes
1.1 Name scope: tf.name_scope()
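TensorFlow does not know which nodes should be grouped together unless you tell it explicitly with tf.name_scope(name) when defining the graph. Ops created inside the with block get the scope name as a prefix (for example loss/...), and TensorBoard displays each scope as a single collapsible node: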
with tf.name_scope('data'):
    ...
with tf.name_scope('loss'):
    ...
with tf.name_scope('optimizer'):
    ...
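The sketch below is a toy model of how a name scope prefixes the names of the ops created inside it. It is plain Python and needs no TensorFlow. The ToyGraph class and its name_scope and add_op methods are hypothetical stand-ins, not TensorFlow API. Real TensorFlow also makes duplicate names unique (for example loss_1), and this sketch leaves that out.

```python
from contextlib import contextmanager


class ToyGraph:
    """Hypothetical stand-in for a graph: it only records op names."""

    def __init__(self):
        self.ops = []       # fully qualified op names, in creation order
        self._scopes = []   # stack of currently open scope names

    @contextmanager
    def name_scope(self, name):
        # Push the scope on entry and pop it on exit, even if an error occurs.
        self._scopes.append(name)
        try:
            yield
        finally:
            self._scopes.pop()

    def add_op(self, name):
        # Join the open scopes and the op name with "/".
        full_name = "/".join(self._scopes + [name])
        self.ops.append(full_name)
        return full_name


g = ToyGraph()
with g.name_scope("data"):
    g.add_op("x")
with g.name_scope("loss"):
    g.add_op("mean")
with g.name_scope("optimizer"):
    g.add_op("apply_gradients")

print(g.ops)  # ['data/x', 'loss/mean', 'optimizer/apply_gradients']
```

The shared "/"-separated prefix is what lets a graph viewer fold related ops into one group.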
1.2 Variable scope: tf.variable_scope()
First, let's look at an example of a function definition:
def two_hidden_layers(x):
    w1 = tf.Variable(tf.random_normal([100,50]), name='h1_weight')
    b1 = tf.Variable(tf.zeros([50]), name='h1_biases')
    h1 = tf.matmul(x, w1) + b1
    w2 = tf.Variable(tf.random_normal([50,10]), name='h2_weight')
    b2 = tf.Variable(tf.zeros([10]), name='h2_biases')
    logits = tf.matmul(h1, w2) + b2
    return logits
According to the above definition, what graph will be constructed in the following two lines?
logits1 = two_hidden_layers(x1)
logits2 = two_hidden_layers(x2)
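Each call to tf.Variable() creates a new variable, so we get two sets of parameter variables. What we actually want is for both inputs to share the same weights w and biases b.
The fix is to create the variables with tf.get_variable() instead. tf.get_variable() looks a variable up by name. If reuse is enabled in the enclosing variable scope, it returns the existing variable with that name. Otherwise it creates a new variable and initializes it with the given initializer, and requesting a name that already exists raises an error. The modified code is as follows: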
def two_hidden_layers(x):
    assert x.shape.as_list() == [200, 100]
    w1 = tf.get_variable("h1_weight", [100,50], initializer=tf.random_normal_initializer())
    b1 = tf.get_variable("h1_biases", [50], initializer=tf.constant_initializer(0.0))
    h1 = tf.matmul(x, w1) + b1
    assert h1.shape.as_list() == [200,50]
    w2 = tf.get_variable("h2_weights", [50,10], initializer=tf.random_normal_initializer())
    b2 = tf.get_variable("h2_biases", [10], initializer=tf.constant_initializer(0.0))
    logits = tf.matmul(h1, w2) + b2
    return logits
with tf.variable_scope('two_layer') as scope:
    logits1 = two_hidden_layers(x1)
    scope.reuse_variables()
    logits2 = two_hidden_layers(x2)
There are three points to note in the code above:
- tf.get_variable() creates or retrieves a variable by name.
- tf.variable_scope() places the variables in a named scope.
- scope.reuse_variables() switches that scope to reuse mode, so the second call retrieves the variables created by the first call instead of creating new ones.
In the constructed graph, both calls now share the same parameter variables. The code is still somewhat verbose, though, so let us simplify it again:
def fully_connected(x, output_dim, scope):
    # AUTO_REUSE: create the variables on the first call, reuse them on later calls
    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
        w = tf.get_variable("weights", [x.shape.as_list()[1], output_dim], initializer=tf.random_normal_initializer())
        b = tf.get_variable("biases", [output_dim], initializer=tf.constant_initializer(0.0))
        return tf.matmul(x, w) + b
def two_hidden_layers(x):
    h1 = fully_connected(x, 50, 'h1')
    h2 = fully_connected(h1, 10, 'h2')
    return h2
with tf.variable_scope('two_layers') as scope:
    logits1 = two_hidden_layers(x1)
    logits2 = two_hidden_layers(x2)
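The fully_connected function defines a fully connected layer once, which removes the duplicated code. Its variable scope is opened with reuse=tf.AUTO_REUSE, so tf.get_variable() creates the variables on the first call and returns the existing ones on later calls. Both calls to two_hidden_layers() therefore share the same parameters without an explicit reuse_variables() call.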
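The reuse rules can be summarized with a small toy model in plain Python. It is a sketch, not TensorFlow's implementation: ToyVariableStore, AUTO_REUSE and the list-based "tensors" are hypothetical stand-ins. The model follows three rules:
- With reuse disabled (the default), asking for an existing name is an error.
- With reuse=True, asking for a missing name is an error.
- With AUTO_REUSE, the variable is created on the first request and returned afterwards.

```python
AUTO_REUSE = "auto_reuse"  # hypothetical stand-in for tf.AUTO_REUSE


class ToyVariableStore:
    """Hypothetical store that mimics the reuse rules of tf.get_variable."""

    def __init__(self):
        self.variables = {}  # full variable name -> value

    def get_variable(self, scope, name, initializer, reuse=False):
        full_name = scope + "/" + name
        exists = full_name in self.variables
        if reuse is True and not exists:
            raise ValueError(full_name + " does not exist, cannot reuse it")
        if reuse is False and exists:
            raise ValueError(full_name + " already exists, reuse is disabled")
        if not exists:
            # Only a brand-new variable runs its initializer.
            self.variables[full_name] = initializer()
        return self.variables[full_name]


store = ToyVariableStore()


def fully_connected(scope, in_dim, out_dim):
    # AUTO_REUSE: create on the first call, return the same objects afterwards.
    w = store.get_variable(scope, "weights",
                           lambda: [[0.0] * out_dim for _ in range(in_dim)],
                           reuse=AUTO_REUSE)
    b = store.get_variable(scope, "biases", lambda: [0.0] * out_dim,
                           reuse=AUTO_REUSE)
    return w, b


first = fully_connected("two_layers/h1", 100, 50)
second = fully_connected("two_layers/h1", 100, 50)
print(len(store.variables))   # 2: one weight and one bias, shared by both calls
print(first[0] is second[0])  # True: the same object is returned

# Without reuse, requesting an existing name raises an error.
try:
    store.get_variable("two_layers/h1", "weights", lambda: None)
    duplicate_error_raised = False
except ValueError:
    duplicate_error_raised = True
print(duplicate_error_raised)  # True
```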
2 tf.train.Saver
tf.train.Saver saves the values of the variables in a session to checkpoint files and restores them later. Create a Saver object and call its save() and restore() methods:
saver = tf.train.Saver()
# save
saver.save(sess, save_path, global_step=None, ...)
# restore
saver.restore(sess, save_path)
# restore the latest checkpoint found in a directory
saver.restore(sess, tf.train.latest_checkpoint(checkpoint_dir))
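Note that a checkpoint stores only variable values, not the graph, so the same graph must be rebuilt in code before restore() is called. save() also writes a .meta graph file by default, but restore() loads only variable values. You can also choose which variables to save, as shown below:
v1 = tf.Variable(..., name='v1')
v2 = tf.Variable(..., name='v2')
# a dict maps the names used in the checkpoint to variables
saver = tf.train.Saver({'v1': v1, 'v2': v2})
# or a list, in which case each variable is saved under its own name
saver = tf.train.Saver([v1, v2])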
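To see what a checkpoint amounts to, here is a toy model in plain Python. A checkpoint is a mapping from variable names to values. A small state file named checkpoint records the newest path, similar in spirit to the file that tf.train.latest_checkpoint reads. The JSON format and the save, restore and latest_checkpoint helpers are hypothetical stand-ins, not TensorFlow's file format or API. The demo saves only v1 and v2, like passing a var_list to tf.train.Saver, so v3 is left untouched on restore.

```python
import json
import os
import tempfile


def save(values, prefix, global_step=None, var_list=None):
    """Write the selected variables (all of them by default) and record the path."""
    names = var_list if var_list is not None else list(values)
    path = prefix if global_step is None else "%s-%d" % (prefix, global_step)
    with open(path + ".json", "w") as f:
        json.dump({name: values[name] for name in names}, f)
    # Small state file naming the newest checkpoint, read by latest_checkpoint.
    with open(os.path.join(os.path.dirname(path), "checkpoint"), "w") as f:
        f.write(path)
    return path


def latest_checkpoint(checkpoint_dir):
    state_file = os.path.join(checkpoint_dir, "checkpoint")
    if not os.path.exists(state_file):
        return None
    with open(state_file) as f:
        return f.read().strip()


def restore(values, path):
    """Overwrite only the variables that are stored in the checkpoint."""
    with open(path + ".json") as f:
        values.update(json.load(f))


ckpt_dir = tempfile.mkdtemp()
prefix = os.path.join(ckpt_dir, "model")
variables = {"v1": 1.0, "v2": 2.0, "v3": 3.0}
save(variables, prefix, global_step=100, var_list=["v1", "v2"])
variables.update(v1=10.0, v2=20.0)
save(variables, prefix, global_step=200, var_list=["v1", "v2"])

restored = {"v1": 0.0, "v2": 0.0, "v3": 0.0}
latest = latest_checkpoint(ckpt_dir)
restore(restored, latest)
print(os.path.basename(latest))  # model-200
print(restored)                  # {'v1': 10.0, 'v2': 20.0, 'v3': 0.0}
```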
2.1 Global step
Let us first look at how to define and use a global step during training. First, define a global step variable with trainable=False, so that the optimizer never changes it through gradient updates. Then pass it to minimize(). Each time the resulting training op runs, it increments global_step by one, as shown below:
global_step = tf.Variable(0, dtype=tf.int32, trainable=False, name='global_step')
optimizer = tf.train.AdamOptimizer(lr).minimize(loss, global_step=global_step)
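Here is a minimal plain-Python sketch of the same idea. It assumes a toy loss (w - 3)² and uses plain gradient descent in place of Adam. Each update moves w against its gradient and increments a separate step counter, which gradients never touch. loss_grad and minimize_step are hypothetical helpers, not TensorFlow API.

```python
def loss_grad(w):
    # Gradient of the toy loss (w - 3)**2.
    return 2.0 * (w - 3.0)


def minimize_step(w, global_step, lr=0.1):
    """One update: move w against the gradient, then bump the step counter."""
    new_w = w - lr * loss_grad(w)
    return new_w, global_step + 1  # the counter is incremented, never trained


w, global_step = 0.0, 0  # global_step plays the role of tf.Variable(0, trainable=False)
for _ in range(50):
    w, global_step = minimize_step(w, global_step)

print(global_step)  # 50
print(w)            # close to 3.0, the minimum of the toy loss
```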
3 tf.summary
When training a model, changes in parameters and results are visualized mainly with the following three functions:
- tf.summary.scalar
- tf.summary.histogram
- tf.summary.image
The main steps are as follows:
Step 1: Create summaries
with tf.name_scope("summaries"):
    tf.summary.scalar("loss", self.loss)
    tf.summary.scalar("accuracy", self.accuracy)
    # summary names may not contain spaces, so use an underscore
    tf.summary.histogram("histogram_loss", self.loss)
summary_op = tf.summary.merge_all()
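Step 2: Run the summaries
Summaries are ops too, so once they are created they must be run in a session, usually in the same sess.run() call as the training op:
# fetch into loss_val so that the loss tensor itself is not overwritten
loss_val, _, summary = sess.run([loss, optimizer, summary_op])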
Step 3: Write the summaries to a file
writer = tf.summary.FileWriter('./graphs', sess.graph)
writer.add_summary(summary, global_step=step)
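The three steps can be mimicked in plain Python to show what merge_all and add_summary do conceptually. merge_all bundles every registered summary into one op, running that op yields (tag, value) pairs, and the writer records them against the step. This is a toy model: scalar, merge_all and ToyFileWriter are hypothetical stand-ins. The loss and accuracy numbers are placeholders rather than real training output.

```python
_summary_ops = []  # hypothetical stand-in for the graph's summary collection


def scalar(tag, compute):
    """Step 1: register a summary; compute(state) yields its value."""
    _summary_ops.append((tag, compute))


def merge_all():
    """Bundle every registered summary into one callable."""
    ops = list(_summary_ops)

    def merged(state):
        return [(tag, compute(state)) for tag, compute in ops]

    return merged


class ToyFileWriter:
    """Step 3: keeps (step, tag, value) records instead of writing event files."""

    def __init__(self):
        self.events = []

    def add_summary(self, summary, global_step):
        for tag, value in summary:
            self.events.append((global_step, tag, value))


scalar("loss", lambda s: s["loss"])
scalar("accuracy", lambda s: s["accuracy"])
summary_op = merge_all()

writer = ToyFileWriter()
for step in range(3):
    # Placeholder numbers standing in for what sess.run would return.
    state = {"loss": 1.0 / (step + 1), "accuracy": 0.5 + 0.1 * step}
    summary = summary_op(state)  # Step 2: "run" the merged op
    writer.add_summary(summary, global_step=step)

print(writer.events[:2])  # [(0, 'loss', 1.0), (0, 'accuracy', 0.5)]
```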
Putting it together, the overall flow for training, saving checkpoints and writing summaries looks like this:
import os
import tensorflow as tf
# loss, optimizer and feed_dict are assumed to be defined elsewhere
tf.summary.scalar("loss", loss)
tf.summary.histogram("histogram_loss", loss)
summary_op = tf.summary.merge_all()
# Saver.save needs the checkpoint directory to exist
os.makedirs('checkpoints', exist_ok=True)
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # restore from the latest checkpoint in 'checkpoints/' if there is one
    ckpt = tf.train.get_checkpoint_state(os.path.dirname('checkpoints/checkpoint'))
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess, ckpt.model_checkpoint_path)
    writer = tf.summary.FileWriter('./graphs', sess.graph)
    for step in range(1000):
        ...
        loss_val, _, summary = sess.run([loss, optimizer, summary_op], feed_dict)
        writer.add_summary(summary, global_step=step)
        # save a checkpoint every 1000 steps
        if (step + 1) % 1000 == 0:
            saver.save(sess, 'checkpoints/model', step)
    writer.close()
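In a loop like this, step starts at 0 again after a restore, so summary steps and checkpoint numbers restart. One common pattern is to store the step counter in the checkpoint and resume from it. tf.train.Saver() saves the global_step variable from section 2.1 along with the other global variables, so restoring also restores the step. The sketch below shows that resume logic in plain Python. The JSON file, the train function and the toy loss are hypothetical stand-ins, not TensorFlow API.

```python
import json
import os
import tempfile


def train(ckpt_dir, num_steps, save_every):
    """Toy loop: resume from the saved step if a checkpoint exists."""
    path = os.path.join(ckpt_dir, "model.json")
    state = {"global_step": 0, "w": 0.0}  # like global_variables_initializer()
    if os.path.exists(path):              # like get_checkpoint_state + restore
        with open(path) as f:
            state = json.load(f)
    start = state["global_step"]
    for _ in range(num_steps):
        # One gradient-descent update on the toy loss (w - 3)**2.
        state["w"] -= 0.1 * 2.0 * (state["w"] - 3.0)
        state["global_step"] += 1         # like minimize(..., global_step=...)
        if state["global_step"] % save_every == 0:
            with open(path, "w") as f:    # like saver.save(sess, ..., global_step)
                json.dump(state, f)
    return start, state["global_step"]


ckpt_dir = tempfile.mkdtemp()
print(train(ckpt_dir, num_steps=10, save_every=5))  # (0, 10)
print(train(ckpt_dir, num_steps=10, save_every=5))  # (10, 20)
```

Progress made after the last save is lost on restart, which is why the save interval is a trade-off between disk use and redone work.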
4 Gradients
TensorFlow computes gradients automatically by applying the chain rule backwards through the graph, as shown in the following figure:
Applying the chain rule, derivatives are propagated backwards through the graph. What is the derivative of $f$ with respect to each of $f$, $q$, $z$, $y$ and $x$? The results are given first, in the figure below:
Where do the numbers in red come from? We start from the end and differentiate $f$. The derivative of $f$ with respect to itself is $\frac{\partial f}{\partial f} = 1$. Then the chain rule is applied backwards. The derivative with respect to $q$ equals $\frac{\partial f}{\partial f}$ multiplied by the local derivative of $f$ with respect to $q$, which is $z$. Since $z = -4$, this gives $\frac{\partial f}{\partial q} = 1 \times (-4) = -4$. The remaining nodes are handled in the same way.
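The backward pass can be reproduced in a few lines of Python. The text fixes $f = q \cdot z$ (since $\partial f/\partial q = z$) and $z = -4$. The sketch further assumes $q = x + y$ with $x = -2$ and $y = 5$, so the value of $\partial f/\partial z = q$ depends on that assumption. A central finite difference is included as a sanity check. forward, backward and numeric_grad are hypothetical helper names.

```python
def forward(x, y, z):
    q = x + y  # assumed intermediate node
    f = q * z
    return q, f


def backward(x, y, z):
    """Backpropagate from f to every node with the chain rule."""
    q, _ = forward(x, y, z)
    df_df = 1.0          # derivative of f with respect to itself
    df_dq = df_df * z    # d(q*z)/dq = z
    df_dz = df_df * q    # d(q*z)/dz = q
    df_dx = df_dq * 1.0  # dq/dx = 1
    df_dy = df_dq * 1.0  # dq/dy = 1
    return {"f": df_df, "q": df_dq, "z": df_dz, "x": df_dx, "y": df_dy}


def numeric_grad(args, i, eps=1e-6):
    """Central finite difference of f with respect to args[i], as a check."""
    plus, minus = list(args), list(args)
    plus[i] += eps
    minus[i] -= eps
    return (forward(*plus)[1] - forward(*minus)[1]) / (2 * eps)


grads = backward(-2.0, 5.0, -4.0)
print(grads)  # {'f': 1.0, 'q': -4.0, 'z': 3.0, 'x': -4.0, 'y': -4.0}
print(numeric_grad([-2.0, 5.0, -4.0], 0))  # approximately -4.0
```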
4.1 tf.gradients
Here is a simple example of computing derivatives with tf.gradients:
# -*-coding:utf8 -*-
import tensorflow as tf
def main():
    x = tf.Variable(2.0)
    y = 2.0 * (x ** 3)
    z = 3.0 + y**2
    # gradients of z with respect to x and to y
    grad_z = tf.gradients(z, [x, y])
    with tf.Session() as sess:
        sess.run(x.initializer)
        print(sess.run(grad_z))
if __name__=='__main__':
    main()
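The two values the script prints can be worked out by hand. With $x = 2$, $y = 2x^3 = 16$, so $\partial z/\partial y = 2y = 32$ and $\partial z/\partial x = 2y \cdot 6x^2 = 32 \times 24 = 768$. The plain-Python sketch below needs no TensorFlow. It repeats that chain-rule computation and cross-checks $\partial z/\partial x$ with a central finite difference. The helper names are hypothetical.

```python
def y_of(x):
    return 2.0 * x ** 3


def z_of(y):
    return 3.0 + y ** 2


def chain_rule_grads(x):
    """Return (dz/dx, dz/dy) at x, in the same order as tf.gradients(z, [x, y])."""
    y = y_of(x)
    dz_dy = 2.0 * y       # d(3 + y^2)/dy
    dy_dx = 6.0 * x ** 2  # d(2x^3)/dx
    return dz_dy * dy_dx, dz_dy


def central_diff(fn, v, eps=1e-4):
    return (fn(v + eps) - fn(v - eps)) / (2 * eps)


dz_dx, dz_dy = chain_rule_grads(2.0)
print(dz_dx, dz_dy)                                # 768.0 32.0
print(central_diff(lambda x: z_of(y_of(x)), 2.0))  # approximately 768
```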
Running the script prints the two gradients, $\partial z/\partial x = 768$ and $\partial z/\partial y = 32$.
The gradient-related functions in TensorFlow include the following:
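- tf.gradients(ys, xs, grad_ys=None, …): computes the symbolic gradients of ys with respect to each tensor in xs
- tf.stop_gradient(input, name=None): passes input through unchanged in the forward pass, but backpropagation treats it as a constant, so no gradient flows through it to the ops (and parameters) that produced input
- tf.clip_by_value(t, clip_value_min, clip_value_max, name=None): clamps every element of t to the range [clip_value_min, clip_value_max]; applied to gradients, it bounds each gradient component
- tf.clip_by_norm(t, clip_norm, axes=None, name=None): rescales t so that its L2 norm is at most clip_norm, leaving t unchanged if its norm is already within the limit; commonly used to clip gradients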
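The difference between the two clipping functions shows up on a small vector. The plain-Python sketch below clips the gradient [3, 4], whose L2 norm is 5. Its helpers are hypothetical and only mirror the documented behavior; they are not TensorFlow code. Clipping by value clamps each component independently and can change the vector's direction. Clipping by norm rescales the whole vector and keeps its direction.

```python
import math


def clip_by_value(t, clip_value_min, clip_value_max):
    # Clamp each element into [clip_value_min, clip_value_max].
    return [min(max(v, clip_value_min), clip_value_max) for v in t]


def clip_by_norm(t, clip_norm):
    # Rescale the whole vector so its L2 norm is at most clip_norm;
    # the direction is preserved and short vectors are left unchanged.
    norm = math.sqrt(sum(v * v for v in t))
    if norm <= clip_norm:
        return list(t)
    return [v * clip_norm / norm for v in t]


grad = [3.0, 4.0]  # L2 norm is 5
print(clip_by_value(grad, -1.0, 1.0))  # [1.0, 1.0]
print(clip_by_norm(grad, 1.0))         # [0.6, 0.8]
```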