TensorFlow gradient computation and gradient clipping

1. Several ways to compute gradients in TensorFlow

1.1 tf.gradients

tf.gradients(
    ys,
    xs,
    grad_ys=None,
    name='gradients',
    colocate_gradients_with_ops=False,
    gate_gradients=False,
    aggregation_method=None,
    stop_gradients=None,
    unconnected_gradients=tf.UnconnectedGradients.NONE
)

tf.gradients computes the gradients of ys with respect to xs and returns a list of tensors of length len(xs), e.g.

tf.gradients(y, [x1, x2, x3]) returns [dy/dx1, dy/dx2, dy/dx3]

When y is independent of x, i.e., there is no path from x to y in the graph, taking the gradient of y with respect to x returns [None]. The stop_gradients parameter specifies variables at which the gradient computation stops: backpropagation does not trace back past these variables.

a = tf.constant(0.)
b = 2 * a
g = tf.gradients(a + b, [a, b], stop_gradients=[a, b])  # gradient computation no longer traces back past a and b

Output:

In : sess.run(g)
Out: [1.0, 1.0]

If stop_gradients is not set, the backpropagated gradient computation traces all the way back to the constant a, and the output is:

In : sess.run(g)
Out: [3.0, 1.0]
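
For the unconnected case mentioned above (y independent of x), a minimal sketch, assuming the same TF 1.x session style as the rest of this post and made-up tensors c and d:

c = tf.constant(1.)
d = tf.constant(2.)
y = 2 * c  # y does not depend on d

g_none = tf.gradients(y, [d])  # [None]: there is no path from d to y in the graph
g_zero = tf.gradients(y, [d], unconnected_gradients=tf.UnconnectedGradients.ZERO)

print(g_none)            # [None]
print(sess.run(g_zero))  # [0.0]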

1.2 optimizer.compute_gradients

compute_gradients(
    loss,
    var_list=None,
    gate_gradients=GATE_OP,
    aggregation_method=None,
    colocate_gradients_with_ops=False,
    grad_loss=None
)

optimizer.compute_gradients is a wrapper around tf.gradients and computes the same thing, but while tf.gradients returns only the gradients, compute_gradients returns the gradients together with the variables they are taken with respect to. optimizer.compute_gradients is the first step of optimizer.minimize(): it returns a list of (gradient, variable) tuples, where each gradient is a tensor. Intuitively, optimizer.compute_gradients outputs one more item per entry (the variable) than tf.gradients.

optimizer = tf.train.GradientDescentOptimizer(learning_rate=1.0)
train_op = optimizer.minimize(cost)
sess.run([train_op], feed_dict={x: data, y: labels})

In this process, when minimize is called, the underlying work consists of:
(1) using optimizer.compute_gradients to compute the gradients of the loss with respect to all parameters in the trainable_variables collection
(2) using optimizer.apply_gradients to update the corresponding variables with the computed gradients
The code above is equivalent to the code below:

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
grads_and_vars = optimizer.compute_gradients(loss)
train_op = optimizer.apply_gradients(grads_and_vars)
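
A minimal sketch of the (gradient, variable) pairs returned by compute_gradients, assuming a single made-up scalar variable w and a TF 1.x session:

w = tf.Variable(3.0)
loss = w * w
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
grads_and_vars = optimizer.compute_gradients(loss, var_list=[w])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grads_and_vars))  # [(6.0, 3.0)], i.e. (dloss/dw, value of w)

Here tf.gradients(loss, [w]) would return only the gradient tensor, without the variable.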

1.3 tf.stop_gradient

tf.stop_gradient(
    input,
    name=None
)

tf.stop_gradient prevents its input from taking part in gradient computation, i.e., the part of the graph leading up to input is masked out during backpropagation.
Returns: a tensor with the same value as input, which is treated as a constant when gradients are computed.
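
A minimal sketch tying this back to the example in section 1.1 (same constants a and b, TF 1.x session assumed):

a = tf.constant(0.)
b = tf.stop_gradient(2 * a)  # b is treated as a constant during backpropagation
g = tf.gradients(a + b, [a])

print(sess.run(g))  # [1.0], instead of [3.0] without stop_gradient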

2. Gradient clipping

If we want to clip gradients, we have to compute the gradients ourselves, clip them, and finally apply them to the variables, as in the code below; the main steps are then introduced one by one.

# return a list of the trainable variables in your model
params = tf.trainable_variables()

# create an optimizer
opt = tf.train.GradientDescentOptimizer(self.learning_rate)

# compute gradients for params
gradients = tf.gradients(loss, params)

# process (clip) the gradients
clipped_gradients, norm = tf.clip_by_global_norm(gradients, max_gradient_norm)

train_op = opt.apply_gradients(zip(clipped_gradients, params))

2.1 Introduction to tf.clip_by_global_norm

tf.clip_by_global_norm(t_list, clip_norm, use_norm=None, name=None)
# t_list is the list of gradient tensors
# clip_norm is the clipping threshold

After applying this function, each t_list[i] is updated according to:

global_norm = sqrt(sum(l2norm(t)**2 for t in t_list))
t_list[i] = t_list[i] * clip_norm / max(global_norm, clip_norm)

This is done in two steps:
(1) compute global_norm, the square root of the sum of the squared L2 norms of all gradients
(2) if global_norm exceeds the specified clip_norm, scale every gradient down by the factor clip_norm / global_norm; otherwise leave the gradients unchanged (see the numeric sketch below)
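
A small numeric sketch of this formula (the tensor values are made up for illustration):

t1 = tf.constant([3.0, 4.0])  # l2 norm = 5
t2 = tf.constant([0.0])       # l2 norm = 0
clipped, global_norm = tf.clip_by_global_norm([t1, t2], clip_norm=2.5)

with tf.Session() as sess:
    print(sess.run(global_norm))  # 5.0 = sqrt(5**2 + 0**2)
    print(sess.run(clipped))      # [1.5, 2.0] and [0.0], i.e. scaled by 2.5 / 5.0

Note that tf.clip_by_global_norm scales all tensors in the list jointly based on their combined norm, whereas tf.clip_by_norm, used in the example in section 2.2 below, clips each tensor independently.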

2.2 Gradient clipping example

w = tf.Variable(2.0)  # values chosen so that the gradients printed below match
x = tf.Variable(3.0)
loss = w * x * x
optimizer = tf.train.GradientDescentOptimizer(0.1)
grads_and_vars = optimizer.compute_gradients(loss, [w, x])
grads = tf.gradients(loss, [w, x])
# clip the gradients
for i, (gradient, var) in enumerate(grads_and_vars):
    if gradient is not None:
        grads_and_vars[i] = (tf.clip_by_norm(gradient, 5), var)
train_op = optimizer.apply_gradients(grads_and_vars)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grads_and_vars))
    # before clipping: [(9.0, 2.0), (12.0, 3.0)]; after clipping: [(5.0, 2.0), (5.0, 3.0)]
    print(sess.run(grads))  # [9.0, 12.0]
    print(train_op)


