An in-depth look at K.gradients() and tf.stop_gradient() in TensorFlow

Copyright notice: when reproducing this article, please cite the source and credit "AI algorithms Wuhan study": https://blog.csdn.net/qq_36931982/article/details/90340438

Last week, while working through some lab code, I came across the following line that uses TensorFlow's stop_gradient(). I was not familiar with it, so I spent the weekend reviewing it and writing up this summary.

    y = xx + K.stop_gradient(rounded - xx)

This code eventually calls stop_gradient(input, name=None) in tensorflow.python.ops.gen_array_ops. Why the code is written this way is explained at the end of the article.

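[What stop_gradient() means]

stop_gradient() returns its input unchanged in the forward pass. When the gradient of the loss function is built with respect to (wrt) some variable, whatever sits behind stop_gradient() is treated as a constant and contributes nothing to that gradient.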

[Understanding tf.gradients()]

In TensorFlow we only need to define our own function; TensorFlow provides a powerful method, tf.gradients(), that computes the gradients automatically.

tf.gradients(
    ys,
    xs,
    grad_ys=None,
    name='gradients',
    colocate_gradients_with_ops=False,
    gate_gradients=False,
    aggregation_method=None,
    stop_gradients=None,
    unconnected_gradients=tf.UnconnectedGradients.NONE
)

gradients() adds ops to the graph to output the derivatives of ys with respect to xs. It returns a list of Tensor of length len(xs) where each tensor is the sum(dy/dx) for y in ys.

  1. tf.gradients() computes the derivatives of ys with respect to xs
  2. xs and ys can each be a Tensor or a list of Tensors
  3. The return value is a list whose length equals len(xs)

For example, suppose the return value is [grad1, grad2, grad3], with ys = [y1, y2] and xs = [x1, x2, x3]. The calculation is:

grad1 = \frac{dy_{1}}{dx_{1}} + \frac{dy_{2}}{dx_{1}}
grad2 = \frac{dy_{1}}{dx_{2}} + \frac{dy_{2}}{dx_{2}}
grad3 = \frac{dy_{1}}{dx_{3}} + \frac{dy_{2}}{dx_{3}}
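The summation rule can be checked numerically without TensorFlow. The following is a minimal pure-Python sketch, not the library's implementation: the two functions y1 and y2 and the helper name summed_gradients are hypothetical, chosen only for illustration, and the derivatives are approximated with central differences.

```python
# Pure-Python illustration (no TensorFlow): approximate each dy_j/dx_i with
# central differences and sum over j, mirroring the list tf.gradients(ys, xs) returns.

def ys(x1, x2, x3):
    # Hypothetical example functions: y1 = x1 * x2, y2 = x1 + x3 ** 2
    return [x1 * x2, x1 + x3 ** 2]

def summed_gradients(f, xs, eps=1e-6):
    grads = []
    for i in range(len(xs)):
        hi = list(xs)
        lo = list(xs)
        hi[i] += eps
        lo[i] -= eps
        # d(sum_j y_j)/dx_i equals sum_j dy_j/dx_i
        grads.append((sum(f(*hi)) - sum(f(*lo))) / (2 * eps))
    return grads

grads = summed_gradients(ys, [2.0, 3.0, 4.0])
print(grads)  # approximately [4.0, 2.0, 8.0]
```

At x = (2, 3, 4), grad1 = x2 + 1 = 4, grad2 = x1 + 0 = 2 and grad3 = 0 + 2 * x3 = 8, so the list has one entry per element of xs, each summed over both ys.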

import numpy as np
import tensorflow as tf

# Build the dataset
x_pure = np.random.randint(-10, 100, 32)
x_train = x_pure + np.random.randn(32) / 32
y_train = 3 * x_pure + 2 + np.random.randn(32) / 32

x_input = tf.placeholder(tf.float32, name='x_input')
y_input = tf.placeholder(tf.float32, name='y_input')
w = tf.Variable(2.0, name='weight')
b = tf.Variable(1.0, name='biases')
y = tf.add(tf.multiply(x_input, w), b)

loss_op = tf.reduce_sum(tf.pow(y_input - y, 2)) / (2 * 32)
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss_op)
gradients_node = tf.gradients(loss_op, w)

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

for i in range(20):
    _, gradients, loss = sess.run([train_op, gradients_node, loss_op], feed_dict={x_input: x_train[i], y_input: y_train[i]})
    print("epoch: {} \t loss: {} \t gradients: {}".format(i, loss, gradients))
sess.close()

Computing the gradients and applying the weight update by hand:

import numpy as np
import tensorflow as tf

# Build the dataset
x_pure = np.random.randint(-10, 100, 32)
x_train = x_pure + np.random.randn(32) / 32
y_train = 3 * x_pure + 2 + np.random.randn(32) / 32

x_input = tf.placeholder(tf.float32, name='x_input')
y_input = tf.placeholder(tf.float32, name='y_input')
w = tf.Variable(2.0, name='weight')
b = tf.Variable(1.0, name='biases')
y = tf.add(tf.multiply(x_input, w), b)

loss_op = tf.reduce_sum(tf.pow(y_input - y, 2)) / (2 * 32)
# train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss_op)

# Custom weight update
grad_w, grad_b = tf.gradients(loss_op, [w, b])
new_w = w.assign(w - 0.01 * grad_w)
new_b = b.assign(b - 0.01 * grad_b)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

for i in range(20):
    # new_w and new_b return the updated variable values, not the gradients
    w_val, b_val, loss = sess.run([new_w, new_b, loss_op], feed_dict={x_input: x_train[i], y_input: y_train[i]})
    print("epoch: {} \t loss: {} \t w: {} \t b: {}".format(i, loss, w_val, b_val))
sess.close()
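To see what the manual update above amounts to, here is a minimal pure-Python sketch of one gradient-descent step on a single sample, with the gradients written out analytically instead of obtained from tf.gradients(). The sample (x = 1.0, y_true = 5.0) and the helper names loss_fn and grads_fn are assumptions made for illustration.

```python
# Pure-Python illustration (no TensorFlow) of one manual gradient-descent step
# on a single sample, for loss = (y_true - (w * x + b)) ** 2 / (2 * 32).

def loss_fn(w, b, x, y_true):
    return (y_true - (w * x + b)) ** 2 / (2 * 32)

def grads_fn(w, b, x, y_true):
    err = y_true - (w * x + b)
    grad_w = -2.0 * err * x / (2 * 32)   # d(loss)/dw
    grad_b = -2.0 * err / (2 * 32)       # d(loss)/db
    return grad_w, grad_b

w, b = 2.0, 1.0          # same initial values as the example above
x, y_true = 1.0, 5.0     # hypothetical sample on the line y = 3x + 2
lr = 0.01                # same learning rate as the example above

loss_before = loss_fn(w, b, x, y_true)
grad_w, grad_b = grads_fn(w, b, x, y_true)
w, b = w - lr * grad_w, b - lr * grad_b   # same rule as w.assign(w - 0.01 * grad_w)
loss_after = loss_fn(w, b, x, y_true)
print(loss_before, loss_after, w, b)
```

Both gradients are negative here, so w and b move up toward 3 and 2 and the loss on this sample decreases slightly.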

[Understanding tf.stop_gradient()]

tf.gradients() has a stop_gradients parameter. It is a list whose elements are tensors in the TensorFlow graph. Once a tensor is placed in this list, it is treated as a constant and no gradient is propagated through it; more importantly, the backpropagation computations behind that op are not run at all.

import numpy as np
import tensorflow as tf

a = tf.constant(0.)
b = 2 * a
c = a + b
g = tf.gradients(c, [a, b])

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print(sess.run(g))

# Output: [3.0, 1.0]
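The example above computes total derivatives, so the influence of a on b is included and the gradient with respect to a is 3.0. According to the TensorFlow documentation, passing stop_gradients=[a, b] to the same call yields the partial derivatives [1.0, 1.0] instead. The difference can be sketched in pure Python with central differences; the function names c_total and c_partial are hypothetical and this is only an illustration, not how TensorFlow computes it.

```python
# Pure-Python illustration (no TensorFlow) of a = 0, b = 2 * a, c = a + b.

def c_total(a):
    # b depends on a, so a change in a also reaches c through b.
    b = 2 * a
    return a + b

def c_partial(a, b):
    # a and b are treated as independent inputs, which is the effect
    # of stop_gradients=[a, b].
    return a + b

eps = 1e-6
a0 = 0.0
b0 = 2 * a0

total_da = (c_total(a0 + eps) - c_total(a0 - eps)) / (2 * eps)
partial_da = (c_partial(a0 + eps, b0) - c_partial(a0 - eps, b0)) / (2 * eps)
partial_db = (c_partial(a0, b0 + eps) - c_partial(a0, b0 - eps)) / (2 * eps)
print(total_da, partial_da, partial_db)  # approximately 3.0 1.0 1.0
```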

Examples using stop_gradient():

import tensorflow as tf

# Experiment 1
w1 = tf.Variable(2.0)
w2 = tf.Variable(2.0)
a = tf.multiply(w1, 3.0)
a_stoped = tf.stop_gradient(a)

# b=w1*3.0*w2
b = tf.multiply(a_stoped, w2)
gradients = tf.gradients(b, xs=[w1, w2])
print(gradients)
# Output: [None, <tf.Tensor 'gradients/Mul_1_grad/Reshape_1:0' shape=() dtype=float32>]

# Experiment 2
a = tf.Variable(1.0)
b = tf.Variable(1.0)
c = tf.add(a, b)
c_stoped = tf.stop_gradient(c)
d = tf.add(a, b)
e = tf.add(c_stoped, d)
gradients = tf.gradients(e, xs=[a, b])
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print(sess.run(gradients))

# The gradient still flows back through the other path (d), so the output is [1.0, 1.0]

 

【Answer】

Back to the question posed at the beginning: why is the code written in this form?

t = g(x)
y = t + tf.stop_gradient(f(x) - t)

Here we want the forward pass to compute f(x), but the backward pass to use the gradient of g(x). In the forward pass tf.stop_gradient() has no effect, so t and -t cancel out and only f(x) is propagated forward. In the backward pass tf.stop_gradient() blocks the gradient of f(x) - t (it becomes 0), so only g(x) contributes to the gradient. In the line quoted at the beginning, g(x) is xx itself and f(x) is rounded, so the output is the rounded value while the gradient flows back as if no rounding had taken place.
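The trick can be traced by hand with a tiny pure-Python sketch. It uses a hypothetical Dual class that carries a value together with its derivative with respect to x (forward-mode bookkeeping rather than TensorFlow's backpropagation, though the resulting gradient is the same), and it assumes f(x) = round(x) and g(x) = x as in the opening snippet.

```python
# Pure-Python illustration (no TensorFlow): each Dual holds a value and its
# derivative with respect to x.

class Dual:
    def __init__(self, value, grad):
        self.value = value
        self.grad = grad

    def __add__(self, other):
        return Dual(self.value + other.value, self.grad + other.grad)

    def __sub__(self, other):
        return Dual(self.value - other.value, self.grad - other.grad)

def stop_gradient(d):
    # Same value going forward, zero gradient going backward.
    return Dual(d.value, 0.0)

def rounded(d):
    # f(x) = round(x); its true derivative is 0 almost everywhere.
    return Dual(float(round(d.value)), 0.0)

x = Dual(2.75, 1.0)   # dx/dx = 1
t = x                 # g(x) = x, the identity
plain = rounded(x)    # rounding alone: the gradient is lost
y = t + stop_gradient(rounded(x) - t)

print(plain.value, plain.grad)  # 3.0 0.0
print(y.value, y.grad)          # 3.0 1.0
```

Rounding alone gives the right value but a zero gradient; the stop_gradient form gives the same value and keeps the gradient of g(x).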

 


 
