TensorFlow 2.0 Stochastic Gradient Descent: Gradient Descent

Copyright notice: This is an original article by the blogger ([email protected]); reproduction without the blogger's permission is prohibited. https://blog.csdn.net/z_feng12489/article/details/89916645

Gradient

  1. Derivative

  2. Partial derivative

  3. Gradient

    $\nabla f = \left(\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\dots,\frac{\partial f}{\partial x_n}\right)$
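As a concrete illustration (not part of the original post), take $f(x_1, x_2) = x_1^2 + x_2^2$:

    $\nabla f = \left(\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2}\right) = (2x_1, 2x_2)$

At the point $(1, 2)$ the gradient is $(2, 4)$, which points away from the minimum at the origin.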

Interpretation

The gradient indicates the direction in which the function value increases; moving against it decreases the function value.

Gradient Descent

  1. $\nabla f(\theta)$ points toward larger values of $f$.
  2. Search for minima (a minimal code sketch follows below):
    • learning rate: $lr$, $\alpha$, or $\eta$
      $\theta_{t+1} = \theta_t - \alpha_t \nabla f(\theta_t)$

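A minimal sketch of this update rule, assuming a one-dimensional objective $f(\theta) = (\theta - 4)^2$ chosen purely for illustration (the function, learning rate, and step count are not from the original post):

import tensorflow as tf

# f(theta) = (theta - 4)^2, minimized at theta = 4
theta = tf.Variable(0.0)
lr = 0.1   # learning rate (alpha / eta)

for step in range(50):
    with tf.GradientTape() as tape:
        loss = (theta - 4.0) ** 2
    grad = tape.gradient(loss, theta)
    # theta_{t+1} = theta_t - alpha * grad
    theta.assign_sub(lr * grad)

theta   # close to 4.0 after 50 steps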

Example

$\theta_{t+1} = \theta_t - \alpha_t \nabla f(\theta_t)$

Optimization process 1


Optimization process 2

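The figures under "Optimization process 1" and "Optimization process 2" are not reproduced in this text version. As an illustrative stand-in (the 2-D objective and the use of tf.optimizers.SGD are assumptions, not from the original post), here is a small run that traces the same kind of trajectory:

import tensorflow as tf

# f(x, y) = x^2 + y^2, global minimum at (0, 0)
x = tf.Variable(2.0)
y = tf.Variable(-3.0)
optimizer = tf.optimizers.SGD(learning_rate=0.1)

for step in range(20):
    with tf.GradientTape() as tape:
        f = x ** 2 + y ** 2
    grads = tape.gradient(f, [x, y])
    # applies theta <- theta - lr * grad to each variable
    optimizer.apply_gradients(zip(grads, [x, y]))

x, y   # both close to 0 after 20 steps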

Automatic differentiation

  • With tf.GradientTape() as tape:
    • build computation graph
    • loss = $f_\theta(x)$
  • [$w_{grad}$] = tape.gradient(loss, [w])
import tensorflow as tf

w = tf.constant(1.)
x = tf.constant(2.)
y = x * w          # computed outside the tape, so it is not recorded
with tf.GradientTape() as tape:
    tape.watch([w])    # constants must be watched explicitly
    y2 = x * w         # recorded by the tape

grad1 = tape.gradient(y, [w])    # y was not recorded by the tape
grad1   # [None]
with tf.GradientTape() as tape:
    tape.watch([w])
    y2 = x * w

grad2 = tape.gradient(y2, [w])   # dy2/dw = x = 2.0
grad2   # [<tf.Tensor: id=7, shape=(), dtype=float32, numpy=2.0>]
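Note that tape.watch is only needed for tensors such as tf.constant; trainable tf.Variable objects are tracked automatically. A small illustrative check (not from the original post):

w = tf.Variable(1.)    # Variables are watched automatically
x = tf.constant(2.)
with tf.GradientTape() as tape:
    y = x * w

tape.gradient(y, [w])   # [<tf.Tensor: ... numpy=2.0>], no tape.watch needed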

Using a persistent GradientTape to compute gradients multiple times

w = tf.constant(1.)
x = tf.constant(2.)
y = x * w
with tf.GradientTape() as tape:   # persistent=False by default
    tape.watch([w])
    y2 = x * w

grad = tape.gradient(y2, [w])
grad   # [<tf.Tensor: id=6, shape=(), dtype=float32, numpy=2.0>]
grad = tape.gradient(y2, [w])     # second call on the same non-persistent tape
# RuntimeError: GradientTape.gradient can only be called once on non-persistent tapes.

Setting persistent=True

w = tf.constant(1.)
x = tf.constant(2.)
y = x * w
with tf.GradientTape(persistent=True) as tape:   # the tape can now be queried multiple times
    tape.watch([w])
    y2 = x * w

grad = tape.gradient(y2, [w])
grad   # [<tf.Tensor: id=6, shape=(), dtype=float32, numpy=2.0>]
grad = tape.gradient(y2, [w])
grad   # [<tf.Tensor: id=10, shape=(), dtype=float32, numpy=2.0>]
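Because a persistent tape keeps the resources it recorded alive until it is garbage-collected, it is good practice to drop the reference once all gradients have been computed:

del tape   # release the resources held by the persistent tape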

Second-order gradients

  • $y = xw + b$
  • $\frac{\partial y}{\partial w} = x$
  • $\frac{\partial^2 y}{\partial w^2} = \frac{\partial y'}{\partial w} = \frac{\partial x}{\partial w} =$ None
w = tf.Variable(1.0)
b = tf.Variable(2.0)
x = tf.Variable(3.0)

with tf.GradientTape() as t1:       # outer tape records the gradient computation
  with tf.GradientTape() as t2:     # inner tape records the forward pass
    y = x * w + b
  dy_dw, dy_db = t2.gradient(y, [w, b])
d2y_dw2 = t1.gradient(dy_dw, w)     # d(dy/dw)/dw = dx/dw = None

dy_dw   # tf.Tensor(3.0, shape=(), dtype=float32)
dy_db   # tf.Tensor(1.0, shape=(), dtype=float32)
d2y_dw2   # None
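For contrast, a quadratic dependence on w yields a non-None second derivative. This variation is an illustrative addition, not from the original post:

x = tf.Variable(3.0)
w = tf.Variable(1.0)

with tf.GradientTape() as t1:
  with tf.GradientTape() as t2:
    y = x * w ** 2        # dy/dw = 2xw, d2y/dw2 = 2x
  dy_dw = t2.gradient(y, w)
d2y_dw2 = t1.gradient(dy_dw, w)

d2y_dw2   # tf.Tensor(6.0, shape=(), dtype=float32)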
