[ MOOC Course Notes ] 人工智能实践:Tensorflow笔记 CH4_2: Learning Rate

Learning Rate

The course link is here.

  1. Learning rate: the magnitude of each parameter update. A learning rate that is too large makes the parameters being optimized oscillate around the minimum and fail to converge; one that is too small makes them converge very slowly. During training, parameters are updated in the direction of gradient descent on the loss function.

    • The parameter update rule is (replayed by hand in the sketch after the examples below):
      $w_{n+1} = w_n - lr \cdot \frac{\partial loss}{\partial w_n}$
    • Example

      import tensorflow as tf
      
      lr = 0.2
      # optimize w to minimize loss = (w + 1)^2; the minimum is at w = -1
      w = tf.Variable(tf.constant(5, dtype=tf.float32))
      loss = tf.square(w + 1)
      train_op = tf.train.GradientDescentOptimizer(lr).minimize(loss)
      
      with tf.Session() as sess:
          sess.run(tf.global_variables_initializer())
          for i in range(40):
              sess.run(train_op)
              w_val = sess.run(w)
              loss_val = sess.run(loss)
              print('After %s steps: w is %f, loss is %f.' % (i, w_val, loss_val))

      Output:
      After 0 steps: w is 2.600000, loss is 12.959999.
      After 1 steps: w is 1.160000, loss is 4.665599.
      After 2 steps: w is 0.296000, loss is 1.679616.
      After 3 steps: w is -0.222400, loss is 0.604662.
      After 4 steps: w is -0.533440, loss is 0.217678.
      After 5 steps: w is -0.720064, loss is 0.078364.
      ……
      After 35 steps: w is -1.000000, loss is 0.000000.
      After 36 steps: w is -1.000000, loss is 0.000000.
      After 37 steps: w is -1.000000, loss is 0.000000.
      After 38 steps: w is -1.000000, loss is 0.000000.
      After 39 steps: w is -1.000000, loss is 0.000000.

    • Learning rate too large: with lr = 1 the update becomes w ← w - 2(w + 1) = -w - 2, so w jumps back and forth between 5 and -7 and never converges

      lr = 1

      Output:
      After 0 steps: w is -7.000000, loss is 36.000000.
      After 1 steps: w is 5.000000, loss is 36.000000.
      After 2 steps: w is -7.000000, loss is 36.000000.
      After 3 steps: w is 5.000000, loss is 36.000000.
      After 4 steps: w is -7.000000, loss is 36.000000.
      After 5 steps: w is 5.000000, loss is 36.000000.
      ……
      After 35 steps: w is 5.000000, loss is 36.000000.
      After 36 steps: w is -7.000000, loss is 36.000000.
      After 37 steps: w is 5.000000, loss is 36.000000.
      After 38 steps: w is -7.000000, loss is 36.000000.
      After 39 steps: w is 5.000000, loss is 36.000000.

    • Learning rate too small: with lr = 0.0001 each update changes w by only about 0.001, so after 40 steps w has barely moved from its starting value of 5

      lr = 0.0001

      Output:
      After 0 steps: w is 4.998800, loss is 35.985600.
      After 1 steps: w is 4.997600, loss is 35.971207.
      After 2 steps: w is 4.996400, loss is 35.956818.
      After 3 steps: w is 4.995201, loss is 35.942436.
      After 4 steps: w is 4.994002, loss is 35.928059.
      After 5 steps: w is 4.992803, loss is 35.913689.
      ……
      After 35 steps: w is 4.956947, loss is 35.485222.
      After 36 steps: w is 4.955756, loss is 35.471027.
      After 37 steps: w is 4.954565, loss is 35.456841.
      After 38 steps: w is 4.953373, loss is 35.442654.
      After 39 steps: w is 4.952183, loss is 35.428478.
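
    • As a sanity check, the update rule can be replayed by hand: for loss = (w + 1)^2 the gradient is 2(w + 1), so each step computes w ← w - lr·2(w + 1). A minimal pure-Python sketch (no TensorFlow) that reproduces the first values of the lr = 0.2 run above:

      # Manual gradient descent on loss = (w + 1)^2
      lr = 0.2
      w = 5.0
      for i in range(4):
          grad = 2 * (w + 1)          # d/dw (w + 1)^2
          w = w - lr * grad           # gradient descent update
          loss = (w + 1) ** 2
          print('After %d steps: w is %f, loss is %f.' % (i, w, loss))
      # prints w = 2.600000, 1.160000, 0.296000, -0.222400, matching the TensorFlow output above up to float32 rounding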

  2. Exponentially decaying learning rate: the learning rate is updated dynamically as the number of training steps increases

    • The learning rate is computed as (checked numerically in the sketch after the example below):
      $decayed\_lr = learning\_rate \times decay\_rate^{\,global\_step / decay\_steps}$
    • Expressed with TensorFlow's function:

      LR_BASE = 0.1
      LR_DECAY_RATE = 0.99
      LR_DECAY_STEPS = 1   
      
      global_step = tf.Variable(0, trainable=False)
      lr = tf.train.exponential_decay(
          learning_rate = LR_BASE,      # initial learning rate
          global_step = global_step,
          decay_steps = LR_DECAY_STEPS, # how many batches of BATCH_SIZE are fed in before the learning rate is updated once; usually total samples / BATCH_SIZE, i.e. the number of iterations needed to use the full training set once
          decay_rate = LR_DECAY_RATE,   # decay rate of the learning rate
          staircase=True   # or False
          # When staircase is True, global_step / decay_steps is truncated to an integer, so the learning rate decays in a staircase pattern;
          # when staircase is False, the learning rate follows a smooth decay curve.
      )
    • Example:

      import tensorflow as tf
      
      LR_BASE = 0.1
      LR_DECAY_RATE = 0.99
      LR_DECAY_STEPS = 1
      global_step = tf.Variable(0, trainable=False)
      lr = tf.train.exponential_decay(
          learning_rate = LR_BASE,      # initial learning rate
          global_step = global_step,
          decay_steps = LR_DECAY_STEPS, # how many batches of BATCH_SIZE are fed in before the learning rate is updated once; usually total samples / BATCH_SIZE, i.e. the number of iterations needed to use the full training set once
          decay_rate = LR_DECAY_RATE,   # decay rate of the learning rate
          staircase=True   # or False
          # When staircase is True, global_step / decay_steps is truncated to an integer, so the learning rate decays in a staircase pattern;
          # when staircase is False, the learning rate follows a smooth decay curve.
      )
      
      w = tf.Variable(tf.constant(5, dtype=tf.float32))
      loss = tf.square(w+1)
      train_op = tf.train.GradientDescentOptimizer(lr).minimize(loss, global_step = global_step)
      
      with tf.Session() as sess:
          sess.run(tf.global_variables_initializer())
          for i in range(40):
              sess.run(train_op)
              global_step_val = sess.run(global_step)
              lr_val = sess.run(lr)
              w_val = sess.run(w)
              loss_val = sess.run(loss)
              print('After %s steps: global_step is %f, lr is %f, w is %f, loss is %f.' % (i, global_step_val, lr_val, w_val, loss_val))

      Output:
      After 0 steps: global_step is 1.000000, lr is 0.099000, w is 3.800000, loss is 23.040001.
      After 1 steps: global_step is 2.000000, lr is 0.098010, w is 2.849600, loss is 14.819419.
      After 2 steps: global_step is 3.000000, lr is 0.097030, w is 2.095001, loss is 9.579033.
      After 3 steps: global_step is 4.000000, lr is 0.096060, w is 1.494386, loss is 6.221961.
      After 4 steps: global_step is 5.000000, lr is 0.095099, w is 1.015167, loss is 4.060896.
      After 5 steps: global_step is 6.000000, lr is 0.094148, w is 0.631886, loss is 2.663051.
      ……
      After 35 steps: global_step is 36.000000, lr is 0.069641, w is -0.992297, loss is 0.000059.
      After 36 steps: global_step is 37.000000, lr is 0.068945, w is -0.993369, loss is 0.000044.
      After 37 steps: global_step is 38.000000, lr is 0.068255, w is -0.994284, loss is 0.000033.
      After 38 steps: global_step is 39.000000, lr is 0.067573, w is -0.995064, loss is 0.000024.

      The output shows that the learning rate keeps decreasing as the number of training steps increases.
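
    • The printed learning rates can be checked directly against the decay formula; a minimal pure-Python sketch reusing LR_BASE, LR_DECAY_RATE and LR_DECAY_STEPS from above:

      LR_BASE = 0.1
      LR_DECAY_RATE = 0.99
      LR_DECAY_STEPS = 1
      
      # decayed_lr = learning_rate * decay_rate ** (global_step / decay_steps)
      for global_step in range(1, 7):
          exponent = global_step // LR_DECAY_STEPS   # staircase=True truncates the ratio to an integer
          decayed_lr = LR_BASE * LR_DECAY_RATE ** exponent
          print('global_step %d: lr is %f' % (global_step, decayed_lr))
      # prints 0.099000, 0.098010, 0.097030, ..., matching the lr values in the output above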

Note: remember to specify the global_step argument in minimize().
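
If global_step is omitted from minimize(), the step counter is never incremented, so exponential_decay keeps returning LR_BASE and the learning rate never decays. A minimal sketch of the two variants, reusing lr, loss and global_step from the example above:

      # Without global_step the counter stays at 0, so lr never decays:
      # train_op = tf.train.GradientDescentOptimizer(lr).minimize(loss)
      # With global_step, every run of train_op increments the counter and lr follows the decay schedule:
      train_op = tf.train.GradientDescentOptimizer(lr).minimize(loss, global_step=global_step)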

Reposted from blog.csdn.net/RanMW1129/article/details/81082282