TensorFlow Notes -- Learning Rate Decay

Copyright notice: if you repost this article, please credit the source. Thanks! https://blog.csdn.net/CQDIY/article/details/84643384

在"求解二次函数最小值对应的值"例子中,我们已经直观看到TensorFlow如何求解;今天我们来讨论一下“学习率衰减”。

1. Why?

Suppose we are training a model with gradient descent and keep the learning rate α fixed. If α is too small, training is slow; if α is too large, the loss will not converge precisely in the later stages of training but will oscillate around the minimum. We might say: just pick a suitable α and the problem is solved. That sounds right, but it is hard to do in practice, which is why we introduce learning rate decay.

2. What?

Learning rate decay: to speed up training, the learning rate is gradually reduced over time; this is called learning rate decay. Let's see how it is done.

3. How?

3.1 Exponential decay formula

$$decayed\_learning\_rate = learning\_rate \times decay\_rate^{\frac{global\_step}{decay\_steps}}$$
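
To make the formula concrete, here is a tiny plain-Python sketch of the same computation (the function name is mine, this is not TensorFlow code); staircase corresponds to flooring the exponent:

# A plain-Python sketch of the decay formula (illustrative only).
def decayed_lr(learning_rate, global_step, decay_steps, decay_rate, staircase=False):
    exponent = global_step / decay_steps
    if staircase:
        exponent = global_step // decay_steps  # integer division -> the rate drops in steps
    return learning_rate * decay_rate ** exponent

print(decayed_lr(0.1, 1, 1, 0.99))  # 0.099
print(decayed_lr(0.1, 2, 1, 0.99))  # 0.09801 (approximately)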

3.2 The function: tf.train.exponential_decay

tf.train.exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False, name=None)

Arguments:
learning_rate: A scalar float32 or float64 Tensor, or a Python number. The initial learning rate.
global_step: A scalar int32 or int64 Tensor, or a Python number. The global step used for the decay computation. Must not be negative.
decay_steps: A scalar int32 or int64 Tensor, or a Python number. Must be positive. The number of steps over which one decay application is spread (see the formula above).
decay_rate: A scalar float32 or float64 Tensor, or a Python number. The decay rate.
staircase: Boolean. If True, decay the learning rate at discrete intervals.
name: String. Optional name of the operation. Defaults to 'ExponentialDecay'.

Returns:
A scalar Tensor of the same type as learning_rate. The decayed learning rate.
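
To see what staircase changes, the following minimal sketch (TF 1.x; decay_steps=10 and the step values are arbitrary choices for illustration) evaluates both variants of the schedule at a few steps:

import tensorflow as tf

step = tf.placeholder(tf.int32, shape=[])
smooth  = tf.train.exponential_decay(0.1, step, decay_steps=10, decay_rate=0.99, staircase=False)
stepped = tf.train.exponential_decay(0.1, step, decay_steps=10, decay_rate=0.99, staircase=True)

with tf.Session() as sess:
    for s in (0, 5, 10, 15, 20):
        print(s, sess.run([smooth, stepped], feed_dict={step: s}))
# With staircase=False the exponent is step/decay_steps, so the rate shrinks at every step;
# with staircase=True the exponent is floored to step // decay_steps, so the rate only drops
# at multiples of decay_steps.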

3.3 Example

# coding:utf-8
# Loss function: loss = (w+1)^2, with w initialized to 10.
# Backpropagation finds the optimal w, i.e. the w that minimizes loss.
# An exponentially decaying learning rate gives fast descent early on
# and better convergence within fewer iterations.

import tensorflow as tf

LEARNING_RATE_BASE = 0.1    # initial learning rate
LEARNING_RATE_DECAY = 0.99  # learning rate decay rate
LEARNING_RATE_STEP = 1      # update the learning rate after this many batches; usually total_samples / BATCH_SIZE

# Counter of how many batches have been run; starts at 0 and is excluded from training.
global_step = tf.Variable(0, trainable=False)
# Define the exponentially decaying learning rate.
learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE,
                                           global_step,
                                           LEARNING_RATE_STEP,
                                           LEARNING_RATE_DECAY,
                                           staircase=True)
# The parameter to optimize, w, initialized to 10.
w = tf.Variable(tf.constant(10, dtype=tf.float32))
# Define the loss function.
loss = tf.square(w + 1)
# Define the backpropagation (training) step.
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

# Create a session and train for 100 steps.
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(100):
        sess.run(train_step)
        learning_rate_val = sess.run(learning_rate)
        global_step_val = sess.run(global_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print("After %s steps: global_step is %f, w is %f, learning rate is %f, loss is %f." % (i, global_step_val, w_val, learning_rate_val, loss_val))

Output:
After 0 steps: global_step is 1.000000, w is 7.800000, learning rate is 0.099000, loss is 77.440002.
After 1 steps: global_step is 2.000000, w is 6.057600, learning rate is 0.098010, loss is 49.809719.
After 2 steps: global_step is 3.000000, w is 4.674169, learning rate is 0.097030, loss is 32.196194.
……
After 55 steps: global_step is 56.000000, w is -0.999063, learning rate is 0.056960, loss is 0.000001.
After 56 steps: global_step is 57.000000, w is -0.999170, learning rate is 0.056391, loss is 0.000001.
After 57 steps: global_step is 58.000000, w is -0.999263, learning rate is 0.055827, loss is 0.000001.
After 58 steps: global_step is 59.000000, w is -0.999346, learning rate is 0.055268, loss is 0.000000.
After 59 steps: global_step is 60.000000, w is -0.999418, learning rate is 0.054716, loss is 0.000000.
……
After 97 steps: global_step is 98.000000, w is -0.999985, learning rate is 0.037346, loss is 0.000000.
After 98 steps: global_step is 99.000000, w is -0.999986, learning rate is 0.036973, loss is 0.000000.
After 99 steps: global_step is 100.000000, w is -0.999987, learning rate is 0.036603, loss is 0.000000.
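
As a sanity check, these numbers match the formula directly: with LEARNING_RATE_STEP = 1 and staircase=True, the learning rate once the global step reaches n is simply 0.1 * 0.99^n, which can be verified in a couple of lines:

base, decay = 0.1, 0.99
for n in (1, 2, 56, 98):
    print(n, base * decay ** n)
# 1 -> 0.099, 2 -> 0.09801, 56 -> ~0.05696, 98 -> ~0.03735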

4. Summary

That wraps up exponential learning rate decay. If you come across discretely dropping learning rates or manual decay in the literature, they all serve the same goal of training a better model; all we have to do is swap out learning_rate accordingly (a short sketch of piecewise decay follows below). That's it for today; next time we will look at exponential moving averages.
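
For those discrete drops, TF 1.x also ships tf.train.piecewise_constant(x, boundaries, values). Below is a minimal sketch reusing the toy loss from section 3.3; the boundaries and values are arbitrary choices for illustration:

import tensorflow as tf

global_step = tf.Variable(0, trainable=False)
boundaries = [50, 80]        # step thresholds (arbitrary, for illustration)
values = [0.1, 0.01, 0.001]  # rate up to step 50, then up to step 80, then afterwards
learning_rate = tf.train.piecewise_constant(global_step, boundaries, values)

w = tf.Variable(10.0)
loss = tf.square(w + 1)
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(100):
        sess.run(train_step)
        if i in (0, 49, 50, 79, 80):
            print(i, sess.run([learning_rate, w, loss]))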
