Training with a variable learning rate in TensorFlow (tf.train.exponential_decay)

TensorFlow provides an interface for using a variable learning rate, so the learning rate can be changed dynamically during training.

The example below builds two kinds of train_op: an ordinary fixed-learning-rate op as a baseline, and ops that use a variable learning rate.

Interface parameters:

learning_rate: the initial learning rate;

global_step: a training-step counter used to work out when to change the learning rate. Rather than creating it automatically, the API has you pass the variable in yourself, probably so it is easier for you to extract and monitor it (as done in the example below);

decay_steps: how many steps pass between learning-rate changes;

decay_rate: the learning rate is multiplied by this value at each decay; with 0.96 it becomes 0.96 of the initial value after the first decay and 0.9216 after the second;

staircase: whether to decay only when global_step/decay_steps divides evenly; every usage I have seen passes True. If False, the change is spread across every step, but that does not mean the rate is multiplied by the decay factor at each step, and the overall decay is unchanged: the rate is still 0.96x of the initial value at step 10 and 0.9216x at step 20.

    If the argument `staircase` is `True`, then `global_step / decay_steps` is an
    integer division and the decayed learning rate follows a staircase function.
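To see the difference between the two modes, the documented decay formula can be checked directly, without TensorFlow. This is a minimal standalone sketch; with staircase=True the exponent is an integer division, with staircase=False it is a float division:

# decayed_lr = starter_lr * decay_rate ** (global_step / decay_steps)
starter_lr = 0.01
decay_steps = 10
decay_rate = 0.96

for step in [0, 5, 10, 15, 20]:
    staircase_lr = starter_lr * decay_rate ** (step // decay_steps)  # staircase=True
    smooth_lr = starter_lr * decay_rate ** (step / decay_steps)      # staircase=False
    print(step, staircase_lr, smooth_lr)

# Both modes give 0.0096 at step 10 and 0.009216 at step 20; they only differ
# in between (e.g. at step 5: 0.01 with staircase vs ~0.0098 without).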
 

train_op descriptions:

train_op0: control group, plain gradient descent with a fixed learning rate.

train_op1: uses exponential_decay.

train_op2: control group; global_step is not passed to minimize(), so the learning rate cannot change.

train_op22: control group; global_step is passed, so the learning rate does change, but it is never used — the learning-rate Tensor must be passed to the optimizer.

train_op3: the same operation broken into its component steps; in effect it equals train_op1.
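Distilled to its essentials, the working pattern (the one train_op1 and train_op3 use in the full program below) needs both pieces together: the decayed learning-rate Tensor goes to the optimizer, and global_step goes to minimize() so the counter advances. A minimal sketch, reusing the names from the example below:

# the optimizer consumes the decayed Tensor, and minimize() increments global_step
learning_rate = tf.train.exponential_decay(0.01, global_step, 10, 0.96, staircase=True)
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)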

# a learning rate decay example
import tensorflow as tf

starter_learning_rate = 0.01

x = tf.constant(5.0)
label = tf.constant(17.0)  # target = 3*x + 2 = 17 when x = 5
w = tf.Variable(0.8)
b = tf.Variable(0.1)
prediction = w * x + b
loss = (prediction - label) ** 2

train_op0 = tf.train.GradientDescentOptimizer(starter_learning_rate).minimize(loss)

steps_per_decay = 10
decay_factor = 0.96

global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(learning_rate = starter_learning_rate,
                                           global_step = global_step,
                                           decay_steps = steps_per_decay,
                                           decay_rate = decay_factor,
                                           staircase = True,  # if True, decay the learning rate at discrete intervals
                                           # staircase = False would spread the change over every step
                                           )


#passing global_step to minimize() will increment it at each step
train_op1 = (
    tf.train.GradientDescentOptimizer(learning_rate).minimize(loss,global_step = global_step)
)
# this cannot change lr: global_step never advances, so learning_rate keeps its initial value
train_op2 = (
    # minimize()'s global_step argument is what increments global_step
    # (minimize() wraps compute_gradients() + apply_gradients())
    tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
)

# this changes lr (global_step advances) but never uses the decayed rate
train_op22 = (
    tf.train.GradientDescentOptimizer(starter_learning_rate).minimize(loss,global_step = global_step)
)
# this will change lr too: train_op1 decomposed into compute/apply steps
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
grads = optimizer.compute_gradients(loss)
train_op3 = optimizer.apply_gradients(grads, global_step = global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(30):
        print('global_step:',sess.run(global_step))
        print('predict:',sess.run(prediction))
        print('learning rate:',sess.run(learning_rate))
        print(sess.run(loss))
        # run one train_op at a time to compare; train_op22 produced the log below
        # sess.run(train_op0)
        # sess.run(train_op1)
        # sess.run(train_op2)
        sess.run(train_op22)
        # sess.run(train_op3)


global_step: 0
predict: 4.1
learning rate: 0.01
166.40999
global_step: 1
predict: 10.808
learning rate: 0.01
38.34087
global_step: 2
predict: 14.02784
learning rate: 0.01
8.833737
global_step: 3
predict: 15.573363
learning rate: 0.01
2.0352921
global_step: 4
predict: 16.315214
learning rate: 0.01
0.46893165
global_step: 5
predict: 16.671303
learning rate: 0.01
0.10804185
global_step: 6
predict: 16.842226
learning rate: 0.01
0.024892626
global_step: 7
predict: 16.924267
learning rate: 0.01
0.005735515
global_step: 8
predict: 16.963648
learning rate: 0.01
0.0013214793
global_step: 9
predict: 16.982552
learning rate: 0.01
0.00030444755
global_step: 10
predict: 16.991625
learning rate: 0.0095999995
7.014343e-05
global_step: 11
predict: 16.995806
learning rate: 0.0095999995
1.7591814e-05
global_step: 12
predict: 16.9979
learning rate: 0.0095999995
4.4099615e-06
global_step: 13
predict: 16.998947
learning rate: 0.0095999995
1.1085067e-06
global_step: 14
predict: 16.999474
learning rate: 0.0095999995
2.7712667e-07
global_step: 15
predict: 16.999737
learning rate: 0.0095999995
6.928167e-08
global_step: 16
predict: 16.999868
learning rate: 0.0095999995
1.7320417e-08
global_step: 17
predict: 16.999933
learning rate: 0.0095999995
4.456524e-09
global_step: 18
predict: 16.999968
learning rate: 0.0095999995
1.0513759e-09
global_step: 19
predict: 16.999983
learning rate: 0.0095999995
2.9467628e-10
global_step: 20
predict: 16.999992
learning rate: 0.009215999
5.820766e-11
global_step: 21
predict: 16.999996
learning rate: 0.009215999
1.4551915e-11
global_step: 22
predict: 16.999996
learning rate: 0.009215999
1.4551915e-11
global_step: 23
predict: 16.999998
learning rate: 0.009215999
3.637979e-12
global_step: 24
predict: 16.999998
learning rate: 0.009215999
3.637979e-12
global_step: 25
predict: 17.0
learning rate: 0.009215999
0.0
global_step: 26
predict: 17.0
learning rate: 0.009215999
0.0
global_step: 27
predict: 17.0
learning rate: 0.009215999
0.0
global_step: 28
predict: 17.0
learning rate: 0.009215999
0.0
global_step: 29
predict: 17.0
learning rate: 0.009215999
0.0
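As the log shows, the printed learning rate holds at 0.01 for steps 0-9, drops to 0.01 × 0.96 = 0.0096 (printed as 0.0095999995 in float32) at step 10, and to 0.01 × 0.96² = 0.009216 at step 20. The loss itself, however, is driven by the fixed starter_learning_rate of 0.01, because train_op22 never feeds the decayed Tensor to its optimizer; with train_op1 or train_op3 the decayed rate would actually be applied to the updates.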
 


Reposted from blog.csdn.net/huqinweI987/article/details/82954641