Code for combining a cosine-decay learning rate with linear warmup

The following function is adapted from the official TensorFlow TPU repository:

import numpy as np
import tensorflow as tf


def cosine_learning_rate_with_linear_warmup(global_step,
                                            init_learning_rate,
                                            warmup_learning_rate,
                                            warmup_steps,
                                            total_steps):
    """Creates the cosine learning rate tensor with linear warmup."""
    global_step = tf.cast(global_step, dtype=tf.float32)
    # Linear ramp from warmup_learning_rate to init_learning_rate over warmup_steps.
    linear_warmup = (warmup_learning_rate + global_step / warmup_steps *
                     (init_learning_rate - warmup_learning_rate))
    # Cosine decay from init_learning_rate down to 0 over the remaining steps.
    cosine_learning_rate = (
        init_learning_rate * (tf.cos(
            np.pi * (global_step - warmup_steps) / (total_steps - warmup_steps))
                              + 1.0) / 2.0)
    # Use the warmup value before warmup_steps and the cosine value afterwards.
    learning_rate = tf.where(global_step < warmup_steps,
                             linear_warmup, cosine_learning_rate)
    return learning_rate

The meaning of the five parameters is illustrated in the figure below and maps directly onto the code.
[Figure: learning-rate curve with a linear warmup stage followed by a cosine decay stage]
In the warmup stage, the learning rate moves linearly from warmup_learning_rate to init_learning_rate; whether it rises or falls depends on which of the two values is larger (it usually rises).
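As a quick check on the warmup interpolation, here is the value at the halfway point, using made-up numbers (warmup_learning_rate = 0.0, init_learning_rate = 0.01, warmup_steps = 500) rather than anything from the original post:

step = 250.0
warmup_lr = 0.0 + (step / 500.0) * (0.01 - 0.0)
print(warmup_lr)   # 0.005, the midpoint between the two rates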

In the cosine decay stage, the learning rate decays like this:

$$ lr = \frac{ \cos\left( \frac{gl - w}{t - w} \pi \right) + 1 }{ 2 } \cdot init\_learning\_rate $$

Variables inside the cosine:

  • gl: global_step
  • w: warmup_steps
  • t: total_steps

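A minimal numeric check of the formula, reusing the same made-up values (warmup_steps = 500, total_steps = 5000, init_learning_rate = 0.01) and picking the step that sits exactly halfway through the decay:

import numpy as np

gl, w, t, init_lr = 2750.0, 500.0, 5000.0, 0.01
lr = (np.cos((gl - w) / (t - w) * np.pi) + 1.0) / 2.0 * init_lr
print(lr)   # cos(pi/2) ≈ 0, so lr ≈ init_lr / 2 = 0.005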
The decay curve is shown in the blue box in the figure below:
[Figure: the cosine decay portion of the learning-rate schedule]
The rate of decline accelerates at first, then gradually slows down, and the learning rate converges toward zero, reaching exactly zero when global_step equals total_steps.
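Assuming TensorFlow 2 eager execution and the function defined above, a short sweep with the same hypothetical hyperparameters shows both stages of the schedule end to end:

import tensorflow as tf

# Steps chosen to hit the start, midpoint and end of each stage (illustrative only).
for step in [0, 250, 500, 2750, 5000]:
    lr = cosine_learning_rate_with_linear_warmup(
        global_step=step,
        init_learning_rate=0.01,
        warmup_learning_rate=0.0,
        warmup_steps=500,
        total_steps=5000)
    print(step, float(lr))
# Expected pattern: 0.0 -> 0.005 -> 0.01 (peak) -> 0.005 -> 0.0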

Origin: blog.csdn.net/HaoZiHuang/article/details/130000622