Neural network tuning --- learning rate lr

  • About the initial learning rate

    • Prefer a relatively large initial learning rate: a number of studies suggest that larger learning rates help improve generalization
    • Relationship with the amount of data: as the amount of training data grows, the learning rate generally needs to be reduced accordingly
    • Relationship with batch_size: a smaller batch_size calls for a smaller learning rate (see the rough sketch below)
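    As a rough illustration of the batch_size point, one common heuristic (the linear scaling rule; the base values below are assumptions, not from the original) scales the learning rate in proportion to the batch size:

      base_lr, base_batch_size = 0.1, 256            # assumed reference configuration
      batch_size = 64
      lr = base_lr * batch_size / base_batch_size    # smaller batch -> smaller learning rate (0.025 here)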

  • PyTorch's learning rate adjustment strategies are implemented through the torch.optim.lr_scheduler interface. They fall into three categories:

    • Ordered adjustment: equal-interval adjustment (StepLR), multi-interval adjustment (MultiStepLR), exponential decay (ExponentialLR), cosine annealing (CosineAnnealingLR);
    • Adaptive adjustment: the learning rate is changed in response to the training process by monitoring some metric (loss, accuracy); when the metric stops changing meaningfully, it is time to adjust the learning rate (ReduceLROnPlateau);
    • Custom adjustment: adjust the learning rate with a user-defined lambda function of the epoch (LambdaLR); a minimal sketch follows this list
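    For the custom category, a minimal LambdaLR sketch (the 0.95 ** epoch decay rule is just an assumed example, and `optimizer` is any optimizer, e.g. the Adam optimizer created in the training-loop example below); each parameter group's lr becomes initial_lr * lr_lambda(epoch):

      scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda epoch: 0.95 ** epoch)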
  • In each training epoch, the learning rate is updated with the scheduler's step() method, which plays the same role for the learning rate as optimizer.step() does for the model parameters:

    # model, loader, loss_func and epoch_num are assumed to be defined elsewhere
    optimizer = torch.optim.Adam(model.parameters(),
                                 lr=0.0003,
                                 betas=(0.9, 0.999),
                                 eps=1e-08,
                                 weight_decay=0,
                                 amsgrad=False)
    ExpLR = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9999)
    for epoch in range(epoch_num):
        for step, (batch_x, batch_y) in enumerate(loader):
            y_pred = model(batch_x)
            loss = loss_func(y_pred, batch_y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        ExpLR.step()  # decay the learning rate once per epoch
    
  • Learning rate decay types in PyTorch

    1. Equal-interval adjustment: StepLR
      • Meaning: every step_size training epochs, the learning rate is adjusted to lr = lr * gamma.
      • Parameters:
        • optimizer: the optimizer used for training, e.g. optimizer=torch.optim.SGD(…)
        • step_size(int): the interval between learning rate decays, measured in epochs, not iterations
        • gamma(float): multiplicative factor of learning rate decay, default 0.1
        • last_epoch(int): the index of the last epoch, used to decide whether the learning rate needs adjusting; when last_epoch matches the configured interval the learning rate is adjusted, and when it is -1 the learning rate is set to the initial value
      torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1)
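      • Example: a minimal usage sketch (the toy model and the lr/step_size/gamma values are assumed for illustration, not from the original):

        import torch

        model = torch.nn.Linear(10, 1)                            # toy model, illustration only
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # assumed initial lr
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
        for epoch in range(90):
            optimizer.step()    # one epoch of real training would go here
            scheduler.step()    # every 30 epochs: lr = lr * 0.1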
      
    2. Multi-interval adjustment: MultiStepLR
      • Meaning: the learning rate is adjusted at unequal epoch intervals, e.g. once at epoch=10, once at epoch=30, and once at epoch=80
      • Parameters:
        • milestones(list): a list of epoch values at which the learning rate is adjusted, e.g. milestones=[10, 30, 80]
        • The other parameters are the same as for the equal-interval scheduler (StepLR)
      torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1)
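      • Example: a minimal sketch reusing the optimizer from the StepLR example above; the milestones are the values quoted in the text:

        scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10, 30, 80], gamma=0.1)
        for epoch in range(100):
            optimizer.step()    # one epoch of training
            scheduler.step()    # lr is multiplied by 0.1 when epoch reaches 10, 30 and 80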
      
    3. Exponential decay: ExponentialLR
      • Meaning: the learning rate decays exponentially with the epoch, lr = initial_lr * gamma ** epoch (equivalently, it is multiplied by gamma once per epoch)
      • Parameters:
        • gamma(float): the base of the decay factor; the exponent is the epoch
      torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1)
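      • Worked check: this is the scheduler already used in the training-loop example near the top of this article; with those values (lr=0.0003, gamma=0.9999), the closed form gives, for example:

        print(0.0003 * 0.9999 ** 1000)    # ~2.71e-4, the learning rate after 1000 epochs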
      
    4. Cosine annealing: CosineAnnealingLR
      • Meaning: the learning rate decays following a cosine curve whose full period is 2 * T_max; the overall trend looks like cos(x)

      • Parameters:

        • T_max(int): the number of epochs after which the learning rate reaches its minimum, i.e. at epoch=T_max the learning rate reaches the minimum of the cosine curve; usually T_max = total number of training epochs
        • eta_min: the minimum learning rate, i.e. at epoch=T_max the learning rate becomes lr_min = eta_min; default 0
      torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1)
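      • Example: a minimal sketch reusing the optimizer from the StepLR example above (T_max=100 and eta_min=1e-5 are assumed values); the resulting schedule is lr = eta_min + (initial_lr - eta_min) * (1 + cos(pi * epoch / T_max)) / 2:

        scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100, eta_min=1e-5)
        for epoch in range(100):
            optimizer.step()
            scheduler.step()    # lr follows half a cosine from the initial lr down to eta_min at epoch 100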
      
    5. Metric-based adjustment: ReduceLROnPlateau
      • Meaning: when a monitored metric (loss or accuracy) has stopped changing (has not decreased or increased by more than a given threshold) for several epochs, adjust the learning rate.
        For example, reduce the learning rate when the validation loss stops decreasing, or when the monitored validation accuracy stops increasing.

      • Parameters:

        • mode(str): one of 'min' and 'max'; 'min' means the metric is expected to decrease (e.g. monitoring loss) and an adjustment is triggered when it stops decreasing, 'max' means the metric is expected to increase (e.g. monitoring accuracy) and an adjustment is triggered when it stops increasing

        • factor(float): the factor by which the learning rate is multiplied (the same role as gamma above); when the monitored metric stops improving, lr = lr * factor

        • patience(int): how many epochs the metric may go without improving before the learning rate is adjusted

        • verbose(bool): whether to print a message when the learning rate is updated; default False (no message)

        • threshold_mode(str): how to judge whether the metric is at its best, either 'rel' or 'abs':
          when threshold_mode == 'rel' and mode == 'max', dynamic_threshold = best * (1 + threshold);
          when threshold_mode == 'rel' and mode == 'min', dynamic_threshold = best * (1 - threshold);
          when threshold_mode == 'abs' and mode == 'max', dynamic_threshold = best + threshold;
          when threshold_mode == 'abs' and mode == 'min', dynamic_threshold = best - threshold

        • threshold(float): the threshold for measuring a new optimum, used together with threshold_mode

        • cooldown(int): "cooling time"; after the learning rate is adjusted, monitoring is paused for this many epochs so the model can train for a while before monitoring resumes

        • min_lr(float or list): lower bound on the learning rate; can be a float, or a list when there are multiple parameter groups

        • eps(float): minimal decay applied to the learning rate; if the difference between the new and old learning rate is smaller than eps, the update is ignored

      torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10,
       verbose=False, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)
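      • Example: unlike the schedulers above, step() here must be passed the monitored metric. A minimal sketch reusing the optimizer from the StepLR example (the constant val_loss stands in for a real validation loss that has stopped improving):

        scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10)
        for epoch in range(50):
            optimizer.step()
            val_loss = 1.0              # placeholder metric: no improvement
            scheduler.step(val_loss)    # after `patience` epochs without improvement, lr = lr * factor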
      
