Deep learning: adjustment of the learning rate (for self-study use)

lr_scheduler

PyTorch's learning rate adjustment strategies are implemented through the torch.optim.lr_scheduler interface, which provides three categories:

a. Ordered adjustment: equal-interval adjustment (StepLR), adjustment at specified milestones (MultiStepLR), exponential decay (ExponentialLR), and cosine annealing (CosineAnnealingLR).
b. Adaptive adjustment: adaptively adjust the learning rate with ReduceLROnPlateau.
c. Custom adjustment: adjust the learning rate with a user-defined function via LambdaLR.

1 Adjust the learning rate StepLR at equal intervals

Adjust the learning rate at equal intervals: the adjustment multiple is gamma, the adjustment interval is step_size, and the interval unit is the step. Note that a step here usually refers to an epoch, not an iteration.

torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1)

Parameters:
step_size(int)- the interval (in epochs) between learning rate drops; if it is 30, the learning rate is adjusted to lr*gamma at epochs 30, 60, 90...
gamma(float)- the learning rate adjustment multiple; the default is 0.1, i.e. the learning rate is reduced by a factor of 10.
last_epoch(int)- the index of the last epoch; this variable indicates whether the learning rate needs to be adjusted. When last_epoch matches the set interval, the learning rate is adjusted; when it is -1, the learning rate is set to the initial value.
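
A minimal usage sketch of StepLR; the model, optimizer and hyperparameter values are placeholders chosen for illustration, not taken from the original post.

import torch
from torch import nn, optim

model = nn.Linear(10, 2)                                   # placeholder model
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    # ... per-batch training (forward/backward/optimizer.step()) goes here ...
    optimizer.step()
    scheduler.step()      # lr: 0.1 -> 0.01 at epoch 30 -> 0.001 at epoch 60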

2 Adjust the learning rate MultiStepLR as needed

Adjust the learning rate at user-specified milestones. This is well suited to the later stages of tuning: observe the loss curve and customize the timing of the learning rate adjustments for each experiment.

torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1)

Parameters:

milestones(list)- a list whose elements indicate when to adjust the learning rate; the elements must be increasing, e.g. milestones=[30,80,120].
gamma(float)- the learning rate adjustment multiple; the default is 0.1, i.e. the learning rate is reduced by a factor of 10.
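
A minimal usage sketch of MultiStepLR, reusing the milestones=[30,80,120] example from above; the model and optimizer are placeholders.

import torch
from torch import nn, optim

model = nn.Linear(10, 2)                                   # placeholder model
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 80, 120], gamma=0.1)

for epoch in range(150):
    # ... training for one epoch ...
    optimizer.step()
    scheduler.step()      # lr is multiplied by 0.1 at epochs 30, 80 and 120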

3 Exponential decay to adjust the learning rate ExponentialLR

Adjust the learning rate by exponential decay, using the formula: lr = base_lr * gamma**epoch

torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1)

Parameters:

gamma(float)- the base of the exponential decay; the exponent is the epoch, i.e. gamma**epoch.
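
A minimal usage sketch of ExponentialLR; gamma=0.95 and the model/optimizer are illustrative assumptions, and get_last_lr() is printed only to show the decayed values.

import torch
from torch import nn, optim

model = nn.Linear(10, 2)                                   # placeholder model
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

for epoch in range(10):
    # ... training for one epoch ...
    optimizer.step()
    scheduler.step()
    print(epoch, scheduler.get_last_lr())   # lr = 0.1 * 0.95**(epoch+1)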

4 Cosine annealing to adjust the learning rate CosineAnnealingLR

The learning rate follows a cosine curve and is reset to the maximum at the start of each period. The initial learning rate is taken as the maximum, the period is 2*T_max, and within one period the learning rate first decreases and then increases.

torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1)

Parameters:

T_max(int)- the number of epochs in one decay phase of the cosine schedule; after T_max epochs the learning rate reaches its minimum and then rises back toward the initial value.
eta_min(float)- the minimum learning rate; within one period the learning rate drops at most to eta_min. The default is 0.
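
A minimal usage sketch of CosineAnnealingLR; T_max=50, eta_min=1e-5 and the model/optimizer are illustrative values, not from the original post.

import torch
from torch import nn, optim

model = nn.Linear(10, 2)                                   # placeholder model
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50, eta_min=1e-5)

for epoch in range(100):
    # ... training for one epoch ...
    optimizer.step()
    scheduler.step()      # lr falls from 0.1 to 1e-5 over 50 epochs,
                          # then rises back (full period = 2 * T_max)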

5 Adaptively adjust the learning rate ReduceLROnPlateau

Adjust the learning rate when a monitored metric stops changing (decreasing or increasing); this is a very practical strategy.
For example: reduce the learning rate when the validation loss no longer decreases, or monitor the validation accuracy and reduce the learning rate when it no longer increases.

torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, verbose=False, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)

Parameters:

mode(str)- mode selection; there are two modes, min and max: min means the metric no longer decreases (e.g. monitoring loss), max means the metric no longer increases (e.g. monitoring accuracy).
factor(float)- learning rate adjustment multiple (equivalent to gamma in the other schedulers); the learning rate is updated as lr = lr * factor.
patience(int)- the number of epochs the metric is allowed to go without improving; once this is exceeded, the learning rate is reduced.
verbose(bool)- whether to print learning rate information: print('Epoch {:5d}: reducing learning rate of group {} to {:.4e}.'.format(epoch, i, new_lr))
threshold_mode(str)- the mode for judging whether the metric is at its best; there are two modes, rel and abs:
when threshold_mode == rel and mode == max, dynamic_threshold = best * ( 1 + threshold );
when threshold_mode == rel and mode == min, dynamic_threshold = best * ( 1 - threshold );
when threshold_mode == abs and mode == max, dynamic_threshold = best + threshold;
when threshold_mode == abs and mode == min, dynamic_threshold = best - threshold.
threshold(float)- used together with threshold_mode.
cooldown(int)- the "cool-down" period: after the learning rate is reduced, monitoring is paused for cooldown epochs so the model can train for a while before monitoring resumes.
min_lr(float or list)- The lower limit of the learning rate, which can be float or list. When there are multiple parameter groups, you can use list to set it.
eps(float)- The minimum value of learning rate decay, when the learning rate change is less than eps, the learning rate will not be adjusted.
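
A minimal usage sketch of ReduceLROnPlateau. Unlike the schedulers above, step() is called with the monitored metric; the dummy validation data and MSE loss below are placeholders for a real validation pass.

import torch
from torch import nn, optim

model = nn.Linear(10, 2)                                   # placeholder model
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.1, patience=10)

x_val, y_val = torch.randn(32, 10), torch.randn(32, 2)     # dummy validation data
for epoch in range(100):
    # ... training for one epoch ...
    optimizer.step()
    with torch.no_grad():
        val_loss = criterion(model(x_val), y_val)          # monitored metric
    scheduler.step(val_loss)   # lr *= factor after `patience` epochs without improvement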

6 Custom adjustment of the learning rate LambdaLR

Sets a different learning rate adjustment strategy for each parameter group. The adjustment rule is:

lr = base_lr * lmbda(self.last_epoch)

This is very useful for fine-tuning: we can not only set different learning rates for different layers, but also give each layer its own learning rate adjustment strategy.

torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1)
Parameters:

lr_lambda(function or list)- a function that computes the learning rate adjustment multiple; its input is usually the epoch index. When there are multiple parameter groups, pass a list of functions, one per group.
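
A minimal usage sketch of LambdaLR with two parameter groups and two different rules; the two-layer model, group learning rates and the particular lambda functions are illustrative assumptions.

import torch
from torch import nn, optim

model = nn.Sequential(nn.Linear(10, 20), nn.Linear(20, 2))  # placeholder model
optimizer = optim.SGD([
    {'params': model[0].parameters(), 'lr': 0.01},           # e.g. pretrained layers
    {'params': model[1].parameters(), 'lr': 0.1},            # e.g. newly added head
])
scheduler = optim.lr_scheduler.LambdaLR(
    optimizer,
    lr_lambda=[lambda epoch: 0.95 ** epoch,                  # rule for group 0
               lambda epoch: 1.0 / (1 + epoch)])             # rule for group 1

for epoch in range(5):
    # ... training for one epoch ...
    optimizer.step()
    scheduler.step()
    print(scheduler.get_last_lr())   # per-group lr = base_lr * lmbda(last_epoch)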
————————————————
Copyright notice: this is an original article by CSDN blogger "mingo_min", licensed under the CC 4.0 BY-SA agreement; please include the original source link and this notice when reproducing it.
Original link: https://blog.csdn.net/shanglianlm/article/details/85143614
