PyTorch learning rate adjustment strategy

PyTorch implements learning rate adjustment through the torch.optim.lr_scheduler interface. The schedulers it provides fall into three categories:

Ordered adjustment: decay at equal intervals (StepLR), decay at specified milestones (MultiStepLR), exponential decay (ExponentialLR), and cosine annealing (CosineAnnealingLR).
Adaptive adjustment: reduce the learning rate when a monitored metric stops improving (ReduceLROnPlateau).
Custom adjustment: user-defined schedules (LambdaLR).

1. Adjust the learning rate at equal intervals: StepLR

Decays the learning rate by a factor of gamma every step_size intervals. The interval unit is a step, and here a step usually means an epoch, not an iteration.

torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1, last_epoch=-1)
for epoch in range(60):
    train()            # train for one epoch (optimizer.step() is called inside)
    scheduler.step()   # since PyTorch 1.1, call scheduler.step() after the training step

step_size(int)- the number of epochs between learning rate decays. If it is 30, the learning rate is multiplied by gamma at epoch 30, 60, 90... (illustrated in the sketch below).
gamma(float)- the multiplicative factor of learning rate decay, default 0.1, i.e. the learning rate is divided by 10.
last_epoch(int)- the index of the last epoch, used when resuming training so the schedule continues from where it stopped. With the default -1, the learning rate starts from its initial value.
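
To make the effect of step_size and gamma concrete, here is a minimal sketch (not from the original post) that drives StepLR with a placeholder parameter and prints the learning rates it produces; it assumes a recent PyTorch where scheduler.get_last_lr() is available:

import torch

params = [torch.nn.Parameter(torch.zeros(1))]                # placeholder parameters, only to build an optimizer
optimizer = torch.optim.SGD(params, lr=0.001, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)

for epoch in range(60):
    optimizer.step()                                         # the real training step would go here
    scheduler.step()                                         # multiplies lr by 0.1 at epochs 20 and 40
    if epoch in (0, 20, 40):
        print(epoch, scheduler.get_last_lr())                # 0.001 -> 0.0001 -> 1e-05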

2. Adjust the learning rate at specified milestones: MultiStepLR

Decays the learning rate at epochs you choose yourself. This method suits the later tuning stage: inspect the loss curve and decide, for each experiment, when the learning rate should drop.

torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[50, 80], gamma=0.1, last_epoch=-1)
for epoch in range(100):
    train()            # train for one epoch
    scheduler.step()   # lr is multiplied by gamma at epochs 50 and 80

milestones(list)- a list of epoch indices at which to decay the learning rate; the entries must be increasing, e.g. milestones=[30, 80, 120].
gamma(float)- the multiplicative factor of learning rate decay, default 0.1, i.e. the learning rate is divided by 10.

3. Exponentially decay the learning rate: ExponentialLR

torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.1, last_epoch=-1)
for epoch in range(100):
    train()            # train for one epoch
    scheduler.step()   # the schedule follows lr = initial_lr * gamma ** epoch

gamma(float)- the base of the exponential decay; the exponent is the epoch index, so lr = initial_lr * gamma ** epoch.
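
As a quick sanity check of that closed form, the following toy sketch (my own example, using gamma=0.9 rather than 0.1 so the decay is visible over a few epochs) compares the scheduler's learning rate with initial_lr * gamma ** epoch:

import torch

params = [torch.nn.Parameter(torch.zeros(1))]                # placeholder parameters
optimizer = torch.optim.SGD(params, lr=0.001, momentum=0.9)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)

for epoch in range(5):
    optimizer.step()                                         # training step placeholder
    scheduler.step()
    print(scheduler.get_last_lr()[0], 0.001 * 0.9 ** (epoch + 1))   # the two values agree up to float rounding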

4. Cosine annealing of the learning rate: CosineAnnealingLR

The learning rate follows a cosine curve with the initial learning rate as its maximum. The full period is 2 * T_max epochs: within one period the learning rate first decreases to eta_min over T_max epochs and then rises back to the maximum, so it is reset to the maximum at the start of each period. PyTorch documents the closed form eta_t = eta_min + (eta_max - eta_min) * (1 + cos(pi * T_cur / T_max)) / 2.

torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0, last_epoch=-1)
for epoch in range(100):
    train()            # train for one epoch
    scheduler.step()   # lr follows a cosine curve with period 2 * T_max

T_max(int)- the number of epochs in half a cosine period, i.e. the learning rate reaches eta_min after T_max epochs and returns to the initial value after 2 * T_max epochs.
eta_min(float)- the minimum learning rate reached within a period, default 0.
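
The following toy sketch (not from the original post) compares the scheduler's output against that closed form for T_max=10, showing the fall-and-rise shape over one full period of 2 * T_max epochs:

import math
import torch

params = [torch.nn.Parameter(torch.zeros(1))]                # placeholder parameters
optimizer = torch.optim.SGD(params, lr=0.001, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0)

for epoch in range(20):
    optimizer.step()                                         # training step placeholder
    scheduler.step()
    t = epoch + 1                                            # number of scheduler steps taken so far
    eta_t = 0 + 0.5 * (0.001 - 0) * (1 + math.cos(math.pi * t / 10))
    print(f"{scheduler.get_last_lr()[0]:.6f}  {eta_t:.6f}")  # falls to 0 at t=10, rises back to 0.001 by t=20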

5. Adaptively adjust the learning rate: ReduceLROnPlateau

Reduce the learning rate when a monitored metric stops improving (decreasing or increasing). This is a very practical strategy:
for example, reduce the learning rate when the validation loss stops decreasing, or monitor validation accuracy and reduce the learning rate when the accuracy stops increasing.

torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, verbose=False, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, verbose=False, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)
for epoch in range(100):
    train()
    val_loss = val()
    scheduler.step(val_loss)  # note: pass the monitored metric to step()

mode(str)- one of min and max. min means the metric is expected to decrease (e.g. monitoring loss), max means the metric is expected to increase (e.g. monitoring accuracy).
factor(float)- the factor by which the learning rate is reduced (the counterpart of gamma in the other schedulers): new_lr = lr * factor.
patience(int)- the number of epochs with no improvement to tolerate before the learning rate is reduced (see the toy sketch after this parameter list).
verbose(bool)- whether to print a message when the learning rate changes, e.g. print('Epoch {:5d}: reducing learning rate of group {} to {:.4e}.'.format(epoch, i, new_lr)).
threshold_mode(str)- how to decide whether the metric counts as improved; one of rel and abs:
when threshold_mode == rel and mode == max, dynamic_threshold = best * (1 + threshold);
when threshold_mode == rel and mode == min, dynamic_threshold = best * (1 - threshold);
when threshold_mode == abs and mode == max, dynamic_threshold = best + threshold;
when threshold_mode == abs and mode == min, dynamic_threshold = best - threshold.
threshold(float)- used together with threshold_mode to define what counts as a significant improvement.
cooldown(int)- the "cool-down" period: after the learning rate is reduced, monitoring is suspended for this many epochs so the model can train with the new rate before the plateau check resumes.
min_lr(float or list)- the lower bound on the learning rate; pass a list to give each parameter group its own lower bound.
eps(float)- the minimum decay applied to the learning rate; if the difference between the new and old learning rate is smaller than eps, the update is ignored.
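
To see how patience and factor interact, here is a toy sketch (synthetic "validation loss" values, not from the original post): once the metric has failed to improve for more than patience epochs, the learning rate is multiplied by factor:

import torch

params = [torch.nn.Parameter(torch.zeros(1))]                # placeholder parameters
optimizer = torch.optim.SGD(params, lr=0.001, momentum=0.9)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=2)

fake_val_loss = [1.0, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7]          # the "metric" plateaus after epoch 2
for epoch, loss in enumerate(fake_val_loss):
    optimizer.step()                                         # training step placeholder
    scheduler.step(loss)                                     # always pass the monitored metric
    print(epoch, optimizer.param_groups[0]['lr'])            # stays at 0.001, drops to 0.0001 at epoch 5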

6. Custom learning rate adjustment: LambdaLR

Set a different learning rate schedule for each parameter group. The adjustment rule is lr = base_lr * lr_lambda(epoch), where base_lr is the group's initial learning rate.

optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
lr_lambda = lambda epoch: 0.95 ** epoch   # example multiplier applied to the initial lr
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1)
for epoch in range(100):
    train()            # train for one epoch
    scheduler.step()   # lr = 0.001 * lr_lambda(epoch)

This is very useful for fine-tuning: we can not only set different learning rates for different layers, but also give each of them its own adjustment schedule, as illustrated in the sketch below.

torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1)

lr_lambda(function or list)- a function that computes the learning rate multiplier; its input is usually the epoch index. When there are multiple parameter groups, pass a list with one function per group.
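
As a sketch of the fine-tuning use case (a hypothetical two-layer model standing in for a pre-trained backbone plus a new head), each parameter group below gets its own initial learning rate and its own lambda:

import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(10, 10), nn.Linear(10, 2))     # hypothetical backbone + new head
optimizer = torch.optim.SGD([
    {'params': net[0].parameters(), 'lr': 0.0001},           # pre-trained layers: small initial lr
    {'params': net[1].parameters(), 'lr': 0.001},            # newly added layers: larger initial lr
], momentum=0.9)

lambda_backbone = lambda epoch: 0.1 ** (epoch // 30)         # step-style decay every 30 epochs
lambda_head = lambda epoch: 0.95 ** epoch                    # smooth exponential-style decay
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=[lambda_backbone, lambda_head])

for epoch in range(100):
    optimizer.step()                                         # real training for one epoch goes here
    scheduler.step()                                         # group i gets base_lr_i * lambda_i(epoch)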

Origin blog.csdn.net/BigData_Mining/article/details/112679361