-
About the initial learning rate
- Prefer a relatively large initial learning rate, as long as training stays stable: many studies report that larger learning rates tend to improve generalization.
- Relationship with the amount of data: the learning rate generally needs to be reduced appropriately as the amount of training data grows.
- Relationship with batch_size: a smaller batch_size calls for a smaller learning rate.
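One common way to act on the batch_size guideline is the linear scaling heuristic, which is an assumption here rather than something this post prescribes: scale a reference learning rate in proportion to the batch size. The helper `scale_lr` and the reference values are hypothetical.

```python
def scale_lr(base_lr: float, base_batch_size: int, batch_size: int) -> float:
    """Hypothetical helper: scale a reference learning rate linearly with batch size.

    Assumes base_lr was tuned at base_batch_size. This is a heuristic, not a rule.
    """
    if base_batch_size <= 0 or batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * batch_size / base_batch_size


# Halving the batch size halves the learning rate under this heuristic.
print(scale_lr(0.1, 256, 128))  # 0.05
```

The heuristic only gives a starting point; the scaled value still has to be validated on the actual task.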
-
PyTorch implements learning rate adjustment strategies in the torch.optim.lr_scheduler module. The strategies it provides fall into three categories:
- Ordered adjustment: equal-interval adjustment (StepLR), multi-interval adjustment (MultiStepLR), exponential decay (ExponentialLR) and cosine annealing (CosineAnnealingLR);
- Adaptive adjustment: the learning rate changes according to the training situation. A metric such as loss or accuracy is monitored, and when it stops improving, the learning rate is adjusted (ReduceLROnPlateau);
- Custom adjustment: the learning rate is computed from a user-defined lambda function of the epoch (LambdaLR).
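LambdaLR is not covered in detail below, so here is a minimal sketch of the custom category. LambdaLR sets lr = base_lr * lr_lambda(epoch); the halving rule, the single toy parameter and the function name `lambda_lr_schedule` are illustrative assumptions. The forward and backward passes are omitted, and SGD simply skips parameters that have no gradient.

```python
import torch


def lambda_lr_schedule(num_epochs: int = 4, base_lr: float = 0.1):
    # Toy parameter and optimizer; a real model's parameters would go here.
    param = torch.nn.Parameter(torch.zeros(1))
    optimizer = torch.optim.SGD([param], lr=base_lr)
    # lr = base_lr * lr_lambda(epoch); halving every epoch is an arbitrary example rule.
    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda epoch: 0.5 ** epoch)
    lrs = []
    for _ in range(num_epochs):
        lrs.append(optimizer.param_groups[0]["lr"])  # lr used during this epoch
        optimizer.step()  # forward/backward omitted in this sketch
        scheduler.step()  # advance the schedule once per epoch
    return lrs


print(lambda_lr_schedule())  # [0.1, 0.05, 0.025, 0.0125]
```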
-
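During training, the learning rate is updated by calling the scheduler's step() method, just as optimizer.step() updates the model parameters. Call scheduler.step() after optimizer.step(); for epoch-based schedulers such as ExponentialLR, call it once per epoch, outside the batch loop:
```python
import torch

# model, loader, loss_func and epoch_num are assumed to be defined elsewhere
optimizer = torch.optim.Adam(model.parameters(), lr=0.0003, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)
ExpLR = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9999)

for epoch in range(epoch_num):
    for step, (batch_x, batch_y) in enumerate(loader):
        y_pred = model(batch_x)
        loss = loss_func(y_pred, batch_y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()  # update the model parameters once per batch
    ExpLR.step()  # decay the learning rate once per epoch
```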
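For a version of this loop that runs end to end, the sketch below substitutes toy stand-ins for model, loader, loss_func and epoch_num (assumptions, not from the original): a linear model trained on random data. It uses gamma=0.9 instead of 0.9999 so the per-epoch decay is easy to see, and records the learning rate after each epoch with get_last_lr().

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Toy stand-ins (assumptions) for the model, data loader, loss and epoch count.
model = torch.nn.Linear(4, 1)
dataset = TensorDataset(torch.randn(32, 4), torch.randn(32, 1))
loader = DataLoader(dataset, batch_size=8, shuffle=True)
loss_func = torch.nn.MSELoss()
epoch_num = 3

optimizer = torch.optim.Adam(model.parameters(), lr=0.0003)
ExpLR = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)

lr_history = []
for epoch in range(epoch_num):
    for batch_x, batch_y in loader:
        y_pred = model(batch_x)
        loss = loss_func(y_pred, batch_y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()  # update parameters once per batch
    ExpLR.step()  # decay the learning rate once per epoch
    lr_history.append(ExpLR.get_last_lr()[0])  # lr for the next epoch

print(lr_history)  # roughly 0.0003 * 0.9, 0.0003 * 0.9**2, 0.0003 * 0.9**3
```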
-
Learning rate decay types in PyTorch
-
Adjust the learning rate at equal intervals: StepLR
- Meaning: every step_size epochs, the learning rate is multiplied by gamma, i.e. lr = lr * gamma.
- Parameters:
- optimizer: the optimizer used to train the network, e.g. optimizer = torch.optim.SGD(…)
- step_size (int): the interval between learning rate drops, counted in epochs rather than iterations (strictly, in calls to scheduler.step(), which is normally made once per epoch).
- gamma (float): the multiplicative factor applied to the learning rate at each drop; default 0.1.
- last_epoch (int): the index of the last epoch, which the scheduler uses to decide whether the learning rate is due for adjustment. With the default of -1, the scheduler starts from the initial learning rate.
torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1)
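A minimal sketch of how StepLR evolves, assuming a single toy parameter and arbitrary values (base_lr=0.1, step_size=3, gamma=0.1); the function name `step_lr_schedule` is hypothetical. The forward/backward pass is omitted, so optimizer.step() leaves the gradient-free parameter untouched.

```python
import torch


def step_lr_schedule(num_epochs=7, base_lr=0.1, step_size=3, gamma=0.1):
    param = torch.nn.Parameter(torch.zeros(1))  # toy parameter (assumption)
    optimizer = torch.optim.SGD([param], lr=base_lr)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=step_size, gamma=gamma)
    lrs = []
    for _ in range(num_epochs):
        lrs.append(optimizer.param_groups[0]["lr"])  # lr used during this epoch
        optimizer.step()  # a real loop would run forward/backward first
        scheduler.step()  # one scheduler step per epoch
    return lrs


lrs = step_lr_schedule()
print(lrs)  # 0.1 for epochs 0-2, about 0.01 for epochs 3-5, about 0.001 for epoch 6
```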
-
Adjust the learning rate at multiple intervals: MultiStepLR
- Meaning: the learning rate is adjusted at unequally spaced epochs, for example once at epoch 10, once at epoch 30 and once at epoch 80.
- Parameters:
- milestones (list): the epochs at which the learning rate is adjusted, e.g. milestones=[10, 30, 80]
- The other parameters are the same as for StepLR.
torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1)
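The milestones example above can be checked with a short sketch. The toy parameter, base_lr=0.1 and the 90-epoch run are assumptions, and `multistep_lr_schedule` is a hypothetical name.

```python
import torch


def multistep_lr_schedule(num_epochs=90, base_lr=0.1, milestones=(10, 30, 80), gamma=0.1):
    param = torch.nn.Parameter(torch.zeros(1))  # toy parameter (assumption)
    optimizer = torch.optim.SGD([param], lr=base_lr)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=list(milestones), gamma=gamma)
    lrs = []
    for _ in range(num_epochs):
        lrs.append(optimizer.param_groups[0]["lr"])  # lr used during this epoch
        optimizer.step()  # forward/backward omitted
        scheduler.step()
    return lrs


lrs = multistep_lr_schedule()
# The lr drops by a factor of 10 at the start of epochs 10, 30 and 80.
print(lrs[9], lrs[10], lrs[30], lrs[80])
```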
-
Adjust the learning rate by exponential decay: ExponentialLR
- Meaning: the learning rate decays exponentially; after each epoch it is multiplied by gamma, so lr = initial_lr * pow(gamma, epoch).
- Parameters:
- gamma (float): the base of the decay factor; the exponent is the epoch.
torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1)
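To see that multiplying by gamma every epoch gives initial_lr * gamma ** epoch, the sketch below compares the scheduler against the closed form. The toy parameter, base_lr=0.1 and gamma=0.9 are assumptions, and `exponential_lr_schedule` is a hypothetical name.

```python
import torch


def exponential_lr_schedule(num_epochs=5, base_lr=0.1, gamma=0.9):
    param = torch.nn.Parameter(torch.zeros(1))  # toy parameter (assumption)
    optimizer = torch.optim.SGD([param], lr=base_lr)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=gamma)
    lrs = []
    for _ in range(num_epochs):
        lrs.append(optimizer.param_groups[0]["lr"])  # lr used during this epoch
        optimizer.step()  # forward/backward omitted
        scheduler.step()  # lr <- lr * gamma
    return lrs


lrs = exponential_lr_schedule()
closed_form = [0.1 * 0.9 ** epoch for epoch in range(5)]
print(all(abs(a - b) < 1e-12 for a, b in zip(lrs, closed_form)))  # True
```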
-
Adjust the learning rate by cosine annealing: CosineAnnealingLR
-
Meaning: the learning rate follows a cosine curve with period 2*T_max. It falls from the initial value to its minimum over T_max epochs and, if training continues past T_max, rises again, so the overall shape resembles cos(x).
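Written as an equation, this is the closed form given in the PyTorch documentation for CosineAnnealingLR, with η_max the initial learning rate, η_min the minimum learning rate (eta_min) and T_cur the number of epochs since the start:

$$
\eta_t = \eta_{\min} + \frac{1}{2}\left(\eta_{\max} - \eta_{\min}\right)\left(1 + \cos\left(\frac{T_{cur}}{T_{\max}}\pi\right)\right)
$$

At T_cur = 0 the cosine term equals 1 and η_t = η_max; at T_cur = T_max it equals -1 and η_t = η_min; beyond T_max the cosine climbs again, which is why the period is 2 T_max.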
-
Parameters:
- T_max (int): the number of epochs for the learning rate to fall to its minimum, i.e. at epoch = T_max the learning rate reaches the minimum of the cosine curve; T_max is usually set to the total number of epochs.
- eta_min (float): the minimum learning rate, i.e. at epoch = T_max, lr_min = eta_min; default 0.
torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1)
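A minimal sketch comparing CosineAnnealingLR with the closed-form cosine formula, assuming a toy parameter, base_lr=0.1, T_max=10 and eta_min=0; `cosine_lr_schedule` and `cosine_closed_form` are hypothetical names.

```python
import math

import torch


def cosine_lr_schedule(num_epochs=11, base_lr=0.1, T_max=10, eta_min=0.0):
    param = torch.nn.Parameter(torch.zeros(1))  # toy parameter (assumption)
    optimizer = torch.optim.SGD([param], lr=base_lr)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=T_max, eta_min=eta_min)
    lrs = []
    for _ in range(num_epochs):
        lrs.append(optimizer.param_groups[0]["lr"])  # lr used during this epoch
        optimizer.step()  # forward/backward omitted
        scheduler.step()
    return lrs


def cosine_closed_form(epoch, base_lr=0.1, T_max=10, eta_min=0.0):
    # eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * T_cur / T_max))
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / T_max))


lrs = cosine_lr_schedule()
# Starts at base_lr, is about half of it at T_max / 2 and reaches eta_min at T_max.
print(lrs[0], lrs[5], lrs[10])
```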
-
-
Adjust the learning rate according to a monitored metric: ReduceLROnPlateau
-
Meaning: when a monitored metric (loss or accuracy) has not improved, i.e. has not decreased or increased beyond a given threshold, for a number of epochs, the learning rate is adjusted.
For example, reduce the learning rate when the validation loss stops decreasing, or when the validation accuracy stops increasing.
-
Parameters:
-
mode (str): either 'min' or 'max'. 'min' reduces the learning rate when the metric stops decreasing (e.g. when monitoring loss); 'max' reduces it when the metric stops increasing (e.g. when monitoring accuracy).
-
factor (float): the multiplicative factor for the learning rate, analogous to gamma above; when the condition is met, lr = lr * factor.
-
patience (int): the number of epochs without improvement to tolerate; once this is exceeded, the learning rate is reduced.
-
verbose (bool): whether to print a message when the learning rate is updated; the default is False (nothing is printed).
-
threshold_mode (str): how the threshold is applied when deciding whether the metric has improved, either 'rel' or 'abs'. A new value counts as an improvement only if it is strictly better than dynamic_threshold (smaller in 'min' mode, larger in 'max' mode):
when threshold_mode == 'rel' and mode == 'max', dynamic_threshold = best * (1 + threshold);
when threshold_mode == 'rel' and mode == 'min', dynamic_threshold = best * (1 - threshold);
when threshold_mode == 'abs' and mode == 'max', dynamic_threshold = best + threshold;
when threshold_mode == 'abs' and mode == 'min', dynamic_threshold = best - threshold.
-
threshold (float): the improvement threshold, used together with threshold_mode.
-
cooldown (int): the "cooling time". After the learning rate is reduced, the scheduler waits this many epochs, letting the model train, before resuming normal monitoring.
-
min_lr (float or list): the lower bound on the learning rate. With multiple parameter groups, a list can set a separate bound for each group.
-
eps (float): the minimal decay applied to the learning rate. If the difference between the old and new learning rate is smaller than eps, the update is skipped.
-
torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, verbose=False, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)
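The four threshold_mode rules can be restated as a small pure-Python function. This is a hypothetical re-implementation written for illustration from the rules listed above, not PyTorch's internal code; the name `is_better` and its defaults are assumptions.

```python
def is_better(current, best, mode="min", threshold_mode="rel", threshold=1e-4):
    """Hypothetical restatement of the dynamic_threshold rules for ReduceLROnPlateau."""
    if mode == "min" and threshold_mode == "rel":
        return current < best * (1 - threshold)
    if mode == "min" and threshold_mode == "abs":
        return current < best - threshold
    if mode == "max" and threshold_mode == "rel":
        return current > best * (1 + threshold)
    if mode == "max" and threshold_mode == "abs":
        return current > best + threshold
    raise ValueError(f"unknown mode/threshold_mode: {mode!r}/{threshold_mode!r}")


print(is_better(0.9, 1.0))      # True: 0.9 < 1.0 * (1 - 1e-4)
print(is_better(0.99995, 1.0))  # False: the drop is smaller than the relative threshold
```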
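A minimal usage sketch: unlike the other schedulers, ReduceLROnPlateau.step() takes the monitored metric. The toy parameter and the made-up validation losses are assumptions; factor=0.5 and patience=2 are chosen so the reduction happens quickly.

```python
import torch

param = torch.nn.Parameter(torch.zeros(1))  # toy parameter (assumption)
optimizer = torch.optim.SGD([param], lr=0.1)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=2)

# Made-up validation losses: they improve for two epochs, then plateau.
val_losses = [1.0, 0.9, 0.9, 0.9, 0.9, 0.9]
lr_history = []
for val_loss in val_losses:
    optimizer.step()  # training for the epoch omitted in this sketch
    scheduler.step(val_loss)  # pass the monitored metric
    lr_history.append(optimizer.param_groups[0]["lr"])

print(lr_history)  # [0.1, 0.1, 0.1, 0.1, 0.05, 0.05]
```

After the loss stalls at 0.9, three epochs without improvement exceed patience=2, so the learning rate is halved on the fifth call.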
-
-
Neural network tuning --- learning rate lr
Origin blog.csdn.net/hechao3225/article/details/114873489