Neural network tuning --- learning rate lr

  • About the initial learning rate

    • Prefer a relatively large initial learning rate: a number of studies suggest that larger learning rates help improve generalization
    • Relationship with the amount of data: as the amount of training data grows, the learning rate generally needs to be reduced accordingly
    • Relationship with batch_size: a smaller batch_size calls for a smaller learning rate (see the rough sketch below)
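    As a rough illustration of the batch_size point, one common heuristic (the linear scaling rule; the base values below are assumptions, not from the original) scales the learning rate in proportion to the batch size:

      base_lr, base_batch_size = 0.1, 256            # assumed reference configuration
      batch_size = 64
      lr = base_lr * batch_size / base_batch_size    # smaller batch -> smaller learning rate (0.025 here)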

  • PyTorch's learning rate adjustment strategies are implemented through the torch.optim.lr_scheduler interface. They fall into three categories:

    • Ordered adjustment: equal-interval adjustment (StepLR), multi-interval adjustment (MultiStepLR), exponential decay (ExponentialLR), cosine annealing (CosineAnnealingLR);
    • Adaptive adjustment: the learning rate is changed in response to the training process by monitoring some metric (loss, accuracy); when the metric stops changing meaningfully, it is time to adjust the learning rate (ReduceLROnPlateau);
    • Custom adjustment: adjust the learning rate with a user-defined lambda function of the epoch (LambdaLR); a minimal sketch follows this list
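    For the custom category, a minimal LambdaLR sketch (the 0.95 ** epoch decay rule is just an assumed example, and `optimizer` is any optimizer, e.g. the Adam optimizer created in the training-loop example below); each parameter group's lr becomes initial_lr * lr_lambda(epoch):

      scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda epoch: 0.95 ** epoch)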
  • In each training epoch, the learning rate is updated with the scheduler's step() method, which plays the same role for the learning rate as optimizer.step() does for the model parameters:

    # model, loader, loss_func and epoch_num are assumed to be defined elsewhere
    optimizer = torch.optim.Adam(model.parameters(),
                                 lr=0.0003,
                                 betas=(0.9, 0.999),
                                 eps=1e-08,
                                 weight_decay=0,
                                 amsgrad=False)
    ExpLR = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9999)
    for epoch in range(epoch_num):
        for step, (batch_x, batch_y) in enumerate(loader):
            y_pred = model(batch_x)
            loss = loss_func(y_pred, batch_y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        ExpLR.step()  # decay the learning rate once per epoch
    
  • Learning rate decay types in PyTorch

    1. Equal-interval adjustment: StepLR
      • Meaning: every step_size training epochs, the learning rate is adjusted to lr = lr * gamma.
      • Parameters:
        • optimizer: the optimizer used for training, e.g. optimizer=torch.optim.SGD(…)
        • step_size(int): the interval between learning rate decays, measured in epochs, not iterations
        • gamma(float): multiplicative factor of learning rate decay, default 0.1
        • last_epoch(int): the index of the last epoch, used to decide whether the learning rate needs adjusting; when last_epoch matches the configured interval the learning rate is adjusted, and when it is -1 the learning rate is set to the initial value
      torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1)
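      • Example: a minimal usage sketch (the toy model and the lr/step_size/gamma values are assumed for illustration, not from the original):

        import torch

        model = torch.nn.Linear(10, 1)                            # toy model, illustration only
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # assumed initial lr
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
        for epoch in range(90):
            optimizer.step()    # one epoch of real training would go here
            scheduler.step()    # every 30 epochs: lr = lr * 0.1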
      
    2. Multi-interval adjustment: MultiStepLR
      • Meaning: the learning rate is adjusted at unequal epoch intervals, e.g. once at epoch=10, once at epoch=30, and once at epoch=80
      • Parameters:
        • milestones(list): a list of epoch values at which the learning rate is adjusted, e.g. milestones=[10, 30, 80]
        • The other parameters are the same as for the equal-interval scheduler (StepLR)
      torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1)
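      • Example: a minimal sketch reusing the optimizer from the StepLR example above; the milestones are the values quoted in the text:

        scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10, 30, 80], gamma=0.1)
        for epoch in range(100):
            optimizer.step()    # one epoch of training
            scheduler.step()    # lr is multiplied by 0.1 when epoch reaches 10, 30 and 80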
      
    3. Exponential decay: ExponentialLR
      • Meaning: the learning rate decays exponentially with the epoch, lr = initial_lr * gamma ** epoch (equivalently, it is multiplied by gamma once per epoch)
      • Parameters:
        • gamma(float): the base of the decay factor; the exponent is the epoch
      torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1)
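      • Worked check: this is the scheduler already used in the training-loop example near the top of this article; with those values (lr=0.0003, gamma=0.9999), the closed form gives, for example:

        print(0.0003 * 0.9999 ** 1000)    # ~2.71e-4, the learning rate after 1000 epochs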
      
    4. Cosine annealing: CosineAnnealingLR
      • Meaning: the learning rate decays following a cosine curve whose full period is 2 * T_max; the overall trend looks like cos(x)

      • Parameters:

        • T_max(int): the number of epochs after which the learning rate reaches its minimum, i.e. at epoch=T_max the learning rate reaches the minimum of the cosine curve; usually T_max = total number of training epochs
        • eta_min: the minimum learning rate, i.e. at epoch=T_max the learning rate becomes lr_min = eta_min; default 0
      torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1)
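      • Example: a minimal sketch reusing the optimizer from the StepLR example above (T_max=100 and eta_min=1e-5 are assumed values); the resulting schedule is lr = eta_min + (initial_lr - eta_min) * (1 + cos(pi * epoch / T_max)) / 2:

        scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100, eta_min=1e-5)
        for epoch in range(100):
            optimizer.step()
            scheduler.step()    # lr follows half a cosine from the initial lr down to eta_min at epoch 100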
      
    5. Metric-based adjustment: ReduceLROnPlateau
      • Meaning: when a monitored metric (loss or accuracy) has stopped changing (has not decreased or increased by more than a given threshold) for several epochs, adjust the learning rate.
        For example, reduce the learning rate when the validation loss stops decreasing, or when the monitored validation accuracy stops increasing.

      • Parameters:

        • mode(str): one of 'min' and 'max'; 'min' means the metric is expected to decrease (e.g. monitoring loss) and an adjustment is triggered when it stops decreasing, 'max' means the metric is expected to increase (e.g. monitoring accuracy) and an adjustment is triggered when it stops increasing

        • factor(float): the factor by which the learning rate is multiplied (the same role as gamma above); when the monitored metric stops improving, lr = lr * factor

        • patience(int): how many epochs the metric may go without improving before the learning rate is adjusted

        • verbose(bool): whether to print a message when the learning rate is updated; default False (no message)

        • threshold_mode(str): how to judge whether the metric is at its best, either 'rel' or 'abs':
          when threshold_mode == 'rel' and mode == 'max', dynamic_threshold = best * (1 + threshold);
          when threshold_mode == 'rel' and mode == 'min', dynamic_threshold = best * (1 - threshold);
          when threshold_mode == 'abs' and mode == 'max', dynamic_threshold = best + threshold;
          when threshold_mode == 'abs' and mode == 'min', dynamic_threshold = best - threshold

        • threshold(float): the threshold for measuring a new optimum, used together with threshold_mode

        • cooldown(int): "cooling time"; after the learning rate is adjusted, monitoring is paused for this many epochs so the model can train for a while before monitoring resumes

        • min_lr(float or list): lower bound on the learning rate; can be a float, or a list when there are multiple parameter groups

        • eps(float): minimal decay applied to the learning rate; if the difference between the new and old learning rate is smaller than eps, the update is ignored

      torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10,
       verbose=False, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)
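      • Example: unlike the schedulers above, step() here must be passed the monitored metric. A minimal sketch reusing the optimizer from the StepLR example (the constant val_loss stands in for a real validation loss that has stopped improving):

        scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10)
        for epoch in range(50):
            optimizer.step()
            val_loss = 1.0              # placeholder metric: no improvement
            scheduler.step(val_loss)    # after `patience` epochs without improvement, lr = lr * factor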
      
