Deep learning: a comprehensive guide to PyTorch's learning rate adjustment strategies (lr_scheduler)

This post presents, as completely as possible, the parameters, usage and example curves of PyTorch's various learning rate adjustment strategies. They fall into four categories: schedulers that follow a fixed, predefined rule (MultiStepLR, LinearLR, CosineAnnealingLR, OneCycleLR, etc.), combined schedulers (SequentialLR and ChainedScheduler), custom schedulers (LambdaLR and MultiplicativeLR), and adaptive schedulers (ReduceLROnPlateau).

Common configuration for all examples: the initial learning rate is 1, and the epoch index runs from 0 to 200.
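For reference, here is a minimal sketch of the kind of loop that could produce the curves shown below. The dummy model and the SGD optimizer are assumptions made for illustration only; any scheduler from this post can be dropped in.

import torch
from torch import optim
from torch.optim.lr_scheduler import StepLR

model = torch.nn.Linear(10, 1)                           # dummy model, for illustration only
optimizer = optim.SGD(model.parameters(), lr=1.0)        # initial learning rate of 1
scheduler = StepLR(optimizer, step_size=30, gamma=0.5)   # any scheduler from this post

lrs = []
for epoch in range(200):
    optimizer.step()                          # the real training step would go here
    lrs.append(scheduler.get_last_lr()[0])    # learning rate used in this epoch
    scheduler.step()                          # lrs can then be plotted against the epoch index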

lr_scheduler.LambdaLR

LambdaLR gives the user a flexible way to define a custom decay function and thereby produce any desired learning rate curve. It adjusts the learning rate by multiplying the initial LR by the factor returned by the lambda function.

torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1, verbose=False)

Parameters:

  • optimizer (Optimizer) – optimizer
  • lr_lambda (function or list) – a function to compute multiplicative factors, or a list of such functions
  • last_epoch (int) – The index of the last epoch, default: -1
  • verbose (bool) – if True, each update of the learning rate will print a message to stdout, default: False

Example:
import numpy as np
max_epoch = 200    # total number of epochs, matching the setup above
lambda1 = lambda epoch: np.cos(epoch / max_epoch * np.pi / 2)
scheduler = LambdaLR(optimizer, lr_lambda=lambda1)
[Figure: learning rate curve]

lr_scheduler.MultiplicativeLR

MultiplicativeLR also lets you customize how the learning rate changes. Unlike LambdaLR, it adjusts the learning rate by multiplying the LR of the previous epoch by the factor returned by the lambda function.

torch.optim.lr_scheduler.MultiplicativeLR(optimizer, lr_lambda, last_epoch=-1, verbose=False)

Parameters:

  • optimizer (Optimizer) – optimizer
  • lr_lambda (function or list) – a function which computes a multiplicative factor given an integer epoch, or a list of such functions, one for each group in optimizer.param_groups
  • last_epoch (int) – The index of the last epoch, default: -1
  • verbose (bool) – if True, each update of the learning rate will print a message to stdout, default: False

Example:
lmbda = lambda epoch: 0.95
scheduler = MultiplicativeLR(optimizer, lr_lambda=lmbda)
[Figure: learning rate curve]
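To make the difference between LambdaLR and MultiplicativeLR concrete, here is a small illustration in plain Python (not library code) of how the two would evolve the learning rate given the same constant factor of 0.95 and an initial LR of 1:

initial_lr = 1.0
lmbda = lambda epoch: 0.95

# LambdaLR applies the factor to the *initial* LR, so a constant lambda gives a constant LR
lr_lambdalr = [initial_lr * lmbda(t) for t in range(5)]           # [0.95, 0.95, 0.95, 0.95, 0.95]

# MultiplicativeLR applies the factor to the *previous* LR, so a constant lambda decays geometrically
lr_multiplicativelr = [initial_lr * 0.95 ** t for t in range(5)]  # [1.0, 0.95, 0.9025, ...]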

lr_scheduler.StepLR

Every step_size epochs, the learning rate is multiplied by the coefficient gamma.

torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1, verbose=False)

Parameters:

  • optimizer (Optimizer) – optimizer
  • step_size (int) – period of learning rate decay
  • gamma (float) – multiplication factor for learning rate decay, default: 0.1
  • last_epoch (int) – The index of the last epoch, default: -1
  • verbose (bool) – if True, each update of the learning rate will print a message to stdout, default: False

Example: scheduler = StepLR(optimizer, step_size=30, gamma=0.5)
[Figure: learning rate curve]
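As a sketch, the curve for this example can also be written in closed form (assuming a single parameter group and an initial LR of 1, as in the setup above):

# lr(epoch) = initial_lr * gamma ** (epoch // step_size)
lr_at = lambda epoch: 1.0 * 0.5 ** (epoch // 30)
# lr_at(0) -> 1.0, lr_at(30) -> 0.5, lr_at(60) -> 0.25, lr_at(199) -> 0.015625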

lr_scheduler.MultiStepLR

StepLR uses a fixed step size, while MultiStepLR lets you place each step (milestone) wherever you like.

torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1, verbose=False)

Parameters:

  • optimizer (Optimizer) – optimizer
  • milestones (list) – list of epoch indices, must be increasing
  • gamma (float) – multiplication factor for learning rate decay, default: 0.1
  • last_epoch (int) – The index of the last epoch, default: -1
  • verbose (bool) – if True, each update of the learning rate will print a message to stdout, default: False

Example: scheduler = MultiStepLR(optimizer, milestones=[30, 80, 150], gamma=0.5)
[Figure: learning rate curve]
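A sketch of the resulting curve in closed form, assuming an initial LR of 1 (this mirrors the milestone lookup, written here with bisect for illustration):

from bisect import bisect_right

milestones = [30, 80, 150]
lr_at = lambda epoch: 1.0 * 0.5 ** bisect_right(milestones, epoch)
# lr_at(29) -> 1.0, lr_at(30) -> 0.5, lr_at(100) -> 0.25, lr_at(150) -> 0.125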

lr_scheduler.ConstantLR

For the first total_iters epochs, the learning rate specified in the optimizer is multiplied by factor; after total_iters epochs, the original learning rate is restored.

torch.optim.lr_scheduler.ConstantLR(optimizer, factor=0.3333333333333333, total_iters=5, last_epoch=-1, verbose=False)

Parameters:

  • optimizer (Optimizer) – optimizer
  • factor (float) – Constant factor for learning rate decay, default: 1./3.
  • total_iters (int) – the number of epochs for which the learning rate is multiplied by factor, default: 5
  • last_epoch (int) – The index of the last epoch, default: -1
  • verbose (bool) – if True, each update of the learning rate will print a message to stdout, default: False

Example: scheduler = ConstantLR(optimizer, factor=0.5, total_iters=50)
[Figure: learning rate curve]
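A small sketch of the values this example produces, assuming an initial LR of 1:

# lr = 1.0 * 0.5 for epochs 0-49, then back to 1.0 from epoch 50 onwards
lr_at = lambda epoch: 1.0 * (0.5 if epoch < 50 else 1.0)
# lr_at(0) -> 0.5, lr_at(49) -> 0.5, lr_at(50) -> 1.0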

lr_scheduler.LinearLR

Linearly vary the learning rate for each parameter group until the epoch reaches a predefined value (total_iters).

torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.3333333333333333, end_factor=1.0, total_iters=5, last_epoch=-1, verbose=False)

Parameters:

  • optimizer (Optimizer) – optimizer
  • start_factor (float) – the factor the learning rate is multiplied by in the first epoch. Default: 1./3
  • end_factor (float) – the factor the learning rate is multiplied by at the end of the linear change. Default: 1.0
  • total_iters (int) – the number of epochs over which the factor changes linearly from start_factor to end_factor, default: 5
  • last_epoch (int) – The index of the last epoch, default: -1
  • verbose (bool) – if True, each update of the learning rate will print a message to stdout, default: False

Example: scheduler = LinearLR(optimizer, start_factor=1, end_factor=1/2, total_iters=200)
[Figure: learning rate curve]
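As a sketch, the curve for this example in closed form (initial LR 1, factor going linearly from 1 to 1/2 over 200 epochs):

# lr(t) = initial_lr * (start_factor + (end_factor - start_factor) * min(t, total_iters) / total_iters)
lr_at = lambda t: 1.0 * (1 + (0.5 - 1) * min(t, 200) / 200)
# lr_at(0) -> 1.0, lr_at(100) -> 0.75, lr_at(200) -> 0.5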

lr_scheduler.ExponentialLR

Every epoch, the learning rate of each parameter group is multiplied by gamma.

torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1, verbose=False)

Parameters:

  • optimizer (Optimizer) – optimizer
  • gamma (float) – multiplicative factor for learning rate decay
  • last_epoch (int) – The index of the last epoch, default: -1
  • verbose (bool) – if True, each update of the learning rate will print a message to stdout, default: False

Example: scheduler = ExponentialLR(optimizer, gamma=0.9)
[Figure: learning rate curve]
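The resulting curve is simple geometric decay; a sketch assuming an initial LR of 1:

# lr(epoch) = initial_lr * gamma ** epoch
lr_at = lambda epoch: 1.0 * 0.9 ** epoch
# lr_at(0) -> 1.0, lr_at(10) -> ~0.349, lr_at(50) -> ~0.005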

lr_scheduler.PolynomialLR

Decays the learning rate of each parameter group using a polynomial function over the given total_iters.

torch.optim.lr_scheduler.PolynomialLR(optimizer, total_iters=5, power=1.0, last_epoch=-1, verbose=False)

Parameters:

  • optimizer (Optimizer) – optimizer
  • total_iters (int) – the number of steps to decay the learning rate, default: 5
  • power (int) – The power of the polynomial. Default: 1.0.
  • last_epoch (int) – The index of the last epoch, default: -1
  • verbose (bool) – if True, each update of the learning rate will print a message to stdout, default: False

Example:
scheduler = PolynomialLR(optimizer, total_iters=100, power=2)
[Figure: learning rate curve]
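A sketch of the closed form for this example, assuming an initial LR of 1:

# lr(t) = initial_lr * (1 - min(t, total_iters) / total_iters) ** power
lr_at = lambda t: 1.0 * (1 - min(t, 100) / 100) ** 2
# lr_at(0) -> 1.0, lr_at(50) -> 0.25, lr_at(100) -> 0.0 (and it stays there afterwards)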

lr_scheduler.CosineAnnealingLR

Compared with linear decay, cosine annealing tends to reach a good result faster, keeps the model more stable during training, and can also improve its generalization. The cosine schedule decays the learning rate slowly at first, quickly in the middle and slowly again at the end, which roughly matches how a model learns.

torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1, verbose=False)

Parameters:

  • optimizer (Optimizer) – optimizer
  • T_max (int) – maximum number of iterations
  • eta_min (float) – Minimum learning rate value. Default: 0.
  • last_epoch (int) – The index of the last epoch, default: -1
  • verbose (bool) – if True, each update of the learning rate will print a message to stdout, default: False

Example: scheduler = CosineAnnealingLR(optimizer, T_max=200, eta_min=0.5)
[Figure: learning rate curve]
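The schedule follows the usual cosine annealing formula; a sketch for this example (initial LR 1, eta_min 0.5, T_max 200):

import math
# lr(t) = eta_min + (initial_lr - eta_min) * (1 + cos(pi * t / T_max)) / 2
lr_at = lambda t: 0.5 + (1.0 - 0.5) * (1 + math.cos(math.pi * t / 200)) / 2
# lr_at(0) -> 1.0, lr_at(100) -> 0.75, lr_at(200) -> 0.5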

lr_scheduler.SequentialLR

Combines multiple schedulers so that they are applied one after another, switching at the given milestones.

torch.optim.lr_scheduler.SequentialLR(optimizer, schedulers, milestones, last_epoch=-1, verbose=False)

Parameters:

  • optimizer (Optimizer) – optimizer
  • schedulers (list) – list of learning rate adjustment strategies (schedulers)
  • milestones (list) – epoch turning points for policy changes, list of integers
  • last_epoch (int) – The index of the last epoch, default: -1
  • verbose (bool) – if True, each update of the learning rate will print a message to stdout, default: False

Example:
scheduler1 = LinearLR(optimizer, start_factor=1, end_factor=1/2, total_iters=100)
scheduler2 = CosineAnnealingLR(optimizer, T_max=100, eta_min=0.5)
schedulers = [scheduler1, scheduler2]
milestones = [100]
scheduler = SequentialLR(optimizer, schedulers, milestones)
[Figure: learning rate curve]

lr_scheduler.ChainedScheduler

ChainedScheduler looks similar to SequentialLR in that it also takes a list of schedulers, but instead of switching from one scheduler to the next at milestones, it calls the step() of every scheduler in the list at each step, so their multiplicative factors are combined and the learning rate changes continuously.

torch.optim.lr_scheduler.ChainedScheduler(schedulers)

Parameters:
schedulers (list) – list of learning rate adjustment strategies (scheduler)

Example:
scheduler1 = ConstantLR(optimizer, factor=0.1, total_iters=10)
scheduler2 = ExponentialLR(optimizer, gamma=0.9)
scheduler = ChainedScheduler([scheduler1,scheduler2])
[Figure: learning rate curve]
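A rough sketch of how the two factors combine in this example, assuming an initial LR of 1 (the exact off-by-one at the boundary depends on when step() is called): ConstantLR contributes a factor of 0.1 during its first 10 epochs, while ExponentialLR multiplies by 0.9 every epoch.

# approximate combined curve: lr(t) ≈ initial_lr * (0.1 if t < 10 else 1.0) * 0.9 ** t
lr_at = lambda t: 1.0 * (0.1 if t < 10 else 1.0) * 0.9 ** t
# lr_at(0) -> 0.1, lr_at(9) -> ~0.039, lr_at(10) -> ~0.349 (the constant factor is lifted)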

lr_scheduler.CyclicLR

CyclicLR adjusts the learning rate cyclically, letting it oscillate between a lower bound (base_lr) and an upper bound (max_lr).

torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr, max_lr, step_size_up=2000, step_size_down=None, mode='triangular', gamma=1.0, scale_fn=None, scale_mode='cycle', cycle_momentum=True, base_momentum=0.8, max_momentum=0.9, last_epoch=-1, verbose=False)

Parameters:

  • optimizer (Optimizer) – optimizer
  • base_lr (float or list) – the initial learning rate, which is the lower bound value of the learning rate in each cycle
  • max_lr (float or list) – upper bound on the learning rate in each cycle
  • step_size_up (int) – number of training iterations in increment cycle, default: 2000
  • step_size_down (int) – Number of training iterations in decrementing cycle, set to step_size_up if step_size_down is None. Default: None
  • mode (str) – one of {triangular, triangular2, exp_range}, the learning rate change strategy of increasing and decreasing, if scale_fn is not None, this parameter is ignored. Default: "triangular"
  • gamma (float) – constant in 'exp_range' scaling function, default: 1.0
  • scale_fn (function) – A custom decay strategy defined by a lambda function where 0 <= scale_fn(x) <= 1 for all x >= 0. If specified, 'mode' is ignored. Default: None
  • scale_mode (str) – {'cycle', 'iterations'}. Defines whether scale_fn is evaluated according to cycle or iterations (training iterations since cycle start). Default: 'cycle'
  • cycle_momentum (bool) – If True, the momentum cycles between 'base_momentum' and 'max_momentum' in the opposite direction to the learning rate. Default: True
  • base_momentum (float or list) – the lower bound on the momentum in each cycle. Note that momentum cycles inversely to the learning rate: at the peak of a cycle the momentum is 'base_momentum' and the learning rate is 'max_lr'. Default: 0.8
  • max_momentum (float or list) – the upper bound on the momentum in each cycle. Note that momentum cycles inversely to the learning rate: at the start of a cycle the momentum is 'max_momentum' and the learning rate is 'base_lr'. Default: 0.9
  • last_epoch (int) – The index of the last epoch. This parameter is used when resuming training; since step() should be called after each batch rather than after each epoch, this number counts the total number of batches computed, not the total number of epochs. When last_epoch=-1, the schedule starts from the beginning. Default: -1
  • verbose (bool) – if True, each update of the learning rate will print a message to stdout, default: False

Example:
scheduler = CyclicLR(optimizer, base_lr=0.1, max_lr=1, step_size_up=50)
[Figure: learning rate curve]
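Unlike the epoch-based schedulers above, CyclicLR is meant to be stepped after every batch. A minimal sketch, reusing the optimizer from the introduction and a stand-in inner loop in place of a real DataLoader:

scheduler = CyclicLR(optimizer, base_lr=0.1, max_lr=1, step_size_up=50)
for epoch in range(20):
    for batch_idx in range(10):   # stand-in for iterating over a DataLoader
        optimizer.step()          # the training step for this batch would go here
        scheduler.step()          # one scheduler step per batch, not per epoch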

lr_scheduler.OneCycleLR

OneCycleLR is a one-cycle version of CyclicLR.

torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr, total_steps=None, epochs=None, steps_per_epoch=None, pct_start=0.3, anneal_strategy='cos', cycle_momentum=True, base_momentum=0.85, max_momentum=0.95, div_factor=25.0, final_div_factor=10000.0, three_phase=False, last_epoch=-1, verbose=False)

Parameters:

  • optimizer (Optimizer) – optimizer
  • max_lr (float or list) – maximum learning rate
  • total_steps (int) – the total number of iterations. If it is not provided, it is inferred from epochs and steps_per_epoch, so you must supply either total_steps or both epochs and steps_per_epoch. Default: None
  • epochs (int) – number of epochs for training, default: None
  • steps_per_epoch (int) – the number of training steps per epoch, default: None
  • pct_start (float) – the fraction of the cycle spent increasing the learning rate, default: 0.3
  • anneal_strategy (str) – {'cos', 'linear'} specifies the annealing strategy: "cos" means cosine annealing, "linear" means linear annealing. Default: 'cos'
  • cycle_momentum (bool) – If True, the momentum cycles between 'base_momentum' and 'max_momentum' in the opposite direction to the learning rate. Default: True
  • base_momentum (float or list) – the lower bound on the momentum in each cycle. Note that momentum cycles inversely to the learning rate: at the peak of a cycle the momentum is 'base_momentum' and the learning rate is 'max_lr'. Default: 0.85
  • max_momentum (float or list) – the upper bound on the momentum in each cycle. Note that momentum cycles inversely to the learning rate: at the start of a cycle the momentum is 'max_momentum' and the learning rate is 'base_lr'. Default: 0.95
  • div_factor (float) – determine the initial learning rate by initial_lr = max_lr/div_factor, default: 25
  • final_div_factor (float) – determine the minimum learning rate by min_lr = initial_lr/final_div_factor Default: 1e4
  • three_phase (bool) – If True, use a third phase of the schedule to annihilate the learning rate according to 'final_div_factor' instead of modifying the second phase (the first two phases will be symmetrical about the step indicated by 'pct_start'). Default: False
  • last_epoch (int) – The index of the last epoch. This parameter is used when resuming training; since step() should be called after each batch rather than after each epoch, this number counts the total number of batches computed, not the total number of epochs. When last_epoch=-1, the schedule starts from the beginning. Default: -1
  • verbose (bool) – if True, each update of the learning rate will print a message to stdout, default: False

Example:
scheduler = OneCycleLR(optimizer, max_lr=1, steps_per_epoch=10, epochs=20)

[Figure: learning rate curve]
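Like CyclicLR, OneCycleLR is stepped after every batch; with steps_per_epoch=10 and epochs=20 the schedule covers 10 * 20 = 200 steps in total. A minimal sketch, reusing the optimizer from the introduction:

scheduler = OneCycleLR(optimizer, max_lr=1, steps_per_epoch=10, epochs=20)
for epoch in range(20):
    for batch_idx in range(10):   # stand-in for iterating over a DataLoader
        optimizer.step()          # the training step for this batch would go here
        scheduler.step()          # one scheduler step per batch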

lr_scheduler.CosineAnnealingWarmRestarts

CosineAnnealingWarmRestarts is similar to CosineAnnealingLR, but it periodically restarts the decay from the initial LR.

torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0, T_mult=1, eta_min=0, last_epoch=-1, verbose=False)

Parameters:

  • optimizer (Optimizer) – optimizer
  • T_0 (int) – number of epochs until the first restart
  • T_mult (int, optional) – factor by which the cycle length increases after each restart, default: 1
  • eta_min (float, optional) – learning rate lower bound, default: 0
  • last_epoch (int) – The index of the last epoch, default: -1
  • verbose (bool) – if True, each update of the learning rate will print a message to stdout, default: False

Example:
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=30, T_mult=2)
[Figure: learning rate curve]
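A small sketch of where the restarts fall in this example: with T_0=30 and T_mult=2 the cycle lengths are 30, 60, 120, ... epochs, so within the 200-epoch window the learning rate restarts at epochs 30 and 90.

T_0, T_mult, max_epoch = 30, 2, 200
restarts, t, length = [], 0, T_0
while t + length <= max_epoch:
    t += length
    restarts.append(t)            # epochs at which the LR jumps back to its initial value
    length *= T_mult
print(restarts)                   # -> [30, 90]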

lr_scheduler.ReduceLROnPlateau

ReduceLROnPlateau reduces the learning rate when a monitored metric (e.g. loss or accuracy) stops improving, so it adjusts the learning rate adaptively. At each step() call it observes a metric such as the validation loss or accuracy. Since a lower loss is better while a higher accuracy is better, use mode='min' when passing the loss to step() and mode='max' when passing the accuracy.

torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08, verbose=False)

Parameters:

  • optimizer (Optimizer) – optimizer
  • mode (str) – one of min, max. In min mode, lr will decrease when the monitored quantity stops decreasing; in max mode, lr will decrease when the monitored quantity stops increasing. Default: "min"
  • factor (float) – The proportion of each learning rate decrease, new_lr = lr * factor. Default value: 0.1
  • patience (int) – the number of epochs with no improvement that will be tolerated; only after patience epochs without improvement is the learning rate reduced. Default: 10
  • threshold (float) – the threshold for measuring a new best value, so that only relatively large improvements count, default: 1e-4
  • threshold_mode (str) – the mode used to judge whether the metric is at a new best; one of rel and abs.
    When threshold_mode == rel and mode == max, dynamic_threshold = best * (1 + threshold);
    when threshold_mode == rel and mode == min, dynamic_threshold = best * (1 - threshold);
    when threshold_mode == abs and mode == max, dynamic_threshold = best + threshold;
    when threshold_mode == abs and mode == min, dynamic_threshold = best - threshold.
  • cooldown (int) – the number of epochs to wait after a learning rate reduction before resuming normal monitoring. Default: 0
  • min_lr (float or list) – minimum learning rate, default: 0
  • eps (float) – Minimum decay of lr. If the difference between old and new lr is less than eps, ignore the update, default value: 1e-8
  • verbose (bool) – if True, each update of the learning rate will print a message to stdout, default: False

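Unlike the other schedulers, step() must be given the monitored metric. A minimal sketch, reusing the optimizer from the introduction (the validation loss here is just a placeholder for illustration):

scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=10)
for epoch in range(200):
    optimizer.step()                   # the real training step would go here
    val_loss = 1.0 / (epoch + 1)       # placeholder for a real validation loss
    scheduler.step(val_loss)           # pass the monitored metric to step()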
Original article: blog.csdn.net/weixin_43603658/article/details/131885273