Adam formula + parameter analysis

Table of Contents

Adam algorithm:
Algorithm purpose:
While it has not converged, the following pseudocode is executed in a loop:
Parameter explanation:
Description of parameters:


Adam algorithm:

Algorithm purpose:

Improve the training procedure so as to minimize (or maximize) the loss function E(x), thereby updating the model's weight and bias parameters.

While it has not converged, the following pseudocode is executed in a loop:
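
The loop body is the standard Adam update; each symbol is defined in the parameter list below:

$$
\begin{aligned}
t &\leftarrow t + 1\\
g_t &\leftarrow \nabla_{\theta} f_t(\theta_{t-1})\\
m_t &\leftarrow \beta_1\, m_{t-1} + (1-\beta_1)\, g_t\\
v_t &\leftarrow \beta_2\, v_{t-1} + (1-\beta_2)\, g_t^{2}\\
\hat{m}_t &\leftarrow m_t / (1-\beta_1^{t})\\
\hat{v}_t &\leftarrow v_t / (1-\beta_2^{t})\\
\theta_t &\leftarrow \theta_{t-1} - \alpha\, \hat{m}_t / \big(\sqrt{\hat{v}_t} + \epsilon\big)
\end{aligned}
$$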

Parameter explanation:

  1. $t$: the time step, initialized to 0
  2. $g_t$: the gradient at time step $t$
  3. $\theta$: the parameter to be updated
  4. $f(\theta)$: the stochastic objective function of the parameters
  5. $\beta_1, \beta_2$: the exponential decay rates of the first- and second-moment estimates, respectively
  6. $m_t$: the first-moment estimate of the gradient
  7. $v_t$: the second-moment estimate of the gradient
  8. $\hat{m}_t$: the bias-corrected first-moment estimate (a worked one-step example follows this list)
  9. $\beta_1^t, \beta_2^t$: $\beta_1$ and $\beta_2$ raised to the power $t$, used in the bias corrections
  10. $\hat{v}_t$: the bias-corrected second-moment estimate
  11. $\alpha$: the learning rate (step size)
  12. $\epsilon$: a small constant added to maintain numerical stability
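
As promised in item 8, here is a one-step example of the bias correction. Since $m_0 = v_0 = 0$, the raw moment estimates are biased toward zero early in training; with the default $\beta_1 = 0.9$, the first time step $t = 1$ gives

$$
m_1 = \beta_1 m_0 + (1-\beta_1)\,g_1 = 0.1\,g_1,
\qquad
\hat{m}_1 = \frac{m_1}{1-\beta_1^{\,1}} = \frac{0.1\,g_1}{0.1} = g_1,
$$

so the corrected estimate recovers the full gradient instead of a value ten times too small. The same argument applies to $v_t$ via $1-\beta_2^t$.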

Description of parameters:

  1. The default settings of the parameters: $\alpha = 0.001$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$
  2. $m_t$ provides momentum, raising the effective learning rate and accelerating training: the larger the accumulated first-order momentum (the gradient), the more consistently updates have pointed in a single direction, so convergence along that direction is faster. Its initial value is 0.
  3. $v_t$ provides the ability to reduce the learning rate: the larger the accumulated second-order momentum (the square of the gradient), the more frequently this parameter has been updated and the more severely it oscillates, so its learning rate needs to be attenuated. Its initial value is 0.
  4. $\beta_1, \beta_2$: their range is $[0, 1)$; they exponentially decay the first- and second-order momentum, preventing excessive accumulation.
  5. $g_t$: the role of gradient descent is to find the minimum, control the variance, and update the model parameters so that the model finally converges. In a neural network it mainly updates the weights, i.e., it adjusts the model's parameters in the direction that minimizes the loss function.
  6. The first-order moment is the mean of the gradient and the second-order moment is its (uncentered) variance; the first moment controls the direction of the model update, and the second moment controls the learning rate. The sketch after this list puts these pieces together in code.
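
To make the division of labor concrete, here is a minimal NumPy sketch of the update on a toy quadratic loss, using the default hyperparameters from item 1. The function name `adam_step` and the toy loss are illustrative choices, not part of the original post.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update, using the default hyperparameters listed above."""
    m = beta1 * m + (1 - beta1) * grad       # first moment: running mean of gradients (direction)
    v = beta2 * v + (1 - beta2) * grad ** 2  # second moment: running mean of squared gradients (scale)
    m_hat = m / (1 - beta1 ** t)             # bias corrections for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step
    return theta, m, v

# Toy run: minimize E(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta = np.array([0.0])
m = np.zeros_like(theta)  # first-order momentum, initial value 0
v = np.zeros_like(theta)  # second-order momentum, initial value 0
for t in range(1, 5001):  # t starts at 1 so the bias corrections are well defined
    grad = 2 * (theta - 3)
    theta, m, v = adam_step(theta, grad, m, v, t)
print(theta)              # converges toward 3
```

Because every parameter is divided by its own $\sqrt{\hat{v}_t}$, frequently updated, oscillating parameters take smaller steps, while rarely updated parameters keep a larger effective learning rate.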

 



 
