Optimizing neural network parameters:
w represents the parameters to be optimized, loss represents the loss function, lr represents the learning rate, batch represents the mini-batch of data fed in at each iteration, and t represents the number of batch iterations performed so far (the current step).
The steps of neural network parameter optimization at step t:
First-order momentum: a function of the gradients seen so far.
Second-order momentum: a function of the squares of those gradients.
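Written out, a standard way to frame one generic update step (this framework is a common convention rather than something stated above; phi and psi stand for whatever momentum functions a given optimizer chooses):

    g_t     = gradient of loss with respect to w at step t
    m_t     = phi(g_1, ..., g_t)        (first-order momentum)
    V_t     = psi(g_1^2, ..., g_t^2)    (second-order momentum)
    eta_t   = lr * m_t / sqrt(V_t)      (descent step)
    w_{t+1} = w_t - eta_t               (parameter update)

Each optimizer below is just a particular choice of m_t and V_t.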
Commonly used optimizers (a minimal one-step sketch of each follows the list):
(1) SGD (stochastic gradient descent): plain mini-batch gradient descent, without momentum.
(2) SGDM (stochastic gradient descent with momentum): adds first-order momentum on top of SGD.
(3) Adagrad: adds second-order momentum on top of SGD, accumulated as the running sum of all past squared gradients.
(4) RMSProp: also adds second-order momentum on top of SGD, but as an exponential moving average of squared gradients, so old gradients decay instead of accumulating forever.
(5) Adam: combines SGDM's first-order momentum with RMSProp's second-order momentum, and bias-corrects both.
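SGD as a single update step, in Python (the function name and default learning rate are illustrative choices, not from the source; w and grad can be NumPy arrays or scalars):

    def sgd_step(w, grad, lr=0.01):
        # Plain SGD: step against the gradient; no momentum state is kept.
        return w - lr * grad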
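SGDM, one step, using the formulation where the moving average is scaled by (1 - beta); some texts omit that scaling, so treat the exact form as one common variant:

    def sgdm_step(w, m, grad, lr=0.01, beta=0.9):
        # First-order momentum m: exponential moving average of past gradients.
        m = beta * m + (1 - beta) * grad
        return w - lr * m, m

The caller keeps m between steps, initialized to zero.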
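Adagrad, one step (eps is a small constant guarding the division; its value here is illustrative):

    import numpy as np

    def adagrad_step(w, V, grad, lr=0.01, eps=1e-7):
        # Second-order momentum V: running SUM of all past squared gradients,
        # so each parameter's effective learning rate only ever shrinks.
        V = V + grad ** 2
        return w - lr * grad / (np.sqrt(V) + eps), V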
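RMSProp, one step; the only change from Adagrad is the decaying average in place of the running sum:

    import numpy as np

    def rmsprop_step(w, V, grad, lr=0.01, beta=0.9, eps=1e-7):
        # Second-order momentum V: exponential moving AVERAGE of squared
        # gradients, so the effective learning rate can recover over time.
        V = beta * V + (1 - beta) * grad ** 2
        return w - lr * grad / (np.sqrt(V) + eps), V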
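Adam, one step (t is the 1-based step count used for bias correction; the defaults shown are the values commonly quoted for Adam, but the sketch as a whole is illustrative):

    import numpy as np

    def adam_step(w, m, V, grad, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
        # First-order momentum as in SGDM, second-order momentum as in RMSProp.
        m = beta1 * m + (1 - beta1) * grad
        V = beta2 * V + (1 - beta2) * grad ** 2
        # Bias correction offsets the zero initialization of m and V early on.
        m_hat = m / (1 - beta1 ** t)
        V_hat = V / (1 - beta2 ** t)
        return w - lr * m_hat / (np.sqrt(V_hat) + eps), m, V

Each *_step function returns the updated parameters along with its momentum state, which the caller feeds back in on the next iteration.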