Adam can be seen as a combination of RMSProp and momentum. RMSProp contributes the exponentially decaying average of past squared gradients, while momentum contributes the exponentially decaying average of past gradients. Nadam builds on Adam by applying Nesterov acceleration to the first-order momentum accumulation; that is, Nesterov + Adam = Nadam. To integrate NAG into Adam, we need to modify the momentum term.
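To make the difference concrete, here is a minimal sketch of a single Adam update next to a Nadam update, for one scalar parameter. It assumes one commonly used simplified form of Nadam, in which the bias-corrected momentum is replaced by a look-ahead mix of the momentum and the current gradient; other formulations differ in how they apply bias correction. The function names, the hyperparameter defaults, and the quadratic test objective are illustrative assumptions, not taken from the text above.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999,
              eps=1e-8, nesterov=False):
    """One Adam update (or a simplified Nadam update if nesterov=True).

    theta, grad, m, v are floats; t is the 1-based step counter.
    """
    # Momentum part: exponentially decaying average of past gradients.
    m = beta1 * m + (1.0 - beta1) * grad
    # RMSProp part: exponentially decaying average of past squared gradients.
    v = beta2 * v + (1.0 - beta2) * grad * grad

    # Bias correction for the zero-initialised averages.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)

    if nesterov:
        # Assumed simplified Nadam form: look ahead by mixing the
        # bias-corrected momentum with the bias-corrected current gradient.
        direction = beta1 * m_hat + (1.0 - beta1) * grad / (1.0 - beta1 ** t)
    else:
        # Plain Adam: step along the bias-corrected momentum.
        direction = m_hat

    theta = theta - lr * direction / (math.sqrt(v_hat) + eps)
    return theta, m, v


def minimize(grad_fn, theta0, steps, nesterov):
    """Run `steps` updates from theta0 and return the final parameter."""
    theta, m, v = theta0, 0.0, 0.0
    for t in range(1, steps + 1):
        theta, m, v = adam_step(theta, grad_fn(theta), m, v, t,
                                nesterov=nesterov)
    return theta


def grad_fn(x):
    # Gradient of the toy objective f(x) = (x - 3)^2, chosen for illustration.
    return 2.0 * (x - 3.0)


x_adam = minimize(grad_fn, 0.0, 1000, nesterov=False)
x_nadam = minimize(grad_fn, 0.0, 1000, nesterov=True)
print("Adam: ", x_adam)
print("Nadam:", x_nadam)
```

The only line that differs between the two variants is the computation of `direction`, which is the modification to the momentum term that the paragraph refers to. Both runs should end close to the minimiser at 3.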