Optimization Algorithms --- deeplearning.ai --- Notes (17)

Copyright notice: the author is a beginner and this post may contain errors; corrections are welcome! Please credit this blog when reposting, thanks! https://blog.csdn.net/LieQueov/article/details/80107467
1. Mind Map

2. Key Formulas

(1) Momentum gradient descent

$$\begin{aligned} v_{dW} &= \beta\, v_{dW} + (1 - \beta)\, dW \\ v_{db} &= \beta\, v_{db} + (1 - \beta)\, db \\ W &= W - \alpha\, v_{dW}, \qquad b = b - \alpha\, v_{db} \end{aligned}$$

where α and β are hyperparameters; β is typically set to 0.9.
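A minimal NumPy sketch of one momentum step, written directly from the update rule above. The function name `momentum_update` and the convention of passing a single `W`, `b` pair are illustrative choices, not from the course code.

```python
import numpy as np

def momentum_update(W, b, dW, db, v_dW, v_db, alpha=0.01, beta=0.9):
    """One momentum step. v_dW and v_db are the exponentially weighted
    averages of the gradients; initialize them to zero arrays with the
    same shapes as W and b before the first iteration."""
    v_dW = beta * v_dW + (1 - beta) * dW
    v_db = beta * v_db + (1 - beta) * db
    W = W - alpha * v_dW
    b = b - alpha * v_db
    return W, b, v_dW, v_db
```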

(2) RMSprop

$$\begin{aligned} s_{dW} &= \beta_2\, s_{dW} + (1 - \beta_2)\, dW^{2} \\ s_{db} &= \beta_2\, s_{db} + (1 - \beta_2)\, db^{2} \\ W &= W - \alpha\, \frac{dW}{\sqrt{s_{dW}} + \varepsilon}, \qquad b = b - \alpha\, \frac{db}{\sqrt{s_{db}} + \varepsilon} \end{aligned}$$

where α and β₂ are hyperparameters; dW² denotes the element-wise square of dW, and ε is a small constant added for numerical stability.
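A corresponding RMSprop step in NumPy, again per-parameter for brevity; the function name and defaults shown here are illustrative assumptions.

```python
import numpy as np

def rmsprop_update(W, b, dW, db, s_dW, s_db, alpha=0.001, beta2=0.999, eps=1e-8):
    """One RMSprop step. s_dW and s_db are running averages of the
    element-wise squared gradients; initialize them to zero arrays."""
    s_dW = beta2 * s_dW + (1 - beta2) * np.square(dW)
    s_db = beta2 * s_db + (1 - beta2) * np.square(db)
    # divide each gradient component by the root-mean-square of its history
    W = W - alpha * dW / (np.sqrt(s_dW) + eps)
    b = b - alpha * db / (np.sqrt(s_db) + eps)
    return W, b, s_dW, s_db
```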

(3) Adam

$$\begin{aligned} v_{dW} &= \beta_1\, v_{dW} + (1 - \beta_1)\, dW \\ v_{db} &= \beta_1\, v_{db} + (1 - \beta_1)\, db \\ s_{dW} &= \beta_2\, s_{dW} + (1 - \beta_2)\, dW^{2} \\ s_{db} &= \beta_2\, s_{db} + (1 - \beta_2)\, db^{2} \\ v_{dW}^{corrected} &= v_{dW}/(1 - \beta_1^{\,t}) \\ v_{db}^{corrected} &= v_{db}/(1 - \beta_1^{\,t}) \\ s_{dW}^{corrected} &= s_{dW}/(1 - \beta_2^{\,t}) \\ s_{db}^{corrected} &= s_{db}/(1 - \beta_2^{\,t}) \\ W &= W - \alpha\, \frac{v_{dW}^{corrected}}{\sqrt{s_{dW}^{corrected}} + \varepsilon}, \qquad b = b - \alpha\, \frac{v_{db}^{corrected}}{\sqrt{s_{db}^{corrected}} + \varepsilon} \end{aligned}$$

where the commonly used defaults are β₁ = 0.9, β₂ = 0.999, and ε = 10⁻⁸; α is the learning rate and t is the iteration count (starting from 1), used for the bias correction.
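The Adam update combines the momentum term (first moment) and the RMSprop term (second moment), each with a bias correction. A minimal NumPy sketch of one step, with an illustrative function name and the per-parameter convention used above:

```python
import numpy as np

def adam_update(W, b, dW, db, v_dW, v_db, s_dW, s_db, t,
                alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step. v_* are first-moment and s_* are second-moment
    running averages (initialize all to zeros); t is the 1-based
    iteration count used for bias correction."""
    # first moment (momentum-like term)
    v_dW = beta1 * v_dW + (1 - beta1) * dW
    v_db = beta1 * v_db + (1 - beta1) * db
    # second moment (RMSprop-like term), element-wise squares
    s_dW = beta2 * s_dW + (1 - beta2) * np.square(dW)
    s_db = beta2 * s_db + (1 - beta2) * np.square(db)
    # bias-corrected estimates
    v_dW_c = v_dW / (1 - beta1 ** t)
    v_db_c = v_db / (1 - beta1 ** t)
    s_dW_c = s_dW / (1 - beta2 ** t)
    s_db_c = s_db / (1 - beta2 ** t)
    # parameter update
    W = W - alpha * v_dW_c / (np.sqrt(s_dW_c) + eps)
    b = b - alpha * v_db_c / (np.sqrt(s_db_c) + eps)
    return W, b, v_dW, v_db, s_dW, s_db
```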
