Backpropagation in deep learning

Loss function:

C = \tfrac{1}{2}(y^n - y^t)^2
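As a quick sanity check, the quadratic loss and its derivative can be sketched in a few lines of Python. This assumes that y^n is the network output and y^t the target, and that for vector outputs the loss is summed over components; the function names are only illustrative.

```python
def quadratic_loss(y_pred, y_true):
    """C = 0.5 * sum_k (y_pred_k - y_true_k)^2 (summing over k is an assumption)."""
    return 0.5 * sum((p - t) ** 2 for p, t in zip(y_pred, y_true))


def quadratic_loss_grad(y_pred, y_true):
    """dC/dy_pred_k = y_pred_k - y_true_k."""
    return [p - t for p, t in zip(y_pred, y_true)]


loss_value = quadratic_loss([0.8, 0.1], [1.0, 0.0])
loss_grad = quadratic_loss_grad([0.8, 0.1], [1.0, 0.0])
```

The factor 0.5 is what makes the derivative come out as a plain difference, which keeps the backward formulas free of stray constants.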

During gradient descent: \tfrac{\partial C}{\partial w_{ij}^l} = \tfrac{\partial C}{\partial z_j^l} \cdot \tfrac{\partial z_j^l}{\partial w_{ij}^l}    (backward term × forward term)

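Forward pass (w_{ij}^l connects unit i of layer l-1 to unit j of layer l):

\begin{aligned}
z_j^l &= \sum_i w_{ij}^l x_i^l + b_j^l \\
a^l &= \sigma(z^l) \\
x^l &= a^{l-1}
\end{aligned}

so \tfrac{\partial z_j^l}{\partial w_{ij}^l} = x_i^l = a_i^{l-1}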
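Here is a minimal sketch of the forward pass for one layer, followed by a finite-difference check that the derivative of z_j with respect to w_ij is the i-th input. It assumes that \sigma is the logistic sigmoid and that `w[i][j]` connects input i to output j; the names and the numbers are made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward_layer(w, b, x):
    """z_j = sum_i w[i][j] * x[i] + b[j];  a_j = sigmoid(z_j)."""
    n_in, n_out = len(w), len(b)
    z = [sum(w[i][j] * x[i] for i in range(n_in)) + b[j] for j in range(n_out)]
    a = [sigmoid(zj) for zj in z]
    return z, a


# Illustrative layer with 2 inputs and 3 outputs.
w = [[0.1, -0.2, 0.3],
     [0.4, 0.5, -0.6]]
b = [0.0, 0.1, -0.1]
x = [1.0, 2.0]
z, a = forward_layer(w, b, x)

# Finite-difference check of dz_j/dw_ij = x_i for one (i, j) pair.
eps = 1e-6
i, j = 1, 2
w_pert = [row[:] for row in w]
w_pert[i][j] += eps
z_pert, _ = forward_layer(w_pert, b, x)
dz_dw = (z_pert[j] - z[j]) / eps  # should be close to x[i]
```

Because z is linear in w, the finite difference matches x[i] up to rounding error; this is the "forward" factor of the gradient, and it is available as soon as the forward pass has run.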

Backward pass:

\tfrac{\partial C}{\partial z^l} = [\tfrac{\partial C}{\partial a_1^l} \, \sigma'(z_1^l), \ldots, \tfrac{\partial C}{\partial a_n^l} \, \sigma'(z_n^l)]
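For the last layer this vector can be computed directly. The sketch below assumes that layer l is the output layer (so a^l is the network output), that the loss is the quadratic loss from the start of the post, and that \sigma is the logistic sigmoid, for which \sigma'(z) = \sigma(z)(1 - \sigma(z)). The function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def output_delta(z, y_true):
    """dC/dz_j = dC/da_j * sigmoid'(z_j), with dC/da_j = a_j - y_j for the quadratic loss."""
    a = [sigmoid(zj) for zj in z]
    dC_da = [aj - yj for aj, yj in zip(a, y_true)]
    return [g * sigmoid_prime(zj) for g, zj in zip(dC_da, z)]


delta_out = output_delta([0.0, 2.0], [1.0, 0.0])
```

For the first unit, z = 0 gives a = 0.5 and \sigma'(z) = 0.25, so the entry is (0.5 - 1.0) * 0.25 = -0.125.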

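\tfrac{\partial C}{\partial z^{l-1}} = \left( w^l \, \tfrac{\partial C}{\partial z^l} \right) \odot \sigma'(z^{l-1})

Here w^l is the matrix with entries w_{ji}^l (row j indexes layer l-1, column i indexes layer l) and \odot is the element-wise product.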
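One step of this recursion can be sketched as follows. I again assume a logistic sigmoid and the layout `w[j][i]`, where j is a unit in layer l-1 and i a unit in layer l; `backward_step` is an illustrative name, not taken from the course material.

```python
import math


def sigmoid_prime(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def backward_step(w, delta_next, z_prev):
    """delta_prev[j] = (sum_i w[j][i] * delta_next[i]) * sigmoid'(z_prev[j])."""
    return [
        sum(w[j][i] * delta_next[i] for i in range(len(delta_next)))
        * sigmoid_prime(z_prev[j])
        for j in range(len(z_prev))
    ]


# Illustrative numbers: 3 units in layer l-1, 2 units in layer l.
w_l = [[1.0, 2.0],
       [3.0, 4.0],
       [5.0, 6.0]]
delta_l = [0.5, -1.0]
z_prev = [0.0, 0.0, 0.0]
delta_prev = backward_step(w_l, delta_l, z_prev)
```

Note that the same weights used in the forward pass are reused here, only summed over the other index. That is why the backward pass can be viewed as running the network in reverse.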

x^l * \sigma(z^{l-1}) -> *w ^l>x^{l-1}

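Note: the screenshots are from the course slides of Prof. Hung-yi Lee (李宏毅) of National Taiwan University, which I found very helpful for understanding BP.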

Detailed derivation of the backward pass:

\begin{aligned}
\tfrac{\partial C}{\partial z_j^l} &= \tfrac{\partial C}{\partial a_j^l} \cdot \tfrac{\partial a_j^l}{\partial z_j^l} \\
&= {\color{Red} \tfrac{\partial C}{\partial a_j^l} \, \sigma'(z_j^l)}
\end{aligned}

\tfrac{\partial C}{\partial z^l} = [\tfrac{\partial C}{\partial a_1^l} \, \sigma'(z_1^l), \ldots, \tfrac{\partial C}{\partial a_n^l} \, \sigma'(z_n^l)]

\begin{aligned}
\tfrac{\partial C}{\partial z_j^{l-1}} &= \tfrac{\partial C}{\partial a_j^{l-1}} \cdot \tfrac{\partial a_j^{l-1}}{\partial z_j^{l-1}} \\
&= \sum_{i=1}^{n^l} \tfrac{\partial C}{\partial a_i^l} \cdot \tfrac{\partial a_i^l}{\partial a_j^{l-1}} \cdot \tfrac{\partial a_j^{l-1}}{\partial z_j^{l-1}} \\
&= \sum_{i=1}^{n^l} \tfrac{\partial C}{\partial a_i^l} \cdot \tfrac{\partial \sigma\left(\sum_k w_{ki}^l a_k^{l-1} + b_i^l\right)}{\partial a_j^{l-1}} \cdot \sigma'(z_j^{l-1}) \\
&= \sum_{i=1}^{n^l} {\color{Red} \tfrac{\partial C}{\partial a_i^l} \, \sigma'(z_i^l)} \, w_{ji}^l \, \sigma'(z_j^{l-1}) \\
&= [w_{j1}^l, \ldots, w_{jn^l}^l] \cdot [\tfrac{\partial C}{\partial a_1^l} \, \sigma'(z_1^l), \ldots, \tfrac{\partial C}{\partial a_{n^l}^l} \, \sigma'(z_{n^l}^l)] \, \sigma'(z_j^{l-1}) \\
&= [w_{j1}^l, \ldots, w_{jn^l}^l] \cdot [\tfrac{\partial C}{\partial z^l}] \, \sigma'(z_j^{l-1})
\end{aligned}

Stacking this over all j gives the vector form, where n^l is the number of units in layer l and \odot is the element-wise product:

\tfrac{\partial C}{\partial z^{l-1}} = \left( w^l \, \tfrac{\partial C}{\partial z^l} \right) \odot \sigma'(z^{l-1})
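To convince myself that the derivation holds together, here is a small end-to-end sketch: a two-layer network, backpropagation using the recursion above, and a comparison against central finite differences. The assumptions are a logistic sigmoid, the quadratic loss, and the layout `w[i][j]` (input i to output j). All names and numbers are illustrative, and bias gradients are left out to keep it short.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Return one (layer input, z, a) triple per layer."""
    cache, a = [], x
    for w, b in params:
        z = [sum(w[i][j] * a[i] for i in range(len(a))) + b[j]
             for j in range(len(b))]
        out = [sigmoid(zj) for zj in z]
        cache.append((a, z, out))
        a = out
    return cache


def loss(params, x, y):
    out = forward(params, x)[-1][2]
    return 0.5 * sum((o - t) ** 2 for o, t in zip(out, y))


def backprop(params, x, y):
    """Return dC/dw for every layer using the delta recursion."""
    cache = forward(params, x)
    out = cache[-1][2]
    # Output layer: delta_j = (a_j - y_j) * sigmoid'(z_j), with sigmoid' = a * (1 - a).
    delta = [(o - t) * o * (1.0 - o) for o, t in zip(out, y)]
    grads = [None] * len(params)
    for l in reversed(range(len(params))):
        inp = cache[l][0]
        # dC/dw_ij = a_i^{l-1} * delta_j^l  (forward term times backward term)
        grads[l] = [[inp[i] * delta[j] for j in range(len(delta))]
                    for i in range(len(inp))]
        if l > 0:
            w = params[l][0]
            prev_out = cache[l - 1][2]
            # delta_j^{l-1} = (sum_i w_ji^l * delta_i^l) * sigmoid'(z_j^{l-1})
            delta = [sum(w[j][i] * delta[i] for i in range(len(delta)))
                     * prev_out[j] * (1.0 - prev_out[j])
                     for j in range(len(prev_out))]
    return grads


params = [
    ([[0.1, -0.3], [0.2, 0.4], [-0.5, 0.6]], [0.0, 0.1]),  # 3 -> 2
    ([[0.7], [-0.8]], [0.05]),                              # 2 -> 1
]
x, y = [1.0, 0.5, -1.0], [1.0]
grads = backprop(params, x, y)

# Compare every weight gradient against a central finite difference.
eps = 1e-6
max_err = 0.0
for l, (w, _) in enumerate(params):
    for i in range(len(w)):
        for j in range(len(w[i])):
            old = w[i][j]
            w[i][j] = old + eps
            c_plus = loss(params, x, y)
            w[i][j] = old - eps
            c_minus = loss(params, x, y)
            w[i][j] = old
            numeric = (c_plus - c_minus) / (2 * eps)
            max_err = max(max_err, abs(numeric - grads[l][i][j]))
```

If the recursion is implemented correctly, `max_err` should be tiny compared with the gradients themselves. A large value usually points to a swapped index in w.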


Reposted from blog.csdn.net/qq_32110859/article/details/85698564