[Study Notes] Learning the BP Neural Network

Introduction

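The error back-propagation algorithm is usually just called back-propagation (the BP algorithm).
A multilayer perceptron trained with back-propagation is also called a BP neural network. BP is an iterative algorithm, and its basic idea is: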

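  • 1. First compute the state (net input) and the activation of every layer, up to the last layer (forward propagation).

  • 2. Compute the error of every layer. The error is computed starting from the last layer and moving backward.

  • 3. Update the parameters so that the error decreases. Repeat these steps until a stopping criterion is met (for example, the error changes very little between two consecutive iterations).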
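The three steps form a loop. The sketch below shows only that loop and the stopping criterion, assuming a toy model with a single linear neuron $o = wx + b$ (not a multilayer network, so step 2 needs no backward pass). `train_until_converged`, `lr` and `tol` are hypothetical names chosen for illustration.

```python
def train_until_converged(xs, ys, lr=0.1, tol=1e-10, max_iters=10_000):
    """Toy BP-style loop for a single linear neuron o = w*x + b (illustration only)."""
    w, b = 0.0, 0.0
    prev_loss = float("inf")
    n = len(xs)
    for it in range(1, max_iters + 1):
        # Step 1: forward pass
        outs = [w * x + b for x in xs]
        # Step 2: error, measured by the average loss 1/2 (y - o)^2
        loss = sum(0.5 * (y - o) ** 2 for y, o in zip(ys, outs)) / n
        # Stopping criterion: the loss barely changed since the previous iteration
        if abs(prev_loss - loss) < tol:
            return w, b, loss, it
        prev_loss = loss
        # Step 3: gradient-descent update of the parameters
        grad_w = sum(-(y - o) * x for x, y, o in zip(xs, ys, outs)) / n
        grad_b = sum(-(y - o) for y, o in zip(ys, outs)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, loss, max_iters


w, b, loss, iters = train_until_converged([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(w, b, iters)  # w and b end close to 2 and 1 (the data follow y = 2x + 1)
```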

Conventions used in this article

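The M-P neuron and the perceptron (a simple feedforward neural network) were introduced in the previous post. Before going through the derivation, here is the notation used below: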

  • $n_l$ denotes the number of neurons in layer $l$

  • $f(\cdot)$ denotes the activation function of the neurons (activation functions will be covered in a separate post)

  • $W^{(l)} \in \mathbb{R}^{n_l \times n_{l-1}}$ denotes the weight matrix from layer $l-1$ to layer $l$; its entry in row $j$, column $i$ is $w^{(l)}_{ij}$, so that $z^{(l)} = W^{(l)}a^{(l-1)} + b^{(l)}$

  • $w^{(l)}_{ij}$ denotes the weight of the connection between neuron $j$ in layer $l$ and neuron $i$ in the previous layer, layer $l-1$

  • $b^{(l)}_i$ denotes the bias of neuron $i$ in layer $l$

  • $b^{(l)} = (b^{(l)}_1, b^{(l)}_2, \dots, b^{(l)}_{n_l})^T \in \mathbb{R}^{n_l}$ denotes the bias vector from layer $l-1$ to layer $l$

  • $z^{(l)}_i$ denotes the net input of neuron $i$ in layer $l$

  • $z^{(l)} = (z^{(l)}_1, z^{(l)}_2, \dots, z^{(l)}_{n_l})^T \in \mathbb{R}^{n_l}$ denotes the net input vector of layer $l$, computed from layer $l-1$

  • $a^{(l)}_i$ denotes the activation (output) of neuron $i$ in layer $l$

The figures are taken from the web, and some of their notation differs from the conventions above, so adapt as needed.

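This article uses a three-layer perceptron as the running example (3 inputs, 3 hidden neurons and 2 output neurons, as in the equations below).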

Forward propagation

From the network, the net inputs and activations of the second layer are:

$$ z^{(2)}_1 = w^{(2)}_{11}x_1 + w^{(2)}_{21}x_2 + w^{(2)}_{31}x_3 + b^{(2)}_1 $$
$$ z^{(2)}_2 = w^{(2)}_{12}x_1 + w^{(2)}_{22}x_2 + w^{(2)}_{32}x_3 + b^{(2)}_2 $$
$$ z^{(2)}_3 = w^{(2)}_{13}x_1 + w^{(2)}_{23}x_2 + w^{(2)}_{33}x_3 + b^{(2)}_3 $$
$$ a^{(2)}_1 = f(z^{(2)}_1) $$
$$ a^{(2)}_2 = f(z^{(2)}_2) $$
$$ a^{(2)}_3 = f(z^{(2)}_3) $$

Likewise, the third layer is computed in the same way:

$$ z^{(3)}_1 = w^{(3)}_{11}a^{(2)}_1 + w^{(3)}_{21}a^{(2)}_2 + w^{(3)}_{31}a^{(2)}_3 + b^{(3)}_1 $$
$$ z^{(3)}_2 = w^{(3)}_{12}a^{(2)}_1 + w^{(3)}_{22}a^{(2)}_2 + w^{(3)}_{32}a^{(2)}_3 + b^{(3)}_2 $$
$$ a^{(3)}_1 = f(z^{(3)}_1) $$
$$ a^{(3)}_2 = f(z^{(3)}_2) $$

In general, for layer $l$ ($2 \le l \le L$), the net input and activation (output) of the neurons are (with $a^{(1)} = x$):

$$\ z^{(l)} = W^{(l)}a^{(l-1)} + b^{(l)} $$ $$\ a^{(l)} = f(z^{(l)}) $$

So in a feedforward network, information propagates forward as follows:

$$\ x \rightarrow a^{(1)} \rightarrow z^{(2)} \rightarrow ··· \rightarrow z^{(L)} \rightarrow a^{(L)} \rightarrow y $$
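The forward pass can be sketched in plain Python. This is a minimal sketch under assumptions the article does not fix: the activation is the sigmoid $f(z) = 1/(1+e^{-z})$, and `weights[k]` stores the matrix of layer $l = k+2$ as nested lists whose row $j$ holds the weights $w^{(l)}_{ij}$ feeding neuron $j$ (so `W[j][i]` is $w^{(l)}_{ij}$). The function name `forward` is illustrative.

```python
import math


def sigmoid(z):
    # Assumed activation f; the article leaves f unspecified
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, weights, biases, f=sigmoid):
    """a^(1) = x, then z^(l) = W^(l) a^(l-1) + b^(l) and a^(l) = f(z^(l)) for each layer."""
    a = list(x)                      # a^(1) = x
    zs, activations = [], [a]
    for W, b in zip(weights, biases):
        # Row j of W dotted with a^(l-1), plus b_j
        z = [sum(w_ij * a_i for w_ij, a_i in zip(row_j, a)) + b_j
             for row_j, b_j in zip(W, b)]
        a = [f(z_j) for z_j in z]
        zs.append(z)
        activations.append(a)
    return zs, activations


# The 3-3-2 network of this article with all weights and biases set to zero
W2 = [[0.0] * 3 for _ in range(3)]
W3 = [[0.0] * 3 for _ in range(2)]
zs, acts = forward([1.0, 2.0, 3.0], [W2, W3], [[0.0] * 3, [0.0] * 2])
print(acts[-1])  # [0.5, 0.5], since sigmoid(0) = 0.5
```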

Error back-propagation

Goal: adjust the weights $w$ and biases $b$ toward their optimum, i.e. until the loss function is minimized.

Method: gradient descent (this article uses batch gradient descent and stochastic gradient descent).

The update rules for the weights and biases are:

$$\ w_{new} = w_{old} - \mu \frac{\partial J_{total}}{\partial w_{old}} $$ $$\ b_{new} = b_{old} - \mu \frac{\partial J_{total}}{\partial b_{old}} $$
  • $w_{new}$, $w_{old}$ denote the new and old weight of a connection

  • $b_{new}$, $b_{old}$ denote the new and old bias of a neuron

  • $J_{total}$ denotes the average of the losses computed on the individual samples $(x^{(i)}, y^{(i)})$

  • $\mu$ is the learning rate, i.e. the "step size"
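To make the difference between the two gradient-descent variants concrete, here is a sketch that applies $w_{new} = w_{old} - \mu\,\partial J/\partial w_{old}$ to a hypothetical one-neuron linear model $o = wx + b$, chosen only so that the gradients fit on one line. Batch gradient descent averages the gradients of all $N$ samples before a single update (matching $J_{total}$); stochastic gradient descent updates after every sample. All names are illustrative.

```python
import random


def sample_grads(w, b, x, y):
    # Per-sample gradient of J = 1/2 (y - o)^2 for the toy model o = w*x + b
    r = -(y - (w * x + b))
    return r * x, r


def batch_gd_epoch(w, b, data, mu):
    # Batch GD: average the N per-sample gradients, then update once
    gs = [sample_grads(w, b, x, y) for x, y in data]
    gw = sum(g[0] for g in gs) / len(data)
    gb = sum(g[1] for g in gs) / len(data)
    return w - mu * gw, b - mu * gb


def sgd_epoch(w, b, data, mu, rng):
    # SGD: visit the samples in random order and update after each one
    order = list(data)
    rng.shuffle(order)
    for x, y in order:
        gw, gb = sample_grads(w, b, x, y)
        w, b = w - mu * gw, b - mu * gb
    return w, b


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # y = 2x + 1
rng = random.Random(0)
wb_batch, wb_sgd = (0.0, 0.0), (0.0, 0.0)
for _ in range(1000):
    wb_batch = batch_gd_epoch(*wb_batch, data, 0.05)
    wb_sgd = sgd_epoch(*wb_sgd, data, 0.05, rng)
print(wb_batch, wb_sgd)  # both approach (2.0, 1.0)
```

Batch GD makes one update per epoch, while SGD makes $N$ cheaper, noisier updates per epoch.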

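Next we define the loss function (this article uses the average squared-error loss; the cross-entropy loss is not covered for now).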

The training data are $\{(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\dots,(x^{(N)},y^{(N)})\}$, i.e. $N$ training samples in total (test data excluded), so the target output of sample $i$ is $y^{(i)} = (y^{(i)}_1,\dots,y^{(i)}_{n_L})^T$.

For a single training sample $(x^{(i)},y^{(i)})$, the loss is:

$$ \begin{aligned} J_{(i)} &= \frac{1}{2}\lVert y^{(i)}-o^{(i)}\rVert^2 \\ &= \frac{1}{2}\sum^{n_L}_{k=1}\left(y^{(i)}_k-o^{(i)}_k\right)^2 \end{aligned} $$
  • $y^{(i)}$ is the desired output, i.e. the $y$ given in our training data

  • $o^{(i)}$ is the actual output of the network, i.e. $a^{(L)}$ for the input $x^{(i)}$

So the average loss over one epoch is:

$$\ J_{total} = \frac{1}{N} \sum^{N}_{i=1}J_{(i)} $$
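A direct transcription of $J_{(i)}$ and $J_{total}$ as a small sketch; the function names are hypothetical and the targets and outputs are assumed to be plain lists of floats.

```python
def sample_loss(y, o):
    # J_(i) = 1/2 * ||y^(i) - o^(i)||^2 = 1/2 * sum_k (y_k - o_k)^2
    return 0.5 * sum((y_k - o_k) ** 2 for y_k, o_k in zip(y, o))


def total_loss(ys, outputs):
    # J_total = (1/N) * sum_i J_(i): the average loss over the N training samples
    return sum(sample_loss(y, o) for y, o in zip(ys, outputs)) / len(ys)


targets = [[1.0, 0.0], [0.0, 1.0]]
outputs = [[0.8, 0.1], [0.3, 0.6]]
print(total_loss(targets, outputs))  # ≈ 0.075, i.e. (0.025 + 0.125) / 2
```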

Output-layer weight update

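We again use the network from the beginning of this article to illustrate the output-layer weight update. For one sample, write $J_3$ for the loss of this three-layer network and $y^{(3)}_k$ for the target of output neuron $k$: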

$$
\begin{aligned}
J_3 &= \frac{1}{2}\lVert y^{(3)}-o^{(3)}\rVert^2 \\
&= \frac{1}{2}\lVert y^{(3)}-a^{(3)}\rVert^2 \\
&= \frac{1}{2}\left[\left(y^{(3)}_1-a^{(3)}_1\right)^2+\left(y^{(3)}_2-a^{(3)}_2\right)^2\right] \\
&= \frac{1}{2}\left\{\left[y^{(3)}_1-f(z^{(3)}_1)\right]^2+\left[y^{(3)}_2-f(z^{(3)}_2)\right]^2\right\} \\
&= \frac{1}{2}\left\{\left[y^{(3)}_1-f\left(w^{(3)}_{11}a^{(2)}_1+w^{(3)}_{21}a^{(2)}_2+w^{(3)}_{31}a^{(2)}_3+b^{(3)}_1\right)\right]^2+\left[y^{(3)}_2-f\left(w^{(3)}_{12}a^{(2)}_1+w^{(3)}_{22}a^{(2)}_2+w^{(3)}_{32}a^{(2)}_3+b^{(3)}_2\right)\right]^2\right\}
\end{aligned}
$$

Using the chain rule, take the partial derivatives with respect to $w^{(3)}_{11}, w^{(3)}_{21}, w^{(3)}_{31}$ and likewise $w^{(3)}_{12}, w^{(3)}_{22}, w^{(3)}_{32}$:

$$ \frac{\partial J_3}{\partial w^{(3)}_{11}}=\frac{\partial J_3}{\partial a^{(3)}_1}\frac{\partial a^{(3)}_1}{\partial z^{(3)}_1}\frac{\partial z^{(3)}_1}{\partial w^{(3)}_{11}} $$
$$ \frac{\partial J_3}{\partial w^{(3)}_{21}}=\frac{\partial J_3}{\partial a^{(3)}_1}\frac{\partial a^{(3)}_1}{\partial z^{(3)}_1}\frac{\partial z^{(3)}_1}{\partial w^{(3)}_{21}} $$
$$ \frac{\partial J_3}{\partial w^{(3)}_{31}}=\frac{\partial J_3}{\partial a^{(3)}_1}\frac{\partial a^{(3)}_1}{\partial z^{(3)}_1}\frac{\partial z^{(3)}_1}{\partial w^{(3)}_{31}} $$
$$ \frac{\partial J_3}{\partial w^{(3)}_{12}}=\frac{\partial J_3}{\partial a^{(3)}_2}\frac{\partial a^{(3)}_2}{\partial z^{(3)}_2}\frac{\partial z^{(3)}_2}{\partial w^{(3)}_{12}} $$
$$ \frac{\partial J_3}{\partial w^{(3)}_{22}}=\frac{\partial J_3}{\partial a^{(3)}_2}\frac{\partial a^{(3)}_2}{\partial z^{(3)}_2}\frac{\partial z^{(3)}_2}{\partial w^{(3)}_{22}} $$
$$ \frac{\partial J_3}{\partial w^{(3)}_{32}}=\frac{\partial J_3}{\partial a^{(3)}_2}\frac{\partial a^{(3)}_2}{\partial z^{(3)}_2}\frac{\partial z^{(3)}_2}{\partial w^{(3)}_{32}} $$

Taking $w^{(3)}_{11}$ as an example and substituting, we get:

$$ \begin{aligned} \frac{\partial J_3}{\partial w^{(3)}_{11}} &= \frac{1}{2}\cdot 2\left(y^{(3)}_1-a^{(3)}_1\right)\left(-\frac{\partial a^{(3)}_1}{\partial w^{(3)}_{11}}\right) \\ &= -\left(y^{(3)}_1-a^{(3)}_1\right)f'(z^{(3)}_1)\frac{\partial z^{(3)}_1}{\partial w^{(3)}_{11}} \\ &= -\left(y^{(3)}_1-a^{(3)}_1\right)f'(z^{(3)}_1)\,a^{(2)}_1 \end{aligned} $$
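One way to sanity-check this result is to compare it with a central finite difference $\left(J(w+\varepsilon)-J(w-\varepsilon)\right)/2\varepsilon$. The sketch assumes a sigmoid activation, so $f'(z) = f(z)(1-f(z))$, and treats the hidden activations $a^{(2)}$ as fixed inputs because they do not depend on $w^{(3)}$. The numbers and function names are made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def output_loss(W3, b3, a2, y):
    # J_3 = 1/2 * sum_j (y_j - a_j^(3))^2, z_j^(3) = sum_i w_ij^(3) a_i^(2) + b_j^(3)
    # W3[j][i] stores w_ij^(3) with 0-based indices, so W3[0][0] is w_11^(3)
    a3 = [sigmoid(sum(W3[j][i] * a2[i] for i in range(len(a2))) + b3[j]) for j in range(len(b3))]
    return 0.5 * sum((y[j] - a3[j]) ** 2 for j in range(len(y)))


def analytic_dJ_dw11(W3, b3, a2, y):
    # -(y_1 - a_1^(3)) f'(z_1^(3)) a_1^(2), using sigmoid'(z) = f(z)(1 - f(z))
    a1 = sigmoid(sum(W3[0][i] * a2[i] for i in range(len(a2))) + b3[0])
    return -(y[0] - a1) * a1 * (1.0 - a1) * a2[0]


def numeric_dJ_dw11(W3, b3, a2, y, eps=1e-6):
    # Central finite difference in w_11^(3), as an independent check
    plus = [row[:] for row in W3]
    minus = [row[:] for row in W3]
    plus[0][0] += eps
    minus[0][0] -= eps
    return (output_loss(plus, b3, a2, y) - output_loss(minus, b3, a2, y)) / (2 * eps)


W3 = [[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]]
b3 = [0.05, -0.05]
a2 = [0.7, 0.2, 0.9]  # hidden activations, held fixed
y = [1.0, 0.0]
print(analytic_dJ_dw11(W3, b3, a2, y), numeric_dJ_dw11(W3, b3, a2, y))  # the two values agree closely
```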

Based on the derivative of $w^{(3)}_{11}$ just computed, define the error term of neuron $i$ in layer $l$ as $\delta^{(l)}_i = \frac{\partial J}{\partial z^{(l)}_i}$. For the output layer ($l = 3$ here) it evaluates to:

$$ \delta^{(l)}_i = \frac{\partial J}{\partial z^{(l)}_i} = \frac{\partial J}{\partial a^{(l)}_i}\frac{\partial a^{(l)}_i}{\partial z^{(l)}_i} = -\left(y^{(l)}_i-a^{(l)}_i\right)f'(z^{(l)}_i) $$

Therefore:

$$ \frac{\partial J}{\partial w^{(3)}_{11}}=\delta^{(3)}_1a^{(2)}_1 $$
$$ \frac{\partial J}{\partial w^{(3)}_{21}}=\delta^{(3)}_1a^{(2)}_2 $$
$$ \frac{\partial J}{\partial w^{(3)}_{31}}=\delta^{(3)}_1a^{(2)}_3 $$
$$ \frac{\partial J}{\partial w^{(3)}_{12}}=\delta^{(3)}_2a^{(2)}_1 $$
$$ \frac{\partial J}{\partial w^{(3)}_{22}}=\delta^{(3)}_2a^{(2)}_2 $$
$$ \frac{\partial J}{\partial w^{(3)}_{32}}=\delta^{(3)}_2a^{(2)}_3 $$

So, for a network with $L$ layers in total, the general form for the output layer is:

$$ \delta^{(L)}_i = -\left(y^{(L)}_i-a^{(L)}_i\right)f'(z^{(L)}_i) $$
$$ \frac{\partial J}{\partial w^{(L)}_{ij}} = \delta^{(L)}_j a^{(L-1)}_i $$

In vector/matrix form:

$$ \delta^{(L)} = -\left(y^{(L)}-a^{(L)}\right)\odot f'(z^{(L)}) $$
$$ \nabla_{W^{(L)}}J = \delta^{(L)}\left(a^{(L-1)}\right)^T $$

Then update the weights with this gradient:

$$\ w_{new} = w_{old} - \mu \frac{\partial J_{total}}{\partial w_{old}} $$
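The vector form can be checked the same way: the sketch computes $\delta^{(L)}$ and the outer product $\delta^{(L)}(a^{(L-1)})^T$, then compares every entry with a finite difference. The sigmoid activation, the storage `W[j][i]` $= w^{(L)}_{ij}$ and all numbers are assumptions for illustration; `output_layer_grad` is a hypothetical name.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def output_layer_grad(W, b, a_prev, y):
    """Return delta^(L) and grad_W J = delta^(L) (a^(L-1))^T for a sigmoid output layer."""
    # z^(L) = W a^(L-1) + b, with W[j][i] holding w_ij^(L)
    z = [sum(w * ai for w, ai in zip(row, a_prev)) + b_j for row, b_j in zip(W, b)]
    a = [sigmoid(z_j) for z_j in z]
    # delta^(L) = -(y - a^(L)) ⊙ f'(z^(L)), where sigmoid'(z) = a(1 - a)
    delta = [-(y_j - a_j) * a_j * (1.0 - a_j) for y_j, a_j in zip(y, a)]
    # Outer product: entry [j][i] = delta_j * a_i^(L-1) = dJ/dw_ij^(L)
    grad_W = [[d * ai for ai in a_prev] for d in delta]
    return delta, grad_W


def loss(W, b, a_prev, y):
    a = [sigmoid(sum(w * ai for w, ai in zip(row, a_prev)) + b_j) for row, b_j in zip(W, b)]
    return 0.5 * sum((y_j - a_j) ** 2 for y_j, a_j in zip(y, a))


W = [[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]]
b = [0.05, -0.05]
a_prev = [0.7, 0.2, 0.9]
y = [1.0, 0.0]
delta, grad_W = output_layer_grad(W, b, a_prev, y)

# Compare every entry of grad_W with a central finite difference
eps = 1e-6
max_err = 0.0
for j in range(len(W)):
    for i in range(len(a_prev)):
        W_plus = [row[:] for row in W]
        W_minus = [row[:] for row in W]
        W_plus[j][i] += eps
        W_minus[j][i] -= eps
        numeric = (loss(W_plus, b, a_prev, y) - loss(W_minus, b, a_prev, y)) / (2 * eps)
        max_err = max(max_err, abs(numeric - grad_W[j][i]))
print(max_err)  # only finite-difference noise remains
```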

Hidden-layer weight update

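Hidden-layer weights are also updated by taking partial derivatives with the chain rule. The difference is that a hidden weight reaches the loss through every output neuron, so one term per output neuron is summed; in practice this is written in vector form: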

Update of $w^{(2)}_{11}$:

$$ \frac{\partial J_3}{\partial w^{(2)}_{11}}=\frac{\partial J_3}{\partial a^{(3)}_1}\frac{\partial a^{(3)}_1}{\partial z^{(3)}_1}\frac{\partial z^{(3)}_1}{\partial a^{(2)}_1}\frac{\partial a^{(2)}_1}{\partial z^{(2)}_1}\frac{\partial z^{(2)}_1}{\partial w^{(2)}_{11}}+\frac{\partial J_3}{\partial a^{(3)}_2}\frac{\partial a^{(3)}_2}{\partial z^{(3)}_2}\frac{\partial z^{(3)}_2}{\partial a^{(2)}_1}\frac{\partial a^{(2)}_1}{\partial z^{(2)}_1}\frac{\partial z^{(2)}_1}{\partial w^{(2)}_{11}} $$

Then combine it with the update rule:

$$\ w_{new} = w_{old} - \mu \frac{\partial J_{total}}{\partial w_{old}} $$

The other hidden-layer weights are updated in the same way, so we do not repeat the details here.

Next, we use the error term $\delta^{(l)}_i$ defined above to derive the general formulas. For any layer $l$:

$$ \frac{\partial J}{\partial w^{(l)}_{ij}}=\frac{\partial J}{\partial z^{(l)}_j}\frac{\partial z^{(l)}_j}{\partial w^{(l)}_{ij}}=\delta^{(l)}_j a^{(l-1)}_i $$

For a hidden layer, $z^{(l)}_i$ affects the loss through every net input $z^{(l+1)}_j$ of the next layer, so the chain rule sums over all of them:

$$ \frac{\partial J}{\partial z^{(l)}_i} = \frac{\partial J}{\partial z^{(l+1)}_1}\frac{\partial z^{(l+1)}_1}{\partial z^{(l)}_i}+\frac{\partial J}{\partial z^{(l+1)}_2}\frac{\partial z^{(l+1)}_2}{\partial z^{(l)}_i}+\cdots+\frac{\partial J}{\partial z^{(l+1)}_{n_{l+1}}}\frac{\partial z^{(l+1)}_{n_{l+1}}}{\partial z^{(l)}_i}=\sum^{n_{l+1}}_{j=1}\frac{\partial J}{\partial z^{(l+1)}_j}\frac{\partial z^{(l+1)}_j}{\partial z^{(l)}_i} $$

Therefore

$$ \delta^{(l)}_i = \frac{\partial J}{\partial z^{(l)}_i}=\sum^{n_{l+1}}_{j=1}\frac{\partial J}{\partial z^{(l+1)}_j}\frac{\partial z^{(l+1)}_j}{\partial z^{(l)}_i}=\sum^{n_{l+1}}_{j=1}\delta^{(l+1)}_j\frac{\partial z^{(l+1)}_j}{\partial z^{(l)}_i} $$

Moreover, since

$$ z^{(l+1)}_j=\sum^{n_l}_{i=1}w^{(l+1)}_{ij}a^{(l)}_i+b^{(l+1)}_j = \sum^{n_l}_{i=1}w^{(l+1)}_{ij}f(z^{(l)}_i)+b^{(l+1)}_j $$

we have:

$$ \frac{\partial z^{(l+1)}_j}{\partial z^{(l)}_i}= \frac{\partial z^{(l+1)}_j}{\partial a^{(l)}_i}\frac{\partial a^{(l)}_i}{\partial z^{(l)}_i}=w^{(l+1)}_{ij}f'(z^{(l)}_i) $$

Substituting this into the expression for $\delta^{(l)}_i$ above gives:

$$ \delta^{(l)}_i = f'(z^{(l)}_i)\sum^{n_{l+1}}_{j=1}\delta^{(l+1)}_j w^{(l+1)}_{ij} $$

In vector/matrix form:

$$ \delta^{(l)} = f'(z^{(l)})\odot\left(W^{(l+1)}\right)^T\delta^{(l+1)} $$
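Putting the output-layer and hidden-layer formulas together gives the usual backward loop. The sketch below is a minimal pure-Python version for any number of layers, assuming sigmoid activations and `Ws[k][j][i]` $= w^{(l)}_{ij}$ for layer $l = k+2$. It checks the hidden-layer gradients of a 3-3-2 network with random weights against finite differences. `forward`, `loss` and `backprop` are hypothetical names.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, Ws, bs):
    # Returns the activations a^(1) = x, a^(2), ..., a^(L)
    acts = [list(x)]
    for W, b in zip(Ws, bs):
        a = acts[-1]
        acts.append([sigmoid(sum(w * ai for w, ai in zip(row, a)) + bj) for row, bj in zip(W, b)])
    return acts


def loss(x, y, Ws, bs):
    out = forward(x, Ws, bs)[-1]
    return 0.5 * sum((yk - ok) ** 2 for yk, ok in zip(y, out))


def backprop(x, y, Ws, bs):
    acts = forward(x, Ws, bs)
    out = acts[-1]
    # Output layer: delta^(L) = -(y - a^(L)) ⊙ f'(z^(L)), with f'(z) = a(1 - a) for sigmoid
    delta = [-(yk - ok) * ok * (1.0 - ok) for yk, ok in zip(y, out)]
    grads_W, grads_b = [None] * len(Ws), [None] * len(bs)
    for k in range(len(Ws) - 1, -1, -1):
        a_prev = acts[k]
        grads_W[k] = [[d * ai for ai in a_prev] for d in delta]  # delta^(l) (a^(l-1))^T
        grads_b[k] = list(delta)                                  # dJ/db^(l) = delta^(l)
        if k > 0:
            # Hidden layer: delta^(l-1) = f'(z^(l-1)) ⊙ (W^(l))^T delta^(l)
            W = Ws[k]
            delta = [ai * (1.0 - ai) * sum(W[j][i] * delta[j] for j in range(len(delta)))
                     for i, ai in enumerate(a_prev)]
    return grads_W, grads_b


rng = random.Random(1)
Ws = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)],  # W^(2): 3x3
      [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]]  # W^(3): 2x3
bs = [[rng.uniform(-1, 1) for _ in range(3)], [rng.uniform(-1, 1) for _ in range(2)]]
x, y = [0.5, -1.0, 2.0], [1.0, 0.0]
gW, gb = backprop(x, y, Ws, bs)

# Finite-difference check on the hidden-layer weights W^(2)
eps = 1e-6
max_err = 0.0
for j in range(3):
    for i in range(3):
        old = Ws[0][j][i]
        Ws[0][j][i] = old + eps
        up = loss(x, y, Ws, bs)
        Ws[0][j][i] = old - eps
        down = loss(x, y, Ws, bs)
        Ws[0][j][i] = old
        max_err = max(max_err, abs((up - down) / (2 * eps) - gW[0][j][i]))
print(max_err)  # only finite-difference noise remains
```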

Output-layer bias update

Biases are updated in exactly the same way as weights.

The output-layer biases are the easiest to compute:

$$ \frac{\partial J}{\partial b^{(3)}_1} = \frac{\partial J}{\partial a^{(3)}_1}\frac{\partial a^{(3)}_1}{\partial z^{(3)}_1}\frac{\partial z^{(3)}_1}{\partial b^{(3)}_1} $$
$$ \frac{\partial J}{\partial b^{(3)}_2} = \frac{\partial J}{\partial a^{(3)}_2}\frac{\partial a^{(3)}_2}{\partial z^{(3)}_2}\frac{\partial z^{(3)}_2}{\partial b^{(3)}_2} $$

Then combine with:

$$\ b_{new} = b_{old} - \mu \frac{\partial J_{total}}{\partial b_{old}} $$

Hidden-layer bias update

The hidden-layer bias update follows the same reasoning as the weight update:

$$ \frac{\partial J}{\partial b^{(2)}_1} = \frac{\partial J}{\partial a^{(3)}_1}\frac{\partial a^{(3)}_1}{\partial z^{(3)}_1}\frac{\partial z^{(3)}_1}{\partial a^{(2)}_1}\frac{\partial a^{(2)}_1}{\partial z^{(2)}_1}\frac{\partial z^{(2)}_1}{\partial b^{(2)}_1}+\frac{\partial J}{\partial a^{(3)}_2}\frac{\partial a^{(3)}_2}{\partial z^{(3)}_2}\frac{\partial z^{(3)}_2}{\partial a^{(2)}_1}\frac{\partial a^{(2)}_1}{\partial z^{(2)}_1}\frac{\partial z^{(2)}_1}{\partial b^{(2)}_1} $$
$$ \frac{\partial J}{\partial b^{(2)}_2} = \frac{\partial J}{\partial a^{(3)}_1}\frac{\partial a^{(3)}_1}{\partial z^{(3)}_1}\frac{\partial z^{(3)}_1}{\partial a^{(2)}_2}\frac{\partial a^{(2)}_2}{\partial z^{(2)}_2}\frac{\partial z^{(2)}_2}{\partial b^{(2)}_2}+\frac{\partial J}{\partial a^{(3)}_2}\frac{\partial a^{(3)}_2}{\partial z^{(3)}_2}\frac{\partial z^{(3)}_2}{\partial a^{(2)}_2}\frac{\partial a^{(2)}_2}{\partial z^{(2)}_2}\frac{\partial z^{(2)}_2}{\partial b^{(2)}_2} $$
$$ \frac{\partial J}{\partial b^{(2)}_3} = \frac{\partial J}{\partial a^{(3)}_1}\frac{\partial a^{(3)}_1}{\partial z^{(3)}_1}\frac{\partial z^{(3)}_1}{\partial a^{(2)}_3}\frac{\partial a^{(2)}_3}{\partial z^{(2)}_3}\frac{\partial z^{(2)}_3}{\partial b^{(2)}_3}+\frac{\partial J}{\partial a^{(3)}_2}\frac{\partial a^{(3)}_2}{\partial z^{(3)}_2}\frac{\partial z^{(3)}_2}{\partial a^{(2)}_3}\frac{\partial a^{(2)}_3}{\partial z^{(2)}_3}\frac{\partial z^{(2)}_3}{\partial b^{(2)}_3} $$
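Each of these sums equals $\delta^{(2)}_i$, because the last factor $\partial z^{(2)}_i/\partial b^{(2)}_i$ is 1. The sketch below computes the bias gradients of the 3-3-2 network through the $\delta$ recursion and compares the hidden-layer ones with finite differences. The sigmoid activation, the weights and the inputs are arbitrary assumptions, and `bias_grads` is a hypothetical name.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(W, b, a):
    # a^(l) = f(W^(l) a^(l-1) + b^(l)), with W[j][i] holding w_ij^(l)
    return [sigmoid(sum(w * ai for w, ai in zip(row, a)) + bj) for row, bj in zip(W, b)]


def loss(x, y, W2, b2, W3, b3):
    a3 = layer(W3, b3, layer(W2, b2, x))
    return 0.5 * sum((yk - ak) ** 2 for yk, ak in zip(y, a3))


def bias_grads(x, y, W2, b2, W3, b3):
    a2 = layer(W2, b2, x)
    a3 = layer(W3, b3, a2)
    # Output layer: dJ/db^(3) = delta^(3), since dz_j^(3)/db_j^(3) = 1
    delta3 = [-(yk - ak) * ak * (1.0 - ak) for yk, ak in zip(y, a3)]
    # Hidden layer: dJ/db^(2) = delta^(2) = f'(z^(2)) ⊙ (W^(3))^T delta^(3)
    delta2 = [a2[i] * (1.0 - a2[i]) * sum(W3[j][i] * delta3[j] for j in range(len(delta3)))
              for i in range(len(a2))]
    return delta2, delta3


x, y = [0.5, -1.0, 2.0], [1.0, 0.0]
W2 = [[0.2, -0.1, 0.4], [-0.3, 0.6, 0.1], [0.5, 0.2, -0.4]]
b2 = [0.1, -0.2, 0.05]
W3 = [[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]]
b3 = [0.05, -0.05]
db2, db3 = bias_grads(x, y, W2, b2, W3, b3)

# Central finite differences on the hidden-layer biases
eps = 1e-6
num_db2 = []
for i in range(3):
    b_plus, b_minus = b2[:], b2[:]
    b_plus[i] += eps
    b_minus[i] -= eps
    num_db2.append((loss(x, y, W2, b_plus, W3, b3) - loss(x, y, W2, b_minus, W3, b3)) / (2 * eps))
print(db2, num_db2)  # the two lists agree closely
```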

By the same reasoning as for the weights, it follows that:

$$ \frac{\partial J}{\partial b^{(l)}_i}=\frac{\partial J}{\partial z^{(l)}_i}\frac{\partial z^{(l)}_i}{\partial b^{(l)}_i}=\delta^{(l)}_i \quad \left(\text{since } \frac{\partial z^{(l)}_i}{\partial b^{(l)}_i}=1\right) $$

In vector/matrix form:

$$ \nabla_{b^{(l)}}J = \delta^{(l)} $$

Then combine with:

$$\ b_{new} = b_{old} - \mu \frac{\partial J_{total}}{\partial b_{old}} $$
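Finally, the pieces can be combined into a batch-gradient-descent training loop: in each epoch, back-propagate every sample, average the gradients (since $J_{total}$ is a mean) and apply both update rules. This is a minimal sketch, not the author's code; the sigmoid activations, the learning rate `mu=1.0`, the epoch count, the initialization and the toy task (output $[x_1, 1-x_1]$) are all assumptions, and the function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, Ws, bs):
    acts = [list(x)]
    for W, b in zip(Ws, bs):
        acts.append([sigmoid(sum(w * ai for w, ai in zip(row, acts[-1])) + bj)
                     for row, bj in zip(W, b)])
    return acts


def backprop(x, y, Ws, bs):
    acts = forward(x, Ws, bs)
    delta = [-(yk - ok) * ok * (1.0 - ok) for yk, ok in zip(y, acts[-1])]  # delta^(L)
    gW, gb = [None] * len(Ws), [None] * len(bs)
    for k in range(len(Ws) - 1, -1, -1):
        gW[k] = [[d * ai for ai in acts[k]] for d in delta]  # delta (a^(l-1))^T
        gb[k] = list(delta)                                  # grad_b = delta
        if k > 0:  # propagate delta to the previous (hidden) layer
            delta = [ai * (1.0 - ai) * sum(Ws[k][j][i] * delta[j] for j in range(len(delta)))
                     for i, ai in enumerate(acts[k])]
    return gW, gb


def total_loss(data, Ws, bs):
    # J_total: average of 1/2 ||y - a^(L)||^2 over the training samples
    return sum(0.5 * sum((yk - ok) ** 2 for yk, ok in zip(y, forward(x, Ws, bs)[-1]))
               for x, y in data) / len(data)


def train_batch_gd(data, Ws, bs, mu=1.0, epochs=3000):
    n = len(data)
    for _ in range(epochs):
        # Accumulate the gradients of all N samples
        sum_W = [[[0.0] * len(row) for row in W] for W in Ws]
        sum_b = [[0.0] * len(b) for b in bs]
        for x, y in data:
            gW, gb = backprop(x, y, Ws, bs)
            for k in range(len(Ws)):
                for j in range(len(Ws[k])):
                    sum_b[k][j] += gb[k][j]
                    for i in range(len(Ws[k][j])):
                        sum_W[k][j][i] += gW[k][j][i]
        # w_new = w_old - mu * dJ_total/dw_old, and likewise for b
        for k in range(len(Ws)):
            for j in range(len(Ws[k])):
                bs[k][j] -= mu * sum_b[k][j] / n
                for i in range(len(Ws[k][j])):
                    Ws[k][j][i] -= mu * sum_W[k][j][i] / n
    return Ws, bs


# Hypothetical toy task for the 3-3-2 network: output [x_1, 1 - x_1]
data = [([0, 0, 1], [0, 1]), ([0, 1, 1], [0, 1]), ([1, 0, 1], [1, 0]), ([1, 1, 1], [1, 0])]
rng = random.Random(0)
Ws = [[[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(3)],
      [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]]
bs = [[0.0] * 3, [0.0] * 2]
before = total_loss(data, Ws, bs)
train_batch_gd(data, Ws, bs)
after = total_loss(data, Ws, bs)
print(before, after)  # the average loss should drop substantially
```

Applying the update immediately after each sample, instead of accumulating first, would turn this into the stochastic variant mentioned earlier.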


Reposted from blog.csdn.net/Fosu_Chenai/article/details/111193147