Deriving the backpropagation gradient of cross-entropy (with a softmax activation)

Copyright notice: This is an original article by the author and may not be reproduced without the author's permission. https://blog.csdn.net/diyoosjtu/article/details/89426465

1. Cross-entropy for multi-class classification

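Let the label satisfy $y_k=1$, that is, the sample $x_k$ belongs to class $k$ and the label entry for class $k$ is 1. The cross-entropy loss is then:
$$J = -\sum_{j=1}^{N} y_j\log a_j^L = -\log a_k^L \tag{1}$$
where $N$ is the number of classes and the superscript $L$ denotes the output (last) layer.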

The softmax activation function is:
$$a_k^L = \frac{e^{z_k^L}}{\sum\limits_{i=1}^{N}e^{z_i^L}} \tag{2}$$
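
As a minimal sketch of equations (1) and (2), the snippet below computes the softmax activations and the cross-entropy loss for a single sample using only the Python standard library. The function names `softmax` and `cross_entropy`, the example logits, and the max-subtraction trick for numerical stability are illustrative assumptions and do not come from the original post.

```python
import math

def softmax(z):
    """Softmax of a list of logits z, as in equation (2)."""
    m = max(z)  # assumption: shift by the max; the result is unchanged but exp() cannot overflow
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(a, y):
    """Cross-entropy of activations a against a one-hot label y, as in equation (1)."""
    # Terms with y_j = 0 contribute nothing, so they are skipped.
    return -sum(yj * math.log(aj) for aj, yj in zip(a, y) if yj != 0)

z = [2.0, 1.0, 0.1]  # arbitrary example logits
y = [1, 0, 0]        # one-hot label, true class at index 0 (zero-based)
a = softmax(z)
loss = cross_entropy(a, y)
print(a, loss)
```

Because the label is one-hot, `loss` reduces to `-math.log(a[0])`, which matches the right-hand side of equation (1).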

Backpropagation requires the derivative of the loss with respect to every $z_j^L$, $j=1, 2, \cdots, N$.

(1) When $j=k$:
$$
\begin{aligned}
\frac{\partial J}{\partial z_j^L} = \frac{\partial J}{\partial z_k^L} &= \frac{\partial J}{\partial a_k^L}\frac{\partial a_k^L}{\partial z_k^L} \\
&= -\frac{1}{a_k^L}\cdot\frac{e^{z_k^L}\sum\limits_{i=1}^{N}e^{z_i^L} - e^{z_k^L}e^{z_k^L}}{\left(\sum\limits_{i=1}^{N}e^{z_i^L}\right)^2} \\
&= -\frac{1}{a_k^L}\, a_k^L(1-a_k^L) \\
&= a_k^L - 1
\end{aligned} \tag{3}
$$

(2) When $j \neq k$:
$$
\begin{aligned}
\frac{\partial J}{\partial z_j^L} &= \frac{\partial J}{\partial a_k^L}\frac{\partial a_k^L}{\partial z_j^L} \\
&= -\frac{1}{a_k^L}\cdot\frac{0\cdot\sum\limits_{i=1}^{N}e^{z_i^L} - e^{z_k^L}e^{z_j^L}}{\left(\sum\limits_{i=1}^{N}e^{z_i^L}\right)^2} \\
&= -\frac{1}{a_k^L}\left(-a_k^L a_j^L\right) \\
&= a_j^L
\end{aligned} \tag{4}
$$

Equations (3) and (4) can be combined into:
$$\frac{\partial J}{\partial z_j^L} = a_j^L - y_j \tag{5}$$
where $y_j=1$ only when $j=k$, and every other $y_j$ is 0.
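
One way to gain confidence in equation (5) is to compare it against a finite-difference estimate of the gradient. The sketch below does this for one arbitrary logit vector; the helper names (`loss`, `analytic_grad`, `numeric_grad`), the test values, and the step size `eps` are assumptions made for illustration, not part of the original post.

```python
import math

def softmax(z):
    """Softmax of a list of logits, shifted by the max for numerical stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def loss(z, k):
    """J = -log a_k for a sample whose true class is k (equation (1))."""
    return -math.log(softmax(z)[k])

def analytic_grad(z, k):
    """Equation (5): dJ/dz_j = a_j - y_j, with y one-hot at index k."""
    a = softmax(z)
    return [aj - (1.0 if j == k else 0.0) for j, aj in enumerate(a)]

def numeric_grad(z, k, eps=1e-6):
    """Central finite differences of the loss with respect to each z_j."""
    grad = []
    for j in range(len(z)):
        z_plus = list(z)
        z_minus = list(z)
        z_plus[j] += eps
        z_minus[j] -= eps
        grad.append((loss(z_plus, k) - loss(z_minus, k)) / (2 * eps))
    return grad

z = [0.5, -1.2, 2.0, 0.3]  # arbitrary example logits
k = 2                      # true class index (zero-based)
ga = analytic_grad(z, k)
gn = numeric_grad(z, k)
max_diff = max(abs(p - q) for p, q in zip(ga, gn))
print(max_diff)  # should be tiny if equation (5) is correct
```

The entries of `ga` also sum to approximately zero, since both the activations and the one-hot label sum to 1.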

2. Cross-entropy for binary classification

The cross-entropy loss for binary classification is:
$$J = -\left(y\log a^L + (1-y)\log(1-a^L)\right) \tag{6}$$

With a sigmoid output $a^L = \sigma(z^L)$, whose derivative is $\partial a^L/\partial z^L = a^L(1-a^L)$, the chain rule gives:
$$
\begin{aligned}
\frac{\partial J}{\partial z^L} &= \frac{\partial J}{\partial a^L}\frac{\partial a^L}{\partial z^L} \\
&= \left(-\frac{y}{a^L} + \frac{1-y}{1-a^L}\right)a^L(1-a^L) \\
&= -y(1-a^L) + (1-y)a^L \\
&= a^L - y
\end{aligned} \tag{7}
$$
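
The same finite-difference check can be sketched for the binary case. The snippet assumes a sigmoid output unit, which is what the factor $a^L(1-a^L)$ in the derivation corresponds to; the names `sigmoid`, `bce`, and `numeric_grad`, as well as the test points, are illustrative assumptions.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, assumed here as the output activation."""
    return 1.0 / (1.0 + math.exp(-z))

def bce(z, y):
    """Binary cross-entropy of equation (6), written as a function of the logit z."""
    a = sigmoid(z)
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

def numeric_grad(z, y, eps=1e-6):
    """Central finite difference of the loss with respect to z."""
    return (bce(z + eps, y) - bce(z - eps, y)) / (2 * eps)

diffs = []
for z in (-2.0, 0.0, 1.5):      # arbitrary example logits
    for y in (0, 1):            # both possible labels
        analytic = sigmoid(z) - y  # equation (7): a - y
        diffs.append(abs(analytic - numeric_grad(z, y)))

max_diff = max(diffs)
print(max_diff)  # should be tiny if equation (7) is correct
```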

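Comparing equations (7) and (5) shows that the two gradients have the same form: in both cases the derivative with respect to the pre-activation is the activation minus the label.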


Reposted from blog.csdn.net/diyoosjtu/article/details/89426465