Differentiating Softmax and Cross-Entropy

Introduction

In multi-class classification problems, the model's outputs are usually passed through a softmax function to obtain the final result, and cross-entropy is used as the loss function. This article analyzes how to differentiate softmax when cross-entropy is the loss.

Differentiating softmax

The softmax function is:

$$
y_i = \frac{e^{z_i}}{\sum_{k=1}^K e^{z_k}}
$$

Here $K$ is the total number of classes. Next, we compute the derivative of $y_i$ with respect to some input $z_j$:

$$
\frac{\partial y_i}{\partial z_j} = \frac{\partial}{\partial z_j}\left(\frac{e^{z_i}}{\sum_{k=1}^K e^{z_k}}\right)
$$

There are two cases to consider: $i = j$ and $i \neq j$. When $i = j$, the derivative of the numerator $e^{z_i}$ with respect to $z_j$ is $e^{z_i}$; when $i \neq j$, it is $0$.

When $i = j$:

$$
\begin{aligned}
\frac{\partial y_i}{\partial z_j}
&= \frac{e^{z_i} \cdot \sum_{k=1}^K e^{z_k} - e^{z_i} \cdot e^{z_j}}{\left(\sum_{k=1}^K e^{z_k}\right)^2} \\
&= \frac{e^{z_i}}{\sum_{k=1}^K e^{z_k}} - \frac{e^{z_i}}{\sum_{k=1}^K e^{z_k}} \cdot \frac{e^{z_j}}{\sum_{k=1}^K e^{z_k}} \\
&= y_i - y_i^2 = y_i(1 - y_i)
\end{aligned}
$$

When $i \neq j$:

$$
\begin{aligned}
\frac{\partial y_i}{\partial z_j}
&= \frac{0 \cdot \sum_{k=1}^K e^{z_k} - e^{z_i} \cdot e^{z_j}}{\left(\sum_{k=1}^K e^{z_k}\right)^2} \\
&= -\frac{e^{z_i}}{\sum_{k=1}^K e^{z_k}} \cdot \frac{e^{z_j}}{\sum_{k=1}^K e^{z_k}} \\
&= -y_i y_j
\end{aligned}
$$
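
The two cases together give the Jacobian $\operatorname{diag}(y) - y y^{\top}$. As a quick sanity check, here is a minimal NumPy sketch (the function name `softmax` and the test vector `z` are illustrative, not from the original post) that compares this analytic Jacobian against central finite differences:

```python
import numpy as np

def softmax(z):
    # Subtracting max(z) for numerical stability; it cancels in the ratio.
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([1.0, 2.0, 0.5])   # arbitrary test logits
y = softmax(z)

# Analytic Jacobian from the two cases above:
# dy_i/dz_j = y_i(1 - y_i) when i == j, and -y_i * y_j when i != j.
jacobian = np.diag(y) - np.outer(y, y)

# Numerical Jacobian via central finite differences.
eps = 1e-6
numeric = np.zeros((len(z), len(z)))
for j in range(len(z)):
    dz = np.zeros(len(z))
    dz[j] = eps
    numeric[:, j] = (softmax(z + dz) - softmax(z - dz)) / (2 * eps)

print(np.allclose(jacobian, numeric, atol=1e-8))  # True
```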

Differentiating cross-entropy

The loss function $L$ is:

$$
L = -\sum_k \hat y_k \log y_k
$$

Here $\hat y_k$ is the ground-truth label, which acts as a constant: for a one-hot target, exactly one entry is $1$ and the rest are $0$.
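
As a concrete illustration (the numbers here are made up for the example), with a one-hot $\hat y$ the sum collapses to a single term, so $L$ is just the negative log-probability assigned to the true class:

```python
import numpy as np

# Made-up 3-class example: one-hot ground truth and a softmax output.
y_hat = np.array([0.0, 1.0, 0.0])   # true class is index 1
y = np.array([0.2, 0.7, 0.1])       # predicted probabilities (sum to 1)

# L = -sum_k y_hat_k * log(y_k); with a one-hot label only the true
# class survives, so L reduces to -log(y[true_class]).
L = -np.sum(y_hat * np.log(y))
print(L, -np.log(0.7))  # both ~0.3567
```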

Now we compute the derivative of $L$ with respect to $z_j$ via the chain rule, splitting the sum over $k$ into the $k = j$ term and the $k \neq j$ terms:

$$
\begin{aligned}
\frac{\partial L}{\partial z_j}
&= -\frac{\partial}{\partial z_j}\sum_k \hat y_k \log y_k
= -\sum_k \hat y_k \frac{1}{y_k} \frac{\partial y_k}{\partial z_j} \\
&= -\hat y_j \frac{1}{y_j} \cdot y_j(1 - y_j) - \sum_{k \neq j} \hat y_k \frac{1}{y_k} \cdot (-y_k y_j) \\
&= -\hat y_j (1 - y_j) + \sum_{k \neq j} \hat y_k y_j \\
&= -\hat y_j + \hat y_j y_j + \sum_{k \neq j} \hat y_k y_j \\
&= -\hat y_j + \sum_{k} \hat y_k y_j \\
&= -\hat y_j + y_j = y_j - \hat y_j
\end{aligned}
$$

The second-to-last step uses $\sum_{k} \hat y_k = 1$, which holds because $\hat y$ is a one-hot (or any probability) vector, so $\sum_k \hat y_k y_j = y_j$.
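
Since the whole point is that $\partial L / \partial z_j = y_j - \hat y_j$, a numerical check is easy to write. The following sketch (again with illustrative values of my own choosing) compares the analytic gradient $y - \hat y$ against finite differences of the composed softmax-plus-cross-entropy loss:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def loss(z, y_hat):
    # Cross-entropy of the softmax output against a one-hot label.
    return -np.sum(y_hat * np.log(softmax(z)))

z = np.array([1.0, 2.0, 0.5])       # arbitrary test logits
y_hat = np.array([0.0, 1.0, 0.0])   # one-hot ground truth

analytic = softmax(z) - y_hat        # the y_j - y_hat_j result derived above

# Central finite differences on the scalar loss, one component at a time.
eps = 1e-6
numeric = np.array([
    (loss(z + eps * e_j, y_hat) - loss(z - eps * e_j, y_hat)) / (2 * eps)
    for e_j in np.eye(len(z))
])

print(np.allclose(analytic, numeric, atol=1e-8))  # True
```

This simple combined gradient is one reason deep-learning frameworks typically fuse softmax and cross-entropy into a single operation rather than differentiating the two stages separately.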

As you can see, the result is remarkably simple; so simple that it is hard to believe until you work through the derivation yourself.

Reposted from blog.csdn.net/yjw123456/article/details/106767782