Backpropagation Derivation for the Sigmoid Function Combined with Cross-Entropy

The sigmoid(x) function is defined as:

\[
\begin{align*}
\sigma(x) &= \frac{1}{1+e^{-x}} \\
\frac{d(\sigma(x))}{dx} &= \sigma(x)(1-\sigma(x))
\end{align*}
\]
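
For readers who want the intermediate step, this derivative identity follows directly by differentiating the definition (a short worked derivation added here for completeness; it is not in the original post):

\[
\begin{align*}
\frac{d(\sigma(x))}{dx} &= \frac{e^{-x}}{(1+e^{-x})^2} \\
&= \frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}} \\
&= \sigma(x)\left(1-\sigma(x)\right)
\end{align*}
\]

where the last step uses \(\frac{e^{-x}}{1+e^{-x}} = 1 - \frac{1}{1+e^{-x}} = 1-\sigma(x)\).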

The logistic regression classification model is defined as follows:

\[
\begin{align*}
P(Y=1|x) &= \frac{e^{w \cdot x}}{1+e^{w \cdot x}}\\
&= \frac{1}{1+e^{-w \cdot x}}\\
&=\sigma (w \cdot x)\\
P(Y=0|x) &= \frac{1}{1+e^{w \cdot x}} \\
&=1 - \sigma (w \cdot x)\\
\end{align*}
\]

From the above, we see that in binary classification the sigmoid output is exactly the predicted probability that the sample's label is 1.
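
As a concrete illustration, here is a minimal NumPy sketch of these two class probabilities; the particular values of `w` and `x` are made up for the example:

```python
import numpy as np

def sigmoid(x):
    """sigma(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative weights and input (hypothetical values).
w = np.array([0.5, -1.2, 0.3])
x = np.array([1.0, 2.0, -0.5])

p1 = sigmoid(np.dot(w, x))  # P(Y=1|x) = sigma(w . x)
p0 = 1.0 - p1               # P(Y=0|x) = 1 - sigma(w . x)
print(p1, p0)               # the two probabilities sum to 1
```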

The cross-entropy function is defined as:

\[
crossEntropy = -\sum_{i=0}^{classNum-1}{y^{label}_i\log {y^{pred}_i}}
\]

Both \(y^{label}\) and \(y^{pred}\) are probability distributions; the label \(y^{label}\) is one-hot encoded to represent its distribution.
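
A minimal sketch of this definition, assuming a one-hot label vector and a predicted distribution stored as NumPy arrays (the `eps` guard against \(\log 0\) is my addition, not part of the formula):

```python
import numpy as np

def cross_entropy(y_label, y_pred, eps=1e-12):
    """-sum_i y_label[i] * log(y_pred[i]), with eps guarding log(0)."""
    return -np.sum(y_label * np.log(y_pred + eps))

# Example: 3 classes, true class is index 1 (one-hot encoded).
y_label = np.array([0.0, 1.0, 0.0])
y_pred  = np.array([0.2, 0.7, 0.1])
print(cross_entropy(y_label, y_pred))  # -log(0.7), roughly 0.357
```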

For binary classification, the cross-entropy loss is defined as:

\[
\begin{align*}
Loss_{crossEntropy}& = - \left[ y^{label}_0 \log y_0^{pred} + y^{label}_1 \log y_1^{pred}\right] \\
&= - \left[ y^{label}_1 \log y_1^{pred} + (1- y^{label}_1) \log(1- y_1^{pred})\right]
\end{align*}
\]

The second equality uses the one-hot identities \(y^{label}_0 = 1 - y^{label}_1\) and \(y^{pred}_0 = 1 - y^{pred}_1\). Since the sigmoid output is exactly the predicted probability for label 1, i.e. \(y^{pred}_1 = \sigma(x)\), we have:

\[
\begin{align*}
Loss &= - \left[ y^{label}_1 \log \sigma(x) + (1- y^{label}_1) \log (1- \sigma(x)) \right] \\
\frac{\partial{Loss}}{{\partial{\sigma(x)}}} &= - \left[ \frac {y^{label}_1}{\sigma(x)} - \frac{(1- y^{label}_1)}{(1- \sigma(x))} \right] \\
&= \frac {\sigma(x) -y^{label}_1}{\sigma(x){(1- \sigma(x))}} \\
\end{align*}
\]
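
This intermediate derivative is easy to sanity-check with a central finite difference (a sketch with made-up values; `s` stands for \(\sigma(x)\) and `y` for \(y^{label}_1\)):

```python
import numpy as np

def loss(s, y):
    """Binary cross-entropy as a function of the sigmoid output s."""
    return -(y * np.log(s) + (1.0 - y) * np.log(1.0 - s))

s, y, h = 0.3, 1.0, 1e-6
numeric  = (loss(s + h, y) - loss(s - h, y)) / (2.0 * h)
analytic = (s - y) / (s * (1.0 - s))
print(numeric, analytic)  # both approximately -3.3333
```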

Writing the logit as \(x = w \cdot z\), where \(z\) denotes the input feature vector (the model definition above wrote this same logit as \(w \cdot x\)), the derivative of the Loss with respect to \(w\) is:

\[
\begin{align*}
\frac{\partial Loss}{\partial w} &= \frac{\partial{Loss}}{{\partial{\sigma(x)}}}\frac{\partial{\sigma(x)}}{{\partial{x}}}\frac{\partial{x}}{{\partial{w}}} \\
&= \frac {\sigma(x) -y^{label}_1}{\sigma(x){(1- \sigma(x))}} \cdot {\sigma(x)(1-\sigma(x))} \cdot z \\
&= z \cdot \left ({\sigma(x) -y^{label}_1} \right)
\end{align*}
\]
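
To close the loop, the analytic gradient \(z\,(\sigma(x) - y^{label}_1)\) can be checked against a numerical gradient of the full forward pass (again a sketch; the values of `w`, `z`, and `y` are arbitrary):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss(w, z, y):
    """Forward pass: logit x = w . z, then binary cross-entropy."""
    s = sigmoid(np.dot(w, z))
    return -(y * np.log(s) + (1.0 - y) * np.log(1.0 - s))

w = np.array([0.5, -1.2, 0.3])  # illustrative weights
z = np.array([1.0, 2.0, -0.5])  # illustrative input features
y = 1.0                         # label of the positive class

# Analytic gradient from the derivation above: z * (sigma(x) - y).
analytic = z * (sigmoid(np.dot(w, z)) - y)

# Numerical gradient by central differences, one coordinate at a time.
h, numeric = 1e-6, np.zeros_like(w)
for i in range(len(w)):
    dw = np.zeros_like(w)
    dw[i] = h
    numeric[i] = (loss(w + dw, z, y) - loss(w - dw, z, y)) / (2.0 * h)

print(analytic)
print(numeric)  # should agree to about 6 decimal places
```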


Reposted from www.cnblogs.com/nowgood/p/sigmoidcrossentropy.html