[Deep Learning] Sigmoid and Softmax

sigmoid and softmax

The output unit of a neural network is usually a sigmoid or a softmax unit.

principle

sigmoid

The sigmoid unit predicts the value of a binary variable $y$ and is defined as follows:

$$\hat{y}=\sigma(\omega^{T}h+b)=\frac{1}{1+\exp\left(-(\omega^{T}h+b)\right)}$$

Maximum likelihood is typically used for learning, because the maximum-likelihood cost function is $-\log P(y \mid x)$: the $\log$ in the cost cancels the $\exp$ in the sigmoid, so the gradient saturates (becomes very small) only when the argument of $\sigma$ is very large in magnitude.
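As a minimal sketch of this point (the NumPy code and the helper name sigmoid_nll are my own, not from the original post): the negative log-likelihood of a sigmoid unit reduces to a softplus of the pre-activation, and its gradient is $\sigma(a)-y$, which only vanishes when the prediction is already confidently correct.

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid: sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_nll(a, y):
    """Negative log-likelihood -log P(y | x) for a sigmoid output unit.

    a = w^T h + b is the pre-activation and y in {0, 1} is the label.
    Because the log cancels the exp, the loss is softplus((1 - 2y) * a),
    computed stably here with np.logaddexp(0, .).
    """
    return np.logaddexp(0.0, (1 - 2 * y) * a)

a = np.array([-5.0, 0.0, 5.0])
y = 1
print(sigmoid_nll(a, y))   # large loss at a = -5, near zero at a = +5
print(sigmoid(a) - y)      # gradient w.r.t. a: ~ -1, -0.5, ~ 0
```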


softmax

The softmax output unit is the most common choice for multi-class classifiers.

$$z=W^{T}h+b$$

$$y_{i}=\mathrm{softmax}(z)_{i}=\frac{\exp(z_{i})}{\sum_{j}\exp(z_{j})}$$

Similarly, the $\log$ in the log-likelihood cancels the $\exp$ in the softmax, whereas other objective functions (such as squared error) lack this property and learn poorly.
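A short sketch of the softmax formula above, assuming NumPy; subtracting max(z) is an implementation detail added for numerical stability and does not change the result, since softmax is invariant to shifting $z$ by a constant.

```python
import numpy as np

def softmax(z):
    """softmax(z)_i = exp(z_i) / sum_j exp(z_j)."""
    e = np.exp(z - np.max(z))   # shift by max(z) to avoid overflow; result unchanged
    return e / np.sum(e)

z = np.array([2.0, 1.0, 0.1])
y = softmax(z)
print(y, y.sum())   # a probability vector that sums to 1
```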


Cross entropy and softmax loss

Cross-entropy measures the distance between the true sample distribution and the predicted distribution: the closer the two distributions, the smaller the cross-entropy. It is given by:

$$H(p,q)=-\sum_{i}^{n}p_{i}\log(q_{i})$$

Here $p$ is the true distribution, $q$ is the predicted distribution, and $n$ is the number of classes. In cross-entropy form, the softmax loss function can be written as follows, where $\hat{y}_{i}$ is the true label of the training data:

$$L=-\sum_{i}^{n}\hat{y}_{i}\log(y_{i})$$
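A sketch of this loss with a one-hot true label $\hat{y}$ (the variable names are mine, not from the post):

```python
import numpy as np

def cross_entropy(y_hat, y):
    """L = -sum_i y_hat_i * log(y_i); y_hat is the true one-hot label,
    y is the softmax prediction."""
    return -np.sum(y_hat * np.log(y))

y_hat = np.array([0.0, 1.0, 0.0])   # true class is index 1
y = np.array([0.2, 0.7, 0.1])       # predicted distribution
print(cross_entropy(y_hat, y))      # -log(0.7) ~= 0.357
```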


softmax backpropagation

By the chain rule, summing over every output $y_{j}$ that depends on $z_{i}$:

$$\frac{\partial L}{\partial z_{i}}=\sum_{j}\frac{\partial L}{\partial y_{j}}\frac{\partial y_{j}}{\partial z_{i}}$$

where the derivative of $L$ with respect to $y_{j}$ is:

$$\frac{\partial L}{\partial y_{j}}=\frac{\partial \left[-\sum_{k}\hat{y}_{k}\log(y_{k})\right]}{\partial y_{j}}=-\frac{\hat{y}_{j}}{y_{j}}$$

The derivative of $y_{j}$ with respect to $z_{i}$ splits into two cases:

  • When $j = i$:

$$\frac{ \partial y_{j}} {\partial z_{i}}=\frac{ \partial \left[ \frac{e^{z_{i}}}{\sum_{k}e^{z_{k}}}\right]} {\partial z_{i}}$$

$$=\frac{e^{z_{i}}\sum_{k}e^{z_{k}}-(e^{z_{i}})^{2}}{(\sum_{k}e^{z_{k}})^{2}}$$

$$=\frac{e^{z_{i}}}{\sum_{k}e^{z_{k}}}\left(1-\frac{e^{z_{i}}}{\sum_{k}e^{z_{k}}}\right)$$

$$=y_{i}(1-y_{i})$$

  • When $j \neq i$:

$$\frac{\partial y_{j}}{\partial z_{i}}=\frac{\partial \left[\frac{e^{z_{j}}}{\sum_{k}e^{z_{k}}}\right]}{\partial z_{i}}$$

$$=\frac{0-e^{z_{j}}e^{z_{i}}}{(\sum_{k}e^{z_{k}})^{2}}$$

$$=-y_{j}y_{i}$$

Therefore, substituting both cases into the chain rule and using $\sum_{j}\hat{y}_{j}=1$:

$$\frac{\partial L}{\partial z_{i}}=-\frac{\hat{y}_{i}}{y_{i}}\,y_{i}(1-y_{i})+\sum_{j\neq i}\left(-\frac{\hat{y}_{j}}{y_{j}}\right)(-y_{j}y_{i})=-\hat{y}_{i}+y_{i}\sum_{j}\hat{y}_{j}=y_{i}-\hat{y}_{i}$$
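A minimal numerical check of this result (the finite-difference code below is my own sketch, not from the post): the analytic gradient $y-\hat{y}$ should match a central-difference estimate of $\partial L/\partial z$.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def loss(z, y_hat):
    """Softmax cross-entropy: L = -sum_j y_hat_j * log(softmax(z)_j)."""
    return -np.sum(y_hat * np.log(softmax(z)))

z = np.array([0.5, -1.2, 2.0])
y_hat = np.array([0.0, 0.0, 1.0])   # one-hot true label

analytic = softmax(z) - y_hat       # dL/dz_i = y_i - y_hat_i

eps = 1e-6
numeric = np.array([
    (loss(z + eps * np.eye(3)[i], y_hat) - loss(z - eps * np.eye(3)[i], y_hat)) / (2 * eps)
    for i in range(3)
])

print(analytic)
print(numeric)   # agrees closely with the analytic gradient
```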


Source: www.cnblogs.com/4PrivetDrive/p/12168564.html