Softmax cross-entropy loss function derivative

Source: https://www.jianshu.com/p/c02a1fbffad6

A straightforward derivation of the softmax cross-entropy loss gradient

I wanted to write a guide to the softmax derivation, both to clarify my own thinking and hopefully to be useful to others.
Softmax is frequently added to the output layer of neural networks for classification tasks. Deriving its gradient is a key step in back-propagating through the network; working through this process gives a deeper understanding of back-propagation and more insight into how gradients spread through the network.

softmax function

The softmax ("soft max") function is generally used as the output layer of a neural network for classification tasks. In fact, the softmax outputs can be viewed as the probabilities of selecting each class. For example, in a task with three classes, softmax converts the three outputs into the probability of each class according to their relative magnitudes, and the probabilities sum to 1.

The softmax function has the form:

$$S_i = \frac{e^{z_i}}{\sum_k e^{z_k}}$$

Here S_i denotes the softmax value for the i-th output neuron.
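To make the formula concrete, here is a minimal sketch of the softmax function in Python with NumPy (illustrative code, not from the original article); subtracting the maximum before exponentiating is a standard numerical-stability trick that does not change the result:

```python
import numpy as np

def softmax(z):
    """S_i = exp(z_i) / sum_k exp(z_k) for a 1-D array of scores z."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - np.max(z))   # shift by max(z) for numerical stability
    return e / e.sum()

# Example: three classes, the outputs sum to 1
print(softmax([1.0, 2.0, 3.0]))   # approx. [0.090, 0.245, 0.665]
```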
In fact, this function is applied on top of a set of network outputs. Before the derivation, let us fix the notation for every symbol in the network, so that no symbol appears out of nowhere later and derails the derivation.
First, the output of a single neuron, shown below:

(figure: a single neuron with inputs x_j, weights w_{ij}, and a bias b)

The output of the neuron is:

$$z_i = \sum_j w_{ij}\,x_j + b$$

Here w_{ij} is the weight connecting the j-th input to the i-th neuron, and b is a bias term; z_i denotes the i-th output of the network.
Adding the softmax function on top of this output gives:

$$a_i = \frac{e^{z_i}}{\sum_k e^{z_k}}$$

Here a_i denotes the i-th output value after the softmax function has been applied.
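As an illustration of the notation above, a hypothetical forward pass from inputs x_j through z_i to a_i might look like the following sketch (a per-neuron bias vector b[i] is assumed here, a common choice):

```python
import numpy as np

def forward(W, b, x):
    """z_i = sum_j W[i, j] * x[j] + b[i], followed by a = softmax(z)."""
    z = W @ x + b                      # net inputs z_i
    e = np.exp(z - np.max(z))          # numerically stable softmax
    a = e / e.sum()
    return z, a

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))            # 3 output neurons, 4 inputs
b = np.zeros(3)
x = rng.normal(size=4)
z, a = forward(W, b, x)
print(a, a.sum())                      # class probabilities, summing to 1
```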

Loss function

For back-propagation in a neural network we need a loss function, which represents the error between the network's estimate and the true value; knowing this error tells us how to modify the network's weights.

The loss function can take many forms. Here we use the cross-entropy function, mainly because the result of this derivation is simple and easy to compute, and because cross-entropy avoids the slow-learning problem that affects some other loss functions. The cross-entropy function is:

$$C = -\sum_i y_i \ln a_i$$

Here y_i denotes the true classification result (the label) for class i.
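A small illustrative sketch (not the article's code) of computing C = -Σ_i y_i ln a_i for a one-hot label:

```python
import numpy as np

def cross_entropy(y, a, eps=1e-12):
    """C = -sum_i y_i * ln(a_i); eps guards against log(0)."""
    return -np.sum(y * np.log(a + eps))

a = np.array([0.1, 0.7, 0.2])   # softmax outputs
y = np.array([0.0, 1.0, 0.0])   # one-hot true label
print(cross_entropy(y, a))      # = -ln(0.7), about 0.357
```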
The formula nests several functions (C depends on a_i, which depends on z_i), but do not worry: the derivation below proceeds step by step. I strongly recommend writing it out on paper; just reading it can leave you confused, while deriving it yourself is far more conducive to understanding.

Final Preparations

When I first worked through the softmax derivation, I sometimes got stuck halfway, mainly because I had forgotten some basic differentiation rules.
So here are the differentiation rules and formulas used in the derivation, for anyone who needs a reminder:

$$\left(\frac{u}{v}\right)' = \frac{u'v - uv'}{v^2}, \qquad \left(e^x\right)' = e^x, \qquad \left(\ln x\right)' = \frac{1}{x}$$

The derivation process

Now let's get started.
First, we must be clear about what we are computing: the gradient of the loss with respect to the neuron output z_i, namely:

$$\frac{\partial C}{\partial z_i}$$

Since C depends on z_i through the softmax outputs, we apply the chain rule for composite functions:

$$\frac{\partial C}{\partial z_i} = \sum_j \frac{\partial C}{\partial a_j}\,\frac{\partial a_j}{\partial z_i}$$

One might ask why a_j appears here instead of just a_i. Look again at the softmax formula: its denominator contains every output, so every a_j (not only a_i) depends on z_i, and all of them must be included in the calculation. For the second factor we will need to treat the cases i = j and i ≠ j separately.
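This coupling is easy to see numerically. In the following sketch (illustrative only), perturbing a single z changes all of the softmax outputs, because they share the same denominator:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([1.0, 2.0, 3.0])
z_perturbed = z.copy()
z_perturbed[0] += 0.1            # change only the first z_i ...
print(softmax(z))
print(softmax(z_perturbed))      # ... yet every output a_j moves
```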
Now we work through the two factors one by one. The first factor is:

$$\frac{\partial C}{\partial a_j} = \frac{\partial}{\partial a_j}\left(-\sum_k y_k \ln a_k\right) = -\frac{y_j}{a_j}$$

The second factor is a little more complicated; we split it into two cases:

If i = j:

$$\frac{\partial a_i}{\partial z_i} = \frac{\partial}{\partial z_i}\left(\frac{e^{z_i}}{\sum_k e^{z_k}}\right) = \frac{e^{z_i}\sum_k e^{z_k} - e^{z_i}\,e^{z_i}}{\left(\sum_k e^{z_k}\right)^2} = a_i\left(1 - a_i\right)$$

If i ≠ j:

$$\frac{\partial a_j}{\partial z_i} = \frac{\partial}{\partial z_i}\left(\frac{e^{z_j}}{\sum_k e^{z_k}}\right) = \frac{0\cdot\sum_k e^{z_k} - e^{z_j}\,e^{z_i}}{\left(\sum_k e^{z_k}\right)^2} = -a_j\,a_i$$
Now substitute both cases back into the chain-rule sum:

$$\frac{\partial C}{\partial z_i} = -\frac{y_i}{a_i}\,a_i\left(1 - a_i\right) + \sum_{j \neq i}\left(-\frac{y_j}{a_j}\right)\left(-a_j\,a_i\right) = -y_i + y_i a_i + \sum_{j \neq i} y_j a_i = a_i \sum_j y_j - y_i$$
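As a sanity check (a sketch under the same notation, not part of the original article), the two-case formula for ∂a_j/∂z_i can be packed into the Jacobian matrix diag(a) - a a^T and compared against a finite-difference approximation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([0.5, -1.0, 2.0])
a = softmax(z)

# Closed-form Jacobian: J[j, i] = da_j/dz_i = a_j * (delta_ij - a_i)
J_formula = np.diag(a) - np.outer(a, a)

# Finite-difference approximation of the same Jacobian
eps = 1e-6
J_numeric = np.zeros((3, 3))
for i in range(3):
    dz = np.zeros(3)
    dz[i] = eps
    J_numeric[:, i] = (softmax(z + dz) - softmax(z - dz)) / (2 * eps)

print(np.max(np.abs(J_formula - J_numeric)))   # tiny: the two cases check out
```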

The combined result is already much simpler. Finally, for a classification problem the label vector y is one-hot: y_i is 1 for the true class and 0 for every other class, so the sum of the y_j equals 1 and the gradient reduces to:

$$\frac{\partial C}{\partial z_i} = a_i - y_i$$
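Finally, a short sketch (illustrative code, assuming NumPy) verifying the end result ∂C/∂z_i = a_i - y_i against a numerical gradient of the cross-entropy:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def loss(z, y):
    return -np.sum(y * np.log(softmax(z)))

z = np.array([0.3, -0.8, 1.5])
y = np.array([0.0, 0.0, 1.0])            # one-hot label

grad_formula = softmax(z) - y            # the derived gradient a_i - y_i

eps = 1e-6
grad_numeric = np.array([
    (loss(z + eps * np.eye(3)[i], y) - loss(z - eps * np.eye(3)[i], y)) / (2 * eps)
    for i in range(3)
])

print(grad_formula)
print(grad_numeric)                      # agrees with the formula
```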
