Cross entropy loss function (Cross Entropy)

  We already touched on the cross-entropy function when introducing the logistic regression algorithm, but at that time I did not explain the cross-entropy loss function in detail, so let us go through it here. The loss function, also called the error function, measures how well an algorithm is doing. In classification it is positively correlated with the number of misclassifications: the more samples are misclassified, the larger the loss.
  When we covered logistic regression we wrote down the cross entropy and said that every misclassification produces a loss. For a single sample the loss is $J(\theta) = -\big[\,y\ln\hat y + (1-y)\ln(1-\hat y)\,\big]$, where $y$ is the true label and $\hat y$ is the predicted probability of class 1. Looking at this formula: if a sample whose true category is 1 is predicted as class 1 with probability 1, no loss is produced; likewise, predicting 0 for a sample of category 0 produces no loss. A misclassification, on the other hand, passes through this function and generates a loss.
  However, our prediction $\hat y$ is actually a probability value and cannot be strictly equal to 0 or 1, so under normal circumstances some loss is still produced. What the function guarantees is this: when the true category is $y = 1$, the closer the predicted probability is to 1, the smaller the resulting error; conversely, when $y = 0$, the smaller the predicted probability of class 1, the smaller the error. Plugging the special cases $y = 0$ and $y = 1$ into the loss makes this analysis easy to carry out.
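  As a quick numerical illustration (a minimal sketch, not from the original post), we can evaluate the per-sample loss for a few predicted probabilities and watch how it behaves:

```python
import math

def cross_entropy(y, y_hat, eps=1e-12):
    """Per-sample binary cross-entropy; eps guards against log(0)."""
    y_hat = min(max(y_hat, eps), 1 - eps)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

# True class 1: the loss shrinks as the predicted probability approaches 1
for p in (0.1, 0.5, 0.9, 0.99):
    print(f"y=1, y_hat={p:.2f} -> loss={cross_entropy(1, p):.4f}")

# True class 0: the loss shrinks as the predicted probability approaches 0
for p in (0.9, 0.5, 0.1, 0.01):
    print(f"y=0, y_hat={p:.2f} -> loss={cross_entropy(0, p):.4f}")
```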
  Furthermore, a loss function generally needs one more property: if it is to be optimized with gradient descent (including back-propagation style algorithms), it is usually required to be convex. This is the basis of convex optimization, and for a twice-differentiable function it means the second derivative of the loss must always be non-negative.
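  A quick check of this property for our case (a sketch using the standard composition with the sigmoid, writing $z=\theta^\top x$ for the linear score, so $\hat y=\sigma(z)$):

$$ L(z) = -\big[\,y\ln\sigma(z) + (1-y)\ln(1-\sigma(z))\,\big] $$
$$ \frac{dL}{dz} = \sigma(z) - y, \qquad \frac{d^2L}{dz^2} = \sigma(z)\big(1-\sigma(z)\big) \ge 0 $$

Since $z$ is linear in $\theta$, the per-sample loss is also convex in $\theta$.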
  We can compute a loss for each training sample, and summing the losses of all samples gives the loss over the entire data set. Our goal is to make this total loss as small as possible: the smaller it is, the more accurate our model's predictions. In fact many loss functions satisfy this property, so why does logistic regression have such a soft spot for the cross-entropy loss? The answer can be seen directly from the derivation.
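  As a small sketch of accumulating the loss over a data set (vectorized with numpy; the array names y and y_hat are just illustrative, not from the original post):

```python
import numpy as np

def mean_cross_entropy(y, y_hat, eps=1e-12):
    """Average binary cross-entropy over all samples in the data set."""
    y_hat = np.clip(y_hat, eps, 1 - eps)          # avoid log(0)
    per_sample = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    return per_sample.mean()

y = np.array([1, 0, 1, 1, 0])                     # true labels
y_hat = np.array([0.9, 0.2, 0.7, 0.4, 0.1])       # predicted probabilities
print(mean_cross_entropy(y, y_hat))
```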
  Because logistic regression uses a sigmoid to normalize its output into a probability and the loss is then computed on that probability, back-propagating the loss to the parameters amounts to differentiating a composite function.
[Figure: derivation of the gradient of the cross-entropy loss combined with the sigmoid]
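  The figure itself is not reproduced here; the following is a sketch of the standard derivation it presumably showed, with $\hat y=\sigma(\theta^\top x)$:

$$ \hat y = \sigma(z) = \frac{1}{1+e^{-z}}, \qquad z = \theta^\top x, \qquad J(\theta) = -\big[\,y\ln\hat y + (1-y)\ln(1-\hat y)\,\big] $$
$$ \frac{\partial J}{\partial \hat y} = -\frac{y}{\hat y} + \frac{1-y}{1-\hat y}, \qquad \frac{\partial \hat y}{\partial z} = \hat y(1-\hat y) $$
$$ \frac{\partial J}{\partial z} = \frac{\partial J}{\partial \hat y}\cdot\frac{\partial \hat y}{\partial z} = \hat y - y, \qquad \frac{\partial J}{\partial \theta_j} = (\hat y - y)\,x_j $$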
  This shows that although the cross-entropy loss and the sigmoid used to normalize the prediction into a probability are each fairly complicated functions, once the two are composed, the derivative with respect to the parameter $\theta$ that we need for the update turns out to be very simple, which saves a lot of complicated computation during gradient descent. This also explains why the sigmoid and the cross-entropy loss are such a good match.
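  Put into code, one gradient-descent update built on this simple gradient might look like the following sketch (vectorized numpy; the function name gradient_step and the learning rate lr are just illustrative assumptions, not code from the original post):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(theta, X, y, lr=0.1):
    """One gradient-descent step for logistic regression with cross-entropy loss.

    X: (n_samples, n_features) design matrix, y: (n_samples,) labels in {0, 1}.
    The gradient of the average loss is X^T (y_hat - y) / n_samples.
    """
    y_hat = sigmoid(X @ theta)                # predicted probabilities
    grad = X.T @ (y_hat - y) / X.shape[0]     # the simple (y_hat - y) * x form
    return theta - lr * grad

# Tiny usage example on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)
theta = np.zeros(3)
for _ in range(200):
    theta = gradient_step(theta, X, y)
print(theta)
```

Note how the entire sigmoid-plus-cross-entropy chain collapses into the single term y_hat - y, which is exactly the simplification the post is pointing at.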

Origin blog.csdn.net/m0_38065572/article/details/105045925