[Machine Learning Basics] Is the cross-entropy loss function convex?

Why does this question come up? Because, for logistic regression, the book *Statistical Machine Learning* states that the negative log-likelihood function is convex, and the negative log-likelihood of logistic regression has the same form as the cross-entropy function.

Conclusion first: for logistic regression, the cross entropy is convex; for a multi-layer neural network, it is not.

For logistic regression, the cross entropy is convex:

Why is the error function minimized in logistic regression convex? -- Deepak Roy Chittajallu

For a multilayer neural network (MLP), the cross entropy is not convex:

Cost function of neural network is non-convex? - Cross Validated

The cross-entropy loss function (\(\hat{y}\) is the predicted value, \(y\) is the true value):

\[- y \log \hat{y} - (1-y) \log (1- \hat{y})\]

The intuition

To explain it briefly: for logistic regression, since \(y\) is either 0 or 1, and a sum of convex functions (or a nonnegative multiple of a convex function) is still convex, it suffices to show that \(-\log \hat{y}\) and \(-\log (1- \hat{y})\) are each convex in \(w\), which is done by proving that the Hessian matrix is positive semidefinite. See the link above for the proof.
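As a numerical sanity check (not a proof), the Hessian of the logistic-regression cross entropy is \(X^\top S X\) with \(S\) diagonal and nonnegative, so its eigenvalues should all be nonnegative. The data and weights below are arbitrary illustrative values:

```python
import numpy as np

# Sketch: the Hessian of the cross-entropy loss for logistic regression is
# X^T S X, where S = diag(p * (1 - p)) and p = sigmoid(X @ w). Since every
# diagonal entry of S is >= 0, the Hessian is positive semidefinite.

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))   # 50 samples, 3 features (arbitrary)
w = rng.normal(size=3)         # arbitrary weight vector

p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
S = np.diag(p * (1.0 - p))         # diagonal weighting matrix
H = X.T @ S @ X                    # Hessian of the cross-entropy loss

eigvals = np.linalg.eigvalsh(H)
print(eigvals.min() >= -1e-10)  # all eigenvalues nonnegative up to rounding
```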

For an MLP, the intuitive explanation is: if you swap the weights of two neurons in a hidden layer (together with the corresponding weights into the next layer), the final output of the network does not change. So if there is an optimal solution, swapping the neuron weights yields another, distinct optimal solution. But the set of minimizers of a convex function is convex, so every point on the segment between those two optima would also have to be optimal, which is generally not the case for a neural network; hence the loss is not a convex function of the weights.
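The permutation symmetry described above can be demonstrated directly: swap two hidden units (their incoming weights and biases, plus the matching outgoing weights) and check that the output is unchanged. Shapes and values here are arbitrary:

```python
import numpy as np

# Sketch of hidden-unit permutation symmetry in a one-hidden-layer network:
# permuting hidden neurons, along with the matching columns of the output
# weight matrix, leaves the network's output unchanged.

rng = np.random.default_rng(1)
x = rng.normal(size=4)          # a single input vector
W1 = rng.normal(size=(3, 4))    # input -> hidden (3 hidden units)
b1 = rng.normal(size=3)         # hidden biases
W2 = rng.normal(size=(1, 3))    # hidden -> output

def forward(W1, b1, W2, x):
    h = np.tanh(W1 @ x + b1)    # hidden activations
    return W2 @ h               # network output

# Swap hidden units 0 and 1 (rows of W1/b1, columns of W2).
perm = [1, 0, 2]
W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]

print(np.allclose(forward(W1, b1, W2, x), forward(W1p, b1p, W2p, x)))  # True
```

Both weight configurations realize the same function, so if one is a global minimum of the loss, so is the other.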

Why is logistic regression solved by gradient descent rather than by seeking an analytical solution directly?

If you set the first derivative of the cross entropy to 0, you cannot isolate the weight \(w\) on one side, i.e., the equation cannot be written in the form \(w = \text{expression}\). Even though it is just an equality, obtaining an analytical solution directly is still very difficult. Therefore gradient descent, Newton's method, and quasi-Newton methods are used to solve logistic regression.
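A minimal gradient-descent sketch, using synthetic data and a fixed learning rate (both illustrative assumptions). The gradient of the mean cross entropy is \(X^\top(\sigma(Xw) - y)/n\), which cannot be solved for \(w\) in closed form, but iterating on it converges:

```python
import numpy as np

# Minimal gradient-descent sketch for logistic regression.
# Data, learning rate, and iteration count are arbitrary illustrative choices.

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = (X @ np.array([1.5, -2.0]) > 0).astype(float)  # linearly separable labels

w = np.zeros(2)
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ w))   # sigmoid(X @ w)
    grad = X.T @ (p - y) / len(y)      # gradient of mean cross-entropy
    w -= lr * grad

pred = (1.0 / (1.0 + np.exp(-X @ w)) > 0.5).astype(float)
print((pred == y).mean())  # training accuracy
```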

References

Why is the error function minimized in logistic regression convex? - Deepak Roy Chittajallu
Cost function of neural network is non-convex? - Cross Validated
Does logistic regression have an analytical solution? - zzzzzzzz's answer - Zhihu

Source: www.cnblogs.com/wuliytTaotao/p/11967620.html