Logistic regression and the perceptron: similarities and differences - loss functions

Similarities and differences between logistic regression and the perceptron:

Both are linear classifiers;

They use different loss functions: logistic regression uses maximum likelihood (the log loss), while the perceptron uses a distance-based loss (it minimizes the distance of misclassified points to the separating hyperplane).
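To make the contrast concrete, here is a minimal sketch of the two per-sample losses (hypothetical NumPy code, not from the original post; the function names are mine):

```python
import numpy as np

def perceptron_loss(w, b, x, y):
    """Perceptron loss for one point with label y in {-1, +1}: zero if the
    point is classified correctly, otherwise proportional to its distance
    from the separating hyperplane."""
    margin = y * (np.dot(w, x) + b)
    return max(0.0, -margin)

def log_loss(w, b, x, y):
    """Logistic regression (log) loss for one point with label y in {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))  # sigmoid probability
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))
```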

The advantage of logistic regression's activation function over the perceptron's:

The former is the sigmoid function, the latter a step function. As a result, LR is continuously differentiable, a property the step function lacks.

LR can interpret its output as a probability (the result is restricted to the range 0-1). The sigmoid function is smooth and therefore gives finer-grained classification results, whereas the step function is piecewise and classifies coarsely: the output is either 0 or 1, with no probability attached to the class. A comparison of the two activations is sketched below.
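A small sketch of the two activations (hypothetical NumPy code; the printed values illustrate graded vs. hard outputs):

```python
import numpy as np

def sigmoid(z):
    """Smooth and continuously differentiable; outputs a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def step(z):
    """Perceptron activation: piecewise constant, outputs only 0 or 1,
    and is not differentiable at z = 0."""
    return np.where(z >= 0, 1.0, 0.0)

z = np.array([-2.0, -0.1, 0.0, 0.1, 2.0])
print(sigmoid(z))  # roughly [0.119 0.475 0.5 0.525 0.881] -- graded probabilities
print(step(z))     # [0. 0. 1. 1. 1.]                      -- hard 0/1 decisions
```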

Why doesn't logistic regression use the mean squared error as its loss function?

First, imagine the objective function were $E_{w,b}=\sum_{i=1}^{m}\left(y_{i}-\frac{1}{1+e^{-\left(w^{T}x_{i}+b\right)}}\right)^{2}$. It is not that this cannot be solved at all, so why is it not used?

An expert answer on Zhihu cleared up my doubts:

If we use least squares, the objective function is $E_{w,b}=\sum_{i=1}^{m}\left(y_{i}-\frac{1}{1+e^{-\left(w^{T}x_{i}+b\right)}}\right)^{2}$, which is non-convex: it is hard to optimize and may end up at a local optimum (a numerical check is sketched below, after the figure).

The curve when least squares is used as the loss function:

[Figure: the least-squares loss of a logistic regression model, where theta is the parameter to be optimized; the curve is non-convex]
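The non-convexity is easy to verify numerically. The sketch below (hypothetical NumPy code, a one-dimensional toy with a single training point x = 1, y = 1) estimates the second derivative of the squared loss by finite differences; it changes sign along w, so the loss cannot be convex:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def squared_loss(w, x=1.0, y=1.0):
    """Least-squares loss of a one-parameter logistic model on one point."""
    return (y - sigmoid(w * x)) ** 2

# Central second difference: negative values reveal concave regions,
# so the loss is not convex everywhere.
ws = np.linspace(-6.0, 6.0, 13)
h = 1e-4
d2 = (squared_loss(ws + h) - 2 * squared_loss(ws) + squared_loss(ws - h)) / h**2
print(np.round(d2, 4))  # mixes negative and positive values -> non-convex
```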

If we instead use maximum likelihood estimation, the objective is the negative log-likelihood: $l_{w,b}=\sum_{i=1}^{m}\left(-y_{i}\left(w^{T}x_{i}+b\right)+\ln\left(1+e^{w^{T}x_{i}+b}\right)\right)$, which is a higher-order continuously differentiable convex function of $(w,b)$ and can easily be minimized with standard convex optimization algorithms such as gradient descent or Newton's method.

The curve when maximum likelihood is used as the loss (the loss is the negative of the log-likelihood given above):

[Figure: the negative log-likelihood loss of the logistic regression model; the curve is convex]
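Because this objective is convex, plain batch gradient descent is enough to fit the model. A minimal sketch (hypothetical NumPy code; the learning rate, iteration count, and toy data are arbitrary choices of mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iters=1000):
    """Minimize the (convex) negative log-likelihood by batch gradient descent.
    X: (m, d) feature matrix; y: (m,) labels in {0, 1}."""
    m, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)      # predicted probabilities
        grad_w = X.T @ (p - y) / m  # d(loss)/dw
        grad_b = np.mean(p - y)     # d(loss)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy usage: two separable Gaussian clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.r_[np.zeros(50), np.ones(50)]
w, b = fit_logistic(X, y)
print((sigmoid(X @ w + b) > 0.5).astype(int))  # predicted class labels
```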


Source: blog.csdn.net/asdfsadfasdfsa/article/details/90261142