Easy Start Machine Learning: Logistic Regression (Theory)

Small Text | from the "Small Text Data Tour" public account

The previous article introduced the simplest regression model, linear regression, and showed how to obtain its optimal solution with the least-squares method, batch gradient descent, stochastic gradient descent, and mini-batch gradient descent. Today we introduce the simplest classification model: logistic regression. Logistic regression sounds like a regression model, so how does it end up being a classification model?

First, start from a regression model. Assume a function $g(x)$ that outputs a continuous value, where $x$ has $n$ features:

$$g(x) = k_0 + k_1 x_1 + k_2 x_2 + \cdots + k_n x_n = kx$$

How can we use $g(x)$ to solve a classification task? We usually pick a threshold: samples scoring above it belong to the positive class, and samples scoring below it belong to the negative class. The threshold is often taken as the midpoint of the output range, e.g. 0.5: when $g(x) > 0.5$ predict the positive class, when $g(x) < 0.5$ predict the negative class, and when $g(x) = 0.5$ predict either. This is exactly the unit step function.
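As a concrete illustration of the thresholding idea (a minimal sketch of my own, not code from the original post), here is a unit-step classifier over a continuous score with 0.5 as the cut-off:

```python
def threshold_classify(g_value, threshold=0.5):
    """Return 1 (positive class) if the score exceeds the threshold, else 0.

    Note: the tie case g_value == threshold is arbitrary; here it maps to 0.
    """
    return 1 if g_value > threshold else 0

print(threshold_classify(0.8))  # 1
print(threshold_classify(0.2))  # 0
```

The problem with this rule, as the next paragraph explains, is that the step function is not differentiable at the threshold.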

The unit step function can serve as a binary classification model, but it is neither continuous nor differentiable, which makes it unfriendly for finding an optimal solution. We therefore tend to use the better-behaved sigmoid function instead.

Sigmoid function:

The sigmoid function has a very nice property: as $z$ tends to positive infinity, $y$ tends to 1, and as $z$ tends to negative infinity, $y$ tends to 0, which makes it well suited to modeling a probability for classification. It also has a convenient derivative, $y' = y(1 - y)$, which will be used later in the solution process. The sigmoid function is

$$y = \frac{1}{1 + e^{-z}}$$

Letting $z = g(x)$ gives:

 

$$y = \frac{1}{1 + e^{-g(x)}}$$

$$\frac{1}{y} = 1 + e^{-g(x)}$$

$$e^{-g(x)} = \frac{1 - y}{y}$$

$$e^{g(x)} = \frac{y}{1 - y}$$

Well,

$$g(x) = \ln\frac{y}{1 - y}$$

$$g(x) = kx$$

which is

$$\ln\frac{y}{1 - y} = kx$$
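The limiting behaviour of the sigmoid, its derivative property $y' = y(1-y)$, and the log-odds inversion can all be checked numerically. The following sketch is my own illustration, using only the Python standard library:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# As z grows large, y -> 1; as z -> -infinity, y -> 0.
print(sigmoid(10))   # ~0.99995
print(sigmoid(-10))  # ~0.00005

# Derivative property y' = y(1 - y), checked by central finite differences.
z = 0.7
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
analytic = sigmoid(z) * (1 - sigmoid(z))
print(abs(numeric - analytic) < 1e-8)  # True

# Inverse relation: ln(y / (1 - y)) recovers z, the log-odds.
y = sigmoid(0.7)
print(abs(math.log(y / (1 - y)) - 0.7) < 1e-9)  # True
```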

At this point we have derived the general expression of logistic regression. How should we interpret it?

$y$ is the probability that a sample belongs to the positive class, and $1 - y$ is the probability that it belongs to the negative class; their ratio $\frac{y}{1-y}$ is the odds of a sample being positive rather than negative, and $\ln\frac{y}{1-y}$ is the log-odds. The expression says that the log-odds of the positive versus the negative class is a linear function of the data $x$.

Ideally, positive and negative samples are equally likely, i.e. $y = 1 - y$, which gives $\frac{y}{1-y} = 1$ and hence $g(x) = \ln\frac{y}{1-y} = 0$, i.e. $y = 0.5$. This is consistent with the classification threshold of 0.5 that we set for $g(x)$ at the beginning.
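A quick sanity check (an illustrative sketch, not from the original post): thresholding the sigmoid output $y$ at 0.5 is equivalent to thresholding the linear score $kx$ at 0.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Predicting with sigmoid(kx) >= 0.5 is the same as predicting with kx >= 0,
# because sigmoid is monotonic and sigmoid(0) = 0.5 exactly.
for kx in [-2.0, -0.1, 0.0, 0.1, 2.0]:
    by_probability = sigmoid(kx) >= 0.5
    by_sign = kx >= 0
    assert by_probability == by_sign
print("threshold 0.5 on y matches threshold 0 on kx")
```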

According to the logistic regression expression, if we know the data $x$ and the coefficients $k$, we can compute the probability that a sample is positive or negative. So how do we solve for the $k$ that fits a given data set $X$? As usual, by defining a loss function and finding its optimal solution.

Linear regression predicts a continuous value, so its loss function can be defined as the sum of squared errors. Logistic regression, however, predicts discrete classes, so the loss function inherited from linear regression is no longer appropriate. Instead, we can use the likelihood function to derive our loss function.

Let $P(y = 1 \mid x; k) = h_k(x)$; then $P(y = 0 \mid x; k) = 1 - h_k(x)$, and the two cases combine into $P(y \mid x; k) = h_k(x)^{y} \, (1 - h_k(x))^{1 - y}$.

The likelihood function over $m$ samples can then be written as

$$L(k) = \prod_{i=1}^{m} h_k(x_i)^{y_i} \, (1 - h_k(x_i))^{1 - y_i}$$

and it remains to solve for the $k$ that maximizes $L(k)$.
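As a sketch of how this likelihood is computed in practice (the toy data, function names, and coefficients below are my own invention), assuming $h_k(x) = \operatorname{sigmoid}(k \cdot x)$:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def likelihood(k, data):
    """Product over samples of h^y * (1 - h)^(1 - y), where h = sigmoid(k . x)."""
    L = 1.0
    for x, y in data:
        h = sigmoid(sum(kj * xj for kj, xj in zip(k, x)))
        L *= h ** y * (1 - h) ** (1 - y)
    return L

# Toy data: (features, label); the first feature is a constant 1 (bias term).
data = [([1.0, 2.0], 1), ([1.0, -1.0], 0), ([1.0, 0.5], 1)]
print(likelihood([0.1, 0.8], data))  # a value in (0, 1]
```

With $k = 0$ every sample gets probability $0.5$, so the likelihood of three samples is exactly $0.5^3 = 0.125$; better coefficients push it higher.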

The solution process is as follows. Take the logarithm of the likelihood:

$$\ln L(k) = \sum_{i=1}^{m} \left[ y_i \ln h_k(x_i) + (1 - y_i) \ln\big(1 - h_k(x_i)\big) \right]$$

Differentiate with respect to $k_j$:

$$\frac{\partial \ln L(k)}{\partial k_j} = \sum_{i=1}^{m} \left[ \frac{y_i}{h_k(x_i)} - \frac{1 - y_i}{1 - h_k(x_i)} \right] \frac{\partial h_k(x_i)}{\partial k_j}$$

$$= \sum_{i=1}^{m} \frac{y_i - h_k(x_i)}{h_k(x_i)\,\big(1 - h_k(x_i)\big)} \, \frac{\partial h_k(x_i)}{\partial k_j}$$

And because

$$h_k(x) = \frac{1}{1 + e^{-kx}}$$

and so

$$\frac{\partial h_k(x_i)}{\partial k_j} = \frac{e^{-k x_i}}{\big(1 + e^{-k x_i}\big)^2} \, x_{ij} = h_k(x_i)\,\big(1 - h_k(x_i)\big)\, x_{ij}$$

$$\frac{\partial \ln L(k)}{\partial k_j} = \sum_{i=1}^{m} \big( y_i - h_k(x_i) \big)\, x_{ij}$$
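The closed-form gradient $\sum_i (y_i - h_k(x_i))\, x_{ij}$ can be verified against finite differences of the log-likelihood. This is a self-contained sketch with toy data invented for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(k, data):
    total = 0.0
    for x, y in data:
        h = sigmoid(sum(kj * xj for kj, xj in zip(k, x)))
        total += y * math.log(h) + (1 - y) * math.log(1 - h)
    return total

def gradient(k, data):
    # d lnL / d k_j = sum_i (y_i - h_k(x_i)) * x_ij
    grad = [0.0] * len(k)
    for x, y in data:
        h = sigmoid(sum(kj * xj for kj, xj in zip(k, x)))
        for j, xj in enumerate(x):
            grad[j] += (y - h) * xj
    return grad

data = [([1.0, 2.0], 1), ([1.0, -1.0], 0), ([1.0, 0.5], 1)]
k, eps = [0.1, 0.8], 1e-6
for j in range(len(k)):
    k_plus = list(k);  k_plus[j] += eps
    k_minus = list(k); k_minus[j] -= eps
    numeric = (log_likelihood(k_plus, data) - log_likelihood(k_minus, data)) / (2 * eps)
    assert abs(numeric - gradient(k, data)[j]) < 1e-6
print("analytic gradient matches finite differences")
```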

Using the derivative property of the sigmoid function, $h' = h(1 - h)$, gives the same result faster. Don't believe it?! Verify it yourself! Substituting $\frac{\partial h_k(x_i)}{\partial k_j} = h_k(x_i)\big(1 - h_k(x_i)\big)\, x_{ij}$ into $\frac{\partial \ln L(k)}{\partial k_j}$ gives:

$$\frac{\partial \ln L(k)}{\partial k_j} = \sum_{i=1}^{m} \left[ \frac{y_i}{h_k(x_i)} - \frac{1 - y_i}{1 - h_k(x_i)} \right] h_k(x_i)\,\big(1 - h_k(x_i)\big)\, x_{ij}$$

$$= \sum_{i=1}^{m} \left[ y_i \big(1 - h_k(x_i)\big) - (1 - y_i)\, h_k(x_i) \right] x_{ij}$$

$$= \sum_{i=1}^{m} \left[ y_i - y_i h_k(x_i) - h_k(x_i) + y_i h_k(x_i) \right] x_{ij}$$

$$= \sum_{i=1}^{m} \big( y_i - h_k(x_i) \big)\, x_{ij}$$

Because we want the maximum of $\ln L(k)$, the parameter $k$ is solved by gradient ascent:

$$k_j := k_j + \alpha \frac{\partial \ln L(k)}{\partial k_j}$$

where $\alpha$ is the learning rate (step size), which is

$$k_j := k_j + \alpha \sum_{i=1}^{m} \big( y_i - h_k(x_i) \big)\, x_{ij}$$

At this point we have found the optimal solution $k$.
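Putting the update rule together, a minimal gradient-ascent training loop might look like this. This is a sketch with made-up toy data, a fixed learning rate, and a fixed step count, not the author's implementation:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(data, alpha=0.1, steps=2000):
    """Batch gradient ascent on the log-likelihood:
    k_j := k_j + alpha * sum_i (y_i - h_k(x_i)) * x_ij
    """
    n_features = len(data[0][0])
    k = [0.0] * n_features
    for _ in range(steps):
        grad = [0.0] * n_features
        for x, y in data:
            h = sigmoid(sum(kj * xj for kj, xj in zip(k, x)))
            for j, xj in enumerate(x):
                grad[j] += (y - h) * xj
        k = [kj + alpha * gj for kj, gj in zip(k, grad)]
    return k

# Toy separable data; the constant feature x0 = 1 plays the role of the bias.
data = [([1.0, 0.0], 0), ([1.0, 1.0], 0), ([1.0, 3.0], 1), ([1.0, 4.0], 1)]
k = fit(data)
for x, y in data:
    h = sigmoid(sum(kj * xj for kj, xj in zip(k, x)))
    assert (h >= 0.5) == (y == 1)
print("learned k classifies all toy samples correctly")
```

In practice one would also add a stopping criterion and possibly switch to the stochastic or mini-batch variants mentioned at the start of the post.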

— end —


Origin blog.csdn.net/d345389812/article/details/94344351