Personal summary of logistic regression (machine learning)

1. Logistic regression

1.1 Principle of the algorithm

Logistic regression builds on an idea similar to linear regression: it finds a straight line that divides the data into two categories (only two at a time).

But it is not as simple as linear regression, where minimizing the loss function directly yields this line: the two typical cases below show that a line fitted directly to the data can misclassify a considerable number of points.

[figure omitted: two typical cases where a directly fitted line misclassifies some of the data]

It differs from linear regression in two ways:

(1) It solves a classification problem, so each training example comes with a category label yᵢ, such as 1 or 0.

(2) It introduces the sigmoid function:

g(z) = 1 / (1 + e^(−z)),  where z = θᵀx

The sigmoid function has the following features:

1) It is a squashing function: when z is greater than 0, the sigmoid value is greater than 0.5, and the larger z is, the closer the value gets to 1. Conversely, when z is less than 0, the sigmoid value is less than 0.5.

Since all we need is binary classification, we can decide a point's category by seeing which class has the larger probability, i.e., whether the probability of a class exceeds 0.5. This coincides exactly with the behavior of g(z), so we can use g(z) to represent the probability of belonging to the positive class: if it exceeds 0.5, the point is judged to belong to that class.

Moreover, the farther a point lies from the line, the more clearly it belongs to its class. In that case z = θᵀx tends toward positive or negative infinity, and substituting it in, g(z) tends toward 1 (or 0), indicating a larger probability of belonging to that class. This shows that g(z) is well suited to represent the classification probability.
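As a concrete illustration, here is a minimal sketch of the sigmoid function and the resulting 0.5-threshold decision rule (the function and variable names are my own, not from any particular library):

```python
import math

def sigmoid(z):
    """Squash any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(z, threshold=0.5):
    """Class 1 when the modeled probability g(z) exceeds the threshold."""
    return 1 if sigmoid(z) > threshold else 0

# z = 0 sits exactly on the boundary; large |z| pushes g(z) toward 1 or 0.
print(sigmoid(0.0))    # 0.5
print(predict(2.0))    # 1
print(predict(-6.0))   # 0
```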

2) The sigmoid function has a convenient computational property:

g′(z) = g(z) · (1 − g(z))

This lets us relate g(z), i.e., the classification probability P(y = 1 | x), to θᵀx, which is what we need later for the maximum likelihood function.
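The derivative identity g′(z) = g(z)(1 − g(z)) is easy to verify numerically; a small sketch (names are my own):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Compare the analytic derivative g(z)(1 - g(z)) with a central difference.
z, h = 0.7, 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
analytic = sigmoid(z) * (1 - sigmoid(z))
print(abs(numeric - analytic) < 1e-8)  # True
```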

P(y = 1 | x) = g(θᵀx) = 1 / (1 + e^(−θᵀx)),  P(y = 0 | x) = 1 − g(θᵀx)

With the above equations we can clearly see that if θ is known, classification predictions can be made. But first θ must be obtained through training, using training data that carry class labels.

Next, θ is estimated by maximum likelihood: we assume the data we observed are the most probable outcome. Since the samples are independent, the probability of observing all the data is the product of the individual probabilities, which gives the likelihood function.

L(θ) = ∏ᵢ P(yᵢ | xᵢ; θ) = ∏ᵢ g(θᵀxᵢ)^(yᵢ) · (1 − g(θᵀxᵢ))^(1 − yᵢ)

Taking the logarithm gives the log-likelihood, which is easier to maximize:

l(θ) = log L(θ) = Σᵢ [ yᵢ · log g(θᵀxᵢ) + (1 − yᵢ) · log(1 − g(θᵀxᵢ)) ]

To find the maximum of this formula, set its derivative to 0; the θ obtained is the one we want.

For a computer, however, this equation cannot be solved in closed form; computers solve such problems iteratively, for example with gradient descent on the negative log-likelihood (equivalently, gradient ascent on the likelihood) to obtain the maximizing θ.

Alternatively, SGD or BGD can be employed; this is the same as in linear regression and will not be repeated here.
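To make the iterative solution concrete, here is a self-contained sketch of batch gradient ascent on the log-likelihood. The toy data, learning rate, epoch count, and all names are my own choices for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Batch gradient ascent on the log-likelihood.

    xs: feature vectors (bias feature included as the first component)
    ys: 0/1 labels
    """
    n_features = len(xs[0])
    theta = [0.0] * n_features
    for _ in range(epochs):
        # Gradient of the log-likelihood: sum_i (y_i - g(theta^T x_i)) * x_i
        grad = [0.0] * n_features
        for x, y in zip(xs, ys):
            err = y - sigmoid(sum(t * xi for t, xi in zip(theta, x)))
            for j in range(n_features):
                grad[j] += err * x[j]
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta

# Toy 1-D data with a bias feature: points below 2 are class 0, above are class 1.
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 3.0], [1.0, 4.0]]
ys = [0, 0, 1, 1]
theta = train_logistic(xs, ys)
print(sigmoid(theta[0] + theta[1] * 0.5) < 0.5)  # True: left side -> class 0
print(sigmoid(theta[0] + theta[1] * 3.5) > 0.5)  # True: right side -> class 1
```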

Newton's method may also be employed: a second-order Taylor expansion of the formula yields the change in θ, as follows.

To find the maximum of f(x), set the derivative of the expansion equal to 0, which gives the following update:

x := x − f′(x) / f″(x)

Applying this property to the log-likelihood l(θ) and expanding at θⱼ, we obtain:

θ := θ − H⁻¹ ∇θ l(θ)

where:

H is the Hessian matrix of the log-likelihood, Hᵢⱼ = ∂²l(θ) / ∂θᵢ∂θⱼ, and ∇θ l(θ) is its gradient.
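As a minimal illustration of the Newton update x ← x − f′(x)/f″(x), here is a 1-D sketch on a toy quadratic of my own choosing (not the log-likelihood itself):

```python
# Maximize f(x) = -(x - 3)^2, whose maximum is at x = 3.
def f_prime(x):
    return -2.0 * (x - 3.0)

def f_double_prime(x):
    return -2.0

x = 0.0
for _ in range(5):
    x = x - f_prime(x) / f_double_prime(x)
print(x)  # 3.0 (for a quadratic, Newton's method converges in one step)
```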

This completes the solution for θ, and logistic regression has the straight line it needs for classification.

1.2 Algorithm flow

(1) Using the existing data, i.e., the training set, construct the likelihood function:

l(θ) = Σᵢ [ yᵢ · log g(θᵀxᵢ) + (1 − yᵢ) · log(1 − g(θᵀxᵢ)) ]

(2) Set an initial value for θ, such as (0, 0, 0) or something else; to some extent the initial value affects whether the optimum is reached.

(3) Iterate to solve for the final θ, stopping when the change in θ is smaller than a set threshold or when an update makes the objective worse. The iteration formula is as follows (see Section 1.1 for the derivation):

θⱼ := θⱼ + α · Σᵢ ( yᵢ − g(θᵀxᵢ) ) · xᵢⱼ

(4) With the solved θ and the sigmoid function, class predictions can be made on subsequent data.

ŷ = 1 if g(θᵀx) > 0.5, otherwise ŷ = 0

1.3 Considerations

(1) Although "regression" appears in its name, logistic regression solves classification problems; yet true to its name, it does obtain the equation of a regression line through logistic probability calculations;

(2) The choice of the sigmoid function already determines that logistic regression is a binary classifier, because it sets g(z) = P(y = 1 | x) and 1 − g(z) = P(y = 0 | x);

(3) To do multi-class classification, logistic regression performs multiple binary classifications (e.g., one-vs-rest); in principle it only ever does binary classification.
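A sketch of point (3): one-vs-rest prediction with hypothetical, hand-picked parameters (not trained; all names and numbers are my own), choosing the class whose binary model reports the highest probability:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_multiclass(thetas, x):
    """One-vs-rest: score x under each class's binary model, pick the best.

    thetas: {class_label: parameter vector}, one binary model per class
    """
    scores = {c: sigmoid(sum(t * xi for t, xi in zip(theta, x)))
              for c, theta in thetas.items()}
    return max(scores, key=scores.get)

# Illustrative (made-up) parameters for three classes over [bias, feature].
thetas = {"a": [2.0, -1.0], "b": [0.0, 0.5], "c": [-3.0, 1.5]}
print(predict_multiclass(thetas, [1.0, 0.0]))  # a
print(predict_multiclass(thetas, [1.0, 4.0]))  # c
```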


Origin www.cnblogs.com/wenghsimu/p/11234978.html