1. What is logistic regression

2. Derivation and procedure of logistic regression

3. Multi-class logistic regression

4. Logistic regression vs. linear regression

5. Logistic regression vs. SVM

1. What is logistic regression

Despite its name, logistic regression is a classification method. It predicts the class by computing $P(y=1|x;\theta)$. The predicted label is 0 or 1, not a probability, but it is obtained from a probability with $0 \leq P(y=1|x;\theta) \leq 1$. A probability value is not itself a 0-or-1 label; it is turned into one by a threshold.

Binary logistic regression: $x$ is the input and $y$ is the output label.

$P(y=1|x)=\frac{1}{1+e^{-z}}$

$P(y=1|x)=\frac{1}{1+e^{-(\omega x+b)}}$

$P(y=1|x;\theta )+P(y=0|x;\theta )=1$

$z=\omega x+b$

$\frac{\partial z}{\partial w}=x$

$\frac{\partial z}{\partial b}=1$

Logistic function: also called the log-odds function.

Odds: the probability $p$ that the event occurs divided by the probability $1-p$ that it does not, $\frac{p}{1-p}$.

Log-odds: $\text{logit}(p)=\log\frac{p}{1-p}=\omega x+b$

$\omega x+b=0$ is the decision boundary:

When $p>1-p$: $\frac{p}{1-p}>1$, so $\log\frac{p}{1-p}>0$, i.e. $\omega x+b>0$;

When $p<1-p$: $\frac{p}{1-p}<1$, so $\log\frac{p}{1-p}<0$, i.e. $\omega x+b<0$.

$z=\omega x+b$

When $z\geq 0$, $g(z)\geq 0.5$ and we predict $y=1$, where $g(z)=\frac{1}{1+e^{-z}}$ is the sigmoid;

When $z<0$, $g(z)<0.5$ and we predict $y=0$.
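A minimal plain-Python sketch of the pieces above: the sigmoid $g(z)$, the log-odds (its inverse), and the threshold rule. The function names and the one-feature scalar setting are illustrative assumptions, not part of the original notes.

```python
import math

def sigmoid(z: float) -> float:
    """g(z) = 1 / (1 + e^{-z}); read as P(y=1|x) when z = w*x + b."""
    # Branch on the sign of z so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logit(p: float) -> float:
    """Log-odds log(p / (1 - p)), the inverse of the sigmoid."""
    return math.log(p / (1.0 - p))

def predict(w: float, b: float, x: float) -> int:
    """Decision rule: y = 1 when z = w*x + b >= 0 (equivalently g(z) >= 0.5)."""
    return 1 if w * x + b >= 0 else 0

print(round(logit(sigmoid(1.5)), 6))                       # 1.5: the log-odds of g(z) is z
print(sigmoid(0.0))                                        # 0.5: z = 0 is the boundary
print(predict(2.0, -1.0, 0.75), predict(2.0, -1.0, 0.25))  # 1 0
```

Thresholding $z$ at 0 and thresholding $g(z)$ at 0.5 give the same labels, which is why the decision boundary is simply $\omega x+b=0$.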

2. Derivation and procedure of logistic regression

(The model parameters are estimated by maximum likelihood.)

Likelihood function: $\prod_{i=1}^{N}p_i^{y_i}(1-p_i)^{1-y_i}$, where $p_i=P(y_i=1|x_i)$ (its negative logarithm is the cross-entropy loss);

Take the logarithm to obtain the log-likelihood and maximize it;

Solve for $w$ by gradient descent (on the negative log-likelihood);

Plug the learned parameters into $P(y=k|x;\theta)$ to classify.

Derivation:

Likelihood of one sample: $L_i=p^{y_i}(1-p)^{1-y_i}$

Log-likelihood: $\log L_i=y_i\log p+(1-y_i)\log(1-p)$

Maximizing the log-likelihood is equivalent to minimizing its negative, the cross-entropy loss:

$L=-(y_i\log p+(1-y_i)\log(1-p))$
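A small numerical check of this step, in plain Python with made-up probabilities and labels (the names `likelihood` and `cross_entropy` are illustrative): the negative logarithm of the likelihood product $\prod_i p_i^{y_i}(1-p_i)^{1-y_i}$ equals the sum of the per-sample losses $L$.

```python
import math

def likelihood(ps, ys):
    """Product over samples of p^y * (1 - p)^(1 - y)."""
    result = 1.0
    for p, y in zip(ps, ys):
        result *= p ** y * (1.0 - p) ** (1 - y)
    return result

def cross_entropy(p, y):
    """Per-sample loss L = -(y log p + (1 - y) log(1 - p))."""
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

ps = [0.9, 0.2, 0.7]  # hypothetical predicted P(y=1|x) for three samples
ys = [1, 0, 0]        # hypothetical labels

nll = -math.log(likelihood(ps, ys))
total_loss = sum(cross_entropy(p, y) for p, y in zip(ps, ys))
print(abs(nll - total_loss) < 1e-12)  # True
```

In practice the sum of logs is preferred over the raw product, which underflows quickly for large $N$.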

Minimize it by gradient descent, using the chain rule:

$\frac{\partial L}{\partial p}=-\frac{y_{i}}{p}+\frac{1-y_{i}}{1-p}$

$p=\frac{1}{1+e^{-z}}$

$\frac{\partial p}{\partial z}=\frac{e^{-z}}{(1+e^{-z})^{2}}=p(1-p)$
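The identity $\frac{\partial p}{\partial z}=p(1-p)$ can be sanity-checked with a central finite difference. This is a quick plain-Python sketch; the sample points and step size `h` are arbitrary choices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def central_difference(f, z, h=1e-6):
    """Numerical estimate of f'(z)."""
    return (f(z + h) - f(z - h)) / (2.0 * h)

errors = []
for z in (-2.0, 0.0, 3.0):
    p = sigmoid(z)
    analytic = p * (1.0 - p)                  # dp/dz = p(1 - p)
    numeric = central_difference(sigmoid, z)
    errors.append(abs(analytic - numeric))
print(max(errors) < 1e-8)  # True
```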

$\frac{\partial z}{\partial w}=x$

$\frac{\partial z}{\partial b}=1$

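$\frac{\partial L}{\partial z}=\frac{\partial L}{\partial p}\frac{\partial p}{\partial z}=\left(-\frac{y_{i}}{p}+\frac{1-y_{i}}{1-p}\right)p(1-p)=p-y_{i}$

$dw=(p-y_{i})x$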

$db=(p-y_{i})$
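Putting the gradients together gives batch gradient descent. The sketch below is plain Python with no external libraries; the function names, learning rate, epoch count and toy data are illustrative assumptions. The per-sample gradients are $dw=(p-y_i)x$ and $db=(p-y_i)$ from above, averaged over the training set.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_regression(xs, ys, lr=0.1, epochs=1000):
    """Batch gradient descent on the mean cross-entropy loss.

    xs: list of feature vectors, ys: list of 0/1 labels.
    """
    n_features, n = len(xs[0]), len(xs)
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            err = p - y                    # dL/dz = p - y
            for j in range(n_features):
                grad_w[j] += err * x[j]    # dL/dw_j = (p - y) x_j
            grad_b += err                  # dL/db = p - y
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0

# Hypothetical toy data: label 1 when x1 + x2 is large.
xs = [[0.0, 0.0], [0.5, 0.5], [1.0, 0.0], [2.0, 2.0], [3.0, 1.5], [2.5, 3.0]]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic_regression(xs, ys, lr=0.5, epochs=2000)
print([predict(w, b, x) for x in xs])
```

A fixed learning rate and epoch count keep the sketch short; a real implementation would typically add a stopping criterion and, for separable data like this, regularization to keep $w$ from growing without bound.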

3. Multi-class logistic regression

Method 1: split the problem into one binary classifier per class (one-vs-rest). With three classes:

1 vs. {2, 3}: $h_{\theta}^{1}(x)$

2 vs. {1, 3}: $h_{\theta}^{2}(x)$

3 vs. {1, 2}: $h_{\theta}^{3}(x)$

Predict the class $i$ whose $h_{\theta}^{i}(x)$ is largest.
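A one-vs-rest sketch in plain Python, assuming three hypothetical, well-separated 2-D clusters labelled 1, 2 and 3. Each binary classifier $h_\theta^{i}$ is trained with the gradient descent update from section 2, and prediction takes the class with the largest $h_\theta^{i}(x)$. All names, hyperparameters and data are made up for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_binary(xs, ys, lr=0.3, epochs=5000):
    """Binary logistic regression by batch gradient descent (dw = (p - y)x, db = p - y)."""
    d, n = len(xs[0]), len(xs)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(d):
                gw[j] += err * x[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def fit_one_vs_rest(xs, labels):
    """Train one classifier h^i per class i: class i -> 1, every other class -> 0."""
    return {c: train_binary(xs, [1 if y == c else 0 for y in labels])
            for c in sorted(set(labels))}

def predict_one_vs_rest(models, x):
    """Return the class whose h^i(x) is largest."""
    def h(c):
        w, b = models[c]
        return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return max(models, key=h)

xs = [[0.0, 0.0], [0.5, 0.3], [0.2, 0.6],
      [4.0, 0.0], [4.5, 0.4], [3.8, 0.7],
      [0.0, 4.0], [0.4, 4.5], [0.6, 3.7]]
labels = [1, 1, 1, 2, 2, 2, 3, 3, 3]
models = fit_one_vs_rest(xs, labels)
print([predict_one_vs_rest(models, x) for x in xs])
```

The binary classifiers are trained independently, so their outputs are not calibrated against each other; taking the largest $h_\theta^{i}(x)$ is a heuristic that works well when the classes are reasonably separated.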

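Method 2: replace the sigmoid function with softmax (softmax regression).

The cost function of softmax regression over $N$ samples and $K$ classes is then (where $1\{\cdot\}$ is the indicator function):

$$J(\theta)=-\frac{1}{N}\left[\sum_{i=1}^{N}\sum_{k=1}^{K}1\{y_i=k\}\log\frac{e^{\theta_k^{T}x_i}}{\sum_{j=1}^{K}e^{\theta_j^{T}x_i}}\right]$$

This is a generalization of the logistic regression loss function.

Indeed, the logistic regression loss can be rewritten in the same form:

$$J(\theta)=-\frac{1}{N}\left[\sum_{i=1}^{N}\sum_{k=0}^{1}1\{y_i=k\}\log P(y_i=k|x_i;\theta)\right]$$

The parameters are then found by gradient descent.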
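A plain-Python sketch of softmax regression trained by gradient descent. It assumes the standard gradient of the softmax cross-entropy with respect to $\theta_k$, namely $(P(y=k|x)-1\{y=k\})\,x$, and appends a constant 1 to each input as a bias term. The names, hyperparameters and three-cluster data are hypothetical. The first print also checks the "generalization" claim: with $K=2$ and one score fixed at 0, softmax reduces to the sigmoid.

```python
import math

def softmax(scores):
    """Softmax with the max subtracted first for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scores_for(theta, x):
    """theta_k^T x for every class k (x already ends with the constant 1)."""
    return [sum(t * xi for t, xi in zip(theta_k, x)) for theta_k in theta]

def train_softmax(xs, ys, n_classes, lr=0.2, epochs=3000):
    """Gradient descent on J(theta) = -(1/N) sum_i log P(y_i | x_i; theta)."""
    xs = [list(x) + [1.0] for x in xs]
    d, n = len(xs[0]), len(xs)
    theta = [[0.0] * d for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * d for _ in range(n_classes)]
        for x, y in zip(xs, ys):
            probs = softmax(scores_for(theta, x))
            for k in range(n_classes):
                err = probs[k] - (1.0 if y == k else 0.0)  # P(y=k|x) - 1{y=k}
                for j in range(d):
                    grad[k][j] += err * x[j]
        theta = [[t - lr * g / n for t, g in zip(theta_k, grad_k)]
                 for theta_k, grad_k in zip(theta, grad)]
    return theta

def predict_softmax(theta, x):
    scores = scores_for(theta, list(x) + [1.0])
    return max(range(len(scores)), key=lambda k: scores[k])

# K = 2 with the second score fixed at 0: softmax equals the sigmoid.
z = 1.3
print(abs(softmax([z, 0.0])[0] - 1.0 / (1.0 + math.exp(-z))) < 1e-12)  # True

xs = [[0.0, 0.0], [0.5, 0.3], [0.2, 0.6],
      [4.0, 0.0], [4.5, 0.4], [3.8, 0.7],
      [0.0, 4.0], [0.4, 4.5], [0.6, 3.7]]
ys = [0, 0, 0, 1, 1, 1, 2, 2, 2]
theta = train_softmax(xs, ys, n_classes=3)
print([predict_softmax(theta, x) for x in xs])
```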

4. Logistic regression vs. linear regression

Linear Regression:

$f(x_{i})=\omega x_{i}+b$, such that $y_{i}\approx f(x_{i})$

It is solved with the least-squares method.

Linear regression generally uses the squared error as its loss (which can be derived from maximum likelihood estimation under Gaussian noise).

For logistic regression, however, the squared error composed with the sigmoid is a non-convex function of the parameters, so gradient descent may only reach a local optimum.

Logistic regression therefore uses the cross-entropy cost (also derived from maximum likelihood estimation), which is convex, so any local optimum is a global optimum.
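The convexity claim can be checked numerically in one dimension. The sketch below uses a single hypothetical sample $x=1$, $y=1$ (the helper names are illustrative) and estimates the second derivative of each loss with respect to $w$ by finite differences. The squared error composed with the sigmoid has a negative second derivative at $w=-3$, so it is not convex, while the cross-entropy's second derivative, $\sigma(w)(1-\sigma(w))$, stays positive.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def squared_error(w, x=1.0, y=1.0):
    """(g(wx) - y)^2 for one sample, as a function of w."""
    return (sigmoid(w * x) - y) ** 2

def cross_entropy(w, x=1.0, y=1.0):
    """-(y log p + (1 - y) log(1 - p)) with p = g(wx)."""
    p = sigmoid(w * x)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def second_difference(f, w, h=1e-3):
    """Finite-difference estimate of f''(w); negative means locally concave."""
    return (f(w + h) - 2.0 * f(w) + f(w - h)) / h ** 2

for w in (-3.0, 0.0, 3.0):
    print(w, second_difference(squared_error, w) >= 0,
          second_difference(cross_entropy, w) >= 0)
# -3.0 False True
# 0.0 True True
# 3.0 True True
```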

Differences:

1. Logistic regression is used for classification; linear regression is used for regression.

2. The dependent variable of logistic regression is discrete, while that of linear regression is continuous.

Logistic regression is not simply linear regression with an activation function added; for example, the hypothesis can use polynomial terms:

$h_{\theta }(x)=g(\theta _{0}+\theta _{1}x_{1}+\theta _{2}x_{2}+\theta _{3}x_{1}^{2}+\theta _{4}x_{2}^{2})$

Polynomial features can produce more complex decision boundaries, not just a linear split.

The decision boundary is not a property of the training set, but of the hypothesis $h_{\theta}(x)$ itself and its parameters.

Logistic regression is a nonlinear model, but it is essentially a linear classifier:

it applies a sigmoid mapping on top of a linear regression and classifies by the estimated probability $P(y=1|x)$;

its decision boundary $\omega x+b=0$ is linear (in the possibly transformed features).

For example:

$\theta _{0}+\theta _{1}x_{1}+\theta _{2}x_{2}=0$

$\theta _{0}+\theta _{1}x_{1}^{2}+\theta _{2}\sqrt{x_{2}}=0$

The second boundary does not look linear, but only the variables have been transformed.

Let $t_{1}=x_{1}^{2}$ and $t_{2}=\sqrt{x_{2}}$; then it becomes

$\theta _{0}+\theta _{1}t_{1}+\theta _{2}t_{2}=0$
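To illustrate the change of variables, the sketch below (plain Python; the grid data, names and hyperparameters are hypothetical) trains the same gradient-descent logistic regression twice on points labelled 1 strictly inside the unit circle: once on the raw features $(x_1, x_2)$ and once on $t_1=x_1^2$, $t_2=x_2^2$ (here $t_2$ is a square rather than $\sqrt{x_2}$). In the transformed features the circle $x_1^2+x_2^2=1$ becomes the straight line $t_1+t_2=1$, so only the second model can separate the classes.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(xs, ys, lr=0.5, epochs=3000):
    """Batch gradient descent for binary logistic regression."""
    d, n = len(xs[0]), len(xs)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(d):
                gw[j] += err * x[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def accuracy(w, b, xs, ys):
    """Fraction of points where the rule z >= 0 -> 1 matches the label."""
    hits = 0
    for x, y in zip(xs, ys):
        z = sum(wj * xj for wj, xj in zip(w, x)) + b
        hits += int((1 if z >= 0 else 0) == y)
    return hits / len(ys)

# Hypothetical data: a 9 x 9 grid on [-2, 2]^2, label 1 strictly inside the unit circle.
points = [(i / 2.0, j / 2.0) for i in range(-4, 5) for j in range(-4, 5)]
labels = [1 if x1 ** 2 + x2 ** 2 < 1.0 else 0 for x1, x2 in points]

raw = [[x1, x2] for x1, x2 in points]               # original variables
mapped = [[x1 ** 2, x2 ** 2] for x1, x2 in points]  # t1 = x1^2, t2 = x2^2

w_raw, b_raw = train(raw, labels)
w_map, b_map = train(mapped, labels)
print("raw features:   ", accuracy(w_raw, b_raw, raw, labels))
print("mapped features:", accuracy(w_map, b_map, mapped, labels))
```

On the raw features the data is symmetric about the origin, so the best a straight line can do is close to predicting the majority class; the mapped features need no change to the learning algorithm at all.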

5. Logistic regression vs. SVM

$$g(z)=\frac{1}{1+e^{-z}}$$

$$P(y=1|x;\theta )=\frac{1}{1+e^{-(wx+b)}}$$

$h_{\theta}(x)$ is the predicted probability that $y=1$ for input $x$ given parameters $\theta$: $h_{\theta}(x)=P(y=1|x;\theta)$.

Like SVM, LR can use kernels (a transformation of variables) to solve nonlinear problems.

However, LR is prone to overfitting, because its VC dimension grows linearly with the number of variables.

SVM is less prone to overfitting, because its VC dimension grows more slowly with the number of variables.