1. What is logistic regression;
2. Derivation and procedure of logistic regression;
3. Multi-class classification with logistic regression;
4. Logistic regression VS linear regression;
5. Logistic regression VS SVM
1. What is logistic regression
Although it is called regression, it is actually a classifier: it predicts the class by computing $P(y=1|x;\theta)$. The prediction is the category 0 or 1 rather than a probability, but the probability is what gets computed; $0\leq P(y=1|x;\theta )\leq 1$, and the probability value itself is not simply 0 or 1;
Binary logistic regression: x is the input, y is the output;
$P(y=1|x)=\frac{1}{1+e^{-z}}$
$P(y=1|x)=\frac{1}{1+e^{-(\omega x+b)}}$
$P(y=1|x;\theta )+P(y=0|x;\theta )=1$
$z=\omega x+b$
$\frac{\partial z}{\partial w}=x$
$\frac{\partial z}{\partial b}=1$
logistic: the log-odds (logit) function;
Odds: the ratio of the probability $p$ that an event occurs to the probability $1-p$ that it does not: $\frac{p}{1-p}$
Log-odds: $logit(p)=\log \frac{p}{1-p}=\omega x+b$
$\omega x+b=0$ is the decision boundary;
When $p>1-p$: $\frac{p}{1-p}>1$, $\log \frac{p}{1-p}>0$, i.e. $\omega x+b>0$;
When $p<1-p$: $\frac{p}{1-p}<1$, $\log \frac{p}{1-p}<0$, i.e. $\omega x+b<0$;
$z=\omega x+b$
When $z\geq 0$: $g(z)\geq 0.5$ and $y=1$;
When $z<0$: $g(z)<0.5$ and $y=0$;
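A minimal numpy sketch of this decision rule (the weights `w`, `b` and the sample `x` below are made-up values for illustration):

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, w, b):
    z = np.dot(w, x) + b           # z = wx + b
    p = sigmoid(z)                 # p = P(y=1|x)
    return 1 if p >= 0.5 else 0    # z >= 0  <=>  g(z) >= 0.5  <=>  y = 1

# made-up example values
w, b = np.array([0.5, -0.3]), 0.1
x = np.array([1.0, 2.0])
print(predict(x, w, b))
```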
2. Derivation and procedure of logistic regression
(Model parameter estimation: use the likelihood function to estimate the model parameters)
Likelihood function: $\prod_{i=1}^{N}p^{y_{i}}(1-p)^{1-y_{i}}$ (its negative logarithm is the cross-entropy loss);
Write the log-likelihood function and find its maximum;
Use gradient descent to solve for $\omega$;
Plug into $P(y=k|x;\theta )$ to classify;
Derivation:
Likelihood function (single sample): $L=p^{y_{i}}(1-p)^{1-y_{i}}$
Log-likelihood function: $L=y_{i}\log p+(1-y_{i})\log (1-p)$
Maximizing the log-likelihood is converted to minimizing its negative:
$L=-(y_{i}\log p+(1-y_{i})\log (1-p))$
Solve by gradient descent:
$\frac{\partial L}{\partial p}=-\frac{y_{i}}{p}+\frac{1-y_{i}}{1-p}$
$p=\frac{1}{1+e^{-z}}$
$\frac{\partial p}{\partial z}=\frac{e^{-z}}{(1+e^{-z})^{2}}=p(1-p)$
$\frac{\partial z}{\partial w}=x$
$\frac{\partial z}{\partial b}=1$
By the chain rule, $\frac{\partial L}{\partial w}=\frac{\partial L}{\partial p}\frac{\partial p}{\partial z}\frac{\partial z}{\partial w}$ (and likewise for $b$), which simplifies to:
$dw=(p-y_{i})x$
$db=(p-y_{i})$
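A minimal numpy sketch of these gradient-descent updates (the toy dataset, learning rate `lr`, and step count are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# made-up toy data: X has shape (N, d), labels y in {0, 1}
X = np.array([[0.5, 1.2], [1.0, 0.8], [-0.7, -1.1], [-1.2, -0.4]])
y = np.array([1.0, 1.0, 0.0, 0.0])

w, b, lr = np.zeros(X.shape[1]), 0.0, 0.1
for _ in range(1000):
    p = sigmoid(X @ w + b)        # p_i = P(y=1 | x_i)
    dw = (p - y) @ X / len(y)     # averaged dw = (p - y_i) x
    db = np.mean(p - y)           # averaged db = (p - y_i)
    w, b = w - lr * dw, b - lr * db

print(w, b)
```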
3. Multi-class classification with logistic regression
Method one: split into N one-vs-rest binary classifications (a sketch follows this list):
1 VS {2,3}: $h_{\theta }^{1}(x)$
2 VS {1,3}: $h_{\theta }^{2}(x)$
3 VS {1,2}: $h_{\theta }^{3}(x)$
Predict the category whose $h_{\theta }^{i}(x)$ is largest;
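A sketch of this one-vs-rest scheme, reusing the gradient-descent trainer from section 2 (the toy data, learning rate, and step count are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_binary(X, y, lr=0.1, steps=2000):
    # binary logistic regression via the gradient updates derived above
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        w -= lr * (p - y) @ X / len(y)
        b -= lr * np.mean(p - y)
    return w, b

# made-up 3-class toy data
X = np.array([[0.0, 0.0], [0.2, 0.1], [2.0, 2.0],
              [2.1, 1.9], [-2.0, 2.0], [-1.9, 2.2]])
y = np.array([0, 0, 1, 1, 2, 2])

# one classifier per class: class k VS the rest
models = [train_binary(X, (y == k).astype(float)) for k in range(3)]

# predict the class whose h_theta^i(x) is largest
scores = np.stack([sigmoid(X @ w + b) for w, b in models], axis=1)
print(scores.argmax(axis=1))   # should recover [0 0 1 1 2 2]
```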
Method two: replace the sigmoid with the softmax function;
At this point the cost function of the softmax regression algorithm is as follows (where $1\{\cdot \}$ is the indicator function):
$$J(\theta )=-\frac{1}{m}\left [ \sum_{i=1}^{m}\sum_{j=1}^{k}1\{y^{(i)}=j\}\log \frac{e^{\theta _{j}^{T}x^{(i)}}}{\sum_{l=1}^{k}e^{\theta _{l}^{T}x^{(i)}}} \right ]$$
Clearly, the above formula is a generalization of the logistic regression loss function.
We can rewrite the logistic regression loss function in the corresponding form:
$$J(\theta )=-\frac{1}{m}\left [ \sum_{i=1}^{m}(1-y^{(i)})\log (1-h_{\theta }(x^{(i)}))+y^{(i)}\log h_{\theta }(x^{(i)}) \right ]$$
Then solve it with gradient descent.
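A minimal numpy sketch of this softmax generalization (toy data, learning rate, and step count are assumptions; `W` holds one weight column per class):

```python
import numpy as np

def softmax(Z):
    # subtract the row-wise max for numerical stability
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# made-up 3-class toy data
X = np.array([[0.0, 0.0], [0.2, 0.1], [2.0, 2.0],
              [2.1, 1.9], [-2.0, 2.0], [-1.9, 2.2]])
y = np.array([0, 0, 1, 1, 2, 2])
Y = np.eye(3)[y]                       # one-hot labels, i.e. the indicator 1{y=k}

W, b, lr = np.zeros((X.shape[1], 3)), np.zeros(3), 0.5
for _ in range(2000):
    P = softmax(X @ W + b)             # P[i, k] = P(y=k | x_i)
    W -= lr * X.T @ (P - Y) / len(y)   # gradient of the cross-entropy cost
    b -= lr * (P - Y).mean(axis=0)

print(softmax(X @ W + b).argmax(axis=1))   # should recover [0 0 1 1 2 2]
```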
4. Logistic regression VS linear regression
Linear Regression:
$f(x_{i})=\omega x_{i}+b$, such that $y_{i}\approx f(x_{i})$
Solved by the least-squares method;
Linear regression generally uses the squared-error loss function (derived from maximum likelihood estimation);
For logistic regression, however, the squared error would be a non-convex function with only locally optimal solutions;
Logistic regression therefore uses the cross-entropy cost function (also derived from maximum likelihood estimation), because it is a convex function with a global optimal solution (a numerical check follows below);
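A numerical sketch of this convexity claim on a made-up 1-D problem (the data and the weight grid are assumptions): scanning a single weight, the squared-error curve fails the non-negative second-difference test while the cross-entropy curve passes it.

```python
import numpy as np

# made-up 1-D data: single feature, no bias; scan the weight w
x = np.array([1.0, 2.0, -1.0, -2.0])
y = np.array([1.0, 1.0, 0.0, 0.0])
ws = np.linspace(-10, 10, 401)

def losses(w):
    p = 1.0 / (1.0 + np.exp(-w * x))
    mse = np.mean((p - y) ** 2)                              # squared error
    ce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))   # cross-entropy
    return mse, ce

mse, ce = np.array([losses(w) for w in ws]).T
# a convex curve has non-negative second differences everywhere
print("squared error convex:", np.all(np.diff(mse, 2) >= -1e-12))  # False
print("cross-entropy convex:", np.all(np.diff(ce, 2) >= -1e-12))   # True
```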
Differences:
1. Logistic regression does classification; linear regression does regression;
2. The dependent variable of logistic regression is discrete, while that of linear regression is continuous;
Logistic regression is not merely linear regression plus an activation function:
$h_{\theta }(x)=g(\theta _{0}+\theta _{1}x_{1}+\theta _{2}x_{2}+\theta _{3}x_{1}^{2}+\theta _{4}x_{2}^{2})$
Using polynomial features, more complex boundaries can be obtained, not just linear separation;
The decision boundary is not a property of the training set, but of the hypothesis $h_{\theta }(x)$ itself and its parameters;
It may look like a nonlinear model, but it is essentially a linear classification model;
It adds a sigmoid mapping on top of linear regression and classifies by the estimated probability $P(y=1|x)$;
$\omega x+b=0$ is the decision boundary, which is linear after a change of variables;
E.g.:
$\theta _{0}+\theta _{1}x_{1}+\theta _{2}x_{2}=0$
$\theta _{0}+\theta _{1}x_{1}^{2}+\theta _{2}\sqrt{x_{2}}=0$
The second looks nonlinear, but it only applies a change of variables;
Let $t_{1}=x_{1}^{2}$, $t_{2}=\sqrt{x_{2}}$:
$\theta _{0}+\theta _{1}t_{1}+\theta _{2}t_{2}=0$
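A sketch of this change-of-variables idea with scikit-learn (the circular toy dataset and degree 2 are assumptions): logistic regression stays linear in the transformed features $t$, yet the boundary is nonlinear in the original $x$.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# made-up circular data: label 1 inside the unit circle
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1).astype(int)

# degree-2 mapping t = (x1, x2, x1^2, x1*x2, x2^2); LR is linear in t
model = make_pipeline(PolynomialFeatures(degree=2),
                      LogisticRegression(max_iter=1000))
model.fit(X, y)
print(model.score(X, y))   # near-perfect fit on this circular boundary
```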
5. Logistic regression VS SVM
$$g(z)=\frac{1}{1+e^{-z}}$$
$$P(y=1|x;\theta )=\frac{1}{1+e^{-(wx+b)}}$$
$h_{\theta }(x)=$ the predicted probability that $y=1$ for input $x$ (with parameters $\theta$) $=P(y=1|x;\theta )$
LR can also, like SVM, use a kernel to transform variables and solve nonlinear problems;
But LR overfits easily, because the VC dimension of LR grows linearly with the number of variables;
SVM does not overfit as easily, because the VC dimension of SVM grows only about logarithmically with the number of variables;
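A sketch comparing the two on a nonlinear toy problem with scikit-learn (the moons dataset and the RBF kernel are assumptions): the kernelized SVM handles the nonlinearity directly, while plain LR is limited to a linear boundary.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

lr = LogisticRegression().fit(X, y)   # linear decision boundary
svm = SVC(kernel="rbf").fit(X, y)     # kernel trick: nonlinear boundary

print("LR  accuracy:", lr.score(X, y))
print("SVM accuracy:", svm.score(X, y))
```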