## Logistic Regression (LR)

1. What is logistic regression?

2. Derivation of logistic regression

3. Multi-class classification with logistic regression

4. Logistic regression vs. linear regression

5. Logistic regression vs. SVM

1. What is logistic regression?

Although it is called regression, logistic regression is actually a classifier: it computes the probability $P(y=1|x;\theta)$ and predicts the class label 0 or 1 from it, rather than outputting the probability itself; since $0\leq P(y=1|x;\theta)\leq 1$, the output is a valid probability, and the label is obtained by thresholding it.

Binary logistic regression: $x$ is the input, $y$ is the output;

$P(y=1|x)=\frac{1}{1+e^{-z}}$

$P(y=1|x)=\frac{1}{1+e^{-(\omega x+b)}}$

$P(y=1|x;\theta )+P(y=0|x;\theta )=1$

$z=\omega x+b$

$\frac{\partial z}{\partial w}=x$
$\frac{\partial z}{\partial b}=1$

Logistic: the log-odds (logit) function.

Odds: the probability $p$ that an event occurs divided by the probability $1-p$ that it does not: $\frac{p}{1-p}$

Log-odds: $\mathrm{logit}(p)=\log\frac{p}{1-p}=\omega x+b$

$\omega x+b=0$ is the decision boundary;

When $p>1-p$: $\frac{p}{1-p}>1$, $\log\frac{p}{1-p}>0$, i.e. $\omega x+b>0$;

When $p<1-p$: $\frac{p}{1-p}<1$, $\log\frac{p}{1-p}<0$, i.e. $\omega x+b<0$;

$z=\omega x+b$

When $z\geq 0$, $g(z)\geq 0.5$ and $y=1$;

When $z<0$, $g(z)<0.5$ and $y=0$.
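As a minimal sketch, the sigmoid and the 0.5 threshold above can be written in Python (the weights `w`, bias `b`, and input values below are made-up illustrations):

```python
import math

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)): maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b):
    """Return (P(y=1|x), predicted label) for one sample x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    # z >= 0  <=>  g(z) >= 0.5  <=>  predict y = 1
    return sigmoid(z), 1 if z >= 0 else 0

# z = 0 lies exactly on the decision boundary w*x + b = 0
print(sigmoid(0.0))                                    # 0.5
p, label = predict([2.0, -1.0], w=[0.5, 0.25], b=0.1)
print(p, label)
```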

2. Derivation of logistic regression

(The model parameters are estimated by maximum likelihood.)

Likelihood function: $\prod_{i=1}^{N}p^{y_{i}}(1-p)^{1-y_{i}}$ (its negative logarithm is the cross-entropy loss)

Take the logarithm to get the log-likelihood, and maximize it;

This generalizes to multi-class classification with $P(y=k|x;\theta)$;

Derivation:

Likelihood function (for one sample): $L=p^{y_{i}}(1-p)^{1-y_{i}}$

Log-likelihood function: $\log L=y_{i}\log p+(1-y_{i})\log(1-p)$

Maximizing the log-likelihood is equivalent to minimizing its negative:

$L=-(y_{i}\log p+(1-y_{i})\log(1-p))$

$\frac{\partial L}{\partial p}=-\frac{y_{i}}{p}+\frac{1-y_{i}}{1-p}$

$p=\frac{1}{1+e^{-z}}$

$\frac{\partial p}{\partial z}=\frac{e^{-z}}{(1+e^{-z})^{2}}=p(1-p)$

$\frac{\partial z}{\partial w}=x$
$\frac{\partial z}{\partial b}=1$

$dw=(p-y_{i})x$

$db=(p-y_{i})$
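As a sketch of the derivation above, the gradients $dw=(p-y_{i})x$ and $db=p-y_{i}$ plug directly into batch gradient descent; the toy data, learning rate, and epoch count here are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Batch gradient descent on the negative log-likelihood,
    using the derived gradients dw = (p - y)*x and db = (p - y)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)       # p = P(y=1|x) for every sample
        w -= lr * X.T @ (p - y) / n  # average of (p - y_i) * x_i
        b -= lr * np.mean(p - y)     # average of (p - y_i)
    return w, b

# Toy separable data: label 1 roughly when x1 + x2 is large
X = np.array([[0.0, 0.0], [0.2, 0.3], [1.0, 1.0], [0.9, 0.8]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b = fit_logistic(X, y)
preds = (sigmoid(X @ w + b) >= 0.5).astype(int)
```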

3. Multi-class classification with logistic regression

Method one: build $N$ one-vs-rest binary classifiers:

1 vs {2,3}: $h_{\theta }^{1}(x)$

2 vs {1,3}: $h_{\theta }^{2}(x)$

3 vs {1,2}: $h_{\theta }^{3}(x)$

Predict the category $i$ whose $h_{\theta}^{i}(x)$ is largest;
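A sketch of method one: train one binary classifier per class and predict with the largest $h_{\theta}^{i}(x)$; the three toy clusters are made-up data:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_binary(X, y, lr=0.1, epochs=2000):
    """One binary logistic classifier, trained by gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def fit_one_vs_rest(X, y, n_classes):
    """Class k vs. all the others: one model per class."""
    return [fit_binary(X, (y == k).astype(float)) for k in range(n_classes)]

def predict_ovr(models, X):
    """Pick the class whose h^i(x) is largest."""
    scores = np.column_stack([sigmoid(X @ w + b) for w, b in models])
    return scores.argmax(axis=1)

# Three well-separated clusters, one per class
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9], [0.0, 5.0], [0.2, 5.1]])
y = np.array([0, 0, 1, 1, 2, 2])
models = fit_one_vs_rest(X, y, n_classes=3)
pred = predict_ovr(models, X)
```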

Method two: replace the sigmoid with the softmax function.

The cost function of softmax regression is shown below (where $\mathbb{1}\{\cdot\}$ is the indicator function, equal to 1 when the expression inside is true and 0 otherwise):

$J(\theta)=-\sum_{i=1}^m\sum_{c=1}^{k}\mathbb{1}\{y^{(i)}=c\}\log p(y^{(i)}=c|x^{(i)},\theta)=-\sum_{i=1}^m\sum_{c=1}^{k}\mathbb{1}\{y^{(i)}=c\}\log\frac{e^{\theta_c^Tx^{(i)}}}{\sum_{l=1}^ke^{\theta_l^Tx^{(i)}}}$

Obviously, the above formula is a generalization of logistic regression loss function.

We can rewrite the logistic regression loss function in the same form:

$J(\theta)=-\sum_{i=1}^m\left[y^{(i)}\log h_{\theta}(x^{(i)})+(1-y^{(i)})\log(1-h_{\theta}(x^{(i)}))\right]=-\sum_{i=1}^m\sum_{c=0}^1\mathbb{1}\{y^{(i)}=c\}\log p(y^{(i)}=c|x^{(i)},\theta)$

Then minimize it by gradient descent.
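A sketch of method two: minimize the softmax cost $J(\theta)$ above by gradient descent; the toy data and hyperparameters are illustrative assumptions, and a bias column is appended to $x$:

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax: p(y=c|x) = exp(theta_c.x) / sum_l exp(theta_l.x)."""
    Z = Z - Z.max(axis=1, keepdims=True)   # shift for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def fit_softmax(X, y, k, lr=0.1, epochs=2000):
    """Minimize J(theta) = -sum_i sum_c 1{y_i=c} log p(y_i=c|x_i)."""
    n, d = X.shape
    Theta = np.zeros((d, k))
    Y = np.eye(k)[y]                       # one-hot rows: the indicator 1{y_i=c}
    for _ in range(epochs):
        P = softmax(X @ Theta)             # p(y=c|x) for every sample
        Theta -= lr * X.T @ (P - Y) / n    # gradient of the cross-entropy
    return Theta

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9], [0.0, 5.0], [0.2, 5.1]])
y = np.array([0, 0, 1, 1, 2, 2])
Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias feature
Theta = fit_softmax(Xb, y, k=3)
pred = softmax(Xb @ Theta).argmax(axis=1)
```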

4. Logistic regression vs. linear regression

Linear regression:

$f(x_{i})=\omega x_{i}+b$, such that $y_{i}\approx f(x_{i})$

The parameters are found by the least-squares method.

Linear regression generally uses the squared-error loss (derived from maximum likelihood under Gaussian noise);

For logistic regression, however, the squared-error loss is non-convex and has only local optima;

Logistic regression therefore uses the cross-entropy cost function (also derived from maximum likelihood), because it is convex and has a global optimum;

Differences:

1. Logistic regression does classification, linear regression does regression;

2. The dependent variable of logistic regression is discrete, while that of linear regression is continuous;

Logistic regression is not merely linear regression plus an activation function:

$h_{\theta }(x)=g(\theta _{0}+\theta _{1}x_{1}+\theta _{2}x_{2}+\theta _{3}x_{1}^{2}+\theta _{4}x_{2}^{2})$

Using polynomial features, more complex decision boundaries can be obtained, not just linear separation;

The decision boundary is a property of the hypothesis $h_{\theta}(x)$ and its parameters, not of the training set;

The model is nonlinear in the original features, but essentially remains a linear classification model;

A sigmoid mapping is added on top of linear regression, and classification uses the estimated probability $P(y=1|x)$;

$\omega x+b=0$ is the decision boundary, so the classification is still linear;

For example:

$\theta _{0}+\theta _{1}x_{1}+\theta _{2}x_{2}=0$

$\theta _{0}+\theta _{1}x_{1}^{2}+\theta _{2}\sqrt{x_{2}}=0$

These look nonlinear, but are only a change of variables;

Let $t_{1}=x_{1}^{2},\ t_{2}=\sqrt{x_{2}}$, then

$\theta _{0}+\theta _{1}t_{1}+\theta _{2}t_{2}=0$
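The substitution above can be checked with a tiny sketch; the coefficient values here are arbitrary illustrations:

```python
import math

# An apparently nonlinear boundary: theta0 + theta1*x1^2 + theta2*sqrt(x2) = 0
theta0, theta1, theta2 = -4.0, 1.0, 2.0

def to_linear_features(x1, x2):
    """The change of variables t1 = x1^2, t2 = sqrt(x2)."""
    return x1 ** 2, math.sqrt(x2)

def boundary_value(x1, x2):
    """theta0 + theta1*t1 + theta2*t2 -- linear in (t1, t2)."""
    t1, t2 = to_linear_features(x1, x2)
    return theta0 + theta1 * t1 + theta2 * t2

# The sign of the (linear-in-t) value classifies points in the original space
print(boundary_value(1.0, 1.0))   # -4 + 1 + 2 = -1.0
print(boundary_value(2.0, 4.0))   # -4 + 4 + 4 = 4.0
```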

5. Logistic regression vs. SVM

$$g(z)=\frac{1}{1+e^{-z}}$$

$$P(y=1|x;\theta )=\frac{1}{1+e^{-(wx+b)}}$$

$h_{\theta}(x)$ is the predicted probability that $y=1$ given input $x$ and parameters $\theta$: $h_{\theta}(x)=P(y=1|x;\theta)$

Like SVM, LR can also apply a kernel-style variable transformation to solve nonlinear problems;

But LR overfits more easily, because the VC dimension of LR grows linearly with the number of variables;

SVM is less prone to overfitting, because the VC dimension of SVM does not grow linearly with the number of variables;

Origin: www.cnblogs.com/danniX/p/10720198.html