[Notes] Logistic Regression

Logistic Regression


\[P(Y=1|x)=\frac{1}{1+e^{-(w\cdot{x}+b)}} \]
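As a minimal sketch of the formula above, the following computes \(P(Y=1|x)\) for one input. The function names `sigmoid` and `predict_proba` and the example weights are illustrative assumptions, not part of the original notes.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w: list[float], b: float, x: list[float]) -> float:
    """P(Y=1 | x) for weights w and bias b (hypothetical helper)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

# Made-up numbers: z = 2*1 + (-1)*2.5 + 0.5 = 0, so the probability is 0.5.
p = predict_proba([2.0, -1.0], 0.5, [1.0, 2.5])
print(p)  # 0.5
```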

Parameter Estimation

The parameters are estimated by maximum likelihood:

\[\begin{equation} \begin{aligned} L(w) &= \prod_{i=1}^N\sigma(z_i)^{y_i}(1-\sigma(z_i))^{1-y_i} \\ \log L(w) &= \sum_{i=1}^{N} \left[ y_i\log\sigma(z_i)+(1-y_i)\log(1-\sigma(z_i)) \right] \\ &=\sum_{i=1}^N \left[ y_i\log\frac{\sigma(z_i)}{1-\sigma(z_i)}+\log(1-\sigma(z_i)) \right] \\ &= \sum_{i=1}^N \left[ y_iz_i+\log(1-\sigma(z_i)) \right] \\ &\text{where } z_i=w\cdot x_i,\quad w=(w^{(1)},w^{(2)},\dots,w^{(k)},b),\quad x_i=(x_i^{(1)},x_i^{(2)},\dots,x_i^{(k)},1) \end{aligned} \end{equation}\]

Maximizing \(L(w)\) (equivalently, its logarithm) gives the estimate of \(w\).
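The maximization has no closed form, so it is usually done numerically. The sketch below uses plain gradient ascent on the log-likelihood; the toy data, learning rate, step count, and function names are all assumptions chosen for illustration, and each input carries a trailing 1 so that the bias is folded into \(w\).

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def log_likelihood(w, X, y):
    """sum_i [ y_i * z_i + log(1 - sigmoid(z_i)) ], with z_i = w . x_i."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi))
        # log(1 - sigmoid(z)) = -log(1 + e^z), computed in a stable form
        total += yi * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total

def fit(X, y, lr=0.1, steps=2000):
    """Gradient ascent on the log-likelihood (illustrative settings)."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            err = yi - sigmoid(z)  # derivative of the i-th term w.r.t. z_i
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj + lr * gj for wj, gj in zip(w, grad)]
    return w

# Toy, non-separable data: one feature plus a constant 1 for the bias.
X = [[0.5, 1.0], [1.0, 1.0], [1.5, 1.0], [2.0, 1.0], [2.5, 1.0], [3.0, 1.0]]
y = [0, 0, 1, 0, 1, 1]
w_hat = fit(X, y)
print(w_hat, log_likelihood(w_hat, X, y))
```

The data are deliberately not linearly separable; on separable data the likelihood has no finite maximizer and the weights would keep growing.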


  1. Logistic regression is a classification model, so why is it called "regression"?
    The odds of an event are the ratio of the probability that it occurs to the probability that it does not occur, and the log-odds (logit) of the event is \(\mathrm{logit}(p) = \log\frac{p}{1-p}\). For logistic regression, \(\mathrm{logit}(P(Y=1|x)) = w \cdot x + b\): the log-odds of the output \(Y=1\) is a linear function of the input \(x\). This is the logistic regression model, and the linear regression on the log-odds is where the name comes from. Viewed the other way, the logistic regression model converts the log-odds into a probability. [A perceptron classifies by thresholding \(w \cdot x + b\); logistic regression converts it into a probability.]
  2. What are the differences and relations between logistic regression and linear regression?
    Difference: in logistic regression the dependent variable \(y\) takes discrete values; in linear regression \(y\) is continuous. That is, logistic regression is a classification model and linear regression is a regression model.
    Relations:
    • Both are generalized linear models. Logistic regression assumes \(P(y|x;\theta) \sim \mathrm{Bernoulli}(\phi)\); linear regression, solved by least squares, assumes \(P(y|x;\theta) \sim N(\mu, \sigma^2)\).
    • The optimal parameters of both can be found by gradient descent.
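The log-odds relation from item 1 can be checked numerically. In this small sketch, `logit` and `sigmoid` are inverse maps between a probability and the linear score, and `perceptron_label` (a hypothetical name) shows the hard-threshold alternative on the same score.

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def sigmoid(z: float) -> float:
    """Maps log-odds z back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron_label(z: float) -> int:
    """Hard threshold on the same linear score z = w . x + b."""
    return 1 if z >= 0 else 0

for p in (0.1, 0.5, 0.9):
    z = logit(p)
    # sigmoid(z) recovers p; the perceptron keeps only the sign of z
    print(p, z, sigmoid(z), perceptron_label(z))
```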

Generalized Linear Models
Assumptions

  1. \(P(y|x;\theta) \sim \text{exponential family distribution}\)
  2. \(h_\theta(x) = E[y|x;\theta]\)
  3. The natural parameter \(\eta\) is linearly related to the input \(x\): \(\eta = \theta^T x\)

Exponential family of distributions
\(p(y;\eta) = b(y)\exp(\eta^T T(y) - a(\eta))\), where \(\eta\) is the natural parameter and \(T(y)\) is the sufficient statistic.
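As a worked example of this form, the Bernoulli distribution with mean \(\phi\) can be rewritten as follows (a standard derivation, added here as an illustration):

\[\begin{aligned} p(y;\phi) &= \phi^{y}(1-\phi)^{1-y} \\ &= \exp\left(y\log\phi + (1-y)\log(1-\phi)\right) \\ &= \exp\left(y\log\frac{\phi}{1-\phi} + \log(1-\phi)\right) \end{aligned}\]

Matching terms gives \(b(y) = 1\), \(T(y) = y\), \(\eta = \log\frac{\phi}{1-\phi}\), and \(a(\eta) = -\log(1-\phi) = \log(1+e^{\eta})\). Solving \(\eta = \log\frac{\phi}{1-\phi}\) for \(\phi\) gives \(\phi = \frac{1}{1+e^{-\eta}}\), which is the sigmoid function.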

  1. Why does logistic regression use cross-entropy rather than squared error (MSE) as its loss function?
    \(\frac{\partial \sigma(x)}{\partial x} = \sigma(x)(1-\sigma(x))\), which attains its maximum value of 0.25 at \(x = 0\). With squared error as the loss, the gradient contains the factor \(\frac{\partial \sigma(x)}{\partial x}\), so it is small and error back-propagation converges slowly. With cross-entropy as the loss, the gradient does not contain \(\frac{\partial \sigma(x)}{\partial x}\), so the optimum can be reached quickly.
  2. Why does logistic regression use the sigmoid function?
    By the principle of maximum entropy, an exponential family distribution is the maximum entropy distribution given certain statistics. For example, the Bernoulli distribution is the maximum entropy distribution for a variable that takes only two values and has expectation \(\phi\). Therefore, by the definition of generalized linear models, the logistic regression model is:

\[\begin{equation} \begin{aligned} h_{\theta}(x) &= E[y|x;\theta] \\ &=\phi \\ &=\frac{1}{1+e^{-\eta}} \\ & = \frac{1}{1+e^{-w \cdot x}} \end{aligned} \end{equation} \]
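For the cross-entropy versus squared-error question in item 1 above, the gradients with respect to the linear score \(z\) can be compared directly for a single sample. This is a sketch under the assumption that the squared-error loss is \(\frac{1}{2}(\sigma(z)-y)^2\); the function names are illustrative.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def grad_mse(z: float, y: float) -> float:
    """d/dz of 0.5 * (sigmoid(z) - y)^2; keeps the sigmoid'(z) factor."""
    a = sigmoid(z)
    return (a - y) * a * (1.0 - a)

def grad_cross_entropy(z: float, y: float) -> float:
    """d/dz of -[y log a + (1 - y) log(1 - a)]; sigmoid'(z) cancels."""
    return sigmoid(z) - y

# True label is 1; a very negative z means the prediction is badly wrong.
y = 1.0
for z in (-5.0, -2.0, 0.0):
    print(z, grad_mse(z, y), grad_cross_entropy(z, y))
```

At \(z = -5\) the prediction is almost exactly wrong, yet the squared-error gradient is close to zero because of the \(\sigma(z)(1-\sigma(z))\) factor, while the cross-entropy gradient stays close to \(-1\).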

The maximum entropy principle: when learning a probabilistic model, among all candidate probability distributions the one with maximum entropy is the best model. Put plainly, the maximum entropy model satisfies the known facts (constraints) and, in the absence of further information, treats the remaining uncertain outcomes as equally likely.
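A small numerical illustration of the principle, with a made-up constraint: three outcomes A, B, C, where the only known fact is \(P(A) = 0.5\). Searching over a grid of ways to split the remaining mass, the entropy is largest when B and C are equally likely.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; terms with p = 0 contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Constraint (the "known fact"): P(A) = 0.5. B and C share the other 0.5.
candidates = [(0.5, q / 100, 0.5 - q / 100) for q in range(0, 51)]
best = max(candidates, key=entropy)
print(best)  # (0.5, 0.25, 0.25)
```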

  1. Why is the objective function of logistic regression convex?
    To prove that a function of a single variable \(f(x)\) is convex, it suffices to show that \(\frac{\partial^2 f(x)}{\partial x^2} \geq 0\). The argument of the logistic regression objective is a vector, so one must instead show that the Hessian matrix, which is made up of all the second partial derivatives, is positive semidefinite.

A convex function satisfies \(f(\frac{x_1 + x_2}{2}) \leq \frac{f(x_1) + f(x_2)}{2}\) for all \(x_1, x_2\); for such a function, any local optimum is also a global optimum.
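The Hessian condition can be verified directly. The sketch below takes the objective to be the negative log-likelihood \(J(w) = -\sum_{i=1}^N \left[ y_i z_i + \log(1-\sigma(z_i)) \right]\) with \(z_i = w \cdot x_i\); the notes do not say which sign convention they mean by "objective function", so this choice is an assumption.

\[\begin{aligned} \nabla J(w) &= \sum_{i=1}^N (\sigma(z_i) - y_i)\,x_i \\ H = \nabla^2 J(w) &= \sum_{i=1}^N \sigma(z_i)(1-\sigma(z_i))\,x_i x_i^T \\ v^T H v &= \sum_{i=1}^N \sigma(z_i)(1-\sigma(z_i))\,(v \cdot x_i)^2 \geq 0 \quad \text{for every vector } v \end{aligned}\]

Since \(0 < \sigma(z_i) < 1\), every term in the last sum is non-negative, so \(H\) is positive semidefinite and \(J\) is convex. Equivalently, the log-likelihood being maximized is concave.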

Origin www.cnblogs.com/mrdragonma/p/12570268.html